Some information about Google's data centers and servers

October 22, 2009 | Tags: architecture, server | Author: vpsee

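Google engineer Jeff Dean gave a keynote talk at the recent LADIS 2009 Workshop, "Designs, Lessons and Advice from Building Large Distributed Systems", and revealed that Google is working on a project called "Spanner", designed to scale to 10 million servers. Google scales out in the order Servers -> Racks -> Clusters -> Data centers, from a single rack up to multiple data centers. VPSee has recently been deploying SunRay and virtualization and needs to buy more servers. We have more or less settled on Sun because of a buy-one-get-one-free offer. The remaining question is how much CPU, memory, and disk to put in each server to make full use of the machine and get the best price/performance. This PDF lists the configuration of each Google server, and this CNET report also provides photos of a Google server.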

Google server:

The Google server was 3.5 inches thick–2U, or 2 rack units, in data center parlance. It had two processors, two hard drives, and eight memory slots mounted on a motherboard built by Gigabyte. Google uses x86 processors from both AMD and Intel.

Interestingly, Google servers use batteries, because they are much cheaper than a UPS:

“This is much cheaper than huge centralized UPS,” he said. “Therefore no wasted capacity.”

Storage hierarchy (capacity, access latency, bandwidth):

Server: DRAM: 16GB, 100ns, 20GB/s, Disk: 2TB, 10ms, 200MB/s
Rack (80 servers): DRAM: 1TB, 300us, 100MB/s, Disk: 160TB, 11ms, 100MB/s
Cluster (30+ racks): DRAM: 30TB, 500us, 10MB/s, Disk: 4.80PB, 12ms, 10MB/s
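
The rack and cluster capacities above can be cross-checked against the per-server numbers. This is a minimal sketch, assuming decimal units (1 TB = 1000 GB) and exactly 30 racks per cluster (the source says "30+"). The constant and function names are illustrative only.

```python
# Per-server figures taken from the list above.
SERVER_DRAM_GB = 16
SERVER_DISK_TB = 2
SERVERS_PER_RACK = 80
RACKS_PER_CLUSTER = 30  # assumption: the source says "30+", so a lower bound


def rack_totals():
    """Return (DRAM in TB, disk in TB) for one rack."""
    dram_tb = SERVER_DRAM_GB * SERVERS_PER_RACK / 1000  # decimal units assumed
    disk_tb = SERVER_DISK_TB * SERVERS_PER_RACK
    return dram_tb, disk_tb


def cluster_totals():
    """Return (DRAM in TB, disk in PB) for one cluster."""
    dram_tb, disk_tb = rack_totals()
    return dram_tb * RACKS_PER_CLUSTER, disk_tb * RACKS_PER_CLUSTER / 1000


rack_dram, rack_disk = rack_totals()
cluster_dram, cluster_disk = cluster_totals()
print(f"Rack:    DRAM {rack_dram:.2f} TB, disk {rack_disk} TB")
print(f"Cluster: DRAM {cluster_dram:.1f} TB, disk {cluster_disk:.2f} PB")
```

Under these assumptions the disk totals match the list exactly (160 TB and 4.80 PB), while the DRAM totals come out to about 1.28 TB per rack and 38.4 TB per cluster, so the 1TB and 30TB figures appear to be rounded down.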

Some lessons and numbers:

1-5% of your disk drives will die
Servers will crash at least twice (2-4% failure rate)
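
To see what these rates mean at scale, here is a rough sketch that turns them into expected failure counts. It assumes the rates are annual (the post does not state the period), a hypothetical fleet of one cluster with 30 racks of 80 servers, and two drives per server as in the CNET description. The function name is made up for illustration.

```python
def expected_failures(count, low_percent, high_percent):
    """Return the (low, high) expected number of failures for `count` units."""
    return count * low_percent / 100, count * high_percent / 100


# Hypothetical fleet: one cluster of 30 racks with 80 servers each.
servers = 80 * 30
drives = servers * 2  # two hard drives per server

disk_low, disk_high = expected_failures(drives, 1, 5)      # 1-5% of drives
srv_low, srv_high = expected_failures(servers, 2, 4)       # 2-4% of servers

print(f"{drives} drives: {disk_low:.0f} to {disk_high:.0f} expected drive failures")
print(f"{servers} servers: {srv_low:.0f} to {srv_high:.0f} expected server failures")
```

Even for this modest fleet, hardware failure is a routine event rather than an exception, which is why the software has to tolerate it.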

Numbers everyone should know:

L1 cache reference 0.5 ns
Branch mispredict 5 ns
L2 cache reference 7 ns
Mutex lock/unlock 25 ns
Main memory reference 100 ns
Compress 1K bytes with Zippy 3,000 ns
Send 2K bytes over 1 Gbps network 20,000 ns
Read 1 MB sequentially from memory 250,000 ns
Round trip within same datacenter 500,000 ns
Disk seek 10,000,000 ns
Read 1 MB sequentially from disk 20,000,000 ns
Send packet CA->Netherlands->CA 150,000,000 ns
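
The numbers above are easier to reason about as ratios. The following sketch copies the table into a dictionary and compares a few entries. The helper names `humanize` and `ratio` are illustrative and not from the talk.

```python
# Latencies in nanoseconds, copied from the list above.
LATENCY_NS = {
    "L1 cache reference": 0.5,
    "Branch mispredict": 5,
    "L2 cache reference": 7,
    "Mutex lock/unlock": 25,
    "Main memory reference": 100,
    "Compress 1K bytes with Zippy": 3_000,
    "Send 2K bytes over 1 Gbps network": 20_000,
    "Read 1 MB sequentially from memory": 250_000,
    "Round trip within same datacenter": 500_000,
    "Disk seek": 10_000_000,
    "Read 1 MB sequentially from disk": 20_000_000,
    "Send packet CA->Netherlands->CA": 150_000_000,
}


def humanize(ns):
    """Format a nanosecond count using the largest convenient unit."""
    for unit, factor in (("ms", 1_000_000), ("us", 1_000)):
        if ns >= factor:
            return f"{ns / factor:g} {unit}"
    return f"{ns:g} ns"


def ratio(slow, fast):
    """Return how many times slower `slow` is than `fast`."""
    return LATENCY_NS[slow] / LATENCY_NS[fast]


for name, ns in LATENCY_NS.items():
    print(f"{name:<40} {humanize(ns):>10}")

print(ratio("Disk seek", "Main memory reference"))
print(ratio("Read 1 MB sequentially from disk",
            "Read 1 MB sequentially from memory"))
```

By these figures a disk seek costs as much as 100,000 main memory references, and reading 1 MB sequentially from disk is 80 times slower than reading it from memory.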

What typically happens in the first year of a new cluster:

~0.5 overheating (power down most machines in <5 mins, ~1-2 days to recover)
~1 PDU failure (~500-1000 machines suddenly disappear, ~6 hours to come back)
~1 rack-move (plenty of warning, ~500-1000 machines powered down, ~6 hours)
~1 network rewiring (rolling ~5% of machines down over 2-day span)
~20 rack failures (40-80 machines instantly disappear, 1-6 hours to get back)
~5 racks go wonky (40-80 machines see 50% packetloss)
~8 network maintenances (4 might cause ~30-minute random connectivity losses)
~12 router reloads (takes out DNS and external vips for a couple minutes)
~3 router failures (have to immediately pull traffic for an hour)
~dozens of minor 30-second blips for dns
~1000 individual machine failures
~thousands of hard drive failures
slow disks, bad memory, misconfigured machines, flaky machines, etc.

Categories: Site Reliability | Performance

Comments (2)

unixhater.com - October 22nd, 2009 11:28 am

I wonder whether the two hard drives in a Google server are set up as RAID.

gax - October 23rd, 2009 7:30 pm

Baidu is the crazy one. They switched everything to SSD.
