Chinese supercomputer named world's fastest
November 14, 2010
China overtook the United States at the head of the world of supercomputing on Sunday when a survey ranked one of its machines the fastest on the planet.
Tianhe-1, meaning Milky Way, achieved a computing speed of 2,570 trillion calculations per second, earning it the number one spot in the Top 500 (www.top500.org) survey of supercomputers.
The Jaguar computer at a US government facility in Tennessee, which had held the top spot, was ranked second with a speed of 1,750 trillion calculations per second.
Tianhe-1 does its warp-speed "thinking" at the National Centre for Supercomputing in the northern port city of Tianjin -- using mostly chips designed by US companies.
Another Chinese system, the Nebulae machine at the National Supercomputing Centre in the southern city of Shenzhen, came in third.
The United States still dominates, with more than half of the entries in the Top 500 list, but China now boasts 42 systems in the rankings, putting it ahead of Japan, France, Germany and Britain.
It is not the first time that the United States has had its digital crown stolen by an Asian upstart. In 2002, Japan made a machine with more power than the top 20 American computers put together.
The supercomputers on the Top 500 list, which is produced twice a year, are rated based on speed of performance in a benchmark test by experts from Germany and the United States.
More information: http://www.physorg … omputer.html
(c) 2010 AFP
A Chinese supercomputer has been ranked the world's fastest machine in a list issued by US and European researchers. The move highlights China's rapid progress in the field.
The Tianhe-1A system at the National Supercomputer Center in Tianjin is capable of sustaining computation at 2.57 quadrillion calculations per second. As a result, the former number one system, the US Department of Energy's Jaguar in Oak Ridge, is now ranked second.
Third place is also held by a Chinese system, Nebulae, located at the National Supercomputing Center in the southern Chinese city of Shenzhen.
File photo of China's world-leading supercomputer, Tianhe-1A. (Xinhua File Photo)
Top 500 supers: China rides GPUs to world domination
The People's Republic of Petaflops
SC10 If the June edition of the bi-annual ranking of the Top 500 supercomputers in the world represented the dawning of the GPU co-processor as a key component in high performance computing, then the November list is breakfast time. The super centers of the world are smacking their lips for some flop-jacks with OpenCL syrup and some x64 bacon on the side.
China has the most voracious appetite for GPU co-processors, and as expected two weeks ago when the Tianhe-1A super was booted up for the first time, this hybrid CPU-GPU machine installed at the National Supercomputer Center in Tianjin has taken the top spot on the Top 500 list with a comfortable margin. Tianhe-1A's final rating on the Linpack Fortran matrix math benchmark test is 4.7 petaflops of peak theoretical performance spread across its CPUs and GPUs (with about 70 per cent of that coming from the GPUs) and 2.56 petaflops of sustained performance on the Linpack test.
The Tianhe-1A machine is comprised of 7,168 servers, each equipped with two sockets using Intel's X5670 processors running at 2.93 GHz and one Nvidia Tesla M2050 fanless GPU co-processor. The resulting machine spans 112 racks, and it would make a hell of a box on which to play Crysis.
While 47 per cent of the floating-point oomph in Tianhe-1A disappears into the void where all missed clock cycles go (it's also where missing socks from the dryer cavort), the GPU's flops are relatively inexpensive and the overall machine should offer excellent bang for the buck - provided workloads can scale across the ceepie-geepie of course. The Tianhe-1A super uses a proprietary interconnect called Arch, which was developed by the Chinese government. The Arch switch links the server nodes together using optical-electric cables in a hybrid fat tree configuration and has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.
The Tianhe-1A CPU-GPU hybrid super
This is not the first ceepie-geepie machine that the National Supercomputer Center has put together. A year ago, the Tianhe-1 machine broke onto the Top 500 list using Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs (no Tesla GPUs, but actual graphics cards). This initial "Milky Way" box (that's what "Tianhe" translates to in English) had 71,680 cores and had a peak theoretical performance of 1.2 petaflops and a sustained performance of 563.1 teraflops. The efficiency of this cluster was 53 per cent, sustained over peak performance.
Jaguar dethroned
The "Jaguar" XT5 system at the US Department of Energy's Oak Ridge National Laboratory was knocked out of the top spot by Tianhe-1A, which is what happens when a cat stands still in the GPU era of HPC. The Jaguar machine has 224,162 Opteron cores spinning at 2.6 GHz and delivers 1.76 petaflops of performance on the Linpack test. This Cray machine links Opteron blade servers using its SeaStar2+ interconnect, which has been superseded by the new "Gemini" XE interconnect in the XE6 supers that started rolling out this summer.
Number four on the list is also a ceepie-geepie: the upgraded Tsubame 2 machine at the Tokyo Institute of Technology. (That's shortened to TiTech rather than TIT, which would be where you'd expect a machine called Milky Way to be located. But we digress.) The Tsubame 2 machine is built from Hewlett-Packard's SL390s G7 cookie sheet servers, which made their debut in early October. TiTech announced the Tsubame 2 deal back in May, and this machine includes over 1,400 of these HP servers, each with three M2050 GPUs from Nvidia.
The Tsubame 2 machine has 73,278 cores and is rated at 2.29 peak petaflops and delivered 1.19 petaflops of sustained performance on the Linpack test. That's a 52 percent efficiency, about what the other ceepie-geepies are getting. By the way, the prior Tsubame 1 machine was based on x64 servers from Sun Microsystems, with floating point accelerators from Clearspeed in only some of the nodes. And one more thing: Tsubame 2 runs both Linux and Windows, and according to the Top 500 rankers, both operating systems offer nearly equivalent performance.
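The efficiency figures quoted for the ceepie-geepies are simply sustained Linpack performance divided by peak theoretical performance. A minimal Python sketch of that arithmetic follows, using only the peak and sustained numbers given in the text; the dictionary layout and the `efficiency` helper are illustrative names, not anything the Top 500 rankers publish. Because the inputs are the rounded figures from the article, the results may differ by a point or so from the percentages it states.

```python
# Peak and sustained Linpack figures in petaflops, as quoted in the article.
systems = {
    "Tianhe-1A": (4.7, 2.56),
    "Tianhe-1": (1.2, 0.5631),   # 563.1 teraflops sustained
    "Tsubame 2": (2.29, 1.19),
}


def efficiency(peak_pf, sustained_pf):
    """Fraction of peak theoretical flops actually delivered on Linpack."""
    return sustained_pf / peak_pf


for name, (peak, sustained) in systems.items():
    eff = efficiency(peak, sustained)
    # The remainder is the share of peak flops that never shows up in the result.
    print(f"{name}: {eff:.1%} of peak delivered, {1 - eff:.1%} lost")
```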
In the Hopper
The fifth most-powerful super in the world based on the Linpack tests (at least the ones we know about) is a brand new box called Hopper. Installed at the US DOE's National Energy Research Scientific Computing center, Hopper is a Cray XE6 super using that new Gemini interconnect and twelve-core Opteron 6100 processors - no fancy schmancy GPU co-processors. (Well, at least not yet, anyway.) Hopper has 153,408 cores spinning at 2.1 GHz and delivers 1.05 petaflops of sustained performance with an efficiency of 82 per cent.
Number seven on the list, the Roadrunner Opteron blade system at Los Alamos National Laboratory (another DOE site), does use accelerators, but they are IBM's now defunct Cell co-processors, which are based on IBM's Power cores and which have eight vector math units per chip. The Roadrunner machine demonstrated the viability of co-processors to push up to the petaflops level. But Roadrunner is stalled at 1.04 petaflops, is probably not going to be upgraded, and is therefore uninteresting even if it will do lots of good work for the DOE. (If you consider designing nuclear weapons good work, of course.)
Number nine on the list is the BlueGene/P super, named Jugene, built by IBM for the Forschungszentrum Juelich in Germany, which debuted at number three at 825.5 teraflops on the June 2009 list and hasn't changed since then. Rounding out the top ten on the Top 500 list is the Cielo Cray XE6 at Los Alamos, a new box that is rated at 816.6 teraflops of sustained Linpack performance.
GPU is my co-pilot
On the November 2010 list, there are 28 HPC systems that use GPU accelerators, and the researchers who put together the Top 500 for the 36th time - Erich Strohmaier and Horst Simon, computer scientists at Lawrence Berkeley National Laboratory, Jack Dongarra of the University of Tennessee, and Hans Meuer of the University of Mannheim - consider IBM's Cell chip a GPU co-processor. On this list, there are sixteen machines that use Cell chips to goose their floating point oomph, with ten using Nvidia GPUs and two using AMD Radeon graphics cards.
To get onto the Top 500 list this time around, a machine had to come in at 31.1 teraflops, up from 24.7 teraflops only six months ago. This used to sound like a lot of math power. But these days, it really doesn't. A cluster with 120 of the current Nvidia Tesla GPUs with only half of the flops coming through where the CUDA meets the Fortran compiler will get you on the list. If the growth is linear, then on the June list next year, you will need something like 40 teraflops or about 150 of the current generation of GPUs. And with GPU performance on the upswing, getting a ceepie-geepie onto the Top 500 list might not require so many GPUs.
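The back-of-the-envelope math behind those GPU counts can be sketched in a few lines of Python. The two entry thresholds and the "half of the flops" efficiency come from the text; the per-GPU peak of 0.515 teraflops (double precision, Tesla M2050 class) is an assumption added here for illustration, as are the function names. The linear projection simply repeats the last six-month increase, so treat the result as a rough estimate rather than a forecast.

```python
import math

PREV_THRESHOLD_TF = 24.7  # entry level on the June 2010 list (from the article)
CURR_THRESHOLD_TF = 31.1  # entry level on the November 2010 list (from the article)

# Assumption, not stated in the article: peak double-precision teraflops per GPU.
GPU_PEAK_TF = 0.515
# The article's scenario: only half of the peak flops come through on Linpack.
GPU_EFFICIENCY = 0.5


def next_threshold_linear(prev_tf, curr_tf):
    """Project the next entry level by repeating the last six-month increase."""
    return curr_tf + (curr_tf - prev_tf)


def gpus_needed(target_tf, gpu_peak_tf=GPU_PEAK_TF, efficiency=GPU_EFFICIENCY):
    """Smallest whole number of GPUs whose sustained flops reach the target."""
    return math.ceil(target_tf / (gpu_peak_tf * efficiency))


projected_tf = next_threshold_linear(PREV_THRESHOLD_TF, CURR_THRESHOLD_TF)
print(f"GPUs for the current list: {gpus_needed(CURR_THRESHOLD_TF)}")
print(f"Projected June 2011 entry level: {projected_tf:.1f} teraflops")
print(f"GPUs for the projected list: {gpus_needed(projected_tf)}")
```

Under these assumptions the counts land close to the article's round numbers of 120 and 150 GPUs; a faster GPU or a better Linpack efficiency shrinks them accordingly.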
Core counting
As has been the case for many years, processors from Intel absolutely dominate the current Top 500 list, with 398 machines (79.6 per cent of the boxes on the list). Of these, 56 machines are using the Xeon 5600 processors, one is still based on 32-bit Xeons, one is based on Core desktop chips, five are based on Itanium processors, and three are based on the new high-end Xeon 7500s.
In the November 2010 rankings, there are 57 machines using AMD's Opteron processors, while there are 40 machines using one or another variant of IBM's Power processors. While the machine counts are low for these two families of chips, the core counts sure are not because of the monster systems that are based on Power and Opteron chips.
There are 1.41 million Power cores on the Top 500 list this time around, which is 21.5 per cent of the total 6.53 million cores inside of the 500 boxes and which represents 7.35 aggregate petaflops or 11.2 per cent of the total 65.8 petaflops on the list. There are 1.54 million Opteron cores (23.5 per cent of cores) on the aggregate list for 14.2 peak petaflops (21.6 per cent of total flops).
None of these core counts include the GPU core counts, which is something that the Top 500 people should reconsider, even though in all cases the flops are counted.
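The core and flops shares quoted above for the Power and Opteron families can be reproduced from the list totals. This is a minimal Python sketch using only the figures in the text; the `share` helper and the data layout are illustrative. Since the article's inputs are already rounded, the computed shares may be off by a tenth of a point from the ones it prints.

```python
TOTAL_CORES_M = 6.53  # million cores across all 500 systems (from the article)
TOTAL_PF = 65.8       # aggregate petaflops on the list (from the article)

# (million cores, petaflops) per processor family, as quoted in the article.
families = {
    "Power": (1.41, 7.35),
    "Opteron": (1.54, 14.2),
}


def share(part, total):
    """Return part as a fraction of total."""
    return part / total


for name, (cores_m, pf) in families.items():
    print(
        f"{name}: {share(cores_m, TOTAL_CORES_M):.1%} of cores, "
        f"{share(pf, TOTAL_PF):.1%} of flops"
    )
```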
Across all processor architectures, there are 365 machines using quad-core processors and 19 already are using CPUs with six or more cores per socket. It is safe to say that the HPC market will eat whatever number of cores the chip makers can bake.
There are two Sparc-based supers on the current Top 500 list and the Earth Simulator super built by NEC for the Japanese government is still barely on the list (and will probably be knocked off on the next list in June 2011).
Xeon rides the wave
Having said all of that, the 391 machines using Intel's Xeon processors represent the belly of the Top 500 list. With a total of 3.5 million cores (53.5 per cent of the total core count on the list) and 43.2 petaflops of number-crunching oomph (65.8 per cent of total flops), the Xeon is the champion of the top-end HPC world. Of course, the Xeon CPUs are getting credit for flops that are being done by GPUs in many cases.
By operating system, Linux in its various incarnations dominates the list, with 450 out of 500 machines running it. Unix accounted for 20 machines, Windows five machines, and the remainder were running mixed operating systems. If Microsoft wanted to catch a new wave, it would work to get the best possible GPU runtime and programming tools to market. Just tweaking the MPI stack in Windows HPC Server 2008 R2 to get rough parity with Linux is not going to make a dent at the big supercomputer centers of the world. Then again, Microsoft is trying to move into the HPC arena from the technical workstation up, and it has other advantages that Linux platforms do not in this regard.
IBM has the most systems on the November 2010 Top 500 list, with 199 boxes (39.8 per cent of the total) and 17.1 petaflops (26 per cent of the total flops on the list) of aggregate peak performance on the Linpack test. Big Blue is followed up by Hewlett-Packard, with 158 machines and 11.7 petaflops, which works out to 31.6 per cent of machines and 17.8 per cent of total flops. Cray has only 29 machines on the current super ranking, which is 5.8 per cent of machines but 16.3 per cent of peak floating point power. Silicon Graphics has 22 machines on the list, which is 4.4 per cent of boxes and 4.5 per cent of aggregate flops. Dell has 20 boxes on the list and its hand in a few mixed boxes as well, and Oracle, Fujitsu, NEC, and Hitachi all have a handful of machines, too.
Supercomputing is inherently political (especially so given where the funding for the upper echelon of the Top 500 list comes from), and countries most certainly measure each other up in their HPC centers. The United States leads with machine count, at 275 machines with a combined 31.5 petaflops, and China has jumped well ahead of Japan to become the solid number two, with 42 machines and 12.8 petaflops in total across those machines. Japan has 26 machines that add up to 4.6 petaflops, and Germany's 26 machines have an aggregate of 3.5 petaflops. The United Kingdom is close behind with 24 machines, for a total of 2.2 petaflops, followed by Russia with 11 machines and 1.1 petaflops. ®