Monday, November 15, 2010

Chinese take out U.S. in supercomputer ranking, named world's fastest

Chinese supercomputer named world's fastest

November 14, 2010
China overtook the United States at the head of the world of supercomputing on Sunday when a survey ranked one of its machines the fastest on the planet.
Tianhe-1, meaning Milky Way, achieved a computing speed of 2,570 trillion calculations per second, earning it the number one spot in the Top 500 survey of supercomputers.

The Jaguar computer at a US government facility in Tennessee, which had held the top spot, was ranked second with a speed of 1,750 trillion calculations per second.

Tianhe-1 does its warp-speed "thinking" at the National Centre for Supercomputing in the northern port city of Tianjin, using mostly chips designed by US companies.

Another Chinese system, the Nebulae machine at the National Supercomputing Centre in the southern city of Shenzhen, came in third.

The United States still dominates, with more than half of the entries in the Top 500 list, but China now boasts 42 systems in the rankings, putting it ahead of Japan, France, Germany and Britain.

It is not the first time that the United States has had its digital crown stolen by an Asian upstart. In 2002, Japan made a machine with more power than the top 20 American computers put together.

The Top 500 list, which experts from Germany and the United States produce twice a year, rates supercomputers by their performance on a benchmark test.

More information: http://www.physorg … omputer.html
(c) 2010 AFP 


A Chinese supercomputer has been ranked the world's fastest machine in a list issued by US and European researchers. The ranking highlights China's rapid progress in the field.

The Tianhe-1A system at the National Supercomputer Center in Tianjin is capable of sustaining computation at 2.57 quadrillion calculations per second. As a result, the former number one system, the US Department of Energy's Jaguar at Oak Ridge, is now ranked second.

Third place is also held by a Chinese system, called Nebulae, located at the National Supercomputing Center in the south China city of Shenzhen.

File photo of China's world-leading supercomputer, Tianhe-1A. (Xinhua)

Chinese take out U.S. in supercomputer ranking

The Jaguar has fallen from the top of the food chain.

When the Top 500 list of the world's most powerful supercomputers is released today, "Jaguar," the Cray XT5 system at Oak Ridge National Laboratory run by the University of Tennessee, will drop to No. 2 after a year of eating the lunch of every other supercomputer in the world. In its place will stand Tianhe-1A, a system built by China's National University of Defense Technology and located at the National Supercomputing Center in Tianjin.

Tianhe-1A achieved a performance level of 2.57 petaflop/s (quadrillions of calculations per second). Jaguar achieved 1.75 petaflop/s. Third place went to another Chinese-built system, called Nebulae, which achieved 1.27 petaflop/s.

And while the news of China's achievement is not exactly a surprise, the supercomputing community in the U.S. is looking at it two ways: both as an assurance that U.S. software and components are still elite in their field, and as a wake-up call that the country's prestige in high-performance computing is not a given.

"This is what everybody expected. What the Chinese have done is they're exploiting the power of GPUs (graphics processing units), which are...awfully close to being uniquely suited to this particular benchmark," said Bill Gropp, computer science professor at the University of Illinois at Urbana-Champaign and co-principal investigator of the Blue Waters project, another supercomputer in the works.

The benchmark he's speaking of is Linpack, which tests how fast a system solves a dense system of linear equations. Performance is measured in floating point operations per second, hence flop/s. Not everyone in this field agrees it's the best possible way to compare machines, but it is one way.
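The Linpack idea can be illustrated at toy scale: time the solution of a random dense system, then divide the operation count by the elapsed time. The minimal pure-Python sketch below assumes the classic LINPACK operation count of 2/3 n^3 + 2 n^2. It is not the HPL code used for the Top 500 (which is distributed, blocked, and heavily tuned), and the function names are hypothetical.

```python
import random
import time


def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Works on an augmented copy, so the caller's a and b are left unchanged.
    """
    n = len(a)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for k in range(n):
        # Partial pivoting: swap in the row with the largest entry in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        m[k], m[p] = m[p], m[k]
        for i in range(k + 1, n):
            f = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= f * m[k][j]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / m[i][i]
    return x


def linpack_op_count(n):
    """Classic LINPACK convention: 2/3 n^3 + 2 n^2 floating point operations."""
    return 2.0 / 3.0 * n ** 3 + 2.0 * n ** 2


def toy_linpack(n=200, seed=0):
    """Time one random n x n solve; return (flop/s, max residual)."""
    rng = random.Random(seed)
    a = [[rng.uniform(-1.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    start = time.perf_counter()
    x = solve_dense(a, b)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer reading
    residual = max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i]) for i in range(n))
    return linpack_op_count(n) / elapsed, residual


if __name__ == "__main__":
    rate, res = toy_linpack()
    print(f"{rate / 1e6:.1f} Mflop/s, max residual {res:.2e}")
```

The residual check matters as much as the rate: a benchmark run only counts if the answer is actually correct, which is why the sketch returns both.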

Using GPUs, which pack far more floating point units onto each chip than CPUs do, lets Tianhe-1A achieve many more floating point operations per second.

"The way most of us look at the Chinese machine is it's very good at this particular problem (the Linpack benchmark), but not problems the user community is interested in," said Gropp.

For those worried that this is a blow to the United States' leadership in supercomputing, it's actually not a huge cause for alarm if you consider the provenance of the pieces of the Chinese system. Tianhe-1A is a Linux computer built from components from Intel and Nvidia, points out Charlie Zender, professor of Earth Systems Science at the University of California at Irvine.

A timeline of supercomputing speed. (Credit: AMD)
"So we find ourselves admiring an achievement that certainly couldn't have been done without the know-how of Silicon Valley...and an operating system designed mostly by the United States and Europe," Zender said. "It's a time for reflection that we are now at a stage where a country that's motivated and has the resources can take off-the-shelf components and assemble the world's fastest supercomputer."

Supercomputers will likely get faster every year, points out Jeremy Smith, director of the Center for Molecular Biophysics at the University of Tennessee, so China's rise to the top this month isn't the end of the story. The list will likely be reordered again in June, when the next edition of the Top500 is released.

"What you find historically with these supercomputers is they become the normal machines five or 10 years later that everybody uses," said Smith, who oversees some projects run on Jaguar. "The Jaguar machine that we're so amazed at right now, it could be every university or company has one" eventually.

And of course, most scientists in the field would argue, these high-performance computer systems aren't made just to race each other. They're made to solve complex problems, with eventual real-world consequences like climate change and alternative fuel production.

Smith argues that research such as the work being done on Jaguar on high-temperature superconductivity couldn't necessarily be done effectively on Tianhe-1A, because it requires very efficient computing, and writing software that does that well on such a computer is difficult.

But what China has accomplished is still important for supercomputing, argues Gropp, who called the number of flop/s Tianhe-1A achieved "remarkable."

"I don't want to downplay what they've done," he said. "It's like pooh-poohing the original Toyota. The first Toyota was a pile of junk. But a few years later they were eating our lunch."

It's not the first time that a non-U.S. machine has topped the rankings: the Japanese NEC Earth Simulator did it in 2002. The U.S. of course bounced back, and as of today has 275 systems, more than half of the Top 500 list. China is next with 42 systems, and Japan and Germany are tied with 26 each. Still, there is concern that China's concentration of resources on supercomputing poses a threat to the U.S.' long-term dominance there. But simply trying to score the highest on the Linpack benchmark, something any group of researchers with enough money could do fairly easily, is short-sighted.

"What we should be focusing on is not losing our leadership and being able to apply computing to a broad range of science and engineering problems," said Gropp, who is also deputy director of research at UI's Institute for Advanced Computing Applications and Technologies.

The President's Council of Advisors on Science and Technology (PCAST) is currently working on a report that addresses this exact topic, and didn't have a comment when contacted. Very soon after news of Tianhe-1A's speed began to spread, PCAST released a draft of a document that calls for more funding for scientific computing. And President Barack Obama weighed in briefly on the topic in a speech two weeks ago, calling for increased science funding specifically for high-performance computing.

But it's not as if the supercomputing community in the U.S. has been sitting still while China sneaked up behind them. There are other projects in the works at U.S. labs that are planning on blowing Jaguar and Tianhe-1A out of the water in terms of speed.

Currently the University of Illinois at Urbana-Champaign and the National Science Foundation are building Blue Waters, a supercomputer that researchers say will be the fastest in the world when it is turned on sometime next year.

The Department of Energy, which owns Oak Ridge's Jaguar supercomputer, is already looking at moving from the current peta-scale computing (a quadrillion floating point operations per second) to exa-scale computing (a quintillion floating point operations per second), a thousandfold jump over the peta-scale level at which Jaguar currently operates. It's a goal that's still a ways out there, but the work is under way.
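The three reports quote speeds in different units ("trillion calculations per second", "quadrillions", petaflop/s). The small sketch below converts between them and checks the exa-scale arithmetic. It assumes the decimal SI prefixes are exact powers of ten, and it uses Jaguar's 1.75 petaflop/s Linpack figure from the text rather than its higher theoretical peak.

```python
# SI prefixes as exact powers of ten (flop/s).
TERA = 10 ** 12
PETA = 10 ** 15
EXA = 10 ** 18


def trillions_to_petaflops(trillions_per_second):
    """Convert 'trillion calculations per second' (AFP's phrasing) to petaflop/s."""
    return trillions_per_second * TERA / PETA


def speedup(target, current):
    """How many times faster `target` is than `current` (both in flop/s)."""
    return target / current


print(trillions_to_petaflops(2570))      # Tianhe-1A: 2,570 trillion -> 2.57 petaflop/s
print(speedup(EXA, PETA))                # exa-scale vs. peta-scale: 1000.0
print(round(speedup(EXA, 1.75 * PETA)))  # exa-scale vs. Jaguar's 1.75 petaflop/s Linpack: 571
```

Measured against Jaguar's sustained Linpack result rather than the round peta-scale mark, an exaflop machine would be several hundred times faster, not a full thousand.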

"To get there in the next five to 10 years, to get to 10 million cores in one room, is a major technical challenge," noted University of Tennessee's Jeremy Smith. "It's going to be fundamentally different than before. It's a hardware problem, and getting the software working is a major challenge indeed."

For more statistics on the systems in the Top500 list, please see

Erica Ogg is a CNET News reporter who covers Apple, HP, Dell, and other PC makers, as well as the consumer electronics industry. She's also one of the hosts of CNET News' Daily Podcast. In her non-work life, she's a history geek, a loyal Dodgers fan, and a mac-and-cheese connoisseur. E-mail Erica.

Top 500 supers: China rides GPUs to world domination

The People's Republic of Petaflops

SC10 If the June edition of the bi-annual ranking of the Top 500 supercomputers in the world represented the dawning of the GPU co-processor as a key component in high performance computing, then the November list is breakfast time. The super centers of the world are smacking their lips for some flop-jacks with OpenCL syrup and some x64 bacon on the side.

China has the most voracious appetite for GPU co-processors, and, as expected since the Tianhe-1A super was booted up for the first time two weeks ago, this hybrid CPU-GPU machine installed at the National Supercomputer Center in Tianjin has taken the top spot on the Top 500 list by a comfortable margin. Tianhe-1A is rated at 4.7 petaflops of peak theoretical performance spread across its CPUs and GPUs (with about 70 per cent of that coming from the GPUs), and it delivered 2.56 petaflops of sustained performance on the Linpack Fortran matrix math benchmark test.

The Tianhe-1A machine is made up of 7,168 servers, each equipped with two sockets using Intel's X5670 processors running at 2.93 GHz and one Nvidia Tesla M2050 fanless GPU co-processor. The resulting machine spans 112 racks, and it would make a hell of a box on which to play Crysis.

While roughly 45 per cent of the floating-point oomph in Tianhe-1A disappears into the void where all missed clock cycles go (it's also where missing socks from the dryer cavort), the GPUs' flops are relatively inexpensive and the overall machine should offer excellent bang for the buck, provided workloads can scale across the ceepie-geepie, of course. The Tianhe-1A super uses a proprietary interconnect called Arch, which was developed by the Chinese government. The Arch switch links the server nodes together using optical-electric cables in a hybrid fat tree configuration and has a bi-directional bandwidth of 160 Gb/sec, a latency for a node hop of 1.57 microseconds, and an aggregate bandwidth of more than 61 Tb/sec.

The Tianhe-1A CPU-GPU hybrid super

This is not the first ceepie-geepie machine that the National Supercomputer Center has put together. A year ago, the Tianhe-1 machine broke onto the Top 500 list using Intel Xeon chips and Advanced Micro Devices Radeon HD 4870 GPUs (no Tesla GPUs, but actual graphics cards). This initial "Milky Way" box (that's what "Tianhe" translates to in English) had 71,680 cores, a peak theoretical performance of 1.2 petaflops, and a sustained performance of 563.1 teraflops. The efficiency of this cluster, sustained over peak performance, was about 47 per cent.
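Efficiency here is simply sustained Linpack performance divided by theoretical peak. The sketch below applies that ratio to the rounded figures quoted for the two Tianjin machines; the dictionary and function names are illustrative, and the official Top 500 values carry more decimal places.

```python
def linpack_efficiency(sustained, peak):
    """Sustained Linpack performance divided by theoretical peak (same units)."""
    return sustained / peak


# (sustained, peak) in petaflops, using the rounded figures quoted in the text.
machines = {
    "Tianhe-1A": (2.56, 4.7),
    "Tianhe-1": (0.5631, 1.2),  # 563.1 teraflops = 0.5631 petaflops
}

for name, (sustained, peak) in machines.items():
    eff = linpack_efficiency(sustained, peak)
    print(f"{name}: {eff:.1%} efficient, {1 - eff:.1%} of peak left on the table")
```

With these inputs the ratio comes out at roughly 54.5 per cent for Tianhe-1A and roughly 47 per cent for the original Tianhe-1, so the Tesla-based upgrade improved efficiency as well as raw speed.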

Jaguar dethroned

The "Jaguar" XT5 system at the US Department of Energy's Oak Ridge National Laboratory was knocked out of the top spot by Tianhe-1A, which is what happens when a cat stands still in the GPU era of HPC. The Jaguar machine has 224,162 Opteron cores spinning at 2.6 GHz and delivers 1.76 petaflops of performance on the Linpack test. This Cray machine links Opteron blade servers using its SeaStar2+ interconnect, which has been superseded by the new "Gemini" XE interconnect in the XE6 supers that started rolling out this summer.

If Oak Ridge had moved to twelve-core Opteron 6100 processors and the XE6 interconnect, it could have doubled the performance of Jaguar and held onto the Top 500 heavyweight title. One other thing to note: the Jaguar machine is 75.5 per cent efficient on the Linpack benchmark, a lot better than the Tianhe-1A ceepie-geepie.

The "Nebulae" ceepie-geepie built from six-core Intel Xeon 5650 processors and Nvidia M2050 GPUs that made its debut on the June 2010 Top 500 list got knocked down from number 2 to number 3 on the list. The Nebulae machine, which is a blade server design from Chinese server maker Dawning, is installed at the National Supercomputing Center in Shenzhen. It is rated at 1.27 sustained petaflops at 43 per cent efficiency against peak theoretical performance.

Number four on the list is also a ceepie-geepie: the upgraded Tsubame 2 machine at the Tokyo Institute of Technology. (That's shortened to TiTech rather than TIT, which would be where you'd expect a machine called Milky Way to be located. But we digress.) The Tsubame 2 machine is built from Hewlett-Packard's SL390s G7 cookie sheet servers, which made their debut in early October. TiTech announced the Tsubame 2 deal back in May, and this machine includes over 1,400 of these HP servers, each with three M2050 GPUs from Nvidia.

The Tsubame 2 machine has 73,278 cores, is rated at 2.29 peak petaflops, and delivered 1.19 petaflops of sustained performance on the Linpack test. That's a 52 per cent efficiency, about what the other ceepie-geepies are getting. By the way, the prior Tsubame 1 machine was based on x64 servers from Sun Microsystems, with floating point accelerators from Clearspeed in only some of the nodes. And one more thing: Tsubame 2 runs both Linux and Windows, and according to the Top 500 rankers, both operating systems offer nearly equivalent performance.

In the Hopper

The fifth most-powerful super in the world based on the Linpack tests (at least the ones we know about) is a brand new box called Hopper. Installed at the US DOE's National Energy Research Scientific Computing Center, Hopper is a Cray XE6 super using that new Gemini interconnect and twelve-core Opteron 6100 processors, with no fancy schmancy GPU co-processors. (Well, at least not yet, anyway.) Hopper has 153,408 cores spinning at 2.1 GHz and delivers 1.05 petaflops of sustained performance with an efficiency of 82 per cent.
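For the two CPU-only Crays, the text gives sustained performance and efficiency but not peak. Because efficiency is sustained divided by peak, the peak can be backed out. The sketch below does this with the rounded figures quoted above, so its results are back-of-the-envelope estimates, not the official Top 500 peak ratings; the names are illustrative.

```python
def implied_peak(sustained, efficiency):
    """Invert efficiency = sustained / peak to recover peak."""
    return sustained / efficiency


# (sustained petaflops, Linpack efficiency) as quoted in the text.
cpu_only = {"Jaguar": (1.76, 0.755), "Hopper": (1.05, 0.82)}

for name, (sustained, eff) in cpu_only.items():
    print(f"{name}: about {implied_peak(sustained, eff):.2f} peak petaflops")
```

This prints roughly 2.33 peak petaflops for Jaguar and 1.28 for Hopper. Both are far below Tianhe-1A's 4.7-petaflop peak, yet each machine turns a much larger share of its peak into Linpack results.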

If it is not yet obvious, there is a bottleneck in getting parallel supercomputer nodes to talk through their networking stacks running on their x64 processors and out over the PCI-Express 2.0 bus. If Nvidia or AMD want to do something useful, embedding a baby x64 processor inside of a GPU co-processor along with a switchable 10 Gigabit Ethernet or 40 Gb/sec InfiniBand port would make a very interesting baby server node. Throw in cache coherence between the x64 and GPU processors and maybe getting to 50 petaflops won't seem like such a big deal.

The Bull Tera-100 super at the Commissariat a l'Energie Atomique in France is based on Intel's Xeon 7500 high-end processors and Bull's bullx supercomputer blades and ranks sixth in the world. The machine uses QDR InfiniBand to lash the nodes together, and is rated at 1.05 petaflops. This machine does not have GPUs in it from either AMD or Nvidia, and neither does number eight, the Kraken XT5 super from Cray that is owned by the University of Tennessee and which is operated by DOE's Oak Ridge National Laboratory. Kraken delivers 831.7 teraflops of sustained Linpack performance, unchanged from when it came onto the list a year ago.

Number seven on the list, the Roadrunner Opteron blade system at Los Alamos National Laboratory (another DOE site), does use accelerators, but they are IBM's now-defunct Cell co-processors, which are based on IBM's Power cores and which have eight vector math units per chip. The Roadrunner machine demonstrated that co-processors could push a super into petaflops territory, but it is stalled at 1.04 petaflops, is probably not going to be upgraded, and is therefore uninteresting even if it will do lots of good work for the DOE. (If you consider designing nuclear weapons good work, of course.)

Number nine on the list is the BlueGene/P super, named Jugene, built by IBM for the Forschungszentrum Juelich in Germany, which debuted at number three at 825.5 teraflops on the June 2009 list and hasn't changed since then. Rounding out the top ten on the Top 500 list is the Cielo Cray XE6 at Los Alamos, a new box that is rated at 816.6 teraflops of sustained Linpack performance.

GPU is my co-pilot

On the November 2010 list, there are 28 HPC systems that use GPU accelerators, and the researchers who put together the Top 500 for the 36th time (Erich Strohmaier and Horst Simon, computer scientists at Lawrence Berkeley National Laboratory; Jack Dongarra of the University of Tennessee; and Hans Meuer of the University of Mannheim) consider IBM's Cell chip a GPU co-processor. On this list, there are sixteen machines that use Cell chips to goose their floating point oomph, with ten using Nvidia GPUs and two using AMD Radeon graphics cards.

The Linpack Fortran matrix benchmark was created by Dongarra and colleagues Jim Bunch, Cleve Moler, and Pete Stewart back in the 1970s to gauge the relative number-crunching performance of computers and is the touchstone for ranking supercomputers.

There are three questions that will be on the minds of people at the SC10 supercomputing conference in New Orleans this week. The first is: Can the efficiency of ceepie-geepie supers be improved? The second will be: Does it matter if it can't? And the third will be: At what point in our future will GPUs be standard components in parallel supers, just as parallel architectures now dominate supercomputing and have largely displaced vector and federated RISC machines?

To get onto the Top 500 list this time around, a machine had to come in at 31.1 teraflops, up from 24.7 teraflops only six months ago. This used to sound like a lot of math power, but these days it really doesn't. A cluster with 120 of the current Nvidia Tesla GPUs, with only half of their flops coming through where the CUDA meets the Fortran compiler, will get you on the list. If the growth is linear, then on the June list next year you will need something like 40 teraflops, or about 150 of the current generation of GPUs. And with GPU performance on the upswing, getting onto the Top 500 list might not require so many GPUs.
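The entry-bar arithmetic can be checked with a few lines of code. This sketch assumes a double-precision peak of 515 gigaflops per Tesla M2050 (Nvidia's published rating; the article does not state it) and the article's premise that half of those flops are realized on Linpack. It projects the June 2011 bar two ways, linear (same absolute step) and geometric (same ratio); the function names are hypothetical.

```python
import math


def linear_next(previous_tf, current_tf):
    """Assume the entry bar rises by the same absolute step next time."""
    return current_tf + (current_tf - previous_tf)


def geometric_next(previous_tf, current_tf):
    """Alternative: assume it rises by the same ratio next time."""
    return current_tf * (current_tf / previous_tf)


def gpus_needed(threshold_tf, gpu_peak_gflops=515.0, realized=0.5):
    """GPUs required if each delivers `realized` of an assumed per-GPU peak."""
    per_gpu_tf = gpu_peak_gflops * realized / 1000.0
    return math.ceil(threshold_tf / per_gpu_tf)


now_tf, before_tf = 31.1, 24.7
for label, tf in [("Nov 2010", now_tf),
                  ("Jun 2011, linear", linear_next(before_tf, now_tf)),
                  ("Jun 2011, geometric", geometric_next(before_tf, now_tf))]:
    print(f"{label}: {tf:.1f} teraflops -> {gpus_needed(tf)} GPUs")
```

Under these assumptions the sketch gives 121 GPUs for today's 31.1-teraflop bar, and 146 to 153 GPUs for a June 2011 bar of 37.5 to 39.2 teraflops. That is in line with the rough "about 150" estimate; the geometric projection lands closer to the "something like 40 teraflops" figure.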

Core counting

As has been the case for many years, processors from Intel absolutely dominate the current Top 500 list, with 398 machines (79.6 per cent of the boxes on the list). Of these, 56 machines are using the Xeon 5600 processors, one is still based on 32-bit Xeons, one is based on Core desktop chips, five are based on Itanium processors, and three are based on the new high-end Xeon 7500s.

In the November 2010 rankings, there are 57 machines using AMD's Opteron processors, while there are 40 machines using one or another variant of IBM's Power processors. While the machine counts are low for these two families of chips, the core counts sure are not, because of the monster systems that are based on Power and Opteron chips.

There are 1.41 million Power cores on the Top 500 list this time around, which was 21.5 per cent of the total 6.53 million cores inside of the 500 boxes and which represented 7.35 aggregate petaflops or 11.2 per cent of the total 65.8 petaflops on the list. There are 1.54 million Opteron cores (23.5 per cent of cores) on the aggregate list for 14.2 peak petaflops (21.6 per cent of total flops).
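Each of these shares is a part divided by the list-wide total. The sketch below recomputes them from the counts and totals quoted above. The constant names are illustrative, and the inputs are the article's rounded figures.

```python
TOTAL_CORES_M = 6.53   # million cores on the whole list, as quoted
TOTAL_PF = 65.8        # aggregate petaflops on the whole list, as quoted


def share(part, total):
    """Percentage of `total` represented by `part`."""
    return 100.0 * part / total


# (million cores, petaflops) per processor family, as quoted.
families = {"Power": (1.41, 7.35), "Opteron": (1.54, 14.2)}

for name, (cores_m, pf) in families.items():
    print(f"{name}: {share(cores_m, TOTAL_CORES_M):.1f}% of cores, "
          f"{share(pf, TOTAL_PF):.1f}% of flops")
```

Rounded to one decimal place, the core shares come out at 21.6 and 23.6 per cent, a tenth above the quoted figures, which suggests those were truncated rather than rounded. The flop shares of 11.2 and 21.6 per cent match.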

None of these core counts include the GPU core counts, which is something that the Top 500 people should reconsider, even though in all cases the flops are counted.

Across all processor architectures, there are 365 machines using quad-core processors, and 19 are already using CPUs with six or more cores per socket. It is safe to say that the HPC market will eat whatever number of cores the chip makers can bake.

There are two Sparc-based supers on the current Top 500 list and the Earth Simulator super built by NEC for the Japanese government is still barely on the list (and will probably be knocked off on the next list in June 2011).

Xeon rides the wave

Having said all of that, the 391 machines using Intel's Xeon processors represent the belly of the Top 500 list. With a total of 3.5 million cores (53.5 per cent of the total core count on the list) and 43.2 petaflops of number-crunching oomph (65.8 per cent of total flops), the Xeon is the champion of the top-end HPC world. Of course, in many cases the Xeon CPUs are getting credit for flops that are being done by GPUs.

In terms of core count, there are 289 machines that have between 4,096 and 8,192 cores, and 96 machines that have from 8,192 to 16,384 cores. You need more than 1,000 cores to make the list, and there are only two boxes that have fewer than 2,048 cores and only 61 have between 2,048 and 4,096 cores. The system count drops off pretty fast above this core count, with 52 machines having more than 16,384 cores.

The Top 500 list is pretty evenly split between Ethernet, with 226 machines, and InfiniBand of various speeds, also at 226 machines. The remaining machines are a smattering of Myrinet, Quadrics, Silicon Graphics NUMAlink, and Cray SeaStar and Gemini interconnects. There were seven machines on the list using 10 Gigabit Ethernet for lashing nodes in parallel supers together, and 29 used 40 Gb/sec (QDR) InfiniBand.

By operating system, Linux in its various incarnations dominates the list, with 450 out of 500 machines running it. Unix accounted for 20 machines, Windows five machines, and the remainder were running mixed operating systems. If Microsoft wanted to catch a new wave, it would work to get the best possible GPU runtime and programming tools to market. Just tweaking the MPI stack in Windows HPC Server 2008 R2 to get rough parity with Linux is not going to make a dent at the big supercomputer centers of the world. Then again, Microsoft is trying to move into the HPC arena from the technical workstation up, and it has other advantages that Linux platforms do not in this regard.

IBM has the most systems on the November 2010 Top 500 list, with 199 boxes (39.8 per cent of the total) and 17.1 petaflops (26 per cent of the total flops on the list) of aggregate peak performance on the Linpack test. Big Blue is followed by Hewlett-Packard, with 158 machines and 11.7 petaflops, which works out to 31.6 per cent of machines and 17.8 per cent of total flops. Cray has only 29 machines on the current super ranking, which is 5.8 per cent of machines but 16.3 per cent of peak floating point power. Silicon Graphics has 22 machines on the list, which is 4.4 per cent of boxes and 4.5 per cent of aggregate flops. Dell has 20 boxes on the list and has its hand in a few mixed boxes as well, and Oracle, Fujitsu, NEC, and Hitachi all have a handful of machines, too.

Supercomputing is inherently political (especially so given where the funding for the upper echelon of the Top 500 list comes from), and countries most certainly size each other up by their HPC centers. The United States leads in machine count, at 275 machines with a combined 31.5 petaflops, and China has jumped well ahead of Japan to become the solid number two, with 42 machines and 12.8 petaflops in total across those machines. Japan has 26 machines that add up to 4.6 petaflops, and Germany's 26 machines have an aggregate of 3.5 petaflops. The United Kingdom is close behind with 24 machines, for a total of 2.2 petaflops, followed by Russia with 11 machines and 1.1 petaflops. ®

