The Server Room Blog

2 Posts authored by: Matt_K

Nehalem-EX: Big Memory for Big Science

I was at SuperComputing’09 last week in Portland, Oregon. I talked with some brilliant people, and saw some fantastic stuff.

It was good timing on my part because last week Intel also announced that it would offer a 6-core, frequency-optimized version of its Nehalem-EX product due out next year. This part is intended for use in tackling some of the types of high performance computing (HPC) workloads prominently displayed at SC’09.

Most people know that the majority of HPC workloads today run on clusters of relatively small-memory, 2-socket systems. That is because most HPC workloads can be broken into smaller, discrete units of work that such clusters process efficiently. For these workloads, the primary hardware selection criterion is typically a balance of memory bandwidth and compute FLOPS (floating-point operations per second).

But there are other types of HPC workloads: specifically, those that deal with very large datasets (some as large as a terabyte) or with non-sequential memory access. These workloads aren’t easily divisible into the relatively small memory footprints used in traditional clustered 2-socket HPC solutions, or it is inefficient to divide them that way. Examples of these bigger-memory applications can be found in a variety of fields such as weather prediction, manufacturing structure analysis, and financial services.

The high-speed processing requirements and size of these workloads put a greater premium on system memory capacity and bandwidth than on compute FLOPS.

If the larger dataset won’t fit into available memory, and it cannot easily be divided across multiple nodes, then data has to be moved back and forth between memory and hard disk. But hard disk drives are many times slower than RAM, so relying on them can drastically impair performance.

There are now two better alternatives to hard drives. One is solid-state drives (SSDs), which offer fairly high data density compared with RAM and much faster access than hard disk drives, albeit still markedly slower than RAM. The other is simply to have more of the faster RAM. This last option is what the Nehalem-EX HPC part is aimed at.
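To put rough numbers on that trade-off, here is a minimal back-of-the-envelope sketch. The latencies are order-of-magnitude assumptions chosen for illustration, not measurements and not figures from Intel, and the function name is made up for this example. It estimates the average access time when some fraction of accesses is served from RAM and the rest spills to a backing store.

```python
# Order-of-magnitude latencies in seconds. These are illustrative
# assumptions, not measured values for any particular product.
LATENCY_SECONDS = {
    "ram": 100e-9,   # about 100 nanoseconds
    "ssd": 100e-6,   # about 100 microseconds
    "hdd": 10e-3,    # about 10 milliseconds
}


def average_access_time(fraction_in_ram, backing_store):
    """Average latency when `fraction_in_ram` of accesses hit RAM
    and the remainder go to `backing_store` ("ssd" or "hdd")."""
    if not 0.0 <= fraction_in_ram <= 1.0:
        raise ValueError("fraction_in_ram must be between 0 and 1")
    spill_fraction = 1.0 - fraction_in_ram
    return (fraction_in_ram * LATENCY_SECONDS["ram"]
            + spill_fraction * LATENCY_SECONDS[backing_store])


if __name__ == "__main__":
    all_in_ram = average_access_time(1.0, "hdd")
    for store in ("hdd", "ssd"):
        spilled = average_access_time(0.99, store)
        print(f"99% in RAM, 1% on {store}: "
              f"{spilled / all_in_ram:.0f}x slower than all-in-RAM")
```

Under these assumed latencies, even a 1% spill to disk dominates the average, which is why fitting the whole dataset in RAM matters so much for these workloads.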

Nehalem-EX is the Expandable Class of Nehalem. The Expandable Class brings all the goodness of the Nehalem architecture (Xeon 5500 product line) to the HPC market, but in the form of a “super node” that offers greater: a) core/thread count, b) socket scaling (up to 256), c) I/O and memory capacity (up to 1 terabyte in a 4-socket system) and bandwidth at capacity, and d) reliability features, among others.

The 6-core frequency-optimized Nehalem-EX part has also been tuned to offer the highest core frequency possible for this chip. In creating this part, Intel is meeting the needs of those in the HPC community who want higher scalar performance along with the benefit of large memory capacity and bandwidth per core.

Of course, the 8-core version of NHM-EX is still an option for HPC workloads that scale well with more cores and still need the high memory capacity of the Expandable Class.

Having both 8-core and frequency-optimized 6-core versions of the NHM-EX class of processors gives HPC researchers greater choice in selecting the processor best suited to their specific workloads.
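One way to reason about that choice is Amdahl's law, sketched below. The clock speeds are hypothetical placeholders (the post does not give frequencies for either part), and the model ignores memory bandwidth, cache, and Turbo effects, so treat it as a rough guide only.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when `parallel_fraction`
    of the work scales perfectly across `cores`."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(parallel_fraction, cores, ghz):
    """Crude throughput score: clock speed times Amdahl speedup."""
    return ghz * amdahl_speedup(parallel_fraction, cores)


# Hypothetical clock speeds, used only to illustrate the trade-off.
SIX_CORE_GHZ = 2.66
EIGHT_CORE_GHZ = 2.26

if __name__ == "__main__":
    for p in (0.50, 0.80, 0.95, 1.00):
        six = relative_throughput(p, 6, SIX_CORE_GHZ)
        eight = relative_throughput(p, 8, EIGHT_CORE_GHZ)
        better = "6-core" if six > eight else "8-core"
        print(f"parallel fraction {p:.2f}: 6-core {six:.2f}, "
              f"8-core {eight:.2f} -> {better}")
```

With these placeholder clocks, the higher-frequency 6-core part comes out ahead when a large share of the work is serial, and the 8-core part pulls ahead as the workload approaches fully parallel.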

After talking with some of the researchers at SC’09 last week I’m really excited to see how the Nehalem-EX “super node” will deliver the necessary compute and memory capabilities to help those researchers solve some of their biggest challenges.

A MONSTER CHIP IS COMING. The next generation of MP processor is targeted for production later this year, and by all accounts it is going to be a monster. Nehalem-EX is part of the Nehalem family of processors, but compared to its siblings it has the highest cores/threads count, largest shared cache, highest CPU-to-CPU bandwidth, highest I/O bandwidth, highest memory capacity, highest memory bandwidth, greatest scalability, and highest level of Reliability/Availability/Serviceability. It’s expected to bring a gargantuan, unprecedented leap in capabilities and performance--the biggest leap in all of Xeon product history.

 

IT’S TARGETED AT “BIG BOXES”. Big box servers are multiprocessor systems using the most capable processors and platform components. These systems are targeted at applications and usages that require the largest memory footprints, the highest amounts of single-box processing power (for workloads that don’t decompose well into lots of independent threads) and/or advanced levels of RAS. Such systems are typically the best choice for large databases, ERP apps, Business Intelligence apps, large-scale server consolidation and business-critical virtualization, mission critical applications and large scale high performance computing.

 

IT USES THE SAME PROCESSING TECHNOLOGY AS THE SUCCESSFUL XEON 5500, BUT MORE OF IT. Just like with Xeon 5500, the Nehalem micro-architecture brings improved single-threaded performance via IPC (Instructions per Clock) enhancements and Intel’s Hi-k 45nm manufacturing process. Greater multi-threaded performance comes via Hyper-Threading and more cores. But while the Xeon 5500 has up to 4 cores/8 threads per socket, the Nehalem-EX monster doubles that to 8 cores/16 threads.

 

IT HAS BEEFIER MEMORY AND INTERCHIP COMMUNICATION SUBSYSTEMS. Monster thread-processing capabilities require monster-size feeding to bring out the best performance. Nehalem-EX’s raw processing potential is made viable by a heavy-duty memory subsystem and inter-chip communication system.

Nehalem-EX has 24MB of shared level 3 cache--that’s 50% more than the current Xeon 7400 and 200% more than Xeon 5500. The memory channel bandwidth was increased to 9 times that of Xeon 7400. And it’s all attached to up to 16 DIMM slots per socket (that’s 64 DIMM slots for 4 sockets)--double the current generation of Xeon 7400.
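For anyone who wants to check the arithmetic behind those comparisons, here is a small sketch. The 24MB cache, the DIMM slot counts, and the 1 terabyte 4-socket capacity are figures quoted on this blog. The 16MB (Xeon 7400) and 8MB (Xeon 5500) L3 sizes are assumptions taken from public spec sheets for the top parts, not numbers stated in the post.

```python
# Figures quoted on this blog.
NEHALEM_EX_L3_MB = 24
DIMM_SLOTS_PER_SOCKET = 16
SOCKETS = 4
MAX_MEMORY_GB = 1024  # "up to 1 terabyte in a 4-socket system"

# Assumed from public spec sheets, not stated in the post.
XEON_7400_L3_MB = 16
XEON_5500_L3_MB = 8


def percent_more(new, old):
    """How much larger `new` is than `old`, as a percentage of `old`."""
    return (new - old) / old * 100.0


total_dimm_slots = DIMM_SLOTS_PER_SOCKET * SOCKETS
gb_per_dimm = MAX_MEMORY_GB / total_dimm_slots

if __name__ == "__main__":
    print("L3 vs Xeon 7400:", percent_more(NEHALEM_EX_L3_MB, XEON_7400_L3_MB), "% more")
    print("L3 vs Xeon 5500:", percent_more(NEHALEM_EX_L3_MB, XEON_5500_L3_MB), "% more")
    print("DIMM slots in a 4-socket system:", total_dimm_slots)
    print("GB per DIMM needed to reach 1 terabyte:", gb_per_dimm)
```

Note that "200% more" means three times the size, and that reaching the full terabyte across 64 slots implies 16GB DIMMs.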

In a multi-socket system, processors need to communicate with each other in order to most efficiently coordinate their shared workload. They also need lots of I/O bandwidth. Nehalem-EX has four QuickPath Interconnect (QPI) links on every socket--double that of Xeon 5500. The four QPI links enable Nehalem-EX processors to be directly connected to each other in a 4-socket system. This offers a significant performance advantage over a so-called ring architecture, wherein some processor-to-processor communication must go through an intermediary processor. The extra QPI links also mean that there’s plenty of CPU-to-I/O bandwidth.
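The difference between the two layouts can be seen by counting hops. The sketch below models a 4-socket system as a simple graph and uses breadth-first search to count the links a message crosses between each pair of sockets. It is a toy model of the topology only: socket numbering and function names are made up for this example, and it says nothing about real QPI latency or routing.

```python
from collections import deque


def hop_counts(adjacency):
    """Return {(a, b): hops} for every pair a < b, using breadth-first search."""
    result = {}
    for source in adjacency:
        distance = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in distance:
                    distance[neighbor] = distance[node] + 1
                    queue.append(neighbor)
        for target, hops in distance.items():
            if source < target:
                result[(source, target)] = hops
    return result


# Four sockets, each linked directly to the other three.
FULLY_CONNECTED = {s: [t for t in range(4) if t != s] for s in range(4)}

# Four sockets in a ring, each linked only to its two neighbors.
RING = {s: [(s - 1) % 4, (s + 1) % 4] for s in range(4)}

if __name__ == "__main__":
    for name, topology in (("fully connected", FULLY_CONNECTED), ("ring", RING)):
        hops = hop_counts(topology)
        worst = max(hops.values())
        average = sum(hops.values()) / len(hops)
        print(f"{name}: worst case {worst} hop(s), average {average:.2f}")
```

In the ring, the two pairs of sockets sitting opposite each other need two hops, so a third of all pairings pass through an intermediary. With every socket linked to the other three, every pairing is a single hop.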

 

IT’S EXPECTED TO BRING THE GREATEST LEAP FORWARD IN XEON PERFORMANCE EVER. On key server performance benchmarks (e.g. SPECint_rate, SPECfp_rate, TPC-C, etc.) Xeon 5500 using Nehalem technology brought gains of 100-200% over the prior generation. Generational gains of this magnitude come along just about once a decade. Nehalem-EX’s generation-to-generation performance gains are expected to be substantially higher than those of Xeon 5500. We’ve already seen measured memory bandwidth of 9X that of the prior generation. That’s an early indication of the level by which new performance records will be set when this monster chip comes to market.

Related Topics:

NHM-EX Press Fact Sheet

NHM-EX May 26th Press Briefing Video – condensed version

IBM 8Socket Demo Video

 

NHM-EX--A New Standard
