The Server Room Blog

Nehalem-EX: Big Memory for Big Science

I was at SuperComputing’09 last week in Portland, Oregon. I talked with some brilliant people, and saw some fantastic stuff.

It was good timing on my part because last week Intel also announced that it would offer a 6-core, frequency-optimized version of its Nehalem-EX product due out next year. This part is intended for use in tackling some of the types of high performance computing (HPC) workloads prominently displayed at SC’09.

Most people know that the majority of HPC workloads today are based on clusters of relatively small-memory, 2-socket systems. That is because most HPC workloads may be broken into smaller, discrete units of work that can be efficiently processed using such clusters. For these workloads the primary hardware capability selection criterion is typically a balance of both memory bandwidth and compute FLOPs (floating point operations per second).

But there are other types of HPC workloads: specifically, those that deal with very large datasets (some as large as a terabyte) or those that depend on non-sequential memory access. These workloads aren’t easily divisible (or it is inefficient to divide them) into the relatively small memory footprints used in traditional clustered 2-socket HPC solutions. Examples of these bigger-memory applications can be found in a variety of fields such as weather prediction, manufacturing structure analysis, and financial services.

The high-speed processing requirements and size of these workloads put a greater premium on system memory capacity/bandwidth than on compute FLOPs.

If the larger dataset won’t fit into available memory, and the dataset cannot easily be divided up and spread across multiple nodes, then data has to be moved in and out of memory to hard disk. But hard disk drives are many times slower than RAM, so relying on them can drastically impair performance.

There are now two better alternatives to the use of hard drives. One is SSDs and the other is having a larger memory footprint. Solid State Drives have fairly high data density vs. RAM and much faster access than hard disk drives, albeit still markedly slower than RAM. The other solution is simply to have more of the faster RAM. This last one is what the Nehalem-EX HPC part is aimed at.

Nehalem-EX is the Expandable Class of Nehalem. The Expandable Class brings all the goodness of the Nehalem architecture (Xeon 5500 product line) to the HPC market, but in the form of a “super node” that has greater: a) core/thread count, b) socket scaling (up to 256), c) I/O and memory capacity (up to 1 terabyte in a 4-socket system) and bandwidth at capacity, d) reliability features, and e) other features.

The 6-core frequency-optimized Nehalem-EX part has also been tuned to offer the highest core frequency possible for this chip. In creating this part, Intel is meeting the needs of HPC users who want higher scalar performance along with the benefit of large memory capacity and bandwidth per core.

Of course, the 8-core version of NHM-EX is still an option for HPC workloads that scale well with more cores and still need the high memory capacity of the Expandable Class.

Having both 8-core and frequency-optimized 6-core versions of the NHM-EX class of processors means HPC researchers have greater choice in selecting the processor best suited for their specific workloads.

After talking with some of the researchers at SC’09 last week I’m really excited to see how the Nehalem-EX “super node” will deliver the necessary compute and memory capabilities to help those researchers solve some of their biggest challenges.

A MONSTER CHIP IS COMING. The next generation of MP processors is targeted for production later this year, and by all accounts it is going to be a monster. Nehalem-EX is part of the Nehalem family of processors, but compared to its siblings it has the highest core/thread count, largest shared cache, highest CPU-to-CPU bandwidth, highest I/O bandwidth, highest memory capacity, highest memory bandwidth, greatest scalability, and highest level of Reliability/Availability/Serviceability (RAS). It’s expected to bring a gargantuan, unprecedented leap in capabilities and performance--the biggest leap in all of Xeon product history.

IT’S TARGETED AT “BIG BOXES”. Big box servers are multiprocessor systems using the most capable processors and platform components. These systems are targeted at applications and usages that require the largest memory footprints, the highest amounts of single-box processing power (for workloads that don’t decompose well into lots of independent threads) and/or advanced levels of RAS. Such systems are typically the best choice for large databases, ERP apps, Business Intelligence apps, large-scale server consolidation and business-critical virtualization, mission-critical applications, and large-scale high performance computing.

IT USES THE SAME PROCESSING TECHNOLOGY AS THE SUCCESSFUL XEON 5500, BUT MORE OF IT. Just like with Xeon 5500, the Nehalem micro-architecture brings improved single-threaded performance via IPC (Instructions per Clock) enhancements and Intel’s Hi-k 45nm manufacturing process. Greater multi-threaded performance comes via Hyper-Threading and more cores. But while the Xeon 5500 has up to 4 cores/8 threads per socket, the Nehalem-EX monster doubles that to 8 cores/16 threads.

IT HAS BEEFIER MEMORY AND INTERCHIP COMMUNICATION SUBSYSTEMS. Monster thread processing capabilities require monster-size feeding to bring out the best performance. Nehalem-EX’s raw processing potential is made viable by a heavy-duty memory subsystem and inter-chip communication system.

Nehalem-EX has 24MB of shared level 3 cache--that’s 50% more than the current Xeon 7400 and 200% more than Xeon 5500. The memory channel bandwidth was increased to 9 times that of Xeon 7400. And it’s all attached to up to 16 DIMM slots per socket (that’s 64 DIMM slots for 4 sockets)--double the current generation of Xeon 7400.
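To make the arithmetic behind those figures explicit, here is a small Python sketch. The 24MB cache size, the percentages, and the 16-slots-per-socket figure come from the paragraph above; the 16MB and 8MB baselines are not quoted in this post and are simply the values implied by "50% more" and "200% more", so treat them as derived assumptions.

```python
def percent_more(new, old):
    """Return how much larger `new` is than `old`, as a percentage."""
    return (new - old) / old * 100


nehalem_ex_l3_mb = 24   # stated in the post

# Assumption: baselines back-derived from the post's percentages,
# not quoted from a spec sheet here.
xeon_7400_l3_mb = 16    # 24 is 50% more than 16
xeon_5500_l3_mb = 8     # 24 is 200% more than 8

dimm_slots_per_socket = 16  # stated in the post
sockets = 4
total_dimm_slots = dimm_slots_per_socket * sockets

print(percent_more(nehalem_ex_l3_mb, xeon_7400_l3_mb))  # 50.0
print(percent_more(nehalem_ex_l3_mb, xeon_5500_l3_mb))  # 200.0
print(total_dimm_slots)                                 # 64
```

Note that "200% more" means three times the size, not twice.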

In a multi-socket system, processors need to communicate with each other in order to coordinate their shared workload efficiently. They also need lots of I/O bandwidth. Nehalem-EX has four QuickPath Interconnect (QPI) links on every socket--double that of Xeon 5500. The four QPI links enable Nehalem-EX processors to be directly connected to each other in a 4-socket system. This offers a significant performance advantage over a so-called ring architecture, wherein some processor-to-processor communication must go through an intermediary processor. The extra QPI links also mean that there’s plenty of CPU-to-I/O bandwidth.
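The difference between the two topologies can be illustrated with a short Python sketch that counts worst-case hops between sockets. This is only a toy graph model, not a description of how QPI routing is actually implemented; the socket numbering (0 to 3) and the `max_hops` helper are made up for illustration.

```python
from collections import deque


def max_hops(links, start):
    """Breadth-first search: worst-case hop count from `start` to any socket."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for peer in links[node]:
            if peer not in dist:
                dist[peer] = dist[node] + 1
                queue.append(peer)
    return max(dist.values())


# Hypothetical 4-socket layouts, sockets numbered 0..3.
# Fully connected: every socket has a direct link to the other three.
fully_connected = {s: [p for p in range(4) if p != s] for s in range(4)}
# Ring: every socket is linked only to its two neighbours.
ring = {s: [(s - 1) % 4, (s + 1) % 4] for s in range(4)}

print(max_hops(fully_connected, 0))  # 1: every peer is one hop away
print(max_hops(ring, 0))             # 2: the opposite socket needs an intermediary
```

In the fully connected case each socket uses three links to reach its three peers, which presumably leaves the fourth link free for I/O.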

EXPECTED TO BRING THE GREATEST LEAP FORWARD IN XEON PERFORMANCE EVER. On key server performance benchmarks (e.g. SPECint_rate, SPECfp_rate, TPC-C) Xeon 5500 using Nehalem technology brought gains of 100-200% over the prior generation. Generational gains of this magnitude come along just about once a decade. Nehalem-EX’s generation-to-generation performance gains are expected to be substantially higher than those of Xeon 5500. We’ve already seen measured memory bandwidth of 9X that of the prior generation. That’s an early indication of the level at which new performance records will be set when this monster chip comes to market.

Related Topics:

NHM-EX Press Fact Sheet

NHM-EX May 26th Press Briefing Video – condensed version

IBM 8Socket Demo Video

NHM-EX--A New Standard

Ever find yourself in a new location staring hopelessly at a map, wondering where you are?  Then to make matters worse, you call someone on your cell phone and can’t describe where you are so they can help? I think we’ve all been there more than once…

Since the Intel Xeon® 5500 processors launched in March, I’ve been getting a bunch of questions (including from the Ask An Expert community [http://communities.intel.com/message/12284#12284] in the Server Room) about DDR3 memory and how best to configure your server platforms to optimize performance.  Many times, folks are having a hard time just getting the conversation started, so here are a couple of tips to get you going.  The good thing is that DDR3 memory picks up where DDR2 memory leaves off in terms of speed, so you know you’ll be moving forward!

  1. Figure out how much memory you need. With multi-core CPUs now mainstream in servers, you need enough memory to keep these compute engines fed. One metric you might look at is “GB per CPU core” or “GB per socket” for your existing servers, and then project your memory requirements from there.

  2. Start with DDR3 1066 memory, as that will deliver a good balance of memory performance and capacity.

     - If you need more bandwidth (and are willing to give up some capacity), use DDR3 1333.

     - If you need maximum capacity (and are willing to give up some bandwidth), use DDR3 800.

  3. Match your CPU to your memory speed, because the faster memory does require a faster processor. Check out page 11 of the product brief for the quick reference table.

  4. Wherever possible, fill up as many memory channels as possible, and populate all channels evenly (same type, size and number of DIMMs).

     - Most two-socket Xeon® 5500 platforms will have a total of 6 memory channels, so aligning your memory requirements to a multiple of 6 GB will optimize memory performance for most application environments.

     - However, you can mix/match memory types if your requirements call for something that is not a multiple of 6.

  5. For server application environments, always go with ECC-supported memory. Decide between Registered DIMMs (RDIMM) and Unbuffered DIMMs with ECC (UDIMM ECC).

     - RDIMMs provide the greatest flexibility across DIMM sizes and availability.

     - UDIMM ECC provides a lower-cost alternative if you are using 1 GB or 2 GB DIMMs.
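As a rough illustration of tips 1 and 4, here is a small Python sketch that projects a memory requirement from a “GB per core” figure and then rounds it up to an evenly populated 6-channel configuration. The helper names and the example numbers (2 GB per core, 8 cores, 4 GB DIMMs) are made up for illustration; the sketch does not model per-platform limits such as the maximum DIMMs per channel or the speed you get at each population level, so check those with your system vendor.

```python
import math


def project_memory_gb(gb_per_core, cores):
    """Tip 1: project total memory from a GB-per-core metric."""
    return gb_per_core * cores


def balanced_config(required_gb, dimm_gb, channels=6):
    """Tip 4: smallest even population (identical DIMMs, same count per channel).

    Returns (DIMMs per channel, total installed GB).
    """
    dimms_per_channel = math.ceil(required_gb / (dimm_gb * channels))
    total_gb = dimms_per_channel * dimm_gb * channels
    return dimms_per_channel, total_gb


# Hypothetical example: a two-socket server with 8 cores in total,
# sized at 2 GB per core, built from 4 GB DIMMs.
required = project_memory_gb(gb_per_core=2, cores=8)
per_channel, total = balanced_config(required, dimm_gb=4)

print(required)            # 16
print(per_channel, total)  # 1 24
```

Here the 16 GB requirement is rounded up to 24 GB (one 4 GB DIMM on each of the 6 channels) rather than filling only four channels, which is the "populate all channels evenly" advice in practice.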

You will still want to check with your system vendor on the specifics, such as the memory configurations, DIMM types and options supported for a given server, but hopefully this helps get you pointed in the right direction.

If you are still lost, ask me a question on this blog or Ask An Expert in the Server Room.
