
Looks like the Intel® Xeon® processor 5500 series is making lots of noise in HPC.  QPI and the integrated memory controller are really providing the boost necessary to make it an all-around performance leader for HPC applications.  With all this performance, why did Intel add a third memory channel?

The third memory channel enables the platform to support a boatload of memory.  As a matter of fact, up to 192GB can be supported in a two-socket configuration.  It wasn’t too long ago that only 32GB was supported in a dual-socket configuration.  With support for so much memory, you can now meet the needs of almost every HPC application.  The 5500 series is intended for all server markets, but let’s face it: with the design changes Intel made in the new architecture, the server segment that appears to benefit most is HPC.

It seems like only yesterday that the only way to get large memory configurations was through expensive, proprietary SMP systems.  The HPC market for large SMP systems is still out there, but it is shrinking…fast.  Today, we are clustering low-cost solutions to create some of the most powerful systems in the world.  Standard components are leading to lower and lower system costs, delivering a price/performance advantage that alternative solutions cannot match.

Now that a single dual-socket node can support up to 192GB, it is important to understand how to get there.  First, to enable 192GB you need a 16GB DIMM in each of the 12 memory slots.  There will be a premium for a 16GB DIMM.  Knowing the options and determining the best, most cost-effective solution will depend on your environment.  When a large memory node is required, do you purchase the 16GB DIMMs or go up to a multi-socket solution?  If I decide to scale back on the memory (use 4GB or 8GB DIMMs instead of 16GB DIMMs), what is the performance impact on my application?  If I am cost sensitive, will the lower cost outweigh the loss of performance?  Can I use SSDs (solid-state drives) to compensate for any performance loss due to lower memory capacity?  There are many questions to think about when deciding on the right configuration for your application and environment, and I certainly can’t answer them all here.
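
To make the capacity arithmetic concrete, here is a small Python sketch.  It assumes only what is stated above (12 memory slots in a dual-socket node, all filled with identical DIMMs); the function and variable names are made up for illustration.

```python
# Value taken from the post: a two-socket node with 12 memory slots.
MEMORY_SLOTS = 12


def node_capacity_gb(dimm_size_gb, slots=MEMORY_SLOTS):
    """Total memory in GB when every slot holds a DIMM of the same size."""
    return dimm_size_gb * slots


# Compare the three DIMM sizes mentioned in the post.
capacities = {size: node_capacity_gb(size) for size in (4, 8, 16)}

for size, total in capacities.items():
    print(f"{size}GB DIMMs x {MEMORY_SLOTS} slots = {total}GB")
```

Only the 16GB DIMMs reach the 192GB maximum; 8GB and 4GB DIMMs in the same 12 slots give 96GB and 48GB.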

Let’s not forget that the third memory channel enables a different set of optimal memory configurations.  Think x3 when deciding how much memory to install in your node: 12GB, 24GB, 48GB, etc.  What happens when you don’t use an optimal configuration?  Well, it depends.  In most cases the impact is minimal, but let me add a bit of context around minimal:

·         Low bandwidth sensitivity (more dependent upon the processor for performance)

        E.g. Monte Carlo, Black-Scholes (financial modeling), BLAST (bioinformatics), AMBER (molecular dynamics)

        Expect less than a 2% difference between memory configurations*

·         Medium bandwidth sensitivity (somewhat balanced between memory and CPU usage)

        E.g. CFD, Explicit FEA, Implicit FEA (with robust I/O system)

        Expect approx. 5% degradation for non-optimal symmetrical configurations*

·         High bandwidth sensitivity (high access to the system memory)

        E.g. WRF (weather), POP (climate), MILC (physics), Reservoir Simulation

        Expect approx. 10% degradation for non-optimal symmetrical configurations*

The results are interesting.  In all three cases above, the degraded performance is always better than the performance you would have with only two memory channels.

When you hear about the performance impact of non-optimal memory configurations, keep the examples above in mind: the impact is application dependent and will not be severe for your overall system performance.
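
As a rough illustration of the three sensitivity classes, the sketch below turns the approximate figures quoted above into a lookup.  The numbers are the post's ballpark values, not new measurements; the "low" entry uses the quoted 2% upper bound as a worst case, and the names are hypothetical.

```python
# Approximate degradation for non-optimal configurations, as quoted in the post.
# "low" uses the post's "less than 2%" figure as a worst case.
DEGRADATION = {"low": 0.02, "medium": 0.05, "high": 0.10}


def relative_performance(sensitivity):
    """Performance on a non-optimal configuration relative to optimal (1.0)."""
    return 1.0 - DEGRADATION[sensitivity]


def estimated_runtime(optimal_seconds, sensitivity):
    """Rough runtime estimate if throughput drops by the quoted fraction."""
    return optimal_seconds / relative_performance(sensitivity)


for level in ("low", "medium", "high"):
    print(f"{level}: about {relative_performance(level):.0%} of optimal performance")
```

For example, a bandwidth-hungry job that takes 90 seconds on an optimal configuration would take roughly 100 seconds on a non-optimal one under this simple model.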

The Intel Xeon processor 5500 series supports huge memory nodes thanks to the addition of the third memory channel.  Memory configurations in multiples of three are ideal, but if you decide to stay with a power-of-two configuration, the performance should still exceed that of a solution based on only two memory channels.
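
One simplified way to read the "think x3" rule is that identical DIMMs should be spread evenly over every memory channel.  The Python sketch below checks that under this assumption for a dual-socket node with three channels per processor.  Real platform population rules are more detailed, and the function names are invented for this example.

```python
SOCKETS = 2              # dual-socket node, as in the post
CHANNELS_PER_SOCKET = 3  # three memory channels per processor


def is_balanced(num_dimms, sockets=SOCKETS, channels=CHANNELS_PER_SOCKET):
    """True if identical DIMMs can be spread evenly over every channel."""
    total_channels = sockets * channels
    return num_dimms > 0 and num_dimms % total_channels == 0


def describe(num_dimms, dimm_size_gb):
    """Summarize a configuration built from identical DIMMs."""
    total = num_dimms * dimm_size_gb
    kind = "balanced" if is_balanced(num_dimms) else "unbalanced"
    return f"{num_dimms} x {dimm_size_gb}GB = {total}GB ({kind})"


print(describe(6, 2))   # a multiple-of-three capacity
print(describe(8, 2))   # a power-of-two capacity
print(describe(12, 4))  # another multiple-of-three capacity
```

With 2GB DIMMs, six of them give 12GB across all six channels, while eight give 16GB and leave some channels with more DIMMs than others.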

*Based upon Intel internal measurements



Jun 9, 2009 10:15 PM Guest Nahid Alam says:

Considering all these technical achievements, the 'big' job is to decide on the 'right' configuration and tune your applications for it.

Jun 13, 2009 1:32 AM tingshen tingshen says in response to Nahid Alam:

I was trying to figure out whether I should go for a faster UP CPU with less memory (6.4GT/s + DDR3-1333MHz) or slower MP CPUs with more memory (5.86GT/s + DDR3-1066MHz or 4.8GT/s + DDR3-800MHz). This is the question of using 1 DIMM per channel versus 2-3 DIMMs per channel: which gives the best performance?

 

Some assumptions:

Let's say we have a fixed cost to purchase a new server for virtualization (Hyper-V or Xen) and databases (SQL2008) today. Intel SSDs are used for storage, so the bottleneck may not be at the storage level. What is the best configuration at the CPU/memory level?

 

When you have a faster QPI speed, does that mean your applications will execute faster and thus require less memory to handle complex calculations or heavy DBMS I/O? Does that mean you only need less but faster memory with a faster processor?

 

It is always cheaper to go for slower MP processors with much more RAM if we use 2-3 DIMMs per channel in a Nehalem setup. So what is the key factor that makes a huge performance difference among all these "interesting" configurations now that Intel has moved the memory controller into the CPU?

 

I do find it very rigid to have such restricted memory configurations. Intel should make all the DIMMs shared among all processors, and just make more channels available in a more open concept. We never needed to bother about this kind of CPU/memory configuration thingy when buying servers in the past.....
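
For the speed-versus-capacity question raised in this thread, a back-of-the-envelope number may help.  The Python sketch below computes theoretical peak memory bandwidth per socket for the three DDR3 speeds mentioned, assuming three channels per processor and a 64-bit (8-byte) DDR3 channel.  These are theoretical peaks, not measured application performance, and the names are hypothetical.

```python
BYTES_PER_TRANSFER = 8   # a DDR3 channel is 64 bits wide
CHANNELS_PER_SOCKET = 3  # three memory channels per processor


def peak_bandwidth_gbs(mega_transfers_per_s, channels=CHANNELS_PER_SOCKET):
    """Theoretical peak memory bandwidth per socket in GB/s (10^9 bytes/s)."""
    return mega_transfers_per_s * 1e6 * BYTES_PER_TRANSFER * channels / 1e9


# The three memory speeds mentioned in the comment above.
for speed in (1333, 1066, 800):
    print(f"DDR3-{speed}: {peak_bandwidth_gbs(speed):.1f} GB/s per socket")
```

How much of that peak an application actually uses depends on its bandwidth sensitivity, which is exactly the distinction the post draws between its three application classes.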