
The Data Stack

5 Posts authored by: M_Haedrich

When Microsoft shipped version 3 of the Windows HPC Server operating system last month, the company called the release its most ambitious yet, comparing it to “the third stage of a rocket firing.” Indeed, like giving a rocket the final boost it needs before launch, I believe version 3, formally dubbed Windows HPC Server 2008 R2, will propel high-performance computing further into the mainstream.


The HPC market is poised for significant growth. IDC predicts that HPC spending will grow by roughly one-third over the next five years, from $8.6 billion in 2009 to $11.7 billion in 2014. But while there’s a lot of pent-up demand for supercomputing, some organizations have shied away from it because of the complexity of migrating from a single-node workstation to a multi-node cluster.


Windows HPC Server 2008 R2 reduces this barrier to entry by making HPC easier to use and more accessible. Because the operating system is based on the Windows platform, users can take advantage of the tools they’re already familiar with; system administrators can use Windows-based tools to deploy and manage HPC solutions; developers can build HPC applications by using an integrated set of development tools anchored around the Microsoft Visual Studio 2010 development system; and end users can access HPC resources by using the technologies they already know how to use. Suddenly, it’s a lot easier for organizations to migrate from a workstation to a cluster environment.


At the end of the day, HPC is all about performance and the ability to get your job done faster. Used in combination with the Intel® Xeon® Processor 5600 series, Windows HPC Server 2008 R2 offers organizations the opportunity to maximize performance at a lower total cost of ownership with greater energy savings. It’s a winning combination.


Two interesting examples of how Intel Xeon processors and Windows HPC Server 2008 R2 can help drive HPC into the mainstream are the SGI Octane III and Cray CX1 supercomputers:


SGI recently announced its support for Windows HPC Server 2008 R2 in its personal supercomputer Octane III, saying it wants to extend desk-side computational abilities to a broader audience of technical computer users. Likewise, Cray based its Cray CX1 line of desk-side supercomputers on Intel Xeon processors and Windows HPC Server 2008 R2, designing them to provide top performance and “ease of everything” features at an affordable price. Among the users of the Cray supercomputer is the Laboratory of Neuro Imaging at UCLA, where scientists wanted a simple tool that they could use to better understand brain structure and function. Says Rico Magsipoc, the lab’s CTO, “Having the power of the Cray supercomputer that is simple and compact is very attractive and necessary, considering the physical constraints we face in our data centers today.”


By making supercomputing easy to use, Windows HPC Server 2008 R2 helps companies overcome a key barrier to HPC adoption. The operating system is part of the Microsoft Technical Computing Initiative, which is aimed at advancing technology to better measure, monitor and model the way the world behaves. My prediction is that Windows HPC Server 2008 R2 will significantly increase the market for high-performance computing. By how much, I can’t say. But I do believe it will have a substantial impact on the lower end of the market, which in turn should push HPC growth above and beyond what analysts are forecasting today.

Looks like the Intel® Xeon® processor 5500 series is making lots of noise in HPC. The QPI and integrated memory controller are really providing the boost necessary to make it an all-around performance leader for HPC applications. With all this performance, why did Intel add a third memory channel?

The third memory channel enables the platform to support a boatload of memory. As a matter of fact, up to 192GB can be supported in a two-socket configuration. It wasn’t too long ago that only 32GB was supported in a dual-socket configuration. With the ability to support so much memory, you can now meet the needs of almost every HPC application. The 5500 series is intended for all server markets, but let’s face it: with the design changes Intel made in the new architecture, the server segment gaining the most benefit appears to be HPC.

It seems like only yesterday that the only way to get access to large memory configurations was through expensive, proprietary SMP systems. The HPC market for large SMP systems is still out there, but it is shrinking…fast. Today, we are clustering low-cost solutions to create some of the most powerful systems in the world. Standard components are driving system costs lower and lower, delivering a price/performance advantage that alternative solutions cannot match.

Now that a single dual-socket node can support up to 192GB, it is important to understand how to get there. First, to reach 192GB you need 16GB DIMMs in all 12 memory slots, and 16GB DIMMs carry a price premium. Knowing the options and determining the most cost-effective solution depends on your environment. When a large-memory node is required, do you purchase 16GB DIMMs or move up to a multi-socket solution? If you scale back on memory (using 4GB or 8GB DIMMs instead of 16GB DIMMs), what is the performance impact on your application? If you are cost sensitive, will the lower cost outweigh the lost performance? Can SSDs (solid-state drives) compensate for any performance loss due to lower memory capacity? There are many questions to consider when deciding the right configuration for your application and environment, and I certainly can’t answer them all here.
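The capacity arithmetic above can be sketched in a few lines. This is an illustrative helper, not a vendor tool; the 12-slot, dual-socket layout is the one described in this post, and the DIMM sizes are the common DDR3 capacities mentioned here.

```python
# Sketch of node memory capacity for a 12-slot, dual-socket Xeon 5500 board.
# Assumes all slots are populated with identical DIMMs (simplest case).

DIMM_SLOTS = 12  # 6 slots per socket on a typical two-socket board

def node_capacity_gb(dimm_size_gb, populated_slots=DIMM_SLOTS):
    """Total memory for a node populated with identical DIMMs."""
    return dimm_size_gb * populated_slots

# The 192GB maximum mentioned above: twelve 16GB DIMMs.
assert node_capacity_gb(16) == 192

# The scaled-back alternatives from the same paragraph:
for size in (4, 8, 16):
    print(f"{size}GB DIMMs x {DIMM_SLOTS} slots = {node_capacity_gb(size)}GB")
```

Running this prints the three capacity tiers (48GB, 96GB, 192GB), which frames the cost question: each step up roughly doubles capacity, but the per-DIMM premium grows much faster at the top end.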

Let’s not forget the third memory channel enables a different set of optimal memory configurations. Think in multiples of three when deciding how much memory to install in your node: 12GB, 24GB, 48GB, and so on. What happens when you don’t use an optimal configuration? Well, it depends. In most cases the impact is minimal, but let me add a bit of context around “minimal”:

•  Low bandwidth sensitivity (more dependent upon the processor for performance)

        E.g. Monte Carlo, Black-Scholes (financial modeling), BLAST (bioinformatics), AMBER (molecular dynamics)

        Expect less than a 2% difference between memory configurations*

•  Medium bandwidth sensitivity (somewhat balanced between memory and CPU usage)

        E.g. CFD, explicit FEA, implicit FEA (with a robust I/O system)

        Expect approx. 5% degradation for non-optimal symmetrical configurations*

•  High bandwidth sensitivity (heavy access to system memory)

        E.g. WRF (weather), POP (climate), MILC (physics), reservoir simulation

        Expect approx. 10% degradation for non-optimal symmetrical configurations*

The results are interesting.  In all three cases above, the degraded performance is always better than the performance you would have with only two memory channels.

When you hear about the performance impact of non-optimal memory, the examples above show that it is application dependent and will not severely impact your overall system performance.

The Intel Xeon processor 5500 series offers support for huge memory nodes with the addition of the third memory channel. Memory configurations in multiples of three are ideal, but if you decide to stay with a power-of-two configuration, the performance should still exceed that of a solution based upon only two memory channels.
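The "think x3" rule above can be expressed as a quick check. This is a simplified sketch under an assumption I'm making for illustration: a balanced configuration means one identical DIMM per channel across all six channels of a two-socket node (real boards also allow two or three DIMMs per channel).

```python
# Sketch: is a total node capacity "balanced" across the six channels
# (3 per socket x 2 sockets) of a two-socket Xeon 5500 node?
# Assumes one DIMM per channel, all the same size (an illustrative simplification).

CHANNELS_PER_NODE = 6
DIMM_SIZES_GB = (2, 4, 8, 16)  # common DDR3 capacities of the era

def is_balanced(total_gb):
    """True if total_gb fills every channel equally with one standard DIMM."""
    per_channel = total_gb / CHANNELS_PER_NODE
    return per_channel in DIMM_SIZES_GB

# The "think x3" examples from the post divide evenly across the channels:
assert is_balanced(12) and is_balanced(24) and is_balanced(48)

# A power-of-two total like 32GB cannot be split evenly six ways,
# so some channels end up with more memory than others (non-optimal).
assert not is_balanced(32)
```

The point of the check mirrors the post: an unbalanced total still works, it just gives up the few percent of bandwidth quantified in the list above.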

*Based upon Intel internal measurements

Intel® has just launched its latest server processor, the Intel® Xeon® processor 5500 series. It really is a breakthrough processor for Intel and a phenomenal solution for HPC. I was watching a keynote presentation this week and our Vice President was downright giddy about it. What makes this processor such a phenomenal solution for HPC? The answer is easy: it expands capabilities and shortens users’ time to results. The real question is how this processor performs so much better than other solutions out there. That answer is a bit more complicated, but really fun to give. Here we go…

Intel® QuickPath Interconnect (QPI) – This is the technology that replaces the front-side bus used in previous-generation Xeon® processors. Our previous-generation architecture had a bandwidth of 21 GB/s vs. the QPI bandwidth of 46.1 GB/s, a speedup of 2.2X. Very impressive. For applications that move a lot of data, this is huge. It’s like going from a country back road to an expressway!

Integrated memory controller – Intel has moved the memory controller from the MCH (Memory Controller Hub) into the processor. In addition to integrating the memory controller, Intel is now using native DDR3 at speeds up to 1333MHz with three memory channels per processor; that is a total of 6 memory channels and 64 GB/s of total memory bandwidth for a 2S HPC node. This is a 3x jump in memory bandwidth from the previous-generation memory controller, which only supported speeds up to 1066MHz and 4 memory channels. Integrating the memory controller puts memory in closer contact with the processor for lower-latency reads and writes, and the two additional memory channels (one per socket) increase memory capacity and deliver faster reads and writes.
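The 64 GB/s figure can be checked with back-of-the-envelope arithmetic: each DDR3 channel is 64 bits (8 bytes) wide, so peak bandwidth per channel is the transfer rate times 8 bytes. This sketch uses decimal GB (1 GB = 1000 MB), which is how such marketing figures are typically quoted.

```python
# Back-of-the-envelope peak memory bandwidth for a multi-channel DDR3 node.

def peak_bandwidth_gbs(mt_per_sec, channels):
    """Theoretical peak bandwidth in GB/s for 64-bit-wide DDR channels."""
    bytes_per_transfer = 8  # 64-bit channel width
    return mt_per_sec * bytes_per_transfer * channels / 1000

# Two sockets x three DDR3-1333 channels each = 6 channels:
two_socket = peak_bandwidth_gbs(1333, channels=6)
print(f"{two_socket:.0f} GB/s")  # matches the ~64 GB/s quoted above
```

The same function shows why the jump is roughly 3x: the prior platform’s four channels at 1066 MT/s top out near 34 GB/s of raw channel bandwidth, and the older FSB-attached design realized far less of it in practice.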

Energy-efficient design – The new Intel® Xeon® processor 5500 series can dynamically turn off cores when they are not required. It has more power states and can transition between them faster than ever before. Net-net, this means less power consumption. By consuming less power while providing world-class performance, Intel has created a solution that cries out HPC!

By taking advantage of these power savings, Intel has introduced another feature, Intel® Turbo Boost Technology, which automatically increases processor frequency to boost application performance when thermal headroom is available. Depending on the environment, Turbo Boost can increase the processor frequency by as much as 400 MHz!

Another technology supported in the Intel® Xeon® processor 5500 series is Hyper-Threading. Intel® Hyper-Threading Technology enables users to run multiple threads on each processing core to increase total application performance, while requiring only a fraction of the power that additional cores would need. For highly threaded HPC applications, this is showing performance gains of over 25%.

The Intel® Xeon® processor 5500 series is considered a general-purpose processor. However, a closer look at the features and capabilities shows that this is one heck of an HPC solution. You can’t help but think Intel knew HPC was an important market segment for servers and had it in mind as it created the architecture and developed the features.

So, is Intel pounding its chest again? It should be. The Intel® Xeon® processor 5500 series is a breakthrough architecture for HPC users. The industry hasn’t seen generation-to-generation performance gains like this since the Pentium® Pro was introduced back in the mid-90s. Congratulations, Intel: go ahead and pound that chest, you deserve it!

That’s right: now you can buy a supercomputer that fits right under your desk. The PSCs (personal supercomputers) of today would have been #1 on the Top500 in November of 1996. ASCI Red would have been the first system to overtake the performance these small supercomputers can provide today.

So why do you need one of these high-performance bad boys? Well, if you are trying to keep up with technology, beat out your competition, and do it at the lowest possible cost, then you had better think about buying one. Whether you are in manufacturing, engineering, financial services, or life sciences, the benefits are huge. You can simulate instead of constructing expensive prototypes, do more work at your desk instead of waiting for your job to be scheduled on an oversubscribed cluster, and, most importantly, gain the competitive advantage you need to keep up with your customers’ demands for lower pricing.

There are a couple of very interesting solutions on the market right now that should be considered. One is the Cray CX1. This little monster can support up to 16 quad-core processors! With Nehalem soon to launch, that is one heck of a lot of performance.

The CX1 is also ICR (Intel Cluster Ready) certified. This certification helps assure end users that the system will provide a positive out-of-box experience: when you install the system and turn it on, it just works, maximizing your investment. The last thing a small business needs is to make the investment and then spend days getting the system up and running.

In today’s economy, many businesses are reducing costs and putting off capital expenditures. Suppose you have decided to be like many other businesses and wait just one more year to upgrade your current computing system. Your competition decided not to wait. Instead, they purchased an Intel Cluster Ready certified PSC and are now more productive than ever before. They made the investment and are turning out new designs faster and at a lower cost than ever before. You scratch your head wondering how they do it. As you try to save your business, they are growing and winning new business. Sometimes, being aggressive in a difficult time is the prudent thing to do…

Still wondering if this is right for you? Concerned you don’t have the IT staff to support such a beast? You don’t have the budget? Is there software out there you can use? All are good questions, but an ICR-certified PSC minimizes, if not eliminates, the need for an IT staff. The PSC is one of the most affordable cluster solutions on the market today, and it plugs right into your wall socket…you have tamed the beast! As for software, if you purchase an ICR-certified system, there are numerous applications available, most likely including one you are already familiar with.

When you are getting ready to make your next workstation or high-end PC purchase, I strongly recommend you consider one of the new kids on the block, the PSC. If it is Intel Cluster Ready certified, you can rest assured the solution you buy will just work.


I was recently going through IDC market data [1] and realized HPC now represents 29% of the server market. This is a significant number that has grown from ~20% to 29% in only 6 months. The first thing you are probably saying to yourself is “why do I care?” Well, take it from me, coming from Intel: it is a big deal. Have you ever tried negotiating with someone when you have very little leverage? You don’t get too far, do you? That’s how HPC has been up until now. When we try to design products for HPC, the question comes back: what is the return on investment (ROI)? When we were just a sliver of the market, it was hard to justify. Now that we are 29%, guess what: we have leverage. So guess what, HPC community? You can now say it loud and say it proud. Tell us what you need to make the HPC community stronger, faster, and more efficient. We’re listening.



Performance continues to be at the top of almost everyone’s list. Intel was the first to introduce quad core, and we are now launching our first 45nm processors on November 12th. The new manufacturing process has enabled Intel to almost double the transistor count vs. the current 65nm process. We are increasing the cache and bus frequency to deliver better performance for HPC applications. How will the added cache and faster bus benefit commercial HPC applications? They should most certainly provide faster results for customers, and that is always a good thing.

As we deliver faster processors, we continue to investigate core count. When we first introduced the Core micro-architecture to HPC, we also introduced quad core, which has been seen as a very good progression in our silicon roadmap. What benefits does multi-core bring to market? By adding cores, we are able to maintain power envelopes while increasing performance. Are the increased cores helping the HPC market? My immediate response is: of course they are. But after talking to ISVs, I begin to second-guess that response. Commercial applications are licensed to customers by processor, by core, or by MPI instance. If application performance does not scale with the increased core count, does it make sense to use a quad-core processor? Will a dual-core processor work better than a quad core for certain applications? As we drill down on this dilemma, we are quickly realizing the answer is: sometimes. Sometimes a dual core will be better and sometimes a quad core will be better. There is no simple answer to how many cores are right for your environment.
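The licensing dilemma above lends itself to a simple model. This is an illustrative sketch only: the per-core license price and scaling efficiencies are made-up numbers, and the linear scaling model is a deliberate simplification, not a real ISV pricing scheme.

```python
# Sketch: per-core licensing cost vs. imperfect application scaling.
# All prices and efficiencies below are hypothetical, for illustration only.

def cost_per_throughput(cores, license_per_core, scaling_efficiency):
    """License cost divided by relative throughput (1 core = 1.0 throughput).

    scaling_efficiency is the fraction of a full core's worth of extra
    throughput that each added core actually delivers (0.0 to 1.0).
    """
    throughput = 1 + (cores - 1) * scaling_efficiency
    return cores * license_per_core / throughput

# An application that scales well (90% per added core) rewards more cores:
well_scaled = cost_per_throughput(4, license_per_core=1000,
                                  scaling_efficiency=0.9)

# For one that scales poorly (40%), a dual core can beat a quad core
# on cost per unit of work, echoing the "sometimes" answer above:
dual = cost_per_throughput(2, 1000, 0.4)
quad = cost_per_throughput(4, 1000, 0.4)
assert dual < quad
```

Under these assumed numbers, the poorly scaling code pays roughly 1,430 license units per unit of throughput on two cores versus about 1,820 on four, so the "right" core count really does depend on how the application scales.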



Another area of growing interest in HPC is performance between nodes. Our latest chipset, the Intel® 5400, offers PCI Express generation 2, which provides twice the bandwidth of generation 1 and is ideal for quad-data-rate InfiniBand™. Gen 2 will also provide great support for visualization applications. The Intel® 5400 chipset and the Intel® Xeon® processor 5400 series create our first HPC-designed platform. As we progress to our next generation, we need to ensure the HPC voice is being heard. As InfiniBand continues to grow in the HPC market, will it eventually replace GbE as the interconnect of choice? Should an HPC-optimized product support IB down on the board? What about Gigabit Ethernet? How about 10G Ethernet? What is the market willing to pay for the increased performance?



There are lots of questions that need to be answered when creating an optimized HPC platform.  One thing is for sure; HPC is now a big time player and can't be ignored.  If you support high performance computers in your data center, make sure your wants and needs are being heard.



[1] IDC Q2 Server Tracker and IDC Q2 QView


