The Server Room blog, Intel Open Port IT Community, March 31, 2009

Datacenter Dynamic Power Management – Intelligent Power Management on Intel Xeon® 5500

The newly released Intel Xeon® 5500 processor family comes with a new breed of datacenter power management technology: Intel® Intelligent Power Node Manager (Node Manager for short).

As a former datacenter engineering manager, I have personal experience with datacenter management issues, especially power allocation and cooling. We often assumed the worst-case scenario because we could not predict when server power consumption would peak, and when it did peak, we had no way to control it. It was like driving blindfolded and hoping for the best. The safest bet was to make the road as wide as possible: leave enough headroom in the power budget that we would never run into power issues. But that resulted in under-utilized, or stranded, power, which is quite a waste.

Over the last several years, we met with many IPDC (internet portal datacenter) companies. We heard over and over again about their datacenter power management challenges, which were even worse than those I had experienced. Many of the IPDC companies we talked with leased racks from datacenter service providers under strict per-rack power limits. The number of servers they could fit per rack had a direct impact on their bottom line. They did not want to under-populate the racks, because they would pay more rent for the same number of servers; they could not over-populate the racks, because that would exceed the power limits. Their power management issues can best be summarized as follows:

·        Over-allocation of power: Power allocated to servers does not match actual server power consumption. Power is typically allocated for the worst-case scenario based on the server nameplate. Static allocation of the power budget based on the worst case leads to inefficiencies and does not make full use of available power capacity and rack space.

·        Under-population of rack space: As a direct result of the over-allocation problem, there is a lot of empty space in racks. When the business needs more compute capacity, it has to pay for additional racks. There is not enough datacenter space for them to rent. As a result, they have to go to other cities or even other countries, which increases operational cost and support staff.

·        No capacity planning: There is no effective means to forecast and optimize power and performance dynamically at the rack level. To improve power utilization, datacenters need to track actual power and cooling consumption and dynamically adjust workload and power distribution for optimal performance at the rack and datacenter levels.
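To make the over-allocation problem concrete, here is a minimal Python sketch of the stranded-power arithmetic. It borrows the 2,200W rack limit, 650W nameplate, and ~300W measured peak quoted later in this post; allocating strictly by nameplate is a hypothetical case for illustration, and the function name is my own.

```python
def stranded_power(rack_limit_w, allocated_per_server_w, actual_peak_w):
    """Return (servers that fit, watts drawn at peak, stranded watts).

    Servers are packed by their *allocated* power, but only draw
    their *actual* peak power; the difference is stranded.
    """
    servers = rack_limit_w // allocated_per_server_w
    drawn_w = servers * actual_peak_w
    return servers, drawn_w, rack_limit_w - drawn_w


# Hypothetical case: allocate by the 650 W nameplate on a 2,200 W rack,
# while each server actually peaks at about 300 W.
servers, drawn_w, stranded_w = stranded_power(2200, 650, 300)
print(f"{servers} servers fit, drawing {drawn_w} W at peak; {stranded_w} W stranded")
```

The more conservative the per-server allocation, the more of the rack's power budget is paid for but never used.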

This is where Node Manager comes into play. Let’s take a look at what Node Manager and Intel® Data Center Manager (DCM), its companion software tool from Intel for rack- and group-level power management, will do:

Intel® Intelligent Power Node Manager (Node Manager)

Node Manager is an out-of-band (OOB) power management policy engine embedded in Intel server chipsets. Processors can regulate their power consumption through the manipulation of P- and T-states. Node Manager works with the BIOS and OS power management (OSPM) to perform this manipulation and dynamically adjust platform power to achieve the best balance of performance and power for a single node. Node Manager has the following features:

·        Dynamic Power Monitoring: Measures the actual power consumption of a server platform within an acceptable error margin of +/- 10%. Node Manager gathers information from PSMI-instrumented power supplies, provides real-time power consumption data as single readings or as a time series, and reports it through the IPMI interface.

·        Platform Power Capping: Sets platform power to a targeted power budget while maintaining maximum performance for the given power level. Node Manager receives a power policy from an external management console through the IPMI interface and maintains power at the targeted level by dynamically adjusting CPU P-states.

·        Power Threshold Alerting: Node Manager monitors platform power against the targeted power budget. When the target power budget cannot be maintained, Node Manager sends alerts to the management console.
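The capping and alerting behavior described above can be illustrated with a toy Python model. This is only a sketch under loud assumptions: the P-state power table, the 100W idle figure, and the linear power model are all made up for illustration, and the real Node Manager firmware works from measured power rather than a model.

```python
# Hypothetical full-load platform power (watts) per P-state, fastest first.
# Illustrative values only, not measurements.
PSTATE_FULL_LOAD_W = [230, 210, 190, 170, 150]
IDLE_W = 100  # assumed idle platform power


def platform_power(pstate, utilization_pct):
    """Toy model: idle power plus a utilization-scaled share of the rest."""
    return IDLE_W + (PSTATE_FULL_LOAD_W[pstate] - IDLE_W) * utilization_pct // 100


def enforce_cap(cap_w, utilization_pct):
    """Pick the fastest P-state whose modeled power fits under the cap.

    Returns (pstate, power_w, alert). alert is True when even the slowest
    P-state cannot hold the cap, mirroring the threshold alert.
    """
    for pstate in range(len(PSTATE_FULL_LOAD_W)):
        power_w = platform_power(pstate, utilization_pct)
        if power_w <= cap_w:
            return pstate, power_w, False
    return len(PSTATE_FULL_LOAD_W) - 1, power_w, True


for util in (60, 100):
    print(util, enforce_cap(170, util))
```

In this toy model, a 170W cap costs only one P-state step at 60% utilization but several steps at 100%, which is the general shape of the trade-off: the cap bites hardest during peaks.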

Intel® Data Center Manager (DCM)

DCM is a software technology that provides power and thermal monitoring and management for servers, racks, and groups of servers in datacenters. It builds on Node Manager and customers' existing management consoles to bring platform power efficiency to end users. DCM implements group-level policies that aggregate node data across the entire rack or datacenter to track metrics and historical data and to provide alerts to IT managers. This allows IT managers to establish group-level power policies that limit consumption dynamically. DCM allows datacenters to increase rack density, manage power peaks, and right-size the power and cooling infrastructure. It is a software development kit (SDK) designed to plug in to software management console products. It also has a reference user interface, which was used in this POC as a proxy for a management software product. Key DCM features are:

·        Group (server, rack, row, PDU and logical group) level monitoring and aggregation of power and thermals

·        Log and query for trend data for up to one year

·        Policy driven intelligent group power capping

·        User defined group level power alerts and notifications

·        Support of distributed architectures (across multiple racks)
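As a rough illustration of policy-driven group power capping, the Python sketch below splits a rack budget into per-server caps. The allocation rule (a guaranteed floor plus a demand-proportional share of what remains), the function name, and the 150W floor are my own assumptions for illustration; this post does not describe the algorithm DCM actually uses.

```python
def allocate_caps(rack_budget_w, demand_w, floor_w):
    """Split a rack power budget into per-server caps (whole watts).

    Every server is guaranteed floor_w. The remaining budget is shared
    in proportion to each server's demand above the floor, so the sum of
    caps never exceeds rack_budget_w.
    """
    spare_w = rack_budget_w - len(demand_w) * floor_w
    if spare_w < 0:
        raise ValueError("rack budget cannot cover the per-server floor")
    extra = [max(d - floor_w, 0) for d in demand_w]
    total_extra = sum(extra)
    if total_extra <= spare_w:
        # Enough budget for everyone: cap each server at its own demand.
        return [floor_w + e for e in extra]
    # Oversubscribed: scale each server's extra share down (rounding down).
    return [floor_w + spare_w * e // total_extra for e in extra]


# 12 servers on a 2,200 W rack: 4 busy nodes want 230 W, 8 quiet ones 170 W.
caps = allocate_caps(2200, [230] * 4 + [170] * 8, floor_w=150)
print(caps, sum(caps))
```

Re-running an allocation like this as demand shifts is one way a group-level manager could let individual servers go above or below their average cap while the rack as a whole stays under its limit.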

What will the combination of DCM and Node Manager do for datacenter power management? Here is the magic part… With DCM setting policies at the group and rack level, Node Manager can dynamically report the power consumed by a server and adjust it within a certain range, so that the overall power consumption of the rack or a particular server group can be managed within a given target. Why is this important? Let me use a real example to explain it:

IPDC Company XYZ (a name I cannot disclose in public) runs a mission-critical workload in its datacenter 24x7, with workload fluctuations during the day. CPU utilization is mostly at 50~60%, with a few cases where it jumps to 100%, which is typical for datacenter operations. To be on the safe side, their current approach was to pre-qualify the Xeon® 5400 server for the worst case of 100% CPU utilization, at which it drew ~300W. They used 300W for power allocation, which was significantly lower than the nameplate value of the power supply (650W).

With Xeon® 5500, for the same workload at 100% throughput, platform power consumption goes down to 230W, a 70W reduction from the previous-generation CPU. That alone is a good reason to switch to the new platform, thanks to the advanced intelligent power optimization features of Xeon® 5500. But the story does not end there…

On top of that, we further analyzed the effect of power capping using Node Manager and DCM. After many tests, we noticed that if we cap at 170W, the performance impact for the workload at 60% CPU utilization and below is almost negligible. This means that with 170W power capping, the platform can deliver the same level of service most of the time with 60W less (230W-170W) power consumption. For the occasional spike above 60% CPU utilization, there will be some performance impact. However, since Company XYZ operates below 60% CPU utilization most of the time, the performance impact is tolerable. As a result, we can squeeze more out of the power allocation using the dynamic power management features of Node Manager and DCM.

What does this mean for Company XYZ? Well, we can do the math. The racks they lease today have a limit of 2,200W per rack. With the current Xeon® 5400 servers, they can put up to 7 servers per rack at 300W per server. With Xeon® 5500, they can safely put 9 servers at 230W per server, a 28% increase in server density on the rack. On top of that, by using Node Manager and DCM to manage power at the rack level with a power limit of 2,200W and dynamically adjust the power allocation among the servers, we can put at least 12 servers at an average power allocation of 170W per server, a 71% increase in server density compared with the situation today! This means great savings for Company XYZ. In this case, the power consumption of each individual server on the rack could go above or below 170W. DCM dynamically adjusts the power capping policy while holding the line for the entire rack's power consumption below 2,200W.
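For readers who want to rerun the math with their own rack limit and per-server allocation, the arithmetic above fits in a few lines of Python. The figures are the ones from this example; the function name and the choice to round the percentage down are my own.

```python
RACK_LIMIT_W = 2200  # per-rack power limit from the example above


def servers_per_rack(per_server_w, rack_limit_w=RACK_LIMIT_W):
    """Whole servers that fit under the rack limit at a given allocation."""
    return rack_limit_w // per_server_w


scenarios = [
    ("Xeon 5400, 300 W allocation", 300),
    ("Xeon 5500, 230 W allocation", 230),
    ("Xeon 5500 with 170 W average cap", 170),
]

baseline = servers_per_rack(300)
for label, watts in scenarios:
    count = servers_per_rack(watts)
    # Density gain over the 300 W baseline, rounded down to a whole percent.
    gain_pct = 100 * (count - baseline) // baseline
    print(f"{label}: {count} servers per rack (+{gain_pct}% vs. baseline)")
```

Swapping in a different rack limit or measured per-server power shows quickly how sensitive rack density is to each watt of allocation.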

Of course, power management results vary from workload to workload, and workload-based optimization is needed to achieve the best result. We also assume that the datacenter can provide sufficient cooling for devices that consume power within the given power limit. Even though the results from this test cannot be applied universally to all IPDC customers, we finally have a platform that can dynamically and intelligently monitor and adjust platform power based on workload. Datacenter managers can now manage power at the rack and datacenter levels with optimized power allocation to fully utilize datacenter power. Are you ready to give it a try?


Yesterday, Intel officially launched the Intel® Xeon® 5500 processor (formerly codenamed “Nehalem”) for servers and workstations. One of the most exciting uses of this new platform will be as a key building block in cloud computing infrastructure. Whether you’ve bought into the hype of cloud computing or are a jaded IT realist, you can’t afford to pass up this list of 10 reasons the Intel Xeon 5500 processor is perfect for the cloud.

 

  1. Efficiency. To get the greatest efficiency, the leaders of large-scale Internet providers place their datacenters next to hydroelectric power or other low-cost energy sources. Each watt saved flows straight to the bottom line. Similarly, cloud computing companies intensely scrutinize their server purchases, weighing some variation of this question: how much performance (and by extension, revenue) can I squeeze out of the equipment, versus the cost of procurement and operations? This is the essence of “efficiency”. And now, with Intel’s new Xeon 5500 processor, there’s great news for anyone building efficient cloud infrastructure. The Xeon 5500 can deliver up to 2.25X the computing performance at a similar system power envelope compared to Intel’s previous generation Xeon 5400 series [1]. (By the way, the Xeon 5400 is no efficiency slouch, as it’s been leading the industry-standard SPECpower results for two socket systems since the benchmark was created [2].) Need more evidence of Xeon 5500 efficiency? Look no further than the amazing results announced by IBM: a score of 1860, which is a 64% leap over the previous high score for a two socket system [3]. Results like this clearly demonstrate that the Xeon 5500 has the extremely efficient performance that cloud operators are seeking.
  2. Virtualization performance. If a cloud service provider has leveraged a virtualization layer in its architecture, the performance of virtual machines and the ratio of VMs to servers are key concerns. Enter the Xeon 5500, which boasts a stellar jump in virtualization performance, up to 2 times the previous generation Xeon 5400 series [4], allowing virtualized clouds to squeeze even more capability out of their infrastructure.
  3. Adaptable. Cloud computing environments tend to be highly dynamic as usage ebbs and flows during the day, some applications scale rapidly while some shut down, and so on. To meet such shifting demand – it’s critical to have adaptable cloud building blocks. And here Intel’s Xeon 5500 shines: this processor has unique new intelligence to increase performance when needed (Intel Turbo Boost) and to reduce power consumption when demand falls (Intel Intelligent Power Management Technology).
  4. Designed for higher operating temperatures. Across the datacenter industry, there’s growing interest in the notion of running datacenters at warmer temperatures to conserve energy. For cloud computing mega-datacenters, this concept has been in practice for several years. But it’s not just the datacenter staff that needs to handle warmer climates; the equipment must tolerate the conditions as well. Intel’s Xeon 5500 has been designed to run at higher temperatures, providing one more piece of the puzzle to enable more efficient cloud infrastructure environments [5].
  5. 50% lower idle power. Cloud computing providers, like airlines and phone companies, need to run at the highest utilization possible to maintain a healthy P&L. Yet there are times when usage, and thus server utilization, drops, and at these times cloud service providers desire processors with low power consumption. The Xeon 5500 processor now boasts an idle power that’s up to 50% lower than the prior generation systems, reducing energy costs [6].
  6. Advanced power management. Intel has incorporated special platform level power technologies into the Xeon 5500 platform, which open new avenues to managing server energy consumption beyond what’s already built into the processor. Intel Intelligent Power Node Manager is a power control policy engine that dynamically adjusts platform power to achieve the optimum performance-power ratio for each server. By setting user-defined platform energy policies, Node Manager can enable datacenter operators to increase server rack density while staying within a given power threshold. While results vary based on the type of application and server, Intel demonstrated up to 20% improvement in rack density by using Node Manager in a recent proof-of-concept with Baidu, a leading search engine [7].
  7. High Performance Memory Architecture. Cloud computing and other highly scalable Internet services often rely on workloads where it makes more sense to keep large volumes of data in DRAM, close to the CPU, rather than on slower, more distant hard drives. “Memcached”, a distributed caching system used by many leading Internet companies, is but one example. The Intel Xeon 5500 offers several exciting memory architecture benefits over the previous generation: (1) up to 3.5X the memory bandwidth [8] by leveraging an integrated memory controller and Intel QuickPath Interconnect (QPI), (2) a larger memory footprint (144GB versus 128GB), and (3) DIMMs and QPI links that automatically move to lower power states when not active. In these new caching and distributed workloads, where large memory architectures are crucial, the Intel Xeon 5500 offers real advantages.
  8. Perfect when paired with SSDs. Few technologies get datacenter gurus more excited than solid state drives, which can offer impressive performance gains over their rotating hard drive cousins at far lower energy consumption. But with SSDs that can read 1000 times more data into the CPU than an HDD, you want a ravenous processing beast to handle the traffic. And, as you’re catching on to the blog theme, the Xeon 5500 can provide up to 72% better performance using SSDs than even the previous generation Xeon systems [9]. Intel Xeon 5500 is truly a perfect engine to complement SSDs.
  9. Ideal for optimized server boards. For cloud infrastructure, where every watt is a pernicious tax, you need more than just an extremely efficient processor such as the Xeon 5500. You also need an optimized server platform that has been stripped of every unneeded feature, configured with world-class energy efficient components, and designed for reduced airflow that minimizes the use of fans. One such product is an Intel server motherboard, codenamed “Willowbrook”, which has an impressively low idle power below 70W, considering it’s a dual Xeon 5500 performance rocket [10].
  10. A competitive lever for cloud operators. Lastly, for a service provider scaling out its infrastructure, systems based on Intel Xeon 5500 processors could offer a competitive advantage versus service providers whose servers are 2 to 3 years old. Because of the performance leaps in Intel server processors in the past few generations, Intel Xeon 5500 based servers can handle the same performance load as up to three times the number of 3-year-old dual core servers [11]. The benefit is clear: providing the same performance level but with far fewer servers means a leg-up on those service providers with more antiquated, less efficient infrastructure.

 

If you have made it through this lengthy top 10 list – you should have a better sense for the advantages of Intel’s latest processor for cloud computing environments. Of course, the best way to really see the benefits is to get an Intel Xeon 5500 based system from your preferred vendor and test with your own code.

 

1-11: For footnotes, performance background, and legal information, please refer to the attached document.
