
The Server Room Blog

4 Posts tagged with the power_capping tag

I am consistently amazed by the stories I hear from customers, and read in industry publications, about the power issues data centers are facing today. Given increasing compute demand, shrinking budgets, and constrained power and cooling resources, data centers simply cannot continue to operate as they have in the past. These challenges are especially acute in Cloud deployments, where the sheer scale of the installation magnifies any inefficiency in resource utilization (especially power) and erodes the promised TCO benefits. Data center managers need new levels of insight into and control over their power resources so they can allocate capacity to meet their customers' needs seamlessly, and instrumentation is evolving to provide those capabilities.

 

 

At its core, instrumentation is all about sources of data and points of control, which can exist at the individual component level, the coordinated server level, the aggregated group level, or even integrated into the facility and building management system. At IDF in San Francisco, you will see a wealth of demos and sessions highlighting how OEMs and ISVs can use instrumentation points, starting with Intel Xeon Processor 5500 features, to develop and deliver innovative management and power management capabilities for running a Cloud environment more efficiently. If you are at IDF, stop by one of the following sessions to learn more about instrumentation:

 

 

  • ECTS0004 - Improving Data Center Efficiency With Intel® Xeon® Processor Based Instrumentation
  • PDCS002 - Cloud Power Management with Intel® Microarchitecture (Nehalem) Processor-based Platforms
  • Meet The Experts – informal session in the Server Zone during the Tuesday evening Technology Showcase hours
  • Server Zone in the Technology Showcase to see the power monitoring and capping demos, including Intel Intelligent Power Node Manager.

 

 

I will be staffing the Meet The Experts event, so stop by with your questions and thoughts on instrumentation! See you at IDF, Sept 22-24.

 

Dave


The Intel® Dynamic Power Node Manager technology allows setting a power consumption target for a server under load, as described in a previous article. This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.

 

Higher-level software can use this capability to implement sophisticated power management schemes, especially schemes that involve groups of servers. The range of control authority for servers in the Nehalem generation is significant: a fully loaded server consuming 300 watts can be rolled back by roughly 100 watts. In virtualized utility computing environments, additional control authority is possible by migrating the virtual machines off a host and consolidating them onto fewer hosts. The power consumption of the capped host, now at 200 watts, can then be brought down by another 50 watts, to 150 watts.
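To make the control-authority figures concrete, here is a minimal sketch that totals them for a group of identical hosts. The 300/200/150 W operating points are the example values from the paragraph above; the function name `group_control_range` is hypothetical, and the model deliberately ignores the extra load placed on hosts that receive the migrated VMs.

```python
from __future__ import annotations

# Example per-host operating points from the text (watts). These are the
# article's illustrative figures, not measured limits for any specific platform.
UNCAPPED_W = 300      # fully loaded host, no cap
CAPPED_FLOOR_W = 200  # same host capped ~100 W lower by Node Manager
EVACUATED_W = 150     # host after its VMs are migrated elsewhere (~50 W more)


def group_control_range(num_hosts: int, hosts_evacuated: int = 0) -> tuple[int, int]:
    """Return (uncapped_watts, lowest_reachable_watts) for identical hosts.

    Simplification: hosts that receive the migrated VMs are assumed to stay
    at the capped floor, so their extra load is ignored.
    """
    if not 0 <= hosts_evacuated <= num_hosts:
        raise ValueError("hosts_evacuated must be between 0 and num_hosts")
    uncapped = num_hosts * UNCAPPED_W
    capped_hosts = num_hosts - hosts_evacuated
    lowest = capped_hosts * CAPPED_FLOOR_W + hosts_evacuated * EVACUATED_W
    return uncapped, lowest


# A single host: 300 W uncapped, 150 W after capping plus evacuation.
print(group_control_range(1, hosts_evacuated=1))
```

For one host this reproduces the 300 W to 150 W range described above; for larger groups it shows how capping and consolidation add up.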

 

 

The reader might ask about running servers in capped mode all the time to save energy. Unfortunately, capping entails a performance tradeoff. The dynamic is not unlike driving an automobile. The best mileage is obtained by running the vehicle at a constant 35 MPH. This is not practical on a freeway where the prevailing speed is 60 MPH: the vehicle could be rear-ended, or, for a more mundane reason, the driver drives at 60 MPH because she wants to get there sooner. As with a server, the lowest fuel consumption of a running vehicle, at least in gallons per hour, is attained when the vehicle is idling. No real work is done with an idling engine, but at least the vehicle can start moving in no time. Continuing the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.

     

This document provides an example of the performance tradeoff with power capping; see Figure 2 on page 5.

 

The following example illustrates how group power capping works. The plot is a screen capture of the Intel® Data Center Manager (DCM) software managing the power consumption of a cluster of four servers. The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

 

[Figure: Intel Data Center Manager GUI (DCM-GUI.png)]

 

The light blue band represents the focus of the plot, which can be changed with a simple mouse click. The current focus in the figure is the whole rack, so the plot shows the aggregate power for all four servers in the rack. If the high-priority sub-group were selected, the plot would show the power consumed by the two servers in that sub-group. Finally, if a single server were selected, the plot would show the power for that server only.
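The focus behavior amounts to aggregating readings over a tree of rack, sub-groups, and servers. The sketch below models that hierarchy; `PowerNode`, the server names, and the wattages are hypothetical illustrations, not DCM's data model or the readings in the screenshot.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PowerNode:
    """One selectable focus in the plot: a rack, a sub-group, or a single server."""
    name: str
    watts: float = 0.0  # latest reading; used only for leaf servers
    children: list[PowerNode] = field(default_factory=list)

    def power(self) -> float:
        """Aggregate power for this focus: a server's own reading, else the sum of its children."""
        if not self.children:
            return self.watts
        return sum(child.power() for child in self.children)

    def find(self, name: str) -> PowerNode | None:
        """Depth-first search for the node the user clicked on."""
        if self.name == name:
            return self
        for child in self.children:
            match = child.find(name)
            if match is not None:
                return match
        return None


# Hypothetical server names and readings, for illustration only.
rack = PowerNode("rack", children=[
    PowerNode("high-priority", children=[PowerNode("server-a", 330), PowerNode("server-b", 320)]),
    PowerNode("low-priority", children=[PowerNode("server-c", 310), PowerNode("server-d", 330)]),
])

for focus in ("rack", "high-priority", "server-a"):
    print(focus, rack.find(focus).power())
```

Changing the focus only changes which subtree is summed; the underlying per-server samples stay the same.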

     

There are four lines in the graph. The top line is the nameplate power, which represents an upper bound on the servers' power consumption. For this group of servers the nameplate power is 2600 watts; since the servers are identical, each is rated at 2600 / 4 = 650 watts.

The next line down is the derated power. Most servers do not have every memory slot or every hard drive tray populated. The derated power is the data center operator's estimate of the upper bound on power consumption, based on the server's actual configuration. It is still a conservative estimate, considerably higher than the server's actual power consumption; as a rule of thumb, it is about 70% of the nameplate. Here the derated power has been set at 1820 watts for the rack, or 455 watts per server.
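The nameplate and derated figures follow from simple arithmetic, reproduced below. The 70% factor is the rule of thumb quoted above; in practice the operator would set the derated value from the actual configuration rather than a fixed percentage. The helper `derated_watts` is a hypothetical name.

```python
SERVERS = 4
NAMEPLATE_PER_SERVER_W = 650  # 2600 W / 4 identical servers
DERATE_PCT = 70               # rule of thumb: derated is ~70% of nameplate


def derated_watts(nameplate_w: int, derate_pct: int = DERATE_PCT) -> int:
    """Operator's derated upper bound; integer arithmetic keeps the results exact."""
    return nameplate_w * derate_pct // 100


group_nameplate = SERVERS * NAMEPLATE_PER_SERVER_W
print("nameplate per server:", NAMEPLATE_PER_SERVER_W, "W; group:", group_nameplate, "W")
print("derated per server:", derated_watts(NAMEPLATE_PER_SERVER_W), "W; group:",
      derated_watts(group_nameplate), "W")
```

Derating per server and derating the group total give the same result here (4 x 455 W = 1820 W), because the servers are identical.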

     

Finally, the gold line represents the actual power consumed by the servers. The dots are successive samples read from the instrumented power supplies.

     

The servers are running at full load under the SPECpower benchmark, and the rack is collectively consuming a little less than 1300 watts. At approximately 16:12, a policy is introduced to constrain power consumption to 1200 watts. DCM instructs the individual nodes to reduce power consumption by lowering the Node Manager set point in each node until the collective power consumption reaches the target.

Once Data Center Manager has been instructed to hold a power cap for the group rack (2), it works to maintain power at that level in spite of unavoidable disturbances in the system.
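The post does not describe DCM's internal control algorithm, so the sketch below is a deliberately simple stand-in: a step-down loop that trims every node's set point by a fixed percentage until a simulated group reading falls under the target. The function name, the 5% step, the 200 W floor, and the per-node demands are all hypothetical.

```python
from __future__ import annotations


def enforce_group_cap(demands_w: list[float], target_w: float, floor_w: float,
                      step: float = 0.05, max_iters: int = 100) -> list[float]:
    """Lower per-node set points in small steps until group power <= target_w.

    demands_w: what each node would draw with no cap (simulated).
    floor_w:   lowest set point a node is assumed to honor.
    Returns the final per-node set points (all at floor_w if the target is unreachable).
    """
    set_points = list(demands_w)  # start effectively uncapped
    for _ in range(max_iters):
        # Simulated measurement: each node draws its demand, limited by its set point.
        group_w = sum(min(d, s) for d, s in zip(demands_w, set_points))
        if group_w <= target_w:
            break
        # Over budget: trim every set point by `step`, never below the floor.
        set_points = [max(floor_w, s * (1 - step)) for s in set_points]
    return set_points


demands = [325, 325, 320, 320]  # hypothetical full-load draws, ~1290 W total
caps = enforce_group_cap(demands, target_w=1200, floor_w=200)
print([round(c) for c in caps], round(sum(min(d, c) for d, c in zip(demands, caps))))
```

Because the steps are coarse, this loop settles somewhat below 1200 W; a real controller would use finer or proportional adjustments and keep running to track disturbances like the ones described next.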

 

The source of a disturbance can be internal or external. An internal disturbance can be the server fans switching to a different speed, causing a power spike or dip; workloads also go up and down, with a corresponding rise or dip in that server's power consumption. An external disturbance could be a change in the feed voltage or an operator action. In fact, at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07, down to idle.

 

 

 

Note that it takes a few seconds for Data Center Manager to react and bring the group to the target power level. Likewise, when the bottom server is brought to idle, the group's power consumption drops as well; however, group power returns to the target after a couple of minutes. The plot of the individual servers shows Data Center Manager at work maintaining the target power.

[Figure: power of the individual servers (Combined Power.png)]

The figure above captures the behavior of the individual servers. Note how DCM allocates power to individual nodes while maintaining a global power cap. When the bottom server is suddenly idled, there is a temporary dip in the group's power consumption, but it soon recovers to the capped target level. Also note that the power not used by the bottom server is reallocated to the remaining three nodes, until they approach their previously unconstrained level.
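The reallocation behavior can be reproduced with water-filling (max-min fair) allocation, a standard technique; whether DCM uses this exact policy is not stated in the post, and priority sub-groups are not modeled here. In the sketch below, the function name and the ~150 W idle draw are hypothetical.

```python
from __future__ import annotations


def allocate(budget_w: float, demands_w: list[float]) -> list[float]:
    """Split a group budget so no node gets more than it can use (water-filling).

    Nodes whose demand is at or below the current fair share get exactly their
    demand; whatever they leave unused is re-split among the remaining nodes.
    """
    alloc = [0.0] * len(demands_w)
    remaining = budget_w
    pending = list(range(len(demands_w)))
    while pending:
        share = remaining / len(pending)
        satisfied = {i for i in pending if demands_w[i] <= share}
        if not satisfied:
            # Everyone left wants more than the fair share: cap them all at it.
            for i in pending:
                alloc[i] = share
            break
        for i in satisfied:
            alloc[i] = demands_w[i]
            remaining -= demands_w[i]
        pending = [i for i in pending if i not in satisfied]
    return alloc


# Hypothetical draws: four busy nodes, then the bottom node idled at ~150 W.
print(allocate(1200, [325, 325, 320, 320]))
print(allocate(1200, [325, 325, 320, 150]))
```

With all four nodes busy, each gets an equal 300 W share; once the bottom node idles, the other three are allowed back up to their full demand, mirroring the recovery seen in the figure.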


Let’s face it: it’s getting harder to measure server density in rack units, and measuring by compute threads per rack isn’t getting any easier as core and thread counts increase year over year. I still remember, from 12 years ago, when Intel was acquiring companies that were really good at piecing together single-core multiprocessor systems; those systems were so large that they were literally hung from engine hoists (for demo purposes). I believe they had eight Intel Pentium Pro processors and 128MB of RAM. In comparison, today’s netbooks have more than 4 times that amount of memory in a base configuration.

Modern server microarchitectures have grown so much in transistor count alone that it’s hard to grasp the exponential growth in system complexity. Power must still be consumed, but the same amount of power can now be distributed across several cores and platforms. That is more power efficient, but it also adds complexity as the number of nodes increases. Still, having more nodes doesn’t mean you can’t manage their efficiency.

David Ott (from the Intel Software Services Group) presents many of the provisioning/power/manageability problems at hand in the video below (5m16s), and explains how Intel is providing the 'touch points' to manage server platforms:

http://software.intel.com/media/videos/2/1/8/a/0/a/e/218a0aefd1d1a4be65601cc6ddc1520e_player.jpg

 

With the upcoming Intel Xeon 5500 Series Processors, not only do you get a high-performing platform but, in Intel fashion, a more power-efficient one. Power usage can be self-throttled per node via managed P-states, or managed through policies by group, time, and so on. Managing servers isn’t new, but the way Intel is doing it is a huge leap ahead in manageability at the node level.

 

So I ask:

  • What manageability tools are you using for your enterprise servers today?
  • Is Intel Node Manager on your (or your OEM's) roadmap to gather information on a ‘per server’ basis?
  • Would more discrete information enable you to run your datacenter more efficiently?
  • What manageability items do you struggle with in your own datacenter, and what would you like to see in future platforms?

 

If power manageability is new to you, I highly suggest you check out Intel Dynamic Power Datacenter Manager, and if you're running a Linux-based server, please check out http://www.lesswatts.org to make sure you have the latest ACPI-compliant kernel.
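On Linux, the processor P-states exposed through ACPI are typically surfaced by the kernel's cpufreq subsystem under sysfs. As a small, read-only look at the OS-visible side of P-state management (Node Manager itself works independently of the operating system), the sketch below reads the standard cpufreq attributes for one CPU. The helper name `read_cpufreq` is hypothetical, and which attributes exist depends on the kernel and the cpufreq driver in use.

```python
from __future__ import annotations

from pathlib import Path

# Standard cpufreq attributes; which ones exist depends on the kernel and driver.
FIELDS = (
    "scaling_driver",
    "scaling_governor",
    "scaling_cur_freq",   # kHz
    "scaling_min_freq",   # kHz
    "scaling_max_freq",   # kHz
    "cpuinfo_min_freq",   # kHz, hardware lower limit
    "cpuinfo_max_freq",   # kHz, hardware upper limit
)


def read_cpufreq(cpu: int = 0, root: str = "/sys/devices/system/cpu") -> dict[str, str]:
    """Return whichever cpufreq attributes exist for one CPU, as raw strings."""
    base = Path(root) / f"cpu{cpu}" / "cpufreq"
    info: dict[str, str] = {}
    for name in FIELDS:
        path = base / name
        if path.is_file():  # skip attributes this kernel/driver does not provide
            info[name] = path.read_text().strip()
    return info
```

On a Linux host, `read_cpufreq()` returns a dictionary such as the current governor and frequency limits for CPU 0; on a system without cpufreq it simply returns an empty dictionary. The `root` parameter exists only so the function can be pointed at a copy of the sysfs tree.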

 

And as a fun exit, here’s a video we shot in one of our labs, further strengthening the case for virtualization (and, more importantly, for virtualized networks!).


So, are you among the approximately 40% of data center managers who are projected to run out of power or cooling capacity in the next 12-24 months [1] and need new options to deal with ever-increasing demand for compute capacity? In my discussions with IT professionals, it’s clear that a “business as usual” approach to the design and operation of the data center is no longer sufficient.

In the coming weeks, you will see a number of bloggers write about using Intel Xeon Processor 5500 (Nehalem) servers to refresh the data center (a concept first discussed on this site back in late 2007) to make more efficient use of limited power, cooling and floor space resources. Today, I want to touch on another way of addressing these issues: using instrumentation as a source of data and controls to better monitor and manage the data center.

Individual pieces of the data and control picture have steadily entered the mainstream through instrumentation of individual server components. Think processors that allow power and frequency to be modulated, power supplies that report system-level power consumption, memory that reports its temperature, and fans that can scale RPM and power to the actual airflow requirements. These are really cool capabilities, but these somewhat fragmented sources of data and control don’t provide the ability to manage at the rack or data center level. The challenge at hand is to take all of these individual points of component instrumentation and develop system- and data-center-level capabilities (what I call extended instrumentation) that provide the unique and innovative tools data center managers need.

One of the more exciting extended instrumentation capabilities to evolve is power capping. Power limits, or caps, are defined and communicated by console management software and enforced by system-level functionality, making it possible to limit system power dynamically. Applications of power capping range from increasing performance density, to temporarily shedding compute load to ride through power or thermal events in the data center, to enabling power-based dynamic resource balancing. Power capping gives IT managers a tool to squeeze additional compute performance out of their existing data center, making more efficient use of limited and valuable power, cooling and floor space resources to lower costs, improve availability and extend the life of the current data center.
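One way to see how capping raises performance density: without an enforced limit, an operator has to budget each server at a conservative upper bound such as nameplate or derated power, whereas with a cap enforced by the platform the per-server budget can be set at the cap. The sketch below compares the three; the 650 W and 455 W values are the nameplate and derated figures from the Data Center Manager example on this page, while the 8000 W rack budget, the 325 W cap, and the helper name are hypothetical. Capped servers may trade away some per-server performance.

```python
RACK_BUDGET_W = 8000  # hypothetical rack power budget


def servers_per_rack(rack_budget_w: int, provisioned_w: int) -> int:
    """Servers that fit if each is budgeted at provisioned_w (whole servers only)."""
    return rack_budget_w // provisioned_w


# Per-server provisioning levels: nameplate and ~70% derated values from the
# Data Center Manager example, plus a hypothetical enforced cap.
levels = {"nameplate": 650, "derated": 455, "capped": 325}
for label, watts in levels.items():
    print(f"{label:>9}: {watts} W/server -> {servers_per_rack(RACK_BUDGET_W, watts)} servers")
```

Under these assumptions, the same rack budget holds 12 servers at nameplate, 17 at the derated value, and 24 at the enforced cap.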

Are you evaluating this capability? Are you using it already? I’m interested in discussing your thoughts on instrumentation and power capping.

1. http://www.infoworld.com/article/08/03/26/Datacenters-heading-for-cash-crunch_1.html

