
The Server Room Blog

6 Posts tagged with the power_management tag

          

In each of the last three years, Rich Uhlig, I, and the rest of our colleagues at Intel focused on virtualization technologies have had the enviable task of participating in two of the technology industry's biggest events. It is always a pleasure to stretch one's abilities, work longer hours than you ever thought yourself capable of, work on great product introductions, develop new business models and help to redefine an industry while using these events to make your announcements. This week VMware's VMworld was held in San Francisco with over 11,000 participants focused on virtualization technology. Intel VP and GM Doug Fisher delivered a keynote on "Transforming Flexible Computing", which nicely communicated the message that Rich delivers in the attached video on the Intel Channel on YouTube. We also announced the support of VMware View and Intel vPro technology with VMware's Jocelyn Goldfein. This culminates over two years' worth of work for our engineering and development teams on bringing together two of the virtualization industry's leading platforms.

This announcement is the beginning of an era of Virtualization Flexibility. Each day we are seeing new usage models emerge, with virtualization finding new ways to allow users more flexibility in the Data Center, on the handheld and with their desktop form factors. As we approach IDF 2009, both Rich and I will be hosting courses on these emerging models and architectural directions. Rich will be hosting a course on architecture, while I have the pleasure of hosting a panel with Simon Crosby, Mike Neil, Ed Bugnion, Lew Tucker and Orran Krieger. It is quite a lineup. In addition, one of our colleagues, Charlton Barreto, has some breakthrough new usage models to demonstrate that we believe are outstanding. All of these will be available in the IDF Virtualization community for the 3rd year in a row. I personally feel very fortunate to have the opportunity to work with such interesting and talented individuals every day. The conferences provide an opportunity for us to share our enthusiasm for technology, our enthusiasm for innovation and our commitment to excellence with the rest of the world. The feedback has been great, and we need it to continue to innovate.

         Come see us, tell us and push us to build technology that delivers value in the way you work, live and play. It is a challenge we embrace and we are thankful we have the opportunity to take action.

See you at IDF!


The Intel(r) Dynamic Power Node Manager technology allows setting a power consumption target for a server under load as described in a previous article.  This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.

 

Higher level software can use this capability to implement sophisticated power management schemes, especially schemes that involve server groups.  The range of control authority for servers in the Nehalem generation is significant.  The power consumption of a fully loaded server consuming 300 watts can be rolled back by roughly 100 watts.  In virtualized utility computing environments additional control authority is possible by migrating the virtual machines out of a host and consolidating them into fewer hosts.  The power consumption of the power-capped host, now at 200 watts, can be brought down by another 50 watts, to 150 watts.

 

 

The reader might ask about the possibility of constantly running servers in capped mode to save energy.  Unfortunately capping entails a performance tradeoff.  The dynamic is not unlike driving an automobile.  The best mileage is obtained by running the vehicle at a constant speed of 35 MPH.  This is not practical on a freeway where the prevailing speed is 60 MPH.  The vehicle could be rear-ended or, as a more mundane motivation, the driver runs the vehicle at 60 MPH because she wants to get there sooner.  As with a server, the lowest fuel consumption in a running vehicle, at least in gallons per hour, is attained when the vehicle is idling.  No real work is done with an idling engine, but at least the vehicle can start moving in no time.  Continuing with the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.

     

This document provides an example of the performance tradeoff with power capping.  Please see Figure 2 on page 5.

 

The following example illustrates how group power capping works.  The plot is a screen capture of the Intel(r) Data Center Manager software managing the power consumption in a cluster of four servers.  The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

 

DCM-GUI.png

 

The light blue band represents the focus of the plot. The focus can be changed with a simple mouse click.  The current focus in the figure is the whole rack.  Hence the power plot is the aggregated power for all four servers in a rack.  If the high priority sub-group were selected, then the power shown would be the power consumed by the two servers in that sub-group.  Finally, if a single server is selected, then the power indicated would be the power for that server only.

     

There are four lines represented in the graph.  The top line is the nameplate power.  It represents an upper bound for the servers' power consumption.  For this particular group of servers the nameplate power is 2600 watts.  The servers are identical, and hence each is rated at 2600 / 4 = 650 watts.

The next line down is the derated power.  Most servers will not have every memory slot or every hard drive tray populated. The derated power is the data center operator's guess about the upper bound for power consumption based on the actual configuration of the server.  The derated power is still a conservative guess, considerably higher than the actual power consumption of the server. As a rule of thumb, it is ~70% of the nameplate. The derated power has been set at 1820 watts for the rack, or 455 watts per server.

     

Finally, the gold line represents the actual power consumed by the server.  The dots represent successive samples taken from readings from the instrumented power supplies. 
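As a quick check of the arithmetic above, the following minimal sketch recomputes the nameplate and derated figures for the example rack. The 70% factor is the rule of thumb quoted in the text; the function and variable names are only illustrative.

```python
def derated_power(nameplate_watts: float, derating_factor: float = 0.70) -> float:
    """Estimate a derated power budget from a nameplate rating.

    The default factor is the ~70% rule of thumb from the text; a real
    operator would pick a factor based on the actual server configuration.
    """
    return nameplate_watts * derating_factor


rack_nameplate = 2600.0  # watts, nameplate power for the four-server rack
servers_in_rack = 4

per_server_nameplate = rack_nameplate / servers_in_rack
rack_derated = derated_power(rack_nameplate)
per_server_derated = rack_derated / servers_in_rack

print(f"nameplate per server: {per_server_nameplate:.0f} W")
print(f"derated rack budget:  {rack_derated:.0f} W")
print(f"derated per server:   {per_server_derated:.0f} W")
```

The actual consumption of a little under 1300 watts sits well below even the derated figure, which is the headroom that measured power data makes visible.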

     

The servers are running at full power using the SPECpower benchmark.  The rack is collectively consuming a little less than 1300 watts.  At approximately 16:12 a policy is introduced to constrain power consumption to 1200 watts.  DCM instructs individual nodes to reduce power consumption by lowering the set points for Node Manager in each node until the collective power consumption reaches the desired target.

When we instruct Data Center Manager to hold a power cap for the group rack (2), it makes an effort to maintain power at that level in spite of unavoidable disturbances in the system.

 

The source of the disturbances can be internal or external.  An internal disturbance can be the server fans switching to a different speed, causing a power spike or dip.  Workloads in servers go up and down, with a corresponding uptick or dip in the power consumption for that server.  An external disturbance could be a change in the feed voltage or an operator action.  In fact at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07, down to idle.

 

 

 

Note that it takes a few seconds for Data Center Manager to react and to reach the original power level.  Likewise, when the bottom server was brought to idle, it also pulled back the power consumption for the group.  However, the group power went back to the target power consumption after a couple of minutes.  If we look at the plot of the individual servers, we can see Data Center Manager at work maintaining the target power.

Combined Power.png

The figure above captures the behaviors of the individual servers.  Note how DCM allocates power to individual nodes yet maintains a global power cap. When the server at the bottom is suddenly idled, there is a temporary dip in power consumption for the group, but it soon recovers to the target capped level.  Also note that the power not used by the bottom server is reallocated to the remaining three nodes until they get close to the previously unconstrained level.
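The reallocation behavior described above can be sketched as a simple budget-splitting routine. This is only an illustration under stated assumptions: the text does not say how DCM actually divides a group cap, so the fair-share rule, the node names and the per-node wattage figures below are all hypothetical.

```python
def allocate_group_cap(demands, group_cap):
    """Split group_cap (watts) among nodes without exceeding any node's demand.

    demands maps a node name to the power it would draw if uncapped.
    Hypothetical policy: nodes are visited from smallest to largest demand,
    and each gets the smaller of its demand and an equal share of what is
    left, so budget unused by lightly loaded nodes flows to busier ones.
    """
    allocation = {}
    remaining_cap = float(group_cap)
    pending = sorted(demands.items(), key=lambda item: item[1])
    while pending:
        fair_share = remaining_cap / len(pending)
        name, demand = pending.pop(0)
        granted = min(demand, fair_share)
        allocation[name] = granted
        remaining_cap -= granted
    return allocation


# Four busy nodes (assumed 325 W each uncapped) under a 1200 W group cap.
busy = {"node1": 325, "node2": 325, "node3": 325, "node4": 325}
before = allocate_group_cap(busy, 1200)

# The same group after node4 drops to an assumed idle draw of 250 W.
one_idle = dict(busy, node4=250)
after = allocate_group_cap(one_idle, 1200)

for name in sorted(busy):
    print(f"{name}: {before[name]:.1f} W -> {after[name]:.1f} W")
```

With these assumed numbers every node is first held to 300 watts; once node4 idles, the budget it no longer needs is spread across the other three, which move back toward their uncapped draw while the group total stays at the cap.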


The recently introduced Intel® Xeon® 5500 Series Processor, formerly code named Nehalem, brings a number of power management features that improve on the energy efficiency of previous generations, such as a more aggressive implementation of power proportional computing.  Depending on the server design, users of Nehalem-based servers can expect idle power consumption that is about half of the power consumed at full load, down from about two thirds in the previous generation.

A less heralded capability for this new generation of servers is that users can actually adjust the server power consumption and therefore trade off power consumption against performance.  This capability is known as power capping. The power capping range is not insignificant.  For a dual socket server consuming about 300 watts at full load, the capping range is on the order of 100 watts; that is, for a fully loaded server consuming 300 watts, power consumption can be ratcheted down to about 200 watts.  The actual numbers depend on the server implementation.

The application of this mechanism for servers deployed in a data center leads to some energy savings.  However, perhaps the most valuable aspect of this technology is the operational flexibility it confers to data center operators.

This value comes from two capabilities:  First, power capping brings predictable power consumption within the specified power capping range, and second, servers implementing power capping offer actual power readouts as a bonus: their power supplies are PMBus(tm) enabled and their historical power consumption can be retrieved through standard APIs.

With actual historical power data, it is possible to optimize the loading of power limited racks, whereas before the most accurate estimation of power consumption came from derated nameplate data.  The nameplate estimation for power consumption is a static measure that requires a considerable safety margin.  This conservative approach to power sizing leads to overprovisioning of power.  This was OK in those times when energy costs were a second order consideration.  That is not the case anymore.

This technology allows dialing the power to be consumed by groups of over  a thousand servers, allowing a power control authority of tens of thousands of watts in data centers.  How does power capping work?  The technology implements power control by taking advantage of the CPU voltage and frequency scaling implemented by the Nehalem architecture.  The CPUs are one of the most power consuming components in a server.  If we can regulate the power consumed by the CPUs we can have an effect on the power consumed by the whole server.  Furthermore, if we can control the power consumed by the thousands of servers in a data center, we'll be able to alter the power consumed in that data center.

Power control for groups of servers is attained by composing the power control capabilities of each server.  Likewise, power control for a server is attained by composing CPU power control as illustrated in the figure below.  We will explain each of the constructs in the rest of this article.

hierarchy.png

Conceptually, power control for thousands of servers in a data center is implemented through a coordinated set of nested mechanisms.

The lowest level is implemented through frequency and voltage scaling: laws of physics dictate that for a given architecture, power consumption is proportional to the CPU's frequency and to the square of the voltage used to power the CPU.  There are mechanisms built into the CPU architecture that allow a certain number of discrete combinations of voltage and frequency.  Using the ACPI standard nomenclature, these discrete combinations are called P-states. The highest performing state is nominally identified as P0, and the lower power consumption states are identified as P1, P2 and so on.  A Nehalem CPU supports about ten states, the actual number depending on the processor model.  For the sake of an example, a CPU in P0 may have been assigned a voltage of 1.4 volts and 3.6 GHz, at which point it draws about 100 watts.  As the CPU transitions to lower power states, it may have a state P4 using 1.2 volts running at 2.8 GHz and consuming about 70 watts.
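To make the proportionality concrete, here is a minimal sketch that applies the frequency times voltage-squared relation to the example figures above. It models only the dynamic (switching) component of CPU power, and the function name is illustrative.

```python
def scaled_dynamic_power(ref_watts, ref_volts, ref_ghz, new_volts, new_ghz):
    """Scale a reference power figure using P proportional to f * V^2.

    Only the dynamic component of CPU power follows this relation; static
    (leakage) power is deliberately ignored in this sketch.
    """
    return ref_watts * (new_ghz / ref_ghz) * (new_volts / ref_volts) ** 2


# Reference point: P0 at 1.4 V and 3.6 GHz drawing about 100 W.
# Target point:    P4 at 1.2 V and 2.8 GHz.
p4_dynamic = scaled_dynamic_power(100.0, 1.4, 3.6, 1.2, 2.8)
print(f"P4 estimate from pure f*V^2 scaling: {p4_dynamic:.1f} W")
```

Pure scaling gives roughly 57 watts for P4, somewhat below the "about 70 watts" quoted in the text. The example figures are approximate, and a real processor also draws static power that does not shrink with frequency, so the simple relation should be read as a trend rather than a precise predictor.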

The P-states by themselves can't control the power consumed by a server.  The CPU itself has no mechanism to measure the power it consumes.  This mechanism is implemented by firmware running in the Nehalem chipset. This firmware implements the Intel(r) Dynamic Node Power Management technology, or Node Manager for short.  If what we want is to measure the power consumed by a server, looking only at CPU consumption does not provide the whole picture.  For this purpose, the power supplies in Node Manager-enabled servers provide actual power readings for the whole server.  It is now possible to establish a classic control feedback loop where we compare a target power against the actual power indicated by the power supplies.  The Node Manager code manipulates the P-states up or down until the desired target power is reached.  If the desired power lies between two P-states, the Node Manager code rapidly switches between the two states until the average power consumption meets the set power.  This is an implementation of another classic control scheme, affectionately called bang-bang control for obvious reasons.
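The switching scheme just described can be illustrated with a small simulation. This is a sketch under stated assumptions, not the Node Manager firmware: the two power levels reuse the illustrative 100 W and 70 W P-state figures from earlier, the 85 W target is made up, and a real controller would act on power supply readings for the whole server.

```python
def high_state_duty_cycle(target_watts, high_watts, low_watts):
    """Fraction of time to spend in the higher-power state so that the
    time-averaged draw equals target_watts."""
    if not low_watts < high_watts:
        raise ValueError("high_watts must exceed low_watts")
    if not low_watts <= target_watts <= high_watts:
        raise ValueError("target must lie between the two power levels")
    return (target_watts - low_watts) / (high_watts - low_watts)


def simulate_bang_bang(target_watts, high_watts, low_watts, steps=1000):
    """Toggle between two power levels based on the running average and
    return the average power over the whole run."""
    total = 0.0
    for step in range(steps):
        running_average = total / step if step else low_watts
        # Bang-bang rule: jump to the high state when the average is under
        # the target, otherwise jump to the low state. No intermediate level.
        draw = high_watts if running_average < target_watts else low_watts
        total += draw
    return total / steps


duty = high_state_duty_cycle(85.0, 100.0, 70.0)
average = simulate_bang_bang(85.0, 100.0, 70.0)
print(f"time in high state: {duty:.0%}, simulated average: {average:.1f} W")
```

The controller never settles on an intermediate level; it reaches the target only on average, and the duty cycle function shows how much time each state receives for a given set point.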

NM.png

From a data center perspective, regulating power consumption of just a single server is not an interesting capability.  We need the means to control servers as a group, and just as we were able to obtain power supply readouts for one server, we need to monitor the power for the group of servers to allow meeting a global power target for that group of servers.  This function is provided by a software development kit (SDK), the Intel(r) Data Center Manager or Intel DCM for short. Notice that DCM implements a feedback control mechanism very similar to the mechanism that regulates power consumption for a single server, but at a much larger scale.  Instead of watching one or two power supplies, DCM oversees the power consumption of multiple servers or "nodes", whose number can range up to thousands.

 

dcm.png

 

Intel DCM was purposely architected as an SDK, a building block for industry players to build more sophisticated and valuable capabilities for the benefit of data center operators.  One possible application is shown below, where Intel DCM has been integrated into a Building Management System (BMS) application.  Some Node Manager-enabled servers come with inlet temperature sensors.  This allows the BMS application to monitor the inlet temperature of a group of servers, and if the temperature rises above a certain threshold, it can take a number of measures, from throttling back the power consumed to reduce the thermal stress on that particular area of the data center to alerting system operators.  The BMS can also coordinate the power consumed by the server equipment, for instance with the CRAC fan speeds.

 

DataCenter.png

With this discussion we have barely begun to scratch the  surface of the capabilities from the family of technologies implementing power management.  In subsequent notes we'll dig deeper into each of the components and explore how they are implemented, how these technologies can be extended and the extensive range of uses for which they can be applied.

 


Datacenter Dynamic Power Management – Intelligent Power Management on Intel Xeon® 5500

The newly released Intel Xeon® 5500 processor family comes with a new breed of datacenter power management technology: Intel® Intelligent Power Node Manager (Node Manager for short).

As a former datacenter engineering manager, I had personal experience of the management issues at datacenters, especially dealing with power allocations and cooling. We often assumed the worst case scenario, as we could not predict when the server power consumption would peak. When it did peak, we had no way to control it. It was like driving blindfolded and hoping for the best outcome. The safest bet was to make the road as wide as possible, leaving enough headroom in the power budget so that we would not run into power issues. But it resulted in underutilized power, or stranded power, which is quite a waste.

Over the course of the last several years, we met with many IPDC (internet portal datacenter) companies. We heard over and over again about their datacenter power management challenges, which were even worse than those I had experienced. Many of the IPDC companies we talked with leased racks from datacenter service providers under strict power limits per rack. The number of servers they could fit per rack had a direct impact on their bottom line. They did not want to under-populate the racks, as they would have to pay more rent for the same number of servers; they could not over-populate the racks, as that would exceed the power limits. Their power management issues can best be summarized as follows:

·        Over-allocation of power: Power allocation to servers does not match actual server power consumption. Power is typically allocated for worst case scenario based on server nameplate. Static allocation of power budget based on worst case scenario leads to inefficiencies and does not maximize use of available power capacity and rack space.

·        Under-population of rack space: As a direct result of the over-allocation problem, there is a lot of empty space on racks. When the business needs more compute capacity, they have to pay more for additional racks. There is not enough datacenter space for them to rent. As a result, they have had to go to other cities or even other countries, which increases operational cost and support staff.

·        No capacity planning: There is no effective means to forecast and optimize power and performance dynamically at rack level. To improve power utilization, datacenters need to track actual power and cooling consumption and dynamically adjust workload and power distribution for optimal performance at rack and datacenter levels.

This is where Node Manager comes into play. Let’s take a look at what Node Manager and Intel® Data Center Manager (DCM), its companion software tool provided by Intel for rack and group level power management, will do:

Intel® Intelligent Power Node Manager (Node Manager)

Node Manager is an out-of-band (OOB) power management policy engine embedded in Intel server chipsets. Processors carry the capability to regulate their power consumption through the manipulation of the P- and T-states. Node Manager works with the BIOS and OS power management (OSPM) to perform this manipulation and dynamically adjust platform power to balance performance and power for a single node. Node Manager has the following features:

·        Dynamic Power Monitoring: Measures actual power consumption of a server platform within an acceptable error margin of +/- 10%. Node Manager gathers information from PSMI instrumented power supplies, provides real-time power consumption data singly or as a time series, and reports through the IPMI interface.

·        Platform Power Capping: Sets platform power to a targeted power budget while maintaining maximum performance for the given power level. Node Manager receives power policy from an external management console through the IPMI interface and maintains power at the targeted level by dynamically adjusting CPU P-states.

·        Power Threshold Alerting: Node Manager monitors platform power against the targeted power budget. When the target power budget cannot be maintained, Node Manager sends out alerts to the management console.

Intel® Data Center Manager (DCM)

DCM is a software technology that provides power and thermal monitoring and management for servers, racks and groups of servers in datacenters. It builds on Node Manager and customers' existing management consoles to bring platform power efficiency to end users. DCM implements group level policies that aggregate node data across the entire rack or data center to track metrics and historical data and to provide alerts to IT managers. This allows IT managers to establish group level power policies that limit consumption dynamically. DCM allows data centers to increase rack density, manage power peaks, and right-size the power and cooling infrastructure. It is a software development kit (SDK) designed to plug in to software management console products. It also has a reference user interface, which was used in this POC as a proxy for a management software product. Key DCM features are:

·        Group (server, rack, row, PDU and logical group) level monitoring and aggregation of power and thermals

·        Log and query for trend data for up to one year

·        Policy driven intelligent group power capping

·        User defined group level power alerts and notifications

·        Support of distributed architectures (across multiple racks)

What will the combination of DCM and Node Manager do for datacenter power management? Here is the magic part… With DCM setting policies at group and rack level, Node Manager can dynamically report the power consumed by a server and adjust it within a certain range, so that the overall power consumption of the rack or a particular server group can be managed within a given target. Why is this important? Let me use a real example to explain it:

IPDC Company XYZ (a name I cannot disclose in public) has a mission critical workload at their datacenter that runs 24x7, with workload fluctuations during the day. The CPU utilization is mostly at 50~60%, with a few cases where it jumps to 100%, which is typical for datacenter operations. To be on the safe side, the current solution is to pre-qualify the Xeon® 5400 server for the worst case at 100% CPU utilization, which ran at ~300W. They used 300W for power allocation, which was considered significantly lower than the nameplate value of the power supply (650W).

With Xeon® 5500, for the same workload at 100% throughput, the platform power consumption goes down to 230W, a 70W reduction from the previous generation CPU. That is a good reason to switch to the new platform, thanks to the advanced intelligent power optimization features on Xeon® 5500. But the story does not end there…

On top of that, we further analyzed the effect of power capping using Node Manager and DCM. After many tests, we noticed that if we cap at 170W, the performance impact for the workload at 60% CPU utilization and below is almost negligible. This means that with 170W power capping, the platform can deliver the same level of service most of the time with 60W less (230W-170W) power consumption. For the occasional spike above 60% CPU utilization, there will be some performance impact. However, since Company XYZ operates below 60% CPU utilization most of the time, the performance impact is tolerable. As a result, we can squeeze more out of the power allocation using the dynamic power management features of Node Manager and DCM.

What does this mean for Company XYZ? Well, we can do the math. The rack they lease today has a limit of 2,200W per rack. With the current Xeon® 5400 servers, they can put up to 7 servers per rack at 300W per server. With Xeon® 5500, they can safely put 9 servers at 230W per server, a 28% increase in server density on the rack. On top of that, by using Node Manager and DCM to manage the power at rack level with a power limit of 2,200W and dynamically adjust the power allocation among the servers, we can put at least 12 servers on the rack at an average power allocation of 170W per server, a 71% increase in server density compared with the situation today! This means great savings for Company XYZ. In this case, the power consumption of each server on the rack could go above or below 170W. DCM dynamically adjusts the power capping policy while holding the line for the entire rack's power consumption below 2,200W.
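The rack math above is easy to reproduce. The following minimal sketch uses only the figures quoted in the text (a 2,200W rack limit and per-server allocations of 300W, 230W and 170W); the function and variable names are illustrative.

```python
def servers_per_rack(rack_limit_watts: int, watts_per_server: int) -> int:
    """Number of whole servers that fit under the rack power limit."""
    return rack_limit_watts // watts_per_server


def density_gain_percent(new_count: int, old_count: int) -> float:
    """Percentage increase in servers per rack relative to old_count."""
    return 100.0 * (new_count - old_count) / old_count


rack_limit = 2200  # watts per leased rack

xeon_5400 = servers_per_rack(rack_limit, 300)         # worst-case allocation
xeon_5500 = servers_per_rack(rack_limit, 230)         # uncapped worst case
xeon_5500_capped = servers_per_rack(rack_limit, 170)  # average capped allocation

print(f"Xeon 5400:           {xeon_5400} servers")
print(f"Xeon 5500:           {xeon_5500} servers "
      f"(+{density_gain_percent(xeon_5500, xeon_5400):.1f}%)")
print(f"Xeon 5500 with caps: {xeon_5500_capped} servers "
      f"(+{density_gain_percent(xeon_5500_capped, xeon_5400):.1f}%)")
```

Note that the 170W figure is an average allocation: twelve servers at 170W account for 2,040W, leaving some room under the rack limit for DCM to shift budget toward whichever servers are busiest.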

Of course, the power management result varies from workload to workload. There has to be workload-based optimization in order to achieve the best result. Also, we assume that the datacenter is able to provide sufficient cooling for devices that consume power within the given power limit. Even though the result we got from this test cannot be applied universally to all IPDC customers, we finally have a platform that can dynamically and intelligently monitor and adjust the platform power based on workload. As a datacenter manager, you can manage power at rack level and datacenter level with optimized power allocation to fully utilize the datacenter power. Are you ready to give it a try?


In our previous post we noted that the state of the art in power monitoring for virtualized environments is much less advanced than power monitoring applied to physical systems.  There is a larger historical context, along with economic implications for the planning and operation of data centers, that makes this problem worth exploring.

Let's look at a similar dynamic in a different context: In the region of the globe where I grew up, water used to be so inexpensive that residential use was not metered.  The water company would charge a fixed amount every month and that was it.  Hence, tenants in an apartment would never see a water bill.  The water bill was a predictable cost component in the total cost of the building and included in the rent.  Water was essentially an infinite resource and, reflecting this fact, there were absolutely no incentives in the system for residents to rein in water use.

As the population increased, water became an increasingly precious and expensive resource.  The water company started installing residential water meters, but bowing to tradition, landlords continued to pay the bills, which were still a very small portion of the overall operating costs.  Tenants still had no incentive to save water because they did not see the water bill.

Today there are very few regions in the world where water can be treated as an infinite resource.  The cost of water increased so much faster than other cost components that landlords decided to expose this cost to tenants.  Hence the practice of tenants paying for the specific consumption of the unit they occupy is common today.  Also, because this consumption is exposed at the individual unit level, the historical data can be used as the basis for the implementation of water conservation policies, for instance charging penalty rates for use beyond a certain threshold.

The use of power in the data center has been following a similar trajectory.  For many years the cost of power had been a noise level item in the cost of operating a data center.  It was practical to include the cost of electricity in the facilities bill.  Hence IT managers would never see the energy costs.  This situation is changing as we speak.  See for instance this recent article in Computerworld.

Recent Intel-based server platforms, such as the existing Bensley platform and, more recently, the Nehalem-EP platform to be introduced in March, come with instrumented power supplies that allow the monitoring and control of power use at the individual server level.  This information allows compiling a historical record of actual power use that is much more accurate than the more traditional method of using derated nameplate power.

The historical information is useful for data center planning purposes by delivering a much tighter forecast, beneficial in two ways: by reducing the need to over-specify the power designed into the facility or by maximizing the amount of equipment that can be deployed for a fixed amount of power available.

From an operational perspective we can expect ever more aggressive implementations of power proportional computing in servers where we see large variations between power consumed at idle vs. power consumed at full load.  Ten years ago this variation used to be less than 10 percent.  Today 50 percent is not unusual.  Data center operators can expect wider swings in data center power demand.  Server power management technology provides the means to manage these swings, stay within a data center's power envelope, yet maintain existing service level agreements with customers.

There is still one more complication:  with the steep adoption of virtualization in the data center in the past two years starting with consolidation exercises, an increasing portion of business is being transacted using virtualized resources.  Under this new environment, using a physical host as the locus for billing power may not be sufficient anymore, especially in multi-tenant environments, where the cost centers for virtual machines running in a host may reside in different departments or even in different companies.

It is reasonable to expect that this mode of fine grained power management at the virtual machine level will take root in cloud computing and hosted environments, where resources are typically deployed as virtualized resources.  Fine grained power monitoring and management makes sense in an environment where energy and carbon footprint are major TCO components.  To the extent that energy costs are exposed to users along with the MIPS consumed, this information provides the checks and balances and the data to implement rational policies to manage energy consumption.

Based on the considerations above, we see a maturation process for power management practices in a given facility happening in three stages.

  1. Stage 1: Undifferentiated, one bill for the whole facility.  Power hogs and energy efficient equipment are thrown in the same pile.  Metrics to weed out inefficient equipment are hard to come by.
  2. Stage 2: Power monitoring at the physical host level implemented.  Exposes inefficient equipment.  Many installations are feeling the pain of increasing energy cost, but organizational inertia prevents passing costs to IT operations.  Power monitoring at this level may be too coarse grained, too little, too late for environments that are rapidly transitioning to virtualization with inadequate support for multi-tenancy.
  3. Stage 3: Power monitoring encompasses virtualized environments.  This capability would align power monitoring with the unit of delivery of value to customers.

Given the recent intense focus in the industry around data center power management and the furious pace of the adoption of virtualization, it is remarkable that the subject of power management in virtualized environments has received relatively little attention.

 

It is fair to say that power management technology has not caught up with virtualization.

 

Here are a few thoughts on this particular subject, which I intend to elaborate in subsequent transmittals.

 

For historical reasons the power management technology available today had its inception in the physical world where watts consumed in a server can be traced to the watts that came through the power utility feeds.  Unfortunately, the semantics of power in virtual  machines have yet to be comprehensively defined to industry consensus.

 

For instance, assume that the operating system running  in a virtual image decides to transition the system to the ACPI S3 state, sleep to memory.  What we have now is the state of the virtual image preserved in the image's memory with the virtual CPU turned off.

 

Assuming that the system is not paravirtualized, the operating system can't tell if it's running in a physical or virtual instance. The effect of transitioning to S3 will be purely local to the virtual machine.  If the intent of the system operator was to transition the machine to S3 to save power, it does not work this way.   The virtual machine still draws resources from the host machine and requires hypervisor attention. Transitioning the host itself to S3 may not be practical as there might be other virtual machines still running, not ready to go to sleep.

 

Consolidation is another technology for reducing data center power consumption by driving up the server utilization rates.  Consolidation for power management is a blunt tool, where applications that used to run in a physical server are now virtualized and squished into a single physical host.  The applications are sometimes strange bedfellows.  Profiling might have been done to make sure they could coexist, as an a priori, static exercise with the virtual machine instances treated as black boxes. There is no attempt to look at the workload profiles inside each virtualized instance in real time.  Power savings come from an almost wishful side effect of repackaging applications formerly running in a dedicated server into virtualized instances.

 

A capability to map power to virtual machines, in both directions, from physical to virtual and virtual to physical, would be useful from an operational perspective.  The challenge is twofold: first from a monitoring perspective, because there is no commonly agreed method yet to prorate host power consumption to the virtual instances running within, and second from a control perspective.  It would be useful to schedule or assign power consumption to virtual machines, allowing end users to make a tradeoff between power and performance.  Fine grained power monitoring would allow prorating power costs to application instances, introducing useful pricing checks and balances that encourage energy conservation, instead of the more common method today of hiding energy costs in the facility costs.
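To give a feel for what prorating might look like, here is one naive possibility, offered purely as a sketch. As noted above there is no commonly agreed method, so the rule used here (split the host's idle power evenly, then split the remainder by each virtual machine's share of CPU time) is an assumption, as are the VM names and all the numbers.

```python
def prorate_host_power(host_watts, idle_watts, vm_cpu_seconds):
    """Attribute measured host power to the VMs running on it.

    Hypothetical rule: idle power is shared equally among the VMs, and the
    power above idle is shared in proportion to CPU time consumed.
    vm_cpu_seconds maps a VM name to CPU seconds used in the interval.
    """
    if not vm_cpu_seconds:
        return {}
    vm_count = len(vm_cpu_seconds)
    total_cpu = sum(vm_cpu_seconds.values())
    dynamic_watts = max(host_watts - idle_watts, 0.0)
    shares = {}
    for vm, cpu in vm_cpu_seconds.items():
        idle_part = idle_watts / vm_count
        if total_cpu > 0:
            dynamic_part = dynamic_watts * cpu / total_cpu
        else:
            # No VM used any CPU, so fall back to an even split.
            dynamic_part = dynamic_watts / vm_count
        shares[vm] = idle_part + dynamic_part
    return shares


# Assumed readings: host draws 200 W, idles at 100 W, and two VMs used
# 30 and 10 CPU seconds respectively during the sampling interval.
shares = prorate_host_power(200.0, 100.0, {"vm-a": 30.0, "vm-b": 10.0})
for vm, watts in shares.items():
    print(f"{vm}: {watts:.1f} W")
```

CPU time is only a proxy; memory, disk and network activity also consume power, which is part of why a consensus method is hard to define.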
