Home > Intel Communities > Open Port IT Community > The Server Room > Blog > Tags > acpi

The Server Room Blog

2 Posts tagged with the acpi tag
2

The recently introduced Intel® Xeon® 5500 Series Processor, formerly code named Nehalem brings a number of power management features that not only improve on energy efficiency over previous generations, such as a more aggressive implementation of power proportional computing.  Depending on the server design, users of Nehalem-based servers can expect idle power consumption that is about half of the power consumed at full load, down from about two thirds in the  previous generation.

A less heralded capability for this new generation of servers is that users can actually adjust the server power consumption and therefore trade off power consumption against performance.  This capability is known as power capping. The power capping range is not insignificant.  For a dual socket server consuming about 300 watt at full load, the capping range is in the order of 100 watts, that is, for a fully loaded server consuming 300 watts, power consumption can ratcheted down to about 200 watts.  The actual numbers depend on the server implementation.

The application of this mechanism for servers deployed in a data center leads to some energy savings.  However, perhaps the most valuable aspect of this technology is the operational flexibility it confers to data center operators.

This value comes from two capabilities:  First, power capping brings predictable power consumption within the specified power capping range, and second, servers implementing power capping offer actual power readouts as a bonus: their power supplies are PMBus(tm) enabled and their historical power consumption can be retrieved through standard APIs.

With actual historical power data, it is possible to optimize the loading of power limited racks, whereas before the most accurate estimation of power consumption came from derated nameplate data.  The nameplate estimation for power consumption is a static measure that requires a considerable safety margin.  This conservative approach to power sizing leads to overprovisioning of power.  This was OK in those times when energy costs were a second order consideration.  That is not the case anymore.

This technology allows dialing the power to be consumed by groups of over  a thousand servers, allowing a power control authority of tens of thousands of watts in data centers.  How does power capping work?  The technology implements power control by taking advantage of the CPU voltage and frequency scaling implemented by the Nehalem architecture.  The CPUs are one of the most power consuming components in a server.  If we can regulate the power consumed by the CPUs we can have an effect on the power consumed by the whole server.  Furthermore, if we can control the power consumed by the thousands of servers in a data center, we'll be able to alter the power consumed in that data center.

Power control for groups of servers is attained by composing power control capabilities of power control of each server.  Likewise, power control for a server is attained by composing CPU power control as illustrated in the figure below.  We will explain each of the constructs in the rest of this article.

hierarchy.png

Conceptually, power control for thousands of servers in a data center is implemented through a series of coordinated set of nested mechanisms.

The lowest level is  implemented through frequency and voltage scaling: laws of physics dictate that for a given architecture, power consumption is proportional to the CPU's frequency and to the square of the voltage use to power the CPU.  There are mechanisms built into the CPU architecture that allow a certain number of discrete combinations of voltage and frequency.  Using the ACPI standard nomenclature, these discrete combinations are called P-states, the highest performing state is nominally identified as P0, and the lower power consumption states are identified as P1, P2 and so on.  A Nehalem CPU supports about ten states, the actual number depending on the processor model.  For the sake of an example, a CPU in P0 may have been assigned a voltage of 1.4 volts and 3.6 GHz, at which point it draws about 100 watts.  As the CPU transitions to lower power states, it may have a state P4 using 1.2 volts running at 2.8 GHz and consuming about 70 watts.

The P-states by themselves can't control the power consumed by a server.  The CPU itself has no mechanisms to measure the power it consumes.   This mechanism is implemented by firmware running in the Nehalem chipset. This firmware implements the Intel(r) Dynamic Node Power Management technology, or Node manager for short..  If what we want is to measure the power consumed by a server, looking only at CPU consumption does not provide the whole picture.  For this purpose, the power supplies in Node Manager-enabled servers provide actual power readings for the whole server.  It is now possible to establish a classic control feedback loop where we compare a target power against the actual power indicated by the power supplies.  The Node Manager code manipulates the P-states up or down until the desired target power is reached.  If the desired power lies between two P-states, the Node Manager code rapidly switches between the two states until the average power consumption meets the set power.  This is an implementation of another classic control scheme, affectionately called bang-bang control for obvious reasons.

NM.png

From a data center perspective, regulating power consumption of just a single server is not an interesting capability.  We need the means to control servers as a group, and just as we were able to obtain power supply readouts for one server, we need to monitor the power for the group of servers to allow meeting a global power target for that group of servers.  This function is provided by a software development kit (SDK), the Intel(r) Data Center Manager or Intel DCM for short. Notice that DCM implements a feedback control mechanism very similar to the mechanism that regulates power consumption for a single server, but at a much larger scale.  Instead of watching one or two power supplies, DCM oversees the power consumption of multiple servers or "nodes", whose number can range up to thousands.

 

dcm.png

 

Intel DCM was purposely architected as an SDK as a building block for industry players to build more sophisticated and valuable capabilities for the benefit of data center operators.  One possible application is shown below, where Intel DCM has been integrated into a Building Management System (BMS) application.  Some Node Manager-enabled servers come with inlet temperature sensors.  This allows the BMS application to monitor the inlet temperature of group of servers, and if the temperature rises above a certain threshold, it can take a number of measures, from throttling back the power consumed to reduce the thermal stress on that particular area of the data center to alerting system operators.  The BMS can also coordinate the power consumed by the server equipment, for instance with the  CRAC fan speeds.

 

DataCenter.png

With this discussion we have barely begun to scratch the  surface of the capabilities from the family of technologies implementing power management.  In subsequent notes we'll dig deeper into each of the components and explore how they are implemented, how these technologies can be extended and the extensive range of uses for which they can be applied.

 

2 Comments Permalink
0

Given the recent intense focus in the industry around data center power management and the furious pace of the adoption of virtualization, it is remarkable that the subject of power management in virtualized environments has received relatively little attention.

 

It is fair to say that power management technology has not caught with virtualization.

 

Here are a few thoughts on this particular subject, which I intend to elaborate in subsequent transmittals.

 

For historical reasons the power management technology available today had its inception in the physical world where watts consumed in a server can be traced to the watts that came through the power utility feeds.  Unfortunately, the semantics of power in virtual  machines have yet to be comprehensively defined to industry consensus.

 

For instance, assume that the operating system running  in a virtual image decides to transition the system to the ACPI S3 state, sleep to memory.  What we have now is the state of the virtual image preserved in the image's memory with the virtual CPU turned off.

 

Assuming that the system is not paravirtualized, the operating system can't tell if it's running in a physical or virtual instance. The effect of transitioning to S3 will be purely local to the virtual machine.  If the intent of the system operator was to transition the machine to S3 to save power, it does not work this way.   The virtual machine still draws resources from the host machine and requires hypervisor attention. Transitioning the host itself to S3 may not be practical as there might be other virtual machines still running, not ready to go to sleep.

 

Consolidation is another technology for reducing data center power consumption by driving up the server utilization rates.  Consolidation for power management is a blunt tool, where applications that used to run in a physical server are now virtualized and squished into a single physical host.  The applications are sometimes strange bedfellows.  Profiling might have been done to make sure they could coexist, as a priori, static exercise with the virtual machine instances treated as black boxes. There is no attempt to look at the workload profiles inside each virtualized instance and in real time.  Power savings come from an almost wishful side effect of repackaging applications formerly running in a dedicated server into virtualized instances.

 

A capability to map power to virtual machines, in both directions, from physical to virtual and virtual to physical would be useful from an operational perspective.  The challenge is twofold, first from a monitoring perspective because there is no commonly agreed method yet to prorate host power consumption to the virtual instances running within, and second from a control perspective.  It would be useful to schedule or assign power consumption to virtual machines, allowing end users tomake a tradeoff between power and performance.  Fine grained power monitoring would allow prorating power costs to application instances, introducing useful pricing checks and balances encouraging energy consumption instead of the more common method today of hiding energy costs in the facility costs.

0 Comments Permalink

Filter Blog

By author: By date: By tag: