
The Server Room Blog

4 Posts tagged with the data_center_manager tag

In spite of significant gains in server energy efficiency, power consumption in data centers is still trending up.  At the very least, we can make sure that the energy expended yields maximum benefit to the business.  A first step in managing power in the servers in a data center is having a fairly accurate monitoring capability for power consumption.  The second step is to have a number of levers that allow using the monitoring data to carry out an effective power management policy.

 

While we may not be able to stem the overall growth of power consumption in the data center, there are a number of measures we can take immediately:

· Implement a peak shaving capability.  The data center power infrastructure needs to be sized to meet the demands of peak power.  Reducing peaks effectively increases the utilization of the existing power infrastructure.

· Be smart about shifting power consumption peaks.  Not all watts are created equal.  The incremental cost of generating an extra watt of power during peak consumption hours is much higher than that of the same watt generated in the wee hours of the morning.  For most consumers and the smaller commercial accounts, flat rate pricing still prevails.  Real time pricing (RTP) and negotiated SLAs will become more common as a way to put the appropriate economic incentives in place.  The incentive of real time pricing is a lower energy bill overall, although that outcome is not guaranteed: in pilot programs, residential consumers have complained that RTP results in higher electricity costs.  With negotiated SLAs the customer can designate a workload to be subject to lower reliability; for instance, instead of three 9’s (99.9 percent), or outages amounting to about 9 hours per year, the low reliability workload can be designated as only 90 percent reliable, and can be out an average of about two and a half hours per day.

· Match the electric power infrastructure in the data center to server workloads to minimize over-provisioning.  This approach assumes the existence of an accurate power consumption monitoring capability.

· Upgrading the electrical power infrastructure to accommodate additional servers is not an option in most data centers today.  Landing additional servers at a facility that is working at the limit of its thermal capacity leads to the formation of hot spots, assuming that electrical capacity limits are not reached first, with no room left in certain branch circuits.  Hence measures that work within the existing power infrastructure are preferable to alternatives that require additional infrastructure.
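
The downtime figures behind an availability target such as the SLA levels mentioned above follow from simple arithmetic. The sketch below is only an illustration; the function names are made up for this example and a 365-day year is assumed.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (assumption)


def downtime_hours_per_year(availability: float) -> float:
    """Expected outage hours per year for an availability between 0 and 1."""
    return (1.0 - availability) * HOURS_PER_YEAR


def downtime_hours_per_day(availability: float) -> float:
    """Expected outage hours per day for an availability between 0 and 1."""
    return (1.0 - availability) * 24


# Three nines versus a workload designated as 90 percent reliable.
print(round(downtime_hours_per_year(0.999), 2))  # 8.76
print(round(downtime_hours_per_day(0.90), 2))    # 2.4
```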

 

 

For the purposes of data center strategic planning, it may make economic sense to grow large data centers in a modular fashion.  If the organization manages a number of data centers, consider making effective use of the existing data centers first, and when new construction is justified, redistribute the workloads to the new data center to maximize the use of the new electrical supply infrastructure.

 

Intel has built into its server processor lineup a number of technology ingredients that allow data center operators to optimize the utilization of the available power system infrastructure in the data center.

 

 

Newer servers of the Nehalem generation are much more energy efficient, if only as a side effect of increased performance per watt.  These servers also have a more aggressive implementation of power proportional computing.  Typical idle consumption figures are on the order of 50 percent of peak power consumption.

 

 

Beyond passive mechanisms that do not require explicit operator intervention, the Intel® Intelligent Power Node Manager (Node Manager) technology allows adjusting the power draw of a server, trading off power consumption against performance.  This capability is also known as power capping.  The control range is a function of server loading.  For the Intel SR5520UR baseboard on the 2U chassis, the server will draw about 300 watts at full load, and its power consumption can be rolled down to about 200 watts.  The control range tapers down gradually until it reaches zero at idle.

 

 

For power monitoring, selected models of the current Nehalem generation come with PMBus specification compliant power supplies allowing real-time power consumption readouts.

 

 

The Node Manager power monitoring and capping capabilities apply to a single server.  To make them really useful, they need to be exercised collectively over groups of servers, together with a notion of events and a way to build a historical record of power consumption for the servers in a group.  These additional capabilities have been implemented in software through the Data Center Manager Software Development Kit developed by the Intel Solutions and Software Group.  An additional Software Development Kit, Cache River, allows programmatic access to components in servers and server building blocks produced by the Intel Enterprise Products Server Division (EPSD), including the baseboard management controller (BMC) and the management engine (ME), the subsystems that host or interact with the Node Manager firmware.  EPSD products are incorporated in many OEM and system integrator offerings.

 

Data Center Manager implements abstractions that apply to collections of servers:

·  A hierarchical notion of logical server groups

·  Power management policies bound to specific server groups

·  Event management and a publish/subscribe facility for acting upon and managing power and thermal events.

·  A database for logging a historical record for power consumption on the collection of managed nodes.

 

 

The abstractions implemented by DCM on top of Node Manager allow the implementation of power management use cases that involve up to thousands of servers.
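
To make these abstractions concrete, here is a minimal sketch of nested server groups, a power policy bound to a group, a publish/subscribe event facility, and a history log. It is not the Data Center Manager SDK API; every class, function, topic name and reading below is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    name: str
    power_watts: float = 0.0  # latest power supply reading for this server


@dataclass
class Group:
    name: str
    nodes: List[Node] = field(default_factory=list)
    subgroups: List["Group"] = field(default_factory=list)
    cap_watts: Optional[float] = None  # power policy bound to this group

    def power(self) -> float:
        """Aggregate power of this group, including all nested sub-groups."""
        own = sum(n.power_watts for n in self.nodes)
        return own + sum(g.power() for g in self.subgroups)


class EventBus:
    """Tiny publish/subscribe facility keyed by topic name."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: dict) -> None:
        for handler in self._handlers.get(topic, []):
            handler(event)


def sample(group: Group, bus: EventBus, log: List[tuple], timestamp: str) -> None:
    """Log one reading per group and publish an event if its policy is exceeded."""
    watts = group.power()
    log.append((timestamp, group.name, watts))  # stand-in for the database
    if group.cap_watts is not None and watts > group.cap_watts:
        bus.publish("power.over_cap", {"group": group.name, "watts": watts})
    for sub in group.subgroups:
        sample(sub, bus, log, timestamp)


# Illustrative rack: two sub-groups of two servers each (made-up readings).
low = Group("low-priority", nodes=[Node("srv1", 325.0), Node("srv2", 325.0)])
high = Group("high-priority", nodes=[Node("srv3", 325.0), Node("srv4", 325.0)])
rack = Group("rack", subgroups=[low, high], cap_watts=1200.0)

alerts: List[dict] = []
history: List[tuple] = []
bus = EventBus()
bus.subscribe("power.over_cap", alerts.append)
sample(rack, bus, history, "16:12")
print(alerts)        # [{'group': 'rack', 'watts': 1300.0}]
print(len(history))  # 3
```

Keeping the policy on the group rather than on each node is the point of the design: the same cap can be enforced however the power happens to be distributed among the servers underneath.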

 

If this topic is of interest to you, please join us at the Intel Developer Forum in San Francisco at the Moscone Center on September 22-24.  I will be facilitating course PDCS003, "Cloud Power Management with the Intel(r) Xeon(r) 5500 Series Platform."  You will have the opportunity to talk with some of our fellow travelers in the process of developing power management solutions using Intel technology ingredients and get a feel for their early experience.  Also please make a note to visit booths #515, #710 and #712 to see demonstrations of early end-to-end solutions these folks have put together.


Recently in our test lab, we experienced a cooling failure... and I wasn't even sitting in the lab to realize it.  In fact, I wasn't in the same state!

 

With the recent launch of the Xeon 5500 Series servers, I have been testing some use cases against four of the servers in our lab, and I noticed that the temperature in there was rising pretty drastically.  How did I see this?  Intel® Intelligent Power Node Manager is embedded in our Xeon servers, and the Intel Data Center Manager (DCM) SDK software interface presents the data in a visual format.

thermal trip.JPG

In the graph above, the dark colored line is the "front panel inlet" temperature.  In a matter of minutes, the temperature in the lab rose from 71F to 87F - 16 degrees!  What I didn't have set up for this scenario is a power policy that activates on a thermal trip.  Here is how you would set up this policy in Data Center Manager under the Policies section for this rack:

 

thermal-policy.JPG

If a thermal event caused the room to heat up to 78F (as shown above), Intel DCM would send IPMI commands to the platform, which in turn would tell the Node Manager firmware to throttle back the Xeon CPUs to their lowest possible P-state.  This would reduce the energy consumed across the systems in the policy group, and with it the heat generated by each server, thereby reducing the load placed on an already overheated lab or datacenter.
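
The logic of such a thermal policy can be sketched in a few lines. This is only an illustration of the threshold check, not how DCM implements it: the `throttle` callback is a hypothetical stand-in for whatever actually sends the IPMI commands, and the server names are invented.

```python
from typing import Callable, Iterable

THERMAL_TRIP_F = 78.0  # trip point used in the policy described above


def check_thermal_policy(inlet_temp_f: float,
                         servers: Iterable[str],
                         throttle: Callable[[str], None],
                         trip_f: float = THERMAL_TRIP_F) -> bool:
    """Throttle every server in the group once the inlet reading reaches the
    trip point. Returns True if the policy fired."""
    if inlet_temp_f < trip_f:
        return False
    for server in servers:
        throttle(server)  # hypothetical hook: would issue the IPMI request
    return True


# Replay the temperature rise from the lab incident with a fake throttle hook.
throttled = []
group = ["srv1", "srv2", "srv3", "srv4"]  # illustrative names
for reading in (71.0, 75.0, 78.0):
    fired = check_thermal_policy(reading, group, throttled.append)
    print(reading, fired)
```

A production policy would likely also need hysteresis so the cap is not released the moment the temperature dips just below the threshold; that is left out here for brevity.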

 

This gives server managers more time to gracefully shut down systems and/or move workloads to cooler sections of the datacenter.  If you have ever experienced a cooling failure in the datacenter, you know it is usually a frenzy to shut down machines to minimize heat and/or overall power utilization.  This thermal policy can give you more time before you reach a critical temperature where you start losing components and servers, and ultimately data and productivity.

 

Using the standard IPMI interface, the Data Center Manager SDK and Node Manager on the Xeon 5500 series platform enable power monitoring, power management, and front panel inlet temperature monitoring.  This gives a server/datacenter manager the capacity to measure power usage per server, where previously you would have needed more expensive power measurement tools.  External power meters cost anywhere from a cheap $15 to a spendy $1000, but now the technology is embedded in the firmware on the machine.

 

You can learn more about the Xeon 5500 Series Processors on the Intel Xeon website.


The Intel(r) Dynamic Power Node Manager technology allows setting a power consumption target for a server under load as described in a previous article.  This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.

 

Higher level software can use this capability to implement sophisticated power management schemes, especially schemes that involve server groups.  The range of control authority for servers in the Nehalem generation is significant.  The power consumption of a fully loaded server consuming 300 watts can be rolled back by roughly 100 watts.  In virtualized utility computing environments, additional control authority is possible by migrating the virtual machines out of a host and consolidating them into fewer hosts.  The power consumption of the power capped host, now at 200 watts, can then be brought down by another 50 watts, to 150 watts.

 

 

The reader might ask about the possibility of constantly running servers in capped mode to save energy.  Unfortunately, capping entails a performance tradeoff.  The dynamic is not unlike driving an automobile.  The best mileage is obtained by running the vehicle at a constant 35 MPH.  This is not practical on a freeway where the prevailing speed is 60 MPH.  The vehicle could be rear ended, or, a more mundane motivation, the driver drives the vehicle at 60 MPH because she wants to get there sooner.  As with a server, the lowest fuel consumption in a running vehicle, at least in gallons per hour, is attained when the vehicle is idling.  No real work is done with an idling engine, but at least the vehicle can start moving in no time.  Continuing with the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.

     

This document provides an example of the performance tradeoff with power capping; see Figure 2 on page 5.

 

The following example illustrates how group power capping works.  The plot is a screen capture of the Intel(r) Data Center Manager software managing the power consumption in a cluster of four servers.  The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

 

DCM-GUI.png

 

The light blue band represents the focus of the plot. The focus can be changed with a simple mouse click.  The current focus in the figure is the whole rack.  Hence the power plot is the aggregated power for all four servers in a rack.  If the high priority sub-group were selected, then the power shown would be the power consumed by the two servers in that sub-group.  Finally, if a single server is selected, then the power indicated would be the power for that server only.

     

There are four lines represented in the graph.  The top line is the plate power.  It represents an upper bound for the server’s power consumption.  For this particular group of servers the plate power is 2600 watts.  The servers are identical, and hence rated at 2600 / 4 = 650 watts. 

The next line down is the derated power.  Most servers will not have every memory slot or every hard drive tray populated. The derated power is the data center operator’s guess about the upper bound for power consumption based on the actual configuration of the server.  The derated power is still a conservative guess, considerably higher than the actual power consumption of the server. As a rule of thumb, it is ~70% of the nameplate. The derated power has been set at 1820 watts for the rack, or 455 watts per server.
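
The plate and derated figures above can be reproduced with a few lines of arithmetic. The helper name `derate` is made up for this sketch; the 70 percent factor is the rule of thumb quoted in the text, not a fixed standard.

```python
PLATE_POWER_RACK_W = 2600  # plate power for the four-server group (from the text)
SERVERS = 4


def derate(plate_watts: float, percent: int = 70) -> float:
    """Apply the rule-of-thumb derating percentage to a plate power figure."""
    return plate_watts * percent / 100


plate_per_server = PLATE_POWER_RACK_W / SERVERS
derated_rack = derate(PLATE_POWER_RACK_W)
derated_per_server = derated_rack / SERVERS

print(plate_per_server)    # 650.0
print(derated_rack)        # 1820.0
print(derated_per_server)  # 455.0
```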

     

Finally, the gold line represents the actual power consumed by the server.  The dots represent successive samples taken from readings from the instrumented power supplies. 

     

The servers are running at full power using the SPECpower benchmark.  The rack is collectively consuming a little less than 1300 watts.  At approximately 16:12 a policy is introduced to constrain power consumption to 1200 watts.  DCM instructs individual nodes to reduce power consumption by lowering the set points for Node Manager in each node until the collective power consumption reaches the desired target.

When we instruct Data Center Manager to hold a power cap for the group rack (2), it makes an effort to maintain power at that level, in spite of unavoidable disturbances in the system. 

 

The source of the disturbances can be internal or external.  An internal disturbance can be the server fans switching to a different speed causing a power spike or dip.  Workloads in servers go up and down, with a corresponding uptick or dip in the power consumption for that server.  An external disturbance could be a change in the feed voltage or an operator action.  In fact at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07 down to idle. 

 

 

 

Note that it takes a few seconds for Data Center Manager to react and to reach the original power level.  Likewise, when the bottom server is brought to idle, it also pulled back the power consumption for the group.  However, the group power went back to the target power consumption after a couple of minutes.  If we look at the plot of the individual servers, we can see Data Center Manager at work maintaining the target power.

Combined Power.png

The figure above captures the behaviors of the individual servers.  Note how DCM allocates power to individual nodes while maintaining a global power cap.  When the server at the bottom is suddenly idled, there is a temporary dip in power consumption for the group, but it soon recovers to the target capped level.  Also note that the power not used by the bottom server is reallocated to the remaining three nodes until they get close to the previously unconstrained level.
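
The reallocation behavior described above can be imitated with a simple "water-filling" split of the group cap. The article does not say how DCM actually divides the budget, so the algorithm below is an assumption made for illustration, as are the server names and the per-server demand figures.

```python
from typing import Dict


def allocate_caps(demand_w: Dict[str, float], group_cap_w: float) -> Dict[str, float]:
    """Split a group cap evenly, never give a node more than it demands, and
    hand any leftover budget to the nodes that still want more."""
    caps = {name: 0.0 for name in demand_w}
    remaining = group_cap_w
    unsatisfied = list(demand_w)
    while unsatisfied and remaining > 1e-9:
        share = remaining / len(unsatisfied)
        still_hungry = []
        for name in unsatisfied:
            grant = min(share, demand_w[name] - caps[name])
            caps[name] += grant
            remaining -= grant
            if caps[name] < demand_w[name] - 1e-9:
                still_hungry.append(name)
        if len(still_hungry) == len(unsatisfied):
            break  # every node took a full share, so the budget is spent
        unsatisfied = still_hungry
    return caps


# Four busy servers under a 1200 W group cap: each is held to 300 W.
busy = {"srv1": 325.0, "srv2": 325.0, "srv3": 325.0, "srv4": 325.0}
print(allocate_caps(busy, 1200.0))

# One server drops to idle (assumed 160 W): its unused share flows to the others.
one_idle = dict(busy, srv4=160.0)
print(allocate_caps(one_idle, 1200.0))
```

In the second case the three busy servers get back to their full 325 W demand, mirroring the way the remaining nodes in the plot climb toward their previously unconstrained level.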


The recently introduced Intel® Xeon® 5500 Series Processor, formerly code named Nehalem, brings a number of power management features that improve on the energy efficiency of previous generations, such as a more aggressive implementation of power proportional computing.  Depending on the server design, users of Nehalem-based servers can expect idle power consumption that is about half of the power consumed at full load, down from about two thirds in the previous generation.

A less heralded capability of this new generation of servers is that users can actually adjust the server power consumption and therefore trade off power consumption against performance.  This capability is known as power capping.  The power capping range is not insignificant.  For a dual socket server consuming about 300 watts at full load, the capping range is on the order of 100 watts; that is, power consumption can be ratcheted down to about 200 watts.  The actual numbers depend on the server implementation.

The application of this mechanism for servers deployed in a data center leads to some energy savings.  However, perhaps the most valuable aspect of this technology is the operational flexibility it confers to data center operators.

This value comes from two capabilities:  First, power capping brings predictable power consumption within the specified power capping range, and second, servers implementing power capping offer actual power readouts as a bonus: their power supplies are PMBus(tm) enabled and their historical power consumption can be retrieved through standard APIs.

With actual historical power data, it is possible to optimize the loading of power limited racks, whereas before the most accurate estimation of power consumption came from derated nameplate data.  The nameplate estimation for power consumption is a static measure that requires a considerable safety margin.  This conservative approach to power sizing leads to overprovisioning of power.  This was OK in those times when energy costs were a second order consideration.  That is not the case anymore.
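
A rough calculation shows why measured data matters when loading a power limited rack. The sketch below assumes a hypothetical 5000 watt rack budget; the per-server figures are illustrative (650 W nameplate, 455 W derated, 300 W observed at full load) and should be replaced with real readings.

```python
def servers_that_fit(rack_budget_w: int, per_server_w: int) -> int:
    """Number of servers a rack can host if each is budgeted per_server_w."""
    return rack_budget_w // per_server_w


RACK_BUDGET_W = 5000  # hypothetical branch circuit budget for one rack

for label, watts in (("nameplate", 650), ("derated", 455), ("measured", 300)):
    print(label, servers_that_fit(RACK_BUDGET_W, watts))
# nameplate 7
# derated 10
# measured 16
```

Sizing on measured power leaves less safety margin than sizing on nameplate data, which is where a predictable power cap becomes useful as a guard rail.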

This technology allows dialing the power to be consumed by groups of over  a thousand servers, allowing a power control authority of tens of thousands of watts in data centers.  How does power capping work?  The technology implements power control by taking advantage of the CPU voltage and frequency scaling implemented by the Nehalem architecture.  The CPUs are one of the most power consuming components in a server.  If we can regulate the power consumed by the CPUs we can have an effect on the power consumed by the whole server.  Furthermore, if we can control the power consumed by the thousands of servers in a data center, we'll be able to alter the power consumed in that data center.

Power control for groups of servers is attained by composing the power control capabilities of each server.  Likewise, power control for a server is attained by composing CPU power control, as illustrated in the figure below.  We will explain each of the constructs in the rest of this article.

hierarchy.png

Conceptually, power control for thousands of servers in a data center is implemented through a coordinated set of nested mechanisms.

The lowest level is implemented through frequency and voltage scaling: laws of physics dictate that for a given architecture, power consumption is proportional to the CPU's frequency and to the square of the voltage used to power the CPU.  There are mechanisms built into the CPU architecture that allow a certain number of discrete combinations of voltage and frequency.  Using the ACPI standard nomenclature, these discrete combinations are called P-states; the highest performing state is nominally identified as P0, and the lower power consumption states are identified as P1, P2 and so on.  A Nehalem CPU supports about ten states, the actual number depending on the processor model.  For the sake of an example, a CPU in P0 may have been assigned a voltage of 1.4 volts and 3.6 GHz, at which point it draws about 100 watts.  As the CPU transitions to lower power states, it may have a state P4 using 1.2 volts running at 2.8 GHz and consuming about 70 watts.
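
The frequency and voltage relation can be tried out numerically. The sketch below applies only the proportionality stated above, scaling from the example P0 operating point; the function name is made up. Note that the pure relation predicts a somewhat lower figure for the example P4 point than the round 70 watts quoted above, which is one reason to treat it as a first-order estimate: real processors also draw power that does not follow this law.

```python
def scaled_power(p_ref_w: float, f_ref_ghz: float, v_ref: float,
                 f_ghz: float, v: float) -> float:
    """Scale a reference power by (f / f_ref) * (V / V_ref)^2."""
    return p_ref_w * (f_ghz / f_ref_ghz) * (v / v_ref) ** 2


# Reference point P0: 100 W at 3.6 GHz and 1.4 V; target point P4: 2.8 GHz, 1.2 V.
p4_estimate = scaled_power(100.0, 3.6, 1.4, 2.8, 1.2)
print(round(p4_estimate, 1))  # 57.1
```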

The P-states by themselves can't control the power consumed by a server.  The CPU itself has no mechanism to measure the power it consumes.  Power control is instead implemented by firmware running in the Nehalem chipset.  This firmware implements the Intel(r) Dynamic Node Power Management technology, or Node Manager for short.  If what we want is to measure the power consumed by a server, looking only at CPU consumption does not provide the whole picture.  For this purpose, the power supplies in Node Manager-enabled servers provide actual power readings for the whole server.  It is now possible to establish a classic control feedback loop where we compare a target power against the actual power indicated by the power supplies.  The Node Manager code manipulates the P-states up or down until the desired target power is reached.  If the desired power lies between two P-states, the Node Manager code rapidly switches between the two states until the average power consumption meets the set power.  This is an implementation of another classic control scheme, affectionately called bang-bang control for obvious reasons.
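
The switching scheme can be sketched as follows. This is a toy model, not the Node Manager firmware: it assumes the server draws a fixed, known power in each of two adjacent P-states (the 230 W and 270 W figures are invented) and switches once per time step based on whether the energy used so far is below or above what the target would allow.

```python
def duty_cycle(target_w: float, low_w: float, high_w: float) -> float:
    """Fraction of time to spend in the higher-power state so that the
    average power equals the target."""
    if not low_w <= target_w <= high_w:
        raise ValueError("target must lie between the two state powers")
    return (target_w - low_w) / (high_w - low_w)


def bang_bang_average(target_w: float, low_w: float, high_w: float,
                      ticks: int = 1000) -> float:
    """Each tick, run in the high state if we are under budget so far,
    otherwise in the low state. Returns the resulting average power."""
    total = 0.0
    for t in range(ticks):
        under_budget = total < target_w * t
        total += high_w if under_budget else low_w
    return total / ticks


# Hypothetical server drawing 270 W in one P-state and 230 W in the next one down.
print(duty_cycle(240.0, 230.0, 270.0))  # 0.25
average = bang_bang_average(240.0, 230.0, 270.0)
print(abs(average - 240.0) < 0.1)       # True
```

The controller never sits at the target itself; it only ever runs at one of the two discrete levels, and the target is met on average.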

NM.png

From a data center perspective, regulating power consumption of just a single server is not an interesting capability.  We need the means to control servers as a group, and just as we were able to obtain power supply readouts for one server, we need to monitor the power for the group of servers to allow meeting a global power target for that group of servers.  This function is provided by a software development kit (SDK), the Intel(r) Data Center Manager or Intel DCM for short. Notice that DCM implements a feedback control mechanism very similar to the mechanism that regulates power consumption for a single server, but at a much larger scale.  Instead of watching one or two power supplies, DCM oversees the power consumption of multiple servers or "nodes", whose number can range up to thousands.

 

dcm.png

 

Intel DCM was purposely architected as an SDK, to serve as a building block for industry players to build more sophisticated and valuable capabilities for the benefit of data center operators.  One possible application is shown below, where Intel DCM has been integrated into a Building Management System (BMS) application.  Some Node Manager-enabled servers come with inlet temperature sensors.  This allows the BMS application to monitor the inlet temperature of a group of servers, and if the temperature rises above a certain threshold, it can take a number of measures, from throttling back the power consumed to reduce the thermal stress on that particular area of the data center to alerting system operators.  The BMS can also coordinate the power consumed by the server equipment with other systems, for instance the CRAC fan speeds.

 

DataCenter.png

With this discussion we have barely begun to scratch the  surface of the capabilities from the family of technologies implementing power management.  In subsequent notes we'll dig deeper into each of the components and explore how they are implemented, how these technologies can be extended and the extensive range of uses for which they can be applied.

 
