
The Server Room Blog

5 Posts tagged with the dcm tag

Prior to the Intel Xeon X5500 Server Platforms*, measuring server power required expensive equipment and could only be done in a discrete fashion. Unless you had tons of monitoring equipment to mash up your power data, it was a tedious process. Now, using Intel DCM and Node Manager, you can pull multiple servers' worth of power info to make some important power decisions in your datacenter.

 

First of all, you need to baseline your workload.  If you're confident that you can replicate workload patterns then you've got a starting point.  Otherwise, it's usually a good idea to start monitoring and looking for some cyclical patterns and/or common data points (time, power, thermals, etc) to keep track of.

 

In this scenario (like in my last blog) we're using a SQL workload which can be modified to run the CPU at high levels for a relatively set amount of time.  The base workload runs for 7 min 30 seconds, as shown in the Intel DCM screencap below.

 

base-workload.jpg

In this test case, idle power for the 4 servers is 782W, and under load the power increases to 1174W, which is a delta of 392W. This power increase occurs when work is given to the server and the P/T states react to the workload, increasing power/voltage to the system to increase performance. This is exactly what we've been used to seeing ever since EIST was introduced several years ago.

 

Now, what I'll show you is something that may be very interesting in scale... I will power cap the servers by 20W each, and set the Intel DCM Power Policy to only allow 1095W for the 4 servers in the rack.

 

20w-per-server-powercap.jpg

 

What is awesome here is that we can still finish the workload in the same 7 minutes 30 seconds.  So essentially, we have saved 80W of power for each set of 4 servers and still get the same amount of work completed!  In a large datacenter this can be HUGE in energy savings.

 

comparative-workload.jpg

Let's do some quick math: 20W power savings per server x 10,000 servers = 200kW power savings, and you still get the work done. I hope I just helped some of you server admins get some new ideas for your next "I need a raise" talk with your manager.
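The arithmetic in this post can be checked with a few lines of Python. This is only a sketch using the wattages quoted above; the function name `fleet_savings_kw` is made up for illustration and is not part of Intel DCM.

```python
# Figures taken from the post: aggregate power for the 4-server group.
IDLE_W = 782
LOAD_W = 1174
CAP_PER_SERVER_W = 20
SERVERS_IN_GROUP = 4


def fleet_savings_kw(cap_per_server_w, server_count):
    """Total power saved across a fleet, in kilowatts (hypothetical helper)."""
    return cap_per_server_w * server_count / 1000


delta_w = LOAD_W - IDLE_W                             # idle-to-load increase
group_savings_w = CAP_PER_SERVER_W * SERVERS_IN_GROUP  # savings for 4 servers
capped_group_w = LOAD_W - group_savings_w              # close to the 1095W policy

print(delta_w)                                      # 392
print(group_savings_w)                              # 80
print(capped_group_w)                               # 1094
print(fleet_savings_kw(CAP_PER_SERVER_W, 10_000))   # 200.0
```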

 

*your mileage may vary, so test your own workloads and report out!


With the Intel Xeon 5500 series (Nehalem) based processors, the X5500 chipset and instrumented power supplies, you can start with the most basic use case for Intel Node Manager - monitoring the power usage of your servers.

 

As you can see in the Intel Datacenter Manager (DCM) screen below, there are multiple servers configured into logical units: HF2-EIL is the lab that these servers are located in. Rack 1 and Rack 2 are the physical locations of these servers, and each rack contains 2 servers.

 

epiitpoctbg01-workload-5.5min.JPG

When you highlight one server (as above in DCM), you can see the power characteristics over a certain time period. The time period shown gives you the idle power, max power, and thermal measurement. The 'hump' in the graph is a SQL workload which creates 'work' for the server; the process runs for about 5 1/2 minutes with no power capping.

 

Here's a graph of the 2nd server in that rack, performing a similar workload.  As you can see, the 2nd server power usage is different than the first.

epiitpoctbg02-workload-5.5min.JPG

 

The Intel Datacenter Manager SDK console can monitor multiple systems as well. The next graph covers both of those servers in the rack, and shows their combined power usage during the same timeframe.

1-rack-workload-5.5min.JPG

Finally, here is the last graph, showing the accumulation of all 4 servers in both Rack #1 and Rack #2. This shows the maximum power utilized during the workload, the minimum power (idle) and the inlet temperature in the lab. This is something that could not be done before without expensive equipment in the datacenter.
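The roll-up from server to rack to lab shown in these graphs is simple summation over aligned samples. Here is a minimal sketch of that idea; the server names and wattage samples are invented for illustration, and this is not how the DCM SDK exposes its data.

```python
# Hypothetical power samples in watts, one list per server, taken at the
# same four points in time. Rack names follow the post; server names and
# values are made up.
readings = {
    ("Rack 1", "server-1"): [190, 280, 290, 195],
    ("Rack 1", "server-2"): [200, 300, 305, 200],
    ("Rack 2", "server-3"): [195, 290, 295, 198],
    ("Rack 2", "server-4"): [197, 295, 300, 199],
}


def aggregate_by_rack(readings):
    """Sum the per-server samples into one series per rack."""
    totals = {}
    for (rack, _server), samples in readings.items():
        if rack not in totals:
            totals[rack] = list(samples)
        else:
            totals[rack] = [a + b for a, b in zip(totals[rack], samples)]
    return totals


def aggregate_all(rack_totals):
    """Sum the per-rack series into one series for the whole lab."""
    return [sum(values) for values in zip(*rack_totals.values())]


rack_totals = aggregate_by_rack(readings)
lab_total = aggregate_all(rack_totals)

print(rack_totals["Rack 1"])            # [390, 580, 595, 395]
print(rack_totals["Rack 2"])            # [392, 585, 595, 397]
print(lab_total)                        # [782, 1165, 1190, 792]
print(min(lab_total), max(lab_total))   # 782 1190
```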

 

2-racks-workload-5.5min.JPG

 

My next power-based blog will show how power capping can give you more efficient use of your workload power while using Xeon 5500 series platforms.


Intel Intelligent Power Node Manager is a new technology that is available with the Xeon 5500 Series Platforms released earlier this year. Many of you have asked me questions via Twitter (@Toadster) about "How can I use Node Manager?", so I wanted to present some simple use cases to simplify the explanation of Node Manager and how you can best use the technology in your own enterprise.

 

First of all, let's explain the growth problem at hand. As servers shrink in size, the density of each server 'footprint' is growing from a power perspective. A few years ago, a single 42U rack could hold about 21 servers (estimating 2U servers), usually hosting one or two apps per physical server, depending on whether you had single or dual-socket servers. In modern datacenters, that same 42U rack can hold 42 servers (1U each) with 2P per server, so you have an immediate density increase of 2X the number of servers and 2-4X the number of sockets, which can equate to 16X the number of processor threads per rack. One good thing is that Intel has been developing newer technologies to keep the TDP of each CPU roughly the same between processor updates over that time period: where you used to have 2 or 4 cores, you now have 8 to 16 cores at the same thermal envelope!

 

Knowing how much power your platform uses is a key factor in populating racks and rows in your datacenter.  Prior to Node Manager technology, most Datacenter Managers would base rack population on 'nameplate' power - or the (W) rating on your power supply.  That's the 'max' power utilized by the platform, and what the PSU is rated for (worst case).  See the image below...

 

NM Use Case - Using Actual Power Data to Increase Rack Density.jpg

As you can see, using Intel Intelligent Power Node Manager technology, you can view your system's power utilization in real-time using Intel Datacenter Manager, and you can implement power caps to ensure your server rack stays within your required power limits. By utilizing the 'actual' power limits instead of nameplate power, you can increase your rack density, thereby increasing your ROI and decreasing your TCO! Let's face it - everyone loves saving money!
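To make the rack density argument concrete, here is a rough sketch of the calculation. All of the numbers (an 8000W rack budget, a 650W nameplate rating, a 300W capped peak) are assumptions picked for illustration, and `servers_per_rack` is a hypothetical helper, not a DCM function.

```python
def servers_per_rack(rack_budget_w, per_server_w, rack_units=42, server_units=1):
    """Servers that fit in a rack, limited by both power and physical space."""
    by_power = int(rack_budget_w // per_server_w)
    by_space = rack_units // server_units
    return min(by_power, by_space)


# Assumed values, for illustration only.
RACK_BUDGET_W = 8000   # power available to the rack
NAMEPLATE_W = 650      # PSU rating per server
CAPPED_PEAK_W = 300    # measured peak per server with a power cap in place

print(servers_per_rack(RACK_BUDGET_W, NAMEPLATE_W))    # 12
print(servers_per_rack(RACK_BUDGET_W, CAPPED_PEAK_W))  # 26
```

With these assumed figures, sizing by measured and capped power roughly doubles the number of servers per rack compared with sizing by nameplate; your own numbers will differ.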

 

Many of us are familiar with this next scenario... it's summertime, and the power company is announcing that the power grid is under strain.  Personal homes start having their A/C cut-off to save the power grid from brown-outs...  now your enterprise can help reduce those risks as well!

 

NM Use Case - On-Demand Power Reduction.jpg

 

Over the next few weeks, I hope to post more blogs/videos:

 

1. Single Node Power Monitoring & Management
2. Group/Rack Power Monitoring & Management
3. Thermal Monitoring & Management

 

Please provide some feedback, and post your questions and ideas for upcoming blogs!


The Intel(r) Dynamic Power Node Manager technology allows setting a power consumption target for a server under load as described in a previous article.  This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.

 

Higher level software can use this capability to implement sophisticated power management schemes, especially schemes that involve server groups. The range of control authority for servers in the Nehalem generation is significant. The power consumption of a fully loaded server consuming 300 watts can be rolled back by roughly 100 watts. In virtualized utility computing environments, additional control authority is possible by migrating the virtual machines out of a host and consolidating them into fewer hosts. The power consumption of the power capped host, now at 200 watts, can then be brought down by another 50 watts, to 150 watts.

 

 

The reader might ask about the possibility of constantly running servers in capped mode to save energy. Unfortunately, capping entails a performance tradeoff. The dynamic is not unlike driving an automobile. The best mileage is obtained by running the vehicle at a constant 35 MPH. This is not practical on a freeway where the prevailing speed is 60 MPH. The vehicle could be rear ended, or, perhaps a more mundane motivation, the driver drives at 60 MPH because she wants to get there sooner. As with a server, the lowest fuel consumption in a running vehicle, at least in gallons per hour, is attained when the vehicle is idling. No real work is done with an idling engine, but at least the vehicle can start moving in no time. Continuing with the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.

     

This document provides an example of the performance tradeoff with power capping; see page 5, Figure 2.

 

The following example illustrates how group power capping works. The plot is a screen capture of the Intel(r) Data Center Manager software managing the power consumption in a cluster of four servers. The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

 

DCM-GUI.png

 

The light blue band represents the focus of the plot. The focus can be changed with a simple mouse click.  The current focus in the figure is the whole rack.  Hence the power plot is the aggregated power for all four servers in a rack.  If the high priority sub-group were selected, then the power shown would be the power consumed by the two servers in that sub-group.  Finally, if a single server is selected, then the power indicated would be the power for that server only.

     

There are four lines represented in the graph.  The top line is the plate power.  It represents an upper bound for the server’s power consumption.  For this particular group of servers the plate power is 2600 watts.  The servers are identical, and hence rated at 2600 / 4 = 650 watts. 

The next line down is the derated power. Most servers will not have every memory slot or every hard drive tray populated. The derated power is the data center operator's guess about the upper bound for power consumption based on the actual configuration of the server. The derated power is still a conservative guess, considerably higher than the actual power consumption of the server. As a rule of thumb, it is ~70% of the nameplate. The derated power has been set at 1820 watts for the rack, or 455 watts per server.

     

Finally, the gold line represents the actual power consumed by the server.  The dots represent successive samples taken from readings from the instrumented power supplies. 

     

The servers are running at full power using the SPECpower benchmark.  The rack is collectively consuming a little less than 1300 watts.  At approximately 16:12 a policy is introduced to constrain power consumption to 1200 watts.  DCM instructs individual nodes to reduce power consumption by lowering the set points for Node Manager in each node until the collective power consumption reaches the desired target.

When we instruct Data Center Manager to hold a power cap for the group rack (2), it makes an effort to maintain power at that level, in spite of unavoidable disturbances in the system.

 

The source of the disturbances can be internal or external. An internal disturbance can be the server fans switching to a different speed, causing a power spike or dip. Workloads in servers go up and down, with a corresponding uptick or dip in the power consumption for that server. An external disturbance could be a change in the feed voltage or an operator action. In fact, at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07, down to idle.

 

 

 

Note that it takes a few seconds for Data Center Manager to react and to reach the original power level. Likewise, when the bottom server was brought to idle, it also pulled back the power consumption for the group. However, the group power went back to the target power consumption after a couple of minutes. If we look at the plot of the individual servers, we can see Data Center Manager at work maintaining the target power.

Combined Power.png

The figure above captures the behaviors of the individual servers. Note how DCM allocates power to individual nodes yet maintains a global power cap. When the server at the bottom is suddenly idled, there is a temporary dip in power consumption for the group, but it soon recovers to the target capped level. Also note that the power not used by the bottom server is reallocated to the remaining three nodes until they get close to their previously unconstrained level.
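The reallocation behavior can be illustrated with a simple "water-filling" split of a group budget. This is a sketch of the general idea only, not the algorithm DCM actually uses. The per-node demand figures (325 watts under load, 150 watts at idle) and the node names other than epieg3urb07 are assumptions chosen to roughly match the rack totals quoted in the text.

```python
def allocate_group_cap(demands_w, group_cap_w):
    """Split group_cap_w across nodes so that no node gets more than it demands.

    Nodes that need less than an equal share are fully satisfied first, and
    the budget they leave unused is shared among the remaining nodes.
    """
    caps = {name: 0.0 for name in demands_w}
    remaining = float(group_cap_w)
    active = set(demands_w)
    while active and remaining > 1e-9:
        share = remaining / len(active)
        satisfied = {n for n in active if demands_w[n] - caps[n] <= share}
        if not satisfied:
            # Every active node wants more than its share: split evenly and stop.
            for n in active:
                caps[n] += share
            break
        for n in satisfied:
            remaining -= demands_w[n] - caps[n]
            caps[n] = float(demands_w[n])
        active -= satisfied
    return caps


# All four nodes under load (assumed 325W each), group policy of 1200W.
loaded = {"node-1": 325, "node-2": 325, "node-3": 325, "epieg3urb07": 325}
print(allocate_group_cap(loaded, 1200))
# every node is capped at 300.0

# The bottom server drops to idle (assumed 150W); its unused share is
# handed to the other three, which return to their unconstrained 325W.
one_idle = {"node-1": 325, "node-2": 325, "node-3": 325, "epieg3urb07": 150}
print(allocate_group_cap(one_idle, 1200))
```

In the second case the three busy nodes end up uncapped and the group total (1125W) stays under the 1200W policy, which mirrors the recovery described above.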


The recently introduced Intel® Xeon® 5500 Series Processor, formerly code named Nehalem, brings a number of power management features that improve energy efficiency over previous generations, such as a more aggressive implementation of power proportional computing. Depending on the server design, users of Nehalem-based servers can expect idle power consumption that is about half of the power consumed at full load, down from about two thirds in the previous generation.

A less heralded capability of this new generation of servers is that users can actually adjust the server power consumption and therefore trade off power consumption against performance. This capability is known as power capping. The power capping range is not insignificant. For a dual socket server consuming about 300 watts at full load, the capping range is on the order of 100 watts; that is, for a fully loaded server consuming 300 watts, power consumption can be ratcheted down to about 200 watts. The actual numbers depend on the server implementation.

The application of this mechanism for servers deployed in a data center leads to some energy savings.  However, perhaps the most valuable aspect of this technology is the operational flexibility it confers to data center operators.

This value comes from two capabilities:  First, power capping brings predictable power consumption within the specified power capping range, and second, servers implementing power capping offer actual power readouts as a bonus: their power supplies are PMBus(tm) enabled and their historical power consumption can be retrieved through standard APIs.

With actual historical power data, it is possible to optimize the loading of power limited racks, whereas before the most accurate estimation of power consumption came from derated nameplate data.  The nameplate estimation for power consumption is a static measure that requires a considerable safety margin.  This conservative approach to power sizing leads to overprovisioning of power.  This was OK in those times when energy costs were a second order consideration.  That is not the case anymore.

This technology allows dialing the power to be consumed by groups of over a thousand servers, allowing a power control authority of tens of thousands of watts in data centers.

How does power capping work? The technology implements power control by taking advantage of the CPU voltage and frequency scaling implemented by the Nehalem architecture. The CPUs are among the most power consuming components in a server. If we can regulate the power consumed by the CPUs, we can have an effect on the power consumed by the whole server. Furthermore, if we can control the power consumed by the thousands of servers in a data center, we'll be able to alter the power consumed in that data center.

Power control for groups of servers is attained by composing the power control capabilities of each server. Likewise, power control for a server is attained by composing CPU power control, as illustrated in the figure below. We will explain each of the constructs in the rest of this article.

hierarchy.png

Conceptually, power control for thousands of servers in a data center is implemented through a coordinated set of nested mechanisms.

The lowest level is implemented through frequency and voltage scaling: laws of physics dictate that for a given architecture, power consumption is proportional to the CPU's frequency and to the square of the voltage used to power the CPU. There are mechanisms built into the CPU architecture that allow a certain number of discrete combinations of voltage and frequency. Using the ACPI standard nomenclature, these discrete combinations are called P-states; the highest performing state is nominally identified as P0, and the lower power consumption states are identified as P1, P2 and so on. A Nehalem CPU supports about ten states, the actual number depending on the processor model. For the sake of an example, a CPU in P0 may have been assigned a voltage of 1.4 volts and 3.6 GHz, at which point it draws about 100 watts. As the CPU transitions to lower power states, it may have a state P4 using 1.2 volts running at 2.8 GHz and consuming about 70 watts.
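The proportionality above (power scales with frequency and with the square of voltage) can be written as a small calculation. This sketch applies only that relationship to the example figures in the text; the function name is made up for illustration. On its own the relationship gives roughly 57 watts for the example P4 point, somewhat below the "about 70 watts" quoted above. The numbers in the text are illustrative, and a real CPU also draws power that does not scale this way (leakage, for example), which this sketch ignores.

```python
def scaled_dynamic_power(p_ref_w, f_ref_ghz, v_ref, f_ghz, v):
    """Scale a reference power by the f * V^2 proportionality."""
    return p_ref_w * (f_ghz / f_ref_ghz) * (v / v_ref) ** 2


# Example P0 point from the text: 100 W at 3.6 GHz and 1.4 V.
# Example P4 point from the text: 2.8 GHz and 1.2 V.
p4_estimate_w = scaled_dynamic_power(100, 3.6, 1.4, 2.8, 1.2)
print(round(p4_estimate_w, 1))  # 57.1
```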

The P-states by themselves can't control the power consumed by a server. The CPU itself has no mechanism to measure the power it consumes. That function is implemented by firmware running in the Nehalem chipset. This firmware implements the Intel(r) Dynamic Node Power Management technology, or Node Manager for short. If what we want is to measure the power consumed by a server, looking only at CPU consumption does not provide the whole picture. For this purpose, the power supplies in Node Manager-enabled servers provide actual power readings for the whole server. It is now possible to establish a classic control feedback loop where we compare a target power against the actual power indicated by the power supplies. The Node Manager code moves the P-states up or down until the desired target power is reached. If the desired power lies between two P-states, the Node Manager code rapidly switches between the two states until the average power consumption meets the set power. This is an implementation of another classic control scheme, affectionately called bang-bang control for obvious reasons.
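When the target falls between two P-states, the switching has to spend the right fraction of time in each state for the average to land on the target. The sketch below computes that time split. The P-state power table is hypothetical (whole-server watts, chosen to span the 300 to 200 watt capping range mentioned earlier), and this is not the Node Manager firmware logic, just the averaging arithmetic behind it.

```python
def bang_bang_duty(state_power_w, target_w):
    """Return (upper_state, lower_state, fraction_of_time_in_upper_state).

    state_power_w lists server power per P-state, P0 first, in descending
    order. Targets outside the table are clamped to the nearest state.
    """
    last = len(state_power_w) - 1
    if target_w >= state_power_w[0]:
        return (0, 0, 1.0)
    if target_w <= state_power_w[last]:
        return (last, last, 1.0)
    for i in range(last):
        high, low = state_power_w[i], state_power_w[i + 1]
        if low <= target_w <= high:
            return (i, i + 1, (target_w - low) / (high - low))


# Hypothetical whole-server power for P0..P5, in watts.
P_STATE_POWER_W = [300, 280, 260, 240, 220, 200]

print(bang_bang_duty(P_STATE_POWER_W, 250))  # (2, 3, 0.5)
print(bang_bang_duty(P_STATE_POWER_W, 275))  # (1, 2, 0.75)
```

For a 250 watt target the loop would alternate between P2 (260W) and P3 (240W), spending half the time in each, so the average comes out at 250 watts.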

NM.png

From a data center perspective, regulating power consumption of just a single server is not an interesting capability.  We need the means to control servers as a group, and just as we were able to obtain power supply readouts for one server, we need to monitor the power for the group of servers to allow meeting a global power target for that group of servers.  This function is provided by a software development kit (SDK), the Intel(r) Data Center Manager or Intel DCM for short. Notice that DCM implements a feedback control mechanism very similar to the mechanism that regulates power consumption for a single server, but at a much larger scale.  Instead of watching one or two power supplies, DCM oversees the power consumption of multiple servers or "nodes", whose number can range up to thousands.

 

dcm.png

 

Intel DCM was purposely architected as an SDK, a building block for industry players to build more sophisticated and valuable capabilities for the benefit of data center operators. One possible application is shown below, where Intel DCM has been integrated into a Building Management System (BMS) application. Some Node Manager-enabled servers come with inlet temperature sensors. This allows the BMS application to monitor the inlet temperature of a group of servers, and if the temperature rises above a certain threshold, it can take a number of measures, from throttling back the power consumed to reduce the thermal stress on that particular area of the data center, to alerting system operators. The BMS can also coordinate the power consumed by the server equipment with other facility controls, for instance the CRAC fan speeds.
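A threshold policy of the kind described here might look like the sketch below. The temperature thresholds, the zone names and the action labels are all assumptions for illustration; a real BMS integration would go through the DCM SDK, whose calls are not shown here.

```python
def thermal_action(inlet_c, throttle_at_c=27.0, alert_at_c=32.0):
    """Pick a response for one group of servers from its inlet temperature."""
    if inlet_c >= alert_at_c:
        return "alert_operators"
    if inlet_c >= throttle_at_c:
        return "reduce_power_cap"
    return "no_action"


# Hypothetical inlet temperatures in degrees Celsius, one per zone.
zone_inlet_c = {"zone-a": 24.5, "zone-b": 28.0, "zone-c": 33.1}

for zone, temp in zone_inlet_c.items():
    print(zone, thermal_action(temp))
# zone-a no_action
# zone-b reduce_power_cap
# zone-c alert_operators
```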

 

DataCenter.png

With this discussion we have barely begun to scratch the  surface of the capabilities from the family of technologies implementing power management.  In subsequent notes we'll dig deeper into each of the components and explore how they are implemented, how these technologies can be extended and the extensive range of uses for which they can be applied.

 

