
The Server Room Blog

13 Posts tagged with the node_manager tag

Prior to the Intel Xeon 5500 Series server platforms*, measuring server power required expensive equipment and could only be done as isolated, one-off measurements. Unless you had tons of monitoring equipment to mash up your power data, it was a tedious process. Now, using Intel DCM and Node Manager, you can pull multiple servers' worth of power data and make some important power decisions in your datacenter.

 

First of all, you need to baseline your workload. If you're confident that you can replicate workload patterns, then you've got a starting point. Otherwise, it's usually a good idea to start monitoring and look for cyclical patterns and/or common data points (time, power, thermals, etc.) to keep track of.

 

In this scenario (like in my last blog), we're using a SQL workload that can be tuned to run the CPU at high utilization for a roughly fixed amount of time. The base workload runs for 7 minutes 30 seconds, as shown in the Intel DCM screen capture below.

 

[Image: Intel DCM screen capture of the base workload]

In this test case, idle power for the 4 servers is 782W, and under load the power increases to 1174W, a delta of 392W. This increase occurs when work is given to the servers and the processor power states (P/T states) respond to the workload, raising frequency and voltage to increase performance. It's exactly what we've been used to seeing ever since EIST was introduced several years ago.

 

Now, what I'll show you is something that may be very interesting at scale... I will power cap the servers by 20W each, and set the Intel DCM power policy to allow only 1095W for the 4 servers in the rack.

 

[Image: Intel DCM power policy with a 20W-per-server power cap]

 

What is awesome here is that we can still finish the workload in the same 7 minutes 30 seconds.  So essentially, we have saved 80W of power for each set of 4 servers and still get the same amount of work completed!  In a large datacenter this can be HUGE in energy savings.

 

[Image: comparison of the capped and uncapped workloads in Intel DCM]

Let's do some quick math: 20W power savings per server x 10,000 servers = 200kW power savings, and you still get the work done. I hope I just helped some of you server admins get some new ideas for your next "I need a raise" talk with your manager.
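The arithmetic in this post is easy to check in a few lines of Python. This is a sketch: the 10,000-server fleet is the post's own hypothetical, and the result is peak power headroom freed by the cap, not energy; it only turns into kWh saved for the hours during which the cap is actually holding power down.

```python
# Peak power freed by a per-server power cap, using the figures from this test.
IDLE_W = 782.0            # idle power of the 4 lab servers (measured above)
LOADED_W = 1174.0         # uncapped power of the 4 servers under load
CAP_PER_SERVER_W = 20.0   # cap applied to each server
LAB_SERVERS = 4
FLEET_SERVERS = 10_000    # hypothetical large datacenter


def peak_savings_w(per_server_w: float, servers: int) -> float:
    """Peak power headroom freed when every server is capped by per_server_w."""
    return per_server_w * servers


load_delta_w = LOADED_W - IDLE_W                               # 392 W
lab_savings_w = peak_savings_w(CAP_PER_SERVER_W, LAB_SERVERS)  # 80 W
group_limit_w = LOADED_W - lab_savings_w                       # 1094 W
fleet_savings_kw = peak_savings_w(CAP_PER_SERVER_W, FLEET_SERVERS) / 1000

print(f"load delta {load_delta_w:.0f} W, lab savings {lab_savings_w:.0f} W, "
      f"group limit {group_limit_w:.0f} W, fleet savings {fleet_savings_kw:.0f} kW")
```

The computed group limit of 1094W is within a watt of the 1095W policy set in DCM for the 4 servers.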

 

*your mileage may vary, so test your own workloads and report out!


With Intel Xeon 5500 series (Nehalem) based processors, the Intel 5500 series chipset and instrumented power supplies, you can start with the most basic use case for Intel Node Manager: monitoring the power usage of your servers.

 

As you can see in the Intel Datacenter Manager (DCM) screen below, multiple servers are configured into logical units: HF2-EIL is the lab where these servers are located, Rack 1 and Rack 2 are the physical locations of the servers, and each rack contains 2 servers.

 

[Image: Intel DCM power graph for server epiitpoctbg01, 5.5-minute workload]

When you highlight one server (as above in DCM), you can see its power characteristics over a given time period: idle power, maximum power, and thermal measurements. The 'hump' in the graph is a SQL workload that creates 'work' for the server; it runs for about 5 1/2 minutes with no power capping.
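The idle, peak and workload-duration readings described above can be extracted from any series of timestamped power samples. The sketch below assumes the samples are already available as (seconds, watts) pairs; it is not the DCM export format, and the 20W margin used to detect the 'hump' is an arbitrary choice.

```python
from typing import Dict, Sequence, Tuple

Sample = Tuple[float, float]  # (seconds since start of capture, watts)


def summarize(samples: Sequence[Sample], margin_w: float = 20.0) -> Dict[str, float]:
    """Return idle (minimum) power, peak (maximum) power, and the length of
    the workload 'hump': the span from the first to the last sample that is
    more than margin_w above idle. Assumes a single hump in the capture."""
    if not samples:
        raise ValueError("need at least one sample")
    watts = [w for _, w in samples]
    idle_w, peak_w = min(watts), max(watts)
    busy_times = [t for t, w in samples if w > idle_w + margin_w]
    workload_s = busy_times[-1] - busy_times[0] if busy_times else 0.0
    return {"idle_w": idle_w, "peak_w": peak_w, "workload_s": workload_s}


# Hypothetical readings every 30 s; real data would come from DCM or the BMC.
trace = [(0, 200.0), (30, 202.0), (60, 290.0), (90, 295.0), (120, 293.0), (150, 205.0)]
print(summarize(trace))
```

With sparse samples, the measured duration is only accurate to within the sampling interval.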

 

Here's a graph of the second server in that rack running a similar workload. As you can see, the second server's power usage differs from the first's.

[Image: Intel DCM power graph for server epiitpoctbg02, 5.5-minute workload]

 

The Intel Datacenter Manager SDK console can monitor multiple systems as well. The next graph shows both servers in the rack, combining their power usage over the same timeframe.

[Image: Intel DCM power graph for one rack (2 servers), 5.5-minute workload]

Finally, here is the last graph, showing the combined power of all 4 servers in Rack #1 and Rack #2. It shows the maximum power used during the workload, the minimum (idle) power, and the inlet temperature in the lab, something that could not be done before without expensive equipment in the datacenter.
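The logical grouping used here (lab, racks, servers) maps naturally onto a tree whose totals roll up from the leaves. The sketch below is a hypothetical model, not the DCM SDK API; the readings and the names of the Rack 2 servers are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Group:
    """A node in a logical hierarchy such as lab > rack > server.
    Servers (leaves) carry a power reading; groups sum their children."""
    name: str
    watts: Optional[float] = None
    children: List["Group"] = field(default_factory=list)

    def total_watts(self) -> float:
        if not self.children:
            return self.watts or 0.0
        return sum(child.total_watts() for child in self.children)


# Hypothetical readings, all taken at the same instant.
rack1 = Group("Rack 1", children=[Group("epiitpoctbg01", 280.0), Group("epiitpoctbg02", 270.0)])
rack2 = Group("Rack 2", children=[Group("server-03", 300.0), Group("server-04", 290.0)])
lab = Group("HF2-EIL", children=[rack1, rack2])
print(rack1.total_watts(), lab.total_watts())  # 550.0 1140.0
```

Summing readings only makes sense when they are taken at the same instant. The peak of a group is generally lower than the sum of each server's individual peak, because the servers do not all peak at the same moment.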

 

[Image: Intel DCM power graph for both racks (4 servers), 5.5-minute workload]

 

My next power-based blog will show how power capping can help you use power more efficiently for your workloads on Xeon 5500 series platforms.


The ecosystem is growing...

 

Sean Maloney's keynote presentation at IDF 2009 highlighted Intel Node Manager.  This is the video from his keynote which shows customers from Baidu, BMW, Oracle, and Telefonica, who have been working with Intel on Intel Intelligent Power Node Manager.

 

 

Check out the final slide showcasing the OEM/ODM/Console providers and customers using Intel Intelligent Power Node Manager.


I'm always looking for good ways to describe to end users how Intel Intelligent Power Node Manager relates to everyday activities. Over the weekend, I was helping a buddy of mine move to a new home, and of course we rented a truck. While we were driving, we noticed a cool gauge on the dash and a pretty simple sticker describing what it does:

[Image: "Keep it in the Green" gauge sticker]

 

Keep it in the Green - what a simple concept! Most everyone can relate the gas pedal in a vehicle directly to gas mileage. If you have a lead foot, you burn more gas. But people who want to conserve, and keep it green, use cruise control.

 

Well, Intel servers can also be managed to optimize the energy consumed by the platform. Power-optimized servers using Xeon 5500 Series processors (Nehalem) and the Intel 5500 series chipset, in conjunction with Node Manager, are like cruise control: you set your "speed" and the servers hold that maximum speed. It's all managed via P/T states using Intel Datacenter Manager.
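To make the cruise-control analogy concrete, here is a toy control loop that steps between performance states to hold power under a set point. It only illustrates the idea: Node Manager's actual algorithm runs in platform firmware and is not described here, and the P-state power table below is hypothetical.

```python
# Hypothetical power draw of a fully loaded server in each P-state.
# Index 0 is the fastest state; these are illustrative numbers, not measurements.
P_STATE_WATTS = [300.0, 270.0, 245.0, 220.0, 200.0]


def next_p_state(current: int, measured_w: float, cap_w: float,
                 headroom_w: float = 15.0) -> int:
    """One step of a cruise-control style loop: slow down while over the cap,
    speed back up only when there is comfortable headroom below it."""
    slowest = len(P_STATE_WATTS) - 1
    if measured_w > cap_w and current < slowest:
        return current + 1   # over the cap: move to a slower, lower-power state
    if measured_w < cap_w - headroom_w and current > 0:
        return current - 1   # well under the cap: move back toward full speed
    return current           # inside the band: hold the current "speed"


state = 0
for _ in range(10):
    state = next_p_state(state, P_STATE_WATTS[state], cap_w=250.0)
print(state, P_STATE_WATTS[state])  # 2 245.0
```

The headroom band is what keeps the loop from oscillating: without it, a server sitting just under the cap would step up, exceed the cap, and step right back down.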

 

Of course, at times the RED ZONE is needed - work needs to get DONE - so you throttle up, kick in the Turbo Boost and release that power cap!  But there are also times when all that energy isn't needed - so you lift your foot off the gas pedal, and set your speed for the work that needs to be done. Intel Xeon based servers can transition to higher/lower power states using technologies like EIST, DBS, and Node Manager.

 

Keep an eye out for more data on Intel and server power management at the Intel Developer Forum 2009.

 

Cloud Power Management with Intel® Microarchitecture (Nehalem) Processor-based Platforms

 

Check Twitter for more details @IDF and @IntelNews and search #IDF09

* Disclaimer, giving credit where credit is due: U-Haul owns that sticker and tagline!


In spite of significant gains in server energy efficiency, power consumption in data centers is still trending up.  At the very least, we can make sure that the energy expended yields maximum benefit to the business.  A first step in managing power in the servers in a data center is having a fairly accurate monitoring capability for power consumption.  The second step is to have a number of levers that allow using the monitoring data to carry out an effective power management policy.

 

While we may not be able to stem the overall growth of power consumption in the data center, there are a number of measures we can take immediately:

· Implement a peak shaving capability. The data center power infrastructure needs to be sized to meet the demands of peak power, so reducing peaks effectively increases the utilization of the existing power infrastructure.

· Be smart about shifting power consumption peaks. Not all watts are created equal: the incremental cost of generating an extra watt of power during peak consumption hours is much higher than that of the same watt generated in the wee hours of the morning. For most consumers and smaller commercial accounts, flat-rate pricing still prevails. Real-time pricing (RTP) and negotiated SLAs will become more common as ways to put the appropriate economic incentives in place. The incentive of real-time pricing is a lower energy bill overall, although the outcome is not guaranteed: in pilot programs, residential consumers have complained that RTP resulted in higher electricity costs. With negotiated SLAs, the customer can designate a workload to be subject to lower reliability; for instance, instead of three 9's (99.9 percent availability, or outages amounting to just under 9 hours per year), the low-reliability workload can be designated as only 90 percent reliable, meaning it can be out an average of about 2.4 hours per day.
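The availability figures in the SLA example convert to expected downtime with a single multiplication. A minimal sketch, assuming a 365-day year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(availability: float, period_hours: float) -> float:
    """Expected outage time over a period for a given availability (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * period_hours


three_nines_per_year = downtime_hours(0.999, HOURS_PER_YEAR)  # about 8.76 h/year
ninety_percent_per_day = downtime_hours(0.90, 24)             # about 2.4 h/day
print(f"{three_nines_per_year:.2f} h/year, {ninety_percent_per_day:.1f} h/day")
```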

· Match the electric power infrastructure in the data center to server workloads to minimize over-provisioning.  This approach assumes the existence of an accurate power consumption monitoring capability.

· Upgrading the electrical power infrastructure to accommodate additional servers is not an option in most data centers today. Landing additional servers at a facility that's working at the limit of its thermal capacity leads to the formation of hot spots, and that assumes the electrical capacity limits are not reached first, leaving no room in certain branch circuits. Hence, measures that work within the existing power infrastructure are preferable to alternatives that require additional infrastructure.

 

 

For the purposes of data center strategic planning, it may make economic sense to grow large data centers in a modular fashion. If the organization manages a number of data centers, consider making effective use of the existing ones, and when new construction is justified, redistribute the workloads to the new data center to make the most of the new electrical supply infrastructure.

 

Intel has built into its server processor lineup a number of technology ingredients that allow data center operators to optimize the utilization of the available power infrastructure in the data center.

 

 

Newer servers of the Nehalem generation are much more energy efficient, if only as a side effect of increased performance per watt. These servers also have a more aggressive implementation of power proportional computing: typical idle consumption is on the order of 50 percent of peak power consumption.

 

 

Beyond passive mechanisms that do not require explicit operator intervention, Intel® Intelligent Power Node Manager (Node Manager) technology allows adjusting the power draw of a server, trading off power consumption against performance. This capability is also known as power capping. The control range is a function of server loading. For the Intel S5520UR baseboard in a 2U chassis, the server draws about 300 watts at full load, and its power consumption can be rolled down to about 200 watts. The control range tapers down gradually until it reaches zero at idle.
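The statement that the control range tapers to zero at idle can be turned into a simple model. The sketch below assumes, purely for illustration, that both the uncapped power and the lowest achievable capped power vary linearly with load, anchored at the figures quoted above (about 300W uncapped and 200W capped at full load) and at an idle draw of about 50 percent of peak. The real curve depends on the server and the workload and is not necessarily linear.

```python
FULL_LOAD_W = 300.0          # approximate full-load draw quoted for the test server
CAPPED_FULL_LOAD_W = 200.0   # approximate lowest power reachable at full load
IDLE_W = 0.5 * FULL_LOAD_W   # idle draw of roughly 50 percent of peak


def uncapped_w(load: float) -> float:
    """Assumed linear power draw between idle (load 0) and full load (load 1)."""
    return IDLE_W + (FULL_LOAD_W - IDLE_W) * load


def lowest_cap_w(load: float) -> float:
    """Assumed lowest achievable power: equal to idle power at load 0
    (no control range), falling to CAPPED_FULL_LOAD_W at full load."""
    return IDLE_W + (CAPPED_FULL_LOAD_W - IDLE_W) * load


def control_range_w(load: float) -> float:
    """Watts that capping can remove at a given load (0..1)."""
    return uncapped_w(load) - lowest_cap_w(load)


for load in (0.0, 0.5, 1.0):
    print(load, control_range_w(load))  # prints 0.0 0.0, then 0.5 50.0, then 1.0 100.0
```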

 

 

For power monitoring, selected models of the current Nehalem generation come with PMBus specification compliant power supplies allowing real-time power consumption readouts.

 

 

The Node Manager power monitoring and capping capabilities apply to a single server. To make them really useful, it is necessary to exercise them collectively across groups of servers, to add the notion of events, and to build a historical record of power consumption for the servers in a group. These additional capabilities have been implemented in software through the Data Center Manager Software Development Kit, developed by the Intel Solutions and Software Group. An additional Software Development Kit, Cache River, allows programmatic access to components in servers and server building blocks produced by the Intel Enterprise Products Server Division (EPSD), including the baseboard management controller (BMC) and the management engine (ME), the subsystems that host or interact with the Node Manager firmware. EPSD products are incorporated in many OEM and system integrator offerings.

 

Data Center Manager implements abstractions that apply to collections of servers:

·  A hierarchical notion of logical server groups

·  Power management policies bound to specific server groups

·  Event management and a publish/subscribe facility for acting upon and managing power and thermal events.

·  A database for logging a historical record for power consumption on the collection of managed nodes.
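Of these abstractions, the event facility is the easiest to picture in code. The sketch below is a minimal, hypothetical publish/subscribe bus, with a list standing in for the historical database; the event name and payload fields are invented for illustration and are not the DCM SDK's API.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class EventBus:
    """Minimal publish/subscribe facility: handlers subscribe to an event
    type and receive every event of that type that is later published."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        self._subscribers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        """Deliver payload to all subscribers; return how many received it."""
        handlers = self._subscribers[event_type]
        for handler in handlers:
            handler(payload)
        return len(handlers)


history: List[dict] = []   # stand-in for the power history database
alerts: List[str] = []     # stand-in for an operator notification

bus = EventBus()
bus.subscribe("power_threshold", history.append)
bus.subscribe("power_threshold", lambda e: alerts.append(f"{e['group']} at {e['watts']} W"))
delivered = bus.publish("power_threshold", {"group": "Rack 1", "watts": 610})
print(delivered, alerts)  # 2 ['Rack 1 at 610 W']
```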

 

 

The abstractions implemented by DCM on top of Node Manager allow the implementation of power management use cases that involve up to thousands of servers.

 

If this topic is of interest to you, please join us at the Intel Developer Forum in San Francisco at the Moscone Center on September 22-24. I will be facilitating course PDCS003, "Cloud Power Management with the Intel(r) Xeon(r) 5500 Series Platform." You will have the opportunity to talk with some of our fellow travelers who are developing power management solutions using Intel technology ingredients and get a feel for their early experience. Also, please make a note to visit booths #515, #710 and #712 to see demonstrations of early end-to-end solutions these folks have put together.


Intel Intelligent Power Node Manager is a new technology available with the Xeon 5500 Series platforms released earlier this year. Many of you have asked me via Twitter (@Toadster) "How can I use Node Manager?", so I wanted to present some simple use cases that explain Node Manager and how you can best use the technology in your own enterprise.

 

First of all, let's explain the growth problem at hand. As servers shrink in size, the power density of each server 'footprint' is growing. A few years ago, a single 42U rack could hold about 21 servers (assuming 2U servers), usually hosting one or two apps per physical server, depending on whether you had single- or dual-socket servers. In modern datacenters, that same 42U rack can hold 42 servers (1U each) with 2 sockets per server, so you have an immediate density increase of 2X the number of servers and 2-4X the number of sockets, which can equate to 16X the number of processor threads per rack. One good thing is that Intel has been developing newer technologies to keep the TDP of each CPU roughly the same between processor updates: where you used to have 2 or 4 cores, you now have 8 to 16 cores in the same thermal envelope!
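The 16X figure follows from multiplying out servers, sockets, cores and hardware threads. The configuration below is an assumption chosen to reproduce it: the older rack uses single-socket, dual-core processors without Hyper-Threading, and the newer rack uses quad-core Xeon 5500 processors with Hyper-Threading enabled (not every Xeon 5500 SKU has Hyper-Threading).

```python
def threads_per_rack(servers: int, sockets: int, cores: int, threads_per_core: int) -> int:
    """Hardware threads in a rack = servers x sockets x cores x threads per core."""
    return servers * sockets * cores * threads_per_core


older = threads_per_rack(servers=21, sockets=1, cores=2, threads_per_core=1)  # 2U servers
newer = threads_per_rack(servers=42, sockets=2, cores=4, threads_per_core=2)  # 1U servers
print(older, newer, newer / older)  # 42 672 16.0
```

If the older servers were already dual-socket, the same calculation gives 8X, which is why 16X is the upper end of the range.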

 

Knowing how much power your platform uses is a key factor in populating racks and rows in your datacenter. Before Node Manager technology, most datacenter managers would base rack population on 'nameplate' power, the wattage rating on your power supply. That's the maximum power the platform can draw and what the PSU is rated for (worst case). See the image below...

 

[Image: NM Use Case - Using Actual Power Data to Increase Rack Density]

As you can see, with Intel Intelligent Power Node Manager technology you can view your system's power utilization in real time using Intel Datacenter Manager, and the administrator can set power caps to ensure your server rack stays within your required power limits. By using 'actual' power instead of nameplate power, you can increase your rack density, thereby increasing your ROI and decreasing your TCO! Let's face it, everyone loves saving money!
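The effect of sizing racks by measured (or capped) power instead of nameplate power is simple arithmetic. In the sketch below, the rack power budget and both per-server figures are hypothetical; with Node Manager, the per-server figure you can rely on is the cap you enforce, since the cap keeps each server at or below it.

```python
import math


def servers_per_rack(rack_budget_w: float, per_server_w: float,
                     rack_units: int = 42, server_units: int = 1) -> int:
    """Servers that fit in a rack, limited by whichever runs out first:
    the power budget or the physical space."""
    by_power = math.floor(rack_budget_w / per_server_w)
    by_space = rack_units // server_units
    return min(by_power, by_space)


BUDGET_W = 8000.0  # hypothetical rack power budget
print(servers_per_rack(BUDGET_W, per_server_w=650.0))  # nameplate sizing: 12
print(servers_per_rack(BUDGET_W, per_server_w=300.0))  # capped sizing: 26
```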

 

Many of us are familiar with this next scenario... it's summertime, and the power company is announcing that the power grid is under strain. Homes start having their A/C cut off to save the grid from brown-outs... now your enterprise can help reduce those risks as well!

 

[Image: NM Use Case - On-Demand Power Reduction]

 

Over the next few weeks, I hope to post more blogs/videos:

 

1. Single Node Power Monitoring & Management
2. Group/Rack Power Monitoring & Management
3. Thermal Monitoring & Management

 

Please provide some feedback, and post your questions and ideas for upcoming blogs!


I would like to elaborate on the topic of energy vs. power management from my previous entry.

 

   

 

Upgrading the electrical power infrastructure to accommodate additional servers is not an option in most data centers today. Landing additional servers at a facility that's working at the limit of its thermal capacity leads to the formation of hot spots, and that assumes the electrical capacity limits are not reached first, leaving no room in certain branch circuits.

 

   

 

There are two types of potentially useful figures of merit, one for power management and one for energy management.  A metric for power management allows us to track operational "goodness", making sure that power draw never exceeds limits imposed by the infrastructure.  The second metric tracks power saved over time, which is energy saved.  Energy not consumed goes directly to the bottom line of the data center operator.

 

     

To understand the dynamic between power and energy management, let's look at the graph below and imagine a server without any power management mechanisms whatsoever. The power consumed by that server would be P(unmanaged) regardless of operating conditions. Most servers today have a number of mechanisms operating concurrently, and hence the actual power consumed at any given time t is P(actual)(t). The difference P(unmanaged) - P(actual)(t) is the power saved. The power saved, accumulated (integrated) over time from t(1) through t(2), yields the energy saved.
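Numerically, the energy saved is the area between P(unmanaged) and P(actual)(t) from t(1) to t(2). A minimal sketch that estimates it from sampled readings with the trapezoidal rule; the sample values are hypothetical.

```python
from typing import Sequence


def energy_saved_wh(times_s: Sequence[float], p_unmanaged_w: float,
                    p_actual_w: Sequence[float]) -> float:
    """Trapezoidal estimate of the integral of P(unmanaged) - P(actual)(t)
    from t(1) (first sample) to t(2) (last sample), in watt-hours."""
    if len(times_s) != len(p_actual_w):
        raise ValueError("times and readings must have the same length")
    joules = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        saved_prev = p_unmanaged_w - p_actual_w[i - 1]
        saved_curr = p_unmanaged_w - p_actual_w[i]
        joules += 0.5 * (saved_prev + saved_curr) * dt   # watts x seconds
    return joules / 3600.0                              # joules -> Wh


# Hypothetical: a 300 W unmanaged server that actually drew 200 W for one hour.
print(energy_saved_wh([0, 1800, 3600], 300.0, [200.0, 200.0, 200.0]))  # 100.0
```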

 

 

 

[Figure: energy saved as the area between P(unmanaged) and P(actual) from t(1) to t(2)]

Please note that a mechanism that yields significant power savings may not necessarily yield high energy savings. For instance, applying Intel(r) Dynamic Power Node Manager (DPNM) can potentially bring power consumption down by about 100 watts, from 300 watts at full load to 200 watts, in a dual-socket 2U Nehalem server that we tested in our lab. However, if DPNM is used as a guard rail mechanism, limiting power consumption only when a certain threshold is violated, DPNM may never kick in, and hence energy savings will be zero for practical purposes. We use DPNM this way because it works best only under certain operating conditions, namely high loading factors, and because it works through frequency and voltage scaling, it brings a performance tradeoff.

 

   

 

Another useful figure of merit for power management is the dynamic range for power proportional computing.  Power consumption in servers today is a function of workload as depicted below:

 

[Figure: server power consumption as a function of workload, showing P(baseline) and P(spread)]

The relationship is not always linear, but the figure illustrates the concept. On the x-axis we have the workload, which ranges from 0 to 1, that is, 0 to 100 percent. P(baseline) is the power consumption at idle, and P(spread) is the power proportional computing dynamic range between P(baseline) and the power consumption at 100 percent workload. A low P(baseline) is better because it means low power consumption at idle. For a Nehalem-based server, P(baseline) is roughly 50 percent of the power consumption at full utilization, which is remarkable, considering that it represents an improvement of about 20 percentage points over the roughly 70 percent we observed for the prior generation, Bensley-based servers. The 50 percent figure is a number we have observed in our lab for a whole server, not just the CPU alone.
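In its linear form, the figure reduces to P(workload) = P(baseline) + P(spread) x workload. The sketch below compares two hypothetical servers with the same assumed 300W full-load draw: one idling at 50 percent of full load, as observed for Nehalem, and one idling at about 70 percent, the Bensley-generation figure quoted elsewhere in this blog.

```python
def power_w(workload: float, p_baseline_w: float, p_spread_w: float) -> float:
    """Linear power model: idle power plus a share of the dynamic range."""
    if not 0.0 <= workload <= 1.0:
        raise ValueError("workload must be between 0 and 1")
    return p_baseline_w + p_spread_w * workload


FULL_LOAD_W = 300.0  # assumed full-load draw for both servers
for label, idle_fraction in (("Nehalem-like", 0.5), ("Bensley-like", 0.7)):
    baseline = idle_fraction * FULL_LOAD_W
    spread = FULL_LOAD_W - baseline
    half_load = power_w(0.5, baseline, spread)
    dynamic_range = FULL_LOAD_W / baseline
    print(f"{label}: idle {baseline:.0f} W, 50% load {half_load:.0f} W, "
          f"range {dynamic_range:.2f}:1")
```

Under these assumptions the Nehalem-like server has a 2:1 dynamic range, versus roughly 1.43:1 for the Bensley-like one.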

 

   

 

If a 50 percent P(baseline) looks outstanding, we can do even better for certain application environments, such as load-balanced front-end Web server pools and cloud services implemented through clustered, virtualized servers. We can achieve this effect through the application of platooning. For instance, consider a pool of 16 servers. If the pool is idle, all the servers except one can be put to sleep. The single idle server consumes only half the power of a fully loaded server, that is, one half of one sixteenth of the full cluster power. The dormant servers still draw about 2 percent of full power each. Doing the math, (0.5 + 15 x 0.02) / 16, the total power consumption for the cluster at idle comes to about 5 percent of the full cluster power consumption; even if each sleeping server draws a more conservative 5 percent, the figure is only about 8 percent. Hence, for a clustered deployment, the power dynamic range increases from 2:1 for a single server to roughly 12:1 to 20:1 for the cluster as a whole.
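The platooning arithmetic for the 16-server pool can be parameterized by how much each sleeping server still draws. A minimal sketch; the sleep-power fractions passed in are assumptions (2 percent, and a more conservative 5 percent), and the results are fractions of full-load cluster power.

```python
def cluster_idle_fraction(n_servers: int, idle_fraction: float,
                          sleep_fraction: float, awake: int = 1) -> float:
    """Cluster power at idle, as a fraction of full-load cluster power, when
    `awake` servers idle and the rest sleep. Per-server fractions are
    relative to one fully loaded server."""
    asleep = n_servers - awake
    return (awake * idle_fraction + asleep * sleep_fraction) / n_servers


for sleep_fraction in (0.02, 0.05):
    f = cluster_idle_fraction(16, idle_fraction=0.5, sleep_fraction=sleep_fraction)
    print(f"sleep at {sleep_fraction:.0%}: idle cluster {f:.1%}, range {1 / f:.1f}:1")
```

With 2 percent sleep power this gives 5 percent and a 20:1 range; with 5 percent, about 7.8 percent and a 12.8:1 range.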

 

   

 

In the figure below, note that each platoon is defined by the application of a specific technology, or a specific state within a technology. This way it is possible to optimize system behavior around the particular operational limitations of each technology. The graph below is a generalization of the platooning graph in the prior article. For instance, a power-capped server will impose certain performance limitations on workloads, and hence we assign non-time-critical workloads to that platoon. By definition, an idling server cannot have any workloads; the moment a workload lands on it, it's no longer idle, and its power consumption will rise.

 

   

 

The CPU is not running in any S-state other than S0. The selection of a specific state depends on how fast that particular server is needed online: it takes longer to bring a server online from the lower energy states. Servers in G3 may actually be unracked and put in storage for seasonal equipment allocation.

 

   

 

A virtualized environment makes it easier to rebalance workloads across active (unconstrained and power-capped) servers. If servers are being used as CPU cycle engines, it may be sufficient to idle or put to sleep the subset of servers not needed.

 

[Figure: power state transitions between platoons]

 

The extra dynamic power range comes at the expense of additional processes and operational complexity. However, please note that there are immediate benefits in power and energy management accrued through a simple equipment refresh. IBM reports an 11X performance gain for Nehalem-based HS22 blade servers versus the HS20 model, which is only three years older. Network World reports a similar figure: a ten-fold increase in performance, not just ten percent.

 

   

 

I will be elaborating on some of these ideas in the PDCS003 Cloud Power Management with the Intel(r) Nehalem Platform class at the upcoming Intel Developer Forum in San Francisco the week of September 20th. Please consider yourself invited to join me if you are planning to attend this conference.


54 days to Fall IDF in SFO! Perhaps I should be a bit less enthusiastic, as during the next two months I will be extremely busy working on courses, presentations, demos, web updates and new collateral pieces highlighting Intel’s contributions to server and data center instrumentation, data center efficiency and eco-technology. In addition to those responsibilities, I have taken on ownership of driving a technology blogging program at IDF, with server technology experts sharing their insights here on Server Room. It's an opportunity that I am very excited about, but I need your help.

My question to you today is: what would you like to see covered in the technology blogs from IDF? I am starting the process of recruiting “volunteers” to participate, and understanding what you want to see discussed will help me get the right people to cover the topics that are compelling to you, and hopefully facilitate an interesting dialog that will help you better understand server technologies. Since it's easy to self-recruit, you will definitely see a blog from me covering instrumentation, Intel Intelligent Power Node Manager and other related technology news @ IDF.

So what do you specifically want to see covered in the IDF blogs? I look forward to your input and hope to see you at IDF!


Dave


One recurring question I've been hearing from end users who are testing or evaluating Intel Intelligent Power Node Manager (Node Manager) is "How do we turn it on or off?" To put it simply, when you have a Node Manager capable platform, you can simply put it to work and let your power policies decide when to enable or disable the features...

 

So let me step things back a bit and talk about the technology itself first. Node Manager is very much like any *T technology that Intel has deployed over the past several years: it's an ingredient, or in this scenario a mix of ingredients, available at the platform level. Here are the 'ingredients' that, when combined, give you the ability to monitor and manage power, and in some cases monitor thermal events.


        • The platform is based on an Intel 5500 series chipset (codename Tylersburg-EP) server board
        • Xeon 5500 Series Processors (codename Nehalem EP)
        • Node Manager enabled firmware with the Manageability Engine
        • Server chassis components that meet IPMI 2.0 specifications for monitoring (e.g. thermal monitoring)
        • PMBus power supply: this communicates with the Baseboard Management Controller (BMC) to report platform power usage

 

For those of you wanting to get your hands on this technology TODAY, check out the Intel server lineup:

  • Intel® Server Board S5500WB (codename Willowbrook), which is optimized for IPDC deployment, supports IPMI 2.0 and Intel Intelligent Power Node Manager, and can also support the Data Center Manageability Interface (DCMI) 1.0 specification.
  • Intel® Server Board S5520UR (codename Urbanna), the mainstream Enterprise platform, which supports IPMI 2.0 and Intel Intelligent Power Node Manager.

 

Both platforms work in conjunction with Intel® Data Center Manager (Intel® DCM), an SDK that provides power and thermal monitoring and management. This SDK allows group- and policy-based management for single server, rack, logical group, lab, or whole datacenter models.

 

Ok - so that reads like a bunch of marketing stuff... but here's the 'guts' of the technology...

[Figure: Node Manager functionality]

When you purchase a Node Manager enabled server, there are a few simple steps to take to set things up to monitor/manage your server.

 

Most likely you'll need to set up your BMC. For our servers, Intel provides a CD-based tool to help with this, called the Intel Deployment Assistant. This lightweight, bootable CD can set up the most common BIOS settings, check firmware versions and update them via an Internet connection to ensure you have the latest BIOS, BMC, ME and sensor firmware. Each OEM will have its own methods, but they should be similar in function when it comes to setting up the server for monitoring.

 

The BMC needs an IP address, netmask, and default gateway set up, and per the IPMI specification you can also set administrative (user) access rights if you would like to tighten down security a bit. Once you have these access points set up, you can use standard IPMI commands to communicate with your server, or use Intel DCM to really 'visualize' the capabilities of Node Manager.
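If you prefer the command line, the BMC network settings described above can be applied with the open-source ipmitool utility. The sketch below only builds the command lines (it does not run them), assumes LAN channel 1 (the channel number varies by platform) and uses documentation-range addresses; it covers generic IPMI LAN configuration, not Node Manager-specific commands.

```python
from typing import List


def bmc_lan_commands(ip: str, netmask: str, gateway: str, channel: int = 1) -> List[List[str]]:
    """ipmitool invocations, run in-band on the server as root, that give
    the BMC a static IP address, netmask and default gateway."""
    ch = str(channel)
    return [
        ["ipmitool", "lan", "set", ch, "ipsrc", "static"],
        ["ipmitool", "lan", "set", ch, "ipaddr", ip],
        ["ipmitool", "lan", "set", ch, "netmask", netmask],
        ["ipmitool", "lan", "set", ch, "defgw", "ipaddr", gateway],
    ]


def remote_sensor_query(bmc_ip: str, user: str) -> List[str]:
    """Out-of-band sensor listing over IPMI 2.0 (lanplus). The -E flag reads
    the password from the IPMI_PASSWORD environment variable."""
    return ["ipmitool", "-I", "lanplus", "-H", bmc_ip, "-U", user, "-E", "sdr"]


for cmd in bmc_lan_commands("192.0.2.10", "255.255.255.0", "192.0.2.1"):
    print(" ".join(cmd))
print(" ".join(remote_sensor_query("192.0.2.10", "admin")))
```

To execute the commands, pass each list to `subprocess.run(cmd, check=True)`.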

 

Here's a great demo video showcasing some of the Node Manager & Intel DCM use cases:

 

How many of you have worked with IPMI management before?

 

The technology has been around for a while, but now Intel has put automation and policy-based management features into the platform, thereby reducing costs, increasing responsiveness to power policies, and making Xeon servers more energy efficient than before. Many of our customers are asking for Node Manager enabled servers; is your OEM on track to deliver?


There are two technologies available to regulate power consumption in the recently introduced Nehalem servers using the Intel® Xeon® processor 5500 series.  The first is power proportional computing where power consumption varies in proportion to the processor utilization.  The second is Intel® Dynamic Power Node Manager (DPNM) technology which allows the setting of a target power consumption when a CPU is under load.  The power capping range increases with processor workload.

 

An immediate benefit of Intel® Dynamic Power Node Manager (DPNM) technology is the capability to balance and trade off power consumption against performance in deployed Intel Nehalem generation servers. Nehalem servers have a more aggressive implementation of power proportional computing, where idle power consumption can be as low as 50 percent of the power under full load, down from about 70 percent in the prior (Bensley) generation. Furthermore, the observed power capping range under full load when DPNM is applied can be as large as 100 watts for a two-socket Nehalem server with the Urbanna baseboard, which we observed in the lab to draw about 300 watts under full load. The actual numbers you will obtain depend on the server configuration: memory, number of installed hard drives, and the number and type of processors.

   

Does this mean that it will be possible to cut electricity bills by one third to one half using DPNM? That is a bit optimistic. A typical use case for DPNM is as a "guard rail": it is possible to set a not-to-exceed target for the power consumption of a server, as shown in the figure below. The red line in the figure represents the guard rail. The white line represents the actual power demand as a function of time; the dotted line represents the power consumption that would have existed without power management.

 

[Figure: power capping guard rail]

 

Enforcing this power cap brings operational flexibility: it is possible to deploy more servers to fit a limited power budget to prevent breakers from tripping or to use less electricity during peak demand periods.

 

 

There is a semantic distinction between energy management and power management.  Power management in the context of servers deployed at a data center refers to a capability to regulate the power consumption at a given instant.  Energy management refers to the accumulated power saved over a period of time.

 

The energy saved through the application of DPNM is represented by the area between the dotted line and the white graph line in the figure; the energy consumed by the server is represented by the area under the solid white graph line. Since power capping is in effect only during relatively short periods, and when it is in effect the area between the dotted line and the guard rail is relatively small, it follows that the energy saved through the application of DPNM is small.
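The guard-rail argument can be checked numerically: the capped power is the smaller of demand and the cap, and the energy saved is the sum of the clipped-off excess. A sketch with a hypothetical one-hour demand trace sampled every minute, in which demand exceeds a 260W cap for only five minutes:

```python
from typing import Sequence, Tuple


def guard_rail(demand_w: Sequence[float], cap_w: float, dt_s: float) -> Tuple[float, float]:
    """Apply a not-to-exceed cap to a demand trace sampled every dt_s seconds.
    Returns (energy saved, energy consumed) in Wh, using the rectangle rule."""
    actual_w = [min(d, cap_w) for d in demand_w]
    saved_wh = sum(d - a for d, a in zip(demand_w, actual_w)) * dt_s / 3600.0
    consumed_wh = sum(actual_w) * dt_s / 3600.0
    return saved_wh, consumed_wh


demand = [220.0] * 55 + [300.0] * 5   # hypothetical: 55 quiet minutes, 5 busy ones
saved, consumed = guard_rail(demand, cap_w=260.0, dt_s=60.0)
print(f"saved {saved:.1f} Wh of {saved + consumed:.1f} Wh ({saved / (saved + consumed):.1%})")
```

Even though the cap shaves 40W off the peak, it saves only about 1.5 percent of the hour's energy in this example.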

   

One mechanism for achieving significant energy savings calls for dividing a group of servers running an application into pools or "platoons".  If servers are placed in a sleeping state (ACPI S5 sleep) during periods of low utilization it is possible to bring their power consumption to less than 5 percent of their peak power consumption, basically just the power needed to keep the network interface controller (NIC) listening for a wakeup signal.

[Figure: server platooning]

As the workload diminishes, additional servers are moved into a sleeping state.  The process is reversible whereby servers are taken from the sleeping pool to an active state as workloads increase.  The number of pools can be adjusted depending on the application being run.  For instance, it is possible to define a third, intermediate pool of power capped servers to run lower priority workloads.  Capped servers will run slightly slower, depending on the type of workload. 
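Sizing the platoons can be sketched as a small planning function. Everything here is a hypothetical illustration: demand is expressed in units of one fully loaded, uncapped server, and the 0.8 relative throughput of a capped server is an assumed placeholder, since the real slowdown depends on the workload.

```python
import math
from typing import Dict


def plan_platoons(high_priority: float, low_priority: float, n_servers: int,
                  capped_throughput: float = 0.8, min_active: int = 1) -> Dict[str, int]:
    """Split a pool into active (uncapped), power-capped and sleeping platoons.

    high_priority, low_priority: demand in units of one fully loaded,
    uncapped server. Low-priority work runs on capped servers, each assumed
    to deliver capped_throughput of an uncapped server's work."""
    active = max(min_active, math.ceil(high_priority))
    capped = math.ceil(low_priority / capped_throughput)
    if active + capped > n_servers:
        raise ValueError("demand exceeds the capacity of the pool")
    return {"active": active, "capped": capped, "asleep": n_servers - active - capped}


print(plan_platoons(5.2, 1.6, 16))  # {'active': 6, 'capped': 2, 'asleep': 8}
print(plan_platoons(0.0, 0.0, 16))  # {'active': 1, 'capped': 0, 'asleep': 15}
```

Re-running the plan as demand changes gives the reversible movement between pools; a real controller would also add hysteresis so that servers are not repeatedly woken and put back to sleep.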

 

Implementing this scheme can be logistically complex.  Running the application in a virtualized environment can make it considerably easier because workloads in low use machines can be migrated and consolidated in the remaining machines.

We are conducting experiments to assess the potential for energy savings. Initial results indicate that these savings can be significant. If you, dear reader, have been working in this space, I'd be very interested in learning about your experience.

 



Recently our test lab experienced a cooling failure... and I wasn't even in the lab to notice it.  In fact, I wasn't even in the same state!

 

With the recent launch of the Xeon 5500 Series servers, I was testing some use cases against four of the servers in our lab when I noticed that the temperature in there was rising drastically.  How did I see this?  Intel® Intelligent Power Node Manager, embedded in our Xeon servers, reports the readings, and our Intel Data Center Manager (DCM) SDK software interface presents the data in a visual format.

thermal trip.JPG

In the graph above, the dark line is the "front panel inlet" temperature: in a matter of minutes, the temperature in the lab rose from 71F to 87F, a 16-degree jump!  What I didn't have set up in this scenario was a power policy that activates on a thermal trip.  Here is how you would set up such a policy in Data Center Manager, under the Policies section for this rack:

 

thermal-policy.JPG

If a thermal event caused the room to heat up to 78F (the trigger shown above), Intel DCM would send IPMI commands to the platform, which in turn would tell the Node Manager firmware to throttle back the Xeon CPUs to their lowest possible P-state.  This would reduce the energy consumed across the systems in the policy group as well as the thermal output of each server, placing less heat load on an already overheated lab or datacenter.
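The trigger logic of such a policy can be sketched as follows. This is a hypothetical illustration, not DCM's implementation: thermal_policy is a made-up name, the 78F trip point comes from the policy above, and the 75F release point (hysteresis, so servers don't flap between P-states) is an added assumption. The comments mark where DCM would issue its IPMI requests.

```python
def thermal_policy(inlet_temps_f, trip_f=78.0, release_f=75.0):
    """Return, per inlet reading, whether the throttle policy is engaged.

    Engages when the front-panel inlet temperature reaches trip_f. As an
    added assumption, it releases only once the reading falls below
    release_f (hysteresis), so servers don't flap between P-states.
    """
    engaged = False
    states = []
    for temp in inlet_temps_f:
        if not engaged and temp >= trip_f:
            engaged = True   # here DCM would ask Node Manager for the lowest P-state
        elif engaged and temp < release_f:
            engaged = False  # here the normal power policy would be restored
        states.append(engaged)
    return states


# Made-up readings shaped like the 71F -> 87F excursion, then cooling recovers.
print(thermal_policy([71, 74, 78, 83, 87, 80, 76, 74]))
# [False, False, True, True, True, True, True, False]
```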

 

This gives server managers more time to gracefully shut down systems and/or move workloads to cooler sections of the datacenter.  If you have ever experienced a cooling failure in the datacenter, it's usually a frenzy to shut down machines to minimize heat and overall power use.  This thermal policy can give you more time before you reach the critical temperature where you start losing components and servers, and ultimately data and productivity.

 

Using the standard IPMI interface, the Data Center Manager SDK and Node Manager on the Xeon 5500 Series platform enable power monitoring, power management, and front-panel inlet temperature monitoring.  This gives a server or datacenter manager the capability to measure power usage per server, which previously required more expensive power measurement tools.  External power meters cost anywhere from a cheap $15 to a spendy $1000, but now the technology is embedded in the machine's firmware.

 

You can learn more about the Xeon 5500 Series Processors on the Intel Xeon website.


The Intel(r) Dynamic Power Node Manager technology allows setting a power consumption target for a server under load as described in a previous article.  This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.

 

Higher-level software can use this capability to implement sophisticated power management schemes, especially schemes that involve server groups.  The range of control authority for servers in the Nehalem generation is significant: the power consumption of a fully loaded server drawing 300 watts can be rolled back by roughly 100 watts.  In virtualized utility computing environments, additional control authority is possible by migrating the virtual machines out of a host and consolidating them onto fewer hosts.  The power consumption of the power-capped host, now at 200 watts, can be brought down by another 50 watts, to 150 watts.
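The control-authority figures above reduce to simple arithmetic. The sketch below, which assumes a hypothetical 6000 W rack budget and a made-up helper named servers_per_rack, shows why capping matters for rack density: sizing to the 200 W capped draw instead of the 300 W full-load draw fits more servers under the same budget, at the cost of each capped server running somewhat slower.

```python
FULL_LOAD_W = 300       # fully loaded server, from the example above
CAP_ROLLBACK_W = 100    # roughly what Node Manager capping can shave off
VACATE_SAVINGS_W = 50   # further drop once the VMs are migrated away

capped_w = FULL_LOAD_W - CAP_ROLLBACK_W     # 200 W
vacated_w = capped_w - VACATE_SAVINGS_W     # 150 W


def servers_per_rack(rack_budget_w, per_server_w):
    """How many servers fit under a rack power budget (hypothetical helper)."""
    return rack_budget_w // per_server_w


RACK_BUDGET_W = 6000  # hypothetical rack budget, for illustration only
print(servers_per_rack(RACK_BUDGET_W, FULL_LOAD_W))  # 20
print(servers_per_rack(RACK_BUDGET_W, capped_w))     # 30
```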

 

 

The reader might ask about the possibility of running servers in capped mode all the time to save energy.  Unfortunately, capping entails a performance tradeoff.  The dynamic is not unlike driving an automobile.  The best mileage is obtained by driving at a constant 35 MPH, but this is not practical on a freeway where the prevailing speed is 60 MPH.  The vehicle could be rear-ended or, for a more mundane motivation, the driver drives at 60 MPH because she wants to get there sooner.  As with a server, the lowest fuel consumption in a running vehicle, at least in gallons per hour, is attained when the vehicle is idling.  No real work is done by an idling engine, but at least the vehicle can start moving in no time.  Continuing the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.

     

This document provides an example of the performance tradeoff with power capping; see page 5, Figure 2.

 

The following example illustrates how group power capping works.  The plot is a screen capture of the Intel(r) Data Center Manager software managing the power consumption of a cluster of four servers.  The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

 

DCM-GUI.png

 

The light blue band represents the focus of the plot. The focus can be changed with a simple mouse click.  The current focus in the figure is the whole rack, so the power plot is the aggregated power for all four servers in the rack.  If the high-priority sub-group were selected, the power shown would be the power consumed by the two servers in that sub-group.  Finally, if a single server were selected, the power shown would be the power for that server only.

     

There are four lines represented in the graph.  The top line is the nameplate power, which represents an upper bound for the servers' power consumption.  For this particular group of servers the nameplate power is 2600 watts.  The servers are identical, and hence each is rated at 2600 / 4 = 650 watts.

The next line down is the derated power.  Most servers will not have every memory slot or every hard drive tray populated.  The derated power is the data center operator's estimate of the upper bound for power consumption, based on the server's actual configuration.  It is still a conservative estimate, considerably higher than the server's actual power consumption.  As a rule of thumb, it is ~70% of the nameplate power.  Here the derated power has been set at 1820 watts for the rack, or 455 watts per server.
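The nameplate and derated figures for this rack follow from two divisions and the ~70% rule of thumb. A small sketch (rack_power_lines is a hypothetical name) using integer watts so the results come out exact:

```python
def rack_power_lines(rack_nameplate_w, n_servers, derate_pct=70):
    """Return (nameplate per server, derated rack total, derated per server).

    derate_pct is the ~70% rule of thumb; integer division rounds down to
    whole watts, which is exact for the 2600 W / 4-server rack in the text.
    """
    nameplate_each = rack_nameplate_w // n_servers
    derated_rack = rack_nameplate_w * derate_pct // 100
    return nameplate_each, derated_rack, derated_rack // n_servers


print(rack_power_lines(2600, 4))  # (650, 1820, 455)
```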

     

Finally, the gold line represents the actual power consumed by the servers.  The dots represent successive readings from the instrumented power supplies.

     

The servers are running at full power using the SPECpower benchmark.  The rack is collectively consuming a little less than 1300 watts.  At approximately 16:12 a policy is introduced to constrain power consumption to 1200 watts.  DCM instructs individual nodes to reduce power consumption by lowering the set points for Node Manager in each node until the collective power consumption reaches the desired target.
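DCM's actual control algorithm isn't described here, so the following is only a sketch of the idea under stated assumptions: a proportional feedback loop (a textbook technique swapped in for illustration) that measures group power each interval and lowers each node's limit in proportion to its share of the draw. The names group_cap_step and simulate, the gain of 0.5, the 150 W floor, and the per-node demands (chosen to total a little under 1300 W) are all hypothetical.

```python
def group_cap_step(readings, target_w, gain=0.5, floor_w=150.0):
    """Compute new per-node power limits (W) from the latest readings (W).

    Hypothetical proportional controller: the group's excess over target_w
    is scaled by gain and split across nodes in proportion to their draw.
    floor_w keeps any node from being limited below a plausible minimum.
    """
    total = sum(readings)
    error = total - target_w  # > 0 means the group is over budget
    limits = []
    for p in readings:
        share = p / total if total else 1 / len(readings)
        limits.append(max(floor_w, p - gain * error * share))
    return limits


def simulate(demands, target_w, steps=20):
    """Run the loop against nodes that draw min(demand, limit)."""
    limits = [float("inf")] * len(demands)  # start uncapped
    for _ in range(steps):
        readings = [min(d, lim) for d, lim in zip(demands, limits)]
        limits = group_cap_step(readings, target_w)
    return [min(d, lim) for d, lim in zip(demands, limits)]


final = simulate([325, 325, 325, 320], target_w=1200)
print(round(sum(final), 2))  # converges toward the 1200 W target
```

With a gain of 0.5, each interval removes half of the remaining excess, so the group settles on the target over several intervals rather than overshooting.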

When we instructed Data Center Manager to hold a power cap for the rack group (2), it worked to maintain power at that level in spite of unavoidable disturbances in the system.

 

The source of the disturbances can be internal or external.  An internal disturbance can be the server fans switching to a different speed, causing a power spike or dip.  Workloads in servers go up and down, with a corresponding uptick or dip in the power consumption for that server.  An external disturbance could be a change in the feed voltage or an operator action.  In fact, at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07, down to idle.

 

 

 

Note that it takes a few seconds for Data Center Manager to react and to reach the original power level.  Likewise, when the bottom server was brought to idle, the group's power consumption dropped; however, the group power returned to the target level after a couple of minutes.  If we look at the plot of the individual servers, we can see Data Center Manager at work maintaining the target power.

Combined Power.png

The figure above captures the behavior of the individual servers.  Note how DCM allocates power to individual nodes while maintaining a global power cap.  When the bottom server is suddenly idled, there is a temporary dip in the group's power consumption, but it soon recovers to the target capped level.  Also note that the power not used by the bottom server is reallocated to the remaining three nodes until they get close to their previously unconstrained level.
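The reallocation behavior described above can be imitated with max-min fair "water-filling", used here as a stand-in since DCM's own allocation algorithm is not described. The name allocate, the 320 W demand per busy node, and the 180 W demand of the idled node are hypothetical; the 1200 W budget is the cap from the example.

```python
def allocate(budget_w, demands_w):
    """Split a group power budget with max-min fairness (water-filling).

    Stand-in technique, not DCM's algorithm: each node gets an equal share,
    capped at its own demand; whatever a node cannot use is redistributed
    among the nodes that still want more.
    """
    alloc = [0.0] * len(demands_w)
    remaining = float(budget_w)
    hungry = set(range(len(demands_w)))
    while hungry and remaining > 1e-9:
        share = remaining / len(hungry)
        for i in sorted(hungry):
            give = min(share, demands_w[i] - alloc[i])
            alloc[i] += give
            remaining -= give
            if alloc[i] >= demands_w[i] - 1e-9:
                hungry.discard(i)  # this node is fully satisfied
    return alloc


print(allocate(1200, [320, 320, 320, 320]))  # all busy: [300.0, 300.0, 300.0, 300.0]
print(allocate(1200, [320, 320, 320, 180]))  # bottom idled: [320.0, 320.0, 320.0, 180.0]
```

In the second case the 120 W the idled node leaves on the table lets the other three climb back to their unconstrained draw, while the group stays under its cap.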


http://ark.intel.com/inc/images/diagrams/diagram-17.gif

Back in the ‘dot-com’ days, many companies built datacenters across the globe with one thing in mind: performance.  Costs weren’t an issue; it was all about getting the job done.  Well, times have changed.  Companies have become more energy conscious, not only to be better stewards of natural resources, but also because consumers are looking for companies that can design and develop products that meet their own ‘green energy’ power needs.  It’s no longer enough to build something ‘best in class’; it also has to be ‘efficient’ while being the best.

 

Corporate initiatives to reduce power but still “KTBR” (Keep the Business Running) are imperative to sustaining business today.  Not only do you need the best-performing servers, they also need to be efficient at what they do.  Most of us would rather cut overhead costs through energy efficiency than through headcount cuts.  It’s better for the environment, better for the business, and benefits everyone.

 

Enterprise companies have been ‘going green’ for a while now.  Initiatives like Climate Savers, LessWatts.org and others have been pushing the technical envelope on how to reduce power usage for businesses large and small.  Intel has had a large hand in promoting power conservation.  People need computers, computers need power, and power gets used, but can the power be reduced while still giving the same experience?  With the Xeon 5500 Series platforms, the answer is a resounding YES!

 

You’ve most likely read about the performance stats of the Intel Xeon 5500 (Nehalem) processors, but I want to show you how their efficiency can give back to your enterprise.  Not only does the Xeon 5500 Series give you performance and efficiency, there is more technology ‘under the hood’ to look at: Intel Intelligent Power Node Manager has been released with the Intel 5520 and 5500 chipsets (previously code-named Tylersburg-EP).

 

There are several scenarios we could go through concerning server workloads, but let’s take a real-world example: Company “X” (I can’t tell you who right now), whose workloads have been very stable and growing over the past few years.  One of their issues is that their servers are heavily worked at the beginning of the week, pushing the server farm to 85-90% utilization.  The work tapers off gradually over the week, and by Friday server utilization is around 10-20%.

pre-x5500.JPG

As you can see in the chart above, the server farm would start out the week at 85-90% load, and the servers would run at full power the entire week.  This burned energy at the 90% level even though the workload had died off to about 15% toward the end of the week: not very efficient.  It’s like leaving the stove on all day long and only using it for the few minutes it takes to cook a meal.

 

Once we brought in the Xeon 5500-based systems, we also enabled ACPI power management, whose effect is much more pronounced on the Xeon 5500 because of its increased number of P-states.  We were able to add power capping using Node Manager to limit the power used by the racks to meet the daily requirements of the customer workloads.  This helped us have power available when needed, and reduce it when the workloads aren’t as power hungry.
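One simple way to turn an expected daily utilization into a per-server cap is sketched below. It assumes a linear power model between idle and peak draw; the function name daily_cap_w, the 150 W idle and 300 W peak defaults, and the 10-point safety margin are all hypothetical, not Company “X” figures.

```python
def daily_cap_w(util_pct, idle_w=150, peak_w=300, margin_pct=10):
    """Per-server power cap (W) for an expected utilization in percent.

    Assumes power grows linearly from idle_w at 0% to peak_w at 100%, plus
    margin_pct points of headroom; the cap never exceeds peak_w.
    """
    effective = min(100, util_pct + margin_pct)
    return idle_w + effective * (peak_w - idle_w) / 100


print(daily_cap_w(90))  # Monday-like load: 300.0
print(daily_cap_w(15))  # Friday-like load: 187.5
```

Real servers rarely follow a perfectly linear curve, so in practice the cap schedule would be tuned against the measured power data that Node Manager reports.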

post-x5500.JPG

Another key feature that we’re going after is increasing server density by using Node Manager and Intel Data Center Manager to measure power usage and cap the maximum power used by the entire rack.  The benefit is that with Intel Intelligent Power Node Manager, the data comes in real time, and we can modify the power curves on a regular basis through the server console.

 

As for many customers, power savings can have a large impact on your overhead costs.  If this sounds like a solution your company could benefit from, then definitely ask your favorite OEM when their Node Manager-enabled Xeon 5500 Series platform will be available.  Intel Server Products are available today and ready for your datacenter.  The ROI is estimated at a short 8 months, so a little green goes a long way.








