One of the first questions that came to mind when I was introduced to Intel(r) Intelligent Power Node Manager (Node Manager) was "what is the performance impact of applying Node Manager technology?" I will share some thoughts here. The underlying dynamics are complex and not always observable, so it is difficult to provide a definitive answer. Robert A. Heinlein popularized the term TANSTAAFL ("There ain't no such thing as a free lunch") in his 1966 novel “The Moon Is a Harsh Mistress”. So, does TANSTAAFL apply here? On the benefit side, Node Manager lets an application designate a target power consumption, a capability otherwise known as power capping. On the cost side, Node Manager takes some work to deploy and has a performance impact that varies from very little to moderate. Then again, Node Manager can be turned off, in which case there is no overhead.
Node Manager is useful even when it is not actively power capping but is used as a guardrail, ensuring that power consumption will not exceed a threshold. The predictable power consumption has value because it provides data center operators a ceiling in power consumption. Having this predictable ceiling helps optimize the data center infrastructure and reduce stranded power. Stranded power refers to a power allocation that needs to be there even if it's only for occasional use.
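To make the stranded power argument concrete, here is a small back-of-the-envelope sketch. All wattages are invented for illustration, and the two helper functions are hypothetical names, not part of any Node Manager interface. It compares provisioning a fixed power budget by worst-case (nameplate) draw against provisioning by a guaranteed guardrail cap.

```python
def servers_that_fit(budget_watts, allocated_per_server):
    """How many servers a fixed power budget can host."""
    return budget_watts // allocated_per_server


def stranded_watts(allocated_per_server, typical_per_server, servers):
    """Power reserved for the servers but not drawn under typical load."""
    return (allocated_per_server - typical_per_server) * servers


# Illustrative numbers only (assumptions, not measurements).
BUDGET = 8000      # watts available to the rack
NAMEPLATE = 400    # worst-case draw per server without a guardrail
GUARDRAIL = 320    # ceiling enforced per server by a power cap
TYPICAL = 250      # draw per server under normal load

for label, allocation in (("nameplate", NAMEPLATE), ("guardrail", GUARDRAIL)):
    n = servers_that_fit(BUDGET, allocation)
    print(label, n, "servers,", stranded_watts(allocation, TYPICAL, n), "W stranded")
```

With a lower guaranteed ceiling per server, the same budget hosts more servers and less of the allocation sits unused.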
The performance impact can vary from zero, when Node Manager is used as a guardrail, to a percentage equal to the fraction of CPU cycles lost to power capping, when Node Manager is applied at 100% utilization. Under normal operating conditions, the loss of performance is smaller than the lost cycles would imply because the OS usually compensates for the slowdown. If the end user is willing to re-prioritize application processes, under some circumstances it is possible to bring performance back to the uncapped level or even beyond.
Power capping is attained through voltage and frequency scaling. The dynamic power consumed by a CPU is roughly proportional to frequency and to the square of the voltage applied to the CPU. The scaling is done in discrete steps (“P-states”) as defined by the ACPI standard.
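The proportionality above can be illustrated with a short sketch. The P-state table is made up for the example (real frequency and voltage pairs are CPU-specific), and the model only covers dynamic power, ignoring leakage and the rest of the platform.

```python
# Illustrative P-state table: (name, frequency in GHz, core voltage in volts).
# These values are invented for the example, not taken from any real CPU.
P_STATES = [
    ("P0", 3.0, 1.20),
    ("P1", 2.6, 1.10),
    ("P2", 2.2, 1.00),
    ("P3", 1.8, 0.90),
]


def relative_dynamic_power(freq_ghz, volts, ref_freq_ghz, ref_volts):
    """Dynamic power relative to a reference state, using P ~ f * V^2."""
    return (freq_ghz / ref_freq_ghz) * (volts / ref_volts) ** 2


def power_table(p_states):
    """Return (name, relative frequency, relative power) for each P-state."""
    _, ref_f, ref_v = p_states[0]  # the highest-performing state is the reference
    return [
        (name, f / ref_f, relative_dynamic_power(f, v, ref_f, ref_v))
        for name, f, v in p_states
    ]


for name, rel_f, rel_p in power_table(P_STATES):
    print(f"{name}: frequency {rel_f:.0%}, dynamic power {rel_p:.0%}")
```

Because voltage drops along with frequency, the relative power in this toy table falls faster than the relative frequency, which is what makes stepping down a P-state an effective way to meet a power cap.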
The highest performing P-states are also the most energetic. Starting from a fully loaded CPU at the highest P-state, demand-based switching (DBS) assigns lower energy P-states as the workload is reduced, using Intel(r) SpeedStep technology. An additional dip in power takes place as idle is reached, because unused logical units in the CPU are switched off automatically.
Node Manager allows the P-states to be manipulated under program control instead of autonomously, as under SpeedStep. Since a capped CPU runs slower, this can remove some of the cycles that applications could otherwise use, but the reality is more nuanced.
At high workloads, most CPU cycles are dedicated to running the application. Hence, if power capping is applied, a reduction in CPU speed will yield an almost one-to-one reduction in application performance.
At the other end of the curve, if the CPU is idling, power consumption is already at the floor level, and applying Node Manager will not yield any additional reduction in power consumption.
The more interesting cases take place in the mid-range band of utilization, when the utilization rate is between 10 and 60 percent, depending on the application (40 to 80 percent in the BMW case study below). Taking utilization beyond the upper limit is not desirable because the system would have difficulty absorbing load spikes, and response times may deteriorate to unacceptable levels.
We have run a number of applications in the lab and observed their performance behavior under Node Manager. Surprisingly, the performance loss is less than frequency scaling would indicate. One possible explanation is that when utilization is in the mid-range, there are idle cycles available. The OS compensates to some extent for the slower cycles by giving the applications longer time slices, using up otherwise idle cycles, to the point that the apparent performance of the application is little changed. The application may still need to be throttled up to regain the pre-capping throughput.
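The compensation effect can be sketched with a deliberately simple model. It assumes the work is purely CPU-bound and that the cycles needed per unit of work do not change with frequency; real applications will deviate from this. The function names are hypothetical.

```python
def capped_utilization(utilization, freq_ratio):
    """CPU utilization needed to keep the same throughput after capping.

    utilization: uncapped utilization, from 0.0 to 1.0
    freq_ratio:  capped frequency / uncapped frequency, in (0.0, 1.0]
    """
    if not 0.0 < freq_ratio <= 1.0:
        raise ValueError("freq_ratio must be in (0, 1]")
    return utilization / freq_ratio


def relative_throughput(utilization, freq_ratio):
    """Throughput after capping, relative to the uncapped throughput."""
    needed = capped_utilization(utilization, freq_ratio)
    # If the needed utilization fits within 100%, idle cycles absorb the
    # slowdown. Otherwise the CPU saturates and throughput falls short.
    return 1.0 if needed <= 1.0 else 1.0 / needed


for u in (0.3, 0.5, 0.8, 1.0):
    print(u, capped_utilization(u, 0.8), relative_throughput(u, 0.8))
```

In this model a mid-range workload keeps its throughput at the price of higher utilization, while a fully loaded CPU loses throughput in proportion to the frequency reduction, matching the two ends of the curve described earlier.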
One way to verify this behavior is to observe that CPU utilization has indeed gone up in a power capped regime. BMW conducted a proof of concept with Intel precisely to explore how far an application could be re-prioritized under power capping to restore the original, uncapped throughput. TANSTAAFL still applies here. The application is still yielding the same performance under power capping. However, since there are fewer cycles available due to frequency scaling, there will be less headroom should the workload pick up suddenly. In this case the remedy is simply to remove the cap. The management software needs to be aware of these circumstances and initiate the appropriate action.
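A minimal sketch of the kind of decision the management software might make follows. The function name, the thresholds, and the idea of re-applying the cap once the load subsides are all assumptions for illustration; actually setting or removing a cap would go through whatever interface the management software uses, which is not shown here.

```python
def next_cap(cap_watts, utilization, policy_cap_watts, high=0.85, low=0.60):
    """Decide the power cap for the next interval; None means uncapped.

    cap_watts:        cap currently in force, or None if uncapped
    utilization:      latest CPU utilization sample, from 0.0 to 1.0
    policy_cap_watts: cap to apply when the system has headroom again
    high, low:        hypothetical thresholds; the gap between them avoids
                      flipping the cap on and off around a single value
    """
    if cap_watts is not None and utilization >= high:
        return None               # headroom exhausted: lift the cap
    if cap_watts is None and utilization <= low:
        return policy_cap_watts   # load has subsided: re-apply the cap
    return cap_watts              # otherwise leave things as they are


cap = 200
for sample in (0.70, 0.90, 0.75, 0.55):
    cap = next_cap(cap, sample, policy_cap_watts=200)
    print(sample, cap)
```

The two thresholds are a design choice for the sketch: a single threshold would make the cap oscillate when utilization hovers near it.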
The experiments in this proof of concept involved an application mix used at a BMW site. In the first series of experiments we plotted power consumption against CPU utilization while throttling the workload up and down; this curve is shown in red.
In the second series, shown in green, we applied an initial power cap at each dot in the original curve. This yielded a performance reduction. The workload was then throttled up until the uncapped performance was restored. This process was repeated with increasingly aggressive power caps until the original performance could no longer be reached. The green curve plots the resulting system power consumption, the lowest achievable without impacting system performance. The difference between the red and green curves represents the range of capping applicable while maintaining the original throughput level. Running at the green level yields the same system performance as running uncapped. However, since idle cycles have been removed, there is no margin left to pick up extra workload. Should extra workload arrive, performance indicators will deteriorate very quickly.
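The procedure for finding one point on the green curve can be sketched as a simple search. This is a reconstruction under assumptions, not the code used in the proof of concept: `measure_throughput` is a hypothetical callback that applies a cap, throttles the workload up as far as it will go, and reports the best throughput reached, and `toy_measure` is an invented stand-in so the sketch can run.

```python
def lowest_sustainable_cap(uncapped_power_watts, target_throughput,
                           measure_throughput, step_watts=5, floor_watts=0):
    """Find the lowest cap at which the uncapped throughput is still reached.

    Starts just below the uncapped power draw and tightens the cap one step
    at a time, stopping at the first cap where the target cannot be restored.
    """
    best = uncapped_power_watts
    cap = uncapped_power_watts - step_watts
    while cap >= floor_watts:
        if measure_throughput(cap) < target_throughput:
            break  # original performance can no longer be reached
        best = cap
        cap -= step_watts
    return best


def toy_measure(cap_watts):
    # Toy stand-in for a real measurement: throughput holds at 100 until
    # the cap drops below 170 W, then falls off linearly.
    return 100.0 if cap_watts >= 170 else 100.0 * cap_watts / 170


green_point = lowest_sustainable_cap(200, 100.0, toy_measure)
print("red point: 200 W, green point:", green_point, "W")
```

Repeating the search at each utilization level on the red curve would give the corresponding green curve; the gap between the two is the capping range available at that operating point.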
Under the circumstances described above, the system was able to deliver the same throughput at a lower power level. There was no compromise in performance. The tradeoff comes in the form of diminished headroom in case the workload picks up. The system operator or the management software has the option to remove the cap immediately should this headroom be needed.