
The Data Stack

14 Posts authored by: ShannonCepeda


Test your technical knowledge of implementing parallelism with this quick quiz, or the previous ones here and here.


There are several guidelines that can help developers or system tuners to predict the benefits of parallelism. Which of the ones below is a fake?


A. Gustafson’s Law
B. the Karp-Flatt metric
C. Stevensen’s Corollary
D. Amdahl’s Law


As you probably know, software parallelism is a great way to take advantage of the cores available on Intel’s latest processors.  But achieving parallelism requires writing software differently.  Intel has many resources out there to help developers make the transition, including the Intel® Software Network Parallel Programming Community and the suite of developer tools called Intel® Parallel Studio.  Parallel Studio includes three products: Intel® Parallel Inspector, Parallel Composer, and Parallel Amplifier.


Which brings us back to the quiz.  In studying the theory of parallelism you would undoubtedly come across rules A, B, and D (C is the imposter). Even without being well-grounded in theory though, a product like Intel® Parallel Amplifier can help you to understand and performance tune the parallelism in your applications. Want to learn more about Parallel Studio? Now through July 12, you can learn about all 3 of the products in Parallel Studio for a chance to win a $500 Gift Card and 1 license for Parallel Studio. Try the Intel® 20 Questions contest! Start today and it’s not too late to win!

For the past several months we have been hard at work on a new training class for developers. This class teaches the main concepts of threading and scalability to C++ developers new to parallelism. If this means you, and you work in the Bay area, join us for the pilot class, which will be free. Seating is limited, so register early!


When: Friday, July 17, 2009
Where: Intel Santa Clara site, building SC12 lobby
Time: 9AM - 4PM, lunch provided


Our invite has more information including the content agenda and how to register.
Hope to see you there!

I have received a number of customer questions recently on Intel® Hyper-Threading Technology. Hyper-Threading Technology is available on the new Intel® Core™ i7 processor and the Xeon® 5500 series processors. Here are a few of my favorite questions and answers - ranging from the basics to more advanced topics.

What is it?

Intel® Hyper-Threading Technology is a performance feature on our new Intel® Core™ i7 processor and the Xeon® 5500 series processors. Put simply, it allows one core on the processor to appear like 2 cores to the operating system. This doubles the number of logical processors available to the O/S, which potentially increases the performance of your overall system. For the visually-oriented, you can view a graphical explanation of Intel® Hyper-Threading Technology by clicking on the demo here.

Talking about cores, threads, and Hyper-Threads can get a bit confusing. To make things simple for the rest of this blog, I'm going to call Hyper-Threads hardware threads, and O/S level threads software threads. Just as a refresher, a core is 1 CPU. Each Core™ i7 or Xeon® 5500 series processor shipping currently has 4 cores (we may offer other versions in the future).


How can I tell if my system is using Hyper-Threading Technology?
You must have a processor, chipset, operating system, and BIOS that all support the technology. Luckily, that is not much of a problem. Many of the desktop and server platforms that ship with Nehalem-based processors include this support. Most of these platforms will allow you to enable or disable Hyper-Threading Technology as a BIOS option (it should be enabled by default). You can view your CPU information using the Task Manager in Windows*, and /proc/cpuinfo in Linux*. If you have a supported platform and Hyper-Threading is enabled, you should see twice the number of CPUs as you have physical cores in your platform. For example, if you have a dual-processor Xeon® 5500 series server, you should see 16 CPUs. (16 hardware threads running on 8 physical cores, 2 threads per core.)

Available CPUs on the same platform with Hyper-Threading Technology disabled (top) and enabled (bottom).
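If you would rather check this from a script on Linux*, here is a minimal sketch. It assumes the x86 layout of /proc/cpuinfo, where each logical CPU has its own blank-line-separated record containing "processor", "physical id", and "core id" fields. The function name count_cpus and the sample text are mine, for illustration only.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_cpus, physical_cores) parsed from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = core_id = None
    # The extra empty string flushes the final record.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical CPU's record.
            if physical_id is not None and core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            core_id = value
    return logical, len(cores)


# Made-up sample: one package, 2 cores, 2 hardware threads per core.
sample = "\n\n".join(
    f"processor : {cpu}\nphysical id : 0\ncore id : {cpu % 2}"
    for cpu in range(4)
)
logical, physical = count_cpus(sample)
print(f"{logical} logical CPUs on {physical} physical cores")
```

On a real system you would pass in the contents of /proc/cpuinfo instead of the sample. If the first number is twice the second, Hyper-Threading is most likely enabled.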


Can I run 2 applications simultaneously on 2 different threads on the same core?
Yes. The 2 software threads running on a single core do not have to be threads of the same process. They could be (in the case of multi-threaded software), or they could be from 2 separate applications. Which 2 software threads would run on the 2 hardware threads of a Hyper-Threaded core would be up to the operating system. So, yes, you could have 2 different applications running on the same core at the same time. (Whether you would get equal performance in this scenario as you would with the 2 apps running on separate cores is a different issue – see question 6.)


Now that you know the basics, visit my article in the Intel Software Network knowledgebase to learn more. Get the answers to these 3 advanced questions on Intel® Hyper-Threading Technology:

• How is it implemented, under the covers?
• Can I give one hardware thread priority or ensure that it doesn’t get “starved” for execution time?
• What kind of performance benefit will I get from using Intel® Hyper-Threading Technology?


What other questions do you have on the performance features of the new Nehalem-based processors?

Here’s the final follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the tenth habit: Compare apples to apples.  




Much of performance analysis involves comparisons: to baselines, to competitive systems, or to expectations. It is surprisingly easy to make an inappropriate comparison. I have seen it done many times and certainly been guilty of it myself. So the final habit to be aware of is to always compare apples to apples.


Make sure that the 2 systems or applications you are comparing are being run the same way, with the same configuration, under the same conditions. If there is a difference, understand (or at least hypothesize about) the impact of that difference on the performance. Dig into the details about experiments – for some ideas on what to look for, see habit 7.


You should always make this a habit – but it is especially important when you are making decisions based on the comparison. Double-check your work in this case!


This series has given you 10 of the habits I have learned in my years tuning server performance. Of course there are other tricks of the trade and BKMs, which I will try to cover in future blogs. But making these habits part of your routine will help make you a better, more consistent performance tuner. Good luck with your optimization projects!

Here’s the 9th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the ninth habit: Don’t Break the Law.



Amdahl’s Law, I mean. Amdahl’s Law tells you how much improvement you can reasonably expect from tuning a part of your system. It is often used in the context of software optimization and parallelization. Basically, what it says is that the potential improvement (speedup) you will get from applying an optimization with a speedup of X to a fraction F of your system is equal to 1/((1-F) + F/X). More generally, the speedup you will get depends on how much of your system the optimization affects as well as how good the optimization is.


For example, say you think you can speed up a function that takes 25% of your application’s execution time. If you can speed it up by 2x, then the potential speedup of your whole application, according to Amdahl’s Law, is 1/((1-.25) + .25/2) or a 1.14x speedup. Knowing something like this means you can evaluate which is more important: a 2x optimization affecting 25% of your code or a 4x optimization affecting 10%. (It’s the 25% one.)
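The arithmetic above is easy to put into a small helper. This is just a sketch of the formula 1/((1-F) + F/X) from this post; the function and variable names are my own.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime is sped up by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


option_a = amdahl_speedup(0.25, 2)  # 2x optimization on 25% of the runtime
option_b = amdahl_speedup(0.10, 4)  # 4x optimization on 10% of the runtime
print(f"{option_a:.2f}x vs {option_b:.2f}x")  # prints "1.14x vs 1.08x"
```

The smaller local speedup wins here because it applies to a much larger share of the execution time.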


Amdahl’s Law can also be used in other situations, such as estimating the potential speedup from parallelization or evaluating system tuning changes. It can be tricky in certain cases, such as when there are overlapping or dependent parts of the system. So use your intuition as well. However, in general using this law can help you to focus on making the common usage case faster for your system.
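For the parallelization case, one common way to apply the formula is to treat the number of cores as X. The sketch below assumes the parallel part scales perfectly across cores, which real code rarely does, so treat the results as an upper bound. The 90% parallel fraction is only an illustrative input.

```python
def parallel_speedup(parallel_fraction, cores):
    """Amdahl's Law with X = number of cores (assumes perfect scaling of the parallel part)."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def speedup_limit(parallel_fraction):
    """Upper bound on the speedup as the core count grows without limit."""
    if parallel_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - parallel_fraction)


# Illustrative input: 90% of the runtime can run in parallel.
for cores in (2, 4, 8, 16):
    print(cores, "cores:", round(parallel_speedup(0.90, cores), 2))
print("limit:", round(speedup_limit(0.90), 2))
```

Even with 16 cores the speedup stays well below 16x, because the serial 10% caps the whole application at about 10x.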


Once you have a good understanding of Amdahl’s Law, you may want to check out Gustafson’s Law and Little’s Law as well. All are commonly used in performance optimization. Being armed with the knowledge of these theoretical basics can help you to sniff out suspicious performance results or conclusions, as Neil J Gunther humorously wrote about here.


So stay out of trouble with the law (both the ones I mentioned and the legal kind!), and look for my post on the last habit next month.



Here’s the 8th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the eighth habit: Use the Right Tool for the Job.






There are many different reasons why people undertake performance analysis projects. You could be looking to fine-tune your compiler-generated assembly code for a particular CPU, trying to find I/O bottlenecks on a distributed server application, or trying to optimize power performance on a virtual server, just to name a few. As I discussed in habit 2, there are also different levels where you can focus your investigation – mainly the system, application, and macro or micro-architecture levels.


It can be overwhelming thinking of all the different ways to collect and analyze data and trying to figure out which methods apply to your particular situation. Luckily there are tools out there to fill most needs. Here are some of the things you should consider when trying to find the tool(s) that are right for you.


  1. Environment – Many tools work only in specific environments. Think about your needs – are you going to be performing analysis in a Windows or Linux environment, or both? If you are analyzing a particular application, is it compiled code, Java*, or .NET* based? Is the application parallel? Are you running in a virtual environment?
  2. Layer – Will you be analyzing at the system, application, or micro-architecture level, or all 3? At the system level, you are focusing primarily on things external to the processor – disk drives, networks, memory, etc. At the application level you are normally focused on optimizing a particular application. At the micro-architecture level you are interested in tuning how code is executed on a particular processor’s pipeline. Each of these necessitates a different approach.
  3. Software/Hardware Focus – Consider whether you will mainly be tuning the software or the hardware (platform and peripherals) or both. If you plan to do code optimization, you will need a tool with a development focus.
  4. Sampling/Instrumentation - For software optimization tools in particular, there are 2 main methods used to collect data. Sampling tools periodically gather information from the O/S or the processor on particular events. Sampling tools generally have low overhead, meaning they don’t significantly increase the runtime of the application(s) being analyzed. Instrumentation tools add code to a binary in order to monitor things like function calls, time spent in particular routines, synchronization primitives used, objects accessed, etc. Instrumentation has a higher overhead, but can generally tell you more about the internals of your application.
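To make the instrumentation idea from item 4 concrete, here is a toy sketch. Real instrumentation tools insert probes into a compiled binary. This Python decorator is only an analogy that adds the same kind of bookkeeping (call counts and time spent) around a function, and checksum is a made-up function standing in for your own code.

```python
import functools
import time
from collections import defaultdict

call_counts = defaultdict(int)
total_seconds = defaultdict(float)


def instrument(func):
    """Wrap func so that every call is counted and timed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Record the call even if func raised an exception.
            call_counts[func.__name__] += 1
            total_seconds[func.__name__] += time.perf_counter() - start
    return wrapper


@instrument
def checksum(values):  # hypothetical function under analysis
    return sum(values) % 255


for _ in range(3):
    checksum(range(1000))

print(call_counts["checksum"], "calls,", total_seconds["checksum"], "seconds")
```

Notice that the wrapper itself costs time on every call. That is the higher overhead mentioned above, and it is the price you pay for the extra detail.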



After determining your specific needs, take a look at the tools out there (you might start with the lists available on wikipedia or at HP’s Multicore Toolkit.) Of course I recommend you also check out the Intel® Software Development Products. There are several specifically for performance analysis:


  • Intel® VTune™ Performance Analyzer works in both Windows* and Linux* environments and provides both sampling and an instrumented call graph. The sampling functionality can be used to perform analysis at all levels – system, application, and micro-architecture. It is multi-core aware, supports Java* and .NET* and also allows developers to identify hot functions and pin-point lines of code causing issues.
  • Intel® Thread Profiler is supported on Windows*. It is a developer-focused tool that uses instrumentation to profile a threaded application. It supports C, C++, and Fortran applications using native threading, OpenMP*, or Intel® Threading Building Blocks. Intel® Thread Profiler can show you concurrency information for your application and help you pinpoint the causes of thread-related overhead.
  • Intel® Parallel Amplifier Beta plugs into Microsoft* Visual Studio and allows C++ developers to analyze the performance of applications using native Windows* threads. It uses sampling and a low-overhead form of instrumentation to show you your application's hot functions, concurrency level, and synchronization issues.




Finding the right tool for your situation can greatly reduce frustration and the time needed to complete your project.  Good luck, and keep watching The Server Room for information on the last 2 habits in the coming months.



Here’s the 7th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the seventh habit: Document and Archive.



I hope the reason why you need to document and retain data for any performance project is understood, so I won’t go into it. Nor will I recommend particular documentation solutions – just find a database or filing solution you like that gets the job done. What I will do is list what needs to be documented.


Normally, performance tuning consists of iterating through experiments. So, for each experiment, it is important to document:


  • What changes were made – hopefully you weren’t trying too many things at once!
  • The purpose – why you tried this particular thing (including who requested it, if appropriate)
  • General information – date & location of testing, person conducting the test
  • Hardware configuration:
    • Platform hardware and version, BIOS version, relevant BIOS option settings
    • CPU model used, number of physical processors, number of cores per processor, frequency, cache size information, whether Hyper-Threading was used (cpu-z can help document all this)
    • Memory configuration – number of DIMMs and capacity per DIMM, model number of DIMMs used
    • I/O interfaces – model number of all add-in cards, slot number for all add-in cards, driver version for all devices (on Windows*, msinfo can help with this, on Linux*, lspci)
    • Any other relevant hardware information, such as NIC settings, external storage configuration, external clients used, etc if it affects your workload
  • Software configuration:
    • Operating System used, version, and service pack/update information (use msinfo on Windows systems, uname on Linux systems)
    • Version information for all applications relevant to your workload
    • Compiler version and flags used to build your application (if you are doing software optimization)
    • Any other relevant software information, such as third-party libraries, O/S power utilization settings, pagefile size, etc if it affects your workload
  • Workload configuration:
    • Anything relevant to how your experiment/application was run, for example, your application’s startup flags, your virtualization configuration, benchmark information, etc
  • Results and data - naturally you would store all the above information along with the results and data that accompany your experiment


This blog entry is also the appropriate place for me to mention the role of automation in your tuning efforts. If you are going to be doing a significant number of experiments, invest the energy needed to set up an automation infrastructure – a way to run your tests and collect the appropriate data without human attention. I included links to automated ways to gather the above data where appropriate.
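As a small starting point for that kind of automation, here is a sketch that captures part of the checklist in a record you can save next to your results. It uses only Python's standard library, so it covers just the basics (O/S, machine type, logical CPU count). BIOS, DIMM, and add-in card details would still come from the tools mentioned above. The field names and the example experiment are my own invention.

```python
import datetime
import json
import os
import platform


def experiment_record(change, purpose, results):
    """Bundle what was changed, why, and the environment it ran in."""
    uname = platform.uname()
    return {
        "date": datetime.datetime.now().isoformat(timespec="seconds"),
        "change": change,
        "purpose": purpose,
        "software": {
            "os": uname.system,
            "os_release": uname.release,
            "os_version": uname.version,
            "python": platform.python_version(),
        },
        "hardware": {
            "machine": uname.machine,
            "logical_cpus": os.cpu_count(),
        },
        "results": results,
    }


record = experiment_record(
    change="enabled hardware prefetch",          # hypothetical experiment
    purpose="check effect on query latency",     # hypothetical purpose
    results={"transactions_per_second": None},   # fill in with measured data
)
print(json.dumps(record, indent=2))
```

Writing one such record per experiment, automatically, is much more reliable than trying to reconstruct the configuration from memory later.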



Keep watching The Server Room for information on the other 3 habits in the coming months.

Here's the 6th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the sixth habit: Try 1 Thing at a Time.



Like habit 2, Start at the Top, this habit looks easy to understand and to keep. But, due to the constant desire for productivity, I and most others I know in the performance community have broken it many times. Sometimes I even get away with it. But trying to keep this habit is important, because when I don't get away with it, breaking this rule results in even more work than I was trying to save.



The concept behind this habit is simple - when you are optimizing your platform or your code, make only one change at a time. This allows you to measure the effect of each change, and only accumulate the positive changes (however small) into your workload. I have seen instances, for example, where 2 small changes applied at the same time to a workload cancelled each other out: one caused a small decrease in performance and the other a small increase. If we hadn't tested these changes individually, we would have missed out on that performance gain.
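Here is a rough sketch of that process in code. The measure function is a stand-in for running your real workload and returning its metric (higher is better). The toy model below, where one change helps by 3 units and the other hurts by 3, is made up purely to reproduce the cancelling-out scenario.

```python
def evaluate_changes(measure, baseline_config, candidate_changes):
    """Test each change alone against the current best config; keep only improvements."""
    config = dict(baseline_config)
    best = measure(config)
    kept = []
    for name, value in candidate_changes:
        trial = dict(config)
        trial[name] = value          # apply exactly one new change
        score = measure(trial)
        if score > best:             # accumulate only the positive changes
            config, best = trial, score
            kept.append(name)
    return config, best, kept


def toy_measure(config):
    # Hypothetical model: change_a helps by 3 units, change_b hurts by 3.
    score = 100
    if config.get("change_a"):
        score += 3
    if config.get("change_b"):
        score -= 3
    return score


both_at_once = toy_measure({"change_a": True, "change_b": True})
config, best, kept = evaluate_changes(
    toy_measure, {}, [("change_a", True), ("change_b", True)]
)
print("both at once:", both_at_once, "| one at a time:", best, kept)
```

Applied together, the two changes look like a wash. Tested one at a time, the gain from the first is kept and the second is rejected. With real measurements you would also want to repeat each run enough times to separate a small gain from run-to-run noise.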



Another thing that can happen in a complex workload is that two changes that seem independent can interact with each other. As many developers know from fixing bugs, changing one thing may affect something else. Keeping all your changes separate can help you identify these interactions more easily.



You may be wondering when it is acceptable to break this habit. I think of performance methodology, and this rule in particular, as similar to the scientific method we learned in school. It's always good to follow it - doing so will help you quantify your successes and failures, stay organized, and defend your conclusions - but, you can still make a big breakthrough without it. In some cases, like when you are making small local changes to source code in completely different modules, or when you are changing two things you are certain won't interact, the habit can be broken. But the advice I give, especially to those involved in long-term optimization projects, is to follow it.



What has your experience been? Please share your "changing multiple things at one time" stories.



Keep watching The Server Room for information on the other 4 habits in the coming weeks.

Here's the 5th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the fifth habit: Know Your Workload.


Spend some time getting to know your workload.






The idea of a "workload" is integral to the concept of performance. The workload is the set of software and tests that you run on the server in order to measure its performance. Also part of the workload is the concept of the "metric", which means the number you will use to quantify performance. You should understand as much as you can about your workload in order to characterize and interpret your system's execution.



Let's look at the real-life example of a car's fuel economy. The EPA measures fuel economy using 2 workloads: city and highway. Each workload tests different aspects of the car's performance, and the metric used to quantify that performance is miles per gallon (MPG). Like the EPA's fuel economy test, a good workload for server performance tuning should have the following three characteristics:


  • Measurable - There is a quantifiable metric.

  • Reproducible - Measurements are repeatable and consistent.

  • Representative - The workload should be typical of normal operating conditions and should stress the parts of the system (including code) where performance is most critical.


Depending on the usage model for the server(s) you are tuning, some examples of appropriate workloads might be: loading websites, processing XML, encoding/decoding MP3s, responding to database queries, rendering frames, etc. Metrics could be time to run, number of users serviced, transactions processed per second, etc. If your metric is time, take special care that you are measuring it accurately.


After choosing or creating a suitable workload, spend some time getting to know it. Measure the variance between runs. Use O/S and processor-level tools (to be discussed in the blog for habit #8) to sample the workload's characteristics at various points during its execution.
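One simple way to quantify the variance between runs is the coefficient of variation (standard deviation as a percentage of the mean). Here is a minimal sketch using Python's statistics module. The five throughput numbers are made up for illustration.

```python
import statistics


def run_to_run_variation(results):
    """Return (mean, sample standard deviation, coefficient of variation in percent)."""
    mean = statistics.mean(results)
    stdev = statistics.stdev(results)
    return mean, stdev, 100.0 * stdev / mean


# Hypothetical throughput numbers from five runs of the same workload.
runs = [1000, 1010, 990, 1005, 995]
mean, stdev, cv = run_to_run_variation(runs)
print(f"mean={mean}, stdev={stdev:.2f}, variation={cv:.2f}%")
```

Knowing this number up front tells you how big a change has to be before you can trust that it is a real improvement rather than noise.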



One thing to remember about sampling is that you want to make your sample interval at least as long as the amount of time it takes to complete a unit of work in your workload. For example, suppose your workload is a stream of web page requests and you are measuring response time. If the longest response time you see is about 2 seconds, then you want to make sure you take samples over 2 seconds in length. It's best to use a multiple of your longest operation time, so 4 or 6 seconds in this case. This way you can be sure your samples include one complete operation in the workload. Then try to determine if the workload is stable - meaning, do the characteristics vary at different times during execution? (If so, you will need to sample more often to understand the workload or possibly split it into phases). Use the data to get an idea of your workload's CPU, memory, network, and I/O usage.
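The two ideas in this paragraph, picking a sample interval and checking for stability, can be sketched as follows. The 10% tolerance used to call a workload "stable" is my own assumption, not a standard threshold, and the sample values are invented.

```python
def sample_interval(longest_operation_seconds, multiple=2):
    """Pick an interval that is a whole multiple of the longest operation time."""
    if multiple < 1:
        raise ValueError("multiple must be at least 1")
    return longest_operation_seconds * multiple


def is_stable(samples, tolerance=0.10):
    """True if every sample is within `tolerance` (as a fraction) of the mean."""
    mean = sum(samples) / len(samples)
    return all(abs(s - mean) <= tolerance * abs(mean) for s in samples)


# The web page example: the longest response time is about 2 seconds.
print(sample_interval(2), sample_interval(2, multiple=3))

# Hypothetical CPU utilization (%) sampled at different points in a run.
print(is_stable([50, 52, 49, 51]))   # steady workload
print(is_stable([20, 50, 90]))       # workload with distinct phases
```

If the second check comes back False, that is a hint to sample more often or to split the workload into phases and study each one separately.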



At the application level, become familiar with the software stack you will use. How is the workload generated (user, clients, test files, etc)? Understand the major operations that occur - what components of the O/S are needed? What device drivers are used? And finally, study the application(s). Know whether the application(s) being tested are single- or multi-threaded and as much as you can about the internals.



Choosing (or developing) an appropriate workload is necessary for correct performance measurement and tuning. Being as familiar as you can with the workload will help you to interpret your performance data and identify areas for optimization.



Keep watching The Server Room for information on the other 5 habits in the coming weeks.


Here's the 4th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the fourth habit: Know Your BIOS.




My last blog talked about beginning your system tuning by consulting a block diagram. The other thing you should always look at is your system's BIOS. Many server BIOSes these days allow you to configure options that affect performance. Like everything in the performance world, which set of BIOS options will be best will depend on your workload!



First things first, how do you find this "BIOS"? Most servers have a menu called "Setup" (or something similar) that you can access while the system is booting, before it starts loading the operating system. This "Setup" menu allows you to access your system's BIOS. Changes that you make here will affect how the operating system can utilize your hardware, and in some cases how the hardware works. If you change something here, you usually have to reboot and then the change will "stick" through all future reboots (until you change it again). As platforms grow increasingly sophisticated, they are offering a widening array of user-configurable options in Setup. So a good practice is to examine all the menu options available whenever you get a new platform. Here are some of the most common options on Intel platforms that could affect performance:



  • Power Management - Intel's power management technology is designed to deliver lower power at idle and better performance/watt (without significantly lowering overall performance) in most circumstances. There are 2 types - P-States, which attempt to manage power while the processor is active, and C-States which work while the processor is idle. In some BIOSes, both of these features are combined into one option which you should enable. In other cases they are separated. If they are separate, here's what to look for:

    • Intel EIST (or "Enhanced Intel Speedstep" or "Intel Speedstep" or "GV3" on older platforms) - This is the P-State power management that works while the processor is active. Leave it enabled unless directed to change it by an Intel representative.

    • Intel C-States - If you have this option or something similar, it is referring to the power management used when the processor is idle. Enable all C-States unless directed otherwise by an Intel representative.

  • Hardware Prefetch or Adjacent Sector Prefetch - These options try to lower overall latencies in your platform by bringing data into the caches from memory before it is needed (so the application does not have to wait for the data to be read). In many situations the prefetchers increase performance, but there are some cases where they may not. If you don't have time to test these options, then go with the default. Intel tests the prefetch options on a variety of server workloads with each new processor and makes a recommendation to our platform partners on how they should be set. If, however, you are tuning and you have the time to experiment, try measuring performance using each of the prefetch setting combinations.
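If you do decide to experiment with the prefetchers, it helps to write down the full test matrix before you start. The sketch below just enumerates the combinations. The option names are taken from this post and the enabled/disabled values are an assumption, since your BIOS may label them differently.

```python
import itertools


def setting_combinations(options):
    """Yield one dict per combination of the given BIOS option values."""
    names = list(options)
    for values in itertools.product(*(options[name] for name in names)):
        yield dict(zip(names, values))


# Assumed option names and values; check your own Setup menu.
options = {
    "Hardware Prefetch": ("enabled", "disabled"),
    "Adjacent Sector Prefetch": ("enabled", "disabled"),
}
combos = list(setting_combinations(options))
for combo in combos:
    print(combo)
```

Each combination needs a reboot and a measured run of your workload, so with 2 options you are signing up for 4 experiments.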





There are several other options that might affect performance on specific platforms. Some examples might be a snoop filter enable/disable switch, a setting to emphasize either bandwidth or latency for memory transactions, or a setting to enable or disable multi-threading. In these cases, if you don't have time to test, use your Intel or OEM representative's suggestion or go with the default setting.



Being familiar with how your system's BIOS is configured is another basic component of system tuning.



Keep watching The Server Room for information on the other 6 habits in the coming weeks.


Here's the 3rd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the third habit: Know Your Platform.




As we learned in my last blog, we should start our server performance tuning by looking for system-level bottlenecks. This involves understanding exactly how data flows into and out of your platform - and to do this, you need a block diagram. A block diagram shows the major components on the server's motherboard and the paths between them. From a good block diagram you can derive the maximum data transfer rate (aka bandwidth or throughput) achievable as data flows along those paths.



I usually look at my block diagram before beginning system tuning in order to identify potential bottlenecks. But some people use them in parallel: they measure the bandwidth of various parts of the system and then confirm what they see using the block diagram. You can determine if various parts of your system are heavily stressed, bottlenecked, or lightly utilized. In general you want to trace the path from where data enters your server (NIC, HBA, etc) up to the processor and back to memory or out of the server. The paths connecting one component to another are commonly known as buses. For each bus, multiply the speed by the width to determine the maximum potential bandwidth.



Let's use the block diagram for the Intel S5400SF server board as an example. It has 2 FSBs, each capable of 1333 or 1600 Mega-Transfers/second (MT/s). Each transfer on the FSB is 64 bits (8 bytes), so at 1600 MT/s, 8 bytes * 1,600,000,000 transfers/second gives a maximum theoretical bandwidth of 12.8GB/s per FSB segment. Keep in mind though that in reality a bus will not achieve its theoretical maximum bandwidth - depending on the type of bus it will probably realize 66-80% of the possible throughput.
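The same speed-times-width calculation works for any bus, so here it is as a small sketch using the FSB numbers from this example. The 66-80% range is the rule of thumb from this post, not a measured value, and the function names are mine.

```python
def bus_bandwidth(width_bytes, transfers_per_second):
    """Theoretical peak bandwidth in bytes per second."""
    return width_bytes * transfers_per_second


def realistic_range(peak, low=0.66, high=0.80):
    """Rough achievable range, using the 66-80% rule of thumb."""
    return peak * low, peak * high


peak = bus_bandwidth(8, 1_600_000_000)   # 64-bit FSB at 1600 MT/s
low, high = realistic_range(peak)
print(f"peak {peak / 1e9:.1f} GB/s, expect roughly {low / 1e9:.1f} to {high / 1e9:.1f} GB/s")
```

Swap in the width and transfer rate of each bus on your own block diagram to build a table of theoretical and likely bandwidths.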





So, where do you find these diagrams? If you are using an Intel server platform, the block diagrams can usually be found in the technical product specification for each board. If you purchase a platform from one of our OEM partners, ask your salesperson where to get it.



Look at the maximum bandwidth achievable on each link your data will travel over to gain a deeper understanding of how your workload will run on your platform.
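Once you have a number for each link, the slowest one on the path is the first place to look. Here is a minimal sketch. The link names and bandwidths are placeholders I made up, so read the real values from your board's block diagram.

```python
def narrowest_link(path):
    """Return the (name, bandwidth) pair with the lowest bandwidth on the path."""
    return min(path, key=lambda link: link[1])


# Hypothetical path and bandwidths in GB/s, for illustration only.
path = [
    ("NIC to chipset", 2.0),
    ("chipset to CPU", 12.8),
    ("chipset to memory", 10.0),
]
name, bandwidth = narrowest_link(path)
print(f"Narrowest link: {name} at {bandwidth} GB/s")
```

The narrowest link is only a candidate bottleneck. Whether it actually limits you depends on how much data your workload pushes across it, which is why you still need to measure.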



Keep watching The Server Room for information on the other 7 habits in the coming weeks.




Here's the 2nd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the second habit: Start at the top.


Let me start by relating a true (although simplified) story. My team at Intel has built up years of expertise running a particular benchmark. So when the time came to start running a new, similar benchmark, we thought: "No problem." We began running tests while the benchmark was still in development. Immediately we had an issue: the type of problem that would normally indicate our hardware environment wasn't set up properly. We checked everything that we had seen cause the issue in the past, and we couldn't find anything. So, we blamed the new benchmark. After all, we were experts and we had been setting up these environments for years! We knew what we were doing. You can probably guess where this story is going: after weeks of doing things to work around the "benchmark issue", we figured out that we had mis-configured the environment, resulting in a bottleneck on one part of our testbed. We didn't thoroughly test that part of the environment because it had never caused us problems with the old benchmark. And of course, on the new benchmark it was critical. We had broken one of the most important rules of performance tuning: Start at the Top.



So now you know how easy it can be to not Start at the Top. Even seasoned performance engineers can get overconfident and forget this rule. But the consequences can be dire:


  1. You have to eat major crow when you realize your mistake. I'm just now getting over the humiliation.

  2. You might have put tunings in place to address issues that weren't really there. This is at best wasted work and at worst something that you have to painstakingly undo when you fix the real issue.

How do you avoid this situation? Simple: use the Top-Down Performance Tuning process. This means you start by tuning your hardware. Then you move to the application/workload, then to the micro-architecture (if possible). What you are looking for at each level are bottlenecks: situations where one component of the environment or workload is limiting the performance of the whole system. Your goal is to find any system-level bottlenecks before you move down to the next level. For example, you may find that your network bandwidth is bottlenecked and you need to add another NIC to your server. Or that you need to add another drive to your RAID array, or that your CPU load is being distributed unevenly. Any bottleneck involving your server system hardware (processors, memory, network, HBAs, etc), attached clients, or attached storage is a system-level bottleneck. Find these by using system-level tools (which I will touch on in the future blog for Habit #8), remove them, then proceed to the application/workload level and repeat the process.




Being vigilant about using the top-down process will ensure you don't waste time tuning a non-representative system. And it just may save you some embarrassment!



Always measure your bottlenecks!



Keep watching The Server Room for information on the other 8 habits in the coming weeks.





As a follow-up to my first post on the 10 Habits of Great Server Performance Tuners, this post focuses on the first habit: Ask the Right Question.



6 years of performance work have taught me to start all my projects with this habit. Before I explain the kinds of questions I ask, let me demonstrate why this is important. Here are some example undesirable outcomes of performance tuning:




  • You spend months of experimentation trying to match a level of performance you saw reported in a case study on the internet, only to find out later that it used un-released software you can't get yet.

  • You spend months optimizing your server for raw performance. As part of your optimization you fully load it with the best available memory and adapters. Then you find out that your management/users would have been happier with a lower level of performance but a less costly system.

  • Your team works hard to maximize the performance of your application server for the current number of users you have, but makes decisions that will result in bottlenecks and re-designs when the number of users increases.


The outcome we are all hoping for with our tuning projects is that we provide the best level of performance possible within the budgetary, time, and TCO constraints we have. And of course, without sacrificing any other critical needs we'll have for our server, either now or in the future. Since performance optimization can take a lot of time and resources, consider the following questions before embarking on a project:


  • Why are you tuning your platform? (This helps you decide the amount of resources to dedicate.)

    • As part of this question, consider this one: How will the needs and usage models for this server change over the course of its life?

  • What level of performance are you hoping to achieve?

  • Are your expectations appropriate for the software and server system you are using?

    • In determining if your expectations are appropriate, refer to benchmarking results or case studies where appropriate and make sure any comparisons you make are apples to apples!

    • A corollary to this question is: is the server being used appropriate for the application being run?

  • What qualities of your platform are you trying to optimize: raw performance, cost/performance, energy efficiency (performance/watt), or something else?

  • Is performance your top priority for the system, or is scalability, extendibility, or something else a higher goal?
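The question about which quality to optimize can change which system looks "best". Here is a small Python sketch of that trade-off. The `Candidate` class, the `best_by` helper, and the two systems with their numbers are hypothetical and exist only to show the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    throughput: float  # work units per second on your workload
    cost: float        # system cost in dollars
    watts: float       # average power draw under load

    def perf_per_dollar(self):
        return self.throughput / self.cost

    def perf_per_watt(self):
        return self.throughput / self.watts


def best_by(candidates, metric):
    """Return the name of the candidate that maximizes the given metric."""
    return max(candidates, key=metric).name


# Made-up configurations for illustration only.
candidates = [
    Candidate("fully-loaded", throughput=1000.0, cost=20000.0, watts=500.0),
    Candidate("mid-range", throughput=800.0, cost=10000.0, watts=320.0),
]

if __name__ == "__main__":
    print("raw performance: ", best_by(candidates, lambda c: c.throughput))
    print("cost/performance:", best_by(candidates, Candidate.perf_per_dollar))
    print("performance/watt:", best_by(candidates, Candidate.perf_per_watt))
```

In this invented example the fully loaded system wins on raw performance, while the cheaper one wins on both cost/performance and performance/watt. That is exactly the kind of answer you want to know before you start tuning.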


Thinking about the answers to these questions can help you navigate the trade-offs and tough decisions that are sure to pop up, and will help make your tuning project successful.


Keep watching The Server Room for information on the other 9 habits in the coming weeks.

I have been working as a full-time performance engineer at Intel for 6 years. I started by benchmarking server products for performance validation and now I focus on the TPC-C and TPC-E OLTP server benchmarks. I have used a variety of workloads in this job and spent time optimizing each level of the performance hierarchy: application, system, and processor. I, like many of you, have learned the "tricks of the trade" the hard way: by trial, error, and success. I'm sharing now, so you can all benefit from the things I've picked up along the way.


Let's start with some general methodologies to follow when tuning performance, whether you do it full-time, as a hobby, or just in your spare cycles after getting your "regular work" done. I will follow up with a more detailed post on each habit individually.






1. Ask the right question: Why are you tuning your platform? What level of performance are you hoping to achieve? What do you (or your users) care most about: raw performance, cost/performance, performance/watt, or something else?



2. Start at the top: The first and easiest part of your application server to tune is the hardware itself. Move on to the software and workload only after you feel confident that you have removed any system-level bottlenecks.



3. Know your Platform: This should be where you begin your system (hardware) tuning. The first thing, which I can't stress enough, is to get a block diagram of your platform. Then study it!



4. Know your BIOS: Server BIOSes these days come with more and more options. Be sure to give your new platform's BIOS a once-over. Pay particular attention to options relating to performance and power.



5. Know your Workload: To quantify performance, you need a workload! Some examples: web server response time, boot time, frames rendered per second, simultaneous connections supported, etc. Understand as much as possible about how the work gets done.



6. Try one thing at a time: Little changes that seem harmless can significantly alter the behavior of your system. Or worse, they can interact with each other to wreak havoc. Always try one change at a time, and for goodness' sake, do habit number 7.



7. Document and Archive: When you change something, log it! For each experiment you do, store your hardware and software configuration, performance level, and any collected data.



8. Use the right tool for the job: There are free data collection tools out there for various levels of the tuning process. System tuning tools include Performance Monitor for Windows and sar for Linux. Application-level tools include Intel® VTune™ for both Windows and Linux.



9. Don't break the law: Amdahl's Law, that is. Amdahl's Law tells us the maximum amount of performance improvement we will get from a particular enhancement. Amdahl can help you set your expectations properly and clue you in to when you should be suspicious.



10. Compare apples to apples: Todd Christ reminds us of this habit in the last paragraph of this post. Don't compare the performance of mismatched systems. If you must do it, know exactly what the differences are: the processor, memory type/speed/vendor, a software component, chipset, etc. Dig into the configuration details!
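For habit 9, Amdahl's Law is usually written as speedup = 1 / ((1 - p) + p / s), where p is the fraction of run time that the enhancement affects and s is how much faster that fraction becomes. Here is a minimal Python sketch of it. The function names and the sample values (p = 0.4, s = 2) are my own, picked only for illustration.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of run time is made `factor` times faster."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


def amdahl_limit(fraction):
    """Upper bound on speedup if the affected fraction took no time at all."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if fraction == 1.0:
        return float("inf")
    return 1.0 / (1.0 - fraction)


if __name__ == "__main__":
    # Illustrative values: 40% of run time affected, that part made 2x faster.
    print(f"speedup: {amdahl_speedup(0.4, 2):.2f}")
    print(f"limit:   {amdahl_limit(0.4):.2f}")
```

If a tuning that only touches 40% of the run time appears to double overall performance, the limit above (about 1.67x) tells you to be suspicious of the measurement.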



So now you have the high-level list! Stay tuned to The Server Room for more information about each habit in the coming weeks.
