The Server Room Blog

5 Posts tagged with the performance_tuning tag

As part of the Sun Microsystems and Intel alliance, the two companies have collaborated to bring open source Threading Building Blocks (TBB) support to the Solaris Operating System (OS) and the Sun Studio software toolchain. See the Sun blog for additional information, and watch the short video interview with Deepanker Bairagi, Principal Engineer for Sun Studio.

Software parallelism can unleash the processing power that newer multi-core architectures provide, including the Quad-Core Intel® Xeon® processors. For developers, multithreading offers a software parallelism model, but many existing solutions require a lot of low-level coding. Threading Building Blocks offers a richer approach to expressing parallelism in a C++ program: higher-level, task-based parallelism that abstracts platform details and threading mechanisms for performance and scalability.

The Solaris OS is able to take advantage of multi-core architectures, including the Intel Architecture, with features such as lightweight processes (LWPs), load balancing across cores, and processor affinities. Sun Studio software offers a complete integrated toolchain for Solaris and Linux platforms, including parallelizing compilers, performance and thread analysis tools, memory and code debuggers, a NetBeans-based Integrated Development Environment, and more.

With Threading Building Blocks added, developers for the Solaris platform now have a fully loaded toolbox that simplifies the development of optimized multithreaded applications for multi-core Intel processors.

I would like to hear from the community on how you see this impacting the next generation of software development for Solaris running on Intel Architecture.



Here's the 4th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the fourth habit: Know Your BIOS.

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1357/IMG_2318-noExif.jpg

My last blog talked about beginning your system tuning by consulting a block diagram. The other thing you should always look at is your system's BIOS. Many server BIOSes these days allow you to configure options that affect performance. As with everything in the performance world, the best set of BIOS options depends on your workload!

First things first, how do you find this "BIOS"? Most servers have a menu called "Setup" (or something similar) that you can access while the system is booting, before it starts loading the operating system. This "Setup" menu allows you to access your system's BIOS. Changes that you make here will affect how the operating system can utilize your hardware, and in some cases how the hardware works. If you change something here, you usually have to reboot and then the change will "stick" through all future reboots (until you change it again). As platforms grow increasingly sophisticated, they are offering a widening array of user-configurable options in Setup. So a good practice is to examine all the menu options available whenever you get a new platform. Here are some of the most common options on Intel platforms that could affect performance:

  • Power Management - Intel's power management technology is designed to deliver lower power at idle and better performance/watt (without significantly lowering overall performance) in most circumstances. There are 2 types - P-States, which attempt to manage power while the processor is active, and C-States, which work while the processor is idle. In some BIOSes, both of these features are combined into one option which you should enable. In other cases they are separated. If they are separate, here's what to look for:
    • Intel EIST (or "Enhanced Intel Speedstep" or "Intel Speedstep" or "GV3" on older platforms) - This is the P-State power management that works while the processor is active. Leave it enabled unless directed to change it by an Intel representative.
    • Intel C-States - If you have this option or something similar, it is referring to the power management used when the processor is idle. Enable all C-States unless directed otherwise by an Intel representative.
  • Hardware Prefetch or Adjacent Sector Prefetch - These options try to lower overall latencies in your platform by bringing data into the caches from memory before it is needed (so the application does not have to wait for the data to be read). In many situations the prefetchers increase performance, but there are some cases where they may not. If you don't have time to test these options, then go with the default. Intel tests the prefetch options on a variety of server workloads with each new processor and makes a recommendation to our platform partners on how they should be set. If, however, you are tuning and you have the time to experiment, try measuring performance using each of the prefetch setting combinations.
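
If you do have time to experiment, it helps to write the prefetch test matrix down before you start. The following is a minimal sketch in Python, assuming your BIOS exposes the two prefetch options as separate Enabled/Disabled switches; the option names and the prefetch_test_plan helper are illustrative, not part of any Intel tool. Each combination still has to be set by hand in Setup and needs a reboot before you measure.

```python
from itertools import product

# Assumed option names; check your own Setup menu for the exact labels.
PREFETCH_OPTIONS = ("Hardware Prefetch", "Adjacent Sector Prefetch")


def prefetch_test_plan(options=PREFETCH_OPTIONS):
    """Return every Enabled/Disabled combination of the given options."""
    states = ("Enabled", "Disabled")
    return [dict(zip(options, combo))
            for combo in product(states, repeat=len(options))]


# Print a numbered checklist: one reboot and one measurement per run.
for run, settings in enumerate(prefetch_test_plan(), start=1):
    summary = ", ".join(f"{name}: {state}" for name, state in settings.items())
    print(f"Run {run}: {summary}")
```

With two options this gives four runs, which is usually small enough to measure each one more than once.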

There are several other options that might affect performance on specific platforms. Some examples might be a snoop filter enable/disable switch, a setting to emphasize either bandwidth or latency for memory transactions, or a setting to enable or disable multi-threading. In these cases, if you don't have time to test, use your Intel or OEM representative's suggestion or go with the default setting.

Being familiar with how your system's BIOS is configured is another basic component of system tuning.

Keep watching The Server Room for information on the other 6 habits in the coming weeks.


Here's the 3rd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the third habit: Know Your Platform.

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1247/IMG_2376-edit-x350-noExif.jpg

As we learned in my last blog, we should start our server performance tuning by looking for system-level bottlenecks. This involves understanding exactly how data flows into and out of your platform - and to do this, you need a block diagram. A block diagram shows the major components on the server's motherboard and the paths between them. From a good block diagram you can derive the maximum data transfer rate (aka bandwidth or throughput) achievable as data flows along those paths.

I usually look at my block diagram before beginning system tuning in order to identify potential bottlenecks. But some people use them in parallel: they measure the bandwidth of various parts of the system and then confirm what they see using the block diagram. You can determine if various parts of your system are heavily stressed, bottlenecked, or lightly utilized. In general you want to trace the path from where data enters your server (NIC, HBA, etc) up to the processor and back to memory or out of the server. The paths connecting one component to another are commonly known as buses. For each bus, multiply the speed by the width to determine the maximum potential bandwidth.

Let's use the block diagram for the Intel S5400SF server board as an example. It has 2 FSBs, each capable of 1333 or 1600 Mega-Transfers/second (MT/s). Each transfer on the FSB is 64 bits (8 bytes), so at 1600 MT/s, 8 bytes * 1,600,000,000 transfers per second gives a maximum theoretical bandwidth of 12.8 GB/s per FSB segment. Keep in mind though that in reality a bus will not achieve its theoretical maximum bandwidth - depending on the type of bus it will probably realize 66-80% of the possible throughput.
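
The speed-times-width arithmetic above is easy to script once you have more than a couple of buses to check. Here is a small sketch in Python; the function names are my own, and the 66-80% range is just the rule of thumb quoted above, not a measured figure for any particular bus.

```python
def peak_bandwidth_gb_s(transfers_per_second, bytes_per_transfer):
    """Theoretical peak bandwidth in GB/s (1 GB = 10**9 bytes)."""
    return transfers_per_second * bytes_per_transfer / 1e9


def realistic_range_gb_s(peak_gb_s, low=0.66, high=0.80):
    """Rough achievable range, using the 66-80% rule of thumb."""
    return peak_gb_s * low, peak_gb_s * high


# One FSB segment at each supported speed, 8 bytes (64 bits) per transfer.
for mt_s in (1333, 1600):
    peak = peak_bandwidth_gb_s(mt_s * 1e6, 8)
    low, high = realistic_range_gb_s(peak)
    print(f"{mt_s} MT/s: peak {peak:.1f} GB/s, "
          f"expect roughly {low:.1f}-{high:.1f} GB/s")
```

Run the same two functions over every link on your block diagram and the narrowest path usually stands out right away.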

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1246/block_diagram.JPG

So, where do you find these diagrams? If you are using an Intel server platform, the block diagrams can usually be found in the technical product specification for each board. If you purchase a platform from one of our OEM partners, ask your salesperson where to get it.

Look at the maximum bandwidth achievable on each link your data will travel over to gain a deeper understanding of how your workload will run on your platform.

Keep watching The Server Room for information on the other 7 habits in the coming weeks.


Here's the 2nd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the second habit: Start at the top.

Let me start by relating a true (although simplified) story. My team at Intel has built up years of expertise running a particular benchmark. So when the time came to start running a new, similar benchmark, we thought: "No problem." We began running tests while the benchmark was still in development. Immediately we had an issue: the type of problem that would normally indicate our hardware environment wasn't set up properly. We checked everything that we had seen cause the issue in the past, and we couldn't find anything. So, we blamed the new benchmark. After all, we were experts and we had been setting up these environments for years! We knew what we were doing. You can probably guess where this story is going: after weeks of doing things to work around the "benchmark issue", we figured out that we had mis-configured the environment, resulting in a bottleneck on one part of our testbed. We didn't thoroughly test that part of the environment because it had never caused us problems with the old benchmark. And of course, on the new benchmark it was critical. We had broken one of the most important rules of performance tuning: Start at the Top.

So now you know how easy it can be to not Start at the Top. Even seasoned performance engineers can get overconfident and forget this rule. But the consequences can be dire:

  1. You have to eat major crow when you realize your mistake. I'm just now getting over the humiliation.
  2. You might have put tunings in place to address issues that weren't really there. This is at best wasted work and at worst something that you have to painstakingly undo when you fix the real issue.

So...how do you avoid this situation? Simple: use the Top-Down Performance Tuning process. This means you start by tuning your hardware. Then you move to the application/workload, then to the micro-architecture (if possible). What you are looking for at each level are bottlenecks: situations where one component of the environment or workload is limiting the performance of the whole system. Your goal is to find any system-level bottlenecks before you move down to the next level. For example, you may find that your network bandwidth is bottlenecked and you need to add another NIC to your server. Or that you need to add another drive to your RAID array, or that your CPU load is being distributed unevenly. Any bottleneck involving your server system hardware (processors, memory, network, HBAs, etc.), attached clients, or attached storage is a system-level bottleneck. Find these by using system-level tools (which I will touch on in a future blog for Habit #8), remove them, then proceed to the application/workload level and repeat the process.
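
One simple way to spot a system-level bottleneck is to compare what you measure on each component against what that component can theoretically deliver. The sketch below, in Python, is only an illustration: the component names and numbers are hypothetical, the find_bottlenecks helper is my own, and the 66% threshold is an assumption you should adjust for your hardware.

```python
def find_bottlenecks(measured, peak, threshold=0.66):
    """Return (component, utilization) pairs at or above the threshold,
    busiest first. Utilization is measured throughput / theoretical peak."""
    flagged = []
    for name, value in measured.items():
        utilization = value / peak[name]
        if utilization >= threshold:
            flagged.append((name, utilization))
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical numbers in GB/s, for illustration only.
peak = {"network": 0.125, "front-side bus": 12.8, "storage": 0.4}
measured = {"network": 0.118, "front-side bus": 3.1, "storage": 0.15}

for name, utilization in find_bottlenecks(measured, peak):
    print(f"{name}: {utilization:.0%} of theoretical peak")
```

In this made-up case only the network is flagged, which would point toward adding another NIC before touching the application.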


Being vigilant about using the top-down process will ensure you don't waste time tuning a non-representative system. And it just may save you some embarrassment!

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1225/IMG_2506-measureBottleneck-edit2-x250.jpg
Always measure your bottlenecks!

Keep watching The Server Room for information on the other 8 habits in the coming weeks.


I have been working as a full-time performance engineer at Intel for 6 years. I started by benchmarking server products for performance validation and now I focus on the TPC-C and TPC-E OLTP server benchmarks. I have used a variety of workloads in this job and spent time optimizing each level of the performance hierarchy: application, system, and processor. I, like many of you, have learned the "tricks of the trade" the hard way: by trial, error, and success. I'm sharing now, so you can all benefit from the things I've picked up along the way.

Let's start with some general methodologies to follow when tuning performance, whether you do it full-time, as a hobby, or just in your spare cycles after getting your "regular work" done. I will follow up with a more detailed post on each habit individually.


1. Ask the right question: Why are you tuning your platform? What level of performance are you hoping to achieve? What do you (or your users) care most about: raw performance, cost/performance, performance/watt, or something else?

2. Start at the top: The first and easiest part of your application server to tune is the hardware itself. Move on to the software and workload only after you feel confident that you have removed any system-level bottlenecks.

3. Know your Platform: This should be where you begin your system (hardware) tuning. The first thing, which I can't stress enough, is to get a block diagram of your platform. Then study it!

4. Know your BIOS: Server BIOSes these days come with more and more options. Be sure to give your new platform's BIOS a once-over. Pay particular attention to options relating to performance and power.

5. Know your Workload: To quantify performance, you need a workload! Some examples: web server response time, boot time, frames rendered per second, simultaneous connections supported, etc. Understand as much as possible about how the work gets done.

6. Try one thing at a time: Little changes that seem harmless can significantly alter the behavior of your system. Or worse, they can interact with each other to wreak havoc. Always try one change at a time, and for goodness' sake, do habit number 7.

7. Document and Archive: When you change something, log it! For each experiment you do, store your hardware and software configuration, performance level, and any collected data.

8. Use the right tool for the job: There are free data collection tools out there for various levels of the tuning process. System tuning tools include Performance Monitor for Windows and sar for Linux. Application-level tools include Intel® VTune™ for both Windows and Linux.

9. Don't break the law: Amdahl's Law, that is. Amdahl's Law tells us the maximum amount of performance improvement we will get from a particular enhancement. Amdahl can help you set your expectations properly and clue you in to when you should be suspicious.

10. Compare apples to apples: Todd Christ reminds us of this habit in the last paragraph of this post. Don't compare the performance of mis-matched systems. If you must do it, know exactly what the differences are: the processor, memory type/speed/vendor, a software component, chipset, etc. Dig into the configuration details!
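
To make habit 9 concrete: Amdahl's Law says that if an enhancement speeds up a fraction f of the run time by a factor s, the overall speedup is 1 / ((1 - f) + f / s). Here is a quick sketch in Python; the function names and the example numbers are mine, chosen only for illustration.

```python
def amdahl_speedup(fraction_enhanced, enhancement_speedup):
    """Overall speedup when a fraction of run time is sped up by a factor."""
    if not 0.0 <= fraction_enhanced <= 1.0:
        raise ValueError("fraction_enhanced must be between 0 and 1")
    if enhancement_speedup <= 0:
        raise ValueError("enhancement_speedup must be positive")
    remaining = 1.0 - fraction_enhanced
    return 1.0 / (remaining + fraction_enhanced / enhancement_speedup)


def amdahl_limit(fraction_enhanced):
    """Best possible speedup, reached as the enhanced part approaches zero time."""
    if not 0.0 <= fraction_enhanced < 1.0:
        raise ValueError("fraction_enhanced must be in [0, 1)")
    return 1.0 / (1.0 - fraction_enhanced)


# Illustrative case: the enhancement affects 40% of run time and doubles its speed.
print(f"Expected speedup: {amdahl_speedup(0.4, 2.0):.2f}x")
print(f"Upper bound:      {amdahl_limit(0.4):.2f}x")
```

If a change that touches 40% of the run time appears to give you a 3x gain, the upper bound tells you to be suspicious of the measurement.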

So now you have the high-level list! Stay tuned to The Server Room for more information about each habit in the coming weeks.
