The Server Room Blog

24 Posts tagged with the performance tag

Have you ever asked yourself that question when you are bombarded with marketing messages from multiple companies on why you should choose their products over a competitor's? As a non-engineer in an engineer-centric company, I have certainly thought about this several times and asked myself a very simple question: why should I choose one architecture type over another?

I suppose the best place to start is at the beginning, by trying to decipher the acronym soup of RISC, x86, etc. I decided to use my ‘old friend’ Wikipedia http://www.wikipedia.org/ to help with this process. What I found was another alphabet soup that I could have researched for hours, but I try to simplify it below. I attach my detailed definition findings at the end of this blog.

Simply put, RISC (pronounced "risk") is a CPU design approach that uses simplified instructions that execute very fast, with the goal of higher performance. x86 is a generic term that refers to the instruction set of another CPU architecture. So basically both RISC and x86 describe types of instruction sets linked to CPU architecture.

So which one should I choose?
Call me old-fashioned, but as a business guy, it always comes down to 3 basic tenets when making a decision:
1) I like choice: the ability to pick between multiple suppliers to get the best deal to meet my needs (and the ability to change supplier without major obstacles).
2) Performance is really important. Higher performance means that I get my work done quicker, which reduces overall cost, improves time to revenue and ultimately improves the productivity of my business.
3) System cost and total cost of ownership are key decision points in today’s era, which is vastly different from the ‘dot.com’ boom. It is all about managing the bottom line through good decisions around CAPEX and OPEX spending.

I applied my decision criteria and quickly found out that there is not a lot of choice from a hardware and operating system perspective with RISC architecture. In fact it looks like quite the opposite of choice, which always concerns me. Call me pro-choice if you like, but I like the ability to move between suppliers! On the other hand, I found x86 to have lots of choice, with too many hardware vendors to list and a range of operating systems from Windows to Linux and Solaris.

With choice out of the way, I then moved on to performance for my business and looked at published results from many hardware vendors on websites like http://www.spec.org. What I found was that Intel-based systems had a lot of leading results against architectures like SPARC from Sun or Fujitsu and POWER from IBM.

I then looked at price and (being an ex-accountant in my past career) nearly jumped for joy when I saw that system prices were lower for x86 systems than for comparable RISC systems.

This analysis helped me understand it better and helped simplify my decision making.

Here is a short video with a little bit more detail. I would be interested in your thoughts. Have you had any similar experiences that you would like to share?


I have visited a number of customers recently. The discussions are usually straightforward: I provide them with a download of our current products, I tell them about things that we are doing in the future, and along the way I ask them some questions about trends that they are seeing in their businesses. It will come as no surprise that enterprises are trying to keep up with their current requirements while also stretching flat or dwindling budgets to do something new. Many are turning to virtualization as a way to do more.

So who cares? CFOs care. I went out to visit a leading Fortune 500 company based on the West Coast of the US. Keep in mind I was planning to discuss our server platforms, why I believe they lead on performance and power, and all of the great new virtualization features we have recently introduced or will introduce in the future. Before we got started, they proudly walked me through their new datacenter, and I stopped in front of a rack that had two servers in it. Two 2U two-processor servers. It was right next to another rack that had four servers in it. I asked why both racks were only partially full, and the response was that one is owned by Finance and one is owned by a business unit. IT just manages them. You can look at this two ways. The glass-half-empty way would be that they are wasting an incredible amount of datacenter space and they are hopeless. The glass-half-full way would be that this is a great opportunity to really deliver value to this company's bottom line by first convincing them that physical consolidation (filling up their racks) is important, then showing them a path toward application consolidation, and finally sharing a vision of datacenter virtualization that includes compute, storage and networking. Their CFO will care.

IT employees care. One theme that seems to be coming through loud and clear is that people who drive some form of virtualization are usually considered innovators or leading-edge thinkers within their company. I have heard the term "IT Hero" used to refer to someone who has delivered on a high-ROI project, usually these days through the use of virtualization. I have met a number of IT folks at conferences and during visits, and it is uncanny how many are trying to dig for more product information and how eager they are to hear about what new features we're putting into CPUs, chipsets and networking devices. A quick search of YouTube found this case study (here) that sums up the sorts of things I have heard.

It is also increasingly important that all of this stuff works well with the software, VMM and OS vendors' product offerings. I know we are working closely with all of the ecosystem players, because if we come out with an amazing new feature in our components it would be wasted if the VMM, OS or software didn't take advantage of it. There is some interesting banter here (here) about some of the pros and cons of virtualization. We are busy working on features that improve performance and simplify the experience end users have when they virtualize. Why do you care about virtualization? What are you doing today that you couldn't do a year or two ago that has been made possible by virtualization-related technology?


As part of the Sun Microsystems and Intel alliance, the two companies have collaborated to bring open source Threading Building Blocks (TBB) support to the Solaris Operating System (OS) and Sun Studio software toolchain. Check out the SUN Blog for additional information. Click the video below for a short interview with Deepanker Bairagi, Principal Engineer for the Sun Studio.

Software parallelism can unleash the processing power that the newer multi-core architectures provide, including the Quad-Core Intel® Xeon® processors. For developers, multithreading offers a software parallelism model, but many existing solutions require a lot of low-level coding. Threading Building Blocks offers a rich approach to expressing parallelism in a C++ program by offering higher-level, task-based parallelism that abstracts platform details and threading mechanism for performance and scalability.

The Solaris OS is able to take advantage of multi-core architectures, including the Intel Architecture, with features such as lightweight processes (LWPs), load-balancing across cores, and processor affinities. Sun Studio software offers a complete integrated toolchain for Solaris and Linux platforms, including parallelizing compilers, performance and thread analysis tools, memory and code debuggers, a NetBeans-based Integrated Development Environment, and more.

Combined with Threading Building Blocks, developers for the Solaris platform now have a fully loaded toolbox that simplifies the development of optimized multithreaded applications for multi-core Intel processors. Click here to learn more about Threading Building Blocks and optimizing performance for multi-core processors.

I would like to hear from the community on how you see this impacting the next generation of software development for Solaris running on Intel Architecture.


Hi all, I just found out about this new site, check it out here: http://www.intel.com/references/


45nm and Beyond

Posted by C_Peters Apr 23, 2008

Technology moves at such a rapid pace - it can often be mind-boggling. Even working directly with the product teams at Intel, I sometimes have difficulty keeping pace. The good news is that there is a tremendous opportunity today to be captured thanks to this rapid innovation, as well as a steady stream of advanced technology that IT can use to better support business and gain a competitive advantage. Recently I was interviewed by Tim Phillips from the Register about the current 45nm Quad-Core Intel Xeon products and the next generation Intel platforms based on the Nehalem processor.

A few years back, Intel fundamentally changed the way we design and develop our underlying microprocessor technology. We streamlined our innovation and accelerated its pace. Internally, we call this new model Tick-Tock. I like to call it shrink and innovate.

A "Tick" is a manufacturing process shrink that delivers smaller silicon with higher speeds, more transistors and lower power consumption (example: moving from 65nm to 45nm process technology). The 45nm Quad-Core Xeon processors (available since Nov '07) utilize unique materials (a high-k dielectric) that are delivering industry-leading performance/watt as measured by the industry's first and only standard benchmark, SPECpower.
A "Tock" represents a more extensive architectural innovation (ex. Intel Core Microarchitecture), introducing new micro-architecture features and functionality that fully utilize the higher transistor count set up by the shrink. For Intel Xeon-based servers, the next "tock" is Nehalem. In addition to the new micro-architecture based on 45nm, a system re-design will incorporate next-generation memory, I/O and virtualization technology for high-performance, high-bandwidth solutions compatible with today's leading software solutions.

Listen to my podcast interview to learn more about the benefits of using today's products and the timing of next-generation Intel technology featuring Nehalem. Is this information useful to you? If so ... how? Have any questions?

I'd be happy to hear from you. Chris





Here's the 4th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the fourth habit: Know Your BIOS.

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1357/IMG_2318-noExif.jpg

My last blog talked about beginning your system tuning by consulting a block diagram. The other thing you should always look at is your system's BIOS. Many server BIOSes these days allow you to configure options that affect performance. Like everything in the performance world, which set of BIOS options will be best will depend on your workload!

First things first, how do you find this "BIOS"? Most servers have a menu called "Setup" (or something similar) that you can access while the system is booting, before it starts loading the operating system. This "Setup" menu allows you to access your system's BIOS. Changes that you make here will affect how the operating system can utilize your hardware, and in some cases how the hardware works. If you change something here, you usually have to reboot and then the change will "stick" through all future reboots (until you change it again). As platforms grow increasingly sophisticated, they are offering a widening array of user-configurable options in Setup. So a good practice is to examine all the menu options available whenever you get a new platform. Here are some of the most common options on Intel platforms that could affect performance:

  • Power Management - Intel's power management technology is designed to deliver lower power at idle and better performance/watt (without significantly lowering overall performance) in most circumstances. There are 2 types: P-States, which attempt to manage power while the processor is active, and C-States, which work while the processor is idle. In some BIOSes, both of these features are combined into one option, which you should enable. In other cases they are separated. If they are separate, here's what to look for:
    • Intel EIST (or "Enhanced Intel SpeedStep" or "Intel SpeedStep" or "GV3" on older platforms) - This is the P-State power management that works while the processor is active. Leave it enabled unless directed to change it by an Intel representative.
    • Intel C-States - If you have this option or something similar, it refers to the power management used when the processor is idle. Enable all C-States unless directed otherwise by an Intel representative.
  • Hardware Prefetch or Adjacent Sector Prefetch - These options try to lower overall latencies in your platform by bringing data into the caches from memory before it is needed (so the application does not have to wait for the data to be read). In many situations the prefetchers increase performance, but there are some cases where they may not. If you don't have time to test these options, then go with the default. Intel tests the prefetch options on a variety of server workloads with each new processor and makes a recommendation to our platform partners on how they should be set. If, however, you are tuning and you have the time to experiment, try measuring performance using each of the prefetch setting combinations.

There are several other options that might affect performance on specific platforms. Some examples might be a snoop filter enable/disable switch, a setting to emphasize either bandwidth or latency for memory transactions, or a setting to enable or disable multi-threading. In these cases, if you don't have time to test, use your Intel or OEM representative's suggestion or go with the default setting.
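If you do have time to experiment, it helps to write down every combination you intend to test before you start rebooting. The sketch below is only an illustration of that bookkeeping: the option names are hypothetical stand-ins (your BIOS menu will use its own labels), and the scores are numbers you would have to measure yourself by running your workload under each configuration.

```python
from itertools import product

# Hypothetical option names; real BIOS menus differ by platform.
PREFETCH_OPTIONS = ("hardware_prefetch", "adjacent_sector_prefetch")


def prefetch_test_matrix(options=PREFETCH_OPTIONS):
    """Return every enabled/disabled combination of the given options."""
    return [
        dict(zip(options, states))
        for states in product((True, False), repeat=len(options))
    ]


def best_setting(results):
    """Pick the configuration label with the highest measured score.

    `results` maps a label for each configuration to a score that you
    measured yourself after rebooting into that configuration.
    """
    return max(results, key=results.get)


if __name__ == "__main__":
    # Two on/off options give four configurations to test.
    for combo in prefetch_test_matrix():
        print(combo)
```

Each extra on/off option doubles the number of reboots and test runs, which is a good reason to fall back on the default when time is short.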

Being familiar with how your system's BIOS is configured is another basic component of system tuning.

Keep watching The Server Room for information on the other 6 habits in the coming weeks.


Sometimes you get so deep into something that you don't realize how crazy it is until you take a step back. Like most technology companies, Intel has an inherent love for acronyms. The cacophony of standards bodies, advanced technologies, and intense rates of change in our industry necessitates the use of abbreviation just to be able to communicate clearly and briefly. However, while I am at least as much of a technophile as most of the folks in the technology jungle, even I sometimes run into an acronym wall. I thought that, to help myself and others, it might be a good idea to decode one of the newer sets of network technologies that I work closely with and to decipher some of the associated names and acronyms that come along with it.

10 Gigabit Ethernet: It's here, it's real, and it's growing fast.

Ethernet (IEEE 802.3) has evolved over the years from a new standard linking computers together at slow rates, moving from 10 Megabit per second (Mbps), to 100Mbps, to 1 Gigabit per second (Gbps), and a few years ago to 10Gbps of unidirectional throughput (10GbE). Over time there have been several physical connection types for Ethernet. The most common is copper (Cat 3/4/5/6/7 cabling is used as the physical medium), but fiber has also been prevalent, as well as some other more esoteric physical media types (such as BNC coax). The most common 10GbE adapter (until very recently) has been optical only, due to the difficulty of making 10GbE function properly over copper cabling.

This post isn't meant to discuss the past, though, but to decode the present and future as it relates to 10Gig Ethernet and the variety of flavors that are available. Below I'll cover a number of acronyms for 10GbE IEEE standards that are often lumped together as '10 Gigabit' and discuss some of the differences and usages for each. After that, I'll also try to clear up some of the confusion about 'form factor' standards for optical modules (which are separate from IEEE) and some of the terms and technologies in that realm:

10GBase-T (aka: IEEE 802.3an):

This is a 10GbE standard for copper-based networking deployments. Networking silicon and adapters that follow this specification are designed to communicate over Cat 6a/7 copper cabling up to 100 meters in length (plain Cat 6 supports shorter runs). To enable this capability, a 10GbE MAC (media access controller) and a PHY (Physical Layer) designed for copper connections work in tandem.

10GBase-T is viewed as the holy grail for 10GbE because it will work within the most prevalent Cat 6/7 based infrastructure that is already in place. The price of this flexibility is higher power and higher latency.

10GBase-KX4 (aka: IEEE 802.3ap):

This is a pair of standards that are targeted toward the use of 10GbE silicon in backplane applications (such as a blade design). The specification is designed specifically for an environment where lower power is required and shorter distances (up to only 40 inches) are sufficient.

10GBase-SR (aka: IEEE 802.3ae):

This specification is for 10GbE with optical cabling over short ranges (SR = Short Range) with multi-mode fiber. Depending on the kind of fiber, SR in this instance can mean anything between 26 and 82 meters on older fiber (50-62um fiber). On the latest fiber technology, SR can reach distances of 300m. To be able to physically support a connection of the cable, any network silicon or adapter that supports 10GBase-SR would need to have a 10GbE MAC connected to an optics module designed for multi-mode fiber. (We'll discuss optics modules in more depth further down in this post.)

10GBase-SR is often the standard of choice to use inside the datacenters where fiber is already deployed and widely used.

10GBase-LR (aka: IEEE 802.3ae, Clause 49):

LR is very similar to the SR specification except that it is for Long Range connections over single-mode fiber. Long Range in this spec is defined as 10km, but distances above that (as much as 25km) can often be obtained.

10GBase-LR is used sparingly and really only deployed where ultra-long distances are absolutely required.

10GBase-LRM (aka: IEEE 802.3aq):

LRM stands for Long Range over Multimode and allows distances of up to 220 meters on older standard (50-62um) multi-mode fiber.

10GBase-LRM is targeted for those customers who have older fiber already in place but need extra reach for their network.

10GBase-CX4 (aka: IEEE 802.3ak):

This standard of 10GbE connection uses the CX4 connector/cabling that is used in InfiniBand™ networks. CX4 is a lower-power standard that can be supported without a standalone PHY or optics module (the signals can be routed directly from a CX4-capable 10GbE MAC to the CX4 connector). Due to the physical specification for CX4-based 10 Gigabit, it provides lower latency than comparable 10GBase-T copper PHY solutions. With the use of CX4 passive (copper) cables, the nominal distance you can expect between your 10GbE links is ~10-15m. There are also amplified 'active' (but still copper) cables with nominal distances up near 30m.

Below is an image of a standard CX4 based socket that would be on a 10GBase-CX4 NIC:

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1321/CX4+Socket.jpg

There are also what are referred to as 'active optical' cables for CX4, which actually have an optics module in the termination of the cable, while the cable body is fiber. This kind of active design increases cable reach and improves flexibility (fiber is smaller than copper pairs) but also increases cost. These active cables can increase reach up to 100m.

Intel has recently released our own series of active optical CX4 cables.

For short distances (such as inside the rack in a datacenter), CX4 offers one of the lowest-cost ways to deploy 10GbE from switch to server. Because of its design, CX4 also achieves very low latencies.

</end of IEEE standards ramble>

Ok, so we've summarized the majority of the IEEE 10GbE standards. But the immediate question arises: "Why are there so many?" Is the IEEE standards body for 10GbE just throwing out all these standards for every possible niche application? The answer is no. For any new IEEE PHY interface standard to be approved, it must meet several criteria, including "distinct identity" and "broad market potential". While these standards certainly won't all apply to any given institution's network, they do all meet real market needs.

X2, XFP, SFP+... say what?

A final mystery that I've alluded to above has to do with the various optical module form factors that are available for 10GbE. XENPAK, X2, XPAK, XFP and SFP+ are standard optics module form factors that are used by both switch and NIC vendors in the industry. These modules that go along with the 10GbE networking products are an interesting beast. They are not specified by IEEE, but are standardized by a group of industry participants through what is known as a Multi-Source Agreement (MSA).

XENPAK, XPAK and X2 are the older module standards originally used for 10GbE, followed by XFP, which shrank the form factor of the actual module as well as the fiber cable pairs. SFP+ is a newer form factor that is now gaining momentum with switch and NIC vendors. An SFP+ optics module can use the same fiber pairs used with XFP (no new fiber cable needed), but the form factor of the cage in the switch or NIC, as well as the optics module itself, is smaller. The key advantage of SFP+ is that the new form factor can drive lower costs, lower thermals, and higher densities at the switch.

Here is an image of an older X2 optics module:

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1322/X2+Module.jpg

And here is a comparison of the size of XFP (right) relative to SFP+ (left):

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/38-11002-1324/XFP+SFP%2B+Comparison.jpg

The optics modules are driven by a low power interface from the 10GbE MAC. The interfaces are XAUI (for X2 modules), XFI (for XFP modules), and SFI (for SFP+ modules). These interfaces generally are supplied directly from the 10GbE based MAC to the module cage. One of the things the module MSA standards bodies agree on is not only a form factor for the module itself but also the electrical specifications of the driver interface that can be accepted from the MAC.

The key thing I want to hammer home here is that the IEEE specification (such as 10GBase-SR) is separate from the module form factor used.

For example, you can have a Short Range optical NIC that uses X2, XFP, or SFP+. So asking for an "SFP+ NIC" isn't actually specific enough, because that could mean a lot of different things. You'd have to specify a 10GBase-SR NIC, with SFP+ optics.
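To make that separation concrete, here is a small sketch of how a purchase request could be checked for completeness. The media and reach notes simply restate what this post says about each standard; the `NicRequest` class and its `problems` method are hypothetical names made up for illustration, and the assumption that only the fiber standards (SR, LR, LRM) need a module form factor is a simplification.

```python
from dataclasses import dataclass
from typing import Optional

# Medium and nominal reach as summarized in this post (not a full spec table).
IEEE_STANDARDS = {
    "10GBase-T": ("twisted-pair copper", "up to 100 m"),
    "10GBase-KX4": ("backplane", "up to about 40 inches"),
    "10GBase-SR": ("multi-mode fiber", "26 m to 300 m depending on fiber"),
    "10GBase-LR": ("single-mode fiber", "10 km"),
    "10GBase-LRM": ("older multi-mode fiber", "up to 220 m"),
    "10GBase-CX4": ("CX4 copper cable", "about 10-15 m passive"),
}

# Simplifying assumption: only the fiber standards take an optics module.
OPTICAL_STANDARDS = {"10GBase-SR", "10GBase-LR", "10GBase-LRM"}

# Module form factors named in this post (defined by MSAs, not by IEEE).
MODULE_FORM_FACTORS = {"XENPAK", "X2", "XPAK", "XFP", "SFP+"}


@dataclass(frozen=True)
class NicRequest:
    """Hypothetical description of the NIC someone is asking for."""

    standard: Optional[str] = None
    form_factor: Optional[str] = None

    def problems(self):
        """List what is still missing before the request is unambiguous."""
        issues = []
        if self.standard not in IEEE_STANDARDS:
            issues.append("specify an IEEE standard such as 10GBase-SR")
        elif (self.standard in OPTICAL_STANDARDS
              and self.form_factor not in MODULE_FORM_FACTORS):
            issues.append("specify an optics module form factor such as SFP+")
        return issues


if __name__ == "__main__":
    print(NicRequest(form_factor="SFP+").problems())
    print(NicRequest("10GBase-SR", "SFP+").problems())
```

Asking for just "an SFP+ NIC" fails the first check, while "10GBase-SR with SFP+ optics" passes both.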

SFP+ Direct Attach:

Now that I've thoroughly confused everyone, I'll take it one step further. Not only can each module form factor be used with different IEEE MAC specifications, but each module doesn't even need to be used for a fiber connection at all. An interesting example of using an ‘optics' module form factor for a non-optical design is SFP+Direct Attach.

SFP+DA is similar in concept to CX4 but provides a bit more flexibility. Normally, you may have a switch or NIC that is designed to be able to support the addition of SFP+ based optics modules for a 10GbE fiber connection. Direct Attach allows for passive Twin-Axial (2 pair copper) cables to be plugged directly into the SFP+ cage (in place of an optical module) to carry the serial signal from the MAC directly over the cable to another SFP+ form factor enabled NIC or switch.

Again, the downside is that without either a standalone PHY or an optics module to send the signal over a long distance, a passive cable with SFP+DA has a reach in the ~10-15m range. The real advantage of SFP+DA over CX4 is that on the switch side the SFP+ module design allows higher-density switches than CX4 can provide.

For a top-of-rack switch, SFP+DA will likely provide excellent cost, power and latency characteristics and still have enough reach to be very feasible inside the rack.

10GbE - The Infrastructure is Ready!

I hope that I've lifted a little bit of the fog that surrounds the 10GbE market and the related technologies. The last thing I want to leave you with is the fact that 10GbE infrastructure is now starting to roll into the mainstream. CX4 switches are available broadly in the market today and SFP+ type designs for both optical modules as well as Direct Attach connections have been demonstrated and will be getting rolled out very soon by various vendors.

Intel, along with other vendors in the marketplace, is already selling a wide variety of NICs and silicon to meet the various form factors and standards-based market needs I listed above.

After years of anticipation, 10GbE is finally hitting its stride. Next stop... 100GbE... :-)



Today, Intel launched 50W low power versions of the 45nm Quad-Core Xeon processors (the L5400 series).
The 2 new SKUs are listed below:

Quad-Core Xeon L5420 2.50 GHz, 12MB L2, 1333MHz
Quad-Core Xeon L5410 2.33 GHz, 12MB L2, 1333MHz

These products offer IT and business users 2 primary benefits:

  • 45nm 50W quad-core brings 25% improved performance over previous generation 65nm 50W quad-core processors
  • They also run 30W cooler than mainstream 80W quad-core processors delivering the same performance at the same frequency.

We have seen strong interest in these 50W quad-core products, and I'd like to hear from you: where would you use low-power quad-core, and why?



I recently found this simple animation that breaks down the Xeon processor family into bite-sized chunks and explains which Xeon-based servers are best suited to meet common IT and business needs.

I shared it last week when traveling with customers in Taiwan and it was well received.

What do you think of this video?



Virtualization is without a doubt a very hot topic these days. Companies continue to look to server virtualization to increase the utilization rates of their systems and lower overall deployment and management costs. The basic model of a virtualized server is depicted below:

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1277/Pre-VMDq+Virtualization.JPG


Essentially, you have a VMM (Virtual Machine Monitor) SW layer that talks between hardware and software and allows each virtual machine to successfully use what it thinks is one network port. This is a pretty straightforward model and it directly addresses the general reason for virtualization which is that generally the server may not be utilizing its processing power in full and is thus wasting CPU cycles.

There is an interesting result of this consolidation onto a single physical box with several Virtual Machines. In addition to consolidating CPU processes, you also effectively consolidate I/O bandwidth and switch processing capabilities onto the same platform. The overhead of this switching limits your bandwidth, adds CPU overhead, and effectively reduces the benefits of server virtualization. In some cases you may have a new problem in having created an I/O bottleneck.

This makes a lot of sense if you think about the fact that in essence, what you are doing is merging 5-10 machines that each had 1 or 2 ports of Gigabit Ethernet (all connected via a switch) into a single machine. This new server probably needs to have at least 6 ports or more of Gigabit Ethernet and may even require 10 Gigabit connections just to be able to support the new consolidated workload.

Enter Virtual Machine Device Queues (VMDq):

To help relieve the I/O congestion associated with the additional VMM software switching in a virtualized environment, Intel implemented a technology called VMDq in our latest Ethernet NICs and silicon. VMDq offloads some of the switching that was done in the VMM to networking hardware specifically designed for this function. This drastically reduces the overhead associated with I/O switching in the VMM, which greatly improves throughput and overall system performance.

Below is a diagram that summarizes the new virtualized server stack with VMDq enabled:

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1278/Post-VMDq+Virtualization.JPG


On the receive path, VMDq provides a hardware ‘sorter' or classifier that essentially does the pre-work for the VMM of directing which end VM the packets should go to. The NIC or LAN silicon is performing a hardware assist for the VMM layer.

On the transmit side, the packets are serviced round robin style to avoid "head of line" blocking and alleviate potential quality of service (QoS) issues.
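For readers who think better in code, here is a toy model of the two paths just described. It is a sketch only and says nothing about how the real silicon or any VMM is implemented: the `ToyVmdqNic` class is a made-up name, and sorting by destination MAC address with one queue per VM is an assumption chosen to keep the example short.

```python
from collections import deque


class ToyVmdqNic:
    """Toy model: per-VM queues, a receive sorter, round-robin transmit."""

    def __init__(self, vm_macs):
        # One receive queue and one transmit queue per VM, keyed by MAC.
        self.rx_queues = {mac: deque() for mac in vm_macs}
        self.tx_queues = {mac: deque() for mac in vm_macs}

    def receive(self, dest_mac, payload):
        """Sort an incoming packet into the queue of the VM it targets."""
        queue = self.rx_queues.get(dest_mac)
        if queue is None:
            # Unknown destination: this toy model just reports a miss.
            return False
        queue.append(payload)
        return True

    def queue_for_transmit(self, src_mac, payload):
        """Place an outgoing packet on the sending VM's transmit queue."""
        self.tx_queues[src_mac].append(payload)

    def drain_transmit(self):
        """Send one packet per VM per pass so a busy VM cannot block others."""
        sent = []
        while any(self.tx_queues.values()):
            for mac, queue in self.tx_queues.items():
                if queue:
                    sent.append((mac, queue.popleft()))
        return sent


if __name__ == "__main__":
    nic = ToyVmdqNic(["vm-a", "vm-b"])
    for packet in ("a1", "a2", "a3"):
        nic.queue_for_transmit("vm-a", packet)
    nic.queue_for_transmit("vm-b", "b1")
    print(nic.drain_transmit())
```

In the example, the single packet from the second VM goes out after only one packet from the first, rather than waiting behind all three; that is the "head of line" blocking the round-robin servicing is meant to avoid.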

The immediate question I expect is "So, don't the VMM vendors have to support this?" And the answer is yes. Intel is supporting this feature today on shipping platforms, but you do need to work closely with the VMM vendor to make sure the whole stack works as designed.

Just this week Intel announced that our VMDq capability will be supported in VMware's upcoming ESX release. This is certainly a big step towards wide support of network virtualization performance enhancing features.

Ethernet technology has grown and become more important over the last 25 years, and the trend appears to be continuing on course.

Ben Hacker

For more details on VMDq, there is a VMDq Whitepaper, and an Intel® VT for Connectivity Datasheet located on our website.


Here's the 3rd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the third habit: Know Your Platform.

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1247/IMG_2376-edit-x350-noExif.jpg

As we learned in my last blog, we should start our server performance tuning by looking for system-level bottlenecks. This involves understanding exactly how data flows into and out of your platform - and to do this, you need a block diagram. A block diagram shows the major components on the server's motherboard and the paths between them. From a good block diagram you can derive the maximum data transfer rate (aka bandwidth or throughput) achievable as data flows along those paths.

I usually look at my block diagram before beginning system tuning in order to identify potential bottlenecks. But some people use them in parallel: they measure the bandwidth of various parts of the system and then confirm what they see using the block diagram. You can determine if various parts of your system are heavily stressed, bottlenecked, or lightly utilized. In general you want to trace the path from where data enters your server (NIC, HBA, etc) up to the processor and back to memory or out of the server. The paths connecting one component to another are commonly known as buses. For each bus, multiply the speed by the width to determine the maximum potential bandwidth.

Let's use the block diagram for the Intel S5400SF server board as an example. It has 2 FSBs, each capable of 1333 or 1600 Mega-Transfers/second (MT/s). Each transfer on the FSB is 64 bits (8 bytes), so at 1600 MT/s, 8 bytes * 1,600,000,000 transfers/second gives a maximum theoretical bandwidth of 12.8GB/s per FSB segment. Keep in mind, though, that in reality a bus will not achieve its theoretical maximum bandwidth: depending on the type of bus, it will probably realize 66-80% of the possible throughput.
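The same speed-times-width arithmetic works for any bus on the block diagram, so it can be worth keeping as a tiny helper. This is a minimal sketch; the function names are my own, and the 66-80% range is just the rule of thumb from this post, not a measured figure for any particular bus.

```python
def max_bandwidth_gb_per_s(mega_transfers_per_s, bytes_per_transfer):
    """Theoretical peak in GB/s: MT/s times bytes per transfer, over 1000."""
    return mega_transfers_per_s * bytes_per_transfer / 1000


def realistic_range(theoretical_gb_per_s, low=0.66, high=0.80):
    """Apply the rough 66-80% rule of thumb to a theoretical peak."""
    return theoretical_gb_per_s * low, theoretical_gb_per_s * high


if __name__ == "__main__":
    # 1600 MT/s FSB, 8 bytes per transfer, as in the example above.
    peak = max_bandwidth_gb_per_s(1600, 8)
    low, high = realistic_range(peak)
    print(f"theoretical peak: {peak:.1f} GB/s")
    print(f"likely achievable: {low:.1f} to {high:.1f} GB/s")
```

Running the helper for each bus along your data path gives a quick list of ceilings to compare against what you actually measure.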

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1246/block_diagram.JPG

So, where do you find these diagrams? If you are using an Intel server platform, the block diagrams can usually be found in the technical product specification for each board. If you purchase a platform from one of our OEM partners, ask your salesperson where to get it.

Look at the maximum bandwidth achievable on each link your data will travel over to gain a deeper understanding of how your workload will run on your platform.

Keep watching The Server Room for information on the other 7 habits in the coming weeks.


Here's the 2nd follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the second habit: Start at the top.

Let me start by relating a true (although simplified) story. My team at Intel has built up years of expertise running a particular benchmark. So when the time came to start running a new, similar benchmark, we thought: "No problem." We began running tests while the benchmark was still in development. Immediately we had an issue: the type of problem that would normally indicate our hardware environment wasn't set up properly. We checked everything that we had seen cause the issue in the past, and we couldn't find anything. So, we blamed the new benchmark. After all, we were experts and we had been setting up these environments for years! We knew what we were doing. You can probably guess where this story is going: after weeks of doing things to work around the "benchmark issue", we figured out that we had misconfigured the environment, resulting in a bottleneck on one part of our testbed. We didn't thoroughly test that part of the environment because it had never caused us problems with the old benchmark. And of course, on the new benchmark it was critical. We had broken one of the most important rules of performance tuning: Start at the Top.

So now you know how easy it can be to not Start at the Top. Even seasoned performance engineers can get overconfident and forget this rule. But the consequences can be dire:

  • You have to eat major crow when you realize your mistake. I'm just now getting over the humiliation.
  • You might have put tunings in place to address issues that weren't really there. This is at best wasted work and at worst something that you have to painstakingly undo when you fix the real issue.

So...how do you avoid this situation? Simple: use the Top-Down Performance Tuning process. This means you start by tuning your hardware. Then you move to the application/workload, then to the micro-architecture (if possible). What you are looking for at each level are bottlenecks: situations where one component of the environment or workload is limiting the performance of the whole system. Your goal is to find any system-level bottlenecks before you move down to the next level. For example, you may find that your network bandwidth is bottlenecked and you need to add another NIC to your server. Or that you need to add another drive to your RAID array, or that your CPU load is being distributed unevenly. Any bottleneck involving your server system hardware (processors, memory, network, HBAs, etc.), attached clients, or attached storage is a system-level bottleneck. Find these by using system-level tools (which I will touch on in a future post on Habit #8), remove them, then proceed to the application/workload level and repeat the process.
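To make the system-level pass concrete, here is a rough sketch in Python of one way to flag bottleneck candidates: compare what you measured on each component against its realistic capacity and list anything running close to the limit. The component names, the sample numbers, and the 90% threshold are all made up for illustration; use the measurements from your own system-level tools and a threshold that suits your workload.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    measured: float  # observed throughput or load, in the component's own units
    capacity: float  # realistic maximum, in the same units

    @property
    def utilization(self) -> float:
        return self.measured / self.capacity


def find_bottlenecks(components, threshold=0.9):
    """Return components at or above the threshold, most saturated first."""
    hot = [c for c in components if c.utilization >= threshold]
    return sorted(hot, key=lambda c: c.utilization, reverse=True)


# Hypothetical sample measurements, for illustration only.
system = [
    Component("NIC (Gb/s)", 0.97, 1.0),
    Component("RAID array (MB/s)", 180.0, 400.0),
    Component("CPU (% busy)", 55.0, 100.0),
]

for c in find_bottlenecks(system):
    print(f"{c.name}: {c.utilization:.0%} of capacity")
```

With these sample numbers only the NIC is flagged, which matches the "add another NIC" case above. Once a flagged component is fixed, measure again before moving down to the application level.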


Being vigilant about using the top-down process will ensure you don't waste time tuning a non-representative system. And it just may save you some embarrassment!

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1225/IMG_2506-measureBottleneck-edit2-x250.jpg
Always measure your bottlenecks!

Keep watching The Server Room for information on the other 8 habits in the coming weeks.



As a follow-up to my first post on the 10 Habits of Great Server Performance Tuners, this post focuses on the first habit: Ask the Right Question.

http://communities.intel.com/openport/servlet/JiveServlet/downloadImage/1207/Greg_Questioning_100by100_no_exif.JPG
Six years of performance work have taught me to start all my projects with this habit. Before I explain the kinds of questions I ask, let me demonstrate why this is important. Here are some examples of undesirable outcomes of performance tuning:


  • You spend months of experimentation trying to match a level of performance you saw reported in a case study on the internet, only to find out later that it used unreleased software you can't get yet.
  • You spend months optimizing your server for raw performance. As part of your optimization you fully load it with the best available memory and adapters. Then you find out that your management/users would have been happier with a lower level of performance but a less costly system.
  • Your team works hard to maximize the performance of your application server for the current number of users you have, but makes decisions that will result in bottlenecks and re-designs when the number of users increases.

The outcome we are all hoping for with our tuning projects is that we provide the best level of performance possible within the budgetary, time, and TCO constraints we have. And of course, without sacrificing any other critical needs we'll have for our server, either now or in the future. Since performance optimization can take a lot of time and resources, consider the following questions before embarking on a project:

  • Why are you tuning your platform? (This helps you decide the amount of resources to dedicate.)
    • As part of this question, consider this one: How will the needs and usage models for this server change over the course of its life?
  • What level of performance are you hoping to achieve?
  • Are your expectations appropriate for the software and server system you are using?
    • In determining if your expectations are appropriate, refer to benchmarking results or case studies where appropriate and make sure any comparisons you make are apples to apples!
    • A corollary to this question is: is the server being used appropriate for the application being run?
  • What qualities of your platform are you trying to optimize: raw performance, cost/performance, energy efficiency (performance/watt), or something else?
  • Is performance your top priority for the system, or is scalability, extendibility, or something else a higher goal?
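The question about which quality you are optimizing matters because the "best" system can change with the metric. Here is a small Python sketch of that idea; the three configurations and all of their numbers are invented purely for illustration, and the metric names are my own labels.

```python
# (name, throughput in ops/s, system cost in $, average power in W)
# All values are hypothetical.
configs = [
    ("fully loaded", 1000.0, 20000.0, 500.0),
    ("mid-range", 800.0, 10000.0, 320.0),
    ("low-power", 500.0, 8000.0, 150.0),
]

metrics = {
    "raw performance": lambda perf, cost, watts: perf,
    "performance per dollar": lambda perf, cost, watts: perf / cost,
    "performance per watt": lambda perf, cost, watts: perf / watts,
}


def best_by(metric_name: str) -> str:
    """Name of the configuration that scores highest on the given metric."""
    score = metrics[metric_name]
    return max(configs, key=lambda c: score(*c[1:]))[0]


for name in metrics:
    print(f"{name}: {best_by(name)}")
```

With these made-up numbers each metric picks a different winner, which is exactly why it pays to agree on the metric before you start tuning.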

Thinking about the answers to these questions can help you navigate the trade-offs and tough decisions that are sure to pop up, and will help make your tuning project successful.

Keep watching The Server Room for information on the other 9 habits in the coming weeks.


In a prior post I argued that a lot of the work happening in your data center could probably be done someplace else. One of the counterarguments to this approach is the potential loss of the competitive advantage achieved by owning your compute resource, especially where your competition cannot or does not own a parallel resource. There may be some situations where this is true, but in most situations external resources (e.g., cloud computing) can actually liberate a business from the capital constraints of building a private compute center. If compute capacity delivers a competitive advantage, external availability provides scale up to the limits of what an organization can use. Like any other resource, the trick is in using it effectively. The ability to take advantage of this resource will be a future differentiator for compute-enabled companies. One of my favorite sound bites was an estimate in InformationWeek stating that a one-millisecond advantage in trading applications could be worth $100 million a year to a major brokerage firm.

Taking advantage of the computing cloud starts to look a lot like the fabled utility computing architecture. Utility computing is real, but Gartner* still places it on the descent into the "trough of disillusionment". I agree, and broad availability of utility computing is still a few years out. That doesn't mean IT managers should be waiting.

Why does Intel care? Will processor type matter in this emerging utility era - in the era of hosting, SaaS, and clouds? My short answer is yes. I think Intel has the right products and roadmap to be the "platform of choice" in the evolution to utility. My rationale for this position comes from the behaviors of companies doing leading work in these areas. It turns out that service providers want the very best value, where value is measured as a combination of performance, performance / watt, performance / $, platform efficiency, support for virtualization, management, and security. In other words, pretty much the same stuff that every data center manager should value. Intel has focused server platform evolution toward delivering platform leadership in efficiency, virtualization, and performance. Success in these three pillars ensures continued leadership in the data center. Beyond these pillars, Intel is also working with the software ecosystem to enable effective integration and optimization of the rest of the solution stack. The combination of technical leadership and a shared core architecture that spans mobile, desktop, and servers gives Intel a distinct advantage in utility computing.


Every now and then a colleague, customer or acquaintance sends me a link to an article or blog that usually features either our products or those from one of our competitors. More often than not I get a lot of repeat sources (The Register, The Inquirer, CNET, etc.). The blog that comes my way most often is one from George Ou at ZDNet. One of his most recent blogs (A comparison of quad-core server CPUs) shows a bunch of our latest quad-core CPUs and how they stack up against our previous versions as well as those from AMD. I won’t rehash the article here beyond saying that it was positive for Intel and that AMD’s issues with their quad-core processors have been well documented.


Is Intel winning now because our products are superior? Are we winning because our competitor is struggling? Do the benchmarks mentioned in George’s blog tell the whole story? As you can imagine, we constantly ask ourselves these questions and many more internally. Our conclusion is that for processors and server platforms, as long as we provide leadership along several key vectors, our market share and overall market position will improve.

Manufacturing process, processor architecture, system architecture, cache size. These are four critical vectors that we have direct control over when we are making design and enabling decisions. At times in our past and in the present we have had leadership on all four. In those times we have won hands down. There have also been times where a competitor has chosen to focus on one or two vectors and that has led to their products being better for a specific area. The four vectors above are things that Intel focuses on but we always have to keep an eye on what end user value they deliver.

Our customers tell us they care about three main things: Price, Performance and Power. The three P’s. George’s blog shows that for one of the P’s (Performance) Intel has leadership, particularly on integer and floating point. There are similar-looking examples for database, virtualization and pretty much any performance benchmark we have looked at recently. Thankfully for Intel, Performance is the “P” with the strongest correlation to success in the server market from a market segment share (MSS) perspective. We are also doing some amazing things with regard to Power. Some have been launched already and some will be coming soon with new products in 2008. The market is segmenting and we now make CPUs, chipsets and networking components that help OEMs build platforms targeted at high performance computing, mainstream enterprise, blades, workstations and emerging markets. Each has unique requirements with respect to the three P’s and one size no longer fits all.

I believe that overall George’s blog highlights the success that we are having today. I also think that there will be a steady stream of innovations that will be delivered in 2008 and beyond that will cause us to rethink how we deliver performance at the most efficient power level for the best possible price point. Virtualization, utility computing and charge back models for datacenter environments are all stepping up to take center stage. We all must innovate or become irrelevant…technological evolution waits for no one.

Shannon
