
The Server Room Blog

27 Posts tagged with the performance_tuning tag

Change is hard, but it can be done and the benefits of change usually outweigh the concerns which were on our minds before we made the change.

 

When making the change from running your solution on a RISC architecture to running it on a Xeon architecture, the biggest concern is usually whether the solution will run at the same level as on the previous architecture. I'm not talking about performance specifically; the usual question is whether operating systems like Linux, Windows, and Solaris on Xeon will meet your business needs for your mission-critical solutions.

 

Like the underlying improvements in the microprocessor, I believe there have also been major fundamental improvements in the operating systems that run on both today's and the soon-to-come next-generation microprocessors (sorry, my obligatory Nehalem-EX advertisement... coming soon in 2010). A decision made many years ago to run your solution on Unix/RISC was based on comparing all the variables at that time to pick what was right for your business. At that time you likely decided that your solution would not run on these operating systems, or that these operating systems were not suitable for your mission-critical workloads. That was probably the right decision at that point, but like everything else, decisions get revisited based on the here and now, and what was the right solution in the past may not be the right solution for your needs now.

 

I wanted to share some thoughts specifically on Red Hat Linux today. Let's take a little look at Red Hat Enterprise Linux (RHEL). Current versions of RHEL can deliver what is required for your critical solutions. RHEL is ready, and here are some of the reasons cited by Red Hat in recent webinars on this topic, along with my interpretation of their comments:

  • Hosts real-time global mission-critical infrastructures and operations 24x7 - it's tried and tested by other enterprises
  • Enables 5x9s availability in highly secure environments - pretty important to most critical solutions
  • Contributes measurable reductions to TCO and enables agile, standardized, and virtualized infrastructures - TCO benefits through standardization
  • Has major ISVs on board, with the majority of 3rd-party Unix applications having Linux and/or Windows versions available - the ISVs that traditionally delivered Unix-based applications to you also have versions supported on Linux/Windows
  • Many customer-unique applications are developed with programming languages such as C, C++, Java, or J2EE and can be migrated to Linux and/or Windows - your applications can be moved
  • Hosts most major database systems standard for your infrastructure - all the major databases run, and run well, on Linux

One of the other questions we encounter a lot is whether the technical hurdles of moving from one operating system environment to another are too high to overcome and outweigh the benefits of moving. There are always technical considerations and things that you need to know to move from one environment to another. However, you are not alone in trying to understand them. Red Hat has done a phenomenal job of documenting the challenges of moving from, say, Solaris to Linux and has developed a great Strategic Migration Planning Guide, which is available on request. In recent webinars Red Hat outlined some of the things you need to consider in the following technical categories:

- Development Environment; Kernel tuning; Security; Filesystems; Debugging, tracing, Profiling; Command Differences; Deployment methods; Software Management; Virtualization; Application considerations 

In addition to the current versions of Red Hat running on Intel architecture, we are also working very closely on future versions that will take advantage of the 20+ new RAS features that are planned for Nehalem-EX. More on that in a future blog.

You are not alone: resources, tools and expertise exist to help you make that move and reap the business benefits while still delivering to the requirements of your business. Check out the Red Hat online tools for more information that dives deeper into all the areas for consideration: http://www.redhat.com/migrate/solaris_to_linux/

We think Red Hat Linux and Xeon are ready to run your mission-critical workloads and solutions... What do you think?


Why upgrade your hardware when migrating to SAP ERP 6.0? Because it makes simple, practical business sense, that is all. SAP has identified several key reasons why customers are concerned about migration, and among them are the following:

  • Cost, Cost, Cost
      - HW infrastructure cost is highlighted as one of the key barriers of migration
  • Business Justification
      - Is there a compelling business reason to upgrade the hardware?
  • Additional risk of business disruption
      - Migration of ERP environment is complex enough…how much more risk is there when upgrading your hardware?

From a cost perspective, the perception that hardware is a barrier to migration can be easily overcome.  Based on research, the hardware cost as a percentage of the overall migration cost is only about 7%.  That means 93% of the cost is in licensing, consulting, etc, etc.  HW costs are only the “tip of the iceberg” and the real $ investment lies elsewhere in the equation.

Is there a compelling business reason to upgrade your hardware? Well…frankly, it does not make sense not to do it. One, we showed above that the hardware investment is minimal compared to SW licensing, consulting, service, etc. Two, the hardware requirements of ERP 6.0 are significantly higher than previous versions: ERP 6.0 requires up to 2.5x more CPU performance, 2.5x more memory and 1.5x more I/O! You will need the increased performance and scalability that Intel provides in our microprocessors. While the ERP performance requirements have increased 2.5x, Intel performance with SAP has increased 10x! Oh, btw…energy efficiency does matter, and in your new ERP environment you will be able to consolidate servers and save on power and cooling costs. TCO will be significantly reduced, and from a hardware investment standpoint you are likely to recover the cost of the servers in a very reasonable timeframe.

From my discussions with the IT community, their major concern and number one focus area is to prevent business disruption and downtime.  This costs companies real and significant money.  The fact is that an ERP migration is a complex enough project managing the strategic, functional and technical portions.  Adding a server infrastructure change increases fundamental risk.  But, the key here is that it is done often and done successfully.  Intel IT has published several whitepapers on the subject and communicated “Best Known Methods” to minimize that risk.    A quick summary is inserted here:

Challenge:

  • Convert Intel’s Worldwide Warehouse Management Software
  • Upgrade from SAP* ERP version 4.7 to 6.0, change the DBMS, and perform a Unicode* conversion as well as a hardware upgrade
  • Minimize downtime

Benefit to Intel IT:

  • SAP ERP 6.0 improves Intel supportability
  • Increases ease of integration to SAP NetWeaver* 7.1 Suite
  • Provides access to Enhancement Packs and Enterprise Services
  • Intel® Itanium®-based servers provide access to 128 GB of memory for database and SAP operations and significantly increased performance from true 64-bit processing

Key Results:

  • Reduced downtime of upgrade by 50% by using Intel Architecture

In summary, upgrading your server infrastructure when migrating your ERP environment is a very, very complex task, but from a business perspective, it should be fairly easy to see the true benefits of combining the ERP migration and the hardware upgrade.


I wrote a while back about how the Xeon 7400 (Dunnington) processor series compares to RISC. Since then I have shared information through other blog posts and other content about how Xeon 7400 and Xeon 5500 compare to both SPARC and POWER.

 

Xeon 7400 and Xeon 5500 are the current products shipping into the marketplace today. IMHO they offer a pretty compelling alternative to SPARC and POWER from both a performance and a TCO perspective, but I will not try to repeat all the reasons here.

 

What I wanted to share with you is some thoughts about what the product that succeeds Xeon 7400 will bring to the RISC party. Nehalem-EX is the code name for our next generation of product designed to serve the workloads currently serviced by Xeon 7400 (i.e. database, ERP, BI etc.). EX, by the way, is what we would all traditionally call MP, or multi-processor, servers.

 

Don't stop reading now, here is why I'm EXCITED about what Nehalem-EX will bring to the RISC party.

My excitement is actually based on real customer discussions about what Nehalem-EX will do for them and why it delivers some new stuff (my code for features and benefits) which they see as a prerequisite to make the move from RISC to Xeon. For some customers the TCO and performance of the products have been enough to convince them to move. For some other customers there are still some checkboxes remaining, which I believe Nehalem-EX will address.

Here is a snapshot of some of the cool new stuff which is actually convincing customers (from some real deals that I have worked)

    1. Improved bandwidth: up to 9 times the memory bandwidth of previous generations
    2. Introduction of QuickPath Interconnect to the EX systems
    3. New RAS features, previously seen on Itanium products, added to Xeon products
    4. Significant improvement in performance vs previous generations, e.g. database 2.5x
    5. More scalable platforms through 8 OEMs offering >8S. These platforms are key to managing large databases and to large-scale consolidation
    6. Mainframe-class availability in scalable platforms

 

For more information, check out the press briefing from May; see the presentation for more details.

 

 

 

Nehalem-EX goes into production later this year and I am pretty excited about how it will change the game. What do you think?


The need to write scalable applications has been important for programmers in the HPC community for years. Now, with the proliferation of multi-core and many-core processors, developing scalable software is a top priority for many programmers.

Andrew S. Tanenbaum stated at the USENIX ’08 conference last year that “sequential programming is really hard” … and that the difficulty is that “parallel programming is a step beyond that.”

He is right, but let’s illustrate why it is just a small step.

Here is the point: parallel architectures will continue proliferating, and we will need to develop and refine parallel algorithms that exploit that parallelism. While developing and refining parallel algorithms is difficult, the actual programming of these new algorithms does not need to be hard. However, if the developer is required to know the intimate details of the hardware, then developing and refining parallel algorithms can be very difficult and very time consuming.

One approach provided by Intel software developer tools is to abstract away the details of the hardware. This allows developers to focus on their algorithms and applications and rely on Intel software developer tools to provide the best optimizations for current and future platforms. While you may give up some performance by being abstracted away, what you lose in performance is repaid by your ability to work through more iterations of your parallelization ideas in less time. You may find yourself designing and developing better approaches to parallelism because you were able to test more hypotheses.

An additional by-product of being abstracted away from having to know the intricacies of the hardware is that your software will be highly adaptable to future platforms. You will see tremendous improvements on multi-core solutions and will be in a great position to scale your application performance forward as newer architectures are made available.
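To make the idea of abstracting away the hardware a little more concrete, here is a minimal sketch. It is not Intel's tooling: Python's standard `concurrent.futures` module stands in for any library that hides the core count from the developer, and the function names (`sum_of_squares`, `split`, `parallel_sum_of_squares`) are my own, chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def sum_of_squares(chunk):
    """The work done on one independent chunk of the input."""
    return sum(x * x for x in chunk)


def split(data, n_chunks):
    """Split data into at most n_chunks contiguous, roughly equal pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, n_chunks=8, executor_cls=ThreadPoolExecutor):
    """Sum x*x over data, letting the pool decide how many workers to use."""
    # No core count, cache size or topology appears anywhere in the algorithm:
    # the executor picks its own default worker count for the machine it runs on.
    with executor_cls() as pool:
        return sum(pool.map(sum_of_squares, split(data, n_chunks)))


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(1_000_000))))
```

The algorithm only states which pieces of work are independent; the library decides how to spread them over the machine. For CPU-bound pure-Python work you would likely pass `executor_cls=ProcessPoolExecutor` (also in `concurrent.futures`), since threads in CPython share one interpreter lock; the point of the sketch is that the algorithm itself does not change when you do.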

To learn more about Intel Software Tools and the benefits of optimizing your software on multi-core based solutions, first visit http://software.intel.com/en-us/intel-sdp-home/


Running multiple Unix environments across a range of locations adds complexity and cost to the IT environment. I came across an interesting case study and wanted to highlight some of the key findings.

 

YPF SA is the largest company in Argentina operating in the oil and gas industry. The company has 29 gas plants around Argentina running different Unix environments such as HP-UX, AIX and Solaris.

 

YPF SA consolidated their SAP ERP and Oracle DB environment from multiple Unix environments to Red Hat Enterprise Linux 5 with integrated virtualization, running on Intel Xeon based IBM System x platforms.

 

Some of the key findings to highlight:

  • Key requirement from Unix Administration Team that "migrating from old RISC/Unix and proprietary servers to open and flexible platforms would pose no risk to the reliability, availability and performance of the systems"
  • Positive impact on cost and performance; Lowered costs, simplified management and increased compatibility
  • Reduction in costs especially when compared to license costs of RISC based platforms
  • Increased performance and availability drove decision to scale with RHEL and Xeon
  • Ability to leverage Red Hat integrated virtualization, freeing up internal hardware and technical resources for other projects

 

 

I guess the combination of Red Hat and Intel delivers the business results that customers are seeking. What do you think?


Are you a developer writing applications to run on the Solaris operating system? Are you looking for ways to optimize your Solaris solution on industry-standard architecture based on Intel microprocessors? If you answered yes to either of these questions, then please read on.

 

Intel and SUN have been working closely together to optimize the Solaris operating system on the Intel Xeon 5500 processor. Most of you probably know the Xeon 5500 better by its product codename, Nehalem. The Xeon 5500 is the product that fits into 2-socket platforms.

 

SUN has just published a very compelling quick reference guide that will assist both developers and system administrators looking to optimize Solaris solutions on Xeon-based processors. The guide covers the work that Intel and SUN are doing together and gives technical descriptions of specific features and capabilities that can be implemented in the Solaris OS to get the most out of the Xeon.

 

I have just finished reading this and it is a very compelling paper covering topics such as

- How Solaris takes advantage of Intel Turbo Boost Technology to use available power headroom to deliver higher performance based on workload demand

- How Solaris can take advantage of the new Intel QuickPath Interconnect (better known as QPI) and other innovations in the OS to reduce memory latency

- How Solaris performance counters help to better manage workloads

- How Solaris takes advantage of many of the power efficiency capabilities in the processor. Things like the Power Aware Dispatcher in Solaris enable the processor to stay longer in idle states. In non-tech talk, this saves power.

 

Solaris has been a tried and tested operating system for a long time for companies running their most business-critical workloads. This paper talks about the combination of Solaris and Xeon to deliver improved reliability and availability for these critical workloads. Detailed information on predictive self-healing, fault management, leveraging Intel Machine Check Architecture and more is all included in this paper.

 

Probably my favourite section is around the developer tools optimizations and the different tools available for developers that want to run and optimize their applications on Solaris and Xeon.

 

OK, I'll stop waxing lyrical now. This is a very compelling paper and it certainly suggests that Solaris and Xeon 5500 could be the perfect combination for your Solaris solution. What do you think?


The debate on how to best increase system capacity to accommodate growing applications has raged on for years; “scale up” with more CPU, memory, and I/O, or “scale out” with loosely connected systems.    Scaling out by adding networked systems to increase capacity has been a good economical solution for many IT managers because it allows them to grow by using less expensive, industry standard building blocks.  However, there are some notable exceptions to this line of thought.  One is that the class of applications that require shared memory and large database support are much better suited to run on a single, expandable system that scales up.  These are typically transaction processing, business intelligence and ERP solutions.   Until now, IT managers running applications that require scale-up systems larger than 4 or 8 CPUs have had limited platform choices and most were proprietary and expensive RISC-based servers.

 

The other problem with the scale out approach is the people, facilities, software and overhead costs and complexity of managing very large numbers of servers, which can grow to a point where the costs outweigh the performance and system cost benefits.  The industry solution to achieving better ROI has been to consolidate multiple scale-out servers onto single industry standard scale-up servers with virtualization solutions.  This is a good solution, but is limited by the number of application loads the IT manager feels comfortable placing on a single server, given the need to maintain peak performance and availability for each application.

 

Well, it looks like the scale-up, scale-out debate is about to take another turn. In the server product update Intel gave on May 26th, they talked about new levels of system scalability and choice supported by the upcoming Nehalem-EX processor. This processor will support systems that scale up to 8 sockets natively (shared memory, without any additional silicon), and up to 16 sockets and higher with node controllers from system manufacturers that allow single systems to share memory beyond 8 sockets. So far there are over 15 different designs from 8 OEMs that offer 8-socket or higher scalability. But of course, for the class of application where scaling is important, socket count doesn’t tell the whole story of what’s needed for scalable performance. Thread support, key for transaction processing and virtualization, scales at the rate of 16 threads per socket with 8 cores and Hyper-Threading (2 threads per core). That would be 128 threads for an 8-socket system, and 256 threads for 16 sockets. And in order to keep those threads fed with data close to the CPU, each processor supports up to 24 MB of shared cache (1.5X current generation Xeon), and an impressive 16 memory slots per socket, or 128 DIMMs on an 8-socket system. In addition, the Scalable Memory Interconnect gives these systems 9 times the memory bandwidth of today’s top Xeon processor. Finally, four QuickPath Interconnect links per socket allow for high-bandwidth sharing of data across the system.
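The thread and DIMM totals above are simple multiplication, and a small sketch makes it easy to redo them for other socket counts. The per-socket figures (8 cores, 2 threads per core, 16 memory slots) are the ones quoted in this post; the function name `platform_totals` is mine, for illustration only.

```python
def platform_totals(sockets, cores_per_socket=8, threads_per_core=2,
                    dimm_slots_per_socket=16):
    """Hardware threads and DIMM slots for a system with the given socket count."""
    return {
        "sockets": sockets,
        "threads": sockets * cores_per_socket * threads_per_core,
        "dimm_slots": sockets * dimm_slots_per_socket,
    }


if __name__ == "__main__":
    for sockets in (4, 8, 16):
        print(platform_totals(sockets))
```

For 8 sockets this gives 128 threads and 128 DIMM slots, and for 16 sockets 256 threads, matching the numbers in the text.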

 

So the net of it is that the industry is going to see a broad selection of highly scalable, next-generation servers that significantly extend the economic advantage of industry standard scale-up solutions for business-critical, large database, and high-end virtualization/consolidation deployments.     I would expect these systems to give IT managers a very cost-effective alternative to the much more expensive and proprietary RISC-based servers they use today.

 

What are your thoughts?  Mike

 


I was thinking about what to write in my next blog and what I could share beyond what I have written previously about Intel Vs RISC in terms of TCO, performance and the customers that are choosing to move.

 

Luckily I didn't have to think too long on a Friday morning, as a topic came to mind instantly. There are numerous articles flying around this morning that picked up on the Oracle comments yesterday about how SPARC-based systems compare to Intel. Thanks for providing me with an appropriate topic.

 

So in case you missed it, there was a question and answer session with Larry Ellison. When asked about SPARC, this was the reply: "SPARC is much more energy efficient than Intel while delivering the same performance on a per socket basis. This is not a green issue, it's an economic issue. Today, database centers are paying as much for electricity to run their computers as they pay to buy computers. SPARC machines are much less expensive to run than Intel machines."

 

1) SPARC more energy efficient than Intel? Seriously, in what parallel universe does that exist?

SUN continues to use watts per thread as its measure of energy efficiency. The recognized industry standard benchmark for measuring energy efficiency is SPECpower, and I don't see any SPARC-based results in the 91 results published. The absence of a result certainly says something very clear to me: no story.

 

These UltraSPARC T2+ systems get loaded with a lot of memory to deliver their results, so when you look at overall system power (what people care about) they are not as energy efficient as Intel-based systems.

 

SPECpower is effectively based on SPECjbb2005, so another way of looking at this is to look at the SPECjbb2005 results for a 4-socket UltraSPARC T2+ system and a Xeon 7400 system. The 4S UltraSPARC T2+ delivers 693k BOPs while Xeon 7400 delivers 532k BOPs. So you conclude that SPARC is better than Xeon? That would be the wrong conclusion.

The UltraSPARC T2+ system would consume 1525 watts vs the Xeon 7400 at 816 watts. If you look at BOPs per watt (another way of looking at energy efficiency and performance), you would see that Xeon 7400 is 43% more energy efficient. A similar comparison with Xeon 5400 (I haven't even talked about our latest Xeon 5500, Nehalem) would show it to be up to 77% more efficient than UltraSPARC T2+.
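If you want to check the BOPs-per-watt arithmetic for yourself, here is a small sketch. The throughput and power figures are the ones quoted in this post; the dictionary layout and function names are my own and not part of any SPEC tooling.

```python
# Figures quoted in the post: SPECjbb2005 throughput (BOPs) and system power (watts).
SYSTEMS = {
    "4S UltraSPARC T2+": {"bops": 693_000, "watts": 1525},
    "4S Xeon 7400": {"bops": 532_000, "watts": 816},
}


def bops_per_watt(system):
    """Throughput delivered per watt of system power."""
    return system["bops"] / system["watts"]


def efficiency_advantage(better, baseline):
    """Fractional BOPs-per-watt advantage of `better` over `baseline`."""
    return bops_per_watt(better) / bops_per_watt(baseline) - 1.0


if __name__ == "__main__":
    for name, system in SYSTEMS.items():
        print(f"{name}: {bops_per_watt(system):.0f} BOPs/watt")
    adv = efficiency_advantage(SYSTEMS["4S Xeon 7400"], SYSTEMS["4S UltraSPARC T2+"])
    print(f"Xeon 7400 advantage: {adv:.0%}")
```

That works out to roughly 454 BOPs per watt for the UltraSPARC T2+ system and roughly 652 for the Xeon 7400 system, which is where the 43% figure comes from.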

 

And lastly, before I forget: the 4S UltraSPARC T2+ had 128GB of memory and costs over $150,000 for the system, while the Xeon 7400 based system had 64GB of memory and costs around $32,000.

 

2) SPARC delivers the same performance on a per socket basis?

2S Xeon 5500 has performance leadership over 2S UltraSPARC T2+ across a wide range of benchmarks: up to 70% more performance and up to 60% lower system cost. 4S Xeon 7400 has price/performance leadership over 4S UltraSPARC T2+; the UltraSPARC T2+ results were achieved with systems loaded with lots of memory, which drives the cost up to 3-4X that of a Xeon 7400 system.

 

3) SPARC machines are less expensive to run? I can't for the life of me work this one out!

Hardware systems based on Intel have leading price/performance (read: cheaper) and lower energy needs (so a lower electricity bill), and any software product with a per-core license structure is less expensive on a Xeon system than on an 8-core UltraSPARC T2+ (which also has a higher multiplier per core).

 

That's all for now, folks. I just wanted to share some data on why I know that SPARC machines are much MORE expensive to run than Intel machines.


Ever find yourself in a new location staring hopelessly at a map, wondering where you are?  Then to make matters worse, you call someone on your cell phone and can’t describe where you are so they can help? I think we’ve all been there more than once…

Since the Intel Xeon® 5500 processors launched in March, I’ve been getting a bunch of questions (including from the Ask An Expert community [http://communities.intel.com/message/12284#12284] in the Server Room) about DDR3 memory and how best to configure your server platforms to optimize performance.  Many times, folks are having a hard time just getting the conversation started, so here are a couple of tips to get you going.  The good thing is that DDR3 memory picks up where DDR2 memory leaves off in terms of speed, so you know you’ll be moving forward!

  1. Figure out how much memory you need.  With multi-core CPUs now mainstream in servers, you need enough memory to keep these compute engines fed.  One metric you might look at is “GB per CPU core” or “GB per socket” for your existing servers, and then project your memory requirements from there.

  2. Start with DDR3 1066 memory, as that will deliver a good balance of memory performance and capacity.

      - If you need more bandwidth (and are willing to give up some capacity), use DDR3 1333

      - If you need maximum capacity (and are willing to give up some bandwidth), use DDR3 800

  3. Match your CPU to your memory speed, because faster memory does require a faster processor.  Check out page 11 of the product brief for the quick reference table.

  4. Wherever possible, fill as many memory channels as possible, and populate all channels evenly (same type, size and number of DIMMs).

      - Most two-socket Xeon® 5500 platforms will have a total of 6 memory channels, so aligning your memory requirements to a multiple of 6 GB will optimize memory performance for most application environments.

      - However, you can mix/match memory types if your requirements call for something that is not a multiple of 6.

  5. For server application environments, always go with ECC-supported memory.  Decide between Registered DIMMs (RDIMM) and Unbuffered DIMMs with ECC (UDIMM ECC).

      - RDIMMs provide the greatest flexibility across DIMM sizes and availability

      - UDIMM ECC provides a lower-cost alternative if you are using 1 GB or 2 GB DIMMs
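As a rough illustration of the "populate all channels evenly" tip, here is a small sketch that rounds a memory target up to a DIMM count that fills every channel equally. It assumes the 6-channel, two-socket layout mentioned above; the function names are mine, and the sketch deliberately ignores per-platform limits such as the number of slots per channel or the speed supported at a given population, which you still need to check with your system vendor.

```python
import math


def balanced_dimm_count(target_gb, dimm_gb, channels=6):
    """Smallest DIMM count that meets target_gb with the same number per channel."""
    needed = math.ceil(target_gb / dimm_gb)             # DIMMs needed for capacity alone
    per_channel = max(1, math.ceil(needed / channels))  # round up to fill channels evenly
    return per_channel * channels


def plan_memory(target_gb, dimm_gb, channels=6):
    """Summarize a balanced configuration for a capacity target and DIMM size."""
    dimms = balanced_dimm_count(target_gb, dimm_gb, channels)
    return {
        "dimms": dimms,
        "dimms_per_channel": dimms // channels,
        "installed_gb": dimms * dimm_gb,
    }


if __name__ == "__main__":
    # Example: a 20 GB requirement met with 4 GB DIMMs.
    print(plan_memory(20, 4))
```

With a 20 GB target and 4 GB DIMMs, capacity alone needs 5 DIMMs, but the balanced plan is 6 DIMMs (one per channel) for 24 GB installed.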

You will still want to check with your system vendor on the specifics, such as memory configurations and DIMM types and options supported for a given server, but hopefully this helps point you in the right direction.

If you are still lost, ask me a question on this blog or Ask An Expert in the Server Room.


Here’s the final follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the tenth habit: Compare apples to apples.  


Much of performance analysis involves comparisons: to baselines, to competitive systems, or to expectations. It is surprisingly easy to make an inappropriate comparison. I have seen it done many times and certainly been guilty of it myself. So the final habit to be aware of is to always compare apples to apples.

Make sure that the 2 systems or applications you are comparing are being run the same way, with the same configuration, under the same conditions. If there is a difference, understand (or at least hypothesize about) the impact of that difference on the performance. Dig into the details about experiments – for some ideas on what to look for, see habit 7.
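One simple way to put this habit into practice is to write down the configuration of each system and let a few lines of code point out what differs. This is only a sketch: the setting names and values below are hypothetical, made up for illustration, and `config_differences` is my own helper rather than part of any tool.

```python
MISSING = "<not set>"


def config_differences(system_a, system_b):
    """Return {setting: (value_a, value_b)} for every setting that differs."""
    diffs = {}
    for key in sorted(set(system_a) | set(system_b)):
        a = system_a.get(key, MISSING)
        b = system_b.get(key, MISSING)
        if a != b:
            diffs[key] = (a, b)
    return diffs


# Hypothetical test-bed descriptions; keys and values are made up for illustration.
baseline = {"sockets": 2, "memory_gb": 24, "hyper_threading": True, "os": "RHEL 5"}
candidate = {"sockets": 2, "memory_gb": 48, "hyper_threading": True, "bios_turbo": False}

if __name__ == "__main__":
    for setting, (a, b) in config_differences(baseline, candidate).items():
        print(f"{setting}: baseline={a!r} candidate={b!r}")
```

Every line this prints is a difference you should either eliminate or be able to explain before trusting the comparison.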

You should always make this a habit – but it is especially important when you are making decisions based on the comparison. Double-check your work in this case!

This series has given you 10 of the habits I have learned in my years tuning server performance. Of course there are other tricks of the trade and BKMs, which I will try to cover in future blogs. But making these habits part of your routine will help make you a better, more consistent performance tuner. Good luck with your optimization projects!


The Intel Xeon Processor 5500 is the new world record holder in >30 top performance benchmarks for 2-socket servers. Check out this video with Pat Gelsinger at the launch event in Santa Clara.

 

 

You can also check out all the performance results here: Server Performance Summary - Intel® Xeon® Processor


I'll be up front: I really don't know what Britney Spears, Miley Cyrus or Susan Boyle would say about moving from RISC to the Xeon 5500 processor! What I can share is the feedback that I'm getting directly from customers. I'm currently out on the road and have gotten some real feedback on why they are looking at migrating their solutions from RISC processors to Xeon processors.

 

Over the past couple of days I have had the opportunity to meet directly with individual customers and hosted a roundtable with several customers to discuss their plans to replace their RISC-based infrastructure. The conversation has been very open and frank and has not been about 'should I move' but more focused on 'how do I make the move'. As could be expected, the down economy is putting a big strain on the ability of IT organizations to support their business units' need for organic growth in a flat-to-down IT spending environment. A big priority for most of the customers that I spoke with is how to reduce their overall TCO while still meeting the increased demands being placed on IT by their business partners. Most of the customers are already engaged in active projects to assess moving from RISC or are building their plans to make this migration.

 

During the roundtable I had the opportunity to share the latest Xeon 5500 processor performance comparisons vs the main SPARC and POWER based solutions out there. There was great rejoicing and joy (OK, I'm taking poetic license here) in the roundtable when we shared some of the results that we highlighted when we launched the Xeon 5500 processor just over 3 weeks ago. So I want to spread the joy and let you read for yourself the performance and price/performance benefits.

 

We compared the Xeon 5570 processor vs the top UltraSPARC T2+ in a 2-socket configuration. We took the best published results on spec.org and from SAP (so no funny games at play). The results comparing the best UltraSPARC T2+ vs the best Xeon 5500, with 1 taken as the baseline for the SPARC results, were amazing:

- 20% better on SAP-SD

- 62% better Java performance on SPECjbb2005

- 69% better integer performance on SPECint_rate2006

- 75% better floating point performance on SPECfp_rate2006

But the best bit was the cost competitiveness of the Xeon 5500 solutions. Comparing both solutions with 32GB memory, the Xeon 5500 based solutions are offered at approx $11,000 whereas the UltraSPARC T2+ is at $36,000.
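To see what those prices mean for price/performance, here is a quick sketch that combines one of the relative scores above with the two system prices. I use the SPECjbb2005 result (1.62 against a SPARC baseline of 1.0) purely as an example; the function names are mine, and you can plug in any of the other ratios.

```python
def perf_per_dollar(relative_perf, price_usd):
    """Relative performance delivered per dollar of system price."""
    return relative_perf / price_usd


def price_performance_ratio(perf_a, price_a, perf_b, price_b):
    """How many times more performance per dollar system A delivers than system B."""
    return perf_per_dollar(perf_a, price_a) / perf_per_dollar(perf_b, price_b)


if __name__ == "__main__":
    # Xeon 5500: 1.62x on SPECjbb2005 at approx $11,000; UltraSPARC T2+: 1.0 at $36,000.
    ratio = price_performance_ratio(1.62, 11_000, 1.0, 36_000)
    saving = 1 - 11_000 / 36_000
    print(f"Performance per dollar: {ratio:.1f}x")
    print(f"System price reduction: {saving:.0%}")
```

On these numbers the Xeon 5500 system delivers a bit over 5 times the Java performance per dollar, at a system price roughly 69% lower.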

 

Comparing the Xeon 5570 processor vs the top POWER6 in a 2-socket configuration gave even more staggering results. At the roundtable today customers were amazed. They keep hearing that POWER6 has leading performance, and more GHz means better performance, right? Wrong is the answer, and I noticed many customers scribbling down the comparisons. Again taking 1 as the baseline for the POWER results:

- 150% better on SAP-SD

- 190% better Java performance on SPECjbb2005

- 126% better integer performance on SPECint_rate2006

- 90% better floating point performance on SPECfp_rate2006

But the best bit was the cost competitiveness of the Xeon 5500 solutions. Comparing both solutions with 32GB memory, the Xeon 5500 based solutions are 92% less expensive than equivalent POWER6 offerings.

 

I only shared the specific comparisons vs RISC and have not gone into the architectural advancements of the Xeon 5500 processor and how it addresses real business needs that have been flagged to us. There have been lots of other blogs out in cyberspace over the last few weeks on improvements in IO, low latency etc. so you don't need my 2 cents.

 

I think now is the time to make the move from RISC. What do you think?


OK, so we launched the Xeon 5500 processor based servers and workstations a couple of weeks ago. While I don’t have direct quotes of support from Brit, Miley, Susan or any country presidents who have signed economic stimulus into law, I am pretty confident that if they were ever actually considering purchasing a server or workstation, they would come to the conclusion that the new Xeon 5500 platforms would be their best choice.

I had the privilege of being at one of the thirty-seven worldwide Xeon 5500 launch events. I was on Wall Street and attended the NASDAQ launch event on March 31st. Depending on which data source you look at, Financial Services as a whole represents about 20% of the worldwide market for servers. It was also evident when meeting with customers in the NYC area that they are passionate about performance and power consumption. Most of them had received pre-production seed systems and had already done extensive testing prior to this launch event. I have been in Intel’s Server Platform Group for over a decade now and I have never seen so much enthusiasm for a product launch.

I won’t rehash the performance benchmarks and performance per watt data. There are many benchmarks, blogs and press articles doing that. What I took away from the conversations was a feeling of optimism from the end users I spoke to. Some people felt that these new products would be what it takes for them to deliver solutions that would give them a performance advantage over their competition. In few markets does that pay off more, and translate almost directly to the bottom line, than in Financial Services. Others felt that these systems would help them continue to add to their existing datacenters without needing to build a new one. This was due to the performance per watt improvements and the end users’ ability to replace many old servers and workstations with a few new ones.

Lastly, I think human nature being what it is, we are seeing that IT professionals want to work on cool new projects. These Xeon 5500 servers and workstations represent a shiny new toy that IT professionals can use to have a material impact on the bottom lines of their companies. To some degree the same applies to virtualization, in that it is disruptive, provides a new cost-effective way to deliver legacy solutions, and enables flexibility for future growth. The IT folks I have met who familiarize themselves with virtualization, new hardware and advanced management techniques (power, systems, virtualization) are generally viewed within their companies as leaders with visionary capabilities.

As we all work through this economic morass I am hopeful that with new technology introductions, and a relentless focus on efficiency, we will all emerge with a greater level of capability and a higher degree of flexibility. I also believe IT will emerge as a key asset of differentiation for companies from Wall Street to Main Street and this will place an even greater burden on delivering solutions to meet those unique needs.

What do you think?

Shannon

shannon.poulin@intel.com


This blog post is meant to discuss some of the considerations for performance tuning your Intel® Xeon® Processor 5500 (“Nehalem-EP”) series based server. I’d like to do this by discussing the un-boxing process.

Step 1. Place the box on the floor.

Step 2. Open the box.

Step 3. Carefully remove the server.

Step 4. Plug the server into a keyboard, mouse, and monitor.

Step 5. Plug the server into the wall socket.

Step 6. Power on the server.

There. You are done tuning your Nehalem-EP based server for performance. “Really?” you ask. Well, mostly. There are some considerations and I’ll discuss them. I can speak to this subject as I was asked to tune this class of system using the TPC-C and TPC-E benchmarks.

BIOS / Firmware / Drivers

It is very important to update your system's BIOS, firmware, and OS drivers before you do any deep performance tuning. I cannot overstate the importance of this step. Your system's manufacturer should be able to provide the latest BIOS and firmware for your server. OS drivers are available from many sources these days. Typically they can be downloaded from OS vendors, hardware vendors, the Linux open source community, or the platform's manufacturer.

A good example of this is the SATA driver associated with the ICH10. The ICH10 is part of the chipset that supports Nehalem-EP. I recommend going to Intel’s website and using the Intel Matrix Storage Manager driver for the SATA controller.

Understand your system

Last year, Nehalem launched for the desktop market segment. Now it is time for the server market. The Nehalem-EP processor is meant to be used in dual-processor (DP) socket systems. Nehalem-EP is the follow-on to the Intel Xeon Processor 5400 (“Harpertown”) series. However, Nehalem-EP is really very different from Harpertown. It is based on the Intel Core i7 and inherits the same architectural features. Once you understand these features, you can better tune your system for performance.

L3 Cache:

Nehalem-EP uses a level 3 cache. Depending on which SKU you are using it can be 4MB or 8MB in size. If you are interested in performance, then I would encourage you to pick the larger cache size SKU.

Hyper Threading Technology:

If some threads are good, then more threads are better. This is where Hyper-Threading Technology comes into play. Nehalem-EP provides this technology out of the box. So, on a typical DP server with two quad-core processors, this will give your system 16 threads of processing goodness.

Intel Quick Path Interconnect:

Nehalem-EP supports a CPU interconnect known as Intel Quick Path Interconnect (QPI). This interconnect is the replacement for the Front Side Bus of old. QPI provides a point-to-point link between the processors and between each processor and the Intel 5520 chipset. Nehalem-EP supports QPI speeds of up to 6.4 GT/s. This provides a theoretical bandwidth of 25.6 GB/s per link. This is a welcome shift for Intel’s designs for the future.
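For readers wondering where the 25.6 GB/s figure comes from, here is a minimal Python sketch of the arithmetic. It assumes (this is not spelled out above) that a QPI link carries 2 bytes of data per transfer in each direction and that the quoted figure counts both directions together. The function and parameter names are my own.

```python
def qpi_peak_bandwidth_gb_per_s(transfer_rate_gt_per_s: float,
                                bytes_per_transfer: int = 2,
                                directions: int = 2) -> float:
    """Theoretical peak bandwidth of one QPI link, in GB/s.

    Assumes 2 bytes of payload per transfer per direction and
    counts both directions of the link together.
    """
    return transfer_rate_gt_per_s * bytes_per_transfer * directions


# 6.4 GT/s x 2 bytes x 2 directions
print(f"{qpi_peak_bandwidth_gb_per_s(6.4):.1f} GB/s")
```

This is a theoretical peak; measured throughput will be lower.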

Turbo Boost Technology:

As with the desktop SKUs of Nehalem, Nehalem-EP supports Turbo Boost Technology. This technology will run the CPU at a higher frequency than its rating. It will increase the frequency in steps of 133 MHz until it reaches its thermal and electrical design limits. Turbo Boost Technology is dynamic. In other words, the processor will decrease its core frequency again if the temperature is too high. If your application is sensitive to core frequency and does not fully utilize all cores, then it may benefit from this technology.
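As a rough illustration of the 133 MHz stepping, the Python sketch below computes the core frequency after a given number of Turbo Boost steps. The rated frequency and the number of steps in the example are assumptions picked for illustration; how many steps a given SKU can actually take depends on the part and on thermal and electrical headroom.

```python
TURBO_STEP_MHZ = 133  # step size described in this post


def turbo_frequency_mhz(rated_mhz: int, steps: int) -> int:
    """Core frequency after `steps` Turbo Boost increments above the rating."""
    if steps < 0:
        raise ValueError("steps must be zero or positive")
    return rated_mhz + steps * TURBO_STEP_MHZ


# Hypothetical example: a part rated at 2933 MHz taking 0 to 3 steps.
for steps in range(4):
    print(steps, turbo_frequency_mhz(2933, steps), "MHz")
```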

Integrated Memory Controller:

Another key feature of Nehalem-based processors is that the memory controller is integrated into the processor. This allows for much lower memory latencies. Nehalem-EP supports three channels of DDR3 memory at speeds of 800, 1067, and 1333 MT/s. The speed you actually get depends on how many DIMMs are populated in each channel. With dual-ranked DIMMs, 1333 MT/s is supported only with a single DIMM per channel, 1067 MT/s is supported with one or two DIMMs per channel, and 800 MT/s is supported in all configurations. If you plan on filling every memory slot with as many DIMMs as possible, you will end up running at 800 MT/s. So, here is the consideration: does your application need all that memory, or could it use less memory running at a higher speed? If less memory at a higher speed will do, then perhaps running two DIMMs per channel at 1067 MT/s is the best configuration.
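The population rules above can be sketched as a small lookup in Python. This is only an illustration: it assumes dual-ranked DIMMs as in the text, treats three DIMMs per channel as the "all slots filled" case (your board may have fewer slots), and assumes a 64-bit (8-byte) DDR3 channel when estimating theoretical peak bandwidth. All names are my own.

```python
# Highest supported memory speed (MT/s) by DIMMs per channel,
# following the rules described above for dual-ranked DIMMs.
MAX_SPEED_BY_DIMMS_PER_CHANNEL = {1: 1333, 2: 1067, 3: 800}

CHANNELS_PER_PROCESSOR = 3
BYTES_PER_TRANSFER = 8  # assumption: 64-bit DDR3 channel


def max_memory_speed(dimms_per_channel: int) -> int:
    """Return the highest supported speed in MT/s for this population."""
    if dimms_per_channel not in MAX_SPEED_BY_DIMMS_PER_CHANNEL:
        raise ValueError("expected 1, 2, or 3 DIMMs per channel")
    return MAX_SPEED_BY_DIMMS_PER_CHANNEL[dimms_per_channel]


def peak_bandwidth_gb_per_s(speed_mt_per_s: int) -> float:
    """Theoretical peak memory bandwidth per processor, in GB/s."""
    return speed_mt_per_s * BYTES_PER_TRANSFER * CHANNELS_PER_PROCESSOR / 1000


for dimms in (1, 2, 3):
    speed = max_memory_speed(dimms)
    print(f"{dimms} DIMM(s) per channel: {speed} MT/s, "
          f"about {peak_bandwidth_gb_per_s(speed):.1f} GB/s peak per processor")
```

The numbers are theoretical peaks, but they show the trade-off: each extra DIMM per channel buys capacity at the cost of memory speed.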

To wrap things up here, we have looked at the new Nehalem architecture, the importance of BIOS, firmware, and OS drivers, and memory population. Your application's performance will vary, but I hope I have given you some things to narrow down your performance testing. Thanks for taking the time to read this blog post. For more great performance methodology tips please check out Shannon Cepeda’s blog posts on performance tuning.


Here’s the 9th follow-up post in my 10 Habits of Great Server Performance Tuners series. This one focuses on the ninth habit: Don’t Break the Law.


Amdahl’s Law, I mean. Amdahl’s Law tells you how much improvement you can reasonably expect from tuning a part of your system. It is often used in the context of software optimization and parallelization. Basically, what it says is that the potential improvement (speedup) you will get from applying an optimization with a speedup of X to a fraction F of your system is equal to 1/((1-F) + F/X). More generally, the speedup you will get depends on how much of your system the optimization affects as well as how good the optimization is.

For example, say you think you can speed up a function that takes 25% of your application’s execution time. If you can speed it up by 2x, then the potential speedup of your whole application, according to Amdahl’s Law, is 1/((1-.25) + .25/2) or a 1.14x speedup. Knowing something like this means you can evaluate which is more important: a 2x optimization affecting 25% of your code or a 4x optimization affecting 10%. (It’s the 25% one.)
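If you want to play with the numbers yourself, here is a minimal Python sketch of the formula. The function name and argument names are my own; the two calls reproduce the comparison in the example above.

```python
def amdahl_speedup(fraction: float, local_speedup: float) -> float:
    """Overall speedup from Amdahl's Law: 1 / ((1 - F) + F / X).

    fraction      -- F, the share of execution time the optimization affects
    local_speedup -- X, how much faster that share becomes
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# 2x optimization on 25% of the execution time
print(f"{amdahl_speedup(0.25, 2):.2f}x")  # 1.14x
# 4x optimization on 10% of the execution time
print(f"{amdahl_speedup(0.10, 4):.2f}x")  # 1.08x
```

The same function covers parallelization estimates if you treat the number of cores as X and the parallelizable share of the work as F, assuming that share scales perfectly.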

Amdahl’s Law can also be used in other situations, such as estimating the potential speedup from parallelization or evaluating system tuning changes. It can be tricky in certain cases, such as when there are overlapping or dependent parts of the system. So use your intuition as well. However, in general using this law can help you to focus on making the common usage case faster for your system.

Once you have a good understanding of Amdahl’s Law, you may want to check out Gustafson’s Law and Little’s Law as well. All are commonly used in performance optimization. Being armed with these theoretical basics can help you to sniff out suspicious performance results or conclusions, as Neil J. Gunther humorously wrote about here.

So stay out of trouble with the law (both the ones I mentioned and the legal kind!), and look for my post on the last habit next month.

