
I recently had the opportunity to sit down with Intel's Chief Virtualization Architect Rich Uhlig to discuss the new usage models and virtualization technologies in Intel's new Xeon 5500 series platform. Rich and I have been friends and colleagues for several years, and the video of our discussion is attached and can be viewed on YouTube. The conversation sparked some interesting questions from my colleagues, friends and children, which I thought I would share with a wider audience.


First, the questions from my sons (I have three boys...yes, this means that my wife has the patience of a saint):
Dad, what is virtualization? Does that mean you can take people and computers and teleport them to new places, like Star Trek? Did Intel invent virtualization? Why do you think it is so cool? When I grow up, can I be virtualized?


My Answer:
Slow down.....slow down...let me try to answer the questions one at a time.


Virtualization is the ability to increase computer, network and storage utilization with multiple operating systems or logical machines, called virtual machines. This allows Dad and his friends to use more of their computers with different applications and devices. Using virtualization allows Dad and his friends to save money, save power and increase efficiency.


Response (my three sons in unison):
Boring! I thought you said your job is cool. You're such a geek......(trailing off and looking at their iPods)


My response:
Guys, hold on...let me explain. Virtualization technology IS cool. While it wasn't invented by Intel, we have worked with an industry of incredibly gifted engineers, architects and designers to create new ways for people to use their computer technology....and the best part is we are only at the beginning. By the time you are an adult you will have the opportunity to use virtualization technology in ways we are only beginning to imagine. Think of virtualization as a journey and evolution of computer technology for Dad and his friends to maximize the use of the computers that we buy/build. Hopefully, with more innovation and computer technology advances, you will be able to create a virtualization layer that will allow you and your digital identity to "teleport" to new places in a virtual cloud. You won't be "virtualized," but you will be able to create your digital environment wherever there is a machine that can understand your commands. That is pretty cool. Think of it this way: you could save and play your Nintendo Wii, Sony PlayStation or Xbox profiles on any machine, anywhere in the world, that can download your profile.


Response (from my 13-year-old):
You mean I can play EA's Madden Football 24 hours a day with my friends, even when we are on vacation and you want me to see some historic landmark, like the Lincoln Memorial?


My response:
Well...yes but not exactly what I had in mind. (aargh!)


A recent question on virtualization from a friend from a former job:
I hear the new Intel chip, Nehalem (now known as the Intel Xeon 5500 series), is the best product you guys have released in a long time. What makes the product so good? Is it the virtualization technology that you work on?


My response:
Virtualization technology provides increased instrumentation and flexibility for the Intel Xeon 5500 series platform, but it is only one of a host of fantastic features which make this product the best we have ever released. For data center managers, increased efficiency is an everyday part of life. Nehalem offers increased performance, increased memory capacity, the new QuickPath Interconnect (which acts like a NUMA switch fabric on silicon; remember that cool product we launched in 1997 at Sequent Computers?) and a second generation of virtualization capabilities that deliver native virtualization instruction support for VMware, Microsoft, Citrix and a host of Xen providers. It is truly a breakthrough server product. With this new architecture and these design characteristics, we are able to support a host of new virtualization usage models, including rapid application deployment, high availability, virtual desktop infrastructure and server consolidation. It is a very exciting time...


My friend's response:
Very cool. I miss working on hardware innovations...sounds like you guys at Intel are up to something special. Should I buy the stock?


My response:
Thanks. Intel is a great place to work and we are doing some very cool product innovations. Do we always have to talk about stock price?


Finally, a recent question from a dear colleague:
What happens if virtualization technology is deployed on every platform that Intel ships? Won't businesses and consumers need fewer devices? Won't users' insatiable demand for compute, network and storage resources finally taper off?


My Answer:
Funny you should ask that question. Rich Uhlig, Fernando Martins, Rick Olha, RK and I have debated this exact question for years. The answer is simple: virtualization increases demand for resources, not the reverse. In fact, until the recent economic downturn, virtualization technology was cited by a Citigroup analyst as the key driver of server growth in 2H 2007. For the first time in over 10 years, the market's average selling price was increasing. Why? Because users could do more with every server they purchased. Virtualization actually facilitates more usages across more application development and production environments than ever before. As we increase the performance of the instruction sets and Intel microarchitectures, we increase the capabilities that virtualization can deliver for new usage models, while preserving some of the legacy compatibility that users require for 32-bit application workloads. Simply stated, "we can do more with less!"


Next question (by the way, this was from a skeptical Intel exec.):
Doing more with less is fine...but what about our volumes for server products? What happens when virtualization is prevalent across all of Intel's CPU and platform offerings?


My response:
Flexibility and control are critical to all of our customers regardless of form factor. Is there anything worse than buying a new server, PC or handheld and hitting application compatibility errors? No. Do we really believe the world wants to become software compatibility specialists every time Microsoft releases a new operating system? What about Dell, HP, Lenovo, IBM, Acer, Nokia, Motorola, LG, Samsung, RIM and HTC? It has taken us over 10 years of research, testing and product development to get here. Virtualization is a "Hot Topic" today and will be in the future because it makes a positive difference in our customers' lives, both financially and operationally. Our job is to deliver the greatest silicon products the world has ever seen, over and over and over again. Virtualization allows us to do that AND preserve the investments our customers and software partners make in developing their own operating environments. What is cooler than that? Virtualization facilitates innovation, consumption and utilization; our customers are telling us this every day. Innovation is critical to this process, enabling our software colleagues is a must, and opening up the discussion is part of the process.


Her response:
Well, I guess you are pretty passionate about virtualization?


My response:

I hope so...that is why you hired me.






Have a listen, enjoy the video and join the discussion between Rich and me. For us, virtualization is a very Hot topic that we have thought was Cool for a very long time.





Did you know that many electrical utility companies are offering rebates for companies that purchase energy efficient IT equipment such as servers, PCs and power management software?

Why are utilities doing this? Today’s high cost of energy and the availability of Federal stimulus dollars for energy efficiency programs are making this an ideal time for utilities to offer customers incentives for investing in energy efficient computers and servers. Federal agencies are directing funds to utilities to support these incentives.  Also, state legislation often requires many utilities to devote a portion of revenues to fund energy efficiency programs, including encouraging the purchase of energy efficient IT equipment such as servers, PCs, and power management software.

In the United States, there are currently 20+ utilities offering rebate incentives for the purchase of energy efficient IT equipment, with another 70+ utilities considering or in the process of rolling out a rebate program. Here’s a list of utilities that we know of (as of July ’09).



• Arizona Public Service Company

• Austin Energy

• Avista

• BC Hydro (Vancouver, BC)

• Bonneville Power Administration

• Energy Trust of Oregon

• Idaho Power

• Los Angeles Department of Water & Power

• Manitoba Hydro

• Northeast Utilities

• Oncor Energy

• Pacific Gas and Electric

• Sacramento Muni Utility District

• San Diego Gas and Electric

• Seattle City Light

• Silicon Valley Power

• Snohomish PUD

• Southern California Edison



In addition to the savings that can be achieved just by consolidating multiple older servers with newer Xeon® 5500 (Nehalem) servers, getting additional cash back from the utility companies can make the decision to refresh your server infrastructure that much more lucrative.



Let me know if you are aware of other rebate or incentive programs offered by your utility company (U.S. or another country).

54 days to Fall IDF in SFO!  Perhaps I should be a bit less enthusiastic, as during the course of the next two months, I will be extremely busy working on courses, presentations, demos, web updates and new collateral pieces highlighting Intel’s contributions to server and data center instrumentation, data center efficiency and eco-technology.  In addition to those responsibilities, I have taken on ownership of driving a technology blogging program at IDF, with server technology experts sharing their insights here on Server Room – an opportunity that I am very excited about, but I need your help.



My question to you today is – what would you like to see covered in the technology blogs from IDF?  I am starting the process of recruiting “volunteers” to participate, and understanding what you want to see discussed will help me to get the right people to cover the topics that are compelling to you and hopefully facilitate an interesting dialog that will help you to better understand server technologies.  Since it's easy to self-recruit, you will definitely see a blog from me covering instrumentation, Intel Intelligent Power Node Manager and other related technology news @ IDF.



So what do you specifically want to see covered in the IDF blogs?  I look forward to your inputs and hope to see you at IDF!


I wrote a while back about how the Xeon 7400 (Dunnington) processor series compared to RISC. Since then, I have shared information through other blog posts and content about how Xeon 7400 and Xeon 5500 compare to both SPARC and POWER.


Xeon 7400 and Xeon 5500 are the current products shipping into the marketplace today. IMHO, they offer a pretty compelling alternative to SPARC and POWER from both a performance and TCO perspective, but I will not try to repeat all the reasons here.


What I wanted to share with you are some thoughts about what the next product to succeed Xeon 7400 will bring to the RISC party. Nehalem-EX is the code name for our next generation of product designed to serve the workloads serviced by Xeon 7400 today (i.e., database, ERP, BI, etc.). EX, by the way, is what we would all traditionally call MP, or multi-processor, servers.


Don't stop reading now; here is why I'm EXCITED about what Nehalem-EX will bring to the RISC party.

My excitement is actually based on real customer discussions about what Nehalem-EX will do for them and why it delivers some new stuff (my code for features and benefits) which they see as a prerequisite to make the move from RISC to Xeon. For some customers, the TCO and performance of current products have been enough to convince them to move. For other customers there are still some checkboxes remaining, which I believe Nehalem-EX will address.

Here is a snapshot of some of the cool new stuff which is actually convincing customers (from some real deals that I have worked):

    1. Improved bandwidth: up to 9 times the memory bandwidth of previous generations
    2. Introduction of QuickPath Interconnects to the EX systems
    3. New RAS features, previously seen on Itanium products, brought to Xeon products
    4. Significant improvement in performance vs. previous generations (e.g., ~2.5x on database workloads)
    5. More scalable platforms, with 8 OEMs offering >8S designs. These platforms are key to managing large databases and for large-scale consolidation
    6. Mainframe-class availability in scalable platforms


For more information, check out the press briefing from May and see more details in the presentation.




Nehalem-EX goes into production later this year and I am pretty excited about how it will change the game. What do you think?

I have been working in servers for almost 10 years with Intel, and in the last 4 years much of my job had me focused on how IT uses server technology to create business value.  It was an awesome experience where I learned new things every day from OEMs, Intel's customers, our sales force and, through social media, the extended community.


During my role as an end user product marketing manager, I found that Intel IT was an extremely valuable source of learning for me to understand the end customer of the products that Intel makes and enables.   I also found that end user IT organizations valued hearing how Intel IT was approaching business challenges and deploying technology solutions to create value.  So when I was given the opportunity to move into Intel IT and be part of that team and learn from the inside out, my decision was easy.


I'm excited about seeing all aspects of technology (client, server, storage, network, facility, PC companions ... ), how IT aligns to business goals, makes investment trade-offs, implements new projects without disrupting business processes, and a host of other topics.  My learning curve is steep and fast (just the way I like it).  So as I transition and learn about Intel IT from the inside out, I will continue to share my experiences and learnings as I go.


If you'd like to follow along real time on my Journey to the Center of IT, follow me on twitter (@Chris_P_Intel) as I will share the things I find, learn and explore.  Let me know if there is something you want to know.


In the mean time, check out the various resources available from IT@Intel including technical / business whitepapers, tools, videos and blogs.  Many of these are already on my required reading list.



Intel's Clayton Craft shows and discusses a HP Z600 Workstation featuring the Intel Xeon 5500 processor at the HP Tech Forum.


Mike Lafferty (Intel) demonstrates the Xeon 5500 Processor series, code-named Nehalem. Check out the video....







I talk with a lot of customers.  Since the initial disclosure, there has been a groundswell of EXcitement (bad geek pun) for the upcoming Nehalem EX launch.  I think this is primarily driven by the realization of just how significant the Xeon 5500 (aka Nehalem EP) product has been.  Xeon 5500 delivered an unprecedented leap in Xeon performance, the biggest ever.  Things are pretty good if you can get 30-40% per generation.  Nehalem EP was 2-3 times the performance of the previous generation.



Nehalem EX looks well positioned to take this crown away, delivering the biggest leap ever.


Nehalem EX arrives in the box with:

  • Up to eight cores / 16 threads with Hyper-Threading and 24MB of cache
  • Up to 9x the memory bandwidth of the previous generation and up to 64 memory slots in a four-socket platform
  • Over 15 eight-socket+ designs from 8 OEMs coming
  • New RAS technology - Machine Check Architecture (MCA) – formerly reserved for high-end Itanium systems



This is exciting from a performance perspective, but even more exciting as an opportunity for consolidation and migration.



Large enterprise applications – ERP, CRM, decision support – have been the domain of the scale-up SMP architecture.  Clouds and grids are making progress, but these applications are often easiest to manage on a single image.  In order to meet service level requirements (like completion of close in under 6 hours), IT managers have resorted to 16-, 32-, even 64-processor RISC systems.  This scale-up domain has been mostly outside the scope of Xeon systems.



Nehalem EX, with 64 threads in a four-way box, changes the math.  System requirements that forced scale-up onto expensive proprietary RISC architectures can now be met on a Xeon platform.  With systems up to 128 threads, there are very few enterprise applications that will not fit into this box.


It is a one-two punch: SMP scale and mainframe-class reliability features.  The opportunity to migrate off legacy RISC to Xeon is upon us.

Steve Phillips with Cisco gives us a tour of "The Datacenter of the Future". Check out this short video:



What does your Datacenter of the Future look like? How can Intel and Cisco help?

Competition, Comparison, Self Improvement, Benchmarking.


We do them in business. We do them in our careers. We do them in our leisure... and if you are like me, you like to watch them on TV or live as well. Who isn't watching Lance Armstrong, Tiger Woods, or their favorite sports team compete regularly?


IT professionals are no different.  Today, one of the business emphasis points for IT is energy efficiency.  Now there is a way for you to quickly compare your own IT organization against itself and others.  This IT self-assessment tool takes about 2-3 minutes to complete and will answer these three questions:


  1. How efficient is your server infrastructure today?
  2. How do you compare to your peers?
  3. How much more efficient could you be?


The Community Window: Server Efficiency is a tool hosted on the Intel Premier IT Professionals website, where registration is free and so is the information and best practices shared by other IT professionals throughout the industry.  Join and conduct your Server Efficiency self-assessment today.   Chris


Learn about Intel IT’s proof-of-concept testing and total cost of ownership (TCO) analysis to assess the virtualization capabilities of Intel® Xeon® processor 5500 series. Our results show that, compared with the previous server generation, two-socket servers based on Intel Xeon processor 5500 series can support approximately 2x as many VMs for the same TCO.

One of the recurring themes that I've been noticing from end users who are testing or evaluating Intel Intelligent Power Node Manager (or Node Manager) is the question: "How do we turn it on or off?"  To put it simply, when you have a Node Manager capable platform, you can simply put it to work and let your power policies decide when to enable/disable the features...


So let me step back a bit and talk about the technology itself first.  Node Manager is very much like any *T technology that Intel has deployed over the past several years: it's an ingredient - or in this scenario a mix of ingredients - that is available at the platform level.  Here are the 'ingredients' that, when combined, give you the ability to monitor/manage power and, in some cases, monitor thermal events.

        • The platform is based on the Xeon 5500 Series Chipset (codename Tylersburg-EP) server board
        • Xeon 5500 Series Processors (codename Nehalem EP)
        • Node Manager Enabled Firmware with the Manageability Engine
        • Server chassis components that meet IPMI 2.0 specifications for monitoring (e.g. thermal monitoring)
        • PMBUS Power Supply - this communicates with the Baseboard Management Controller (BMC) for platform power usage


For those of you wanting to get your hands on this technology TODAY, check out the Intel Server lineup:

  • Intel® Server Board S5500WB (codename Willowbrook) which is optimized for IPDC deployment, and supports IPMI 2.0, Intel Intelligent Power Node Manager, and can also support the Data Center Manageability Interface (DCMI) 1.0 specification.
  • Intel® Server Board S5520UR (codename Urbanna) is the mainstream Enterprise platform, which supports IPMI 2.0 and Intel Intelligent Power Node Manager.


Both platforms work in conjunction with Intel® Data Center Manager (Intel® DCM), an SDK that provides power and thermal monitoring and management.  This SDK allows group- and policy-based management for single server, rack, logical group, lab, or whole datacenter models.


Ok - so that reads like a bunch of marketing stuff... but here's the 'guts' of the technology...


When you purchase a Node Manager enabled server, there are a few simple steps to take to set things up to monitor/manage your server.


Most likely you'll need to set up your BMC; Intel provides a CD-based implementation to help with this in our servers called the Intel Deployment Assistant.  This lightweight OS-bootable CD can set up the most common BIOS settings, and check versions of firmware and update them via Internet connection to ensure you have the latest BIOS, BMC, ME and sensor firmware.  Each OEM will have its own methods, but they should be similar in function when it comes to setting up the server for monitoring.


The BMC needs an IP address, netmask, and default gateway set up - and, according to IPMI specifications, you can also set the administrative (user) access rights if you would like to tighten down security a bit.  Once you have these access points set up, you can utilize standard IPMI commands to communicate with your server, or use Intel DCM to really 'visualize' the capabilities of Node Manager.
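As a concrete illustration (my own sketch, not from the original post), here is roughly what that setup looks like with the open-source ipmitool utility. The channel number (1), user ID (2), addresses and password are assumptions that vary by platform, so check your board's documentation:

```shell
# Configure the BMC's LAN channel with a static address (channel 1 here;
# some boards use a different channel number).
ipmitool lan set 1 ipsrc static
ipmitool lan set 1 ipaddr 192.168.1.100
ipmitool lan set 1 netmask 255.255.255.0
ipmitool lan set 1 defgw ipaddr 192.168.1.1

# Optionally tighten access: rename user ID 2 and give it a password.
ipmitool user set name 2 admin
ipmitool user set password 2 'S3cret!'

# Verify the settings, then talk to the BMC over the network.
ipmitool lan print 1
ipmitool -I lanplus -H 192.168.1.100 -U admin -P 'S3cret!' sdr list
```

The same commands work against any IPMI 2.0 compliant BMC, which is exactly why the standards compliance called out above matters.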


Here's a great demo video showcasing some of the Node Manager & Intel DCM use cases:


How many of you have worked with IPMI management before?


The technology has been around for a while, but now Intel has put automation and policy-based management features into the platform - thereby reducing costs, increasing responsiveness to power policies, and making Xeon servers more energy efficient than before.  Many of our customers are asking for Node Manager enabled servers - is your OEM on track to deliver?


Your first server, that is. There’s nothing like a real server to help your business become more competitive. While some small businesses can get away with using a desktop as a server, there’s really no substitute for the real thing. With the ability to more efficiently handle more users, accommodate the latest applications, and deliver greater reliability, having a real server will make all the difference.


Aren’t you ready for the real thing? A server built on an Intel® Xeon® processor has a lot to offer your business, so be prepared for dramatic performance and productivity improvements. If you want to be able to handle the demands of more customers, more data and more staff, an Intel Xeon processor-based server is the way to go.



Can you afford downtime? Of course not! Maximize business uptime with technology that’s ready to work all day, every day.  And protect your critical digital assets with error correcting memory and support for RAID storage.


And just to build the IT excitement for your first server, check out this animation to see what a real server can do for your business:












So, if you’re flirting with transitioning to a real server, I would just advise that you make sure that your first really is the best.  Talk to your IT solutions provider about implementing an Intel Xeon processor-based server. And remember, once you go Xeon, you’ll never go back!

As I've blogged before, my job takes me to many places, and I get to see all kinds of cool technology when I get there. This example is no exception: I've put together a short video with Steve Cumings of HP showing a tour of the Performance Optimized Datacenter, abbreviated as "POD". It's actually the same size as a standard container for shipping anywhere around the world. These types of assets are vital in times of need such as disaster recovery. Take a look and let us know how you could use this cool technology.


There are two technologies available to regulate power consumption in the recently introduced Nehalem servers using the Intel® Xeon® processor 5500 series.  The first is power proportional computing, where power consumption varies in proportion to processor utilization.  The second is Intel® Dynamic Power Node Manager (DPNM) technology, which allows setting a target power consumption when a CPU is under load.  The power capping range increases with processor workload.


An immediate benefit of Intel® Dynamic Power Node Manager (DPNM) technology is the capability to balance and trade off power consumption against performance in deployed Intel Nehalem generation servers.  Nehalem servers have a more aggressive implementation of power proportional computing, where idle power consumption can be as low as 50 percent of the power under full load, down from about 70 percent in the prior (Bensley) generation.  Furthermore, the observed power capping range under full load when DPNM is applied can be as large as 100 watts for a two-socket Nehalem server with the Urbanna baseboard, observed in the lab to draw about 300 watts under full load.  The actual numbers you will obtain depend on the server configuration: memory, number of installed hard drives, and the number and type of processors.


Does this mean that it will be possible to cut electricity bills by one third to one half using DPNM?  This is a bit optimistic.  A typical use case for DPNM is as a "guard rail".  It is possible to set a not-to-exceed target for the power consumption of a server, as shown in the figure below.  The red line in the figure represents the guard rail.  The white line represents the actual power demand as a function of time; the dotted line represents the power consumption that would have existed without power management.




Enforcing this power cap brings operational flexibility: it is possible to deploy more servers to fit a limited power budget to prevent breakers from tripping or to use less electricity during peak demand periods.



There is a semantic distinction between energy management and power management.  Power management in the context of servers deployed at a data center refers to a capability to regulate the power consumption at a given instant.  Energy management refers to the accumulated power saved over a period of time.


The energy saved through the application of DPNM is represented by the area between the dotted line and the white graph line below; the power consumed by the server is represented by the area under the solid white graph line.  Since power capping is in effect during relatively short periods, and when in effect the area between the dotted line and the guard rail is relatively small, it follows that the energy saved through the application of DPNM is small.
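To make the power-vs-energy distinction concrete, here is a small sketch (the sample numbers are made up for illustration, not lab data) that computes the energy saved by a cap as the area between the uncapped-demand curve and the capped consumption curve:

```python
# Energy saved = area between the uncapped-demand curve and the actual
# (capped) consumption curve, integrated over time with the trapezoidal rule.

def trapezoid(samples, dt_hours):
    """Integrate a series of power samples (watts) over time -> watt-hours."""
    return sum((a + b) / 2 * dt_hours for a, b in zip(samples, samples[1:]))

dt = 0.25  # one sample every 15 minutes
demand = [220, 260, 310, 330, 320, 280, 230]  # what the server *wanted* to draw
cap = 300                                     # the "guard rail" in watts
actual = [min(p, cap) for p in demand]        # consumption with the cap enforced

saved_wh = trapezoid(demand, dt) - trapezoid(actual, dt)
print(f"Energy saved by capping: {saved_wh:.1f} Wh")
```

Note how small the result is relative to the total energy consumed; the demand curve only pokes above the guard rail briefly, which is exactly the point made above.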


One mechanism for achieving significant energy savings calls for dividing a group of servers running an application into pools or "platoons".  If servers are placed in a sleeping state (ACPI S5 sleep) during periods of low utilization it is possible to bring their power consumption to less than 5 percent of their peak power consumption, basically just the power needed to keep the network interface controller (NIC) listening for a wakeup signal.


As the workload diminishes, additional servers are moved into a sleeping state.  The process is reversible whereby servers are taken from the sleeping pool to an active state as workloads increase.  The number of pools can be adjusted depending on the application being run.  For instance, it is possible to define a third, intermediate pool of power capped servers to run lower priority workloads.  Capped servers will run slightly slower, depending on the type of workload.


Implementing this scheme can be logistically complex.  Running the application in a virtualized environment can make it considerably easier because workloads in low use machines can be migrated and consolidated in the remaining machines.
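A toy policy for the platooning idea might look like the following sketch. The function name, thresholds and headroom figure are my own illustration, not a shipping Intel algorithm:

```python
# Given aggregate utilization, decide how many servers stay active, how many
# run power-capped for low-priority work, and how many sleep (ACPI S5).
import math

def platoon(total_servers, utilization, headroom=0.2, capped_fraction=0.25):
    """utilization: aggregate load as a fraction of total capacity (0..1)."""
    # Keep enough servers active to carry the load plus some headroom.
    target = min(1.0, utilization + headroom)
    active = min(total_servers, max(1, math.ceil(total_servers * target)))
    # A slice of the remainder runs power-capped for lower-priority workloads.
    capped = min(total_servers - active,
                 math.ceil(total_servers * capped_fraction))
    sleeping = total_servers - active - capped
    return active, capped, sleeping

print(platoon(20, 0.30))  # e.g. 20 servers at 30% aggregate load
```

As utilization rises, the policy pulls servers out of the sleeping and capped pools back into the active pool; the reverse happens as it falls, mirroring the reversible process described above.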

We are conducting experiments to quantify the potential for energy savings.  Initial results indicate that these savings can be significant.  If you, dear reader, have been working in this space, I'd be more than interested in learning about your experience.


If this topic is of interest to you, please join us at the Intel Developer Forum in San Francisco at the Moscone Center on September 22-24.  I will be facilitating course PDCS003, "Cloud Power Management with the Intel(r) Xeon(r) 5500 Series Platform."  You will have the opportunity to talk with some of our fellow travelers in the process of developing power management solutions using Intel technology ingredients and get a feel for their early experience.  Also, please make a note to visit booths #515, #710 and #712 to see demonstrations of the early end-to-end solutions these folks have put together.

It has been nearing a month since I posted my blog on Extended Page Tables and its niceties, where I promised to come up with a follow-up blog covering some hands-on test runs I had planned for my lab. With a burgeoning to-do list from work and endless meetings every day, I finally made some time and set up a testbed in the lab.

My goal was to run some workload on the hardware setup with Extended Page Tables enabled to help the virtual machine translate memory addresses, then rerun the workload on the same hardware setup without EPT and perform a comparison of both result sets. I wanted to keep the test simple enough to achieve my goal while making sure the results were repeatable across multiple runs.


I decided to use an open source workload called DVD Store. This workload was developed by Dell and passed over to the open source community; it comes in variants for Microsoft SQL Server, Oracle Database Server and MySQL Database Server. The database schema is made up of eight tables, a few stored procedures and transactions. The workload comes in three different DB sizes of 10MB, 1GB and 100GB. However, being an open source workload, it allows us to tweak the size of the database and customize it to suit specific size requirements. I went ahead and tweaked the database to be 2GB in size; this allowed me to fit the database and log files on the storage devices I had in the lab without going for expensive SAN based storage. As the name of the workload says, this is an order-processing OLTP database workload simulating customers browsing through the store, adding selected DVDs and completing the order. The primary metric coming out of the workload is the number of orders processed during the workload execution period; the secondary metric is the average milliseconds taken to process each order.





Intel S5520UR Dual socket server.

CPU: Intel Xeon X5550 2.67 GHz 8 cores


Hard drive: 500GB SATA II 7.2K RPM holding the OS partition, plus 3x Intel® X25-E Extreme SATA Solid-State Drives

NIC: Embedded 1Ge full duplex.

Keyboard, mouse and Monitor



Gateway E-4610S SB

CPU: Intel Core2 Duo 4300 1.80GHz


Hard drive: 80GB SATA II

NIC: Embedded 1Ge full duplex.

OS: Windows XP professional with SP3.






Microsoft Windows Server 2008 Enterprise 64-bit edition

Microsoft SQL 2005 64bit


I wanted to go with solid-state drives to ensure I was not disk bound at any time while running the workload; the alternative to running the workload without SSDs would be a boatload of conventional hard drives, increasing the setup complexity and footprint of my test hardware. Just using three Intel SSDs makes life easier and provides terrific I/O performance.  ESX was naturally the choice of hypervisor, with 3.5 update 3 used in the test run without EPT and ESX 4.0 used to execute the workload with EPT.


Test Methodology


I'm not going to delve deeper into how to set up the environment, OS installation, application setup, or customizing the workload; these topics are out of scope for this blog. But since it is necessary to know how I ran my tests, I will talk about the methodology just enough for readers to understand the workload execution method and test duration, which helps in understanding the result chart below. The test was run from the client machine using the workload driver, and was run for 10 minutes at a stretch, three times, just to ensure the results were repeatable. The number of orders executed was consistent to within +/- 100-200 OPM.
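For readers who want to try something similar, launching the DS2 driver from the client is a single command line. The flag names below are from my recollection of the DVD Store 2 distribution and the target address is made up, so verify both against the readme shipped with your copy:

```shell
# Hypothetical DS2 driver invocation (SQL Server variant): 1-minute warmup,
# then a 10-minute measured run against the server under test.
ds2sqlserverdriver.exe --target=192.168.1.50 --n_threads=16 \
    --warmup_time=1 --run_time=10
```

The driver prints orders per minute (OPM) and average response time as it runs, which are the two metrics charted below.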







The above chart shows the number of orders the server was able to execute per minute. The X-axis represents the number of vCPUs allocated to the virtual machine and the Y-axis shows the orders per minute. With each additional vCPU added to the virtual machine the number of orders executed by the server increases; as you can see in the chart, there is a 15%, 18% and 31% increase in OPM, clearly scaling up with the additional vCPUs allocated to the virtual machine.


Response Time



The above chart shows the average response time to complete one order. There is a 15% to 40% decrease in response time between the server running without EPT and the server with EPT enabled.  In addition to the improvement in response times, the server processes 15%-30% more transactions.




When I completed the workload execution and started looking at the data, it was apparent to me that EPT is a major factor in improving the performance of virtualized workloads. With virtualization technology achieving widespread adoption, IT organizations are exploring how to virtualize applications that were left untouched until now for fear of degrading performance and blowing the SLAs promised to the business. Technologies like EPT give IT managers good reason to start thinking about virtualizing critical workloads like SQL, Exchange, etc. This is the final part of the two-part blog series on EPT. Feel free to comment if you have any questions.


Bhaskar D Gowda.

The need to write scalable applications has been important for programmers in the HPC community for years. Now, with the proliferation of multi- and many-core processors, developing scalable software is a top priority for many programmers.

Andrew S. Tanenbaum stated at the USENIX ’08 conference last year that developing “sequential programming is really hard” … the difficulty is “parallel programming is a step beyond that.”

He is right, but let’s illustrate why it is just a small step.


Here is the point – parallel architectures will continue proliferating and we will need to develop and refine parallel algorithms that exploit that parallelism. While it is difficult to develop and refine parallel algorithms, the actual programming of these new algorithms does not need to be hard.  However, if the developer is required to know the intimate details of the hardware, then the development and refinement of parallel algorithms can be very difficult, and very time consuming.


One approach provided by Intel software developer tools is to abstract away the details of the hardware.  This allows developers to focus on their algorithms and applications, and rely on Intel software developer tools to provide the best optimizations for current and future platforms. While you may give up some performance through this abstraction, what you lose in performance will be rewarded by your ability to iterate through more of your parallelization ideas in less time.  You may find yourself designing and developing better approaches to parallelism because you were able to test more hypotheses.
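As a rough illustration of the idea (using Python's standard concurrent.futures rather than the Intel tools themselves), the algorithm is written once against an abstract executor, and the runtime decides how to spread the work across whatever cores are present:

```python
from concurrent.futures import ThreadPoolExecutor

def square(v):
    # The per-item work; in real code this would be the compute kernel.
    return v * v

def total_of_squares(values, workers=4):
    # The parallel algorithm is expressed once against an abstract executor;
    # the runtime decides how the work is mapped onto the available cores.
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return sum(ex.map(square, values))

print(total_of_squares(range(10)))  # prints 285
```

Nothing in `total_of_squares` names a core count or a memory hierarchy; that is the abstraction being argued for here.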


An additional by-product of being abstracted away from the intricacies of the hardware is that your software will be highly adaptable to future platforms.  You will see tremendous improvements on multi-core solutions and will be in a great position to scale your application performance forward as newer architectures become available. To learn more about Intel Software Tools and the benefits of optimizing your software on multi-core solutions, visit

Your most valuable employee is the one who creates tomorrow’s successes.  Providing them tools that help them do that faster will help your organization create new products or optimize old ones more rapidly.  The benefit to the organization is increased opportunities to win the customer’s attention via new products or your responsiveness to their requests; the employee gets to brag about what he or she just helped bring to market.

Before we get too far, let’s look at Intel’s mission with respect to workstations.  We are laser focused on supplying technology that provides users with an uncompromised experience in transforming their ideas into reality.  With that in mind we look at how users create; we try to understand their obstacles and work with the ecosystem of hardware and software providers to deliver solutions to real problems that may be inhibiting their opportunity to innovate.

One technology that is helping users innovate faster is virtualization. No, we are not looking to remove the workstation from the user’s desk or share his or her workstation with peers who also need a workstation.  We are using virtualization to deliver the performance they need to innovate faster.


The Observation


We saw workstation users’ innovation slow as they multitasked between tasks – some of them not even their own.  The involuntary tasks included deploying IT security patches, updates, and system backups, to name a few.  We also saw that users were no longer just doing Computer Aided Design (CAD) alone: they were doing CAD, using productivity tools, meshing, web surfing for supporting facts, collaborating via video and Instant Messaging (IM) tools, digital white boarding and trying to do analysis-driven design.  They were very busy people who can’t afford any downtime or slow time.

In some cases we noticed that some users actually had not one, but two or more workstations running in completely different environments, many times with different OSs.


The Problem


What the above really led to is the conclusion that too many tasks were going after too few resources and that the experience we had hoped the user would encounter was not happening.  In fact the reverse was happening – interactive creative tasks were slowing, and system sluggishness was at an all-time high.  The “uncompromised experience in transforming their ideas into reality” we wanted for the workstation user was not there, and any innovation that was possible slowed to a crawl.


A Potential Solution


Intel® Virtualization Technology for Directed I/O (Intel VT-d), once thought of just for servers, actually has a place in the workstation market.

This technology provides an important step toward enabling a significant set of emerging usage models in the workstation. VT-d support on Intel platforms provides the capability to ensure improved isolation of I/O resources for greater reliability, security, and availability.  That is a mouthful; let’s see it in action.

There are two key requirements that are common across workstation usage models.

1.    The first requirement is protected access to I/O resources from a given virtual machine (VM), such that it cannot interfere with the operation of another VM on the same platform. This isolation between VMs is essential for achieving availability, reliability, and trust. This helps you get the performance you want from your workstation.

2.    The second major requirement is the ability to share I/O resources among multiple VMs. In many cases, it is not practical or cost-effective to replicate I/O resources (such as storage or network controllers) for each VM on a given platform.


In the case of the workstation, virtualization can be used to create a self-contained operating environment, or "virtual software appliance," that is dedicated to capabilities such as manageability or security. These capabilities generally need protected and secure access to a network device to communicate with down-the-wire management agents and to monitor network traffic for security threats. For example, a security agent within a VM requires protected access to the actual network controller hardware. This agent can then intelligently examine network traffic for malicious payloads or suspected intrusion attempts before the network packets are passed to the guest OS, where user applications might be affected. Workstations can also use this technique for management, security, content protection, and a wide variety of other dedicated services. The type of service deployed may dictate that various types of I/O resources, graphics, network, and storage devices, be isolated from the OS where the user's applications are running.


The Result


In collaborating with virtualization and automation leader Parallels on its Parallels Workstation Extreme solution, we identified two impediments to workstation user productivity.  The first was the general resource overhead that afflicts a traditional virtualized workstation system when there are insufficient resources to address the overload of requests. The second, more complex issue was a single workstation needing to support multiple OSs and display visualization programs at near- or full-performance within virtual machines.

The first issue was more straightforward: create VMs, partition resources, and now the user has a very resilient workstation that is capable of delivering the intended experience.  IT can have their VMs, the user has his or her workstation back, and the concept of digital prototyping, creating and exploring a complete product before it is built, becomes a reality.  The creative innovators in the company can now iterate through more ideas in less time, and your company’s opportunities to catch the customer’s attention just went through the roof. The second issue offered a more complex challenge.


We identified certain industries, such as the oil and gas exploration space, where users actually had two or more physical workstations - one running Windows, the other running Linux. Both workstations had visual display requirements for the end user and both computers acted on the same reservoir data with applications that, while similar in many ways, were still different in their functionality and purpose.  In oil drilling projects that typically involve millions of dollars in capital investment, the confirmation of expected end results is an asset that far outweighs the costs of a few workstations. Nevertheless, in today’s economic setting, the ability to get the same functionality at a lower cost is one of many key drivers in helping companies achieve healthy bottom lines.


The Proof Point For Virtualization In A Workstation


Engineers from Schlumberger, a leading oil field service provider, run performance-demanding applications such as GeoFrame* and Petrel*.  These applications serve to analyze complex geological and geophysical data and determine the viability of potential reservoirs, or to optimize production at existing sites. With GeoFrame running on Linux* and Petrel on Microsoft Windows*, Schlumberger engineers have been using these applications on two separate physical workstations, driving IT spending higher, pushing down user productivity and increasing both power consumption and IT maintenance costs.


A New Paradigm For A New Day


With the availability of Intel Xeon processor 5500 series-based workstations, game-changing workstation virtualization software such as Parallels Workstation Extreme has opened up new horizons, delivering breakthrough graphics performance on Intel’s latest processor technology. Parallels Workstation Extreme is built on top of the Parallels FastLane Architecture, which effectively leverages the full potential of hardware resources such as graphics and networking cards to offer optimal workstation performance.


In comparison testing, Schlumberger measured the concurrent performance of applications running side-by-side on a virtualized Intel Xeon processor 5400 series-based workstation against the same setup on the newer Intel Xeon processor 5500-based machine. The results were astounding. The first machine, with the older processor and without Intel VT-d support, ran Petrel on the host OS at full native speed, but performance for GeoFrame in a VM slowed enormously. While Petrel refreshed its graphics at a rate of 30 frames per second, GeoFrame crawled along at just one frame every 19 seconds, an agonizingly slow rate on an older workstation without Intel VT-d support.


When the group tested the same applications on the newer Xeon 5500 series workstation with Intel VT-d support, the results were striking: Both applications – Petrel running on the host OS and GeoFrame in a guest OS in a VM - ran at full native speed, and both were able to refresh graphics at near 30 frames per second—a 570 times improvement over the first workstation. Russ Sagert, Schlumberger’s Geoscience Technical Advisor for North America said “our engineers were blown away by the performance. We hammered these machines with extreme workloads that stressed every aspect of the system. Amazingly, the new workstation based on the Intel Xeon processor 5500 series provided performance enabling this multiple OS, multiple application environment for the first time.”
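The "570 times" figure follows directly from the two refresh rates reported above:

```python
# Frame-rate comparison behind the "570 times" figure.
old_fps = 1 / 19   # one frame every 19 seconds without VT-d
new_fps = 30       # near-native refresh with VT-d
speedup = new_fps / old_fps
assert round(speedup) == 570
```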


The key element in Schlumberger’s new environment is Intel Xeon processor 5500 series-based workstations with Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d).  Together, these technologies enable direct assignment of graphics and network cards to virtual machines, enabling the machine to circumvent the interrupt and exit loop and clearing the previous performance problems. Running in conjunction with Parallels Workstation Extreme, which effectively leverages Intel Virtualization Technology, including VT-d, the solution revolutionizes virtualization for high-end users. “High-performance virtualization on Intel Xeon processor 5500 series-based workstations is a game-changing capability,” says Sagert. “We can allocate multiple cores, up to 64 GB of memory and a dedicated graphics card to each machine. The results are spectacular.”


In the final analysis, moving to the Intel Xeon Processor 5500 series of next-generation workstations does far more than cut costs. It impacts the way that work gets done. If you have clients running the kind of resource-intensive, graphics-rich applications that traditionally slow to a crawl in a virtualized environment, consider the benefits of finally moving beyond the I/O barrier.


A fully configured Intel Xeon Processor 5500 series-based workstation running Parallels Workstation Extreme delivers the performance level that makes a virtualized workstation a leading contender for users with multi-workstation requirements. A streamlined work interface, reduced office noise and clutter, access to the same data repository and significant performance gains works on the user side. But the IT organization also gains benefits by lowering capital, management, support, provisioning, data protection, space, and energy and cooling costs. Moreover, the IT team can now standardize on a single OS image while addressing alternative requirements.


Learn More

Intel Workstation Processors

Parallels Workstation Extreme



I have been around the supercomputing market for over 25 years and have had an opportunity to see some interesting ideas come and go.  Let me share two that I experienced firsthand.


  • CDC’s Cyber 205 vs. the Cray 1S.  The Cray 1S and the CDC Cyber 205 both offered effective vector processing; however, code conversion between them could require significant algorithmic changes. Cray, of course, won the HPC race at that time.  Note, the Cyber 205 was a tremendous performer when you could keep its long vector pipeline busy. However, one branch or gap in the vector processing pipeline would cause a flush of the vector unit, and whatever performance advantage you appeared to have vs. a Cray 1S was quickly erased.
  • An early-day accelerator came from Floating Point Systems.  In particular, the FPS 164 was an awesome “off-load” system where the needs of a few users were satisfied with better throughput than the Cray X-MP and Y-MP of the day. Convex had a better idea: it was better at serving the needs of more users than an FPS 164, and it was simpler to develop, maintain and scale software to next-generation systems.


So what are the lessons from history? Perhaps that there is a tight connection between applications, architectures and algorithms, and that it is extremely important to maintain a level of application flexibility and versatility in order to adopt new architectures as they become available in the market.  The old adage still remains true: software will outlive the useful life of hardware.  So it is important to be able to quickly adapt to new shifts.


The same questions probably still apply today as they did when Cray, CDC and FPS were around.


When does an accelerator computing strategy work best?


The easiest answer is if your application is extremely data parallel in nature, then it may be well suited for an accelerator strategy. The word extremely is the critical part.


If your application only performs some level of data parallelism and includes task, thread and cluster level parallelism or contains a small fraction of branching or is host to irregular data sizes, then perhaps an accelerator may not be the best fit.


How much real performance will an accelerator strategy deliver?


Oftentimes we hear claims of 10X, 20X or even greater than 30X.


These are great headlines, but as many have noted, you need to understand an accelerator’s impact on the total execution time of your application.  What may have been 10X to 30X or more on a kernel of the application may deliver a mere 2X to 3X, or even less, in terms of total application performance improvement.
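This is Amdahl's law at work; a quick sketch shows why a big kernel speedup shrinks to a modest overall gain (the 60%/20x numbers are illustrative, not from any particular application):

```python
def overall_speedup(p, s):
    # Amdahl's law: total speedup when a fraction p of the runtime
    # is accelerated by a factor s and the rest runs unchanged.
    return 1.0 / ((1.0 - p) + p / s)

# A 20x kernel speedup on a kernel that is 60% of total runtime
# yields only about a 2.3x overall improvement.
print(round(overall_speedup(0.6, 20.0), 2))  # prints 2.33
```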


Of course the real question is what are we really comparing performance speed ups to?


I have seen well-tuned software on accelerators compared to “baseline” code running on one core of an old processor.  However, when you use available software technology, turn compiler flags on and add in a math kernel library call, performance on multi-core solutions can jump by over 10X and in some cases can exceed 30X for total execution time.  This standards-based accelerated software will scale forward as newer microarchitectures become available from Intel.


Why is the difference between the promise and the actual performance so great?


Always a good question.


The promise deals with a small part or a kernel of the software that is data parallel and can potentially scale linearly as more compute resources are added.  Again if the application is extremely data parallel, then an accelerator strategy may be the correct approach.


However, when the actual performance result, or total application performance, is significantly different it is often because of several things.


  • One common reason is that you may be comparing unoptimized software on a multi-core system to optimized software on an accelerator.  When I compare similarly optimized software on a multi-core system, I see that the 20 – 30X difference often fades to less than 2X, and in most cases the multi-core system does better than the hardware accelerator.  This is because optimized software on a multi-core solution accelerates all components of the application.
  • Another is the bandwidth imbalance at the attach points of the accelerators: typically the attach speeds do not match the memory bandwidth or the ALU speed of the accelerator, and the theoretical peak FLOPS are tough to achieve.  Sometimes, for larger workloads, performance deteriorates due to the limited amount of memory on the accelerator card.
  • Another situation may be that your application depends on different forms of parallelism, including task, thread or cluster-level parallelism, and in some cases even sequential sections of your software.


So back to the differences in performance between the Cray 1 and CDC Cyber 205.


While the Cyber 205 was great at the edges of science, the Cray proved to be the workhorse of high performance computing.  It offered better system balance than the Cyber 205.  Here is an example: if you take great care to optimize your software for a particular architecture, you will no doubt see tremendous performance gains.  However, like the Cyber 205, if you break that pipeline you pay the overhead of restarting the long vector pipeline.  Often, even with today’s accelerators, that start-up cost reduces what appear to be stellar performance gains, just as it reduced the Cyber 205 to being no better than, or sometimes even slower than, the Cray 1.  There were of course examples with the Cyber 205, as there are today with accelerators, where select sciences can see tremendous advantages over traditional computing solutions.


What other considerations may weigh in your decision to adopt an accelerator strategy?


Are you constantly refining your software?


Many researchers would probably answer yes.  They are constantly refining their software to improve the results, the performance or both.


As I mentioned at the beginning of the blog, the old adage still remains true: software will outlive the useful life of hardware.  So it is important to be able to quickly adapt to new shifts.  One way to simplify these moves is to use standards-based tools, which give you the flexibility to create applications that can use the multiple types of parallelism mentioned above via tools, compilers, and libraries.  You may also want to use standards-based tools to acquire the versatility you need to scale your software across multiple architectures – e.g. large, many and heterogeneous cores.


The caveat with respect to using non-standard tools is that you become locked into a specific architecture.  If that architecture from the same vendor were to change, you might be required to make significant changes (e.g. tuning to grain sizes).


Do you want to maintain, support and update multiple code bases?


I don’t.  I want to invest in the development of parallel algorithms.  The old adage that software will far outlive any hardware implementation still applies, and I need the flexibility and versatility to adopt new architectures as quickly and painlessly as possible when they become available.  I do not want to invest in maintaining, supporting and updating an ever-increasing set of code streams as newer architectures are made available.


Our team goal at Intel is to develop software tools and hardware technology that can help you scale your application performance forward to future platforms without requiring a massive rebuild – just drop in a new runtime that is optimized for the new platform to experience the improvement (akin to the printer/display driver model: buy a new printer or display, install the respective driver, and your system enjoys the improved benefits).  That is the goal.


If you want to learn more about what we are doing to deliver high performing HPC solutions that are both flexible and versatile please visit



I had the recent opportunity to work on this case study published jointly by Intel, Dell and Motion Computing that reviewed how information technology investment by Correctional Health Services Corporation in Puerto Rico drove a transformation of their health services in their prison system.


There are tons of case studies out in the market and on the web, but to me this one stood out for its dramatic impacts: improved efficiency of employees and workers at the prison, improved health care for inmates, the ability to meet minimum documentation standards, and lower costs to manage the IT infrastructure.


If you read one case study this year, this one is recommended.  Definitely a feel-good story all around.



Today, I was made aware through my Twitter contacts of the Cray CX1 product. I have been doing several online webinars and blogs recently talking about the advances in HPC performance over the last decade and what innovation has enabled for mainstream HPC users, so I wanted to share what I found about Cray.  In short, the CX1 is an HPC solution purpose-built for offices, laboratories or other constrained environments; sized to fit under a desk, it contains up to 8 server blades and an awful lot of storage and I/O.


Read more straight from Cray's CX1 product brief: "Who says world-class high performance computing (HPC) should be reserved for large research centers? The Cray CX1™ supercomputer makes HPC performance available to everyone, combining the power of a high performance cluster with the affordability, ease-of-use and seamless integration of a workstation. Equipped with powerful Intel® Xeon® processors and state-of-the-art visualization and storage capabilities ..." (more)





I’ve spent a fair number of words in the past on the benefits of 10 Gigabit and what it means for the server market.  Through the addition of FCoE and Data Center Ethernet as well as advanced virtualization features, 10 Gigabit seems likely to have its big day in the sun pretty soon.  But the question is still “when?”





While the proof is ultimately in the raw volumes of 10 Gigabit that ship, and the number of IT users who utilize the higher performance, there are some key reasons to think that 10 Gigabit momentum is accelerating beyond just the numbers* below:



[Chart: 10 Gigabit forecast]




Over the past year, there has been a raft of new 10 Gigabit switch announcements** from Cisco (Nexus 5k/7k), Arista (7100, 7124, and 7148), BNT (G8100), Extreme Networks (Summit X650), Juniper (EX8200), Voltaire (8500) and many others that have increased the choice and the density of 10 Gigabit switches in the marketplace.   There are now many 48+ port 10 Gigabit switches available and even a few 200+ port models.  Also, the improved density and feature set of certain switches (such as Voltaire’s 280+ port 8500 series switch) provide a path for 10 Gigabit’s ascent into the clustering market by improving port density and latency for clustering applications.





Broad acceptance of SFP+ has also helped to drive a rapid improvement in price, density, and power.  SFP+ provides a smaller form factor standard for optics, as well as a standard connection methodology to connect directly from switch to NIC via a Twin-Ax copper (read: ‘low cost’) cabling solution inside the rack (up to 10m).  The widespread adoption of SFP+ form factors has dramatically reduced the entry level price points for switches, and through the ‘direct attach’ copper connection capability it has also reduced the overall cost for initial and ongoing deployments of 10 Gigabit by providing a lower cost bridge to optical or full 10GBase-T support.





There are also a few data points to suggest that the server-side cost for 10 Gigabit will be dropping fast going forward.  As power for 10GBase-T continues to drop quickly, more and more server vendors are looking at the options available to embed 10 Gigabit directly into their systems.  This will not likely be a 2009 story, but it is approaching quickly.  Additionally, the acceptance of SFP+ form factors for optics and direct-attach cabling has provided a path that some server vendors may use to design 10 Gigabit down onto motherboards without adding the extra cost and power of a 10GBase-T solution.  This looks like a likely near-term development given that the solution power and design are robust and ready for motherboard-based designs today.





Finally, the continued cost reduction underlines the attractive long-term value of standards-based 10 Gigabit Ethernet.  There is clear indication of downward pressure on 10GbE prices already today.  We will see 10 Gigabit pricing follow a similar price curve to what we saw with single Gigabit.  This is evidenced in the recent pricing announcement in which Intel reduced the cost of its single-port 10GBASE-T adapter by 40%, from $999 to $599.  The competitive economics of standards-based hardware will continue to drive down 10 Gigabit prices even further, and we will see 10GBASE-T pricing below $500 per port in the near future. Once it gets onto the motherboard, prices will drop even further.




Overall, the power, density, latency, and cost of 10 Gigabit are all improving at a rapid rate.  Form factor flexibility coupled with a wide array of switch and NIC vendors in the marketplace will provide choice and low cost for IT departments while virtualization and convergence in the datacenter and elsewhere continue to provide demands for ever greater I/O bandwidth and performance.






Ben Hacker






* Dell’Oro forecasts as of Q1 ‘09
