
Live From: "Cisco Live" Event

Posted by whlea Jun 30, 2009

I’m here this week in familiar stomping grounds, the Moscone Center in San Francisco. Today’s event started off strong with John Chambers’ keynote address. His speech was very engaging as he wandered through the audience, capturing the attention of nearly 10K attendees. What especially caught my eye was his focus on collaboration and Web 2.0. The example he used was the recent launch of the Cisco Unified Computing System (UCS), which was announced via online tools such as blogs, telepresence, and Flickr. Check out this photo:

IT-Web20 Enabling Cisco.JPG

This shows that the virtual launch reached 10x the audience at 1/10th the cost! I am really glad to hear that since this is what I do for a living.

John also spoke about some emerging technologies, and I found out that Cisco has been working very closely with the Dallas Cowboys on improving the customer experience. I was a little disappointed to hear John is a Niners fan, but I had to expect that from a man and a company named after San Fran-"cisco", so I’ll give him a break on that one.

Cowboys.JPG

It was also very interesting to hear a bit about the history of the Cisco logo. Times have changed, and so has the logo:

logo.JPG

After the keynote, I caught up with John and Kirk Skaugen, Executive Vice President with Intel’s Digital Enterprise Group, at the Intel booth, where Kirk had a surprise: Intel presented Cisco and John with a Xeon 5500 processor series wafer (code-named Nehalem).

kirk_john_1.JPG

Here’s another shot with the Xeon 5500 wafer:

Kirk-John Cisco Live.JPG

I’ll be covering more of the event and participating in social media activities throughout the week. Look for future updates here in the Server Room.

Wm. Hank Lea

So, it's not clear from this posting whether VMware's "Code Central" was announced or escaped, but this looks to be a very valuable repository for sharing vSphere scripts.

 

I'm a recent convert to the wonders of creating new capabilities through the vSphere SDK. Our team has been using it to prototype some interesting new usages for power-aware virtualization that we hope will eventually find their way into the VMware Distributed Power Management (DPM) tool.

 

The most interesting usage is what we call "platooning", where different server resource pools are kept in different power states, from fully powered on, through power capped, to standby and fully off. Servers are moved from one platoon to the next (and workloads are migrated onto them) based on a set of policies for required application capacity headroom and power-on latency as load increases. Our belief is that, by carefully designing these policies, we'll be able to save significant power across the data center without impacting the peak throughput or response time of any of the applications.
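
To make the idea concrete, here is a minimal sketch of what a platooning policy loop might look like. To be clear, this is illustrative only, not our actual prototype: the platoon states, the headroom policy, and the rebalance helper are hypothetical stand-ins, and a real implementation would drive host power-state changes and VM migrations through the vSphere SDK.

    from dataclasses import dataclass, field

    # Hypothetical platoon states, ordered from most responsive to deepest sleep.
    STATES = ["on", "capped", "standby", "off"]

    @dataclass
    class Platoon:
        state: str
        servers: list = field(default_factory=list)

    def rebalance(platoons, load, capacity_per_server, headroom=0.20):
        """Keep enough powered-on capacity for current load plus a headroom
        margin; promote servers from deeper platoons as load rises, demote
        them as it falls. A real version would also migrate VMs via the SDK."""
        on = platoons["on"]
        needed = int(load * (1 + headroom) / capacity_per_server) + 1
        for state in STATES[1:]:  # wake servers, lowest power-on latency first
            while len(on.servers) < needed and platoons[state].servers:
                on.servers.append(platoons[state].servers.pop())
        while len(on.servers) > needed:  # demote surplus one platoon deeper
            platoons["capped"].servers.append(on.servers.pop())

    # Example: 10 servers of 100 capacity units each, current load of 450 units.
    platoons = {s: Platoon(s) for s in STATES}
    platoons["standby"].servers = [f"host{i}" for i in range(10)]
    rebalance(platoons, load=450, capacity_per_server=100)
    print(len(platoons["on"].servers), "hosts powered on")  # 6

The interesting design work is in the policies themselves: how much headroom to hold, and which platoon to pull from first given each state's power-on latency.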

 

Unfortunately, we don't have the data to demonstrate these savings yet. That's where the SDK comes in. We're able to prototype the usage, collect the data, validate the feasibility and, if it never shows up in DPM, still be able to implement it in production.

 

We're just coming up to speed on the SDK, having completed our first "Hello World" integration with it, but we think it's going to be a very valuable tool for experimenting with (and taking to production) many new usages. I'm hoping Code Central provides a good source of examples to help bootstrap our development.

Did you know that an electricity rate of 11.4 cents per kWh gives you a simple way to calculate the annual electricity cost of any device?

 

1 watt of power consumption, at an electricity rate of 11.4 cents per kWh, costs $1 per year, assuming power usage remains constant.  Also, as a general rule of thumb, every 1W of device power consumption in a data center requires an additional 1W of overhead power (Source: Intel IT). So a device that consumes 1W actually consumes 2W at the data center level.


Here's the math:  1 watt * 8760 hours per year / 1000 * $0.114 per kWh = $1 per year.  The same math holds in any currency (euro, yuan, etc.).  11.4 cents per kWh is the crossover point, and as electricity rates rise above 11.4 cents, 1 watt will cost more than $1 per year.

The data center overhead is captured by Power Usage Effectiveness (PUE), the ratio of total facility power to IT equipment power, which has emerged as the leading metric for data center energy efficiency.


You might say that 1W = $2 annually doesn't sound like much, but do the math for 1000 servers that each consume 200W in a data center with a PUE of 2.0: it works out to an annual electricity cost of ~$400,000.  For every 1 watt that server power consumption is reduced, you save $2000 annually.  This is a very rudimentary example, but it is useful to illustrate why customers are really starting to focus on power as one of their key purchase decision factors.
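
Here is the same arithmetic as a small script you can adapt to your own rate and PUE; the figures below are simply the ones from the example above:

    HOURS_PER_YEAR = 8760

    def annual_cost(watts, rate_per_kwh=0.114, pue=2.0):
        """Annual electricity cost in dollars for a device drawing `watts`,
        including facility overhead via PUE."""
        return watts * pue * HOURS_PER_YEAR / 1000 * rate_per_kwh

    print(round(annual_cost(1), 2))        # 1 W device at PUE 2.0 -> ~$2.00/year
    print(round(annual_cost(1000 * 200)))  # 1000 servers x 200 W -> ~$399,456/year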

    

If you need energy-efficient servers, multiple server vendors currently offer some exceptional energy-efficient products based on Intel(R) Xeon(R) 5500 processors.  Looking forward, we are also actively working on reducing power at both the processor and system level for upcoming generations of products.


Here are some good references on electricity rates:

For the United States, a state-by-state electricity rate comparison

For Europe, a first-half-2008 rate comparison by country.

 


Remember, power is one purchase decision factor, but it is not the only one.  A rack of servers that consumes less power but does less work isn't an efficient way of deploying servers either.  Ensure that the performance vector is considered.  Intel® Xeon® 5500 processor-based servers provide exceptional performance and perf/watt leadership over the competition.


Quick question for you:  how do an electricity rate of 11.4 cents per kWh and a data center PUE of 2.0 compare to your data center?

 

 

 

We talk a lot about how well the Intel Xeon processor compares vs. competing RISC architectures when it comes to price and price/performance on various workloads, but unfortunately for many people running existing RISC hardware, simply throwing out the old and standardizing on shiny new Intel-based servers isn't always that simple a proposition.  Why? Your existing software running on UNIX (e.g., AIX, Solaris) may be custom-coded for your flavor of UNIX, the source code may be lost, the guy who wrote it may have retired 5 years ago, etc.  So how do you account for this when 'running the numbers' to see if it makes sense to rid yourself of the power- and money-sucking old RISC server collecting dust in the back of the data center?  These five steps may help:

 

1.  Understand the business benefits of moving from your existing RISC hardware to IA (and compare vs. buying new RISC hardware)

This is the simple analysis that looks at the performance of your existing system, compares it to new hardware, and then factors in other significant cost items like power consumption, software licensing, software/hardware maintenance costs, etc.  Of course, this almost always shows that new Intel hardware will save you significant dollars over the long term, and you can figure out how quickly you pay back the cost of the server, in years or possibly months, based on this simple calculation.  Many server hardware vendors (and Intel field reps) have these tools available; you just need to ask.
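
As a sketch of what that simple analysis boils down to, here is an illustrative payback calculation. Every figure below is a placeholder to be replaced with your own quotes and measured costs; the real tools from your vendor will be far more detailed:

    def payback_months(new_server_cost, old_annual_run_cost, new_annual_run_cost):
        """Months to recoup the purchase price out of annual run-rate savings
        (power, licensing, maintenance). Illustrative only."""
        savings = old_annual_run_cost - new_annual_run_cost
        if savings <= 0:
            return float("inf")  # no payback if the new system isn't cheaper to run
        return 12 * new_server_cost / savings

    # Placeholder figures: a $25K server replacing a RISC box that costs
    # $40K/year to run (power, maintenance, licenses) vs. $10K/year for the new one.
    print(payback_months(25_000, 40_000, 10_000), "months")  # 10.0 months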

2.  Assess your current RISC-based infrastructure

Meaning, look at all the software that actually runs on these servers: the packaged applications and the custom code.  Do you use particular storage adapters and drivers for your SAN?  Make a list.  Now, look to see which items will be easy to migrate and which ones will be hard.  If it's a packaged database that's already available on Windows, Linux, and Solaris for IA, then it may be fairly easy to migrate the data over in a short period of time by yourself and move on.  However, for the custom code and any software packages that will need to be changed in order to move to current hardware, start looking at the real costs to migrate these pieces.  Often, this step will require some help from a services company or a hardware vendor that can provide these services in addition to selling you the new hardware.  Now that you have these estimates, factor them back into step #1.  Sure, the ROI will not look as good, but it will often still be surprisingly good even after factoring in these migration costs.

3.  Develop a migration plan

You may choose to do this on your own if it doesn't look too intimidating, but for more complicated migrations you will likely need some external help.  If you've factored in these services costs during the previous step, then the cost of doing this step is already justified.  Many services companies will provide the estimate very inexpensively.

4. Test

You may only be able to test the 'easy stuff' initially, but verify the performance deltas between the new and old systems calculated in step 1 to correctly size how much hardware you will need in the actual deployment.  This is where the actual performance of the system will measure up against the performance estimates used in your ROI analysis in step 1.  Sometimes the results are better than calculated, sometimes worse; your mileage is guaranteed to vary.

5. Deploy

If you have your migration plan in place from step 3, now you execute according to your plan, migrating data in the right order to ensure minimal downtime.

 

These steps can be very intimidating, and many people in IT find it hard to justify the migration costs (particularly if you need to pay for some services), but taking a systematic approach and carefully calculating your ROI, including these extra costs, will often make it worth the effort.

The computer industry is filled with pundits, speculators, visionaries, salesmen, brilliant architects and professors. Each provides invaluable insight into their experience, their intelligence, their alma mater, their ticker symbol, their ego and what’s next. Some win the “what’s next” lottery; others produce years of brilliance in relative obscurity.

Seemingly, a world that has deployed over 1 billion devices a year for the last 3 years is incapable of understanding the gravity of a new programming model, a new hardware architecture, a sleek new design that delivers on a vision Gene Roddenberry had in the 1960s or Da Vinci in the 15th century. What is old is new… and let me tell you why: it will revolutionize the industry (not evolutionize… a term reserved for slower-growing industries that require government assistance every decade or so), transform your environment and provide freedoms you had only hoped to enjoy… and we invented it 40 years ago. Does any of this sound familiar?

It should. These are the paraphrased slogans of an industry in transition. Real products matter, product differentiation matters, standards matter, interoperability matters… and shareholders pay for future expectations.

The future of computing…is NOW. The future of the computer industry is NOW. The next generation of computer programming, software architectures and transformational technologies is NOW. As an industry we have finally begun to embrace interface, architectural and software programming standards to usher in a new era of interoperability and scalability. Behind us are the days of “proprietary interfaces” (what does that actually mean, other than “I am going to sell you some extra accessories that will be worthless in 2 years”?) that do not provide a differentiated performance/cost advantage. Gone are the days of developing programming languages that lock customers in to individual companies, whether those vendors innovate or not. These rules of the past are slowly melting away, allowing the entire industry to embrace interoperability and standards at the highest level in history. Industry diversity is healthy and ensures that the most innovative and technologically relevant companies will “win” most of the time, allowing the 1 billion, and the next billion, customers of the world to enjoy the best interface technology yet developed… each other.  It also provides us with a unique ability to move to the next phase in our dynamic industry’s growth: autonomic instrumentation.

At Intel, we are constantly working to develop the next great performance architecture, filled with new innovative “goodies”, as our Chief Virtualization Architect Rich Uhlig calls them. These “goodies” (a technical term that Rich borrowed from his nephew, I believe) come in the form of virtualization technologies (Intel VT-x, Intel VT-d and Intel VT-c), security technologies (Intel LT-SX), performance technologies (Hyper-Threading, Turbo Boost) and energy efficiency instrumentation (Node Manager and Data Center Manager). Soon they will also include differentiated services in the cloud that facilitate ease of use and growth for a host of vertical industries in need of innovation. The resulting architectures will be instrument-rich, feature-capable and as scalable as users are willing to pay for.

Why is this important? Instrumentation matters. As we apply business and personal rules to our growing compute environments, it has become increasingly clear that the more tools we make available to users, the better informed our decisions are. The more disclosure we provide to investors through the use of autonomic programming architectures, the more informed their investing decisions will be.

How can you day trade $1B in 35 different stocks without clear autonomic controls in your data center, your database, your application and your client devices?

How can you move 450 million people efficiently throughout a country for 2 weeks without autonomic controls on transportation: planes, trains, boats and automobiles, as they do during the Spring Festival in China?

How can you process 1 Billion text messages a day without clear business rules? What happens when these messages are also coming from machines to other machines, modifying databases, applications and clients?

As humans, we must apply guidelines, much like laws, for our machines to take action when we are asleep, when we are tired, when we are not present, when we are simply being human… too slow to react to a rapidly changing environment.

The innovators of the computer industry today understand this NOW. We do not need to discuss a vision of 40 years ago without a plan to act NOW. Claiming ideas without action is dishonorable at best, criminal at worst. The innovators of today must build products and services that help solve the problems of today. We do not need to look to 2050 without a plan to act NOW. The visionaries of tomorrow are…..not born. The visionaries of today…can call me in 10 years.

Autonomic controls are in place today, machine-to-machine computer architectures are here today, scalable compute engines are here today. Are they perfect? No. Are they effective? Yes. The design architects, product engineers and systems designers of today need to address these concerns. Autonomic instrumentation delivers control to the administrator, the user and the developer. Rules engines can be modified to maximize efficiency, minimize consumption and increase productivity. All of these will lead to increased shareholder (read: not just people who buy shares of stock) value across your enterprise, your school, our hospitals, our governments, and your home.

When executed properly, autonomic controls should be able to deliver 20-25% performance and efficiency increases with each new generation of Moore’s Law. In some cases, as with the Intel Xeon® 5500 series, these increases have been over 150% in virtualization performance, a combination of software architecture enhancement and silicon optimization. In other cases, the gains will come through the dedicated hard work of increasing the instrumentation capability of a processor platform, at the same price as the previous generation, through energy efficiency and memory controls.

Autonomic controls will also allow end users to avert disasters in our data centers, our homes and in our hands. Autonomic instrumentation design frameworks allow users to set parameters on data migration, data backup, security, memory access, power consumption and virtual machine architectures.

For Intel, our new Xeon® 5500 series processor family and our recently announced Intel® Nehalem-EX platform provide the new generation of platform instrumentation. As product developers, designers and architects, we should all find a way to increase the tools available to our customers to take advantage of these instrumentation capabilities. I look forward to sharing more of these new features as we announce them, and to helping provide development frameworks for developers, engineers and architects to build new products and services, ushering in the future of autonomic computing innovation… today.

I was out at HP Tech Forum last week and had a chance to catch up on all the latest technology advancements from HP and Intel. What I saw was staggering: over 17 new HP-Intel designs, the HP Performance Optimized Datacenter (POD), and lots more that I will be sharing with you in the coming days as I add more video from the event and help tell the story if you couldn't be there. First off, I caught up with John McAtee from Intel's HP account team. He was showing a cool demonstration of why now is the right time to invest in Xeon 5500 processor series technology. Check out this video and find out how you can start saving in your datacenter today!

 

 

If you want more information on how the Xeon 5500 processor series can start saving you money in the datacenter, check out this ROI Calculator tool. Also, if you are looking for detailed information or are just looking to gain more knowledge, you can always "Ask The Professor" in our Server Learning Center.

Sunsets can last a while, but in the end the sun will go down.  I talk to a lot of companies and listen to a lot of data center managers.  Customers trust their AIX-Power and Solaris-Sparc platforms.  These are solid platforms that deliver good features and reliability, but if these managers could get the same sense of security, performance, and reliability with Linux / Intel Xeon platforms, they would move tomorrow.  It is simple economics.

 

The reality is that customers are making this move, and being successful.  The hardware reliability of Intel platforms today is amazing.  Intel recently announced that its next generation of Xeon (Nehalem) EX-based servers will support Machine Check Architecture.  This brings high-end Xeon x86 servers into the RAS family previously reserved for proprietary RISC and mainframe platforms.  Intel Xeon already eclipses the performance of proprietary RISC processors on both a per-processor and a per-dollar basis.  It is reasonable to say Xeon can deliver better performance, better value, and equal or better reliability.  The only hurdle left is the software.

 

Linux has come a long way.  It is no longer a university OS run by geeky dudes in black T-shirts emblazoned with the quadratic formula.  It is mainstream and solidly supported.  Linux is the primary development and delivery platform for Oracle; other OS environments are ports, which delays support and innovation.  Linux is used by major financial companies.  Linux is available in solid and well-supported distributions with a 20-year history of enterprise business.  Linux experts form a broad, worldwide, and growing community.  Linux is economical vs. proprietary RISC.

 

In an era of big budgets and conservative (don’t make any changes) philosophies, businesses will always stick to their proprietary RISC systems.  That era is over.  Sticking to your RISC systems may seem like the safe move, but failing to examine the opportunities for better performance and lower cost with Linux on Intel Xeon platforms is business negligence.  Business negligence is seldom rewarded. 

Performance, Price, Infrastructure, Ecosystem, TCO, RAS – when the decision factors are examined, it becomes clear: we are in the RISC twilight, and the sun will set on Sparc-Solaris and Power-AIX.

Based on Intel’s current processor core count and extrapolating from its “Tick-Tock” model for scheduled new CPU designs, by 2017 Intel could very well be designing a 20-core CPU!  Do we need that capability with so many single-threaded applications in the market today?  Maybe not today, but in 8-10 years computing usage models are going to be a lot different than they are now.

Whether virtualization environments will be running on 20-core Intel processors by then or not, one thing is very clear: high-end virtual environments will require much more powerful management environments than what we have today.  Management tools that now only look at high-level performance metrics will need to look at detailed server-component and CPU-level power consumption, detailed core-level performance metrics, and managed thermal output.  The more finely we can manage our virtual elements, the greater control we will have at the server, rack, and data center level for optimizing virtual server density against the physical limitations of our data center environment.

VMware’s latest server virtualization release, renamed vSphere, includes new manageability capabilities that increase usability and decrease the cost of managing a virtual environment.  Although they don’t go to the level I just described, they do provide some nice improvements over version 3.5.  Storage and network optimizations have been added that allow hosts to power down when not needed using the Distributed Power Management (DPM) tool (which is now a fully supported feature and not just “experimental”). VM Monitoring now uses VMware Tools to evaluate individual VMs and check to see if they are running, and Fault Tolerance now ensures continuous availability for virtual machines in the face of hardware failures.

Intel is driving the creation of not just multi-core CPUs but also the tools that will drive virtualization architecture adoption in the future: powerful tools that manage power, thermals, and performance to help make our lives as data center operations personnel easier and make the value proposition of virtualization that much greater.  Management will definitely continue to be a key component in determining TCO in the future.  See the following for what Intel is doing around management. (http://www.intel.com/design/servers/ism/sms.htm)

What are the key management tools, in your opinion, that drive virtualization adoption, those “can’t do without” management apps?

Mark

 

 

Running multiple Unix environments across a range of locations adds complexity and cost to the IT environment. I came across an interesting case study and wanted to highlight some of the key findings.

 

YPF SA is the largest company in Argentina, operating in the oil and gas industry. The company has 29 gas plants around Argentina running different Unix environments such as HP-UX, AIX and Solaris.

 

YPF SA consolidated its SAP ERP and Oracle DB environment from multiple Unix environments to Red Hat Enterprise Linux 5 with integrated virtualization, running on Intel Xeon-based platforms from IBM System x.

 

Some of the key findings to highlight:

  • A key requirement from the Unix administration team was that "migrating from old RISC/Unix and proprietary servers to open and flexible platforms would pose no risk to the reliability, availability and performance of the systems"
  • Positive impact on cost and performance: lowered costs, simplified management and increased compatibility
  • Reduction in costs, especially when compared to the license costs of RISC-based platforms
  • Increased performance and availability drove the decision to scale with RHEL and Xeon
  • Ability to leverage Red Hat integrated virtualization, freeing up internal hardware and technical resources for other projects

 

 

I guess the combination of Red Hat and Intel delivers the business results that customers are seeking. What do you think?

My grandfather was born in the early 1900’s.  By all accounts he was a hardworking man with a strong degree of curiosity.  He passed away in his late 80’s and before he died I remember talking to him about my pursuit of an Electrical Engineering degree.  He nodded politely, asked a few questions and when I helped to fix the electrical outlet in his garage I got the sense that he thought I was heading down the path to be an electrician.  I believe that thought pleased him.  Several years ago I was explaining to my five year old daughter in layman’s terms what I did for a living and what my company made.  I said things like “We make tiny engines that run computers” or “I work with computers that run websites like Webkinz® and Disney®”.  She seemed impressed.  Months later when she was asked by a parent of her friend what her dad did for a living I was a combination of proud and surprised to hear that she replied “They make chips…”  (proud moment) “…and salsa!” (um OK.  I still have work to do).


Now the other day she walked up to me and said something like “Dad, I am having trouble getting the Slingbox to work on mom’s iPod Touch.  It is connected to the Internet but the remote does not seem to be changing the channel.  Can you help me?”  Clearly she has made some progress up the technology curve, but it also struck me how far she has come.  Kids these days are surrounded by technology.  In our house alone there are at least the following electronic devices; Oven, Microwave, AppleTV, refrigerator, smoke detector (3), carbon monoxide detector, programmable thermostat, furnace, radio, garage door opener (2), wireless speakers, televisions (3), set top boxes (3), ceiling fans with remotes (3), netbook, Slingbox, Clear wireless router, remote outlet, sprinkler control box, iPod Touch, desktop computer, Wii, iPod shuffle (2), alarm clocks (3), oven timer, electronic light dimmer, cordless phones (4), AV receiver, DVD players (3), VCR, iPod docking station, security system, motion sensor, camcorder, camera (2), USB hub, music keyboard, AV switch, computer keyboard, battery chargers (4), Wii remotes (4), Wii Fit Pad, Wii drums, copier/fax/scanner, computer monitor, AC, Power supplies (4), RFID credit cards (2), washer, dryer, noise canceling headphones, answering machine, internet modem, cell phones (2), handheld GPS, auto GPS and electronic battleship.


I am sure I have forgotten several things, and I did not count cars or anything at my children’s school.  I am also sure each of the electronic devices in our house has either a processor, microcontroller, ASIC, or multiples of each.  Admittedly, the silicon content in our house is probably above average given where I work and the personalities my wife and I have.  But when I think back to my grandfather, he had none of these silicon-laden items.  I am sure he didn’t care, since it is hard to miss something you never knew.  Of the hundreds of pieces of silicon in our house, about a dozen or so are smart enough to connect to each other or to “the cloud” in some way.  I put “the cloud” in quotes because it is not only the most over-hyped word of its time, it is also the best way to articulate what I suspect my children and many others think of the services they get when all of this stuff gets connected.


I can safely say two things are fact. First, my grandchildren will have many more pieces of silicon in their house than I do. Second, they will have more pieces of silicon that can connect to each other and communicate with “the cloud”.  There are many billions of devices connected to the Internet today, and that number will grow.  At Intel we are building silicon, and increasingly software assets, that facilitate the processing and movement of data both on those devices and between them. Servers are increasingly becoming an important part of that over-hyped cloud word. My cable company has a cloud delivering my on-demand video content, a social media site allows me to upload pictures into its cloud to share with my friends, and someone just used a cloud architecture to develop a perpetual motion machine.  OK, one of those things was false.


My grandfather thought a cloud was something in the sky.  My children think it streams video to their handheld device.  What will our great-grandchildren think?

 

Non-x86 RISC architectures, Power or SPARC, have been used in high-end business-critical virtualization solutions for a long while now. These come with a vertically integrated solution stack, including the hardware, software, manageability tools and services, provided by one vendor. This often leads to lock-in to the proprietary virtualization solution and services, and can be expensive from an end-user perspective.

 

There are reasons why companies that can afford RISC-based solutions have subscribed to them. This has mainly been due to Reliability, Availability and Serviceability (RAS) features, scalability, and dedicated resources for quality of service (QoS) and isolation.

 

 

The world of virtualization, however, has changed significantly in the last 5 years. x86-based hardware and software products today offer a well-accepted, high-performance virtualization solution. With the imminent availability of highly scalable and resilient Nehalem-EX products, with 16 threads per socket and extensive RAS capabilities, the line between an expensive RISC solution and an x86-based virtualization solution could blur further.

 

 

From an end user’s perspective, Nehalem-EX could provide the capabilities they have come to expect from a RISC-based virtualization infrastructure. Consider:

 

  • Hardware partitioning of the Nehalem-EX platform would be possible. Alongside this, OS virtualization and full commercial hypervisor support for logical partitioning already exist on Xeon processors.
  • The Nehalem-EX hardware infrastructure allows the software ecosystem to deliver capacity on demand. For example, extra CPU capacity can be dynamically added as needed. Moreover, the VM migration and policy-based load balancing capabilities that already exist in commercial hypervisors complement this and provide IT with easy methods to manage capacity at the datacenter level.
  • Memory can be dedicated by not oversubscribing the available physical memory.
  • CPUs can be dedicated by creating CPU affinity (see the sketch after this list).
  • Dedicated I/O assignment is possible using VT for Directed I/O. It can also restrict DMA access from devices to certain areas in memory, increasing isolation and system reliability.
  • The Single Root I/O Virtualization feature would be available as part of Intel VT for Connectivity in networking devices. This allows a single NIC to be shared among multiple VMs directly, while isolating the traffic from a NIC queue to a VM for better reliability. Per-VM bandwidth allocation can also be supported.
  • Nehalem-EX adds virtualization features that could help increase VM performance in a processor-oversubscribed environment with high system utilization.
  • Nehalem-EX will add new reliability, availability and serviceability (RAS) features such as Machine Check Architecture (MCA) Recovery, which allows error detection, error recovery and VM isolation.
  • Inherent power technologies in the CPU, Turbo mode, and Dynamic Power Node Manager for system-wide power capping all give IT the essential keys to balance power and performance.
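
As one concrete illustration of the CPU-dedication point above: on Linux, a management script can pin a workload (or a VM worker process) to specific cores through the standard affinity interface. This is a generic sketch using Python's standard library, not a Nehalem-EX or hypervisor-specific API; the PID and core numbers are placeholders:

    import os

    def dedicate_cpus(pid, cores):
        """Pin process `pid` to the given CPU cores (Linux only), a simple
        form of CPU dedication via affinity."""
        os.sched_setaffinity(pid, set(cores))
        print(f"PID {pid} restricted to cores {sorted(cores)}")

    # Example: reserve cores 2 and 3 for a latency-sensitive VM worker process.
    # dedicate_cpus(12345, {2, 3})  # 12345 is a placeholder PID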

 

 

 

While Nehalem-EX measures up to the infrastructure needs, it also enables a horizontal solution that allows customers to take advantage of best-of-breed software from the virtualization ecosystem, thus reducing lock-in. This could result in faster innovation, leading to an array of choices for business-critical virtualization.

 

 

Based on http://www.itjungle.com/tfh/tfh042808-story03.html, a Power virtualization solution with a Power6-based 4-socket P550 box (~$93,000) and PowerVM Enterprise Edition for large systems ($1,969 per core, with $220 per year for maintenance) will cost an enterprise roughly $109,000 in total ($93,000 + 8 cores x $1,969 ≈ $108,752), for just one server acquisition.

 

 

While pricing of a Nehalem-EX 4-socket system is not yet available, approximating a cost using current 4-socket Intel server pricing and commercial VMM software suggests that an Intel-based solution could cost at least 50% less in infrastructure alone. Other savings, like not requiring specialized RISC-based hardware, services, solutions and staff, would add to the lower cost of ownership in the long run.

 

 

Given the economy and Nehalem-EX features, would it not make sense to take RISC out of your investment?

 

 

At ISC09, the Top 500* results were announced: 399 out of 500, nearly 80%, of the world’s top supercomputers are using Intel processors.  The Top500 list is based upon one benchmark, Linpack.  While powering most of the world’s fastest computers is a great endorsement of the role Intel’s technology is playing in helping solve the most complex high performance computing problems, no one buys a supercomputing machine just to run Linpack.  Linpack is a kernel that does not necessarily resemble any real application; it’s just one evaluation vector among many. So, should you demand more?

Yes, look beyond the flops:  look at real application performance or benchmarks that might more closely resemble yours, look at the versatility, and look at ease of deployment of your solution.  

Today, Intel processors deliver more performance and throughput in less space, and require less power, than ever before.  The Intel® Xeon® 5500 platform delivers up to 3X the performance of the previous-generation Intel Xeon 5400 to decrease your time to discovery.   The Top500 list has 33 new entries based on the Xeon 5500, which launched only 3 months ago. Intel tools (compilers, libraries, and cluster kits) bring new levels of software versatility by enabling HPC users and ISVs to write applications that extract peak performance and scale forward.  Intel’s Cluster Ready program is easing cluster deployments, increasing reliability and lowering TCO by making it simpler to purchase, deploy and manage an HPC cluster.

So while providing flops is great, don’t forget to look at (and demand) real application performance, and ask for software tools and technologies that maximize the value of your HPC system.

Jimmy

*Other names and brands may be claimed as the property of others

I have been watching the social chatter today about the latest Top500 supercomputing list, and seeing companies, manufacturers, application vendors and even countries compete on Twitter for mind share of this most recent list.

 

However, as I read about and explored this list, the things that jumped out at me were not who’s number one, two, or three… or who climbed how many spots… but rather the trends that have occurred over time. These trends have not happened in the last 6 months or the last 6 years, but over the course of nearly a decade of innovation.

 

1) Today, the #10 posting (a cluster using the 3-month-old Xeon X5570 processor (Nehalem-EP)) delivers FLOPS performance equal to that of the entire June 2000 TOP500 list. (see below)

 

 

top 500 over time jun 09 Performance_Development.png

source: http://www.top500.org/lists/2009/06/performance_development

 

2) Also, the emergence of multi-core Intel-based servers, complemented by affordable open-source software solutions, has enabled a transformation in how supercomputing performance is delivered. Intel-based systems have gone from nearly 0 to nearly 400 entries on the list over this decade.

 

 

IntelTOP500history.jpg

source: http://www.intel.com/pressroom/images/IntelTOP500history.jpg

 

I recently had the opportunity to co-present a webinar with Matt Jacobs of Penguin Computing where we talked about how High Performance Computing is changing the way that businesses innovate, research, design, analyze and create. What used to be done only in large datacenters and universities is now available to mainstream IT and businesses.

 

This is extremely important for areas like health care, financial services, manufacturing and many other industries.  Equally important are the software technologies (Intel Cluster Ready software) that make clustering accessible and easy to use, so that this performance capability can be tapped without a ton of complexity.

 

So, while the Top500 list may be interesting for bragging rights, what excites me and many of the end users I talk to is the power, affordability and accessibility that high performance computing now offers mainstream business users, and the innovation and creativity that brings to the marketplace.

 

How are you using computing performance to do things that once were not possible in your business?  Share your story with us!

 

Chris

http://twitter.com/chris_p_intel

 

 

Here is another happy customer, YPF Gas, successfully reducing cost by migrating Oracle, SAP, and other workloads from multiple proprietary UNIX environments to an open, industry-standard-based one.  The choice was Red Hat Enterprise Linux with virtualization, running on Intel Xeon processor-based servers.  We can see from the number of times the word “cost” is used in the accompanying press release that it is the major challenge for IT managers, and we have a solution for it.  YPF Gas declares, “now, more than 80 percent of (our) Oracle databases and 90 percent of (our) SAP applications run on Red Hat Enterprise Linux 5 with integrated virtualization on Intel Xeon processor-based servers…”

Also, don’t forget to register and participate in the Red Hat-Intel joint webinar, How and When to Migrate to Red Hat Enterprise Linux on Intel Xeon processors, tomorrow at 2pm eastern. 

BMW automobiles are known for speed, agility, quality, style and probably some other attributes I’m forgetting. Their IT infrastructure requires the same attributes for them to remain competitive in their industry.

Proactive server refresh, now with the Xeon 5500, is part of that equation.  This recent case study outlines how BMW’s migration to the Xeon 5500 series lowers total cost of ownership and increases flexibility for their business.

Server refresh with the Xeon 5500 delivers 30% higher IT performance with 75% less hardware, compared to dual-core Xeon 5100 technology.

The case study also says that BMW’s next refresh targets are their RISC-based servers.

Can you gain a competitive edge by replacing aging servers in your infrastructure?

Estimate your savings today (www.intel.com/go/xeonestimator)

For the past several months we have been hard at work on a new training class for developers. This class teaches the main concepts of threading and scalability to C++ developers who are new to parallelism. If this sounds like you, and you work in the Bay Area, join us for the pilot class, which will be free. Seating is limited, so register early!

 

When: Friday, July 17, 2009
Where: Intel Santa Clara site, building SC12 lobby
Time: 9AM - 4PM, lunch provided

 

Our invite has more information including the content agenda and how to register.
Hope to see you there!


What Are CIOs Saying?

Posted by ChrisPeters Jun 19, 2009

Today, I watched a video from Gartner where they shared insights from their recent CIO survey and offered recommendations for 2H.  In the survey, about half the CIOs saw a drop in budget this year (no surprise there), and they reported the average budget was down about 7%.  However, Gartner also noted that, moving forward, these CIO budgets appear stable.

 

Recommendations from Gartner to CIOs

  1. Be Decisive - CIOs must make tough decisions in the downturn (no time or resources for lengthy analysis)
  2. Be Resourceful - Change the way you work... to be more efficient (a more-with-less mentality)
  3. Prioritize - Do first things first, and faster (accelerate the things that are important)
  4. Focus on Greater Results (ROI, benefits, payback, savings, productivity)

 

In my opinion, server refresh is ripe to capitalize on these Gartner recommendations (it boosts efficiency, you must act to get the benefits, and the benefits are big, with an estimated 8-month ROI when using the new Xeon 5500).  The good news is that with the server refresh estimator (www.intel.com/go/xeonestimator), you don't have to reinvent the wheel.  This tool helps you estimate the benefits of server refresh and communicate those benefits to a variety of stakeholders in your organization, including business heads, finance, facility managers, senior management and others. Getting everyone behind refresh is critical, since server refresh affects everyone, and positively.

 

For Intel, proceeding with server refresh in 2H is worth $19M in savings http://communities.intel.com/docs/DOC-3271

 

Waiting on Server Refresh Means Wasting Valuable Resources.

 

Chris (http://twitter.com/chris_p_intel)

I'm not sure who first coined the phrase "Innovate or Die" (there is some debate), but in tough times, innovation is a proven way to both save money and get ahead.  Recently I had the opportunity to co-present a seminar on this topic with Anita Campbell of MidMarket Innovators and Sun's Mac McConnell.

 

This seminar was more interesting than others I've done or attended, since it was discussion-heavy and content-light.

 

Also surprising was that over 40% of the participants were anticipating IT budget increases in the next 6 months.   I haven't heard that kind of a statistic in a long time.

 

We touched on the key subjects below:

  • Business Outlook
  • Strategies & Tips for Success
  • Customer Case Studies
    • Practice-IT, an online training technology provider, utilized VMware and was able to significantly expand its capacity to support increasing workloads without increased cost as the company added new customers.
    • NaviSite, a medium-sized hosting company, has 17 data centers around the globe. The company has a large VMware environment and needed to consolidate this environment to save costs while increasing the agility to respond to their customers’ needs.
    • Catholic Diocese of Boise, a non-profit providing services and support for 54 parishes, 33 missions and chapels, 14 K–12 schools, a library, and 40 offices, has 38 employees who work at a central administrative office.  They were able to reduce 28 servers down to 4 by implementing Windows Server 2008 Hyper-V to minimize costs and increase server utilization.

Anita's recap of the seminar is here, where you can also hear a replay.

Chris

I had an opportunity to travel to San Francisco a couple weeks ago to attend and capture some video at the Sun JavaOne conference.  Here are the videos as they are posted to YouTube:

Sun JavaOne Conference Keynote with Intel's Diane Bryant

This video shows the keynote where Jonathan Schwartz and Diane Bryant are talking to a customer who implemented Sun systems based on the Intel Xeon 5500 series processor.  The customer is impressed, to say the least.

Sun JavaOne Conference Intel Booth and Demonstration

This video is a tour of the Intel booth at the conference, with a walk-through of the demonstrations being shown.  A perspective you don't often get unless you attend a conference directly.

Overall, an interesting experience gathering and working to create this content.  There are so many details that go into gathering the raw content and getting it turned into something that is more consumable.  I have a newfound respect for anyone who does this regularly.

Hope you enjoy.

Greg

     

I haven’t been to see the new Terminator movie yet, but I certainly remember the Arnold movies of my youth, and the similar theme of machine vs. man in the geek-lovin’ Matrix series.  Really, we all have a fear in a remote corner of our minds that someday the machines we all love will be smarter than us and somehow realize that we’re disposable.  Or useful as human batteries.  Which is why I love to work on the future of datacenter technology… after all, we’ll be the first to know when the Top 500 list starts to mean something much more sinister.

Of course… I’m kidding.  The future of datacenters will bring great things to our planet, from speeding discovery in science to making us much more efficient and lowering our collective carbon footprint.  And of course it’s datacenters that bring us Facebook, and who could really live without that?

The next transformation of the datacenter is almost within our grasp with the evolution of the enterprise cloud.  I wanted to shift focus from the nearer-term technology innovation covered in our most recent podcasts to this broader technology movement, and to do so I recently chatted with three very smart people.  First, I talked with Dylan Larson about Intel’s view of the enterprise cloud and what technology trends he sees as critical to the creation of the architectural framework for the cloud.  I then spoke to Jim Greene about the future of security in the datacenter.  Finally, I visited with David Jenkins about our vision of instrumentation and why this technology is so important to the datacenter of the future. None of them mentioned anything about Christian Bale or Neo… but they did say a lot about where we’re going to create the next stage in datacenter computing. Take a listen at our Chip Chat Channel…

...and if you like what you hear, subscribe to Chip Chat on Intel's RSS feed or on iTunes.

This past April, as Intel was releasing its new Xeon 5500 series processors, we showed you some remarkable test results demonstrating a solid 53% performance improvement between Xeon 5400- and 5500-based servers when running a DBHammer SQL Server 2008 workload http://community.citrix.com/pages/viewpage.action?pageId=73564465. We now wanted to move on to a workload that represents the largest segment of the Citrix user community: XenApp. More specifically, XenApp 5.0 virtualized with the new XenServer 5.5. As we've seen in previous similar virtualization performance tests with XenApp on XenServer, when the XenApp guests are 32-bit (the majority of XenApp users still use 32-bit applications), the opportunity for server consolidation can be significant. We wanted to see just how good the server consolidation opportunity is when an Intel Xeon 5500-based server is used as a XenServer host. In this case, we looked at how the consolidation might look when going from 2.93 GHz Xeon X7350 physical XenApp servers to 2.93 GHz Xeon X5570 XenServer hosts.

For the purpose of this test, we ran the physical XenApp server with a single 32-bit workload (Windows 2003 SP2 with MS Office). It was given 2 CPUs and 4GB RAM, typical for this XenApp server workload. Using EdgeSight for Load Test (ESLT) version 3.5, we established a baseline of 25 seconds for users to log in, run a standard MS Office task script, and then log out (including network connect time). We added users until the latency to run this sequence reached a threshold of 30%, at which point the server was deemed to be at capacity. Using this configuration and test program, the maximum number of users was 47. This was a relatively small, single physical XenApp server, so 47 concurrent users was considered respectable.

Since we were testing with a Xeon X5570 server with dual quad-core CPUs and 32 GB of RAM, we wanted to see how many users we could get onto a single host using multiple XenApp VMs, each with the same resource configuration as we used in the physical server test. We built 2 vCPU, 3.5 GB RAM XenApp virtual servers on the X5570 and ran two tests using the same ESLT workload. The difference between the 4 GB of RAM used in the physical server test and 3.5 GB in the virtual server test is due to the memory overhead of running multiple VMs. In the XenServer setup screen, we selected the XenApp option, which automatically configured the VMs with the appropriate amount of shadow memory for XenApp workloads.

We also wanted to see the impact of hyperthreading on VM density per host as well as on the number of concurrent users per VM. Intel describes hyperthreading as “delivering thread-level parallelism on each processor resulting in more efficient use of processor resources, higher processing throughput and improved performance.” It would be interesting to see how many more concurrent XenApp users we could get with an upgrade to the X5570 and by virtualizing with XenServer 5.5, and then see how many more users we might get once hyperthreading was enabled. Would hyperthreading allow us to run twice as many VMs on a single host? To find out, we ran our first virtualized XenApp test with hyperthreading activated and then repeated the test with it turned off. With hyperthreading, the first thing we noticed was that even though there were only 8 physical CPU cores on the X5570 host server, XenServer was able to see 16 vCPU cores as resources available to be assigned to VMs. As a result, we were able to successfully run a maximum of eight VMs, each with the necessary 2 vCPU cores, generating an average of 69.25 users per VM for a total of 554 users.

When we ran the second test, this time with hyperthreading turned off, we noticed that the number of users per VM increased to 88. However, the maximum number of VMs was now only four, since we only had 8 vCPU cores to work with. As a result, the total number of users for the host was now only 352.

multi vm test (640x337).jpg

Single VM test (640x359).jpg

In the end, we discovered that while hyperthreading doubled the number of assignable vCPU resources, it didn’t directly translate to a 2:1 increase in the number of users per VM. That’s a reasonable trade-off, since hyperthreading effectively doubled the number of VMs that we could create with the same number of CPU cores. So, while we were able to generate a 6.5x increase in concurrent XenApp users on a single Xeon X5570 host server without hyperthreading, as compared to a single X7350 physical XenApp test server, the increase grew to an incredible 10.8x with hyperthreading. That’s a remarkable server consolidation opportunity for any 32-bit XenApp administrator. And while XenApp will virtualize very nicely with XenServer on that same dual quad-core X7350 server, remember that the number of users per VM using this test schema will be 47. Since hyperthreading isn’t available on the X7350, the maximum number of VMs on the X7350 host would be 4, making the maximum number of concurrent users 188. Not bad, but nowhere near the 554 concurrent users we get on the X5570 with hyperthreading. That’s an increase of 366 users, almost three times the total number of concurrent XenApp users.

Pretty hard to ignore.
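
For those keeping score, here is the back-of-the-envelope arithmetic behind those ratios, using the figures reported in the tests above (note the multipliers are increases over the 47-user physical baseline):

    physical_users = 47          # single physical X7350 XenApp server
    results = {
        "X5570 + hyperthreading": 8 * 69.25,  # 8 VMs -> 554 users
        "X5570, no hyperthreading": 4 * 88,   # 4 VMs -> 352 users
        "X7350 virtualized": 4 * 47,          # 4 VMs -> 188 users
    }
    for label, total in results.items():
        gain = (total - physical_users) / physical_users
        print(f"{label}: {total:.0f} users, {gain:.1f}x increase over physical")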

As we’ve seen here, the promise of Intel’s Nehalem technology is being realized in some very practical ways. As a result, the performance bar for XenApp, when virtualized with XenServer, is now higher than ever.

Recently in our test lab, we experienced a cooling failure... and I wasn't even sitting in the lab to notice it.  In fact, I wasn't even in the same state!

     

With the recent launch of the Xeon 5500 series servers, I had been testing some use cases against four of our servers in our lab when I noticed that the temperature was rising pretty drastically in there.  How did I see this?  Using Intel® Intelligent Power Node Manager, embedded in our Xeon servers, and our Intel Data Center Manager (DCM) SDK software interface, the data is presented in a visual format.

thermal trip.JPG

In the graph above, the dark colored line is the "front panel inlet" temperature, and in a matter of minutes the temperature in the lab rose from 71F to 87F - 16 degrees!  What I didn't have set up for this scenario was a power policy that activates on a thermal trip.  Here is how you would set up this policy in Data Center Manager, under the Policies section for this rack:

     

thermal-policy.JPG

In the event of a thermal event that causes the room to heat up to 78F (as shown above), Intel DCM would send IPMI commands to the platform, which in turn would tell the Node Manager firmware to throttle back the Xeon CPUs to their lowest possible P-state.  This would reduce the energy consumed across the systems in the policy group as well as reduce the thermal output of each server, which in turn generates less heat across the servers, reducing the load placed on an already overheated lab or datacenter.

This gives the server managers more time to gracefully shut down systems and/or move the workloads to cooler sections of the datacenter.  If you have ever experienced a cooling failure in the datacenter, it's usually a frenzy to shut down machines to minimize heat and/or power utilization overall.  This thermal policy can give you more time before you reach a critical temperature where you start losing components, servers, and ultimately data and productivity.

Using the standard IPMI interface, the Data Center Manager SDK and Node Manager on the Xeon 5500 series platform enable power monitoring, power management, and front panel inlet temperature monitoring.   This gives a server/datacenter manager the capability to measure power usage per server, where previously you'd have needed more expensive power measurement tools.  External power meters cost anywhere from a cheap $15 to a spendy $1,000 - but now the technology is embedded in the firmware of the machine.
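
As a rough illustration of what per-server monitoring looks like from the management side, here is a sketch that shells out to the common ipmitool utility to read a temperature sensor and flag a trip. The host credentials and sensor name are placeholders (sensor IDs vary by platform), and a real deployment would let DCM and Node Manager apply the policy rather than a hand-rolled check like this:

    import subprocess

    TRIP_F = 78.0  # trip point, mirroring the DCM policy above

    def read_temp_f(host, user, password, sensor="Inlet Temp"):
        """Read a temperature sensor over IPMI; sensor names vary by platform."""
        out = subprocess.check_output(
            ["ipmitool", "-I", "lanplus", "-H", host, "-U", user, "-P", password,
             "sensor", "reading", sensor], text=True)
        celsius = float(out.split("|")[1].strip())  # e.g. "Inlet Temp | 25.000"
        return celsius * 9 / 5 + 32

    # Poll and alert; in practice DCM/Node Manager applies the power cap itself.
    # if read_temp_f("10.0.0.42", "admin", "secret") > TRIP_F:
    #     print("Thermal trip! Apply power-cap policy and page the on-call admin.")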

     

You can learn more about the Xeon 5500 series processors on the Intel Xeon website.

Every day in our personal lives, we’re bombarded with “opportunities” to get a better deal.  At the grocery store, we might be able to buy a single item for $2.50 or 3 for $5.00… which then forces us to go through the mental gymnastics of figuring out how good of a deal it is, and whether or not we really need three 96 oz. bottles of salad dressing.

But some opportunities for adding a bunch of compute performance are a bit more straightforward.

Case in point: Dell recently had Principled Technologies compare the performance of the Intel® Xeon® processor E5520 and E5506 CPUs, each running on a PowerEdge R710 server.  Both are 4-core processors, but the E5520 has many advantages over the E5506:

     

    • higher frequency (2.26 GHz vs. 2.13 GHz)
    • faster QuickPath speeds (5.86 GT/s vs. 4.8 GT/s)
    • faster memory support (1066 MHz vs. 800 MHz)
    • Turbo Boost
    • Hyper-Threading support.

     

Long story short: buying a slightly better processor with a server purchase can drastically increase your performance.  So if you are looking to buy a Dell PowerEdge server configured with Microsoft SQL Server 2008* and an Intel® Xeon® processor E5506, for an additional $300 you can get up to 75% more performance by upgrading to an E5520 CPU.  More performance headroom in a similar power envelope, faster QuickPath and memory speeds, and Hyper-Threading and Turbo Boost functionality – all for $300.  NOW THAT’S A GREAT VALUE!
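
To put a rough number on that value claim, here's a toy calculation. The base server price is a placeholder I made up for illustration; the $300 upgrade delta and the up-to-75% uplift come from the comparison above:

    base_price = 5_000   # hypothetical R710 configuration price with the E5506, USD
    upgrade_delta = 300  # extra cost to step up to the E5520 (from the study)
    perf_uplift = 0.75   # up to 75% more performance (from the study)

    pct_more_money = upgrade_delta / base_price * 100
    print(f"{pct_more_money:.0f}% more money for up to {perf_uplift:.0%} more performance")
    # -> "6% more money for up to 75% more performance"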

     

Check out the summary document for the Dell R710 Principled Technologies performance testing, which also has comparative performance testing for the Xeon® E5540 and X5550 CPUs (also a great value for the money!), along with results for Microsoft Exchange.

NOTE: System pricing from www.dell.com as of May 13, 2009.  Actual performance will vary based on configuration, usage and manufacturing variability.  See the actual Principled Technologies report in the following link for complete system configurations.

Are you a developer writing applications to run on the Solaris operating system? Are you looking for ways to optimize your Solaris solution on industry-standard architecture based on Intel microprocessors? If you answered yes to either of these questions, then please read on.

     

Intel and Sun have been working closely together to optimize the Solaris operating system for the Intel Xeon 5500 processor. Most of you probably know the Xeon 5500 better by its product codename, Nehalem. The Xeon 5500 is the product that fits into 2-socket platforms.

     

Sun has just published a very compelling quick reference guide that will assist both developers and system administrators looking to optimize Solaris solutions on Xeon-based platforms. The guide covers the work that Intel and Sun are doing together, with technical descriptions of specific features and capabilities that can be implemented in the Solaris OS to get the most out of the Xeon.

     

I have just finished reading it, and it is a very compelling paper covering topics such as:

    - How Solaris takes advantage of Intel Turbo Boost Technology to use available power headroom to deliver higher performance based on workload demand

    - How Solaris can take advantage of new Intel Quickpath Interconnect (better known as QPI) and other innovations in the OS to reduce memory latency

    - How Solaris performance counters help to better manage workloads

- How Solaris takes advantage of many of the power efficiency capabilities in the processor. Features like the Power Aware Dispatcher in Solaris enable the processor to stay longer in idle states. In non-tech talk, this saves power.

     

Solaris has been a tried and tested operating system for a long time for companies running their most business-critical workloads. This paper talks about how the combination of Solaris and Xeon delivers improved reliability and availability for these critical workloads. Detailed information on predictive self-healing, fault management, leveraging the Intel Machine Check Architecture and more is all included in this paper.

     

Probably my favourite section is the one on developer tools optimizations and the different tools available for developers who want to run and optimize their applications on Solaris and Xeon.

     

Ok, I'll stop waxing lyrical now. This is a very compelling paper, and it certainly makes the case that Solaris and Xeon 5500 could be the perfect combination for your Solaris solution. What do you think?

The current economic environment is leading customers to become increasingly aware that there is an economic benefit in migrating from RISC/UNIX environments to Intel platforms. Here at Intel we have seen a significant increase in requests from customers that are considering this opportunity. There are several paths available to migrate, with multiple operating systems supported on Intel architecture. Customers will decide on their operating system environment, and in some industries we are seeing demand to move from Unix to Linux. Customers understand the TCO and economic benefits of moving and are now focused on ‘when’ and ‘how’ to migrate.  Eoin McConnell and others blogged about this in greater depth, and resources are available from Red Hat to assist with migration.  The Red Hat and Intel team is responding to that demand and moving to the next phase of delivering the "when" and “how-to.”  Following up on two webinars the team delivered earlier (Apr 28 webinar, May 14 webinar), on June 24, at 11am Pacific, 2pm Eastern, we will be delivering our first "when" and “how-to” webinar, putting solution experts from both companies under the spotlight.  Please register, spread the word, and join this webinar:  Register!!!

The debate on how to best increase system capacity to accommodate growing applications has raged on for years: “scale up” with more CPU, memory, and I/O, or “scale out” with loosely connected systems.  Scaling out by adding networked systems to increase capacity has been a good economical solution for many IT managers because it allows them to grow by using less expensive, industry standard building blocks.  However, there are some notable exceptions to this line of thought.  One is that the class of applications that require shared memory and large database support is much better suited to run on a single, expandable system that scales up.  These are typically transaction processing, business intelligence and ERP solutions.  Until now, IT managers running applications that require scale-up systems larger than 4 or 8 CPUs have had limited platform choices, and most were proprietary and expensive RISC-based servers.

     

     

The other problem with the scale-out approach is the people, facilities, software and overhead costs, plus the complexity of managing very large numbers of servers, which can grow to a point where the costs outweigh the performance and system cost benefits.  The industry solution to achieving better ROI has been to consolidate multiple scale-out servers onto single industry standard scale-up servers with virtualization solutions.  This is a good solution, but is limited by the number of application loads the IT manager feels comfortable placing on a single server, given the need to maintain peak performance and availability for each application.

     

     

Well, it looks like the scale-up, scale-out debate is about to take another turn.  In the server product update Intel gave on May 26th, they talked about new levels of system scalability and choice supported by the upcoming Nehalem-EX processor.  This processor will support systems that scale up to 8 sockets natively (shared memory, without any additional silicon), and up to 16 sockets and higher with node controllers from system manufacturers that allow single systems to share memory beyond 8 sockets.  So far there are over 15 different designs from 8 OEMs that offer 8-socket or higher scalability.  But of course, for the class of application where scaling is important, socket count doesn’t tell the whole story of what’s needed for scalable performance.  Thread support, key for transaction processing and virtualization, scales at the rate of 16 threads per socket with 8 cores and Hyper-Threading (2 threads per core).  That would be 128 threads for an 8-socket system, and 256 threads for 16 sockets.  And in order to keep those threads fed with data close to the CPU, each processor supports up to 24 MB of shared cache (1.5X current generation Xeon), and an impressive 16 memory slots per socket, or 128 DIMMs on an 8-socket system.  In addition, the Scalable Memory Interconnect gives these systems 9 times the memory bandwidth of today’s top Xeon processor.  Finally, four QuickPath Interconnect links per socket allow for high-bandwidth sharing of data across the system.
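To make those scaling numbers concrete, here is a two-line sketch that recomputes the thread and DIMM counts quoted above (8 cores per socket, 2 threads per core with Hyper-Threading, 16 DIMM slots per socket) for a few system sizes:

# Thread and DIMM counts for Nehalem-EX systems, from the figures above.
for sockets in (4, 8, 16):
    threads = sockets * 8 * 2      # 8 cores/socket x 2 threads/core
    dimms = sockets * 16           # 16 DIMM slots per socket
    print(f"{sockets:2d} sockets: {threads:3d} threads, {dimms:3d} DIMM slots")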

     

     

    So the net of it is that the industry is going to see a broad selection of highly scalable, next-generation servers that significantly extend the economic advantage of industry standard scale-up solutions for business-critical, large database, and high-end virtualization/consolidation deployments.     I would expect these systems to give IT managers a very cost-effective alternative to the much more expensive and proprietary RISC-based servers they use today.

     

    What are your thoughts?  Mike

     

     


As companies face the economic downturn, they are being asked to trim their IT budgets -- essentially, do more with less. Meanwhile, IT folks are also being asked to keep their companies competitive with the best server performance, running best-of-breed IT solutions in extremely efficient data centers, while ensuring every IT dollar spent shows an ROI within 12 months or less. That raises the question: “Can migrating applications from a RISC architecture to an Intel architecture save a company money and allow them to remain competitive?” In many cases the answer is “YES!”

     

     

    I have been an Intel Enterprise Technical Specialist supporting many of the large financial customers in the NYC area. My customers have a mix of all sorts of platforms, from commodity X86 servers to large RISC servers and from Midrange to Mainframe systems. Customers perform tests to measure Performance, Performance per Watt and Performance per Dollar. The outcomes will determine the architecture that is best suited for their applications. Customers have also relied on industry benchmarks such as CPU2006, SPECint, SPECfp, SPECpower_ssj2008, and SPECjbb2005 whose results can be found at www.spec.org.

     

     

I have seen many custom and commercial applications that used to run on other architectures which have been ported and are now running on commodity Intel architecture. Why? The Intel Xeon 5500 series microprocessor (codename Nehalem) delivers the increased performance, power efficiency, and overall lower cost needed to meet their IT requirements. For example, in the financial sector several applications, such as market data feed handlers, high-frequency automated trading, risk analytics, and Monte Carlo simulations (compute farms), require high performance servers to gain a competitive advantage and increase revenues for the firm.

     

     

As an example, one of my customers migrated several of their company’s in-house developed applications that were running on legacy RISC servers. Migrating the applications to Intel servers was a straightforward process since many of them were written in Java and were fairly easy to port. Other applications written in C/C++ could be migrated using Intel software tools (i.e. the Intel C/C++ compiler, Thread Checker, Thread Profiler and VTune), which were extremely helpful in moving their applications to the Intel architecture. For example, using Intel servers for their risk analytics application provided increased compute performance over their legacy RISC servers, which helped complete their risk analytics runs much faster with fewer servers, leading to an overall lower TCO.

     

     

Using the Intel Xeon 7400 & 5500 series has not only provided increased overall performance but has also decreased the number of servers in the data center through consolidation, which in turn requires less energy.  This has helped prevent data centers from reaching their power and cooling capacity. For some of my customers, using Intel Xeon 7400 & 5500 servers has extended the lifespan of their data centers, saving millions of dollars by avoiding the construction of new data centers thanks to increased power efficiency, while reducing overall operational costs.

     

    If you read my blog about server refresh and quarterbacks, you will understand how important it is to have a good quarterback inside your organization leading the server refresh effort.  Well at Intel IT that person is Matt Beckert.

     

     

     

I have had the opportunity to work closely with Matt over the past couple of years and have watched Intel’s server refresh strategy develop, get ratified and … because of the economic conditions … get questioned.  It was interesting to sit on the sidelines and watch how the economy caused Intel to question a proven strategy that delivered $45M of savings to Intel in 2008 (Intel IT Performance Reports).

     

     

Ever since I was a kid, I have been an avid New England Patriots fan, and Tom Brady is worth every dollar of the over $14M the Patriots will pay him in 2009.

     

     

    However, I’m sure glad that Matt is on the Intel IT team as his efforts have demonstrated to Intel that proceeding with server refresh in 2009 inside Intel IT’s infrastructure is worth $19M of savings versus deferring refresh to 2010.  Read more about “Staying Committed to Server Refresh Reduces Cost” and find out where the savings came from, how Intel IT overcame the capital budget constraints internally to make this priority investment.

     

     

    • Who is your server refresh quarterback?
    • What is your savings opportunity?
    • Model your potential savings for server refresh at www.intel.com/go/xeonestimator


    Chris (Go Patriots )

I spend a lot of time thinking about computing efficiency, but there's an interesting statistic that really blows the doors off of what computing represents to the world's sustainability.  According to Gartner, if you measure all of the energy savings that computing can bring to our planet, 2% is from making computing platforms more efficient.  A whopping 98% stems from how we utilize computing resources to make how we work and live more efficient.  The sources of this efficiency are vast, but many come immediately to mind...telecommuting, designing products on workstations instead of physical prototype models, downloading music via iTunes negating the need to produce millions of CDs.  If more proof were needed, you just need to look through the government stimulus package to see how critical a role technology plays in driving more efficiencies across industries.

     

    When we created the Data Center Efficiency Challenge we specifically pointed out that part of this competition would be judged not on the efficiency of the datacenter but on how the datacenter was making the organization more efficient.  To take this notion further we've started a new competition...on JustMeans.com...to spur more discussion on how companies are utilizing technology to re-map the way they do business for an energy-aware 21st century world.  Check it out.

Looks like the Intel® Xeon® processor 5500 series is making lots of noise in HPC.  The QPI and integrated memory controller are really providing the boost necessary to make it an all-around performance leader for HPC applications.  With all this performance, why did Intel add a third memory channel?

The third memory channel enables the platform to support a boatload of memory.  As a matter of fact, up to 192GB can be supported in a two-socket configuration.  It wasn’t too long ago when only 32GB was supported in a dual-socket configuration.  By having the ability to support so much memory, you can now meet the needs of almost every HPC application.  The 5500 series is intended for all server markets, but let’s face it, with the design changes Intel made in the new architecture, the server segment gaining the most benefit appears to be HPC.

    It seemed like yesterday when the only way to have access to large memory configurations was through expensive, proprietary SMP systems.  The HPC market for large SMP systems is still out there but it is shrinking…fast.  Today, we are clustering low cost solutions to create some of the most powerful systems in the world.  Standard components are leading to lower and lower system costs, delivering a price/performance advantage alternative solutions cannot meet.

Now that a single dual-socket node can support up to 192GB, it is important to understand how to get there.  First, to enable 192GB you need 16GB DIMMs in all 12 memory slots.  There will be a premium for a 16GB DIMM.  Knowing the options and determining the best, most cost-effective solution is going to depend on your environment.  When a large memory node is required, do you purchase the 16GB DIMMs or go up to a multi-socket solution?  If I decide to scale back on the memory (use 4GB or 8GB DIMMs instead of 16GB DIMMs), what is the performance impact to my application?  If I am cost sensitive, will the lower cost outweigh the loss of performance?  Can I use SSDs (solid state disk drives) to compensate for any performance loss due to lower memory capacity?  There are many questions to think about when deciding the right configuration for your application and environment, and I certainly can’t answer them all here.

Let’s not forget the third memory channel enables a different set of optimal memory configurations.  Think x3 when deciding how much memory to install in your node: 12GB, 24GB, 48GB, etc.  What happens when you don’t use an optimal configuration?  Well, it depends.  In most cases the impact is minimal, but let me add a bit of context around minimal (a quick configuration sketch follows these examples):

• Low bandwidth sensitivity (more dependent upon the processor for performance)

        E.g. Monte Carlo, Black-Scholes (financial modeling), BLAST (bioinformatics), AMBER (molecular dynamics)

        Expect less than a 2% difference between memory configurations*

• Medium bandwidth sensitivity (somewhat balanced between memory and CPU usage)

        E.g. CFD, Explicit FEA, Implicit FEA (with robust I/O system)

        Expect approx. 5% degradation for non-optimal symmetrical configurations*

• High bandwidth sensitivity (high access to the system memory)

        E.g. WRF (weather), POP (climate), MILC (physics), Reservoir Simulation

        Expect approx. 10% degradation for non-optimal symmetrical configurations*

    The results are interesting.  In all three cases above, the degraded performance is always better than the performance you would have with only two memory channels.

When you hear about the performance impact of non-optimal memory configurations, you can see from the examples above that it is application dependent and will not have a severe impact on your overall system performance.

    The Intel Xeon processor 5500 series offers support for huge memory nodes with the addition of the third memory channel.  Memory configurations in multiples of three are ideal, but if you decide to stay with a power of two configuration the performance should still exceed that of a solution based upon only two memory channels.

    *Based upon Intel internal measurements
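As a quick sanity check on the "think x3" guidance, here is a small sketch that enumerates the balanced configurations for a two-socket Xeon 5500 node (three channels per socket, two DIMMs per channel, twelve slots total) across common DIMM sizes. It only does the capacity arithmetic; pricing is left to your vendor quote.

# Balanced memory configurations for a 2-socket Xeon 5500 node:
# 3 channels/socket x 2 sockets = 6 channels, up to 2 DIMMs per channel.
for size_gb in (4, 8, 16):
    for dimms_per_channel in (1, 2):
        dimms = 6 * dimms_per_channel    # populate every channel evenly
        print(f"{dimms:2d} x {size_gb:2d}GB DIMMs = {dimms * size_gb:3d}GB (balanced)")
# 12 x 16GB DIMMs gives the 192GB maximum discussed above.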

Here's the 5th video in my VMWorld Chalk Talk Series. In this one, Gerhard Schlabschi, Systems and Storage Marketing with Sun Microsystems, gives a chalk talk on various virtualization systems and discusses some of the trade-offs in a virtualized environment. Enjoy.

     

     

     

    A MONSTER CHIP IS COMING. The next generation of MP processor is targeted for production later this year, and by all accounts it is going to be a monster. Nehalem-EX is part of the Nehalem family of processors, but compared to its siblings it has the highest cores/threads count, largest shared cache, highest CPU-to-CPU bandwidth, highest I/O bandwidth, highest memory capacity, highest memory bandwidth, greatest scalability, and highest level of Reliability/Availability/Serviceability. It’s expected to bring a gargantuan, unprecedented leap in capabilities and performance--the biggest leap in all of Xeon product history.

     

    IT’S TARGETED AT “BIG BOXES”. Big box servers are multiprocessor systems using the most capable processors and platform components. These systems are targeted at applications and usages that require the largest memory footprints, the highest amounts of single-box processing power (for workloads that don’t decompose well into lots of independent threads) and/or advanced levels of RAS. Such systems are typically the best choice for large databases, ERP apps, Business Intelligence apps, large-scale server consolidation and business-critical virtualization, mission critical applications and large scale high performance computing.

     

IT USES THE SAME PROCESSING TECHNOLOGY AS THE SUCCESSFUL XEON 5500, BUT MORE OF IT. Just like with Xeon 5500, the Nehalem micro-architecture brings improved single-threaded performance via IPC (Instructions per Clock) enhancements and Intel’s Hi-k 45nm manufacturing process. Greater multi-threaded performance comes via Hyper-Threading and more cores. But while the Xeon 5500 has up to 4 cores/8 threads per socket, the Nehalem-EX monster doubles that to 8 cores/16 threads.

     

IT HAS BEEFIER MEMORY AND INTER-CHIP COMMUNICATION SUBSYSTEMS. Monster thread processing capabilities require monster-size feeding to bring out the best performance. Nehalem-EX’s raw processing potential is made viable by a heavy-duty memory subsystem and inter-chip communication system.

Nehalem-EX has 24MB of shared level 3 cache--that’s 50% more than the current Xeon 7400 and 200% more than Xeon 5500. The memory channel bandwidth was increased to 9 times that of Xeon 7400. And it’s all attached to up to 16 DIMM slots per socket (that’s 64 DIMM slots for 4 sockets)--double the current generation Xeon 7400.

    In a multi-socket system, processors need to communicate with each other in order to most efficiently coordinate their shared workload. They also need lots of I/O bandwidth. Nehalem-EX has four QuickPath Interconnects on every socket--double that of Xeon 5500. The four QPI links enable Nehalem-EX processors to be directly connected to each other in a 4 socket system. This offers significant performance advantage over a so-called ring architecture wherein some processor-to-processor communication must go through an intermediary processor. The extra QPIs also mean that there’s plenty of CPU to I/O bandwidth.

     

EXPECTED TO BRING THE GREATEST LEAP FORWARD IN XEON PERFORMANCE EVER. On key server performance benchmarks (e.g. SPECint_rate, SPECfp_rate, TPC-C, etc.) the Xeon 5500 using Nehalem technology brought gains of 100-200% over the prior generation. Generational gains of this magnitude come along just about once a decade. Nehalem-EX’s generation-to-generation performance gains are expected to be substantially higher than those of Xeon 5500. We’ve already seen measured memory bandwidth of 9X vs. the prior generation. That’s an early indication of the level by which new performance records will be set when this monster chip comes to market.

    Related Topics:

    NHM-EX Press Fact Sheet

    NHM-EX May 26th Press Briefing Video – condensed version

    IBM 8Socket Demo Video

     

    NHM-EX--A New Standard

    The Intel(r) Dynamic Power Node Manager technology allows setting a power consumption target for a server under load as described in a previous article.  This is useful for optimizing the number of servers in a rack when the rack is subject to a power budget.


Higher-level software can use this capability to implement sophisticated power management schemes, especially schemes that involve server groups.  The range of control authority for servers in the Nehalem generation is significant: the power consumption of a fully loaded server consuming 300 watts can be rolled back by roughly 100 watts.  In virtualized utility computing environments, additional control authority is possible by migrating the virtual machines out of a host and consolidating them into fewer hosts.  The power consumption of the power-capped host, now at 200 watts, can be brought down by another 50 watts, to 150 watts.


The reader might ask about the possibility of constantly running servers in capped mode to save energy.  Unfortunately, capping entails a performance tradeoff.  The dynamic is not unlike driving an automobile.  The best mileage is obtained by running the vehicle at a constant 35 MPH.  This is not practical on a freeway where the prevailing speed is 60 MPH: the vehicle could get rear-ended, or, to offer a more mundane motivation, the driver keeps to 60 MPH because she wants to get there sooner.  Like a server, the lowest fuel consumption in a running vehicle, at least in gallons per hour, is attained when the vehicle is idling.  No real work is done with an idling engine, but at least the vehicle can start moving in no time.  Continuing with the analogy, turning a server off is equivalent to storing a car in the garage with the engine stopped.


This document provides an example of the performance tradeoff with power capping.  Please see Figure 2 on page 5.


The following example illustrates how group power capping works.  The plot is a screen capture of the Intel(r) Data Center Manager software managing the power consumption in a cluster of four servers.  The four servers are divided into two sub-groups of two servers each, labeled low-priority and high-priority.

     

    DCM-GUI.png

     

    The light blue band represents the focus of the plot. The focus can be changed with a simple mouse click.  The current focus in the figure is the whole rack.  Hence the power plot is the aggregated power for all four servers in a rack.  If the high priority sub-group were selected, then the power shown would be the power consumed by the two servers in that sub-group.  Finally, if a single server is selected, then the power indicated would be the power for that server only.

         

There are four lines represented in the graph.  The top line is the plate power, which represents an upper bound for the servers’ power consumption.  For this particular group of servers the plate power is 2600 watts.  The servers are identical, and hence rated at 2600 / 4 = 650 watts each.

The next line down is the derated power.  Most servers will not have every memory slot or every hard drive tray populated. The derated power is the data center operator’s guess at the upper bound for power consumption based on the actual configuration of the server.  The derated power is still a conservative guess, considerably higher than the actual power consumption of the server. As a rule of thumb, it is ~70% of the nameplate. The derated power has been set at 1820 watts for the rack, or 455 watts per server.

         

    Finally, the gold line represents the actual power consumed by the server.  The dots represent successive samples taken from readings from the instrumented power supplies. 

         

    The servers are running at full power using the SPECpower benchmark.  The rack is collectively consuming a little less than 1300 watts.  At approximately 16:12 a policy is introduced to constrain power consumption to 1200 watts.  DCM instructs individual nodes to reduce power consumption by lowering the set points for Node Manager in each node until the collective power consumption reaches the desired target.
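Conceptually, the group capping behavior resembles the naive feedback loop sketched below: measure the aggregate, then nudge each node's set point down (or back up) until the group converges on the target. The node object and its power()/set_cap() methods are stand-ins for whatever node-level interface the platform exposes (e.g., Node Manager over IPMI); the real Data Center Manager policy engine is considerably more sophisticated, taking priorities and trends into account.

def rebalance(nodes, group_target_watts, step=10):
    """One iteration of a naive group power-capping loop (illustrative only)."""
    total = sum(n.power() for n in nodes)     # aggregate measured power
    if total > group_target_watts:
        for n in nodes:                       # over budget: tighten each cap
            n.set_cap(n.power() - step)
    else:
        for n in nodes:                       # headroom: relax caps so unused
            n.set_cap(n.power() + step)       # budget can be reallocated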

When we instructed Data Center Manager to hold a power cap for the group rack (2), it made an effort to maintain power at that level, in spite of unavoidable disturbances in the system.

     

The source of the disturbances can be internal or external.  An internal disturbance can be the server fans switching to a different speed, causing a power spike or dip.  Workloads in servers go up and down, with a corresponding uptick or dip in the power consumption for that server.  An external disturbance could be a change in the feed voltage or an operator action.  In fact, at T = 16:14 we introduced a severe disturbance: we brought the workload of the bottom server, epieg3urb07, down to idle.


Note that it takes a few seconds for Data Center Manager to react and bring the group back to the target power level.  Likewise, when the bottom server was brought to idle, the power consumption for the group was pulled back as well.  However, the group power returned to the target power consumption after a couple of minutes.  If we look at the plot of the individual servers, we can see Data Center Manager at work maintaining the target power.

    Combined Power.png

The figure above captures the behavior of the individual servers.  Note how DCM allocates power to individual nodes yet maintains a global power cap. When the server at the bottom is suddenly idled, there is a temporary dip in server power consumption for the group, but it soon recovers to the target capped level.  Also note that the power not used by the bottom server is reallocated to the remaining three nodes until they get close to the previously unconstrained level.

     

    I’m quite pleased with the ramp of the Intel Xeon processor 5500 series. As reported during the Nehalem-EX briefing (more info here), we expect the Xeon 5500 to reach more than half of our 2S server shipments by August 2009. All of our key OEM customers have embraced the new architecture with complete product offerings, which provides IT administrators a plethora of Xeon 5500-based systems to choose from.

    For this blog, I am focused on data center advancements that Cisco is pioneering with their Unified Computing System (UCS). They intend to combine best practices for network infrastructure with data center virtualization. The Intel Xeon processor family, Intel Virtualization Technology and Datacenter Ethernet are foundational elements to Cisco’s strategy.

    Already the industry has recognized the Cisco UCS blade platform with awards such as “Best of Interop 2009” for Data Center & Storage (link) and “Best Data Center Innovation” from BladeSystems insight (link). It is a bold move on Cisco’s part, and makes a lot of sense in light of the convergence of servers, storage and networking. See a video testimonial about Cisco UCS and Fiber Channel over Ethernet (FCoE) from our own Intel CIO, Diane Bryant, here.

    Recently, Cisco announced further expansion of the UCS product portfolio with addition of three new rack-mount servers. All of them are based on the Intel Xeon processor 5500 Series and are expected to provide compelling performance, memory expandability and integrated virtualization capabilities. They are expected in the fourth quarter. You can read more about the new rack-mount servers here.

    Have you had a chance to evaluate the new Cisco offering yet? Have you made any plans to deploy in 2009? What do you think of unified fabric and the concepts that Cisco has put forth? Let me know!

    -steve

Your most valuable employee is the one who creates tomorrow’s successes.  Providing tools that help them do that faster will help your organization create new products or optimize old ones more rapidly.  The benefit to the organization is increased opportunities to win the customer’s attention via new products or your responsiveness to their requests; the employee gets to brag about what he or she just helped bring to market.

Before we get too far, let’s look at Intel’s mission with respect to workstations.  We are laser focused on supplying technology that provides users with an uncompromised experience in transforming their ideas into reality.  With that in mind, we look at how users create; we try to understand their obstacles and work with the ecosystem of hardware and software providers to deliver solutions to real problems that may be inhibiting their opportunity to innovate.

    One technology that is helping users innovate faster is virtualization. 

    The Observation

We saw workstation users’ innovation slow as they multitasked between tasks – some of them not even theirs.  The involuntary tasks included IT security patches, updates, and system backups, to name a few.  We also saw that users were no longer just doing CAD; they were doing CAD, using productivity tools, meshing, web surfing for supporting facts, collaborating via video, digital white boarding and trying to do analysis-driven design.  They were very busy people.

In some cases we noticed that some users actually had not one, but two workstations running completely different environments, many times with different OSes.

    The Problem

What the above really led to is the conclusion that too many tasks were going after too few resources, and that the experience we had hoped the user would encounter was not happening.  In fact, the reverse was happening: interactive creative tasks were slowing, and system sluggishness was at an all-time high.  The “uncompromised experience in transforming their ideas into reality” we wanted for a workstation user was not there, and any innovation that was possible slowed to a crawl.

    A Potential Solution

    Intel® Virtualization Technology for Directed I/O, once just thought of for servers actually has a place in the workstation market. 

This technology provides an important step toward enabling a significant set of emerging usage models in the workstation. VT-d support on Intel platforms provides the capability to ensure improved isolation of I/O resources for greater reliability, security, and availability.  That is a mouthful; let’s see it in action.

    There are two key requirements that are common across workstation usage models.

1. The first requirement is protected access to I/O resources from a given virtual machine (VM), such that it cannot interfere with the operation of another VM on the same platform. This isolation between VMs is essential for achieving availability, reliability, and trust.

2. The second major requirement is the ability to share I/O resources among multiple VMs. In many cases, it is not practical or cost-effective to replicate I/O resources (such as storage or network controllers) for each VM on a given platform.

    In the case of the workstation, virtualization can be used to create a self-contained operating environment, or "virtual appliance," that is dedicated to capabilities such as manageability or security. These capabilities generally need protected and secure access to a network device to communicate with down-the-wire management agents and to monitor network traffic for security threats. For example, a security agent within a VM requires protected access to the actual network controller hardware. This agent can then intelligently examine network traffic for malicious payloads or suspected intrusion attempts before the network packets are passed to the guest OS, where user applications might be affected. Workstations can also use this technique for management, security, content protection, and a wide variety of other dedicated services. The type of service deployed may dictate that various types of I/O resources, graphics, network, and storage devices, be isolated from the OS where the user's applications are running.

    The Result

Working with the Parallels Workstation Extreme VM application, we looked at two problems.  The first was the general overhead related to too many requests and too few resources; then we explored the more complex problem of a single workstation needing to display at near-native performance in two different OSes.

The former was straightforward: create VMs, partition resources, and your innovator now has a very resilient workstation that is capable of delivering the intended experience.  IT can have their VMs, the user has his or her workstation back, and the concept of digital prototyping, creating and exploring a complete product before it is built, is a reality.  Your innovator can now iterate through more ideas in less time, and your company’s opportunity to catch the customer’s attention just went through the roof.

The latter provided a much harder challenge.  We tested the idea in the oil and gas market, where users actually had two workstations: one running Windows, one running Linux. Both had a requirement for visual display, and both acted on the same reservoir data with applications that, while similar in many ways, were still different.  When preparing to drill a multimillion-dollar well, the idea of more data saying the same thing is a very good thing.

The Proof Point For Virtualization In A Workstation

Engineers from Schlumberger, a leading oil field service provider, run performance-demanding applications such as GeoFrame* and Petrel*.  These applications serve to analyze complex geologic and geophysical data and determine the viability of potential reservoirs, or to optimize production at existing sites. With GeoFrame running on Linux* and Petrel on Microsoft Windows*, Schlumberger engineers have been running these applications on two separate workstations, driving down productivity and increasing both power consumption and IT maintenance costs.

    A New Paradigm

    With the advent of Intel Xeon processor 5500 series-based workstations running Parallels Workstation Extreme, virtualization software has opened new horizons with breakthrough graphics performance.

Schlumberger compared the concurrent performance of applications running on a virtualized Intel Xeon processor 5400 series-based workstation with the same setup on the Intel Xeon processor 5500-based machine. The results were astounding. The first machine ran Petrel at full native speed, but performance for GeoFrame slowed enormously. While Petrel refreshed its graphics at a rate of 30 frames per second, GeoFrame crawled along at a graphics refresh rate of just one frame every 19 seconds, an agonizingly slow performance.

When the group tested both applications on the Xeon 5500 series workstation, the results were striking: both applications ran at full native speed, and both were able to refresh graphics at 30 frames per second—a 570 times improvement over the first workstation (30 frames per second versus 1/19th of a frame per second).

    Russ Sagert, Schlumberger’s Geoscience Technical Advisor for North America said “our engineers were blown away by the performance. We hammered these machines with extreme workloads that stressed every aspect of the system. Amazingly, the new workstation based on the Intel Xeon processor 5500 series provided performance enabling this multiple OS, multiple application environment for the first time.”

    The key element in Schlumberger’s new environment is Intel Xeon processor 5500 series-based workstations with Intel® Virtualization Technology (Intel® VT) for Directed I/O (Intel® VT-d).  Together, these technologies enable direct assignment of graphics and network cards to virtual machines, enabling the machine to circumvent the interrupt and exit loop and clearing the previous performance problems.

    Running in conjunction with Parallels Workstation Extreme, which effectively leverages Intel Virtualization Technology, including VT-d, the solution revolutionizes virtualization for high-end users. “High-performance virtualization on Intel Xeon processor 5500 series-based workstations is a game-changing capability,” says Sagert. “We can allocate multiple cores, up to 64 GB of memory and a dedicated graphics card to each machine. The results are spectacular.”

    In the final analysis, moving to the Intel Xeon Processor 5500 series of next-generation workstations does far more than cut costs. It impacts the way that work gets done. If you have clients running the kind of resource-intensive, graphics-rich applications that traditionally slow to a crawl in a virtualized environment, consider the benefits of finally moving beyond the I/O barrier.

A fully configured Intel Xeon Processor 5500 series-based workstation running Parallels Workstation Extreme delivers the performance level that makes virtualization a contender for these users. A streamlined work interface, reduced office noise and clutter, and significant performance gains work in the user’s favor. But the IT organization also gains by lowering capital, management, support, space, and energy costs.

    Moreover, the IT team can now standardize on a single OS image while addressing alternative requirements.

    Learn More

    Intel Workstation Processors http://www.intel.com/products/workstation/processors/index.htm

    Parallels Workstation Extreme

    http://www.parallels.com/products/extreme

     

    ChrisPeters

    Are You an SAP Insider?

    Posted by ChrisPeters Jun 3, 2009

Today, I came across this website and a special offer to become an SAP Insider.  I started by looking at some joint papers and technology proof points developed in collaboration between SAP and Intel on the new Nehalem (Xeon 5500) products and SAP's latest solutions.  I also found a bunch of information on work SAP does in collaboration with many other vendors on technology designed to boost IT value.

     

Special features included collaboration with Sun, Citrix, Red Hat, Novell, and VMware.

     

    Registration was quick, easy, free and very informative. Highly recommended!

     

    Read How Intel and SAP Deliver Business Value Through Strategic Technology Investments (registration page) and take your first steps to becoming an SAP Insider.

     

Don't want to register for another site or newsletter? ... go to http://www.intelalliance.com/SAP/

     

    Chris

Back in 2001-02, when virtualization started to garner interest in the IT world, I wondered about running different operating systems simultaneously on a server. I remember setting up a small environment in the lab using VMware GSX Server and trying to run multiple operating systems side by side. I had to take my focus off virtualization after that due to a change in my job role, and I seldom spent much time looking into virtualization technology until I accepted another new role six years later.

A lot of development had happened over those years, with virtualization widely accepted among IT techs and management as an instrument to save money on IT expenditure, decrease TCO, increase ROI and spend ever-shrinking IT budgets wisely. Processor technology moved to multicore, with more than two cores available on a server and very little software able to take advantage of all the additional threads those multicore processors provided; consolidating physical servers into virtual machines on a multicore processor-based server helped IT leverage the additional threads and run their datacenters much cooler by reducing the number of physical servers. Virtualization also brings other goodies in terms of redundancy and disaster recovery. Since there are tons of materials available on virtualization technology, I will stop here and won’t dwell deeper on virtualization.

As we all know, the hypervisor, also referred to as the Virtual Machine Monitor (VMM), handles all the hardware resource slicing for the virtual machines running on top of it, providing an identical execution environment. While the VMM takes care of time-sharing hardware resources and allocating processor, memory and I/O slices to the virtual machines, it introduces significant latency and overhead, since it has to translate every request concerning CPU, memory and I/O and pass it on to the actual physical device to complete the request. This has been an Achilles’ heel for virtualization technology: IT organizations try to keep critical applications running on physical servers, since they fear the latency and overhead of the VMM would drag down the performance of the application if virtualized. In 2005, Intel introduced hardware-assisted support for virtualization within the processor, called Intel VT technology. While Intel VT alleviated many performance issues associated with processor virtualization, solving part of the problem, the memory overhead still remained.

Intel released the new Intel Xeon 5500 series processors in March 2009, and the new Xeons bring many new technologies. Among the list of new things the Xeon 5500 series brought in is a feature called Extended Page Tables, also known as EPT. As I discussed earlier, hardware-assisted virtualization support with Intel VT alleviated processor overheads; EPT takes care of memory overheads and allows virtual machines to perform much faster than with software (VMM-translated) memory access. I will spend some time discussing three modes of memory addressing: A) how a normal process accesses memory on a physical machine; B) how memory is managed for virtual machines in software, without EPT support; and C) how memory management is done when EPT is enabled.

    Memory Management in Native Machines

In a native system, the task of mapping logical memory page numbers to physical memory page numbers is handled by the operating system. The operating system accomplishes this by storing the entries in page table structures. When a process of any application accesses the logical address of the memory where it thinks the data is stored, the hardware goes through the table structure to find the physical address location where the data is stored. Frequently accessed logical page number (LPN) to physical page number (PPN) mappings are cached by the hardware in the Translation Lookaside Buffer, also called the TLB.  The TLB is a small cache on the processor which accelerates the memory management process by providing faster LPN-to-PPN mappings for frequently accessed memory locations.  Picture A shows memory management on a native machine.
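For intuition, here is a toy model of that lookup path: a virtual address is split into a logical page number and an offset, the page number is translated through a dictionary standing in for the page table, and a tiny TLB caches the hot translations. The page size is the x86 4KB, but the table layout is a simplification, not the real multi-level x86 structure.

PAGE_SIZE = 4096                                   # 4KB pages, as on x86
page_table = {0x00400: 0x1A2B3, 0x00401: 0x0F00D}  # LPN -> PPN (toy values)
tlb = {}                                           # cache of recent translations

def translate(vaddr):
    lpn, offset = divmod(vaddr, PAGE_SIZE)
    if lpn in tlb:                    # TLB hit: fast path
        ppn = tlb[lpn]
    else:                             # TLB miss: walk the page table
        ppn = page_table[lpn]         # a KeyError here would be a page fault
        tlb[lpn] = ppn
    return ppn * PAGE_SIZE + offset

print(hex(translate(0x00400123)))     # 0x1a2b3123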

     

    Memory Management using VMM

When virtual machines run on a hypervisor, the guest operating systems don’t have access to the hardware page tables like natively run operating systems do. The Virtual Machine Monitor emulates the page tables for the guest operating systems and gives them the illusion that they are accessing actual physical page numbers when mapping from the logical page numbers of their running processes.  The VMM actually maintains page tables of its own, called shadow page tables, which are visible to the system hardware. So whenever the guest OS makes a request for virtual-to-physical address translation, the request is trapped by the VMM, which in turn runs through its shadow page tables and provides the address of the physical memory location. Picture B shows memory management using a VMM.

While the VMM handles the LPN-to-PPN mapping quite efficiently, there are times when page faults occur, considerably slowing down applications and operating systems. The major penalty comes when the guest OS adjusts its logical mappings: this triggers the VMM to adjust its shadow pages to keep them in sync with the logical mappings of the guest OS. For any memory-intensive application running inside the guest OS, this process of syncing pages causes a hefty drop in performance due to the overhead of virtualization.
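Here is a toy version of that shadow-table mechanism, for contrast with the EPT model later: the VMM pre-composes the guest's mapping with its own guest-physical-to-host-physical mapping into a shadow table, and must rebuild the affected entries every time the guest edits its page table. Purely illustrative, at page-number granularity only.

guest_pt = {0x10: 0x80}              # guest-virtual -> guest-physical (guest-owned)
vmm_map = {0x80: 0x3F, 0x81: 0x40}   # guest-physical -> host-physical (VMM-owned)
shadow = {}                          # guest-virtual -> host-physical (what hardware sees)

def sync_shadow():
    """Rebuild the shadow table -- the trap-and-emulate cost that EPT removes."""
    shadow.clear()
    for gva, gpa in guest_pt.items():
        shadow[gva] = vmm_map[gpa]

sync_shadow()
guest_pt[0x10] = 0x81                # the guest remaps a page...
sync_shadow()                        # ...forcing another costly sync
print(hex(shadow[0x10]))             # 0x40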

    Memory Management using EPT

Hardware-assisted memory management using EPT makes life easier for the VMM. With EPT, the TLB cache assumes an additional role to keep track of virtual memory and physical memory as seen by the guest OS. Individual virtual machines are tracked by the TLB by assigning each an address space identifier.  Using the address space identifier, the TLB can track each virtual machine’s address space and does not have to flush the TLB cache when execution switches from one VM to another.

Having EPT manage memory for virtual machines reduces the need for the VMM to keep syncing shadow pages, eliminating that overhead. Since the number of times the shadow pages need to be synced depends on the number of virtual machines running on the server, eliminating the syncs produces a tremendous increase in performance for servers with larger numbers of virtual machines. In addition, the benefit of EPT scales with the number of virtual processors assigned to a particular VM, since a rise in processor count also increases the shadow page syncs. Using EPT to eliminate shadow page syncs lets the CPUs simply update the TLB as changes occur in the virtual pages, a process close to memory management on a natively run operating system. The only possible downside of managing memory using EPT is the additional overhead when there is a TLB miss, typically caused by many TLB-stressing applications running on the same physical server. However, hypervisors take an approach that reduces TLB misses by using large page tables.

    Picture C shows Memory Management using EPT.
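To see why EPT removes the sync problem, consider this toy two-level walk: the hardware composes the guest's page table (guest-virtual to guest-physical) with the VMM's EPT (guest-physical to host-physical) on each TLB-missing access, so the guest can remap pages freely without the VMM rebuilding anything. Again a conceptual sketch, not the real page-table format.

guest_pt = {0x10: 0x80}              # guest-virtual -> guest-physical (guest-owned)
ept = {0x80: 0x3F, 0x81: 0x40}       # guest-physical -> host-physical (VMM-owned)

def translate_with_ept(gva_page):
    gpa_page = guest_pt[gva_page]    # level 1: walked by hardware, edited by guest
    return ept[gpa_page]             # level 2: walked by hardware, owned by VMM

guest_pt[0x10] = 0x81                # guest remaps a page; no shadow sync needed
print(hex(translate_with_ept(0x10)))  # 0x40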

As a follow-up to this blog, I am setting up a quick lab environment to verify the EPT advantages. I think I will be able to post the results from my quick hands-on experiment in a couple of weeks’ time.

    -Bhaskar Gowda.

     

    I have received a number of customer questions recently on Intel® Hyper-Threading Technology. Hyper-Threading Technology is available on the new Intel® Core™ i7 processor and the Xeon® 5500 series processors. Here are a few of my favorite questions and answers - ranging from the basics to more advanced topics.

    What is it?

Intel® Hyper-Threading Technology is a performance feature on our new Intel® Core™ i7 processor and the Xeon® 5500 series processors. Put simply, it allows one core on the processor to appear as 2 cores to the operating system. This doubles the number of logical processors available to the O/S, which potentially increases the performance of your overall system. For the visually-oriented, you can view a graphical explanation of Intel® Hyper-Threading Technology by clicking on the demo here.

    Talking about cores, threads, and Hyper-Threads can get a bit confusing. To make things simple for the rest of this blog, I'm going to call Hyper-Threads hardware threads, and O/S level threads software threads. Just as a refresher, a core is 1 CPU. Each Core™ i7 or Xeon® 5500 series processor shipping currently has 4 cores (we may offer other versions in the future).

     

    How can I tell if my system is using Hyper-Threading Technology?
    You must have a processor, chipset, operating system, and BIOS that all support the technology. Luckily, that is not much of a problem. Many of the desktop and server platforms that ship with Nehalem-based processors include this support. Most of these platforms will allow you to enable or disable Hyper-Threading Technology as a BIOS option (it should be enabled by default). You can view your CPU information using the Task Manager in Windows*, and /proc/cpuinfo in Linux*. If you have a supported platform and Hyper-Threading is enabled, you should see twice the number of CPUs as you have physical cores in your platform. For example, if you have a dual-processor Xeon® 5500 series server, you should see 16 CPUs. (16 hardware threads running on 8 physical cores, 2 threads per core.)

    http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/06/hyperthreading_disabled1.jpg

    http://software.intel.com/en-us/blogs/wordpress/wp-content/uploads/2009/06/hyperthreading_enabled.jpg

    Available CPUs on the same platform with Hyper-Threading Technology disabled (top) and enabled (bottom).
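If you would rather check from a script than from the Task Manager, here is a minimal sketch for Linux: it counts logical processors and distinct (physical id, core id) pairs in /proc/cpuinfo, and more logical processors than physical cores indicates Hyper-Threading (or another form of SMT) is active. It assumes the physical id and core id fields are present, as they are on typical SMP kernels.

def hyperthreading_active(cpuinfo_path="/proc/cpuinfo"):
    logical = 0
    cores = set()
    phys_id = core_id = None
    for line in open(cpuinfo_path):
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1                   # one stanza per logical CPU
        elif key == "physical id":
            phys_id = value
        elif key == "core id":
            core_id = value
            cores.add((phys_id, core_id))  # one entry per physical core
    return logical > len(cores)

print("Hyper-Threading active:", hyperthreading_active())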

     

    Can I run 2 applications simultaneously on 2 different threads on the same core?
    Yes. The 2 software threads running on a single core do not have to be threads of the same process. They could be (in the case of multi-threaded software), or they could be from 2 separate applications. Which 2 software threads would run on the 2 hardware threads of a Hyper-Threaded core would be up to the operating system. So, yes, you could have 2 different applications running on the same core at the same time. (Whether you would get equal performance in this scenario as you would with the 2 apps running on separate cores is a different issue – see question 6.)

     

    Now that you know the basics, visit my article in the Intel Software Network knowledgebase to learn more. Get the answers to these 3 advanced questions on Intel® Hyper-Threading Technology:

• How is it implemented, under the covers?
• Can I give one hardware thread priority or ensure that it doesn’t get “starved” for execution time?
• What kind of performance benefit will I get from using Intel® Hyper-Threading Technology?

     

    What other questions do you have on the performance features of the new Nehalem-based processors?

    In this version of my VMWorld Chalk Talk series, we have Intel's Marco Righini, Virtualization Solutions Architect discussing new technologies for virtualization. Check out his video here.

     

Last week I wrote about the server product update for the upcoming Nehalem-EX processor and the expandable platforms based on it.  Today I wanted to provide you with a short 10-minute video captured from the event.  It’s a really good summary for those of you who want to learn more about Intel’s Xeon product roadmap but have limited time.

    Also, as I mentioned earlier, look for some informative blogs over the next 1-2 weeks that will offer more of an in depth view of Nehalem-EX’s 4 Socket capabilities, performance, scalability, RAS, and Virtualization. 

    bryce

I’ll not debate whether Cloud Computing is a passing fad, marketing hype, a revolution in computing, etc.; what I do know for a fact is that interest in this model, from equipment vendors, service providers and end users, is staying strong.  As much as Intel is reaching out into the industry to learn how people are hoping to take advantage of this phenomenon, what’s exciting to someone like me is that more and more service providers are approaching us on this topic and seeking our input and guidance.  Service providers of various kinds are asking for Intel’s opinion and advice on how to prepare and evolve their data center architecture and practices to align with the expectations their customers have for cloud computing.  I’m not trying to brag, especially since it is obvious that there’s a ton of things “we” still need to figure out in this area; but when I see some of the giants in this community express appreciation of the contribution Intel is making, I can’t help but feel glad that we have done at least some of our homework right!

So what’s a chip company doing that could be remotely interesting to service providers?  Aren’t these the guys whose job it is to abstract all the hardware?  Absolutely!  But service providers are realizing that their solutions are better delivered and their business models are more competitive when they have a deeper understanding of what the underlying hardware is capable of.  For example, many of the customers I work with tell me that they were unaware of the technologies enabled by our platforms to intelligently manage server power consumption, not just at the individual node level, but for the whole of the data center.  My colleagues at our customers are pleasantly surprised to learn how Intel is pushing the boundaries for virtualization deployment, and, in collaboration with the leading vendors of virtualization software, is making the use of this foundational technology more efficient for cloud computing.

There are many more topics I could add to this list, and service providers have a lot of places to go besides Intel for information.  But what I hear often from the customers I work with is that Intel’s ability to be an impartial (vendor-neutral) technology advisor is most appreciated.  Of course not everyone is in a position to take advantage of the latest technology, nor does every new technology we enable serve everyone’s purpose.  But if you are a service provider interested in topics on data center optimization, whether that be at the CPU or chipset, the server, the software or the facilities level, I’d encourage you to read up on our products and technologies found in this forum and in other places on our intel.com sites.  And if there is something you need but can’t find, or you need more information, feel free to drop me a note.

    Let me start by offering a big thanks to Intel for the invitation to join this group. It's a great opportunity to talk about how Intel’s virtualization technology directly impacts end-users by enabling ISVs like Parallels to deliver exciting new capabilities and solutions.


    The most recent example of this impact is Parallels Workstation 4.0 Extreme, available today through HP on the Z800 Workstation. Another milestone in desktop virtualization, Parallels Workstation Extreme is the first virtual workstation solution to enable the direct assignment of graphics and networking hardware resources to a virtual machine. Our underlying Parallels FastLane Architecture makes this possible, utilizing Intel virtualization technology such as Intel VT-x, VT-d, and Extended Page Tables (EPT).


    Parallels has long worked with Intel to bring performance and ease of use directly to virtualization users.  For example, Parallels has over two million desktop virtualization users already enjoying the power of Intel VT-x combined with our patented ease of use features such as Coherence and SmartSelect. These types of innovations show why Parallels is focused on working closely with market leaders such as Intel.


    Parallels Workstation 4.0 Extreme builds on this well-established path, introducing innovations such as near-native virtualized graphics performance using Intel’s VT-d technology as well as new ease of use innovations such as SmartMouse, allowing users to seamlessly move between multiple operating systems, each on its own display.

    SmartMouse.png


    Ah, but we're not done yet! There are many more great things to come, and together with strategic partners like Intel, we’ll continue to push the envelope and bring together new innovations in virtualization performance and ease of use to take virtualization to new markets.

Intel and Emulex will be hosting a webinar on June 3 at 9am PDT to discuss how Emulex adapters and Intel Xeon 5500 processor-based servers can help manage server sprawl, lower capital & operating costs, enable deployment of larger virtual servers and increase the number of VMs per server.  During the webcast the speakers will discuss new technologies, share benchmark results and provide tips and tricks on how to supercharge your virtual server.

     

    Event Synopsis:

     

    Challenging economic conditions are driving requirements to optimize performance and reduce costs in the data center. Since a majority of IT costs are related to the number of servers deployed, it’s imperative that servers are selected which provide scalable performance, automated energy efficiency and superior virtualization ratios. The time is right to leverage new technologies from Emulex® and Intel® to drive critical IT initiatives.

    The webcast registration link can be found at http://www.emulex.com/company/events/webcasts.html and selecting “Next-Generation Server Technologies from Intel and Emulex”.

     
