
The Server Room Blog

13 Posts authored by: Benjamin Hacker

I’m a bit late in relaying my thoughts from Intel’s Developer Forum (IDF), but there was definitely some excitement around virtualization and high performance networking that I wanted to get the word out about!

In the past I’ve shared some details about SR-IOV and the advantages you can gain by being able to present virtual LAN hardware to each Virtual Machine (VM), effectively avoiding the Hypervisor when presenting virtual devices to each VM.  The advantage of being able to do this is clear:  the less interaction there is from the hypervisor in the networking stack, the less processing overhead is required for the system to process the data.

That’s all good because if you have a dual 10 Gigabit adapter, you can segregate those two physical pipes into perhaps 16 virtual pipes that get exposed to 16 VMs.  By segregating these LAN pipes at the hardware level with SR-IOV instead of using Hypervisor switching, the performance gains in both CPU utilization as well as maximum total throughput can be very large.  There were several demos at IDF with various configurations, but reductions in CPU utilization of 40% were possible coupled with dramatic improvement in throughput!

But there is unfortunately one minor complication that I didn’t mention in my last post on the topic of SR-IOV.  There is the little fact that when VMs move between physical boxes (a usage that is highly desired and commonplace these days) you run into some problems with this SR-IOV capability.  When the hypervisor owned the network hardware abstraction, the performance was worse, but the functionality was better because you could seamlessly migrate from one box to another and the virtualization application would handle the details.  But with SR-IOV, a new layer needs to be added so that the direct hardware connection between the VM and the LAN hardware can be moved to a new box.

The really exciting part of IDF demos that I saw was the demonstration not just of the SR-IOV functionality on multiple hardware and virtualization configurations, but that these demonstrations also showed updated software from two virtualization vendors allowing mobility of the VMs while supporting SR-IOV! 

There was a demo on Dell systems showing this fully functional SR-IOV implementation with Citrix’s Virtualization suite.  There were two separate demonstrations on Dell systems, with VMWare displaying their new Network Plug-In Architecture (NPIA) solution that allows for the migration of SR-IOV connected VMs seamlessly between servers.

For those hungry for more detail, I’ve included the three SR-IOV demonstration videos here:

The first is the Citrix demonstration on Dell and Intel hardware of SR-IOV with VM mobility:

 

The next two videos are demos on Dell and Intel hardware with VMWare and their NPIA software implementation.

 

Each virtualization demo shows the massive performance benefits under various workloads when moving from Hypervisor based LAN segregation to SR-IOV implementation.  But most importantly, each demonstration proves out the capability to migrate VMs between physical hardware.  The only system hardware requirement is that the server itself supports VT-d.  If the networking hardware in the newly migrated-to box supports SR-IOV you get better performance, and if not, the solution falls back on the legacy Hypervisor virtualization.  Backwards compatibility is maintained!
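
The fallback behavior described above can be sketched in a few lines. This is only a toy model of my reading of the demos, not how Citrix or VMWare implement it: the `Host` class, its field names, and the path labels are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of a migration target (not a vendor API)."""
    name: str
    supports_vt_d: bool
    nic_supports_sr_iov: bool


def network_path_after_migration(target: Host) -> str:
    """Pick the network path a VM would use on the target host.

    Assumption: VT-d on the server is the one hard requirement, and
    SR-IOV on the NIC is optional with a hypervisor-switched fallback.
    """
    if not target.supports_vt_d:
        raise ValueError(f"{target.name}: VT-d is required for this setup")
    if target.nic_supports_sr_iov:
        return "sr-iov"             # direct hardware path, better performance
    return "hypervisor-switch"      # legacy path, still fully functional


new_box = Host("rack2-node7", supports_vt_d=True, nic_supports_sr_iov=True)
old_box = Host("rack1-node3", supports_vt_d=True, nic_supports_sr_iov=False)
print(network_path_after_migration(new_box))
print(network_path_after_migration(old_box))
```

The point of the sketch is simply that the VM keeps working either way; only the performance changes with the target hardware.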

I didn’t get firm details on when this full support for SR-IOV and migration will be available in Citrix and VMWare’s releases, but the demos looked pretty clean, and hopefully these suites will be available soon with this new functionality.  The LAN and Server hardware ecosystems are ready today, and it looks like the software vendors are just around the corner.  Virtualization momentum continues!

While virtualization was the big takeaway for me from IDF, there were also several other interesting demos for us networking hounds.  I’ve linked a couple videos of them below for anyone still thirsting for more of the latest networking technology and performance details!

The first video is a demonstration of Intel’s 82599 10 Gigabit Ethernet-based adapter card with Fiber Channel over Ethernet (FCoE) support.  Storage and Ethernet together at last!

The second video is a demonstration of Intel’s NetEffect 10 Gigabit Ethernet card publishing 1 million messages per second in a simulated NYSE floor trading scenario.  Oh yeah, with only 35 µs of latency.  That is fast.

So although I’m writing this two weeks after IDF, I hope some of you got a little taste of the networking excitement that took place.   Industry wide, hardware and software vendors alike are delivering ultra-high-performance, low-latency applications for the financial services industry, as well as mainstream performance increases for virtualization.  The performance and technology beat moves forward.  Exciting times!

--

Ben Hacker


Intel has just launched the Intel® Ethernet Server Adapter X520 family.  These NICs are Intel’s first 10 Gigabit adapter products that support “pluggable” optics.  This additional configuration option gives IT users a great deal more flexibility in how they deploy 10 Gigabit in their servers and datacenters.

 

The X520 family of adapters supports bailed optics, which allows different kinds of optics to be removed or added, or the adapter to be run with no optics at all.  For previous 10 Gigabit products, if you wanted 10 Gigabit SR Fiber connectivity, you had to purchase a 10 Gigabit SR adapter.  But with the pluggable X520 adapter family, you can support SR, LR, or simply an SFP+ direct attach cable via the same card by simply removing / exchanging the optics.

 

 

X520Images.jpg

 

 

With the X520 you can still buy an SR or LR fiber configured adapter, but you can also switch back and forth after purchase by ordering only the new optics that you want to support (not a whole new adapter).  In the case of the Direct Attach adapter, which has an SFP+ cage but comes without optics inserted, you can use Twin-Ax copper cables for in-rack 10 Gigabit runs of less than 7m, and you can also upgrade the adapter later with SR or LR optics as the needs for that particular adapter change. You can also mix and match optics modules in a dual-port adapter, meaning you could have an LR module in one port and an SR module in the other. You could also throw a Twin-Ax cable into the mix.

 

The Intel® Ethernet optics modules for the X520 family of adapters also support both 1 Gigabit and 10 Gigabit speeds to help with backward compatibility – an industry first.

 

Finally, while this new pluggable capability of Intel 10 Gigabit adapters adds a bit more usage flexibility from an IT perspective, the performance capabilities and advanced features for the datacenter I’ve discussed over the past 18 months are also supported.  The X520 is based on the Intel® 82599 10 Gigabit Ethernet Controller, so the end result is a flexible product that can help unleash server IO performance whether FCoE, iSCSI, Virtualization, Security, or just raw IO performance.  Regardless of your 10 Gigabit needs, the X520 probably has what your Server environment needs.

 

--

Ben Hacker


I’ve spent a fair number of words in the past on the benefits of 10 Gigabit and what it means for the server market.  Through the addition of FCoE and DataCenter Ethernet as well as advanced virtualization features 10 Gigabit seems likely to have its big day in the sun here pretty soon.  But the question is still “When”?


While the proof is ultimately in the raw volumes of 10 Gigabit that ship, and the number of IT users who utilize the higher performance, there are some key reasons to think that 10 Gigabit momentum is accelerating beyond just the numbers* below:

10 Gigabit Forecast.JPG

 

Over the past year, there has been a raft of new 10 Gigabit switch announcements** from Cisco (Nexus 5k/7k), Arista (7100, 7124, and 7148), BNT (G8100), Extreme Networks (Summit X650), Juniper (EX8200), Voltaire (8500) and many others that have increased the choice, and the density of 10 Gigabit switches in the marketplace.   There are now many 48+ port 10 Gigabit switches available and even a few 200+ port models.  Also, the improved density and feature set of certain switches (such as Voltaire’s 280+ port 8500 series switch) provide a path for 10 Gigabit’s ascent into the clustering market by improving port density and latency for clustering applications.

 

Broad acceptance of SFP+ has also helped to drive a rapid improvement in price, density, and power.  SFP+ provides a smaller form factor standard for optics, as well as a standard connection methodology to connect directly from switch to NIC via a Twin-Ax copper (read: ‘low cost’) cabling solution inside the rack (up to 10m).  The widespread adoption of SFP+ form factors has dramatically reduced the entry level price points for switches, and through the ‘direct attach’ copper connection capability it has also reduced the overall cost for initial and ongoing deployments of 10 Gigabit by providing a lower cost bridge to optical or full 10GBase-T support.

 

There are also a few data points to suggest that the Server side cost for 10 Gigabit will be dropping fast going forward.  As power for 10GBase-T continues to drop quickly, more and more Server vendors are looking at the options available to embed 10 Gigabit directly into their systems.  This will not likely be a 2009 story, but it is approaching quickly.  Additionally, the acceptance of SFP+ form factors for optics/direct attach cabling has provided a path that some Server vendors may use to design 10 Gigabit down on motherboards without adding the extra cost and power of a 10GBase-T solution.  This looks like a likely near-term path given that the solution power and design are robust and ready for motherboard based designs today.

 

Finally, the continued cost reduction provides an attractive long term value for standards based 10 Gigabit Ethernet.  There is a clear indication of downward pressure on 10GbE prices already today.  We will see 10 Gigabit pricing follow a price curve similar to the one we saw with Single Gigabit.  This is evidenced in the recent pricing announcement where Intel reduced the price of its single port 10GBASE-T adapter by 40%, from $999 to $599.  The competitive economics of standards based hardware will continue to drive down 10 Gigabit prices even further and we will see 10GBASE-T pricing below $500 / port in the near future. Once it gets on the motherboard, prices will drop even further.


Overall, the power, density, latency, and cost of 10 Gigabit are all improving at a rapid rate.  Form factor flexibility coupled with a wide array of switch and NIC vendors in the marketplace will provide choice and low cost for IT departments while virtualization and convergence in the datacenter and elsewhere continue to provide demands for ever greater I/O bandwidth and performance.

 

--

Ben Hacker

 

 

 

* Dell’Oro forecasts as of Q1 ’09


Manageability, security, and performance are always hot topics in the computing world. At times the focus shifts between them as needs and technologies change, but these areas have remained key vectors of enterprise computing for a long time. However, in many cases these usability vectors conflict with each other. IT managers’ desire for security and manageability may lead to extra applications and process hoops for end users, which can decrease performance. Increasing the ability to remotely and seamlessly manage a PC almost always adds security headaches that must be dealt with. Enterprise IT design is always about finding the right tradeoffs and improving the process over time.

 

One technology that has been around for quite a while to help improve security is IPsec (aka, IP Security). IPsec is a set of protocols for securing and authenticating IP packets by encrypting their contents in an end-to-end manner. Most people are familiar with IPsec as the underlying technology for facilitating Virtual Private Network (VPN) connections from the outside of an organization’s LAN to inside the network. IPsec secures the Internet to Intranet tunnel in this case.
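
To make the "authenticating" half of that description a little more concrete, here is a minimal sketch of a packet integrity check using HMAC-SHA-1 truncated to 96 bits, one of the integrity algorithms defined for use with IPsec. It is only an illustration: the key and packet bytes are made up, the function names are mine, and a real IPsec stack computes this over specific header and payload fields with negotiated keys, alongside the encryption step that is not shown here.

```python
import hashlib
import hmac

ICV_LEN = 12  # HMAC-SHA-1-96 keeps the first 96 bits (12 bytes) of the digest


def make_icv(key: bytes, packet: bytes) -> bytes:
    """Compute a truncated HMAC-SHA-1 integrity check value for a packet."""
    return hmac.new(key, packet, hashlib.sha1).digest()[:ICV_LEN]


def verify_icv(key: bytes, packet: bytes, icv: bytes) -> bool:
    """Constant-time comparison of the received ICV with a recomputed one."""
    return hmac.compare_digest(make_icv(key, packet), icv)


# Illustrative values only; real keys come from the key exchange.
shared_key = b"k" * 20
packet = b"example payload bytes"
icv = make_icv(shared_key, packet)

print(verify_icv(shared_key, packet, icv))               # untouched packet
print(verify_icv(shared_key, packet + b"tamper", icv))   # modified packet
```

A receiver that shares the key can detect any modification in transit, which is what makes the end-to-end model work without trusting the network in between.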

 

Using IPsec to set up a VPN can be a bit of a pain because you have to key in an access code or password and it’s far from seamless. On the IT manager’s side, this setup does not eliminate security problems because the VPN tunnel only secures the network pipe once it is established. There is nothing stopping the end user from browsing the web on their work computer or somehow exposing it to a virus before connecting to the corporate network in a secured way. This has a few downsides from a manageability perspective. First, the security is compromised because of potential infections transferred from an insecure network to the corporate network due to lack of continuously active protection. Second, the manageability of this solution is lacking because enterprise systems outside of the corporate network are not manageable until the user manually connects to the VPN gateway.

 

So while using IPsec to help create a VPN connection provides functionality that is secure and provides outside-in access to the corporate network, it requires additional configuration by the end user, is not seamless for either user or administrator, and is generally provided by an additional application running on the system. This is all non-optimal.

 

Enter Microsoft* DirectAccess*. In Windows* Server 2008 R2 for servers and Windows* 7* for clients, Microsoft* will be supporting a seamless IPsec support layer called DirectAccess*. What this will provide is the ability to integrate the encryption/authentication of IPsec directly into the Operating System so the end user connects securely, both outside and inside the corporate network, to the systems and applications they need via IPsec. Because this is integrated into the OS, the setup of the security and connection details is more seamless from both an IT person and end user perspective. Initial configuration is obviously required, and each IT organization must set up the security policies to their own specifications, but once that is done the system is up and running.

 

Microsoft*’s implementation of this functionality is at the OS level, so each application can have its own secure IPsec tunnel. This can provide secure access both outside and inside of the corporate network. Up until recently, using IPsec internally has not been of much focus, but recent estimates suggest 80% of successful attacks come from internal threats, so encrypting and authenticating internal data is now in focus for IT administrators. Microsoft* DirectAccess* allows for this new seamless security model.

 

Now this all sounds well and good… but what’s the catch? Well, a key angle here to note is that IPsec is a highly CPU intensive technology. Encryption and decryption of IP packets in real time can easily swamp a CPU core when attempting to push much more than a few hundred megabits of network data. For a typical end user system, a few megabits of data across a few IPsec connection applications will likely not cause much heartache, but for network servers that are hosting potentially thousands of simultaneous IPsec connections while trying to drive multiple Gigabits of I/O the performance results will be much more… uhh, what’s a nice way to say ‘unimpressive’?
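
A quick back-of-the-envelope model shows why software IPsec runs out of steam. The sketch below just divides a core's clock rate by a per-byte processing cost. The 60 cycles-per-byte figure and the 3 GHz clock are placeholders chosen for illustration, not measurements, and the function names are mine.

```python
def max_throughput_mbps(core_hz: float, cycles_per_byte: float) -> float:
    """Upper bound on what one core can push if every byte costs a fixed number of cycles."""
    bytes_per_second = core_hz / cycles_per_byte
    return bytes_per_second * 8 / 1e6


def cores_needed(target_gbps: float, core_hz: float, cycles_per_byte: float) -> float:
    """How many fully loaded cores it would take to reach a target line rate."""
    return target_gbps * 1000 / max_throughput_mbps(core_hz, cycles_per_byte)


# Placeholder inputs: a 3 GHz core and an assumed 60 cycles per byte for
# encryption plus authentication in software.
per_core = max_throughput_mbps(3e9, 60)
print(f"one core: ~{per_core:.0f} Mbit/s")
print(f"cores for 10 Gbit/s: ~{cores_needed(10, 3e9, 60):.0f}")
```

Plug in your own numbers; the shape of the result is what matters. Any per-byte cost in the tens of cycles lands a single core in the hundreds of megabits, and a multi-Gigabit server would burn most of its cores on crypto alone.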

 

In order to solve this issue, Intel networking products offload the computationally expensive encryption engine (AES-128) onto the LAN Controller while the IPsec configuration, management, policy creation, etc. all remain in the OS to keep usability simple. Intel offers both dual port 1 and 10 Gigabit networking solutions that support not only solid performance on standard networking workloads and advanced virtualization features, but also the ability to offload IPsec in hardware to improve system performance under large IPsec I/O workloads.

 

For companies looking to enable IPsec into their network environment using DirectAccess*, they have the potential to improve security, reduce complexity, and enhance manageability of their end clients. They just need to remember that in order to make this all work seamlessly on the server side without choking off processing performance, offloading the IPsec workloads to I/O hardware will be a requirement.

 

Intel® Ethernet® can deliver this support in adapter or down on motherboard form factors while supporting a wide range of Enterprise class performance and virtualization features. So is this a way to improve security and manageability without impacting performance? It seems that way to me.

-----


 

Ben Hacker

For more information on DirectAccess* -- http://www.microsoft.com/servers/directaccess.mspx


Sometimes the next step up is a big one. The Intel® Xeon® processor 5500 series (formerly codenamed “Nehalem”) is one of those kinds of steps.

Over the last few years 10 Gigabit has started to take off, but there have always been some negative mutterings: “Why do I need 10 Gigabit?”, “Why do we need this much bandwidth?” or “My server can’t support 10 Gigabit per second bidirectional traffic anyway.” Despite the volume of 10 Gigabit products shipped, there is still the reality that if you intend to use the entire 20 Gbps (10G both directions), or heaven forbid you try to use 40 Gbps with a dual port product, you will likely be sorely disappointed with the results.

 

The reason for this is simple. Most current mainstream servers and 10 Gigabit products don’t support the intense usage models needed to drive that much network I/O and they also don’t have the memory architecture to unleash the full potential of dual 10 Gigabit links.

 

Luckily, that all just changed with Intel® Xeon® processor 5500 series.

 

In addition to the great processing improvements that the Intel® Xeon® processor 5500 series brings to the table, Intel has also introduced our third generation 10 Gigabit product, the Intel® 82599 10 Gigabit Ethernet Controller. It provides two ports, along with new capabilities and enhancements to the 10 Gigabit product landscape that help unshackle the new processor from its predecessor’s network I/O handcuffs and unleash blazing performance in a variety of usage models. These improvements, coupled with the new architecture of the Xeon 5500, provide a symbiotic processor-networking combination that makes new usages possible and expands server and datacenter computing by a big leap… not just a baby step.

 

One of the key changes with Intel® Xeon® processor 5500 series architecture is a step function improvement in the internal system I/O. The new local memory controller design, faster cache architecture, and support for DDR3 help push Xeon 5500 to be able to support peak memory bandwidth of ~32 Gigabytes per second, per socket. In a dual socket system this provides for ~64 Gigabytes per second of bandwidth, which is dramatically more than the previous generation server configuration. In addition, the new Intel® QuickPath Interconnect (Intel® QPI) provides both faster inter-Processor communication and a faster path to the I/O hub. Finally, PCI Express* 2.0 I/O Bus support has been added to improve the entire data path from Processor to the 10 Gigabit Ethernet link.
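
For anyone who wants to see where a number like ~32 Gigabytes per second comes from, here is the arithmetic as a small sketch. It assumes three DDR3-1333 channels per socket, each 8 bytes wide; that configuration is my assumption for the peak case and is not spelled out above. The network side of the comparison assumes four 10 Gigabit ports running full duplex.

```python
def peak_memory_bandwidth_gbytes(channels: int,
                                 transfers_per_sec: float,
                                 bytes_per_transfer: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return channels * transfers_per_sec * bytes_per_transfer / 1e9


def network_demand_gbytes(ports: int, gbps_per_direction: float = 10.0) -> float:
    """Full-duplex network traffic expressed in GB/s."""
    return ports * gbps_per_direction * 2 / 8


# Assumed configuration: 3 channels of DDR3-1333 per socket.
per_socket = peak_memory_bandwidth_gbytes(channels=3, transfers_per_sec=1333e6)
dual_socket = 2 * per_socket
four_ports = network_demand_gbytes(ports=4)

print(f"per socket:  {per_socket:.1f} GB/s")
print(f"dual socket: {dual_socket:.1f} GB/s")
print(f"four full-duplex 10GbE ports: {four_ports:.1f} GB/s")
```

These are theoretical peaks, and every network byte typically crosses memory more than once, but the comparison shows why the new memory architecture leaves real headroom for multiple 10 Gigabit ports.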

 

Taken together, the above improvements are a performance game changer for 10 Gigabit Ethernet.

 

The chart below** shows the previous generation Intel® Xeon® paired with the previous generation Intel 10 Gigabit Ethernet Controller compared to the latest platform using the newest Intel silicon for both processor and networking. Not only is the performance better in 1-4 port configurations, but the performance scales dramatically better to above 50 Gigabits per second of total LAN throughput in a four port configuration vs. *just* 17 Gigabits on the previous generation! A complete platform architecture solution makes this huge improvement possible.

82599 + Xeon 5500.jpg

Now, it’s great that Intel® Xeon® processor 5500 series coupled with the Intel® 82599 10 Gigabit Ethernet can deliver such raw performance, but there is the forever nagging question of usage model. Luckily, the new headroom breathes new life into both Virtualization and storage over Ethernet usages (both of which I’ve talked about here and here) and provides new opportunities to more efficiently utilize your network link.

 

Intel® Xeon® processor 5500 series allows the vision of consolidation in the datacenter to scale new heights, increasing the number of Virtual Machines (VM) that can effectively live inside a single system enclosure. Each incremental VM will add additional network I/O that is already starting to exhaust a 4 or 8 port single gigabit interface configuration with today’s server capabilities. As more VMs are added in the Xeon 5500 generation, 10 Gigabit will no longer be seen as optional; it will be required. For its part, the Intel® 82599 10 Gigabit Ethernet Controller supports Intel® Virtualization Technology for Connectivity (Intel® VT-c) to improve overall system performance in virtualized server environments. Intel VT-c includes hardware optimizations that help reduce I/O bottlenecks, boost throughput and reduce latency. Components of Intel VT-c include VMDq and VMDc. VMDc consists of SR-IOV, which I’ve covered before, and the ability to support VM mobility, a critical usage model for a modern IT deployment. All together, server systems can support more VMs, more throughput, more flexibility and better performance in a datacenter environment.

 

Finally, the additional capabilities of the Intel® 82599 10 Gigabit Ethernet Controller product surrounding support for FCoE offloads and full support for the new Data Center Bridging (DCB) standards provide an opportunity for storage convergence over Ethernet in either a datacenter using a Fiber Channel SAN environment or an IT environment more focused on iSCSI. On the performance side of things, iSCSI acceleration along with FCoE data path offloads are supported in the Ethernet controller, and on the processor there is support for the CRC instruction set, which ensures iSCSI data integrity while minimizing processor overhead.

 

The ability to converge at least part of the additional storage infrastructure onto Ethernet is just another factor driving massively increased data rates over Ethernet… luckily, the Intel® Xeon® processor 5500 series and the Intel® 82599 10 Gigabit Ethernet Controller solutions are up to the task.

Over the past few days, there has been a lot of noise around Intel® Xeon® processor 5500 series and the many other platform components that help it shine its brightest. Improved processing power, memory controller bandwidth, the new Intel® QPI interconnect that replaces the FSB, and improved 10 Gigabit networking all converge to provide a fantastic performance, convergence, scalability and power story. Intel’s strong history in the server and processor markets, coupled with over 25 years in Ethernet, makes this latest release a natural evolution of technology. Together these capabilities, along with the improved 10 Gigabit features and performance, are helping to transform the datacenter. It will be denser, more power efficient, more performant, and more consolidated with capabilities like FCoE and iSCSI.

 

As for “Why do I need 10 Gigabit?” We have the answer, and it’s the new Xeon®.

 

Ben Hacker

-----

** Source. Intel. Mar 2009. Up to 2.5x performance compared to Intel® Xeon® processor 5300 series. Performance result of a bandwidth intensive network benchmark (IxChariot). Network throughput was measured on 64KB I/O size transfers between the test system and multiple network targets. Intel pre-production system with two Quad-Core Intel® Xeon® processor 5500 series CPUs (2.93 GHz), 12 GB memory (6 x2GB DDR3 - 1066MHz) vs. Intel Production system with two Intel® Xeon® processors X5365 (3.0GHz, 1333MHz FSB), 8 GB memory (8 x 1 GB DDR 2 - 667). Windows Server 2008, stock unmodified installation.


The storage world is getting quite a bit of press lately. As IT organizations seek to reduce costs, streamline their management tasks, save power, increase capacity and access to their data and improve security, new types of storage solutions are being introduced continuously. The growth rate of storage revenue is blistering, and the number of companies scrambling for a piece of the pie is impressive.

 

As with any hot area of technology, there are a lot of technological developments to keep track of, and of course there are disputes about which new trends will maintain dominance, and which technologies will end up carrying the day. I’ve written in the past about some of the potential improvements coming along with the introduction of the new Data Center Ethernet (DCE) [aka, Data Center Bridging (DCB)] IEEE standard improvements. These layer 2 improvements coupled with Fiber Channel over Ethernet (FCoE) can bring to the storage world a converged story for networking. However, noticeably missing from my previous post was a discussion of iSCSI. The iSCSI standard has been around since 2003, and many vendors including Intel have been selling adapters with various levels of support for iSCSI for quite some time. Microsoft has an iSCSI initiator built into their server OS stack, and VMWare also supports iSCSI in their virtualized system environments. This isn’t a comprehensive list, but just a summary to show that iSCSI is ‘real’.

 

So the question is how does FCoE fit in with iSCSI? Are they competing technologies, or complementary, or neither? Did I leave iSCSI out of my earlier discussion because I believe the technology is doomed? Hardly…

 

The basic case for FCoE is pretty straightforward. If you have an FC SAN in your datacenter, as you roll out a new rack (for instance) you can eliminate the top-of-rack FC switch and also the need for two adapters in each of the servers (one for Ethernet, and one for FC). There are additional potential savings as you move FCoE further and further into the network, but I think to keep the discussion simple, the key thing to note is “if you have an FC SAN…”; then you care about FCoE and can potentially gain from it. If you don’t, well then FCoE will not necessarily be your technology of choice. FCoE is not an Ethernet capability that will drive you to use Fiber Channel, but it may make expanding and growing your FC network cheaper, lower power, and more convenient, especially with virtualized blade servers.

 

To elaborate, today the iSCSI SAN market is growing quickly, but it is growing in areas where FC is not generally a popular solution. It is usually chosen where storage over IP on existing Ethernet is viewed as acceptable given the cost tradeoffs. In contrast, FC is deployed where its performance and high reliability justify the specialized IT knowledge and higher cost it requires; this is generally only the case in fairly large datacenters. iSCSI on the other hand tends to be used in smaller enterprises and potentially in Tier 2/3 datacenters. The question of iSCSI or FC/FCoE is likely not getting decided by the network card in the host initiator. It is decided by organizational needs, and the cost structure / ROI of the storage target deployment including IT personnel / expertise. If an organization has FC deployed, FCoE makes sense. If you are designing a new datacenter, FCoE isn’t exactly going to drive your decision (although you may still want to use it). The better way to consider FCoE is that it is another storage over Ethernet option, specifically for FC (to add to iSCSI and NAS). So decide which is the best storage system solution for your Data Center and know that you can connect any of these solutions over Ethernet to your servers.

 

Finally, as the improvements in Ethernet proliferate and these DCE capabilities become mainstream, the landscape may change for iSCSI and FC deployments, but the simple fact is that in most cases, the main commonality is that Ethernet is the chosen fabric to deliver storage over the network. Whether the storage protocol that is used is iSCSI or FCoE will be determined by a lot of things within an individual company’s IT shop. The idea that iSCSI and FCoE are a pair of competitive technologies has been raised in various places around the tech industry, but in reality, these arguments are driven more by people who have some bias toward one solution or the other than by end user demands for storage. Both technologies have their place, and both will be supported in varying ways by Intel as well as other ecosystem vendors in the future.

 

So iSCSI or FCoE? I think the answer is: Both.

 

As we move to 10 Gigabit networking, IT decision makers will have something great with both of these capabilities: convergence and choice. Only Ethernet can deliver this.

 

Ben Hacker


It is inevitable that as processing power increases, and cores multiply on servers, new bottlenecks will be found. If you make a processor that can do more, you will increase your system performance until you find where the next area of focus is. I’ve talked on several occasions about the system bottlenecks that are now apparent when using the latest Xeon processors or when many virtual machines are loaded onto a single physical box, and what can be done to improve them. As you may have noticed, a key laggard that will frequently be found in modern multi-processor servers is the network connection. Moving to 10 Gigabit clearly alleviates this issue in many high horsepower systems, and the addition of advanced features like VMDq and SR-IOV will continue to push the envelope on Network I/O in virtualized environments.

 

However, it is clear that for some applications, especially in the HPC market and certain applications in the Financial Service industry, an even higher-performing I/O solution than standard 10 Gigabit will be needed when latency is of the utmost importance. A solution that has existed in this space for a few years, but is gaining momentum, is what is referred to as Remote Direct Memory Access (RDMA), which lets one server directly place information into the memory of another server, essentially bypassing the kernel and networking software stack.

 

At first blush, this solution may sound odd. Why bypass the entire stack; doesn’t this complicate things? Well, it certainly does add some complications by requiring a modified OS stack and support on both sides of the network link, but there are some telling details about where real world latencies come from that make this methodology attractive in certain circumstances.

 

If you look at the typical breakdown of CPU utilization in the context of processing networking traffic, you see the workload consumed by buffer copies, application context switching and some TCP/IP processing. If you look at this (albeit, in a simplified way) visually, there is a vertical stack of tasks that need to take place in the server:

iWARP Stack Before1.JPG

There are application buffer copies from the app to kernel which then get handled by the NIC. Additionally, there is the TCP/IP processing that takes place within the OS and is a large consumer of CPU cycles, and there are also I/O commands that add additional latencies into the communication process.

So the question is what to do to help reduce these latencies? RDMA has been adapted for standard Ethernet via the IETF Internet Wide Area RDMA Protocol (iWARP). Until the iWARP specification, RDMA had been a capability only seen using InfiniBand networking. By porting the goodness of RDMA to Ethernet, iWARP offers the promise of ultra low latency, but with all the benefits of standard Ethernet.


With Intel’s acquisition in October of NetEffect, we now have one of the leading iWARP product lines for 10 Gigabit Ethernet. Within these products, the iWARP processing engine can help eliminate some of the key bottlenecks described above, and provide very low latency and high performance networking for even the most demanding HPC applications.


The first item that an iWARP engine can address is offloading some of the TCP processing, which can bog down the CPU as bandwidth loads increase. The Intel NetEffect iWARP solution can offload this TCP processing by handling the sequencing, payload reassembly, and buffer management in dedicated hardware.


The next item that iWARP addresses is the extra copies that need to be done by the system when transferring data. iWARP extensions for RDMA and Direct Data Placement (DDP) allow the iWARP engine to tag the data with the necessary application buffer information and place the payload directly in the target Server’s memory. This eliminates the delays associated with memory copies by moving to a so called ‘zero copy’ model.
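To make the direct data placement idea a little more concrete, here is a toy Python model. It is only a sketch: it is not the NetEffect engine or any real RDMA API, and the class name, method names, and tag value are made up for illustration. The receiver registers an application buffer under a tag, and each incoming segment carries that tag plus an offset, so the payload is written straight into the application buffer without an intermediate staging copy.

```python
class ToyPlacementEngine:
    """Toy model of tagged direct data placement (all names are hypothetical)."""

    def __init__(self):
        # tag -> memoryview over a registered application buffer
        self._buffers = {}

    def register(self, tag, buffer):
        # The application advertises a buffer ahead of time under a tag.
        self._buffers[tag] = memoryview(buffer)

    def place(self, tag, offset, payload):
        # Write the payload directly at the tagged offset; no staging buffer.
        target = self._buffers[tag]
        if offset < 0 or offset + len(payload) > len(target):
            raise ValueError("segment falls outside the registered buffer")
        target[offset:offset + len(payload)] = payload


app_buffer = bytearray(16)
engine = ToyPlacementEngine()
engine.register(tag=7, buffer=app_buffer)

# Segments may arrive out of order; the offset says where each one belongs.
engine.place(tag=7, offset=8, payload=b"WORLD!!!")
engine.place(tag=7, offset=0, payload=b"HELLO...")
print(bytes(app_buffer))
```

Because every segment names its own destination, the receiver does not need to reassemble the data in a separate buffer before handing it to the application.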


Finally, iWARP extensions also implement user-level direct access which allows a user-space application to post commands directly to the network adapter without having to make latency consuming calls to the OS for requests. This along with the other pieces of iWARP provides dramatically reduced latency and increased performance.


The diagram below summarizes what the new system stack looks like after the implementation of iWARP. Much simpler and with much lower latency.


[Figure: iWARP Stack After2.JPG (server networking stack after iWARP)]



One obvious issue that is raised after thinking about the above diagram is what modifications need to be made at the OS or application level. Clearly applications need to be modified to be iWARP compatible and this can be a time consuming process. This is one of the reasons that this solution has been slow to gain adoption. However, there is an Open Fabrics Alliance (OFA) which is working on a unified stack for RDMA for the open source community. The OFA has an Open Fabrics Enterprise Distribution (OFED) release which is a set of libraries and code that can unify different solutions that use RDMA. There have been several OFED releases so far and further plans are coming to align and expand various RDMA capabilities. In this way, applications that run using the OFED stack under Infiniband can be run without any changes over iWARP and Ethernet.


As more applications get modified to support the feature set of iWARP RDMA, there will be a wider understanding and acceptance in the HPC community of the incremental performance, cost, and standards advantages of using Ethernet with RDMA for the most performance sensitive applications. Moving from standard non-iWARP Ethernet to iWARP enabled Ethernet provides a more bounded latency and reduces it from ~14 us to <<10 us… now that is fast.


We live in exciting times!


Ben Hacker

--

For those looking for some more detail, there is a nice whitepaper on iWARP located here.

For those interested in learning more about the Open Fabrics Alliance (OFA) please see here.


In my last I/O Virtualization blog, earlier this year, I discussed a fundamental problem with virtualizing I/O and one solution that Intel and VMware have teamed up to deliver: VMDq and VMware NetQueue. These queuing technologies together can help to offload some of the virtual switching (vswitch) functionality from the hypervisor to the network adapter. VMDq provides a method for the Hypervisor to do less work, and also provides a way to share the I/O processing across multiple cores, improving system bandwidth and more fully utilizing its processing power.

 

Now, VMDq and NetQueue are a great solution together that scale well, support VMotion, and are relatively simple to manage. However, is there a way to get even better performance from your Virtualized I/O?

 

What if there was a way to completely cut the Hypervisor software switch out of the picture and remove the associated latency and CPU overhead? The ideal scenario for optimum performance is for the VM to communicate directly with LAN hardware itself, and bypass the vswitch completely. For example, you could have a single 10 Gigabit port expose multiple LAN interfaces at the hardware level (on the PCI-e bus), and each VM could be assigned directly to a hardware interface. Alternatively, you have multiple physical NICs in the system that could be directly assigned to a given VM. Below is a diagram that summarizes the 3 main variations of attachment for I/O in a virtualized server. Below we will get into more detail to put the diagram in context.

 

 

 

 

In the diagram above, the left side represents an implementation of a virtualized environment with a standard I/O setup using the Hypervisor vswitch and VMDq for I/O performance enhancement. In the middle is an example of direct I/O assignment between a single physical LAN interface and a single Virtual Machine. The implementation on the right is showing what is possible with a single NIC that supports SR-IOV (we'll discuss this later) for a fuller, hardware level I/O virtualization. After taking a moment to understand the basic differences in these three implementations, there are immediately a few obvious benefits here for bypassing the Hypervisor vswitch and going with either of the two directly assigned designs...

 

 

By allowing the Virtual Machines to talk directly to the networking hardware, throughput, latency, and CPU utilization of the I/O traffic processing will be greatly improved. So the question is, "why hasn't this been done before?" Well, the answer is that there are several gotchas to make this implementation work well...

 

 

First, in order to implement this properly, the LAN hardware needs to support some physical capabilities to successfully route the networking traffic in this kind of virtualized system. In addition, the server hardware itself must support VT-d so that the mapping between the Virtual Machine's PCI-e (I/O data) memory addresses and the system's physical memory addresses is correlated correctly.

 

 

Finally, and this is a big one, this kind of implementation, while very good for performance, just happens to break the ability to move a VM from one physical server to another (VMware VMotion). This is one of the more widely used aspects of VMware's software and has been utilized heavily by most IT shops. Seamless VMotion support is critical for making any I/O performance improvement deployable in the real world.

 

 

Now, if you stop at the 2nd diagram, and use separate NICs for each VM, you will also miss out on a few key advantages of new Ethernet capabilities. You won't be able to allocate your overall bandwidth between your VMs (each VM will get a single Gig or 10Gig port), and more importantly, you won't be able to effectively share higher bandwidth pipes. For example, a server with a few 10 Gigabit ports may have enough I/O horsepower to handle traffic for 30 VMs, but there would be no way to assign only a portion of the bandwidth of the pipe to an individual VM.

 

 

Additionally, the LAN hardware needs to support the ability for each virtual function of the LAN device to be able to support bandwidth segregation (think QoS per VM) and the ability to support multiple queues and traffic classes per LAN virtual function. This last piece is necessary for those who remember the discussion on Fiber Channel over Ethernet (FCoE), as the ability to support multiple traffic classes, and dedicated bandwidth links, are key needs for the storage over Ethernet market.
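As a back-of-the-envelope illustration of per-VM bandwidth segregation, the sketch below splits one 10 Gigabit port between virtual functions in proportion to a weight. The function name, the VM names, and the weights are invented for this example; real hardware would enforce shares with its own rate limiters and schedulers, which are not modelled here.

```python
def allocate_bandwidth(port_gbps, weights):
    """Split a port's bandwidth between virtual functions in proportion to their weights."""
    total = sum(weights.values())
    return {vf: port_gbps * weight / total for vf, weight in weights.items()}


# Hypothetical VMs sharing a single 10 Gigabit port.
shares = allocate_bandwidth(10.0, {"vm-storage": 5, "vm-web": 3, "vm-backup": 2})
for vf, gbps in shares.items():
    print(f"{vf}: {gbps:.1f} Gbps")
```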

 

 

Now that I've set up what is needed to make this directly assigned virtualized I/O environment work, and called out the potential problems, you don't need to worry; I won't throw cold water on this idea. In fact, most of the pieces are in place today and there is already work being done to complete the solution as we speak.

 

 

First, Intel network adapters now support some fancy hardware capabilities related to virtualization. In addition to all the hooks for VMDq, our newest NICs support PCI-SIG SR-IOV (I know... technologists love acronyms) which provides the ability to virtualize the LAN at the lowest hardware level. The networking hardware also supports some smart logic to be able to function properly in a virtualized system. For example, VM to VM communication in the same server must be looped back before it gets to the wire, or the switch connected to the machine won't know how to route the packet. This is all taken care of in the LAN hardware. And of course, all the support for bandwidth segregation, and support for multiple queues and traffic classes is there as well to make sure Storage and other QoS sensitive applications are still going to work well.
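The VM to VM loopback decision described above can be sketched in a few lines of Python. This is a toy model, not how any particular NIC implements it: the function name, the MAC addresses, and the returned strings are assumptions made up for the example. The only point is that a frame whose destination belongs to another virtual function on the same adapter is turned around inside the adapter instead of being put on the wire.

```python
def forward(dst_mac, local_vf_macs):
    """Decide where a transmitted frame goes (toy model of the adapter's internal switch)."""
    if dst_mac in local_vf_macs:
        # Destination is another VM on this adapter: loop back internally.
        return "loopback to VF %d" % local_vf_macs[dst_mac]
    # Destination is somewhere else on the network.
    return "send to wire"


# Hypothetical MAC address -> virtual function index table.
local_vf_macs = {"02:00:00:00:00:01": 0, "02:00:00:00:00:02": 1}

print(forward("02:00:00:00:00:02", local_vf_macs))
print(forward("02:00:00:00:00:99", local_vf_macs))
```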

 

 

As for VT-d support, Intel platforms now come with this basically standard, so there is no issue there. But the last, most important piece is the ability for an individual VM to be moved between physical servers while still being able to ‘renegotiate' with its physical network connection. The ability to do this is under development by Intel, VMware and others in the industry, and the end goal is to have an architectural framework in place to make this kind of handoff seamless from a hardware and software perspective.

 

 

This architectural framework will be the topic of a future post, as I think I've used up all the lines I can before I start putting my readers to sleep. Until next time!

 

 

Ben Hacker

 

 


If there is one thing that has stayed consistent in the computing industry over time, it's that performance doesn't stand still. As our computing platform processing, I/O, and memory speeds continue to accelerate, it is important to remember a little thing called latency.

 

Often in the Ethernet world throughput is the 1st and last performance metric of choice. 1 Gigabit and 10 Gigabit are the numbers that inspire thoughts of increased performance, and improved computing power. However, it's important to note that, in many applications, the transaction latency over the wire is really the key to unlocking high performance at the system level. One of the primary reasons that some organizations have turned to Infiniband and other I/O technologies for HPC and clustering in the past has to do with their desire to achieve very low latencies, not necessarily increased throughput. If you look at a historical standard Gigabit Ethernet connection, you may see latencies that are around 125μs. This may have been ok in the past, but as improvements at the application level as well as in the system hardware and CPU take hold, legacy Ethernet won't be good enough for HPC and clustering environments.

 

 

The interesting, and often overlooked, fact with Ethernet is that the latency characteristics are improving as the industry moves from 1 Gigabit to 10 Gigabit. The faster throughput on the wire comes along with lower latency to some extent, but in addition, there have been several improvements in interrupt handling that drastically improve overall latencies when comparing legacy 1 Gigabit to 10 Gigabit. With a basic 1st generation Intel® 10 Gigabit CX4 card you can now see latencies approach 25μs without any special tuning.

 

 

What's even better is that Intel's 10 Gigabit networking silicon also has further enhancements for improving latency by introducing some new specialized Low Latency Interrupt (LLI) filters in the silicon. These filters provide the hardware with a quicker reaction time to network packets that meet certain customizable criteria. The filters can be tuned to have a rapid response to certain packet and traffic types. With these kinds of LLI filters in place, latencies can be reduced further by another ~50% to ~14μs.
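For readers who like to see the arithmetic, here is a small Python calculation using only the rounded latency figures quoted in this post (125μs, 25μs, and 14μs). The labels and the helper name are just for this sketch.

```python
def reduction_pct(before_us, after_us):
    """Percentage by which latency drops going from before_us to after_us."""
    return 100.0 * (before_us - after_us) / before_us


# Rounded figures quoted in the text, in microseconds.
steps = [
    ("legacy 1 Gigabit", 125.0),
    ("10 Gigabit CX4, untuned", 25.0),
    ("10 Gigabit with LLI filters", 14.0),
]

for (prev_name, prev_us), (name, cur_us) in zip(steps, steps[1:]):
    print(f"{prev_name} -> {name}: {reduction_pct(prev_us, cur_us):.0f}% lower")
```

On these rounded numbers the first step is an 80% reduction and the LLI step works out to 44%, which is in the neighborhood of the ~50% quoted above.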

 

 

Going forward with 10 Gigabit there are new technologies and designs that can help push latency even lower to the sub-10μs threshold to keep Ethernet very competitive as a fabric not only from a cost and throughput perspective, but also from the perspective of latency.

 

 

And while lower latency is certainly important, the last piece that was really missing from the Ethernet performance puzzle was not just low latency, but deterministically low latency. The key is that the worst case packet latencies are relevant and very important for many applications. With application thread affinitization, the individual data thread can be piped directly between a network queue and a CPU core. By more evenly distributing the networking workload between CPU cores in a predictable fashion, you get a deterministic kind of latency that does not stray far from the average, assuming CPU cores do not get oversubscribed. Average latency of ~14μs is good, but the fact that you can get this with reasonable determinism is a key for many applications and usages.
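The application side of affinitization can be sketched with Python's standard library. This is only a minimal illustration under a few assumptions: it uses `os.sched_setaffinity`, which Python only provides on some Unix platforms such as Linux, the helper names are made up, and it covers the CPU side only. Steering a particular network queue's interrupts to the same core is a separate driver and OS configuration step that is not shown.

```python
import os


def first_allowed_core():
    """Pick one core this process may run on, or None if the platform lacks the call."""
    if not hasattr(os, "sched_getaffinity"):
        return None
    return min(os.sched_getaffinity(0))  # 0 means "the calling process"


def pin_to_core(core_id):
    """Pin the calling process to a single CPU core.

    Returns the affinity set reported afterwards, or None where
    os.sched_setaffinity is not available.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)
    if core_id not in allowed:
        raise ValueError(f"core {core_id} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)


core = first_allowed_core()
if core is not None:
    print("now restricted to cores:", pin_to_core(core))
```

Once the thread that consumes a given traffic stream always runs on the same core, the scheduler stops moving it around, which is one ingredient of the predictable latency described above.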

 

 

Now, lower, deterministic latency is not just a theoretical benefit for certain niche applications. Decreasing latency and improving overall latency characteristics while increasing throughput directly benefits the transaction rates that can be achieved with real world applications. One example of the improved performance is the latest Reuters Market Data System (RMDS) benchmark done by STACResearch on the 4-way Intel® Xeon E7450 (Dunnington) using the Intel® 82598EB 10 Gigabit AT Dual Port networking adapter. The testing showed the highest point-to-point server throughput to date on a single server in testing done by STAC, and total updates per second reached over 15 million. Financial Service industry administrators: I can see you drooling...

 

 

Latency and throughput numbers are great to talk about, but at the end of the day, real world application performance on real systems is the key. While there will always be a small subset of the high end server market that needs the absolute lowest latencies provided by Infiniband, 10 Gigabit Ethernet is gaining ground while maintaining its place as the default fabric of choice for multiple applications and traffic types. I believe the best is yet to come as newer, faster, and more responsive technologies continue to roll out.

 

 

Ben Hacker


Ethernet has been around a long time. It is a highly reliable and trusted means for interconnecting computing nodes, and above that, it has generally been the most commoditized (read: lowest cost) form of interconnect for quite some time. Broad deployment, administrator trust, and low cost have kept Ethernet as the mainstream fabric for LAN traffic for a long time.

 

However, despite Ethernet's strong connectivity credentials, it still comes up short in certain applications. Ethernet is what is referred to as a ‘best effort' network. This simply means that in the real world, you will generally get pretty good performance (throughput, latency, lack of dropped packets, etc), but from time to time when there is congestion, packet drops and performance degradation can be quite a nuisance. For many applications, this doesn't matter. If you are using email, browsing the web, or transferring files to a shared drive, the only thing you will notice is a decrease in performance, but everything will still ‘work', and transfer properly. For some applications like storage though, this non-deterministic performance is unacceptable. If packets are dropped, or arrive out of order, storage applications have a nasty tendency to hang or crash.

 

Because of this limitation of the standard, there have been separate fabrics used for Storage Area Networks (SANs) for quite a while. One of the main fabrics developed and used for high performance SANs is known as Fiber Channel. In order to create a Fiber Channel network, a server and storage target need to support a Fiber Channel Host Bus Adapter (FC HBA) to communicate via the Fiber Channel protocol. In addition, the switches that connect the Fiber Channel infrastructure must also be dedicated Fiber Channel switches; a standard Ethernet router cannot be used.

 

Once in place, this SAN architecture provides a very high performance, high reliability network that is ideal (and required) for high end storage traffic, but it comes at a cost:

 

1) Fiber Channel HBAs are generally more expensive than their Ethernet counterparts.

 

2) You have to have a separate fabric in your network which also adds to your infrastructure (switch costs, and cabling costs) as well as complicates IT management.

 

3) Servers connected to the SAN now need to have an Ethernet adapter AND a Fiber Channel adapter.

 

The upside to the additional cost and complexity is of course better performance, but the question has always been "Is there a better way?"

 

I believe there is a better way, and that Fiber Channel over Ethernet (FCoE) (and importantly, the standards in IEEE that are making it possible) seems to be the logical path to solving the issue of lossless performance on Ethernet, while maintaining Ethernet's historical core cost advantages.

 

‘Best Effort' is not good enough:

The bottom line for today's Ethernet is that it simply can't provide the ‘lossless' behavior that storage traffic demands; but this fact is changing. Below I will summarize at a high level some of the standards being developed in IEEE to improve the performance of Ethernet for storage applications, and how they help to mend some of the issues with Ethernet and how that helps to enable FCoE.

 

Bandwidth Sharing, Priority Flow Control and Pause:

This capability offers a method to assign priorities to different Ethernet traffic classes. From there, when congestion becomes an issue, traffic can be ‘paused' on a per-priority basis; allowing the lower priority traffic to be halted temporarily while keeping the top priority traffic like storage running smoothly. This per-priority pause capability is really the first basic step in allowing Ethernet to provide some ‘QoS like' Layer 2 guarantees.
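Per-priority pause is easy to picture with a toy model. The Python sketch below is an illustration only and is not taken from the IEEE drafts: the class and method names, the choice of strict priority scheduling, and the frame labels are all assumptions made for the example. It models an egress port with eight priority queues (the number of priority levels Ethernet tagging provides) where individual priorities can be paused without stopping the others.

```python
from collections import deque


class ToyPort:
    """Toy egress port with eight priority queues and per-priority pause."""

    NUM_PRIORITIES = 8

    def __init__(self):
        self.queues = [deque() for _ in range(self.NUM_PRIORITIES)]
        self.paused = set()

    def enqueue(self, priority, frame):
        self.queues[priority].append(frame)

    def pause(self, priority):
        self.paused.add(priority)

    def resume(self, priority):
        self.paused.discard(priority)

    def transmit_one(self):
        # Serve the highest-numbered priority that has traffic and is not paused.
        for priority in reversed(range(self.NUM_PRIORITIES)):
            if priority not in self.paused and self.queues[priority]:
                return self.queues[priority].popleft()
        return None  # nothing eligible to send right now


port = ToyPort()
port.enqueue(3, "storage-1")
port.enqueue(0, "bulk-1")
port.enqueue(3, "storage-2")

port.pause(0)  # congestion: halt only the low priority class
sent = [port.transmit_one() for _ in range(3)]
print(sent)

port.resume(0)  # congestion cleared: low priority traffic flows again
print(port.transmit_one())
```

The storage frames keep flowing while the bulk frame waits in its queue, and nothing is dropped; the bulk frame goes out as soon as its priority is resumed.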

 

Congestion Notification (or Backward Congestion Notification):

In addition to simply pausing individual low priority streams of traffic, congestion notification allows for a communication method to go upstream from the node and notify the offending traffic generator to throttle back its traffic and re-route as necessary. This capability is a key to the longer term development of FCoE because with only the pause capability the congestion is really just pushed up a single node in the network. In order to support FCoE storage across multiple nodes in a network, congestion notification is needed.

 

Shortest Path Bridging:

This capability is really an optimization for inter-node routing that defines the path within the network between switches. Using traditional spanning tree path algorithms will sometimes result in paths in the network that are non-optimal and incompatible with high performance storage traffic. A new algorithm to determine the shortest path between nodes will help to enable both less congestion in the network as well as fast delivery of critical packets for storage.
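A tiny example shows why a spanning tree can produce non-optimal paths. The Python sketch below is a simplification under stated assumptions: it builds the tree with a plain breadth-first search from a chosen root rather than running the real spanning tree protocol, it counts hops rather than link costs, and the four-switch ring topology is invented for the illustration.

```python
from collections import deque


def bfs_path(adjacency, src, dst):
    """Shortest path by hop count using breadth-first search."""
    parents = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in adjacency[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def spanning_tree(adjacency, root):
    """Keep only the links of a BFS tree rooted at `root`; every other link is blocked."""
    tree = {node: [] for node in adjacency}
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                tree[node].append(neighbor)
                tree[neighbor].append(node)
                queue.append(neighbor)
    return tree


# Four switches in a ring: A-B, B-C, C-D, D-A.
ring = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C", "A"]}
tree = spanning_tree(ring, root="A")

print("via spanning tree:", bfs_path(tree, "C", "D"))
print("shortest path:    ", bfs_path(ring, "C", "D"))
```

With A as the root, the direct C-D link ends up blocked, so traffic between those two neighbors takes three hops through the root instead of one.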

 

DCB Capability Exchange Protocol (DCBX):

This capability goes by several different names depending on who you talk to, but essentially what it will provide is the ability for switches on the network to exchange their capability sets with other nodes of the network. This allows each switch to understand which of the switches near it can use Congestion Notification, Flow Control, or the other features needed to support this ‘Lossless Ethernet' capability.
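Conceptually, the outcome of a capability exchange is just the set of features both ends of a link support. The Python sketch below shows that idea and nothing more: it does not model the actual protocol messages, and the function name and feature labels are made up for the example.

```python
def negotiate(local_features, peer_features):
    """Features both ends advertise, i.e. what can safely be enabled on the link."""
    return local_features & peer_features


# Hypothetical capability sets advertised by two neighboring switches.
switch_a = {"priority-flow-control", "congestion-notification", "bandwidth-sharing"}
switch_b = {"priority-flow-control", "bandwidth-sharing"}

common = negotiate(switch_a, switch_b)
print(sorted(common))
```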

 

While the list above is not meant to be all inclusive of all the new IEEE development under way for this new ‘Lossless Ethernet' initiative, it should provide a good overview of the general push taking place and how the goal of getting to near lossless performance is going to be accomplished.

 

Weren't we talking about Fiber Channel?

Astute readers will realize that I haven't really addressed the Fiber Channel piece of this. The features I described above only allow Ethernet to carry certain kinds of traffic (like Fiber Channel) that require very high reliability and performance; but how do you get the Fiber Channel data onto an Ethernet frame?

 

In today's environment, a Fiber Channel initiator on a Server system will place Fiber Channel data onto an FC HBA to send over the SAN to a storage target. All of this data is transmitted over a Fiber Channel network. Under the FCoE model, what you will need is a Server system that has an FCoE initiator, and on the target side, the switch connected to the target must be able to convert the data from the storage target and encapsulate it into Ethernet. Beyond that, the data is transmitted over the Ethernet fabric as normal, and the features that I described above give Ethernet the performance needed for a Fiber Channel application stack to function properly.
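The encapsulation step can be sketched as follows. This is a deliberately simplified Python illustration: it treats the Fibre Channel frame as an opaque blob and wraps it in a bare Ethernet header carrying the FCoE EtherType (0x8906). It leaves out the FCoE header fields and frame delimiters, any VLAN tag, padding, and the Ethernet frame check sequence, and the function names and MAC addresses are made up for the example.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE
ETH_HEADER = "!6s6sH"    # destination MAC, source MAC, EtherType (14 bytes)


def encapsulate(dst_mac, src_mac, fc_frame):
    """Wrap an opaque Fibre Channel frame in a bare Ethernet header (simplified)."""
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    return struct.pack(ETH_HEADER, dst_mac, src_mac, FCOE_ETHERTYPE) + fc_frame


def decapsulate(ethernet_frame):
    """Check the EtherType and hand back the Fibre Channel payload."""
    _dst, _src, ethertype = struct.unpack(ETH_HEADER, ethernet_frame[:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return ethernet_frame[14:]


# Hypothetical addresses and payload.
fc_frame = b"opaque fibre channel frame bytes"
wire = encapsulate(bytes.fromhex("0e0000000001"), bytes.fromhex("0e0000000002"), fc_frame)
print(len(wire), decapsulate(wire) == fc_frame)
```

The Fibre Channel payload comes out the other side untouched, which is why the existing Fibre Channel application stack can keep working as long as the Ethernet underneath does not drop frames.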

 

This is certainly a capability that Intel has been supportive of. Ethernet is a critical piece of the computing platform, and FCoE provides a potential improvement for datacenter and storage network design. By consolidating the Fiber Channel data onto a single Ethernet wire, end user IT houses can also see several benefits:

 

1) Reduces the need for two physical network cards in each server. Now, a single NIC will connect to the SAN and to the normal TCP/IP data network.

 

2) Along with the consolidation in network cards, you also save in terms of cabling. One 10 Gigabit link can replace the old Fiber Channel fiber link and Ethernet links.

 

3) Reduces power consumption and cooling

 

4) The commoditized and low cost nature of Ethernet provides additional benefit by converging system I/O onto what will likely be the lowest cost interface over the coming years; 10 Gigabit.

 

In summary, FCoE may be in its infancy, but the standards are final, or in process. Products are available today, and the value proposition is here. Further performance improvements and cost reductions, the proliferation of 10 Gigabit networks, and more choices in the future will only further the support for and interest in Fiber Channel over Ethernet in datacenter SAN applications.

 

 

 

~ Ben Hacker

 

 

 

 

Links for further information:

http://ieee802.org/

http://www.open-fcoe.org/


Sometimes you get so deep into something that you don't realize how crazy it is until you take a step back. Like most technology companies, Intel has an inherent love for acronyms. The cacophony of standards bodies, advanced technologies, and intense rates of change in our industry necessitates the use of abbreviation just to be able to communicate clearly and briefly. However, while I am at least as much of a technophile as most of the folks in the technology jungle, even I sometimes run into an acronym wall. I thought that, to help myself and others, it might be a good idea to decode one of the newer sets of network technologies that I work closely with and to decipher some of the associated names and acronyms that come along with it.

 

10 Gigabit Ethernet: It's here, it's real, and it's growing fast.

 

Ethernet (IEEE 802.3) has evolved over the years from a new standard linking computers together at slow rates, moving from 10 Megabit per second (Mbps), to 100Mbps, to 1 Gigabit per second (Gbps), and a few years ago to 10Gbps (10GbE) unidirectional throughput. Over time there have been several physical connection types for Ethernet. The most common is copper (Cat 3/4/5/6/7 cabling is used as the physical medium), but fiber has also been prevalent, as have some more esoteric physical media types (such as BNC coax). The most common 10GbE adapter (until very recently) has been optical only, due to the difficulty of making 10GbE function properly over copper cabling.

 

But this post isn't meant to discuss the past, but more to decode the present and future as it relates to 10Gig Ethernet and the variety of flavors that are available. Below I'll cover a number of acronyms for 10GbE IEEE standards that are often lumped together as '10 Gigabit' and discuss some of the differences and usages for each. After that, I'll also try to clear up some of the confusion about ‘form factor' standards for optical modules (which are separate from IEEE) and some of the terms and technologies in that realm:

 

 

10GBase-T (aka: IEEE 802.3an):

 

 

This is a 10GbE standard for copper-based networking deployments. Networking silicon and adapters that follow this specification are designed to communicate over Cat 6a/7 copper cabling up to 100 meters in length (standard Cat 6 is limited to roughly 55 meters). To enable this capability, a 10GbE MAC (media access controller) and a PHY (Physical Layer) designed for copper connections work in tandem.

 

 

10GBase-T is viewed as the holy grail for 10GbE because it will work within the most prevalent Cat 6/7 based infrastructure that is already in place. In exchange for this flexibility, 10GBase-T comes with higher power and higher latency.

 

 

10GBase-KX4 (aka: IEEE 802.3ap):

 

 

This is a pair of standards that are targeted toward the use of 10GbE silicon in backplane applications (such as a blade design). The specification is designed for an environment where lower power is required and shorter distances (up to only 40 inches) are sufficient.

 

 

10GBase-SR (aka: IEEE 802.3ae):

 

 

This specification is for 10GbE with optical cabling over short ranges (SR = Short Range) with multi-mode fiber. Depending on the kind of fiber, SR in this instance can mean anything between 26 - 82 meters on older fiber (50-62um fiber). On the latest fiber technology, SR can reach distances of 300m. To be able to physically support a connection of the cable, any network silicon or adapter that supports 10GBase-SR would need to have a 10GbE MAC connected to an optics module designed for multi-mode fiber. (We'll discuss optics modules in more depth further down in this post.)

 

 

10GBase-SR is often the standard of choice to use inside the datacenters where fiber is already deployed and widely used.

 

 

10GBase-LR (aka: IEEE 802.3ae, Clause 49):

 

 

LR is very similar to the SR specification except that it is for Long Range connections over single-mode fiber. Long Range in this spec is defined as 10km, but distances above that (as much as 25km) can often be obtained.

 

 

10GBase-LR is used sparingly and really only deployed where ultra long distances are absolutely required.

 

 

10GBase-LRM (aka: IEEE 802.3aq):

 

 

LRM stands for Long Range over Multimode and allows distances of up to 220 meters on older standard (50-62um) multi-mode fiber.

 

 

10GBase-LRM is targeted for those customers who have older fiber already in place but need extra reach for their network.

 

 

10GBase-CX4 (aka: IEEE 802.3ak):

 

 

This standard of 10GbE connection uses the CX4 connector/cabling that is used in Infiniband™ networks. CX4 is a lower power standard that can be supported without a standalone PHY or optics module (the signals can be routed directly from a CX4 capable 10GbE MAC to the CX4 connector). Due to the physical specification for CX4 based 10 Gigabit, it provides a lower latency than comparable 10GBase-T copper PHY solutions. With the use of CX4 passive (copper) cables, the nominal distance you can expect between your 10GbE links is ~10-15m. There are also amplified 'active' (but still copper) cables with nominal distances up near 30m.

 

 

Below is an image of a standard CX4 based socket that would be on a 10GBase-CX4 NIC:

 

 

 

 

There are also what are referred to as ‘active optical' cables for CX4, which actually have an optics module in the termination of the cable, and the cable body is fiber. This kind of active design increases cable reach and improves flexibility (fiber is smaller than copper pairs) but also increases cost. These active cables can increase reach up to 100m.

 

 

Intel has recently released our own series of active optical CX4 cables.

 

 

For short distances (such as inside the rack in a datacenter), CX4 offers one of the lowest cost ways to deploy 10GbE from switch to server. Because of its design, CX4 also achieves very low latencies.

 

 

</end of IEEE standards ramble>

 

 

Ok, so we've summarized the majority of the IEEE 10GbE standards. But the immediate question arises: "Why are there so many?" Is the IEEE standards body for 10GbE just throwing out all these standards for every possible niche application? The answer is no. For any new IEEE PHY interface standard to be approved, it must pass several criteria, including "distinct identity" and "broad market potential". While all of these standards certainly won't apply to any given institution's network, they do all meet real market needs.

 

 

X2, XFP, SFP+... say what?

 

 

A final mystery that I've alluded to above has to do with the various optical module form factors that are available for 10GbE. XENPAK, X2, XPAK, XFP and SFP+ are standard optics module form factors that are used by both switch and NIC vendors in the industry. These modules that go along with the 10GbE networking products are an interesting beast. They are not specified by IEEE, but are standardized by a group of industry participants through what is known as a Multi-Source Agreement (MSA).

 

 

XENPAK, XPAK and X2 are the older module standards originally used for 10GbE, followed by XFP, which shrank the form factor of the actual module as well as the fiber cable pairs. SFP+ is a newer form factor that is now gaining momentum with switch and NIC vendors. An SFP+ optics module can use the same fiber pairs used with XFP (no new fiber cable needed), but the form factor of the cage in the switch or NIC as well as the optics module itself are smaller. The key advantage of using SFP+ is that the new form factor can drive lower costs, lower thermals, and higher densities at the switch.

 

 

Here is an image of an older X2 optics module:

 

 

 

 

And here is a comparison of the size of XFP (right) relative to SFP+ (left):

 

 

 

 

The optics modules are driven by a low power interface from the 10GbE MAC. The interfaces are XAUI (for X2 modules), XFI (for XFP modules), and SFI (for SFP+ modules). These interfaces generally are supplied directly from the 10GbE based MAC to the module cage. One of the things the module MSA standards bodies agree on is not only a form factor for the module itself but also the electrical specifications of the driver interface that can be accepted from the MAC.

 

 

The key thing I want to hammer home here is that the IEEE specification (such as 10GBase-SR) is separate from the module form factor used.

 

 

For example, you can have a Short Range optical NIC that uses X2, XFP, or SFP+. So asking for an "SFP+ NIC" isn't actually specific enough, because that could mean a lot of different things. You'd have to specify a 10GBase-SR NIC, with SFP+ optics.
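One way to keep the two axes straight is to treat a NIC description as a pair of independent fields. The Python sketch below is just an illustration of that idea: the class and field names are made up, and the module-to-interface table simply restates the XAUI/XFI/SFI pairing given above.

```python
from dataclasses import dataclass

# Electrical interface between the MAC and each module type, as listed above.
MODULE_INTERFACE = {"X2": "XAUI", "XFP": "XFI", "SFP+": "SFI"}


@dataclass(frozen=True)
class NicSpec:
    ieee_standard: str  # what goes over the cable, e.g. "10GBase-SR"
    form_factor: str    # which module cage the NIC has, e.g. "SFP+"

    @property
    def mac_interface(self):
        # The interface the MAC must drive for this module form factor.
        return MODULE_INTERFACE[self.form_factor]


# The same IEEE standard can ship in several form factors.
short_range_options = [NicSpec("10GBase-SR", ff) for ff in ("X2", "XFP", "SFP+")]
for nic in short_range_options:
    print(nic.ieee_standard, "with", nic.form_factor, "optics, driven over", nic.mac_interface)
```

All three entries are 10GBase-SR NICs, yet none of them is interchangeable with the others at the module cage, which is why both fields are needed to describe a card.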

 

 

SFP+ Direct Attach:

 

 

Now that I've thoroughly confused everyone, I'll take it one step further. Not only can each module form factor be used with different IEEE MAC specifications, but each module doesn't even need to be used for a fiber connection at all. An interesting example of using an ‘optics' module form factor for a non-optical design is SFP+ Direct Attach.

 

 

SFP+DA is similar in concept to CX4 but provides a bit more flexibility. Normally, you may have a switch or NIC that is designed to be able to support the addition of SFP+ based optics modules for a 10GbE fiber connection. Direct Attach allows for passive Twin-Axial (2 pair copper) cables to be plugged directly into the SFP+ cage (in place of an optical module) to carry the serial signal from the MAC directly over the cable to another SFP+ form factor enabled NIC or switch.

 

 

Again, the downside is that without either a standalone PHY or an optics module to send the signal over a long distance, a passive cable with SFP+DA has a reach in the ~10-15m range. The real advantage of SFP+DA over CX4 is that on the switch side the SFP+ module design allows higher density switches than CX4 can provide.

 

 

For a top of the rack switch, SFP+DA will likely provide excellent cost, power and latency characteristics and still have enough reach to be very feasible inside the rack.

 

 

10GbE - The Infrastructure is Ready!

 

 

I hope that I've lifted a little bit of the fog that surrounds the 10GbE market and the related technologies. The last thing I want to leave you with is the fact that 10GbE infrastructure is now starting to roll into the mainstream. CX4 switches are available broadly in the market today and SFP+ type designs for both optical modules as well as Direct Attach connections have been demonstrated and will be getting rolled out very soon by various vendors.

 

 

Intel is already selling a wide variety of NICs and silicon to meet the various form factors and standards based market needs I listed above along with other vendors in the market place.

 

 

After years of anticipation, 10GbE is finally hitting its stride. Next stop... 100GbE...

 

 


 

Virtualization is without a doubt a very hot topic these days. Companies continue to look to server virtualization to increase the utilization rates of their systems and lower overall deployment and management costs. The basic model of a virtualized server is depicted below:

 

 

 

 

 

 

Essentially, you have a VMM (Virtual Machine Monitor) software layer that sits between the hardware and the virtual machines and allows each virtual machine to use what it thinks is its own network port. This is a pretty straightforward model and it directly addresses the general reason for virtualization, which is that a server often does not use its full processing power and is thus wasting CPU cycles.

 

 

There is an interesting result of this consolidation onto a single physical box with several Virtual Machines. In addition to consolidating CPU processes, you also effectively consolidate I/O bandwidth and switch processing capabilities onto the same platform. The overhead of this switching limits your bandwidth, adds CPU overhead, and effectively reduces the benefits of server virtualization. In some cases you may have created a new problem: an I/O bottleneck.

 

 

This makes a lot of sense if you think about the fact that, in essence, what you are doing is merging 5-10 machines that each had 1 or 2 ports of Gigabit Ethernet (all connected via a switch) into a single machine. This new server probably needs 6 or more ports of Gigabit Ethernet and may even require 10 Gigabit connections just to support the new consolidated workload.
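
The arithmetic behind that estimate can be sketched in a few lines of Python. This is a rough upper bound that assumes every original port could run at full line rate at the same time; the function names are hypothetical and real workloads will usually need less than the worst case.

```python
import math


def consolidated_gbps(machines: int, ports_per_machine: int,
                      port_gbps: float = 1.0) -> float:
    """Worst-case aggregate bandwidth of the machines being merged."""
    return machines * ports_per_machine * port_gbps


def ports_needed(demand_gbps: float, port_gbps: float) -> int:
    """Smallest number of ports of a given speed that covers the demand."""
    return math.ceil(demand_gbps / port_gbps)


# The two ends of the range mentioned above: 5 machines with 1 port each,
# and 10 machines with 2 ports each, all Gigabit Ethernet.
low = consolidated_gbps(machines=5, ports_per_machine=1)
high = consolidated_gbps(machines=10, ports_per_machine=2)

print(f"aggregate demand: {low:g} to {high:g} Gb/s")
print(f"GbE ports at the low end: {ports_needed(low, 1.0)}")
print(f"10GbE ports at the high end: {ports_needed(high, 10.0)}")
```

At the high end the worst case is 20 Gb/s, which would take twenty Gigabit ports but only two 10 Gigabit ports, and that is the case for moving the consolidated server to 10GbE.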

 

 

Enter Virtual Machine Device Queues (VMDq):

 

 

To help relieve the I/O congestion associated with the additional VMM software switching in a virtualized environment, Intel implemented a technology called VMDq in our latest Ethernet NICs and silicon. VMDq offloads some of the switching that was done in the VMM to networking hardware specifically designed for this function. This drastically reduces the overhead associated with I/O switching in the VMM, which greatly improves throughput and overall system performance.

 

 

Below is a diagram that summarizes the new virtualized server stack with VMDq enabled:

 

 

 

 

 

 

On the receive path, VMDq provides a hardware 'sorter' or classifier that essentially does the pre-work for the VMM of directing which end VM the packets should go to. The NIC or LAN silicon is performing a hardware assist for the VMM layer.
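
As a rough software model of that receive-side sorting, the Python sketch below places each incoming frame into a per-VM queue. It assumes the classifier keys on the destination MAC address and that unmatched frames fall into a default queue left for the VMM; the `RxClassifier` class and the frame layout are hypothetical and are not Intel's hardware interface.

```python
from collections import defaultdict, deque


class RxClassifier:
    """Toy model of a receive sorter: one queue per VM."""

    DEFAULT_QUEUE = "default"  # assumption: unmatched frames go to the VMM

    def __init__(self, mac_to_vm):
        self.mac_to_vm = dict(mac_to_vm)   # destination MAC -> VM name
        self.queues = defaultdict(deque)   # VM name -> pending frames

    def receive(self, frame):
        # Assumption: sort on destination MAC only.
        vm = self.mac_to_vm.get(frame["dst_mac"], self.DEFAULT_QUEUE)
        self.queues[vm].append(frame)
        return vm


classifier = RxClassifier({
    "00:00:00:00:00:01": "vm1",
    "00:00:00:00:00:02": "vm2",
})

frames = [
    {"dst_mac": "00:00:00:00:00:02", "payload": "a"},
    {"dst_mac": "00:00:00:00:00:01", "payload": "b"},
    {"dst_mac": "00:00:00:00:00:09", "payload": "c"},  # no VM owns this MAC
    {"dst_mac": "00:00:00:00:00:02", "payload": "d"},
]

for frame in frames:
    classifier.receive(frame)

for vm, queue in classifier.queues.items():
    print(vm, [f["payload"] for f in queue])
```

With the frames already sorted, the VMM only has to hand each queue to its VM instead of inspecting every packet itself, which is the hardware assist described above.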

 

 

On the transmit side, the packets are serviced round robin style to avoid "head of line" blocking and alleviate potential quality of service (QoS) issues.
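
The round robin idea on the transmit side can be sketched in Python as follows. This is a simplified model that takes one packet from each non-empty queue per pass; the function name and queue contents are hypothetical, and real hardware may weight or schedule the queues differently.

```python
from collections import deque


def round_robin_transmit(tx_queues):
    """Yield (vm, packet) pairs, one packet per non-empty queue per pass."""
    queues = {vm: deque(packets) for vm, packets in tx_queues.items()}
    while any(queues.values()):          # stop once every queue is empty
        for vm, queue in queues.items():
            if queue:
                yield vm, queue.popleft()


# vm1 has a backlog, but vm2 and vm3 still get a turn on every pass.
order = list(round_robin_transmit({
    "vm1": ["a1", "a2", "a3"],
    "vm2": ["b1"],
    "vm3": ["c1", "c2"],
}))

print([packet for _, packet in order])
# prints ['a1', 'b1', 'c1', 'a2', 'c2', 'a3']
```

If the queues were drained one after another instead, `b1` and `c1` would wait behind all of vm1's packets, which is the "head of line" blocking the round robin servicing avoids.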

 

 

The immediate question I expect is "So, don't the VMM vendors have to support this?" And the answer is yes. Intel is supporting this feature today on shipping platforms, but you do need to work closely with the VMM vendor to make sure the whole stack works as designed.

 

 

Just this week Intel announced that our VMDq capability will be supported in VMware's upcoming ESX release. This is certainly a big step towards wide support of network virtualization performance enhancing features.

 

 

Ethernet technology has grown and become more important over the last 25 years, and the trend appears to be continuing on course.

 

 

Ben Hacker

 

 

--

 

 

For more details on VMDq, there is a VMDq Whitepaper, and an Intel® VT for Connectivity Datasheet located on our website.

 

 


While Intel is certainly most widely known for manufacturing the extremely complicated CPUs that are the brains of many computing platforms worldwide, there are several other products and technologies that people at Intel have been involved in for many years and that are critical to computing environments everywhere. As a person who has been working in various networking and manageability roles at Intel since 2001, I'd like to take a little time to look at Intel's history in the Ethernet market since its inception more than 25 years ago and at where the market might be going in the future.

 

Below is an image that tries to capture the key highlights of Intel's specific involvement in the Ethernet market over the last 3 decades:

 

 

 

As you can see, the Ethernet market has come a long way from the clunky multi-chip 10Mbps solutions of more than 25 years ago all the way to the Quad Port Gigabit and Dual Port 10 Gigabit designs that are prevalent today.

 

Moving into the future, the Ethernet market is growing more complicated by the year, with new capabilities and features targeted specifically at server virtualization, infrastructure convergence, enhanced storage technologies, and the continued importance of power efficiency across the overall compute infrastructure. Each of these innovations and changes will have a big impact on the structure and design of future servers and datacenters. My colleague Ken Lloyd gave his thoughts on how 10 Gigabit technologies will provide I/O convergence and overall cost savings for computer networks in the future, and there are clearly lots of interesting things going on right now.

 

Over the next several months I plan to try to go more in depth on many of the exciting developments taking place in the Ethernet market and to hopefully shed some light on some of the changes that are coming our way.

Stay tuned in the coming weeks!

- Ben
