The Server Room Blog

9 Posts tagged with the networking tag

I’m a bit late in relaying my thoughts from Intel’s Developer Forum (IDF), but there was definitely some excitement around virtualization and high performance networking that I wanted to get the word out about!

In the past I’ve shared some details about SR-IOV and the advantages you can gain by being able to present virtual LAN hardware to each Virtual Machine (VM), effectively avoiding the Hypervisor when presenting virtual devices to each VM.  The advantage of being able to do this is clear:  the less the hypervisor interacts with the networking stack, the less processing overhead the system needs to process the data.

That’s all good because if you have a dual 10 Gigabit adapter, you can segregate those two physical pipes into perhaps 16 virtual pipes that get exposed to 16 VMs.  By segregating these LAN pipes at the hardware level with SR-IOV instead of using Hypervisor switching, the performance gains in both CPU utilization as well as maximum total throughput can be very large.  There were several demos at IDF with various configurations, but reductions in CPU utilization of 40% were possible coupled with dramatic improvement in throughput!

But there is unfortunately one minor complication that I didn’t mention in my last post on the topic of SR-IOV.  There is the little fact that when VMs move between physical boxes (a usage that is highly desired and commonplace these days) you run into some problems with this SR-IOV capability.  When the hypervisor owned the network hardware abstraction, the performance was worse, but the functionality was better because you could seamlessly migrate from one box to another and the virtualization application would handle the details.  But with SR-IOV, a new layer needs to be added so that the direct hardware connection between the VM and the LAN hardware can be moved to a new box.

The really exciting part of IDF demos that I saw was the demonstration not just of the SR-IOV functionality on multiple hardware and virtualization configurations, but that these demonstrations also showed updated software from two virtualization vendors allowing mobility of the VMs while supporting SR-IOV! 

There was a demo on Dell systems showing this fully functional SR-IOV implementation with Citrix’s Virtualization suite.  There were two separate demonstrations on Dell systems, with VMWare displaying their new Network Plug-In Architecture (NPIA) solution that allows for the migration of SR-IOV connected VMs seamlessly between servers.

For those hungry for more detail, I’ve included the three SR-IOV demonstration videos here:

The first is the Citrix demonstration on Dell and Intel hardware of SR-IOV with VM mobility:

 

These next two videos are demos on Dell and Intel hardware with VMware and their NPIA software implementation.

 

Each virtualization demo shows the massive performance benefits under various workloads when moving from Hypervisor based LAN segregation to SR-IOV implementation.  But most importantly, each demonstration proves out the capability to migrate VMs between physical hardware.  The only system hardware requirement is that the server itself supports VT-d.  If the networking hardware in the newly migrated-to box supports SR-IOV you get better performance, and if not, the solution falls back on the legacy Hypervisor virtualization.  Backwards compatibility is maintained!
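
The fallback behavior described above can be pictured as a simple decision made on the destination server at migration time. The sketch below is only an illustration of that idea: the `Host` type, the `choose_network_path` function, and the path names are hypothetical and are not taken from the Citrix or VMware software. Treating a host without VT-d as a fallback case is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical description of a destination server."""
    name: str
    supports_vtd: bool        # platform supports VT-d
    nic_supports_sriov: bool  # network adapter supports SR-IOV


def choose_network_path(dest: Host) -> str:
    """Pick how a migrated VM reaches the network on the destination host."""
    if dest.supports_vtd and dest.nic_supports_sriov:
        return "sr-iov"       # direct hardware path, better performance
    return "hypervisor"       # legacy software switching still works


if __name__ == "__main__":
    for host in (Host("new-box", True, True), Host("old-box", True, False)):
        print(host.name, "->", choose_network_path(host))
```

The point of the sketch is that the VM keeps working either way; only the performance differs.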

I didn’t get firm details on when this full support for SR-IOV and migration will be available in Citrix and VMWare’s releases, but the demos looked pretty clean, and hopefully these suites will be available soon with this new functionality.  The LAN and Server hardware ecosystems are ready today, and it looks like the software vendors are just around the corner.  Virtualization momentum continues!

While virtualization was the big takeaway for me from IDF, there were also several other interesting demos for us networking hounds.  I’ve linked a couple videos of them below for anyone still thirsting for more of the latest networking technology and performance details!

The first video is a demonstration of Intel’s 82599 10 Gigabit Ethernet-based adapter card with Fiber Channel over Ethernet (FCoE) support.  Storage and Ethernet together at last!

The second video is a demonstration of Intel’s NetEffect 10 Gigabit Ethernet card publishing 1 million messages per second in a simulated NYSE floor trading scenario.  Oh yeah, with only 35μs of latency.  That is fast.

So although I am writing this two weeks after IDF, I hope some of you got a little taste of the networking excitement that took place.   Industry wide, hardware and software vendors alike are delivering ultra high performance, low latency applications for the financial services industry, as well as mainstream performance increases for virtualization.  The performance and technology beat moves forward.  Exciting times!

--

Ben Hacker


I love food! Since I was a kid I’ve loved noodles, especially Italian pasta. I used to think that Spaghetti was the general name for Italian noodles. Learning how to twist Spaghetti on a fork gave a great sense of achievement and joy.

Bowl of Spaghetti.jpg

Many years later my wife and I travelled to Rome. Naturally, since both of us really love food, we spent a lot of time seeking out restaurants and checking out new food. One of the wonderful dishes we had was Pappardelle (on the left) with Duck Ragu. It was my first encounter with Pappardelle, a very wide form of pasta. You get only a few Pappardelle on your plate but it’s still the same amount of pasta. I found it not as practical to twist the Pappardelle around my fork, so I cut them up into smaller pieces to eat them.

Pappardelle.jpg


I thought about it recently when I looked at the back of a virtualized server. Looks similar, no? :)

Server with 1GbE.jpg  Bowl of Spaghetti.jpg

A typical virtualized server has 8-10 1Gigabit Ethernet (1GbE) ports, and 2 Fibre Channel ports. This makes for a lot of cabling, and many add-in cards. It translates to a lot of cost, power, and complexity (and thus reliability risk) for an IT shop. As a result, there’s a lot of buzz around high-speed networks, specifically 10GbE. That technology presents the opportunity to consolidate all these 1GbE ports to a significantly smaller number of higher bandwidth, i.e. 10GbE ports. It makes for a much tidier server.

Server with 10GbE.jpg

Kind of like substituting Pappardelle for Spaghetti :)


In that case, iSCSI or FCoE (Fibre Channel over Ethernet) could be used for the SAN connection, still using the same high-speed ports. Standards like Data Center Bridging (DCB) could add a lossless character to the 10GbE link to make it friendlier to FCoE.

Few new solutions, though, come without new challenges. The common way for VMs to share I/O devices in today’s environment is through mediation of the hypervisor, using emulation or para-virtualization. That reduces the effective I/O bandwidth. It also becomes a fairly significant overhead to the server in its own right, reducing the available server capacity for application processing, and it adds latency. With the growing trend in IT to treat virtualization as a default deployment mode for any application, these issues become quite limiting.

We at Intel have thought that the best way to overcome these issues is by using “direct assignment”. Using the Intel® VT-d technology (launched in the Xeon 5500 platform), a VM can be assigned a dedicated I/O device. This nearly eliminates the overhead related to the hypervisor mediation I mentioned above. A side benefit is that it increases the VM to VM isolation and security. But assigning an individual I/O device to one VM is not very scalable…

This is where the PCI-SIG’s SR-IOV (Single Root I/O Virtualization) standard comes into play. This standard allows a single I/O device to present itself as multiple virtual devices. With SR-IOV, each virtual device can be assigned to a VM, adding scalability to the direct assignment model, effectively allowing the physical I/O to be shared yet with greater security and reliability.
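
As one concrete illustration of a single device presenting multiple virtual devices: on a Linux host (an assumption, since this post does not name an operating system), the kernel exposes the number of virtual functions a device supports and currently has enabled through the `sriov_totalvfs` and `sriov_numvfs` sysfs files. The helper below is a minimal sketch that just reads those two files; the function name and the interface name `eth0` are hypothetical.

```python
from pathlib import Path
from typing import Optional, Tuple


def read_vf_counts(iface: str,
                   sysfs_root: str = "/sys/class/net") -> Optional[Tuple[int, int]]:
    """Return (total_vfs, enabled_vfs) for a NIC, or None if SR-IOV is not exposed."""
    device = Path(sysfs_root) / iface / "device"
    total = device / "sriov_totalvfs"   # maximum virtual functions the device supports
    active = device / "sriov_numvfs"    # virtual functions currently enabled
    if not (total.is_file() and active.is_file()):
        return None                     # no SR-IOV support visible for this device
    return int(total.read_text().strip()), int(active.read_text().strip())


if __name__ == "__main__":
    # "eth0" is only an example interface name.
    print(read_vf_counts("eth0"))
```

Each enabled virtual function then appears as its own PCI device that can be handed to a VM.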

Another challenge with the direct assignment model is related to live migration. Hypervisors have typically assumed the SW mediated IOV model. As a result, hypervisors need to be modified to adapt their live migration solutions to direct assignment.

These technologies span many different components of the server platform. Intel® VT-d is necessary, so Xeon 5500 (or a later platform) must be used. SR-IOV capable I/O devices (NICs or storage controllers) are required. The BIOS must be modified, as well as the hypervisor software. This is pretty heavy lifting.

So you can only imagine how excited I am to be able to showcase 4 different SR-IOV demos at IDF next week! The demos involve 2 server vendors, 3 VMM vendors – 3 different vendors implementing 3 different hypervisor architectures, and 3 different IHVs representing 2 different I/O technologies. We show the performance improvements, as well as VM live-migration. It works!

Come and see it (Booths 517, 707, 709, and 711 in the IDF showcase)!


Intel has just launched the Intel® Ethernet Server Adapter X520 family.  These NICs are Intel’s first 10 Gigabit adapter products that support “pluggable” optics.  This additional configuration option gives IT users a great deal more flexibility in how they deploy 10 Gigabit in their servers and datacenters.

 

The X520 Family of adapters supports bailed optics that allow the removal or addition of different kinds of optics, or operation with no optics at all.  For previous 10 Gigabit products, if you wanted 10 Gigabit SR Fiber connectivity, you had to purchase a 10 Gigabit SR adapter.  But with the pluggable X520 adapter family, you can support SR, LR, or simply an SFP+ direct attach cable via the same card by simply removing / exchanging the optics.

 

 

X520Images.jpg

 

 

With the X520 you can still buy an SR or LR fiber configured adapter, but you can also switch back and forth after purchase by ordering only the new optics that you want to support (not a whole new adapter).  The Direct Attach adapter has an SFP+ cage but comes without optics inserted, so you can use Twinax copper cables for in-rack 10 Gigabit runs of less than 7m, and you can also upgrade it later with SR or LR optics as the needs for that particular adapter change. You can also mix and match optics modules in a dual-port adapter, meaning you could have an LR module in one port and an SR module in the other. You could also throw a Twinax cable into the mix.

 

The Intel® Ethernet optics modules for the X520 family of adapters also support both 1 Gigabit and 10 Gigabit speeds to help with backward compatibility – an industry first.

 

Finally, while this new pluggable capability of Intel 10 Gigabit adapters adds a bit more usage flexibility from an IT perspective, the performance capabilities and advanced features for the datacenter I’ve discussed over the past 18 months are also supported.  The X520 is based on the Intel® 82599 10 Gigabit Ethernet Controller, so the end result is a flexible product that can help unleash server IO performance whether FCoE, iSCSI, Virtualization, Security, or just raw IO performance.  Regardless of your 10 Gigabit needs, the X520 probably has what your Server environment needs.

 

--

Ben Hacker


In my last I/O Virtualization blog, earlier this year, I discussed a fundamental problem with virtualizing I/O and one solution that Intel and VMware have teamed up to deliver - VMDq and VMware NetQueue. These queuing technologies together can help to offload some of the virtual switching (vswitch) functionality from the hypervisor to the network adapter. VMDq provides a method for the Hypervisor to do less work, and also provides a way to share the I/O processing across multiple cores, improving system bandwidth and more fully utilizing the system's processing power.

 

Now, VMDq and NetQueue are a great solution together that scale well, support Vmotion, and are relatively simple to manage. However, is there a way to get even better performance from your Virtualized I/O?

 

What if there was a way to completely cut the Hypervisor software switch out of the picture and remove the associated latency and CPU overhead? The ideal scenario for optimum performance is for the VM to communicate directly with the LAN hardware itself, and bypass the vswitch completely. For example, you could have a single 10 Gigabit port expose multiple LAN interfaces at the hardware level (on the PCI-e bus), and each VM could be assigned directly to a hardware interface. Alternatively, you could have multiple physical NICs in the system, each directly assigned to a given VM. Below is a diagram that summarizes the 3 main variations of attachment for I/O in a virtualized server. After the diagram we will get into more detail to put it in context.

 

 

 

 

In the diagram above, the left side represents an implementation of a virtualized environment with a standard I/O setup using the Hypervisor vswitch and VMDq for I/O performance enhancement. In the middle is an example of direct I/O assignment between a single physical LAN interface and a single Virtual Machine. The implementation on the right is showing what is possible with a single NIC that supports SR-IOV (we'll discuss this later) for a fuller, hardware level I/O virtualization. After taking a moment to understand the basic differences in these three implementations, there are immediately a few obvious benefits here for bypassing the Hypervisor vswitch and going with either of the two directly assigned designs...

 

 

By allowing the Virtual Machines to talk directly to the networking hardware, throughput, latency, and CPU utilization of the I/O traffic processing will be greatly improved. So the question is, "why hasn't this been done before?" Well, the answer is that there are several gotchas to make this implementation work well...

 

 

First, in order to implement this properly, the LAN hardware needs to support some physical capabilities to successfully route the networking traffic in this kind of virtualized system. In addition, the server hardware itself must support VT-d so that the mapping between the Virtual Machine's I/O (PCI-e) memory addresses and the system's physical memory addresses is correlated correctly.

 

 

Finally, and this is a big one, this kind of implementation while very good for performance just happens to break the ability to move a VM from one physical server to another (VMware Vmotion). This is one of the more widely used aspects of VMware's software that has been utilized heavily by most IT shops. Seamless vmotion support is critical for making any I/O performance improvement deployable in the real world.

 

 

Now, if you stop at the 2nd diagram, and use separate NICs for each VM, you will also miss out on a few key advantages of new Ethernet capabilities. You won't be able to allocate your overall bandwidth between your VMs (each VM will get a single Gig or 10Gig port), and more importantly, you won't be able to effectively share higher bandwidth pipes. For example, a server with a few 10 Gigabit ports may have enough I/O horsepower to handle traffic for 30 VMs, but there would be no way to assign only a portion of the bandwidth of the pipe to an individual VM.
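
To make "a portion of the bandwidth of the pipe" concrete, here is a small sketch of weighted sharing of one link between VMs. It is only arithmetic, not how any particular NIC or hypervisor does it; the function name, the VM names, and the weights are made up for illustration.

```python
from typing import Dict


def allocate_bandwidth(link_gbps: float, weights: Dict[str, float]) -> Dict[str, float]:
    """Split one link's bandwidth between VMs in proportion to their weights."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return {vm: link_gbps * weight / total for vm, weight in weights.items()}


if __name__ == "__main__":
    # One 10 Gigabit port shared by three hypothetical VMs.
    shares = allocate_bandwidth(10.0, {"db": 3, "web": 1, "backup": 1})
    for vm, gbps in shares.items():
        print(f"{vm}: {gbps:.1f} Gbps")
```

With one dedicated NIC per VM, none of these fractional shares are possible: every VM gets a whole port or nothing.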

 

 

Additionally, the LAN hardware needs to support the ability for each virtual function of the LAN device to be able to support bandwidth segregation (think QoS per VM) and the ability to support multiple queues and traffic classes per LAN virtual function. This last piece is necessary for those who remember the discussion on Fiber Channel over Ethernet (FCoE), as the ability to support multiple traffic classes, and dedicated bandwidth links, are key needs for the storage over Ethernet market.

 

 

Now that I've set up what is needed to make this directly assigned virtualized I/O environment work, and called out the potential problems, you don't need to worry; I won't throw cold water on this idea. In fact, most of the pieces are in place today and there is already work being done to complete the solution as we speak.

 

 

First, Intel network adapters now support some fancy hardware capabilities related to virtualization. In addition to all the hooks for VMDq, our newest NICs support PCI-SIG SR-IOV (I know... technologists love acronyms) which provides the ability to virtualize the LAN at the lowest hardware level. The networking hardware also supports some smart logic to be able to function properly in a virtualized system. For example, VM to VM communication in the same server must be looped back before it gets to the wire or the switch connected to the machine won't know how to route the packet. This is all taken care of in the LAN hardware. And of course, all the support for bandwidth segregation, and support for multiple queues and traffic classes is there as well to make sure Storage and other QoS sensitive applications are still going to work well.

 

 

As for VT-d support, Intel platforms now come with this basically standard, so there is no issue there. But the last, and most important, piece is the ability for an individual VM to be moved between physical servers while still being able to ‘renegotiate' with its physical network connection. The ability to do this is under development by Intel, VMware and others in the industry, and the end goal is to have an architectural framework in place to make this kind of handoff seamless from a hardware and software perspective.

 

 

This architectural framework will be the topic of a future post, as I think I've used up all the lines I can before I start putting my readers to sleep. Until next time!

 

 

Ben Hacker

 

 


If there is one thing that has stayed consistent in the computing industry over time, it's that performance doesn't stand still. As our computing platform processing, I/O, and memory speeds continue to accelerate, it is important to remember a little thing called latency.

 

Often in the Ethernet world throughput is the first and last performance metric of choice. 1 Gigabit and 10 Gigabit are the numbers that inspire thoughts of increased performance, and improved computing power. However, it's important to note that, in many applications, the transaction latency over the wire is really the key to unlocking high performance at the system level. One of the primary reasons that some organizations have turned to Infiniband and other I/O technologies for HPC and clustering in the past has to do with their desire to achieve very low latencies, not necessarily increased throughput. If you look at a historical standard Gigabit Ethernet connection, you may see latencies that are around 125μs. This may have been ok in the past, but as improvements at the application level as well as in the system hardware and CPU take hold, legacy Ethernet won't be good enough for HPC and clustering environments.

 

 

The interesting, and often overlooked, fact with Ethernet is that the latency characteristics are improving as the industry moves from 1 Gigabit to 10 Gigabit. The faster throughput on the wire comes along with lower latency to some extent, but in addition, there have been several improvements in interrupt handling that drastically improve overall latencies when comparing legacy 1 Gigabit to 10 Gigabit. With a basic 1st generation Intel® 10 Gigabit CX4 card you can now see latencies approach 25μs without any special tuning.

 

 

What's even better is that Intel's 10 Gigabit networking silicon also has further enhancements for improving latency by introducing some new specialized Low Latency Interrupt (LLI) filters in the silicon. These filters provide the hardware with a quicker reaction time to network packets that meet certain customizable criteria. The filters can be tuned to have a rapid response to certain packet and traffic types. With these kinds of LLI filters in place, latencies can be reduced further by another ~50% to ~14μs.
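
To put the latency figures quoted so far side by side, the short calculation below works out the percentage reduction at each step. It uses only the approximate numbers from this post (125μs, 25μs, ~14μs); the helper name is just for illustration. Note that going from 25μs to 14μs works out to about 44%, which is in line with the rounded "~50%" figure above.

```python
def percent_reduction(before_us: float, after_us: float) -> float:
    """Percentage by which latency drops when going from before_us to after_us."""
    return (before_us - after_us) * 100.0 / before_us


# Approximate latencies quoted in the post, in microseconds.
STEPS = [
    ("legacy 1 Gigabit", 125.0),
    ("10 Gigabit CX4, untuned", 25.0),
    ("10 Gigabit with LLI filters", 14.0),
]

if __name__ == "__main__":
    for (old_name, old), (new_name, new) in zip(STEPS, STEPS[1:]):
        print(f"{old_name} -> {new_name}: {percent_reduction(old, new):.0f}% lower")
    overall = percent_reduction(STEPS[0][1], STEPS[-1][1])
    print(f"overall: {overall:.1f}% lower")
```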

 

 

Going forward with 10 Gigabit there are new technologies and designs that can help push latency even lower to the sub-10μs threshold to keep Ethernet very competitive as a fabric not only from a cost and throughput perspective, but also from the perspective of latency.

 

 

And while lower latency is certainly important, the last piece that was really missing from the Ethernet performance puzzle was not just low latency, but deterministically low latency. The key is that the worst case packet latencies for many applications are relevant and very important. By application thread affinitization, the individual data thread can be piped directly between a network queue and a CPU core. By more evenly distributing the networking workload between CPU cores in a predictable fashion, you get a deterministic kind of latency that does not stray far from the average assuming CPU cores do not get oversubscribed. Average latency of ~14μs is good, but the fact that you can get this with reasonable determinism is a key for many applications and usages.
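
A rough sketch of the affinitization idea: decide up front which CPU core services which network queue, then pin the thread handling that queue to its core. The round-robin plan and both function names below are hypothetical; the pinning step relies on Python's `os.sched_setaffinity`, which is only available on some platforms such as Linux, so the code checks for it first. How queues are actually steered to cores in the adapter and driver is not shown here.

```python
import os
from typing import Dict, List


def plan_queue_affinity(num_queues: int, cores: List[int]) -> Dict[int, int]:
    """Map each network queue to a CPU core, round-robin over the given cores."""
    if not cores:
        raise ValueError("need at least one CPU core")
    # If there are more queues than cores, cores are reused (oversubscribed),
    # which is exactly the case where determinism starts to suffer.
    return {queue: cores[queue % len(cores)] for queue in range(num_queues)}


def pin_to_core(core: int) -> bool:
    """Pin the caller to one core where the OS allows it; return True on success."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})  # 0 means "the caller"
        return True
    return False


if __name__ == "__main__":
    print(plan_queue_affinity(4, [0, 1, 2, 3]))
```

With one queue per core, each data stream always lands on the same core, which is what keeps the worst case close to the average.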

 

 

Now, lower, deterministic latency is not just a theoretical benefit for certain niche applications. Decreasing latency and improving overall latency characteristics while increasing throughput directly benefits the transaction rates that can be achieved with real world applications. One example of the improved performance is the latest set of Reuters Market Data System (RMDS) benchmarks done by STAC Research on the 4-way Intel® Xeon E7450 (Dunnington) using the Intel® 82598EB 10 Gigabit AT Dual Port networking adapter. The testing showed the highest point-to-point server throughput to date on a single server in testing done by STAC, and total updates per second reached over 15 million. Financial Service industry administrators: I can see you drooling...

 

 

Latency and throughput numbers are great to talk about, but at the end of the day, real world application performance on real systems is the key. While there will always be a small subset of the high end server market that needs the absolute lowest latencies provided by Infiniband, 10 Gigabit Ethernet is gaining ground while maintaining its place as the default fabric of choice for multiple applications and traffic types. I believe the best is yet to come as newer, faster, and more responsive technologies continue to roll out.

 

 

Ben Hacker


Ethernet has been around a long time. It is a highly reliable and trusted means for interconnecting computing nodes, and above that, it has generally been the most commoditized (read: lowest cost) form of interconnect for quite some time. Broad deployment, administrator trust, and low cost have kept Ethernet as the mainstream fabric for LAN traffic for a long time.

 

However, despite Ethernet's strong connectivity credentials, it still comes up short in certain applications. Ethernet is what is referred to as a ‘best effort' network. This simply means that in the real world, you will generally get pretty good performance (throughput, latency, lack of dropped packets, etc), but from time to time when there is congestion, packet drops and performance degradation can be quite a nuisance. For many applications, this doesn't matter. If you are using email, browsing the web, or transferring files to a shared drive, the only thing you will notice is a decrease in performance, but everything will still ‘work', and transfer properly. For some applications like storage though, this non-deterministic performance is unacceptable. If packets are dropped, or arrive out of order, storage applications have a nasty tendency to hang or crash.

 

Because of this limitation of the standard, there have been separate fabrics used for Storage Area Networks (SANs) for quite a while. One of the main fabrics developed and used for high performance SANs is known as Fiber Channel. In order to create a Fiber Channel network, a server and storage target need to support a Fiber Channel Host Bus Adapter (FC HBA) to communicate via the Fiber Channel protocol. In addition, the switches that connect the Fiber Channel infrastructure must also be dedicated Fiber Channel switches; a standard Ethernet router cannot be used.

 

Once in place, this SAN architecture provides a very high performance, high reliability network that is ideal (and required) for high end storage traffic, but it comes at a cost:

 

1) Fiber Channel HBAs are generally more expensive than their Ethernet counterparts.

 

2) You have to have a separate fabric in your network which also adds to your infrastructure (switch costs, and cabling costs) as well as complicates IT management.

 

3) Servers connected to the SAN now need to have an Ethernet adapter AND a Fiber Channel adapter.

 

The upside to the additional cost and complexity is of course better performance, but the question has always been "Is there a better way?"

 

I believe there is a better way, and that Fiber Channel over Ethernet (FCoE) (and importantly, the standards in IEEE that are making it possible) seems to be the logical path to solve the issue of lossless performance on Ethernet, while maintaining Ethernet's historical core cost advantages.

 

‘Best Effort' is not good enough:

The bottom line for today's Ethernet is that it simply can't provide the ‘lossless' behavior that storage traffic demands; but this fact is changing. Below I will summarize at a high level some of the standards being developed in IEEE to improve the performance of Ethernet for storage applications, and how they help to mend some of the issues with Ethernet and how that helps to enable FCoE.

 

Bandwidth Sharing, Priority Flow Control and Pause:

This capability offers a method to assign priorities to different Ethernet traffic classes. From there, when congestion becomes an issue, traffic can be ‘paused' on a per-priority basis; allowing the lower priority traffic to be halted temporarily while keeping the top priority traffic like storage running smoothly. This per-priority pause capability is really the first basic step in allowing Ethernet to provide some ‘QoS like' Layer 2 guarantees.
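
The per-priority pause idea can be shown with a toy model: a link with eight priority queues where any single priority can be paused without stopping the others. This is only a sketch of the concept, not the IEEE mechanism itself; the class, its methods, and the strict highest-priority-first drain order are simplifications made up for illustration.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


class PriorityLink:
    """Toy model of a link with per-priority queues and per-priority pause."""

    def __init__(self, num_priorities: int = 8) -> None:
        self.queues: Dict[int, Deque[str]] = {p: deque() for p in range(num_priorities)}
        self.paused: Set[int] = set()

    def enqueue(self, priority: int, frame: str) -> None:
        self.queues[priority].append(frame)

    def pause(self, priority: int) -> None:
        self.paused.add(priority)       # stop sending this priority only

    def resume(self, priority: int) -> None:
        self.paused.discard(priority)

    def transmit(self) -> List[Tuple[int, str]]:
        """Send everything queued on non-paused priorities, highest first."""
        sent: List[Tuple[int, str]] = []
        for priority in sorted(self.queues, reverse=True):
            if priority in self.paused:
                continue                # paused frames wait, they are not dropped
            queue = self.queues[priority]
            while queue:
                sent.append((priority, queue.popleft()))
        return sent


if __name__ == "__main__":
    link = PriorityLink()
    link.enqueue(3, "storage-1")
    link.enqueue(0, "bulk-1")
    link.pause(0)                       # congestion: hold back low priority traffic
    print(link.transmit())
    link.resume(0)
    print(link.transmit())
```

The key contrast with a classic Ethernet pause is that the storage traffic keeps flowing while the lower priority traffic waits.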

 

Congestion Notification (or Backward Congestion Notification):

In addition to simply pausing individual low priority streams of traffic, congestion notification allows for a communication method to go upstream from the node and notify the offending traffic generator to throttle back its traffic and re-route as necessary. This capability is a key to the longer term development of FCoE because with only the pause capability the congestion is really just pushed up a single node in the network. In order to support FCoE storage across multiple nodes in a network, congestion notification is needed.

 

Shortest Path Bridging:

This capability is really an optimization for inter-node routing that defines the path within the network between switches. Using traditional spanning tree path algorithms will sometimes result in paths in the network that are non-optimal and incompatible with high performance storage traffic. A new algorithm to determine the shortest path between nodes will help to enable both less congestion in the network as well as fast delivery of critical packets for storage.

 

DCB Capability Exchange Protocol (DCBX):

This capability goes by several different names depending on who you talk to, but essentially what it will provide is the ability for switches on the network to exchange their capability sets with other nodes of the network. This allows each switch to understand which other switches near it can use Congestion Notification, Flow Control, or the other features needed to support this ‘Lossless Ethernet' capability.

 

While the list above is not meant to be all inclusive of all the new IEEE development under way for this new ‘Lossless Ethernet' initiative, it should provide a good overview of the general push taking place and how the goal of getting to near lossless performance is going to be accomplished.

 

Weren't we talking about Fiber Channel?

Astute users will realize that I haven't really addressed the Fiber Channel piece of this. The above features I described only allow for Ethernet to carry certain kinds of traffic (like Fiber Channel) that require very high reliability and performance; but how do you get the Fiber Channel data onto an Ethernet frame?

 

In today's environment, a Fiber channel initiator on a Server system will place Fiber Channel data onto an FC HBA to send over the SAN to a storage target. All of this data is transmitted over a fiber channel network. Under the FCoE model, what you will need is a Server system that has an FCoE initiator, and on the target side, the switch connected to the target must be able to convert the data from storage target and encapsulate it into Ethernet. Beyond that, the data is transmitted over the Ethernet fabric as normal, but the features that I described above allow for the performance of Ethernet to allow a Fiber Channel application stack to function properly.
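
To give a feel for what "encapsulate it into Ethernet" means, the sketch below wraps an opaque Fiber Channel frame in a simplified FCoE frame. Treat it as an illustration under stated assumptions rather than a reference encoder: 0x8906 is the EtherType registered for FCoE, but the header and trailer layout here is simplified, the start-of-frame and end-of-frame byte values are placeholders, and VLAN tagging and the Ethernet checksum are left out. The function name is hypothetical.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType registered for FCoE


def encapsulate_fc_frame(dst_mac: bytes, src_mac: bytes, fc_frame: bytes,
                         sof: int = 0x2E, eof: int = 0x41) -> bytes:
    """Wrap a Fiber Channel frame in a simplified FCoE Ethernet frame.

    The sof/eof defaults are placeholder delimiter codes (an assumption);
    check the FCoE specification before relying on specific values.
    """
    if len(dst_mac) != 6 or len(src_mac) != 6:
        raise ValueError("MAC addresses must be 6 bytes")
    eth_header = struct.pack("!6s6sH", dst_mac, src_mac, FCOE_ETHERTYPE)
    fcoe_header = bytes(13) + bytes([sof])  # version/reserved bits, then start-of-frame
    trailer = bytes([eof]) + bytes(3)       # end-of-frame, then reserved padding
    return eth_header + fcoe_header + fc_frame + trailer


if __name__ == "__main__":
    frame = encapsulate_fc_frame(b"\x01" * 6, b"\x02" * 6, b"FCDATA")
    print(len(frame), frame.hex())
```

The Fiber Channel frame itself is carried unchanged, which is why the existing Fiber Channel application stack can keep working on top of the Ethernet fabric.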

 

This is certainly a capability that Intel has been supportive of. Ethernet is a critical piece of the computing platform, and FCoE provides a potential improvement for datacenter and storage network design. By consolidating the Fiber Channel data onto a single Ethernet wire, end user IT houses can also see several benefits:

 

1) Reduces the need for two physical network cards in each server. Now, a single NIC will connect to the SAN and to the normal TCP/IP data network.

 

2) Along with the consolidation in network cards, you also save in terms of cabling. One 10 Gigabit link can replace the old Fiber Channel fiber link and Ethernet links.

 

3) Reduces power consumption and cooling

 

4) The commoditized and low cost nature of Ethernet provides additional benefit by converging system I/O onto what will likely be the lowest cost interface over the coming years; 10 Gigabit.

 

In summary, FCoE may be in its infancy, but the standards are final, or in process. Products are available today, and the value proposition is here. Further performance improvements and cost reductions, the proliferation of 10 Gigabit networks, and more choices in the future will only further the support and interest in Fiber Channel over Ethernet in datacenter SAN applications.

 

 

 

~ Ben Hacker

 

 

 

 

Links for further information:

http://ieee802.org/

http://www.open-fcoe.org/


Sometimes you get so deep into something that you don't realize how crazy it is until you take a step back. Like most technology companies, Intel has an inherent love for acronyms. The cacophony of standards bodies, advanced technologies, and intense rates of change in our industry necessitates the use of abbreviation just to be able to communicate clearly and briefly. However, while I am at least as much of a techno-phyliac as most of the folks in the technology jungle, even I sometimes run into an acronym wall. I thought to help myself and others it might be a good idea to decode one of the newer sets of network technologies that I work closely with and to decipher some of the associated names and acronyms that come along with it.

 

10 Gigabit Ethernet: It's here, it's real, and it's growing fast.

 

Ethernet (IEEE 802.x) has evolved over the years from a new standard linking computers together at slow rates and has moved from 10 Megabit per second (Mbps), to 100Mbps, to 1 Gigabit per second (Gbps), and a few years ago to 10GbE unidirectional throughput. Over time there have been several physical connection types for Ethernet. The most common is copper (Cat 3/4/5/6/7 cabling is used as the physical medium) but Fiber has also been prevalent, as well as some other more esoteric physical media types (such as BNC Coax). The most common 10GbE adapter (until very recently) has been Optical only, due to the difficulty of making 10GbE function properly over copper cabling.

 

But this post isn't meant to discuss the past; it is meant to decode the present and future as it relates to 10Gig Ethernet and the variety of flavors that are available. Below I'll cover a number of acronyms for 10GbE IEEE standards that are often lumped together as '10 Gigabit' and discuss some of the differences and usages for each. After that, I'll also try to clear up some of the confusion about 'form factor' standards for optical modules (which are separate from IEEE) and some of the terms and technologies in that realm:

 

 

10GBase-T (aka: IEEE 802.3an):

 

 

This is a 10GbE standard for copper-based networking deployments. Networking silicon and adapters that follow this specification are designed to communicate over Cat 6a/7 copper cabling up to 100 meters in length (plain Cat 6 is limited to roughly 55 meters). To enable this capability, a 10GbE MAC (media access controller) and a PHY (Physical Layer) designed for copper connections work in tandem.

 

 

10GBase-T is viewed as the holy grail for 10GbE because it will work within the most prevalent Cat 6/7 based infrastructure that is already in place. The price of this flexibility is higher power and higher latency.

 

 

10GBase-KX4 (aka: IEEE 802.3ap):

 

 

This is a pair of standards (10GBase-KX4 and 10GBase-KR) targeted toward the use of 10GbE silicon in backplane applications (such as a blade design). The specification is designed for an environment where lower power is required and shorter distances (up to only 40 inches, or about 1 meter) are sufficient.

 

 

10GBase-SR (aka: IEEE 802.3ae):

 

 

This specification is for 10GbE with optical cabling over short ranges (SR = Short Range) with multi-mode fiber. Depending on the kind of fiber, SR in this instance can mean anything between 26 and 82 meters on older fiber (50-62um fiber). On the latest fiber technology, SR can reach distances of 300m. To physically support the cable connection, any network silicon or adapter that supports 10GBase-SR needs a 10GbE MAC connected to an optics module designed for multi-mode fiber. (We'll discuss optics modules in more depth further down in this post.)

 

 

10GBase-SR is often the standard of choice to use inside the datacenters where fiber is already deployed and widely used.

 

 

10GBase-LR (aka: IEEE 802.3ae, Clause 49):

 

 

LR is very similar to the SR specification except that it is for Long Range connections over single-mode fiber. Long Range in this spec is defined as 10km, but distances above that (as much as 25km) can often be obtained.

 

 

10GBase-LR is used sparingly and is really only deployed where ultra long distances are absolutely required.

 

 

10GBase-LRM (aka: IEEE 802.3aq):

 

 

LRM stands for Long Range over Multimode and allows distances of up to 220 meters on older standard (50-62um) multi-mode fiber.

 

 

10GBase-LRM is targeted for those customers who have older fiber already in place but need extra reach for their network.

 

 

10GBase-CX4 (aka: IEEE 802.3ak):

 

 

This standard of 10GbE connection uses the CX4 connector/cabling that is used in InfiniBand™* networks. CX4 is a lower power standard that can be supported without a standalone PHY or optics module (the signals can be routed directly from a CX4 capable 10GbE MAC to the CX4 connector). Due to the physical specification for CX4 based 10 Gigabit, it provides a lower latency than comparable 10GBase-T copper PHY solutions. With the use of CX4 passive (copper) cables, the nominal distance you can expect between your 10GbE links is ~10-15m. There are also amplified 'active' (but still copper) cables with nominal distances up near 30m.

 

 

Below is an image of a standard CX4 based socket that would be on a 10GBase-CX4 NIC:

 

 

 

 

There are also what are referred to as 'active optical' cables for CX4, which actually have an optics module in the termination of the cable, and the cable body is fiber. This kind of active design increases cable reach and improves flexibility (fiber is smaller than copper pairs) but also increases cost. These active cables can increase reach up to 100m.

 

 

Intel has recently released our own series of active optical CX4 cables.

 

 

For short distances (such as inside the rack in a datacenter), CX4 offers one of the lowest cost ways to deploy 10GbE from switch to server. Because of its design, CX4 also achieves very low latencies.
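As a rough sketch of how the CX4 cable options above trade reach against cost, the snippet below picks the first cable type whose nominal reach covers a given run. The reach figures are the upper ends of the ranges quoted in this post; the cost ordering (passive copper, then active copper, then active optical) and the function name `pick_cx4_cable` are assumptions made for illustration only.

```python
# Nominal reach in meters, taken from the figures quoted in this post.
# The list order encodes an ASSUMED cost ranking: cheapest option first.
CX4_CABLES = [
    ("passive copper", 15),
    ("active copper", 30),
    ("active optical", 100),
]


def pick_cx4_cable(distance_m):
    """Return the first (assumed cheapest) cable type that reaches distance_m."""
    for name, reach_m in CX4_CABLES:
        if distance_m <= reach_m:
            return name
    return None  # beyond the CX4 reach figures quoted in this post


if __name__ == "__main__":
    for run in (3, 25, 80, 250):
        print(run, "m ->", pick_cx4_cable(run))
```

An in-rack run of a few meters lands on passive copper, which matches the point above that CX4 is one of the lowest cost options for switch-to-server links.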

 

 

</end of IEEE standards ramble>

 

 

Ok, so we've summarized the majority of the IEEE 10GbE standards. But the immediate question arises: "Why are there so many?" Is the IEEE standards body for 10GbE just throwing out all these standards for every possible niche application? The answer is no. For any new IEEE PHY interface standard to be approved, it must pass several criteria, including "distinct identity" and "broad market potential". While all of these standards certainly won't apply to any given institution's network, they do all meet real market needs.
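One way to see that each standard covers a distinct niche is to put the figures from the sections above into a small lookup table and filter it by medium and reach. This is only a sketch: the reach values are the nominal maximums quoted in this post (40 inches is rounded to 1 meter), and the class and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TenGigStandard:
    name: str
    ieee: str
    medium: str
    max_reach_m: float  # nominal maximum reach quoted in this post


STANDARDS = [
    TenGigStandard("10GBase-T", "802.3an", "twisted-pair copper", 100),
    TenGigStandard("10GBase-KX4", "802.3ap", "backplane", 1),  # ~40 inches
    TenGigStandard("10GBase-SR", "802.3ae", "multi-mode fiber", 300),
    TenGigStandard("10GBase-LR", "802.3ae", "single-mode fiber", 10_000),
    TenGigStandard("10GBase-LRM", "802.3aq", "multi-mode fiber", 220),
    TenGigStandard("10GBase-CX4", "802.3ak", "CX4 copper", 15),  # passive cable
]


def candidates(required_reach_m, medium=None):
    """Names of standards that reach far enough, optionally on one medium."""
    return [
        s.name
        for s in STANDARDS
        if s.max_reach_m >= required_reach_m
        and (medium is None or s.medium == medium)
    ]


if __name__ == "__main__":
    print(candidates(5))                        # in-rack: most options qualify
    print(candidates(200, "multi-mode fiber"))  # existing multi-mode plant
    print(candidates(5_000))                    # campus-scale link
```

Short in-rack runs leave most options open, while a multi-kilometer link leaves only 10GBase-LR.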

 

 

X2, XFP, SFP+... say what?

 

 

A final mystery that I've alluded to above has to do with the various optical module form factors that are available for 10GbE. XENPAK, X2, XPAK, XFP and SFP+ are standard optics module form factors that are used by both switch and NIC vendors in the industry. These modules that go along with the 10GbE networking products are an interesting beast. They are not specified by IEEE, but are standardized by a group of industry participants through what is known as a Multi-Source Agreement (MSA).

 

 

XENPAK, XPAK and X2 are the older module standards originally used for 10GbE, followed by XFP, which shrank the form factor of the actual module as well as the fiber cable pairs. SFP+ is a newer form factor that is now gaining momentum with switch and NIC vendors. An SFP+ optics module can use the same fiber pairs used with XFP (no new fiber cable needed), but the form factor of the cage in the switch or NIC as well as the optics module itself are smaller. The key advantage of using SFP+ is that the new form factor can drive lower costs, lower thermals, and higher densities at the switch.

 

 

Here is an image of an older X2 optics module:

 

 

 

 

And here is a comparison of the size of XFP (right) relative to SFP+ (left):

 

 

 

 

The optics modules are driven by a low power interface from the 10GbE MAC. The interfaces are XAUI (for X2 modules), XFI (for XFP modules), and SFI (for SFP+ modules). These interfaces generally are supplied directly from the 10GbE based MAC to the module cage. One of the things the module MSA standards bodies agree on is not only a form factor for the module itself but also the electrical specifications of the driver interface that can be accepted from the MAC.

 

 

The key thing I want to hammer home here is that the IEEE specification (such as 10GBase-SR) is separate from the module form factor used.

 

 

For example, you can have a Short Range optical NIC that uses X2, XFP, or SFP+. So asking for an "SFP+ NIC" isn't actually specific enough, because that could mean a lot of different things. You'd have to specify a 10GBase-SR NIC with SFP+ optics.
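To make the separation concrete, here is a minimal sketch that treats the IEEE standard and the module form factor as two independent fields, and looks up the MAC-to-module interface from the form factor alone. The interface mapping comes from the text above; the function name `describe_nic` and the output wording are made up for this example.

```python
# MAC-to-module electrical interface for each form factor, as listed above.
MODULE_INTERFACE = {"X2": "XAUI", "XFP": "XFI", "SFP+": "SFI"}


def describe_nic(ieee_standard, form_factor):
    """Describe a NIC by BOTH its IEEE standard and its module form factor."""
    interface = MODULE_INTERFACE[form_factor]
    return f"{ieee_standard} NIC with {form_factor} optics ({interface} from the MAC)"


if __name__ == "__main__":
    # The same IEEE standard can ship in any of the three form factors.
    for form_factor in MODULE_INTERFACE:
        print(describe_nic("10GBase-SR", form_factor))
```

The loop prints three different, equally valid "Short Range" NIC descriptions, which is why naming the form factor alone does not pin down the product.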

 

 

SFP+ Direct Attach:

 

 

Now that I've thoroughly confused everyone, I'll take it one step further. Not only can each module form factor be used with different IEEE MAC specifications, but each module doesn't even need to be used for a fiber connection at all. An interesting example of using an 'optics' module form factor for a non-optical design is SFP+ Direct Attach.

 

 

SFP+DA is similar in concept to CX4 but provides a bit more flexibility. Normally, you may have a switch or NIC that is designed to support the addition of SFP+ based optics modules for a 10GbE fiber connection. Direct Attach allows passive Twin-Axial (2 pair copper) cables to be plugged directly into the SFP+ cage (in place of an optical module) to carry the serial signal from the MAC directly over the cable to another SFP+ form factor enabled NIC or switch.

 

 

Again, the downside is that without either a standalone PHY or an optics module to send the signal over a long distance, a passive cable with SFP+DA has a reach in the ~10-15m range. The real advantage of SFP+DA over CX4 is that on the switch side the SFP+ module design allows higher density switches than CX4 can provide.

 

 

For a top of the rack switch, SFP+DA will likely provide excellent cost, power and latency characteristics and still have enough reach to be very feasible inside the rack.

 

 

10GbE - The Infrastructure is Ready!

 

 

I hope that I've lifted a little bit of the fog that surrounds the 10GbE market and the related technologies. The last thing I want to leave you with is the fact that 10GbE infrastructure is now starting to roll into the mainstream. CX4 switches are available broadly in the market today and SFP+ type designs for both optical modules as well as Direct Attach connections have been demonstrated and will be getting rolled out very soon by various vendors.

 

 

Intel, along with other vendors in the marketplace, is already selling a wide variety of NICs and silicon to meet the various form factors and standards-based market needs I listed above.

 

 

After years of anticipation, 10GbE is finally hitting its stride. Next stop... 100GbE...

 

 


 

Virtualization is without a doubt a very hot topic these days. Companies continue to look to server virtualization to increase the utilization rates of their systems and lower overall deployment and management costs. The basic model of a virtualized server is depicted below:

 

 

 

 

 

 

Essentially, you have a VMM (Virtual Machine Monitor) software layer that sits between the hardware and the virtual machines and allows each virtual machine to successfully use what it thinks is one network port. This is a pretty straightforward model, and it directly addresses the main reason for virtualization: a server often does not use its processing power in full and is thus wasting CPU cycles.

 

 

There is an interesting result of this consolidation onto a single physical box with several Virtual Machines. In addition to consolidating CPU processes, you also effectively consolidate I/O bandwidth and switch processing capabilities onto the same platform. The overhead of this switching limits your bandwidth, adds CPU overhead, and effectively reduces the benefits of server virtualization. In some cases you may have a new problem in having created an I/O bottleneck.

 

 

This makes a lot of sense if you think about the fact that in essence, what you are doing is merging 5-10 machines that each had 1 or 2 ports of Gigabit Ethernet (all connected via a switch) into a single machine. This new server probably needs to have at least 6 ports or more of Gigabit Ethernet and may even require 10 Gigabit connections just to be able to support the new consolidated workload.
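The arithmetic behind that estimate can be sketched as follows. It simply adds up nominal port capacity for the machine counts mentioned above; it says nothing about how busy those ports actually were, so treat the result as an upper bound rather than a sizing rule. The function name is an invention for this example.

```python
def consolidated_gbps(servers, ports_per_server, port_gbps=1.0):
    """Sum of nominal port capacity when several servers merge into one box."""
    return servers * ports_per_server * port_gbps


if __name__ == "__main__":
    low = consolidated_gbps(servers=5, ports_per_server=1)
    high = consolidated_gbps(servers=10, ports_per_server=2)
    print(f"nominal capacity being consolidated: {low:g} to {high:g} Gbps")
```

Even the low end is several Gigabit ports' worth of capacity, and the high end is beyond what a single 10 Gigabit port can carry.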

 

 

Enter Virtual Machine Device Queues (VMDq):

 

 

To help relieve the I/O congestion associated with the additional VMM software switching in a virtualized environment, Intel implemented a technology called VMDq in our latest Ethernet NICs and silicon. VMDq offloads some of the switching that was done in the VMM to networking hardware specifically designed for this function. This drastically reduces the overhead associated with I/O switching in the VMM, which greatly improves throughput and overall system performance.

 

 

Below is a diagram that summarizes the new virtualized server stack with VMDq enabled:

 

 

 

 

 

 

On the receive path, VMDq provides a hardware ‘sorter' or classifier that essentially does the pre-work for the VMM of directing which end VM the packets should go to. The NIC or LAN silicon is performing a hardware assist for the VMM layer.

 

 

On the transmit side, the packets are serviced round robin style to avoid "head of line" blocking and alleviate potential quality of service (QoS) issues.
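The receive sorting and round robin transmit described above can be modeled in a few lines of software. This is a toy sketch of the idea, not Intel's hardware or driver interface: the class name, the methods, and the choice of destination MAC address as the sorting key are all assumptions made for illustration.

```python
from collections import deque


class VmdqSketch:
    """Toy model: per-VM receive queues plus round robin transmit."""

    def __init__(self, mac_to_vm):
        # ASSUMPTION: packets are sorted by destination MAC address.
        self.mac_to_vm = dict(mac_to_vm)
        vms = list(dict.fromkeys(self.mac_to_vm.values()))
        self.rx_queues = {vm: deque() for vm in vms}
        self.tx_queues = {vm: deque() for vm in vms}

    def receive(self, dst_mac, payload):
        """Place an incoming packet directly on the target VM's queue."""
        vm = self.mac_to_vm.get(dst_mac)
        if vm is None:
            return None  # unknown destination: left for the VMM to handle
        self.rx_queues[vm].append(payload)
        return vm

    def queue_tx(self, vm, payload):
        self.tx_queues[vm].append(payload)

    def transmit_round_robin(self):
        """Send one packet per VM per pass so a busy VM cannot block others."""
        sent = []
        while any(self.tx_queues.values()):
            for vm, queue in self.tx_queues.items():
                if queue:
                    sent.append((vm, queue.popleft()))
        return sent


if __name__ == "__main__":
    nic = VmdqSketch({"00:00:00:00:00:01": "vm1", "00:00:00:00:00:02": "vm2"})
    nic.receive("00:00:00:00:00:02", "hello vm2")
    for pkt in ("a1", "a2", "a3"):
        nic.queue_tx("vm1", pkt)
    nic.queue_tx("vm2", "b1")
    print(nic.transmit_round_robin())
```

In the usage example, vm2's single packet goes out second rather than waiting behind all three of vm1's packets, which is the "head of line" blocking the round robin servicing is meant to avoid.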

 

 

The immediate question I expect is "So, don't the VMM vendors have to support this?" And the answer is yes. Intel is supporting this feature today on shipping platforms, but you do need to work closely with the VMM vendor to make sure the whole stack works as designed.

 

 

Just this week Intel announced that our VMDq capability will be supported in VMware's upcoming ESX release. This is certainly a big step towards wide support of network virtualization performance enhancing features.

 

 

Ethernet technology has grown and become more important over the last 25 years, and the trend appears to be continuing on course.

 

 

Ben Hacker

 

 

--

 

 

For more details on VMDq, there is a VMDq Whitepaper, and an Intel® VT for Connectivity Datasheet located on our website.

 

 


While Intel is certainly most widely known for manufacturing the extremely complicated CPUs that are the brain of many computing platforms worldwide, there are several other products and technologies that people at Intel have been involved in for many years and that are critical to computing environments everywhere. As a person who has been working in various networking and manageability roles at Intel since 2001, I'd like to take a little time to look at Intel's history in the Ethernet market since its inception more than 25 years ago and focus a little on where the market might be going in the future.

 

Below is an image that tries to capture the key highlights of Intel's specific involvement in the Ethernet market over the last 3 decades:

 

 

 

As you can see, the Ethernet market has come a long way, from clunky multi-chip 10Mbps solutions more than 25 years ago all the way to the Quad Port Gigabit and Dual Port 10 Gigabit designs that are prevalent today.

 

Moving into the future, the Ethernet market is growing more complicated by the year, with new capabilities and features targeted specifically to support server virtualization, infrastructure convergence, enhanced storage technologies, and the continued importance of power efficiency of the overall compute infrastructure. Each of these innovations and changes will have a big impact on the overall structure and design of what servers and datacenters will look like in the future. My colleague Ken Lloyd gave his thoughts on how 10 Gigabit technologies will provide I/O convergence and overall cost savings for computer networks in the future, and there are clearly lots of interesting things going on right now.

 

Over the next several months I plan to try to go more in depth on many of the exciting developments taking place in the Ethernet market and to hopefully shed some light on some of the changes that are coming our way.

Stay tuned in the coming weeks!

- Ben
