Until recently, Ethernet grew to dominate enterprise and Internet networking applications with little consideration for packet latency. In applications such as email or video delivery, an extra microsecond or two has little impact. But many emerging data center applications need low-latency networks, including financial trading, real-time data analytics, and cloud-based high-performance computing. In addition, many web service applications generate large amounts of east-west (server-to-server) traffic, where the latencies of many chained transactions can add up to unacceptable levels.


The Intel® Ethernet switch product family can deliver the industry's lowest layer 3 forwarding latencies while providing a large number of 10GbE and 40GbE interfaces. Intel latency numbers are less than half those of traditional layer 3 Ethernet switches. To deliver this performance, Intel uses several key technologies:


Cut-Through Switching: A switch in cut-through mode starts transmitting a packet before it has received all of it, as soon as enough of the header has arrived to make a forwarding decision. By contrast, a store-and-forward switch receives the entire packet before forwarding it to its next destination. Because it must wait for the whole frame, a store-and-forward switch adds at least one full frame reception time at every hop, which is too much latency for demanding data center applications.
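To put rough numbers on the difference, the short Python sketch below compares how long each kind of switch must wait on the ingress link before it can begin transmitting a frame. It counts only serialization delay and ignores lookup and pipeline time. The 64-byte lookahead used for the cut-through case is a hypothetical figure chosen for illustration, not a published Intel specification.

```python
# Illustrative model: the minimum time a switch must spend receiving bytes
# before it can start transmitting a frame. Lookup and pipeline latency are
# deliberately ignored (an assumption of this sketch).

def serialization_ns(num_bytes: int, link_gbps: float) -> float:
    """Time to clock num_bytes over a link of link_gbps, in nanoseconds."""
    # bits / (Gbit/s) gives nanoseconds directly
    return num_bytes * 8 / link_gbps

def store_and_forward_wait_ns(frame_bytes: int, link_gbps: float) -> float:
    # The whole frame must arrive before forwarding can begin.
    return serialization_ns(frame_bytes, link_gbps)

def cut_through_wait_ns(header_bytes: int, link_gbps: float) -> float:
    # Only the bytes needed for the forwarding decision must arrive.
    return serialization_ns(header_bytes, link_gbps)

if __name__ == "__main__":
    frame = 1518   # full-size standard Ethernet frame
    header = 64    # hypothetical lookahead needed for the forwarding lookup
    for gbps in (10.0, 40.0):
        saf = store_and_forward_wait_ns(frame, gbps)
        ct = cut_through_wait_ns(header, gbps)
        print(f"{gbps:.0f}GbE: store-and-forward {saf:.1f} ns, cut-through {ct:.1f} ns")
```

At 10GbE a full-size 1518-byte frame takes about 1.2 microseconds just to arrive, and a store-and-forward switch pays that cost again at every hop in the path.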


Terabit Crossbar Switch: Internal switch congestion is a common source of added latency. Packets must wait in queues until other packets have been sent, and the resulting delays can be unacceptable for real-time applications. Intel uses a matrix-based crossbar switch that is unique in its capacity of 1 terabit per second. This crossbar over-speed (internal bandwidth beyond what the ports themselves can carry) can greatly reduce internal switch congestion.


Single Output Queued Shared Memory: Intel uses a proprietary SRAM technology that is fast enough to let every input port write into the same output queue simultaneously. Without memory this fast, there may be insufficient on-chip bandwidth for simultaneous output queue access, so chip architects have traditionally relied on combined input/output queued (CIOQ) memory architectures, which build a set of virtual output queues into every switch input. This approach is complex, very difficult to scale to switches with large port counts, and adds blocking that increases packet latency through the chip.
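A rough calculation shows why shared memory of this kind is hard to build. The sketch below assumes that each packet is written to shared memory once on ingress and read once on egress, and it uses a hypothetical port mix of 48 x 10GbE plus 4 x 40GbE and a hypothetical 1 GHz memory clock. Neither figure is taken from an Intel datasheet.

```python
# Back-of-the-envelope sizing for an output-queued shared-memory switch.
# Assumptions (illustrative only): every packet is written once on ingress
# and read once on egress, and all ports run at full line rate at once.

def shared_memory_bandwidth_gbps(port_rates_gbps: list[float]) -> float:
    """Total read + write bandwidth the shared memory must sustain."""
    return 2 * sum(port_rates_gbps)

def single_queue_write_gbps(port_rates_gbps: list[float]) -> float:
    """Worst case: every input port writes into the same output queue."""
    return sum(port_rates_gbps)

def bytes_per_cycle(bandwidth_gbps: float, clock_ghz: float) -> float:
    """Convert a bandwidth in Gb/s into bytes moved per memory clock cycle."""
    return bandwidth_gbps / 8 / clock_ghz

if __name__ == "__main__":
    ports = [10.0] * 48 + [40.0] * 4  # hypothetical port mix (640 Gb/s total)
    total = shared_memory_bandwidth_gbps(ports)
    print(f"shared memory bandwidth: {total:.0f} Gb/s")
    print(f"single-queue write bandwidth: {single_queue_write_gbps(ports):.0f} Gb/s")
    print(f"bytes per cycle at 1 GHz: {bytes_per_cycle(total, 1.0):.0f}")
```

Under these assumptions the memory must sustain 1,280 Gb/s, or 160 bytes every clock cycle, and a single output queue must absorb writes at the full 640 Gb/s input rate when every input targets it at once.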


High Packet-Rate Frame Processing: It doesn't matter how fast you can enqueue and dequeue packets from shared memory if your frame forwarding pipeline can't keep up. Intel employs special circuit technology that allows a single L2/L3/L4 frame processing pipeline to forward packets at full line rate, even when minimum-size packets arrive back-to-back on all ports simultaneously. It does this while maintaining extremely low processing latency under all forwarding conditions.
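The sketch below shows why this is demanding. A minimum-size 64-byte Ethernet frame occupies 84 bytes of link time once the preamble and minimum inter-frame gap are counted, so a 10GbE port can deliver about 14.88 million packets per second. A single pipeline serving every port must handle the sum of those rates. The 48 x 10GbE plus 4 x 40GbE port mix is a hypothetical example, not a specific Intel product configuration.

```python
# Packet-rate arithmetic for minimum-size Ethernet frames.
# On the wire, each frame is preceded by an 8-byte preamble/SFD and followed
# by a 12-byte minimum inter-frame gap, so a 64-byte frame occupies 84 bytes.

MIN_FRAME_BYTES = 64     # minimum Ethernet frame size, including FCS
PREAMBLE_SFD_BYTES = 8   # preamble + start-of-frame delimiter
MIN_IFG_BYTES = 12       # minimum inter-frame gap

def line_rate_pps(link_gbps: float, frame_bytes: int = MIN_FRAME_BYTES) -> float:
    """Packets per second one port delivers at full line rate."""
    wire_bits = (frame_bytes + PREAMBLE_SFD_BYTES + MIN_IFG_BYTES) * 8
    return link_gbps * 1e9 / wire_bits

def pipeline_pps(port_rates_gbps: list[float]) -> float:
    """Forwarding decisions per second a single shared pipeline must make."""
    return sum(line_rate_pps(r) for r in port_rates_gbps)

if __name__ == "__main__":
    ports = [10.0] * 48 + [40.0] * 4  # hypothetical port mix
    pps = pipeline_pps(ports)
    print(f"one 10GbE port: {line_rate_pps(10.0) / 1e6:.2f} Mpps")
    print(f"whole switch: {pps / 1e6:.1f} Mpps")
    print(f"time budget per packet: {1e9 / pps:.2f} ns")
```

For that hypothetical mix, the pipeline must make roughly 952 million forwarding decisions per second, leaving a budget of about 1.05 ns per packet.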


By combining a high-performance frame processing pipeline, a terabit crossbar data path, and very fast packet memory with low-latency cut-through switching, Intel Ethernet switch family chips deliver the latency that emerging mesh, cloud, and financial networking applications need.