A question I get on occasion is how to make teaming go faster. This webpage outlines a TON about teaming. It's a great “go to” reference and lists the capabilities your infrastructure must have.
In terms of throughput, I'm looking at a test pass report from one of our teaming test runs. With a team of two Intel® Ethernet Server Adapter X520s under Windows*, we saw 18Gb doing just TX, almost 19Gb doing just RX, and almost 32Gb bi-directional (BX). That's 18 out of 20, 19 out of 20, and 32 out of 40 (10TX1 + 10TX2 + 10RX1 + 10RX2). Where did the other 8Gb go? Overhead! Every time something has to act on a packet, it costs time, CPU cycles, and trips to memory. Each step and packet touch slows things down enough for it to add up. Also remember that the usable data throughput of unidirectional 10GbE is about 9.49Gb; headers, checksums, and inter-frame gaps consume the rest.
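Here's a quick sketch of where that 9.49Gb figure comes from, assuming a standard 1500-byte MTU and TCP/IPv4 with no options (real numbers shift with frame size and header options):

```python
# Rough wire-overhead math for 10GbE.
# Assumptions: 1500-byte MTU, TCP/IPv4 with no header options.
PREAMBLE = 8        # preamble + start-of-frame delimiter
ETH_HEADER = 14     # destination MAC, source MAC, EtherType
FCS = 4             # frame check sequence (CRC)
IFG = 12            # minimum inter-frame gap
IP_HEADER = 20      # IPv4, no options
TCP_HEADER = 20     # TCP, no options

payload = 1500 - IP_HEADER - TCP_HEADER          # 1460 bytes of TCP payload
wire = PREAMBLE + ETH_HEADER + 1500 + FCS + IFG  # 1538 bytes on the wire

print(f"usable throughput: {10 * payload / wire:.2f} Gb")
# -> usable throughput: 9.49 Gb
```

So even a perfect, zero-loss 10GbE link only moves about 9.49Gb of actual data per second.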
We used 16 1Gb clients per port for this test. That's 32Gb of RX and 32Gb of TX offered load, which should provide saturation. With 10Gb clients you can lower the number needed. The goal of over-saturating the traffic load is to ensure that traffic generation isn't the bottleneck in your benchmark.
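The saturation check is simple arithmetic; here it is spelled out with the numbers from our setup (the variable names are just illustrative):

```python
# Back-of-the-envelope check that the client pool can saturate the team.
# Numbers are from the test setup described above.
team_ports = 2
port_speed_gb = 10
team_capacity_gb = team_ports * port_speed_gb       # 20 Gb each direction

clients_per_port = 16
client_speed_gb = 1
offered_load_gb = team_ports * clients_per_port * client_speed_gb  # 32 Gb

# Offered load must exceed team capacity, or the clients are the bottleneck.
print(offered_load_gb >= team_capacity_gb)  # -> True
```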
Make sure your switch is up to the task. Most people assume the switch can handle it, but some switches don't have enough backplane throughput to handle saturation traffic levels. And some teaming options, like 802.3ad, require specific switch configurations, which may need an additional license. Consult your switch vendor for details.
On the system side, we get aggressive with the Power Management settings in the BIOS. On high-performance runs, we don't care about power. We usually turn processor C-states and SpeedStep off so the processor doesn't try to sleep during the test; the time lost coming out of sleep states will cost you performance. Performance at its core isn't about bandwidth; it truly is about time. The processor guys are epic at keeping these transitions as fast as possible, but in Ethernet performance land we can't spare time for anything. At 10Gb, that can be 11 million packets per second. Even with a 4GHz CPU, nanoseconds lost can mean packets lost.
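To make the time budget concrete, here's the worst-case math for minimum-size 64-byte frames (which occupy 84 bytes on the wire with preamble and inter-frame gap); the ~11 million packets per second figure above corresponds to a somewhat larger average frame size:

```python
# Per-packet time budget at 10Gb/s line rate.
# Assumption: minimum-size 64B frames -> 84B on the wire (64 + 8B preamble + 12B IFG).
line_rate_bps = 10e9
wire_bytes = 84

pps = line_rate_bps / (wire_bytes * 8)   # packets per second
ns_per_packet = 1e9 / pps                # time budget per packet
cycles = ns_per_packet * 4               # cycles available at a 4GHz clock

print(f"{pps / 1e6:.1f} Mpps, {ns_per_packet:.0f} ns/packet, ~{cycles:.0f} cycles")
# -> 14.9 Mpps, 67 ns/packet, ~269 cycles
```

A few hundred cycles per packet is all you get, which is why a multi-microsecond C-state exit latency is so painful.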
Dynamic teams seem faster than static ones, and dynamic is what we use most in our testing. It also keeps you away from proprietary solutions. Make sure your server has LOTS of RAM; ours have 12GB. Obviously you'll need an O/S that can address all that memory. Our clients had only 2GB each.
Sorry I can't share the test report itself (it includes information on third parties' adapters that I can't share), but I think I've captured the high points to help you get your teams tuned. Let me know if you have questions in the comments section.
And, as always, thanks for using Intel® Ethernet.