Earlier, Ben posted in the Server Room about how we demonstrated Intel® Ethernet products generating 1 million I/O Operations Per Second (IOPS). That article left a lot of people asking "How did you do that?", so today I'll talk a little bit about how we did it. Your mileage may vary, as they like to say, but this will enable you to get your own test bed up and running at or near the 1 million IOPS mark. I used this ingredients list to build our demo at NAB, which ran just shy of 1 million IOPS for a week on the show floor.

 

First you'll need the right ingredients. Like any recipe, you can make substitutions, but that may change things and not give you the same results. There are two sides to the equation: initiators and targets. We used a single initiator, the system under test (SUT). The faster-the-better rule applies here. We used an Intel® Xeon® 5500 series processor platform with the fastest RAM configuration available. The RAM needs to be as fast as possible, so watch how you install it: populating each memory channel with a single stick keeps the speed at maximum, so we used just 12 GB of 1333 MHz RAM. Make sure you use the x64 version of the O/S so you can actually use all of that RAM. So on the initiator you will need a super fast CPU, super fast memory, and a super fast LAN. We used the Intel® X520-DA 10 Gigabit direct attach adapter.

 

For the second part of the equation, we needed a seriously fast target, so we built one. We used the StarWind* iSCSI product to build a modestly sized RAM drive array on 10 very fast machines, each nearly as fast as the initiator and each also featuring an X520-DA adapter. Again, fast RAM, with enough for the RAM drives, but nothing excessive. On each machine we created 5 RAM drives, for a total of 50 RAM drives.
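
Before those 50 drives can show up in the initiator's Disk Manager (the next step below), the initiator has to discover and log in to each iSCSI target. The original post doesn't show that step, so here is a minimal sketch using the iscsicli tool that ships with the Windows iSCSI initiator; the portal IP addresses are hypothetical, and the target names StarWind exposes will differ on your setup.

```python
# Sketch: register each of the 10 target boxes with the Windows iSCSI
# initiator service and log in to every RAM-drive LUN they expose.
# The IP addresses are hypothetical; adjust them to your own targets.
import subprocess

# Each of the 10 StarWind target machines exposes 5 RAM-drive LUNs.
target_portals = [f"192.168.1.{10 + n}" for n in range(10)]  # hypothetical IPs

for portal in target_portals:
    # Quick-add the target portal to the Microsoft iSCSI initiator.
    subprocess.run(["iscsicli", "QAddTargetPortal", portal], check=True)

# List every target the portals advertise, then log in to each one.
# (Simplistic parsing: keep only the lines that look like IQN target names.)
listing = subprocess.run(["iscsicli", "ListTargets"],
                         capture_output=True, text=True, check=True)
targets = [line.strip() for line in listing.stdout.splitlines()
           if line.strip().startswith("iqn.")]

for target in targets:
    subprocess.run(["iscsicli", "QLoginTarget", target], check=True)

print(f"Logged in to {len(targets)} targets; "
      "the LUNs should now appear in Disk Manager.")
```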

Back on the initiator, we mapped all 50 drives in Disk Manager and made them active. Now we were ready for the test run. We used the Iometer benchmarking tool, a free product that Intel open sourced a while back. We tried various Access Specifications, but the 512 B I/O size (the smallest possible I/O size in Iometer) gives you the maximum possible IOPS. The maximum possible IOPS at 512 B on a 10 Gb/s link is 2.44 million, and the math is really easy: IOPS = bandwidth / I/O size. We used 2 instances of dynamo, each with 25 workers, and each worker was assigned one of the RAM drives to conduct its I/O. We made sure RSS was on, with the maximum number of queues supported, to balance the work across all those cores. The more cores you have, the more IOPS you can do before you saturate the link. If you add more 10G cards you should get more IOPS, up to the limits of the infrastructure. QPI is fast, but even it has limits.
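
The arithmetic behind that 2.44 million figure is worth spelling out. Here is a quick sketch of the same IOPS = bandwidth / I/O size calculation, treating the 10 Gb/s line rate as ideal with no protocol overhead; the percentage comparison at the end is my own arithmetic against the roughly 1 million IOPS the demo sustained.

```python
# Theoretical upper bound on IOPS for a single 10 Gb/s link at a 512-byte
# I/O size, using the IOPS = bandwidth / I/O size rule of thumb.
# This ignores Ethernet, IP, TCP, and iSCSI overhead, so real numbers land lower.
LINK_SPEED_BPS = 10 * 10**9     # 10 Gb/s line rate, in bits per second
IO_SIZE_BYTES = 512             # smallest I/O size Iometer will issue

bandwidth_bytes_per_sec = LINK_SPEED_BPS / 8
max_iops = bandwidth_bytes_per_sec / IO_SIZE_BYTES

print(f"Theoretical max: {max_iops:,.0f} IOPS")   # ~2,441,406 IOPS
print(f"Observed peak:   {1_000_000:,} IOPS "     # roughly what the demo sustained
      "(about 41% of the no-overhead ceiling)")
```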

 

Then we just sat back and collected the data. The numbers fluctuate a bit; that's part of the nature of the Windows* O/S. As it does garbage collection, general housekeeping, and even driver statistics gathering, it costs some CPU cycles, which in turn cost some IOPS. But we would peak above the 1 million IOPS milestone.

 

This might seem like a "downhill with a tailwind" kind of performance measurement, but it is really just like the top speed of a high-performance sedan. The speedometer might go to 160, but in everyday use it might rarely see higher than 55. This is the same case. Under lab conditions, Intel® Ethernet products generate eye-popping numbers, and they do just as well in the real world.

Imagine what type of performance they can give your network.