We use constructs called descriptors to move data into and out of the Intel® Ethernet device. Under Windows* these are controlled via our Intel® PROSet application, or via the property tabs if the driver is installed without Intel® PROSet. Intel® PROSet lets the user change the number of descriptors with a parameter called “Receive Buffers” for RX descriptors and “Transmit Buffers” for TX descriptors. Some drivers call them buffers and others call them descriptors; for our discussion they are the same thing. Why “buffers” when the setting influences the descriptor count? Most people understand what a network buffer is, but descriptors are a logical construct for our hardware and we don’t want to confuse people.
Here is what it looks like on my laptop:
You can see the “Transmit Buffers” setting is below it. To modify the number of descriptors you just increase the value. In our Windows offerings the limit is 2048 and the value must be a multiple of 8. On the transmit side the starting value is 512, but the same rules (up to 2048, in steps of 8) still apply. Why more TX than RX? Our architecture gives priority to the nondeterministic RX side, so RX descriptors turn over faster than TX descriptors. Plus, the O/S sometimes does not return TX assets to the driver in a timely fashion.
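The Windows rule above (a multiple of 8, no more than 2048) can be sketched as a small helper that snaps a requested value to something the driver would accept. This is only an illustration: the function name is made up, and the lower bound of 8 is an assumption, since the minimum is not stated here.

```python
def snap_descriptor_count(requested, step=8, maximum=2048, minimum=8):
    """Clamp a requested descriptor count and round it down to a multiple of `step`.

    `maximum` and `step` follow the Windows limits described above.
    `minimum` is an assumed placeholder, not a documented driver limit.
    """
    if requested < minimum:
        return minimum
    clamped = min(requested, maximum)
    # Drop the remainder so the result is a whole multiple of `step`.
    return clamped - (clamped % step)


if __name__ == "__main__":
    for wanted in (512, 1001, 5000):
        print(wanted, "->", snap_descriptor_count(wanted))
```

Rounding down rather than up is a design choice here; it keeps the result from ever exceeding what was asked for.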
Now under Linux*, for igb you use the ethtool interface to set ring parameters, and the max is 4096 in steps of 8. For e1000 version 8 and greater, you pass the driver a parameter to change the size. The TX parameter is a little different in that it is an exponent: two raised to the power of the value gives the ring size, so values of 7 to 12 give 2^7 to 2^12 (128 to 4096). This is already set to max, so not much to worry about there. On the RX side, it’s a straight number with a range of 80 to 4096, with the default being 256. The e1000e driver uses the same method as the igb driver, so ethtool will be your friend in that case.
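Here is a minimal sketch of the two Linux cases. The first function turns the e1000 TX exponent into a ring size; the second builds (but does not run) an `ethtool -G` command line for igb or e1000e. The function names and the interface name `eth0` are placeholders, and the 4096 limit and step of 8 are simply the values quoted above.

```python
def e1000_tx_ring_size(exponent):
    """Return the TX ring size for an exponent in the 7..12 range described above."""
    if not 7 <= exponent <= 12:
        raise ValueError("exponent must be between 7 and 12")
    return 2 ** exponent


def ethtool_ring_command(interface, rx=None, tx=None, maximum=4096, step=8):
    """Build the argument list for `ethtool -G <interface> rx N tx N`."""
    if rx is None and tx is None:
        raise ValueError("give at least one of rx or tx")
    cmd = ["ethtool", "-G", interface]
    for name, value in (("rx", rx), ("tx", tx)):
        if value is None:
            continue
        # Enforce the limits quoted in the text: positive, <= 4096, multiple of 8.
        if value <= 0 or value > maximum or value % step:
            raise ValueError(f"{name}={value} is not a multiple of {step} up to {maximum}")
        cmd += [name, str(value)]
    return cmd


if __name__ == "__main__":
    print(e1000_tx_ring_size(12))
    print(" ".join(ethtool_ring_command("eth0", rx=4096, tx=4096)))
```

Running `ethtool -g eth0` first shows the current and maximum ring sizes the driver actually reports, which is the safer source of truth than hard-coded limits.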
Why does Linux have a bigger top end? Windows makes a bigger deal out of non-pageable memory than Linux does, and so our driver team limits the top end. Since each descriptor comes with a 2048-byte buffer (and sometimes more), that can be a lot of memory: 4096 descriptors at 2048 bytes each is 8 MB. 8 MB doesn’t sound like a lot for a modern system that can have 8 GB of RAM, but we like to keep our memory footprint small.
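The arithmetic behind that figure is easy to check. The sketch below assumes exactly one 2048-byte buffer per descriptor and counts a single ring only; the memory used by the descriptors themselves is not included, and the function name is just for illustration.

```python
BUFFER_BYTES = 2048  # per-descriptor buffer size quoted above; can be larger


def ring_buffer_bytes(descriptors, buffer_bytes=BUFFER_BYTES):
    """Return the buffer memory pinned by one ring of `descriptors` entries."""
    return descriptors * buffer_bytes


if __name__ == "__main__":
    for count in (256, 2048, 4096):
        mib = ring_buffer_bytes(count) / (1024 * 1024)
        print(f"{count} descriptors -> {mib:g} MiB")
```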
When should you make changes? There is no magic formula for it. This article outlines some performance tuning ideas. But in general, there are two cases where you would want to make changes: 1) Low memory and 2) CPU or bus saturation.
If you are running light on memory, you can drop the number of descriptors, but how fast the O/S processes packets will determine whether you suffer missed and dropped packets because of a lack of resources.
If you have a large number of packets incoming and they are being processed slowly, you might want to turn up the number of buffers. This gives incoming packets more room to queue at the entry point, but the real problem is why the layers above are processing them slowly. There is a cure for that most of the time, and it’s called the Intel® XEON® processor E7 Family. Look around for articles on how to tune the memory usage of your favorite application; most serious application vendors have tuning tips on the web. If the O/S can service a buffer and return it to the driver in under a millisecond, you don’t need more buffers, since the data is moving fast enough. If the buffers go to an upper layer like the stack or the application and sit there, then more buffers will just treat a symptom and not the problem. You are better off digging for the root cause than just slapping on more buffers. The stack is a big part of the equation, so check its statistics and errors to see if your network is underperforming because of slowness there.
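One rough way to reason about this is to ask how long a ring can absorb traffic if nothing is returned to the driver. The sketch below is a back-of-the-envelope model, not a driver formula: it assumes one packet per descriptor, a steady arrival rate, and a ring that starts completely free. The function name and the packet rate are made up for illustration.

```python
def ring_headroom_seconds(descriptors, packets_per_second):
    """Time until a fully free ring fills if no buffers are returned."""
    if packets_per_second <= 0:
        raise ValueError("packets_per_second must be positive")
    return descriptors / packets_per_second


if __name__ == "__main__":
    rate = 100_000  # hypothetical arrival rate in packets per second
    for count in (256, 2048):
        ms = ring_headroom_seconds(count, rate) * 1000
        print(f"{count} descriptors at {rate} pps -> {ms:.2f} ms of headroom")
```

In this model, 256 descriptors at 100,000 packets per second cover about 2.56 ms, so a stack that returns buffers within a millisecond stays ahead. If buffers sit upstream for longer than the headroom, a bigger ring only delays the drops.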
Other than more descriptors, what can I do? Make sure the data is going to the core that is going to be doing the work. Use RSS and MSI-X to make sure that you’re not moving your data several times. If you’re sending your traffic to a core that is saturated, consider moving some of what is running on that core to another core. Process affinity is pretty easy to use, and can make sure you’re keeping all those cores working evenly. You might also consider updating the O/S. It is not always an attractive option, but modern O/Ses are very aware of the loads that a network can bring, and the vendors listen to our suggestions like never before. We saw major improvements moving from one O/S to a newer O/S from the same vendor. I won’t name names since we all have our off days, but since the driver for our stuff didn’t change, it clearly pointed at the cause. The application can also be a good source of tuning, so scour the app’s support site for tuning ideas.
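As an illustration of process affinity, here is a minimal Linux-only sketch using Python's standard `os.sched_setaffinity`. The helper name is made up, and which core to choose is up to you; ideally it is a core that is not already saturated with network work.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict a process (pid 0 means the caller) to one core; return the new affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        # These calls exist only on platforms that support them, such as Linux.
        raise NotImplementedError("CPU affinity is not available on this platform")
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    allowed = os.sched_getaffinity(0)   # cores this process may currently use
    target = min(allowed)               # pick any allowed core for the demo
    print("pinned to", pin_to_core(target))
    os.sched_setaffinity(0, allowed)    # restore the original affinity
```

On Windows the same idea is available through Task Manager or the `start /affinity` command rather than this API.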
As always, thanks for using Intel® Ethernet.