
    SR-IOV VF performance and default bandwidth rate limiting

    Vladimir

      Hi All,

       

      Is there a default rate limit for VFs created for 82599?

       

      On our servers we create up to 32 VFs per NIC and assign them to VMs (PCI passthrough). The servers run Ubuntu with KVM.

      We found that using a VF at the boot OS (host) level gives only 2-3 Gbps of throughput. The same holds between VFs.

       

      Measured using iperf with a single stream or multiple parallel streams.

       

      In another of our environments we have X540 cards, and the same problem exists there as well.

       

      Thanks,

      -Vladimir

        • 1. Re: SR-IOV VF performance and default bandwidth rate limiting
          Patrick_Kutch

          Thanks for posting to the forum.

           

          There is no rate limiting by default.  If all VFs are in contention, then each will get an equal share of the bandwidth - I have several blogs and papers detailing the rate limiting goodness.

           

          There are several factors when dealing with performance.  One of the biggest is making sure the 10Gbps device is in an x8 PCIe slot.  If you have it in anything less, your performance is going to suffer horribly.
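          A quick way to check (just a sketch - the 03:00.0 PCI address is a placeholder, grab yours from plain lspci) is to look at the negotiated link status and make sure it reports a x8 width:

          # lspci -s 03:00.0 -vv | grep -i lnksta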

           

          After that, two VFs on the same port should communicate with each other at PCIe speeds, which should be in the neighborhood of 20+ Gbps.  I don't have the exact numbers committed to memory :-(.

           

          The generation of processor/chipset you are using may also have a big impact.

           

          Look at those things and let us know how it goes.

           

          Thanx,

          Patrick

          • 2. Re: SR-IOV VF performance and default bandwidth rate limiting
            Vladimir

            Hi Patrick,

             

            Thanks a lot for a quick response.

             

            You are completely right about VF contention, but in this case the system is completely idle and only a few VFs are assigned to VMs.

             

            If the cards are in x4 slots, the ixgbe driver doesn't allow us to create VFs at all, but I double-checked the link (PCI Express:5.0Gb/s:Width x8) on all servers.

            The tests we were running were between two servers (not between VFs on the same host).

             

            It is actually a bit more complicated, as we are using bonding as well.

            For HA purposes we assign 2 VFs to each VM (one per PF) and bond them together (Ubuntu bonding, active/backup mode). Exactly the same setup is used at the boot OS level as well.

             

            In this test config all ports are connected to the same 10G switch, so there are no ISL issues.

             

            CPUs are Intel(R) Xeon(R) CPU E5-2620...

            • 3. Re: SR-IOV VF performance and default bandwidth rate limiting
              Patrick_Kutch

              That performance is more in line with an emulated path through Dom0 than with SR-IOV.

               

              I assume you have seen the paper I published several months ago on SR-IOV and bonding:

              Latest Flexible Port Partitioning Paper is now available.  Learn about QoS and SR-IOV!

               

              I know I can do upwards of 6 Gbps from a VF to another VF on a separate box, and that was on an older processor.

               

              Before going down the path of interrupt alignment and such, can you do an experiment?

               

              Can you remove all bonding and not fire up any VMs?  Then assign an IP address to a VF in the kernel, in a different subnet than any of your other eth devices.  Do this on each side, then do some performance testing.  Let's remove all the 'other stuff' and just talk VF to VF before adding the other layers back.
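              Something like this on each box (just a sketch - the interface name and addresses are placeholders; use whatever eth name the VF shows up as on the host):

              # ip addr add 192.168.200.1/24 dev eth20
              # ip link set eth20 up
              # iperf --server
              # iperf --client 192.168.200.1 --parallel=4 --time=30

              Use 192.168.200.2 on the second box, run the iperf server on one side and the client on the other.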

               

              - Patrick

              • 4. Re: SR-IOV VF performance and default bandwidth rate limiting
                Vladimir

                Hi Patrick,

                 

                This is a great paper. Thanks!!!

                I configured IPs on plain VFs, but with no luck - same 2-3 Gbps. Strangely, it varies from run to run. At the same time there is no additional traffic on the 10G switch.

                 

                I haven't mentioned it before, but on these servers we are using Ubuntu 11.04 with kernel 2.6.38-8.

                The ip utility there reports version

                     ip utility, iproute2-ss100519

                and it seems it doesn't support rate limiting. At least, when I try to set it up it returns:

                 

                # ip link set eth103 vf 22 rate 10000

                     RTNETLINK answers: Operation not supported

                 

                but our VMs are running Ubuntu 12.04.1 (kernel 3.2.0-25). VFs are exposed directly to the VMs using PCI passthrough with KVM, and iperf gives the same results there as well:

                 

                On server side:

                # iperf --server --len=64K --nodelay

                 

                On Client side:

                # iperf --client xx.xx.xx.xx --len=64K --nodelay --parallel=4 --time=50 --interval=10

                 

                [ ID] Interval       Transfer     Bandwidth

                [  6]  0.0-10.0 sec   583 MBytes   489 Mbits/sec

                [  3]  0.0-10.0 sec   561 MBytes   471 Mbits/sec

                [  5]  0.0-10.0 sec   500 MBytes   420 Mbits/sec

                [  4]  0.0-10.0 sec   536 MBytes   450 Mbits/sec

                [SUM]  0.0-10.0 sec  2.13 GBytes  1.83 Gbits/sec

                [  5] 10.0-20.0 sec   470 MBytes   394 Mbits/sec

                [  3] 10.0-20.0 sec   526 MBytes   442 Mbits/sec

                [  4] 10.0-20.0 sec   515 MBytes   432 Mbits/sec

                [  6] 10.0-20.0 sec   548 MBytes   459 Mbits/sec

                [SUM] 10.0-20.0 sec  2.01 GBytes  1.73 Gbits/sec

                [  3] 20.0-30.0 sec   558 MBytes   468 Mbits/sec

                [  5] 20.0-30.0 sec   568 MBytes   476 Mbits/sec

                [  4] 20.0-30.0 sec   480 MBytes   403 Mbits/sec

                [  6] 20.0-30.0 sec   559 MBytes   469 Mbits/sec

                [SUM] 20.0-30.0 sec  2.11 GBytes  1.82 Gbits/sec

                ...

                • 5. Re: SR-IOV VF performance and default bandwidth rate limiting
                  Patrick_Kutch

                  Glad you like the paper :-)

                   

                  You need to download the latest iproute2 tool to do the rate limiting and other cool stuff.
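                  For reference (a sketch, reusing the eth103 / VF 22 names from your earlier post), with a new enough iproute2 the per-VF cap is set on the PF from the host:

                  # ip -V
                  # ip link set eth103 vf 22 rate 1000
                  # ip link show eth103

                  The first command shows which iproute2 build you have, the second should cap that VF at roughly 1 Gbps (rate 0 should remove the cap), and the per-VF lines in the last command should reflect the new tx rate.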

                   

                  Have you tried PF to PF performance testing?

                   

                  I suspect that it is more in the network stack than in the VFs and drivers.

                   

                  To get the best performance for my demos, I turn off things like irqbalance and sometimes the network manager.  For example, if the interrupt assigned to a VF (or PF) lands on a core/package that is not on the same node (I think that is the term) as the PCIe connector, then the interrupt must not just go from one package (CPU) to another, but also across the QPI bus.  That takes a big hit on performance.
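                  If you want to poke at that by hand, here is a rough sketch (the interface name and IRQ number are placeholders):

                  # service irqbalance stop
                  # grep eth103 /proc/interrupts
                  # echo 2 > /proc/irq/123/smp_affinity

                  The grep tells you which IRQ number(s) the interface is using; the echo pins IRQ 123 to CPU1 - pick a mask of CPUs that live on the same package as the NIC.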

                   

                  The latest generation of kernels does some interesting things for performance that seem to have unwanted side effects on network performance.

                   

                  There is a script inside the source for our drivers (I can't recall the name at the moment - and I'm away at class without access to my lab) that will disable a bunch of this stuff AND try to align the driver/interrupts with the correct cores/packages.  See if you can find it.

                   

                  - Patrick

                  • 6. Re: SR-IOV VF performance and default bandwidth rate limiting
                    Vladimir

                    Hi Patrick,

                     

                    We've performed a set of tests and got very interesting results:

                     

                    PF to PF without SR-IOV enabled      ~9.5 Gbps

                    PF to PF with SR-IOV enabled         ~5.0 Gbps
                    (we tried 32, 16 and 8 VFs per NIC - same results)

                    VF to VF                             ~5.0 Gbps
                    Bonded VF to bonded VF               ~3.5 Gbps

                     

                    It was done under Ubuntu 11.04 with the old iproute tools. The bonding was configured like this (a fuller sketch of the stanza follows below):

                            bond_mode 1

                            bond_fail_over_mac 1

                            bond_miimon 100

                            bond_primary fe1002

                            slaves fe1002 fe1003
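                    In /etc/network/interfaces terms that corresponds roughly to the following (a sketch - the bond0 name and addresses are placeholders; only fe1002/fe1003 and the options above come from our config):

                         auto bond0
                         iface bond0 inet static
                             address 10.0.0.11
                             netmask 255.255.255.0
                             bond_mode 1
                             bond_fail_over_mac 1
                             bond_miimon 100
                             bond_primary fe1002
                             slaves fe1002 fe1003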

                     

                    Any ideas what might be slowing it down?

                    • 7. Re: SR-IOV VF performance and default bandwidth rate limiting
                      Patrick_Kutch

                      That is interesting.  I'm leaning towards an interrupt alignment issue.  Have you tried running that script I mentioned, and/or turning off the irqbalance service?

                      • 8. Re: SR-IOV VF performance and default bandwidth rate limiting
                        alex@zadarastorage.com

                        Hi Patrick, we retested this on the Ubuntu Precise kernel (3.2.0-29-generic #46) and the latest ixgbe driver (3.11.33):

                        PF to PF (SR-IOV enabled) and VF to PF – 9.5 Gbps

                        PF to VF and VF to VF – 5.5 Gbps

                         

                        It appears that the limiting factor is the VF receive queue. We see that the interrupt assigned to it lands on a single CPU, and that CPU gets 100% busy handling softirqs. We tried to spread the IRQs across several CPUs using /proc/irq/XXX/smp_affinity and /sys/class/net/XXX/queues/rx-0/rps_cpus (Receive Packet Steering), but all the interrupts still get handled on the same CPU. Is there a way to spread the interrupts for the receive queue across several CPUs?
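                        (What we tried looked roughly like this - the IRQ number, interface name and CPU masks here are placeholders:)

                        # echo f > /proc/irq/123/smp_affinity
                        # echo f > /sys/class/net/eth20/queues/rx-0/rps_cpus

                        The first line allows CPUs 0-3 to service the VF's IRQ; the second enables RPS so receive processing is spread over CPUs 0-3.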

                         

                        Apart from that, we also did a bi-directional test: VF-to-VF reaches about 10 Gbps in total, but PF-to-PF reaches 13-14 Gbps in total, which is strange. Is the 10 Gbps network bandwidth uni-directional or bi-directional?
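                        (A simultaneous bi-directional run with iperf looks roughly like this - a sketch, with the same placeholder address as before:)

                        # iperf --server
                        # iperf --client xx.xx.xx.xx --dualtest --time=30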

                         

                        Regarding the script: we tried it, but it is intended for multi-queue interfaces. When SR-IOV is enabled, each interface (the VFs and the PF) receives only one tx/rx queue pair (the PF receives a single TxRx queue), so it does not look relevant for the SR-IOV case.

                        Lastly, we disabled irqbalance, but didn't see any notable difference.

                         

                        Alex.

                         

                        P.S.: I will also reply to the other threads we have going in parallel with you :) Thank you for being so responsive.

                        • 9. Re: SR-IOV VF performance and default bandwidth rate limiting
                          Patrick_Kutch

                          Alex,

                           

                          What server are you using?  Chipset and processors (type and #)?

                           

                          thanx,

                           

                          Patrick

                          • 10. Re: SR-IOV VF performance and default bandwidth rate limiting
                            Vladimir

                            Patrick,

                             

                            In this particular environment all servers are

                                 Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.5.3 10/25/2010

                            with

                                 Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz

                             

                            24 logical CPUs (dual socket, 6 cores per socket, with HT)

                             

                            Regards,

                            -Vladimir

                            • 11. Re: SR-IOV VF performance and default bandwidth rate limiting
                              Patrick_Kutch

                              Vlad (and Alex I presume),

                               

                              We have been scratching our heads about this, and our best guess is that it may be BIOS related.  The latest BIOS for the R510 is 1.11.0, from 8/20/12.  Can you try updating the BIOS and see what happens?

                               

                              If that doesn't help, maybe do a dump to see which interrupts are assigned to the VF and which core/package they are assigned to.  If the traffic has to go across the QPI bus there are performance issues, so you need to make sure the interrupts for the VF are on the same package as the NIC.
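                              Something along these lines (a sketch - the interface name and IRQ number are placeholders):

                              # grep eth103 /proc/interrupts
                              # cat /sys/class/net/eth103/device/numa_node
                              # cat /proc/irq/123/smp_affinity

                              The first shows which IRQ(s) the VF is raising and in which CPU columns the counts are growing, the second shows which node/package the device hangs off of, and the third shows the mask of CPUs currently allowed to service that IRQ.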

                               

                              - Patrick

                               

                              • 12. Re: SR-IOV VF performance and default bandwidth rate limiting
                                Johansan

                                Patrick,

                                 

                                My colleagues and I ran into the same performance problem with the PF functions when SR-IOV was enabled, but an upgrade of the ixgbe and ixgbevf drivers seems to have solved the problem. We ran with:

                                 

                                ixgbe driver 3.18.7

                                ixgbevf driver 2.11.3

                                Enea Linux with kernel 3.10

                                 

                                and got the same results with and without SR-IOV enabled.

                                 

                                Regards,

                                - Johan