8 Replies Latest reply on Sep 18, 2015 2:32 AM by Sandy_Intel

    NIC-rich but CPU-poor?

    zperry

      Recently we got a few new servers, all with identical configuration. Each has dual E5-2620 v3 2.4GHz CPUs, 128GiB RAM (8 x 16GiB DDR4 DIMMs), one dual-port 40G XL710, and two dual-port 10G SFP+ mezzanine cards (i.e. 4 x 10G SFP+ ports). All of them run CentOS 7.1 x86_64.  The XL710s are connected to the 40G ports of QCT LY8 switches using genuine Intel QSFP+ DACs.  All 10G SFP+ ports are connected to Arista 7280SE-68 switches, but using third-party DACs.  So far, all systems have been only minimally tuned:

      • In each server's BIOS, the pre-defined "High Performance" profile is selected; in addition, Intel I/OAT is enabled and VT-d is disabled (we don't run virtual machines on these servers; they are for HPC applications).
      • In each server's CentOS install, the active tuned-adm profile is set to network-throughput.
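      For reference, the profile was applied with the stock tuned commands:

      [root@sc2u0n0 ~]# tuned-adm profile network-throughput
      [root@sc2u0n0 ~]# tuned-adm active
      Current active profile: network-throughput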

       

      After the servers were set up, we have been running long iperf3 tests among them. So far, we have observed consistent packet drops on the receiving side.  An example:


      [root@sc2u1n0 ~]# netstat -i
      Kernel Interface table
      Iface      MTU    RX-OK RX-ERR RX-DRP RX-OVR    TX-OK TX-ERR TX-DRP TX-OVR Flg
      ens10f0   9000 236406987      0      0 0       247785514      0      0      0 BMRU
      ens1f0    9000 363116387      0   2391 0      2370529766      0      0      0 BMRU
      ens1f1    9000 382484140      0   2248 0      2098335636      0      0      0 BMRU
      ens20f0   9000 565532361      0   2258 0      1472188440      0      0      0 BMRU
      ens20f1   9000 519587804      0   4225 0      5471601950      0      0      0 BMRU
      lo       65536  19058603      0      0 0        19058603      0      0      0 LRU



      We have also observed iperf3 retransmits (Retr) at the beginning of a test session and, less often, during a session.  Two examples:


      40G pairs:


      $ iperf3 -c 192.168.11.100 -i 1 -t 10
      Connecting to host 192.168.11.100, port 5201
      [  4] local 192.168.11.103 port 59351 connected to 192.168.11.100 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec  2.77 GBytes  23.8 Gbits/sec   54    655 KBytes
      [  4]   1.00-2.00   sec  4.26 GBytes  36.6 Gbits/sec    0   1.52 MBytes
      [  4]   2.00-3.00   sec  4.61 GBytes  39.6 Gbits/sec    0   2.12 MBytes
      [  4]   3.00-4.00   sec  4.53 GBytes  38.9 Gbits/sec    0   2.57 MBytes
      [  4]   4.00-5.00   sec  4.00 GBytes  34.4 Gbits/sec    7   1.42 MBytes
      [  4]   5.00-6.00   sec  4.61 GBytes  39.6 Gbits/sec    0   2.01 MBytes
      [  4]   6.00-7.00   sec  4.61 GBytes  39.6 Gbits/sec    0   2.47 MBytes
      [  4]   7.00-8.00   sec  4.61 GBytes  39.6 Gbits/sec    0   2.88 MBytes
      [  4]   8.00-9.00   sec  4.61 GBytes  39.6 Gbits/sec    0   3.21 MBytes
      [  4]   9.00-10.00  sec  4.61 GBytes  39.6 Gbits/sec    0   3.52 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-10.00  sec  43.2 GBytes  37.1 Gbits/sec   61             sender
      [  4]   0.00-10.00  sec  43.2 GBytes  37.1 Gbits/sec                  receiver

       

      82599-powered 10G pairs:

       

      $ iperf3 -c 192.168.15.100 -i 1 -t 10
      Connecting to host 192.168.15.100, port 5201
      [  4] local 192.168.16.101 port 53464 connected to 192.168.15.100 port 5201
      [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
      [  4]   0.00-1.00   sec  1.05 GBytes  9.05 Gbits/sec  722   1.97 MBytes
      [  4]   1.00-2.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.80 MBytes
      [  4]   2.00-3.00   sec  1.10 GBytes  9.42 Gbits/sec   23   2.15 MBytes
      [  4]   3.00-4.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.16 MBytes
      [  4]   4.00-5.00   sec  1.09 GBytes  9.41 Gbits/sec    0   2.16 MBytes
      [  4]   5.00-6.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.17 MBytes
      [  4]   6.00-7.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.18 MBytes
      [  4]   7.00-8.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.22 MBytes
      [  4]   8.00-9.00   sec  1.10 GBytes  9.42 Gbits/sec    0   2.27 MBytes
      [  4]   9.00-10.00  sec  1.10 GBytes  9.42 Gbits/sec    0   2.34 MBytes
      - - - - - - - - - - - - - - - - - - - - - - - - -
      [ ID] Interval           Transfer     Bandwidth       Retr
      [  4]   0.00-10.00  sec  10.9 GBytes  9.38 Gbits/sec  745             sender
      [  4]   0.00-10.00  sec  10.9 GBytes  9.37 Gbits/sec                  receiver


      Looking around, I ran into a 40G NIC tuning article on the DOE Energy Sciences Network (ESnet) Fasterdata site, which states: "At the present time (February 2015), CPU clock rate still matters a lot for 40G hosts.  In general, higher CPU clock rate is far more important than high core count for a 40G host.  In general, you can expect it to be very difficult to achieve 40G performance with a CPU that runs more slowly than 3GHz per core."  We don't have such fast CPUs: the E5-2620 v3 is a 2.4GHz CPU from the Basic category, not even the Performance category. So,

      • Are our servers too rich in NICs but under-powered CPU-wise?
      • Is there anything we can do to get these servers to behave at least reasonably, and in particular to stop dropping packets?


      BTW, a few days ago we updated all servers with the most recent stable Intel i40e and ixgbe drivers, but we have not yet run the set_irq_affinity script, nor have we tuned the NICs (e.g. by adjusting the rx-usecs value). The reason is that each server runs two highly concurrent applications which tend to use all the cores, and we are afraid that using the set_irq_affinity script may negatively impact the performance of our applications. But if Intel folks consider running the script beneficial, we are willing to try.
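      For concreteness, the tunings we have been holding off on would look roughly like the following; the interface name and values here are only illustrative, not anything we have applied:

      # Pin the NIC's IRQs to cores on its local NUMA node, using the script
      # shipped with the Intel i40e/ixgbe driver source (options vary by release):
      ./set_irq_affinity local ens10f0

      # Lower the interrupt rate by raising the receive coalescing interval:
      ethtool -C ens10f0 adaptive-rx off rx-usecs 100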

       

      Regards,

       

      -- Zack

        • 1. Re: NIC-rich but CPU-poor?
          Sandy_Intel

          Hi Zack,

           

          Thank you for contacting Intel.  I'll check on this and will update you accordingly.

           

          Sincerely,

          Sandy

          • 2. Re: NIC-rich but CPU-poor?
            zperry

            Hi Sandy,

             

            Any updates?

             

            Thanks!

             

            -- Zack

            • 3. Re: NIC-rich but CPU-poor?
              Sandy_Intel

              Hi Zack,

               

              We are looking into this with your system setup.  Thank you for your patience and understanding.

               

              Sincerely,

               

              Sandy

              • 4. Re: NIC-rich but CPU-poor?
                Sandy_Intel

                Hi Zack,

                 

                Good day.

                 

                We would like to request your adapter details.  Please run ethtool -i <interface> and post the output here.

                 

                Please also try setting the port to 10Gbps and let us know the test results.

                 

                We look forward to your reply.

                 

                Sincerely,

                 

                Sandy

                • 5. Re: NIC-rich but CPU-poor?
                  zperry

                  Hi Sandy,

                   

                  Please see below.  Should you need any more info, please just ask.  All eight servers use the same configuration, so although the following was obtained on just one node, it is representative.

                   

                  ens10f0 is the 40G port (one of the XL710's two); the rest are 10G ports (82599-based).

                   

                  Regards,

                   

                  -- Zack

                   

                  [root@sc2u0n0 ~]# for if in ens10f0 ens20f{0..1} ens1f{0..1}
                  > do
                  > ethtool -i $if
                  > echo "ethtool -i for $if done..."
                  > done
                  driver: i40e
                  version: 1.3.38
                  firmware-version: 4.24 0x800013fc 0.0.0
                  bus-info: 0000:02:00.0
                  supports-statistics: yes
                  supports-test: yes
                  supports-eeprom-access: yes
                  supports-register-dump: yes
                  supports-priv-flags: yes
                  ethtool -i for ens10f0 done...
                  driver: ixgbe
                  version: 4.1.2
                  firmware-version: 0x800004e0, 1.808.0
                  bus-info: 0000:01:00.0
                  supports-statistics: yes
                  supports-test: yes
                  supports-eeprom-access: yes
                  supports-register-dump: yes
                  supports-priv-flags: no
                  ethtool -i for ens20f0 done...
                  driver: ixgbe
                  version: 4.1.2
                  firmware-version: 0x800004e0, 1.808.0
                  bus-info: 0000:01:00.1
                  supports-statistics: yes
                  supports-test: yes
                  supports-eeprom-access: yes
                  supports-register-dump: yes
                  supports-priv-flags: no
                  ethtool -i for ens20f1 done...
                  driver: ixgbe
                  version: 4.1.2
                  firmware-version: 0x80000646, 1.446.0
                  bus-info: 0000:07:00.0
                  supports-statistics: yes
                  supports-test: yes
                  supports-eeprom-access: yes
                  supports-register-dump: yes
                  supports-priv-flags: no
                  ethtool -i for ens1f0 done...
                  driver: ixgbe
                  version: 4.1.2
                  firmware-version: 0x80000646, 1.446.0
                  bus-info: 0000:07:00.1
                  supports-statistics: yes
                  supports-test: yes
                  supports-eeprom-access: yes
                  supports-register-dump: yes
                  supports-priv-flags: no
                  ethtool -i for ens1f1 done...
                  [root@sc2u0n0 ~]#

                  • 6. Re: NIC-rich but CPU-poor?
                    zperry

                    Hi Sandy,

                     

                    > Please try to set the port to 10Gbps and see the test results.

                     

                    Quick follow-up:

                     

                    I have a script that lets me check the speed of all interfaces; we have 40 of them.  The script runs ethtool <iface> remotely over ssh, and I extract the Speed lines.  Since all servers were online, all interfaces have been running at their rated speed, so we have had no need to manually force any of them to a given speed thus far.
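                    The script is essentially the following (the hostname list below is illustrative, not an exact copy):

                    #!/usr/bin/env bash
                    # iface_speed.sh -- print "ethtool <iface>" output for every
                    # interface on every server; hostnames here are illustrative.
                    for host in sc2u0n{0..3} sc2u1n{0..3}; do
                        for iface in ens10f0 ens1f0 ens1f1 ens20f0 ens20f1; do
                            ssh "$host" "ethtool $iface"
                        done
                    done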

                     

                    $ ./iface_speed.sh |grep Speed
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 40000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                      Speed: 10000Mb/s
                    $ ./iface_speed.sh |grep Speed|wc -l
                    40


                    Regards,


                    -- Zack

                    • 7. Re: NIC-rich but CPU-poor?
                      Sandy_Intel

                      Hi Zack,

                       

                      Sorry for my late reply.  Thanks for coming back with the details. I am checking on this.

                       

                      Sincerely,

                       

                      Sandy

                      • 8. Re: NIC-rich but CPU-poor?
                        Sandy_Intel

                        Hi Zack,

                         

                        Please refer to the website below for some suggested settings to improve your systems' performance.

                        Network Connectivity — Tuning Intel® Ethernet Adapter Throughput Performance

                        - scroll down to Related Topics, then click on "Performance Tuning for 10 Gigabit Ethernet adapters in Linux*"
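                        Typical adjustments from those guides include larger socket buffers and a bigger NIC receive ring; for example, settings along these lines (values are illustrative only; please refer to the article for the exact recommendations):

                        # Enlarge the maximum socket buffer sizes (illustrative values):
                        sysctl -w net.core.rmem_max=16777216
                        sysctl -w net.core.wmem_max=16777216
                        sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
                        sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"

                        # Enlarge the adapter's receive ring to absorb packet bursts:
                        ethtool -G ens10f0 rx 4096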

                         

                        Hope this helps improve your systems' overall performance.

                         

                        Sincerely,

                         

                        Sandy