19 Replies Latest reply on Dec 14, 2017 8:06 PM by Intel Corporation

    x710 SR-IOV problems

    anmarin

      Hi all,

       

      I have the following baseline:

       

      Dell R630 (2x14 core Xeon, 128GB RAM, 800GB SSD)

      x710 4-port NIC, in 10Gbit mode

      SUSE12SP1

      Latest NIC firmware but default PF/VF drivers (shipped with the OS, v1.3.4)

      VF driver blacklisted on hypervisor

      Setup according to official Intel and Suse documentation, KVM hypervisor

       

      With the test setup (a single VM with a single VF and untagged traffic), I could achieve essentially line-rate numbers: with MTU 1500, about 770 Kpps and 9.4 Gbps of bandwidth, for both UDP and TCP traffic, with no packet drops. There is plenty of processing power, the setup is clean, and everything works as it should.
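As a rough sanity check on those numbers, the theoretical frame rate of a 10GbE link for full-size frames can be worked out from the per-frame overhead on the wire. This is only a sketch: it assumes untagged frames, a 10 Gbit/s link and every packet filling the 1500-byte MTU; the function names are illustrative.

```python
# Theoretical 10GbE frame rate for full-size, untagged frames at MTU 1500.
# Bytes on the wire per frame: IP packet (MTU) + Ethernet header (14) + FCS (4)
# + preamble/SFD (8) + inter-frame gap (12).
LINK_BPS = 10_000_000_000
MTU = 1500
WIRE_OVERHEAD = 14 + 4 + 8 + 12  # bytes added around each IP packet


def max_frames_per_second(link_bps: int, mtu: int) -> float:
    """Upper bound on frames per second when every frame carries a full MTU."""
    return link_bps / ((mtu + WIRE_OVERHEAD) * 8)


def ip_throughput_bps(pps: float, mtu: int) -> float:
    """IP-level throughput for a given packet rate of MTU-sized packets."""
    return pps * mtu * 8


line_rate_pps = max_frames_per_second(LINK_BPS, MTU)
print(f"theoretical ceiling: {line_rate_pps / 1e3:.0f} Kpps")
print(f"770 Kpps of 1500-byte packets: {ip_throughput_bps(770_000, MTU) / 1e9:.2f} Gbps")
```

The measured rate of about 770 Kpps sits a little below this ceiling, which is consistent with the "essentially line rate" description.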

       

      The production setup is a bit different: the VM uses 3 VFs, one per PF (the 4th PF is not used). All VFs except the first carry only untagged traffic. The first VF passes two types of traffic: one untagged (VLAN 119) and one tagged (VLAN 1108). Tagging is done inside the VM. The setup worked fine for some time, confirming the test setup numbers. However, after a while the following errors started to appear in the hypervisor logs:

       

      Mar  11 14:32:52 test_machine1 kernel: [10423.889924] i40e 0000:01:00.1: TX driver issue detected on VF 0

      Mar  11 14:32:52 test_machine1 kernel: [10423.889925] i40e 0000:01:00.1: Too many MDD events on VF 0, disabled
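MDD here stands for Malicious Driver Detection. To see which PF/VF pairs are affected and how often, the hypervisor log can be tallied with a small script. The sketch below assumes the messages keep exactly the wording of the two lines quoted above; the regular expression and function name are illustrative and not part of the driver.

```python
import re
from collections import Counter

# Pattern follows the two hypervisor log lines quoted in the post.
MDD_RE = re.compile(
    r"i40e (?P<pf>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"(?P<msg>TX driver issue detected on VF|Too many MDD events on VF) (?P<vf>\d+)"
)


def count_mdd_events(lines):
    """Count MDD-related messages per (PF PCI address, VF index)."""
    counts = Counter()
    for line in lines:
        match = MDD_RE.search(line)
        if match:
            counts[(match.group("pf"), int(match.group("vf")))] += 1
    return counts


sample = [
    "Mar  11 14:32:52 test_machine1 kernel: [10423.889924] i40e 0000:01:00.1: TX driver issue detected on VF 0",
    "Mar  11 14:32:52 test_machine1 kernel: [10423.889925] i40e 0000:01:00.1: Too many MDD events on VF 0, disabled",
]
for (pf, vf), n in count_mdd_events(sample).items():
    print(f"{pf} VF {vf}: {n} MDD-related message(s)")
```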

       

      Performance numbers also became erratic: sometimes it worked perfectly, sometimes it did not. Most importantly, packet drops occurred.

       

      So I reinstalled everything (hypervisor and VMs), configured exactly as before using automated tools, but upgraded the PF and VF drivers to the latest versions (v2.0.19/v2.0.16). The errors in the logs disappeared, but the issue persists. Now I have this in the logs:

       

      2017-03-12T11:33:43.356014+01:00 test_machine1 kernel: [  420.439112] i40e 0000:01:00.1: Unable to add VLAN filter 0 for VF 0, error -22

      2017-03-12T11:33:43.376009+01:00 test_machine1 kernel: [  420.459168] i40e 0000:01:00.0: Unable to add VLAN filter 0 for VF 0, error -22

      2017-03-12T11:33:44.352009+01:00 test_machine1 kernel: [  421.435124] i40e 0000:01:00.2: Unable to add VLAN filter 0 for VF 0, error -22
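The numeric code in these messages can be mapped to a symbolic errno name; kernel code conventionally returns negated errno values, so -22 corresponds to EINVAL (invalid argument). A minimal sketch for decoding such lines, assuming the message format shown above and a platform where errno 22 is EINVAL (as on Linux); the helper name is hypothetical:

```python
import errno
import os
import re

# Pattern follows the "Unable to add VLAN filter" lines quoted in the post.
VLAN_ERR_RE = re.compile(
    r"i40e (?P<pf>\S+): Unable to add VLAN filter (?P<vlan>\d+) "
    r"for VF (?P<vf>\d+), error (?P<err>-?\d+)"
)


def decode_vlan_filter_error(line):
    """Return the PF, VLAN, VF and symbolic errno from one log line, or None."""
    match = VLAN_ERR_RE.search(line)
    if match is None:
        return None
    code = abs(int(match.group("err")))  # the kernel logs the negated errno
    return {
        "pf": match.group("pf"),
        "vlan": int(match.group("vlan")),
        "vf": int(match.group("vf")),
        "errno_name": errno.errorcode.get(code, "UNKNOWN"),
        "errno_text": os.strerror(code),
    }


line = (
    "2017-03-12T11:33:43.356014+01:00 test_machine1 kernel: [  420.439112] "
    "i40e 0000:01:00.1: Unable to add VLAN filter 0 for VF 0, error -22"
)
print(decode_vlan_filter_error(line))
```

This only names the error; it does not explain why the PF driver rejected the request for VLAN 0.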

       

      I've increased the VM CPU count, the VF ring sizes, the VM Linux software buffers, and the VM netdev.budget kernel parameter (amount of CPU time assigned for NIC processing), turned off VF spoofcheck in the hypervisor, etc., but the situation remains the same. Sometimes it works perfectly, other times it does not.
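For reference, the tuning steps listed above typically map to commands along the following lines. This is a sketch, not the exact commands used in the post: the interface names, VF index, ring size and budget value are placeholders. Note that the kernel exposes the budget as `net.core.netdev_budget`, which caps the number of packets handled in one NAPI polling cycle. The script only builds and prints the commands; it does not run them.

```python
import shlex


def tuning_commands(vm_iface, pf_iface, vf_index, ring_size, budget):
    """Build (but do not run) the commands for the tuning steps in the post."""
    return [
        # Inside the VM: enlarge the VF's RX/TX descriptor rings.
        ["ethtool", "-G", vm_iface, "rx", str(ring_size), "tx", str(ring_size)],
        # On the hypervisor: disable spoof checking for one VF of a PF.
        ["ip", "link", "set", "dev", pf_iface, "vf", str(vf_index), "spoofchk", "off"],
        # Inside the VM: raise the per-cycle NAPI packet budget.
        ["sysctl", "-w", f"net.core.netdev_budget={budget}"],
    ]


# All values below are illustrative placeholders.
for cmd in tuning_commands("eth0", "eth1", 0, ring_size=4096, budget=600):
    print(shlex.join(cmd))
```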

       

      Can you please provide some insight? Since the rx_dropped counter is increasing in the VM, I suspect a driver/VF issue.

      Is there a way to handle this problem, without switching to untagged traffic?
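To correlate the drops with the erratic periods, the rx_dropped counter mentioned above can be sampled over time from sysfs inside the VM. A minimal sketch, assuming the standard `/sys/class/net/<iface>/statistics/rx_dropped` file; the function names and the one-second interval are arbitrary choices:

```python
import time
from pathlib import Path


def read_rx_dropped(iface, sysfs_root="/sys/class/net"):
    """Read the cumulative rx_dropped counter for one interface."""
    return int(Path(sysfs_root, iface, "statistics", "rx_dropped").read_text())


def sample_drop_rate(iface, interval_s=1.0, sysfs_root="/sys/class/net"):
    """Return dropped packets per second observed over one interval."""
    first = read_rx_dropped(iface, sysfs_root)
    time.sleep(interval_s)
    second = read_rx_dropped(iface, sysfs_root)
    return (second - first) / interval_s
```

Calling `sample_drop_rate("eth0")` in a loop inside the VM and logging the result with a timestamp would show whether drops line up with the messages in the hypervisor log; `eth0` is the VF interface name used elsewhere in this thread.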

       

       

       

      Thank you in advance,

      Ante

        • 1. Re: x710 SR-IOV problems
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi Ante,

           What is the exact model of the X710 4-port NIC? What is the exact driver version?

          Thanks,
          wb
           

          • 2. Re: x710 SR-IOV problems
            anmarin

            Hi,

             

            here you go; for PF:

             

            01:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

                    Subsystem: Dell Ethernet 10G 4P X710 SFP+ rNDC

                    Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+

                    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                    Latency: 0, Cache Line Size: 32 bytes

                    Interrupt: pin A routed to IRQ 62

                    Region 0: Memory at 93000000 (64-bit, prefetchable) [size=16M]

                    Region 3: Memory at 94818000 (64-bit, prefetchable) [size=32K]

                    Expansion ROM at 94b00000 [disabled] [size=512K]

                    Capabilities: [40] Power Management version 3

                            Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)

                            Status: D0 NoSoftRst+ PME-Enable- DSel=8 DScale=1 PME-

                    Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+

                            Address: 0000000000000000  Data: 0000

                            Masking: 00000000  Pending: 00000000

                    Capabilities: [70] MSI-X: Enable+ Count=129 Masked-

                            Vector table: BAR=3 offset=00000000

                            PBA: BAR=3 offset=00001000

                    Capabilities: [a0] Express (v2) Endpoint, MSI 00

                            DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

                                    ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

                            DevCtl: Report errors: Correctable- Non-Fatal+ Fatal+ Unsupported+

                                    RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop- FLReset-

                                    MaxPayload 256 bytes, MaxReadReq 4096 bytes

                            DevSta: CorrErr+ UncorrErr- FatalErr- UnsuppReq+ AuxPwr- TransPend-

                            LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us

                                    ClockPM- Surprise- LLActRep- BwNot-

                            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk+

                                    ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

                            LnkSta: Speed 8GT/s, Width x8, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-

                            DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

                            DevCtl2: Completion Timeout: 65ms to 210ms, TimeoutDis-, LTR-, OBFF Disabled

                            LnkCtl2: Target Link Speed: 2.5GT/s, EnterCompliance- SpeedDis-

                                     Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-

                                     Compliance De-emphasis: -6dB

                            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+, EqualizationPhase1+

                                     EqualizationPhase2+, EqualizationPhase3+, LinkEqualizationRequest-

                    Capabilities: [e0] Vital Product Data

                            Product Name: X710 10GbE Controller

                            Read-only fields:

                                    [V0] Vendor specific: FFV17.5.12

                                    [PN] Part number: 68M95

                                    [MN] Manufacture ID: 31 30 32 38

                                    [V1] Vendor specific: DSV1028VPDR.VER2.0

                                    [V3] Vendor specific: DTINIC

                                    [V4] Vendor specific: DCM10010395C521010395C532010395C543010395C514020395C525020395C536020395C547020395C518030395C529030395C53A030395C54B030395C51C040395C52D040395C53E040395C54F040395C5

                                    [V5] Vendor specific: NPY4

                                    [V6] Vendor specific: PMT7

                                    [V7] Vendor specific: NMVIntel Corp

                                    [V8] Vendor specific: L1D0

                                    [RV] Reserved: checksum good, 4 byte(s) reserved

                            Read/write fields:

                                    [Y1] System specific: CCF1\x00

                            End

                    Capabilities: [100 v2] Advanced Error Reporting

                            UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                            UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt+ UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                            UESvrt: DLP+ SDES+ TLP+ FCP+ CmpltTO+ CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC+ UnsupReq- ACSViol-

                            CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+

                            CEMsk:  RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ NonFatalErr+

                            AERCap: First Error Pointer: 00, GenCap+ CGenEn+ ChkCap+ ChkEn+

                    Capabilities: [140 v1] Device Serial Number bc-54-21-ff-ff-96-6e-24

                    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)

                            ARICap: MFVC- ACS-, Next Function: 1

                            ARICtl: MFVC- ACS-, Function Group: 0

                    Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)

                            IOVCap: Migration-, Interrupt Message Number: 000

                            IOVCtl: Enable+ Migration- Interrupt- MSE+ ARIHierarchy+

                            IOVSta: Migration-

                            Initial VFs: 32, Total VFs: 32, Number of VFs: 1, Function Dependency Link: 00

                            VF offset: 16, stride: 1, Device ID: 154c

                            Supported Page Size: 00000553, System Page Size: 00000001

                            Region 0: Memory at 0000000094600000 (64-bit, prefetchable)

                            Region 3: Memory at 00000000949a0000 (64-bit, prefetchable)

                            VF Migration: offset: 00000000, BIR: 0

                    Capabilities: [1a0 v1] Transaction Processing Hints

                            Device specific mode supported

                            No steering table available

                    Capabilities: [1b0 v1] Access Control Services

                            ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                            ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                    Capabilities: [1d0 v1] #19

                    Kernel driver in use: i40e

                    Kernel modules: i40e

             

            and for VF:

             

            01:02.0 Ethernet controller: Intel Corporation XL710/X710 Virtual Function (rev 01)

                    Subsystem: Dell Device 0000

                    Control: I/O- Mem- BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-

                    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-

                    Latency: 0

                    Region 0: [virtual] Memory at 94600000 (64-bit, prefetchable) [size=64K]

                    Region 3: [virtual] Memory at 949a0000 (64-bit, prefetchable) [size=16K]

                    Capabilities: [70] MSI-X: Enable+ Count=5 Masked-

                            Vector table: BAR=3 offset=00000000

                            PBA: BAR=3 offset=00002000

                    Capabilities: [a0] Express (v2) Endpoint, MSI 00

                            DevCap: MaxPayload 2048 bytes, PhantFunc 0, Latency L0s <512ns, L1 <64us

                                    ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+

                            DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-

                                    RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop- FLReset-

                                    MaxPayload 128 bytes, MaxReadReq 128 bytes

                            DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr- TransPend-

                            LnkCap: Port #0, Speed 8GT/s, Width x8, ASPM L1, Exit Latency L0s <2us, L1 <16us

                                    ClockPM- Surprise- LLActRep- BwNot-

                            LnkCtl: ASPM Disabled; RCB 64 bytes Disabled- CommClk-

                                    ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-

                            LnkSta: Speed unknown, Width x0, TrErr- Train- SlotClk- DLActive- BWMgmt- ABWMgmt-

                            DevCap2: Completion Timeout: Range ABCD, TimeoutDis+, LTR-, OBFF Not Supported

                            DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis-, LTR-, OBFF Disabled

                            LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete-, EqualizationPhase1-

                                     EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-

                    Capabilities: [100 v2] Advanced Error Reporting

                            UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                            UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                            UESvrt: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-

                            CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

                            CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr-

                            AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-

                    Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)

                            ARICap: MFVC- ACS-, Next Function: 0

                            ARICtl: MFVC- ACS-, Function Group: 0

                    Capabilities: [1a0 v1] Transaction Processing Hints

                            Device specific mode supported

                            No steering table available

                    Capabilities: [1d0 v1] Access Control Services

                            ACSCap: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                            ACSCtl: SrcValid- TransBlk- ReqRedir- CmpltRedir- UpstreamFwd- EgressCtrl- DirectTrans-

                    Kernel driver in use: vfio-pci

                    Kernel modules: i40evf

             

            Driver versions:

             

            PF:

            test_machine1:~ # ethtool -i eth0

            driver: i40e

            version: 2.0.19

            firmware-version: 5.05 0x800028a3 17.5.12

            expansion-rom-version:

            bus-info: 0000:01:00.0

            supports-statistics: yes

            supports-test: yes

            supports-eeprom-access: yes

            supports-register-dump: yes

            supports-priv-flags: yes

             

            VF:

            test_machine_vm1:~ # ethtool -i eth0

            driver: i40evf

            version: 2.0.16

            firmware-version: N/A

            bus-info: 0000:00:03.0

            supports-statistics: yes

            supports-test: no

            supports-eeprom-access: no

            supports-register-dump: no

            supports-priv-flags: no
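When collecting this information from several hosts and VMs, the `ethtool -i` output can be turned into a dictionary so that driver and firmware versions are easy to compare. A minimal sketch, assuming the "key: value" layout shown above; the function name is illustrative:

```python
def parse_ethtool_info(text):
    """Parse `ethtool -i` output into a dict, splitting on the first colon."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


# Sample taken from the PF output quoted in this post.
pf_output = """\
driver: i40e
version: 2.0.19
firmware-version: 5.05 0x800028a3 17.5.12
expansion-rom-version:
bus-info: 0000:01:00.0
"""
info = parse_ethtool_info(pf_output)
print(info["driver"], info["version"], info["firmware-version"])
```

Splitting on the first colon only keeps values such as the PCI address in `bus-info` intact.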

             

            Br,

            Ante

            • 3. Re: x710 SR-IOV problems
              Intel Corporation
              This message was posted on behalf of Intel Corporation

              Hi Ante,

               Thank you for the information. Just to double-check: is this an onboard NIC on the Dell system?

              Thanks,
              wb
               

              • 4. Re: x710 SR-IOV problems
                anmarin

                Hi,

                 

                as can be seen from the information provided previously, it is an rNDC, i.e. an add-on card attached to the motherboard:

                http://i.dell.com/sites/doccontent/business/large-business/en/Documents/Intel-X710-Quad-Port-10-GbE-SFP-DA-rNDC-CNA.pdf

                 

                No other NICs are present, and there have been no swaps, upgrades, HW modifications, etc. This is a baseline R630 system.

                 

                Br,

                Ante

                • 5. Re: x710 SR-IOV problems
                  anmarin

                  Hi,

                   

                  do you have any update or need any additional information?

                   

                  BR,

                  Ante

                  • 6. Re: x710 SR-IOV problems
                    Intel Corporation
                    This message was posted on behalf of Intel Corporation

                    Hi Ante, we're still checking your issue. In the meantime, you may also report this issue to Dell for further assistance.


                    regards,
                    Vince

                    • 7. Re: x710 SR-IOV problems
                      anmarin

                      Hi Vince,

                       

                      Dell is trailing the latest driver version by a couple of revisions, so I am not sure how much support I can get there. If I downgrade the driver, I get MDD events again and I am back at the beginning; it is a kind of loop.

                      Let me know if you figure something out.

                       

                      BR,

                      Ante

                      • 8. Re: x710 SR-IOV problems
                        Intel Corporation
                        This message was posted on behalf of Intel Corporation

                        Hi Ante,

                          On further checking, Dell is the best point of contact, as they sometimes adjust the hardware or firmware of the card, and that information is known only to them. I hope this clarifies things.

                        Thanks,
                        wb
                         

                        • 9. Re: x710 SR-IOV problems
                          anmarin

                          Hi,

                           

                          as mentioned previously, I am not using Dell drivers, but ones sourced from Intel.

                          I am not sure this ping-pong game is going to help. Is Intel not the maker of the drivers? Who, then, is supposed to know the most about the issue I face?

                           

                          Can you please explain what those error messages mean, and whether there is a workaround?

                           

                          Thanks,

                          Ante

                          • 10. Re: x710 SR-IOV problems
                            Intel Corporation
                            This message was posted on behalf of Intel Corporation

                            Hi Ante,

                             Thank you for the reply. This is a Dell OEM card, so it is recommended that you contact Dell for the customized driver.

                            Rgds,
                            wb
                             

                            • 11. Re: x710 SR-IOV problems
                              Intel Corporation
                              This message was posted on behalf of Intel Corporation

                              Hi Ante,

                                As this is a Dell OEM card, the only thing we have available for this device is our generic driver, and we don't guarantee that our driver will work.

                               
                                 Dell sometimes adjusts the hardware or firmware of the card, and that information is not known or tracked by us. It is therefore recommended to contact Dell support directly. Please feel free to update me if other assistance is needed.

                              Thanks,
                              wb

                              • 12. Re: x710 SR-IOV problems
                                shivrao

                                I am using an X710 on a UCS server and I see the same issue.

                                Is there a response from engineering on this?

                                • 13. Re: x710 SR-IOV problems
                                  Intel Corporation
                                  This message was posted on behalf of Intel Corporation

                                  Hi Shivrao,

                                   Thank you for posting in Wired Communities. Just to double-check: is your X710 an OEM version from Dell? Can you share more information about your NIC?

                                  Thanks,
                                  Sharon
                                   

                                  • 14. Re: x710 SR-IOV problems
                                    shivrao

                                    No, this is not from Dell.

                                    This is a Cisco UCS server and the NIC was purchased from Intel.

                                     

                                    81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

                                    81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

                                    81:00.2 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

                                    81:00.3 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)

                                     

                                     

                                    # ethtool -i enp129s0f0

                                    driver: i40e

                                    version: 1.6.27-k

                                    firmware-version: 5.04 0x80002542 0.385.7

                                    expansion-rom-version:

                                    bus-info: 0000:81:00.0

                                     

                                    I am attaching the logs.

                                    The trigger is to enable SR-IOV and put all VFs into trust mode.
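The exact commands behind that trigger are not given in the post. On a typical Linux host the sequence would look roughly like the sketch below, which uses the standard `sriov_numvfs` sysfs attribute and the `ip link ... vf N trust on` option. The VF count is a placeholder, and the script only lists the steps without executing them.

```python
def sriov_trust_steps(pf_iface, num_vfs):
    """List the steps to create VFs on a PF and mark each one as trusted."""
    steps = [
        # Writing the VF count to sysfs asks the PF driver to create the VFs.
        ("write", f"/sys/class/net/{pf_iface}/device/sriov_numvfs", str(num_vfs)),
    ]
    for vf in range(num_vfs):
        steps.append(
            ("run", ["ip", "link", "set", "dev", pf_iface, "vf", str(vf), "trust", "on"])
        )
    return steps


# enp129s0f0 is the PF name from the ethtool output above; 2 VFs is an assumption.
for step in sriov_trust_steps("enp129s0f0", 2):
    print(step)
```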
