7 Replies Latest reply on Nov 8, 2012 7:44 AM by Patrick_Kutch

    Why does the Intel 82576 require MSI-X for SR-IOV

    isfort

      To all SR-IOV experts,

       

      I want to use the SR-IOV functionality of Intel's PCI Express 82576 Ethernet NIC on a platform that supports only MSI and legacy PCI interrupts, not MSI-X.

       

      Why do both the igb and igbvf drivers require MSI-X for I/O virtualization?

       

      The PCI-SIG "Single Root I/O Virtualization and Sharing Specification" indicates that MSI should be sufficient to use SR-IOV in general.

       

      Is there a reason Intel decided to use only MSI-X?

       

      Thanks,

       

      Mr. Isfort


        • 1. Re: Why does the Intel 82576 require MSI-X for SR-IOV
          Patrick_Kutch

          Thanks for visiting the forums.

           

          The Intel SR-IOV solution requires MSI-X because of the way we chose to architect it.  Each VF has its own set of dedicated hardware resources, including Tx/Rx queues, descriptors, and interrupts.  Each VF has 3 interrupts assigned to it: one for Rx, one for Tx, and one for PF/VF communication.  On a 4-port Intel 82576 that could be 4 (ports) * 7 (VFs) * 3 = 84 interrupts.

           

          If you use one of the Intel 10Gb Ethernet devices, 2 ports would result in up to 378 interrupt vectors, too many for standard MSI interrupts.  If they all had to share the same interrupt, performance would be severely hampered.
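The arithmetic in this reply can be reproduced with a short Python sketch. The figure of 3 vectors per VF and the 4-port, 7-VF case come from the post. The value of 63 VFs per port for the 10Gb device is an assumption, not stated in the post; it is simply the per-port count that yields the quoted 378 vectors.

```python
# Vectors per VF as described in the post: Rx, Tx, and PF/VF communication.
VECTORS_PER_VF = 3


def total_vf_vectors(ports, vfs_per_port, vectors_per_vf=VECTORS_PER_VF):
    """Total interrupt vectors needed by all VFs across all ports."""
    return ports * vfs_per_port * vectors_per_vf


# 4-port 82576 case from the post: 4 * 7 * 3
quad_port_82576 = total_vf_vectors(ports=4, vfs_per_port=7)

# 10Gb case: 63 VFs per port is an ASSUMPTION chosen to match the
# 378 vectors quoted in the post (2 * 63 * 3).
dual_port_10g = total_vf_vectors(ports=2, vfs_per_port=63)

print(quad_port_82576, dual_port_10g)
```

These totals are summed over every VF on the adapter; they say nothing about how many vectors any single function has to request.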

           

          I hope that answers your question.

           

          I am extremely surprised that you have found a server that supports SR-IOV but not MSI-X interrupts; I have never encountered such a system. Would you mind sharing which server and model you have, for future reference?

           

          Thanks,

           

          Patrick

          • 2. Re: Why does the Intel 82576 require MSI-X for SR-IOV
            isfort

            Thank you Patrick for the insightful answer. I think this perfectly closes the topic.

             

            I am trying to use the Intel 82576EB with a non-Intel embedded platform; if you are interested, I can send you further details on a private channel.

             

            For my research it will be sufficient to have just two virtual functions on one physical port, which shouldn't be a problem with MSI interrupts.

             

            Since this is then a purely software-related problem, I'll try to retrofit MSI-based SR-IOV support into the igb and igbvf drivers. Would you have any idea whether that is feasible?

             

            Greetings,

             

            Mr. Isfort

            • 3. Re: Why does the Intel 82576 require MSI-X for SR-IOV
              Patrick_Kutch

              Sounds interesting.  Always like to hear what interesting things SR-IOV is used for!

               

              I have had a customer or two rework our drivers, which are of course open source, to use MSI.  I think they also made them polling rather than interrupt driven, though it has been a while and I don't recall for sure.

               

              The only guidance I can give is my SR-IOV Toolkit, which included a driver companion.

               

              http://communities.intel.com/community/wired/blog/2010/06/09/announcing-the-intel-ethernet-sr-iov-toolkit-v11

               

              Download that and the latest source, then go have some fun.  I would love to hear from you via the blog's private message capability when you are done, and to learn about what you are doing.

               

              Best of luck!

               

              - Patrick

              • 4. Re: Why does the Intel 82576 require MSI-X for SR-IOV
                isfort

                Dear Patrick,

                 

                I still have one last question regarding the limitations of MSI in SR-IOV in general.

                 

                You said that the accumulated interrupt count is too high for MSI, e.g. 4 (ports) * 7 (VFs) * 3 = 84 interrupts.

                 

                However, each PF and each VF has its own MSI Capability or MSI-X Capability (according to the SR-IOV Specification 1.1), and therefore each may request up to 32 interrupt vectors with MSI. 32 interrupt vectors per VF should be more than sufficient for the 82576. What I don't understand now is why 2048 interrupt vectors (the MSI-X limit) would be a requirement for each VF.
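The per-function limits in this question can be made concrete with a small Python sketch. It models only the capability limits: MSI lets a function request a power-of-two number of vectors up to 32, while MSI-X allows any count up to 2048. The function names are hypothetical, and whether a given OS or platform actually grants multiple MSI vectors is a separate matter that the sketch does not model.

```python
from typing import Optional

MSI_MAX_VECTORS = 32     # per function, MSI capability limit
MSIX_MAX_VECTORS = 2048  # per function, MSI-X capability limit


def msi_allocation(needed: int) -> Optional[int]:
    """Smallest power-of-two vector count >= needed that MSI can express.

    Returns None when the request exceeds the 32-vector MSI limit.
    """
    if needed < 1:
        raise ValueError("at least one vector must be requested")
    count = 1
    while count < needed:
        count *= 2
    return count if count <= MSI_MAX_VECTORS else None


def msix_allocation(needed: int) -> Optional[int]:
    """MSI-X can express any count from 1 up to 2048."""
    if needed < 1:
        raise ValueError("at least one vector must be requested")
    return needed if needed <= MSIX_MAX_VECTORS else None


# One VF needing 3 vectors (Rx, Tx, PF/VF communication, per the thread).
print(msi_allocation(3), msix_allocation(3))
```

Under these limits alone, a VF that needs 3 vectors fits within MSI by rounding up to 4, which is the point the question raises.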

                 

                Is there perhaps another hardware or software limitation, such as the interrupt controller or Intel VT-d, that might prevent the use of MSI?

                 

                Or did Intel simply decide to use MSI-X for reasons other than incompatibility?

                 

                Thanks again and with kind regards,

                 

                Mr. Isfort

                • 5. Re: Why does the Intel 82576 require MSI-X for SR-IOV
                  Patrick_Kutch

                  There is no limitation; it is just the way we chose to implement the functionality.  According to the engineer who architected and wrote the drivers, it was simply an easier and cleaner solution to implement.

                  • 6. Re: Why does the Intel 82576 require MSI-X for SR-IOV
                    arichter

                    Hi Patrick,

                     

                    I seem to have such a board: it works with SR-IOV but _seems_ not to support MSI-X (although I'm not 100% sure about this).

                     

                    It's the Intel DQ77MK. I have to use the pci=assign-busses kernel parameter to enable the virtual functions, so it seems that the BIOS wasn't written with SR-IOV in mind?

                     

                    I use the 82576 NIC, and lspci tells me that the bridge the NIC is connected to doesn't have MSI-X capabilities:

                     

                    00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4) (prog-if 00 [Normal decode])
                            Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
                            Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
                            Latency: 0, Cache Line Size: 64 bytes
                            Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
                            I/O behind bridge: 0000f000-00000fff
                            Memory behind bridge: fff00000-000fffff
                            Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
                            Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
                            BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
                                    PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
                            Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
                                    DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
                                            ExtTag- RBE+ FLReset-
                                    DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
                                            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
                                            MaxPayload 128 bytes, MaxReadReq 128 bytes
                                    DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
                                    LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <16us
                                            ClockPM- Surprise- LLActRep+ BwNot-
                                    LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
                                            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
                                    LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
                                    SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
                                            Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
                                    SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
                                            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
                                    SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
                                            Changed: MRL- PresDet- LinkState-
                                    RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
                                    RootCap: CRSVisible-
                                    RootSta: PME ReqID 0000, PMEStatus- PMEPending-
                                    DevCap2: Completion Timeout: Range BC, TimeoutDis+ ARIFwd-
                                    DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
                                    LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
                                             Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
                                             Compliance De-emphasis: -6dB
                                    LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
                                             EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
                            Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
                                    Address: 00000000  Data: 0000
                            Capabilities: [90] Subsystem: Intel Corporation Device 2035
                            Capabilities: [a0] Power Management version 2
                                    Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
                                    Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
                            Kernel driver in use: pcieport
                            Kernel modules: shpchp
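A capability check like this one can be scripted. The Python sketch below scans `lspci -vv` text for `MSI:` and `MSI-X:` capability lines. MSI and MSI-X are advertised by each PCI function in its own capability list, so it may be more informative to run the check against the NIC's own entry (the kernel log later in this post refers to 0000:01:00.0) than against the root port. The function name is hypothetical, and the endpoint sample line is illustrative of the format lspci prints; it was not taken from the poster's system.

```python
import re

# Matches capability headers such as "Capabilities: [80] MSI: Enable- ..."
# or "Capabilities: [70] MSI-X: Enable+ ...". "MSI-X" is listed first so the
# longer name wins when both alternatives could match.
CAP_LINE = re.compile(r"Capabilities: \[[0-9a-f]+\] (MSI-X|MSI):")


def interrupt_capabilities(lspci_text):
    """Return the set of message-signaled interrupt capabilities found."""
    return {match.group(1) for match in CAP_LINE.finditer(lspci_text)}


# Lines taken from the root-port output quoted in this post.
bridge_sample = """\
Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
Capabilities: [80] MSI: Enable- Count=1/1 Maskable- 64bit-
Capabilities: [a0] Power Management version 2
"""

# HYPOTHETICAL endpoint line, for illustration only.
endpoint_sample = "Capabilities: [70] MSI-X: Enable+ Count=10 Masked-\n"

print(interrupt_capabilities(bridge_sample))
print(interrupt_capabilities(endpoint_sample))
```

On a live system the text would come from running `lspci -vv -s <bus:dev.fn>` for the function of interest; the sketch itself only parses text it is given.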

                     

                    However, the Linux kernel (Xen and non-Xen alike) reports this for the physical function:

                     

                    [    1.652766] igb: Intel(R) Gigabit Ethernet Network Driver - version 4.0.1-k
                    [    1.652770] igb: Copyright (c) 2007-2012 Intel Corporation.
                    [    1.653111] igb 0000:01:00.0: >irq 45 for MSI/MSI-X
                    [    1.653118] igb 0000:01:00.0: >irq 46 for MSI/MSI-X
                    [    1.653124] igb 0000:01:00.0: >irq 47 for MSI/MSI-X
                    [    1.757217] igb 0000:01:00.0: >7 VFs allocated
                    [    1.947724] igb 0000:01:00.0: >Intel(R) Gigabit Ethernet Network Connection
                    [    1.947809] igb 0000:01:00.0: >Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)

                     

                    That doesn't make sense given the lspci output above.

                    Furthermore, if I use Xen and attach the physical function to an HVM Linux guest, qemu tells me MSI-X is used for the pass-through.

                    However, if I pass through a VF, it tells me only legacy interrupts are available.

                    It's exactly the same as reported in this thread: [Xen-devel] SR-IOV problems - HVM cannot access network - Xen Source

                    The only difference is that the suggested solution (acpi=0) does not work for me.

                     

                    I'm confused by all the mixed signals I'm receiving from my setup.

                    Is the Q77 MSI-X capable or not? I have not been able to find an answer yet.

                     

                    Best regards,

                    Andre

                    • 7. Re: Why does the Intel 82576 require MSI-X for SR-IOV
                      Patrick_Kutch

                      Hi,

                       

                      The Intel DQ77MK is a desktop board.  SR-IOV is classified as a server technology.  It is my guess that the BIOS on the Intel DQ77MK does not support SR-IOV.

                       

                      The BIOS must support both VT-d (which it does) and SR-IOV.  On most servers you only actually see a VT-d option in the BIOS; you enable VT-d and you get SR-IOV by default.  However, client systems seem to have a VT-d option that does not include SR-IOV support.

                       

                      Not being an expert in either the desktop or server boards themselves, I am not 100% sure about this; however, I am 92.3% confident I am likely correct. :-)

                       

                      Hopefully somebody from the desktop group will see this post and reply.

                       

                      -Patrick