To all SR-IOV experts,
I want to use the SR-IOV functionality of Intel's 82576 PCI Express Ethernet NIC on a platform that supports only MSI and legacy PCI interrupts, not MSI-X.
Why do both the igb and igbvf drivers require MSI-X for I/O virtualization?
The PCI-SIG "Single Root I/O Virtualization and Sharing Specification" indicates that MSI should be sufficient to use SR-IOV in general.
Is there any reason Intel decided to use only MSI-X?
Thanks for visiting the forums.
The Intel SR-IOV solution requires MSI-X because of the way we chose to architect it. Each VF has its own set of dedicated hardware resources, including Tx/Rx queues, descriptors, and interrupts. Each VF has 3 interrupts assigned to it: one for Rx, one for Tx, and one for PF/VF communication. On a 4-port Intel 82576 that could be 4 ports * 7 VFs * 3 = 84 interrupts.
If you use one of the Intel 10Gb Ethernet devices, 2 ports would result in up to 378 interrupt vectors. That is too many for standard MSI interrupts. If they all shared the same interrupt, performance would be severely hampered.
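The arithmetic above can be reproduced with a short sketch. The function name is made up for illustration, and the figure of 63 VFs per port for the 10Gb device is an assumption inferred from the 378 total (2 * 63 * 3), not something stated in this thread.

```python
def vectors_needed(ports: int, vfs_per_port: int, vectors_per_vf: int = 3) -> int:
    """Total interrupt vectors consumed by all VFs on one adapter.

    vectors_per_vf defaults to 3: Rx, Tx, and PF/VF communication.
    """
    return ports * vfs_per_port * vectors_per_vf


# 4-port 82576 with 7 VFs per port, as in the post.
print(vectors_needed(ports=4, vfs_per_port=7))   # 84

# 2-port 10Gb device; 63 VFs per port is an assumed value (see lead-in).
print(vectors_needed(ports=2, vfs_per_port=63))  # 378
```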
I hope that answers your question.
I am extremely surprised that you have found a server that supports SR-IOV but not MSI-X interrupts. I have never encountered such a system. Would you mind sharing which server and model you have, for future reference?
Thank you Patrick for the insightful answer. I think this perfectly closes the topic.
I am trying to use the Intel 82576EB with a non-Intel embedded platform; if you are interested, I can send you further details on a private channel.
For my research it will be sufficient to have just two virtual functions on one physical port, which shouldn't be a problem with MSI interrupts.
Since this is then a purely software-related problem, I'll try to retrofit MSI-based SR-IOV support into the igb and igbvf drivers. Do you have any idea whether that is feasible?
Sounds interesting. Always like to hear what interesting things SR-IOV is used for!
I have had a customer or two rework our drivers, which are of course open source, to use MSI. I think they also made them polling rather than interrupt-driven, though it has been a while and I don't recall for sure.
The only guidance I can give is my SR-IOV Toolkit, which included a driver companion.
Download that and the latest source, then go have some fun. I would love to hear from you via the private message capability of the blog when you are done, and to learn about what you are doing.
Best of luck!
I still have one last question regarding the limitations of MSI in SR-IOV in general.
You said that the accumulated interrupt count is too high for MSI, e.g. 4 ports * 7 VFs * 3 = 84 interrupts.
However, each PF and each VF has its own MSI Capability or MSI-X Capability (according to the SR-IOV Specification 1.1), and therefore each may request a maximum of 32 interrupts. 32 interrupt vectors per VF should be more than sufficient for the 82576. What I don't understand now is why 2048 interrupt vectors would be a requirement for each VF.
Is there perhaps another hardware or software limitation that might prevent the use of MSI, such as the interrupt controller or Intel VT-d?
Or is it simply that Intel decided to use MSI-X for some other reason, without any incompatibility in mind?
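The per-function limits in this question can be made concrete with a small sketch. It assumes the 32-vector MSI and 2048-vector MSI-X maxima quoted above, plus the fact that a function requests MSI vectors in powers of two (1, 2, 4, 8, 16, 32). The function name is hypothetical, and the sketch says nothing about what the igb/igbvf drivers actually do.

```python
from typing import Optional

MSI_MAX_VECTORS = 32     # per-function MSI maximum cited in the post
MSIX_MAX_VECTORS = 2048  # per-function MSI-X maximum cited in the post


def msi_allocation(needed: int) -> Optional[int]:
    """Return the power-of-two MSI vector count covering `needed`,
    or None if the requirement exceeds the per-function MSI maximum."""
    if needed < 1:
        raise ValueError("needed must be at least 1")
    if needed > MSI_MAX_VECTORS:
        return None
    count = 1
    while count < needed:
        count *= 2  # MSI vector counts are powers of two
    return count


# One VF with 3 vectors (Rx, Tx, PF/VF communication) fits easily in MSI.
print(msi_allocation(3))   # 4
# A requirement above 32 per function would need MSI-X instead.
print(msi_allocation(33))  # None
```

Under these assumptions the per-function count is not the obstacle; the 84-vector total only matters if the limit is applied across the whole adapter rather than per VF.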
Thanks again and with kind regards,
I seem to have such a board that works with SR-IOV but _seems_ not to support MSI-X (although I'm not 100% sure about this).
It's the Intel DQ77MK. I have to use the pci=assign-busses kernel parameter to enable the virtual functions, so it seems that the BIOS isn't written for SR-IOV?
I use the 82576 NIC, and lspci tells me that the bridge the NIC is connected to doesn't have MSI-X capabilities:
00:1c.0 PCI bridge: Intel Corporation 7 Series/C210 Series Chipset Family PCI Express Root Port 1 (rev c4) (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Bus: primary=00, secondary=03, subordinate=03, sec-latency=0
I/O behind bridge: 0000f000-00000fff
Memory behind bridge: fff00000-000fffff
Prefetchable memory behind bridge: 00000000fff00000-00000000000fffff
Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort+ <SERR- <PERR-
BridgeCtl: Parity- SERR- NoISA- VGA- MAbort- >Reset- FastB2B-
PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
Capabilities:  Express (v2) Root Port (Slot+), MSI 00
DevCap: MaxPayload 128 bytes, PhantFunc 0, Latency L0s <64ns, L1 <1us
ExtTag- RBE+ FLReset-
DevCtl: Report errors: Correctable- Non-Fatal- Fatal- Unsupported-
RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
MaxPayload 128 bytes, MaxReadReq 128 bytes
DevSta: CorrErr- UncorrErr- FatalErr- UnsuppReq- AuxPwr+ TransPend-
LnkCap: Port #1, Speed 5GT/s, Width x4, ASPM L0s L1, Latency L0 <1us, L1 <16us
ClockPM- Surprise- LLActRep+ BwNot-
LnkCtl: ASPM L0s L1 Enabled; RCB 64 bytes Disabled- Retrain- CommClk-
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 2.5GT/s, Width x0, TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
Slot #0, PowerLimit 25.000W; Interlock- NoCompl+
SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet- Interlock-
Changed: MRL- PresDet- LinkState-
RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna- CRSVisible-
RootSta: PME ReqID 0000, PMEStatus- PMEPending-
DevCap2: Completion Timeout: Range BC, TimeoutDis+ ARIFwd-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- ARIFwd-
LnkCtl2: Target Link Speed: 5GT/s, EnterCompliance- SpeedDis-, Selectable De-emphasis: -6dB
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete-, EqualizationPhase1-
EqualizationPhase2-, EqualizationPhase3-, LinkEqualizationRequest-
Capabilities:  MSI: Enable- Count=1/1 Maskable- 64bit-
Address: 00000000 Data: 0000
Capabilities:  Subsystem: Intel Corporation Device 2035
Capabilities: [a0] Power Management version 2
Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
Kernel driver in use: pcieport
Kernel modules: shpchp
However, the Linux kernel (Xen and non-Xen alike) tells me this for the physical functions:
[ 1.652766] igb: Intel(R) Gigabit Ethernet Network Driver - version 4.0.1-k
[ 1.652770] igb: Copyright (c) 2007-2012 Intel Corporation.
[ 1.653111] igb 0000:01:00.0: irq 45 for MSI/MSI-X
[ 1.653118] igb 0000:01:00.0: irq 46 for MSI/MSI-X
[ 1.653124] igb 0000:01:00.0: irq 47 for MSI/MSI-X
[ 1.757217] igb 0000:01:00.0: 7 VFs allocated
[ 1.947724] igb 0000:01:00.0: Intel(R) Gigabit Ethernet Network Connection
[ 1.947809] igb 0000:01:00.0: Using MSI-X interrupts. 2 rx queue(s), 1 tx queue(s)
That doesn't make sense given the lspci output above.
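One possible way to reconcile the two outputs: MSI and MSI-X are capabilities of the function that raises the interrupt, so the root port's capability list describes only the port's own interrupts, not those of the NIC at 01:00.0 behind it. The sketch below scans `lspci -vv` text for the two capability lines. The function name is made up, and the endpoint sample is a hypothetical illustration of the usual lspci format, not output captured from this system.

```python
import re


def interrupt_caps(lspci_text: str) -> dict:
    """Report MSI / MSI-X state found in `lspci -vv` text for one device.

    Each value is True (present and enabled), False (present, disabled),
    or None (capability not listed at all).
    """
    caps = {}
    for name in ("MSI-X", "MSI"):
        match = re.search(r"\b" + re.escape(name) + r": Enable([+-])", lspci_text)
        caps[name] = None if match is None else match.group(1) == "+"
    return caps


# Capability line copied from the root port output above.
bridge = "Capabilities:  MSI: Enable- Count=1/1 Maskable- 64bit-"

# Hypothetical endpoint output, for illustration only.
endpoint = (
    "Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+\n"
    "Capabilities: [70] MSI-X: Enable+ Count=10 Masked-\n"
)

print(interrupt_caps(bridge))    # {'MSI-X': None, 'MSI': False}
print(interrupt_caps(endpoint))  # {'MSI-X': True, 'MSI': False}
```

Running the same check on the output of `lspci -vv -s 01:00.0` would show what the NIC itself advertises.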
Furthermore, if I use Xen and attach the physical function to an HVM Linux guest, qemu tells me MSI-X is used for the pass-through.
However, if I pass through a VF, it tells me only legacy interrupts are available.
It's exactly the same as reported in this thread: [Xen-devel] SR-IOV problems - HVM cannot access network - Xen Source
except that the suggested solution (acpi=0) does not work for me.
I'm confused by all the mixed signals I'm receiving from my setup.
Is the Q77 MSI-X capable or is it not? I was not able to find an answer yet.
The Intel DQ77MK is a desktop board. SR-IOV is classified as a server technology. It is my guess that the BIOS on the Intel DQ77MK does not support SR-IOV.
The BIOS must support both VT-d (which it does) and SR-IOV. On most servers you only see a VT-d option in the BIOS; you enable VT-d and get SR-IOV by default. However, client systems seem to have a VT-d option that does not support SR-IOV.
Not being an expert in either the desktop or server boards themselves, I am not 100% sure about this; however, I am 92.3% confident I am likely correct. :-)
Hopefully somebody from the desktop group will see this post and reply.