18 Replies. Latest reply on Dec 19, 2012 12:14 PM by alex@zadarastorage.com

    Interrupt not assigned to a VF attached to a KVM instance

    alex@zadarastorage.com

      Greetings all,

      We are using stock Ubuntu Natty (kernel 2.6.38-8) with KVM 0.14.0 and libvirt 0.8.8. We are attaching VFs of an Intel 82599 NIC to instances created by KVM.

       

      Usually, when a KVM process starts with VFs attached, we see the following messages in kern.log on the physical machine (for each VF):

      Mar 24 14:30:52 ashzadapp08p kernel: [158360.469082] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
      Mar 24 14:30:52 ashzadapp08p kernel: [158360.469093] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
      Mar 24 14:30:52 ashzadapp08p kernel: [158360.469102] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X
      Mar 24 14:30:53 ashzadapp08p kernel: [158360.709013] pci-stub 0000:07:12.3: irq 152 for MSI/MSI-X
      Mar 24 14:30:53 ashzadapp08p kernel: [158360.709024] pci-stub 0000:07:12.3: irq 153 for MSI/MSI-X
      Mar 24 14:30:53 ashzadapp08p kernel: [158360.709036] pci-stub 0000:07:12.3: irq 154 for MSI/MSI-X

      Note that there is a double print for each IRQ.

      If later we peek into /proc/interrupts, we see the appropriate interrupts: "PCI-MSI-edge kvm:0000:07:12.3" with appropriate IRQs.

       

      In some cases, however, we see only a single print for each IRQ, and later we don't find the appropriate interrupt assigned to the VF:

      Mar 24 14:30:53 ashzadapp08p kernel: [158360.948116] pci-stub 0000:04:10.6: irq 262 for MSI/MSI-X
      Mar 24 14:30:53 ashzadapp08p kernel: [158360.948127] pci-stub 0000:04:10.6: irq 263 for MSI/MSI-X
      Mar 24 14:30:53 ashzadapp08p kernel: [158360.948136] pci-stub 0000:04:10.6: irq 264 for MSI/MSI-X

       

      As a result, the VF does not function at all within the VC. The only way to fix this is to stop the KVM process and restart it; the VFs then get re-attached properly.

       

      Can anybody advise why this could be happening, or how we can debug the issue further?

      The ixgbe version is: 3.7.17-NAPI

      The ixgbevf version is: 1.0.19-k0

       

      Thanks!

        • 1. Re: Interrupt not assigned to a VF attached to a KVM instance
          Patrick_Kutch

          Hi Alex.

           

          One of the interesting challenges in doing a feature such as SR-IOV in Open Source is that all the pieces must work together. The distros pick and choose components, and sometimes not all the correct pieces are put into a distro release.

           

          I would suggest you try the 3.8.21 SourceForge release and see how that works for you.

           

          Please let us know how it goes.

           

          - Patrick

          • 2. Re: Interrupt not assigned to a VF attached to a KVM instance
            alex@zadarastorage.com

            Thanks, Patrick.

            I already looked at the later drivers and saw that 3.7.21 fixes an "SR-IOV critical bug". Can you please give us more detail on what bug this is?

            According to the code, it looks like 3.7.21 mostly adds some kind of "VLAN Pool Filter" functionality, while the latest 3.8.21 seems to have major code changes.

            We will try & let you know.

             

            Thanks!

            Alex.

            • 3. Re: Interrupt not assigned to a VF attached to a KVM instance
              Patrick_Kutch

              I look forward to hearing your results.

               

              As to what was in that release: those details were left out on purpose. It seems it was some kind of security update, and exposing the details could lead to a security problem.

              • 4. Re: Interrupt not assigned to a VF attached to a KVM instance
                alex@zadarastorage.com

                Hello Patrick,

                I now have a scenario that always reproduces the issue. It happens when I spawn 8 VMs, each having 4 VFs attached. At least one VF ends up without an interrupt and, as a result, is non-functional.

                I tested the scenario with the following driver versions:

                3.2.9-k2 (packaged together with ubuntu-natty) - issue does not happen

                3.7.17 - issue happens

                3.7.21 - issue happens

                3.8.21 - issue happens

                 

                When compiling the driver, I used only the CFLAGS_EXTRA="-DIXGBE_NO_LRO" option (to make the driver compatible with bridging/routing, as the README advises). Should I use some additional flags?

                 

                Thanks,

                  Alex.

                • 5. Re: Interrupt not assigned to a VF attached to a KVM instance
                  Patrick_Kutch

                  Thanks for the detailed information.  Can you provide a bit more?

                   

                  Details about the server configuration, Model, memory, CPU etc.

                   

                  We will try to reproduce the issue and investigate.  Will provide an update when I have more information.

                   

                  - Patrick

                  • 6. Re: Interrupt not assigned to a VF attached to a KVM instance
                    alex@zadarastorage.com

                    Thanks, Patrick.

                    Here is some info about the system. Please let me know whether you need anything else to debug this (like enabling some debug prints etc).

                    Below are dmidecode details:

                     

                    root@ubuntu-sata-51:/# dmidecode -t processor
                    # dmidecode 2.9
                    SMBIOS 2.6 present.

                     

                    Handle 0x0400, DMI type 4, 40 bytes
                    Processor Information
                            Socket Designation: CPU1
                            Type: Central Processor
                            Family: Xeon
                            Manufacturer: Intel
                            ID: C2 06 02 00 FF FB EB BF
                            Signature: Type 0, Family 6, Model 44, Stepping 2
                            Flags:
                                    FPU (Floating-point unit on-chip)
                                    VME (Virtual mode extension)
                                    DE (Debugging extension)
                                    PSE (Page size extension)
                                    TSC (Time stamp counter)
                                    MSR (Model specific registers)
                                    PAE (Physical address extension)
                                    MCE (Machine check exception)
                                    CX8 (CMPXCHG8 instruction supported)
                                    APIC (On-chip APIC hardware supported)
                                    SEP (Fast system call)
                                    MTRR (Memory type range registers)
                                    PGE (Page global enable)
                                    MCA (Machine check architecture)
                                    CMOV (Conditional move instruction supported)
                                    PAT (Page attribute table)
                                    PSE-36 (36-bit page size extension)
                                    CLFSH (CLFLUSH instruction supported)
                                    DS (Debug store)
                                    ACPI (ACPI supported)
                                    MMX (MMX technology supported)
                                    FXSR (Fast floating-point save and restore)
                                    SSE (Streaming SIMD extensions)
                                    SSE2 (Streaming SIMD extensions 2)
                                    SS (Self-snoop)
                                    HTT (Hyper-threading technology)
                                    TM (Thermal monitor supported)
                                    PBE (Pending break enabled)
                            Version: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
                            Voltage: 1.2 V
                            External Clock: 5860 MHz
                            Max Speed: 3600 MHz
                            Current Speed: 2400 MHz
                            Status: Populated, Enabled
                            Upgrade: <OUT OF SPEC>
                            L1 Cache Handle: 0x0700
                            L2 Cache Handle: 0x0701
                            L3 Cache Handle: 0x0702
                            Serial Number: Not Specified
                            Asset Tag: Not Specified
                            Part Number: Not Specified
                            Core Count: 6
                            Core Enabled: 6
                            Thread Count: 12
                            Characteristics:
                                    64-bit capable

                     

                    Handle 0x0401, DMI type 4, 40 bytes
                    Processor Information
                            Socket Designation: CPU2
                            Type: Central Processor
                            Family: Xeon
                            Manufacturer: Intel
                            ID: C2 06 02 00 FF FB EB BF
                            Signature: Type 0, Family 6, Model 44, Stepping 2
                            Flags:
                                    FPU (Floating-point unit on-chip)
                                    VME (Virtual mode extension)
                                    DE (Debugging extension)
                                    PSE (Page size extension)
                                    TSC (Time stamp counter)
                                    MSR (Model specific registers)
                                    PAE (Physical address extension)
                                    MCE (Machine check exception)
                                    CX8 (CMPXCHG8 instruction supported)
                                    APIC (On-chip APIC hardware supported)
                                    SEP (Fast system call)
                                    MTRR (Memory type range registers)
                                    PGE (Page global enable)
                                    MCA (Machine check architecture)
                                    CMOV (Conditional move instruction supported)
                                    PAT (Page attribute table)
                                    PSE-36 (36-bit page size extension)
                                    CLFSH (CLFLUSH instruction supported)
                                    DS (Debug store)
                                    ACPI (ACPI supported)
                                    MMX (MMX technology supported)
                                    FXSR (Fast floating-point save and restore)
                                    SSE (Streaming SIMD extensions)
                                    SSE2 (Streaming SIMD extensions 2)
                                    SS (Self-snoop)
                                    HTT (Hyper-threading technology)
                                    TM (Thermal monitor supported)
                                    PBE (Pending break enabled)
                            Version: Intel(R) Xeon(R) CPU           E5645  @ 2.40GHz
                            Voltage: 1.2 V
                            External Clock: 5860 MHz
                            Max Speed: 3600 MHz
                            Current Speed: 2400 MHz
                            Status: Populated, Idle
                            Upgrade: <OUT OF SPEC>
                            L1 Cache Handle: 0x0703
                            L2 Cache Handle: 0x0704
                            L3 Cache Handle: 0x0705
                            Serial Number: Not Specified
                            Asset Tag: Not Specified
                            Part Number: Not Specified
                            Core Count: 6
                            Core Enabled: 6
                            Thread Count: 12
                            Characteristics:
                                    64-bit capable


                    root@ubuntu-sata-51:/# dmidecode -t system
                    # dmidecode 2.9
                    SMBIOS 2.6 present.

                     

                    Handle 0x0100, DMI type 1, 27 bytes
                    System Information
                            Manufacturer: Dell Inc.
                            Product Name: PowerEdge R510
                            Version: Not Specified
                            Serial Number: 7TPR05J
                            UUID: 44454C4C-5400-1050-8052-B7C04F30354A
                            Wake-up Type: Power Switch
                            SKU Number: Not Specified
                            Family: Not Specified

                     

                    Handle 0x0C00, DMI type 12, 5 bytes
                    System Configuration Options
                            Option 1: NVRAM_CLR:  Clear user settable NVRAM areas and set defaults
                            Option 2: PWRD_EN:  Close to enable password

                     

                    Handle 0x2000, DMI type 32, 11 bytes
                    System Boot Information
                            Status: No errors detected


                    root@ubuntu-sata-51:/# dmidecode -t memory
                    # dmidecode 2.9
                    SMBIOS 2.6 present.

                     

                    Handle 0x1000, DMI type 16, 15 bytes
                    Physical Memory Array
                            Location: System Board Or Motherboard
                            Use: System Memory
                            Error Correction Type: Multi-bit ECC
                            Maximum Capacity: Unknown
                            Error Information Handle: Not Provided
                            Number Of Devices: 8

                     

                    Handle 0x1100, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 1
                            Locator: DIMM_A1
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 52702A81
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x1101, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 1
                            Locator: DIMM_A2
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 5E10365C
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x1102, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 2
                            Locator: DIMM_A3
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 5E60365D
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x1103, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: No Module Installed
                            Form Factor: DIMM
                            Set: 2
                            Locator: DIMM_A4
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: Unknown
                            Manufacturer:
                            Serial Number:
                            Asset Tag:
                            Part Number:

                     

                    Handle 0x1109, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 5
                            Locator: DIMM_B1
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 5E10365E
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x110A, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 6
                            Locator: DIMM_B2
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 5E603633
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x110B, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: 4096 MB
                            Form Factor: DIMM
                            Set: 6
                            Locator: DIMM_B3
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: 1333 MHz (0.8 ns)
                            Manufacturer: 00AD00B380AD
                            Serial Number: 2D905CFC
                            Asset Tag: 01110161
                            Part Number: HMT351R7BFR8C-H9

                     

                    Handle 0x110C, DMI type 17, 28 bytes
                    Memory Device
                            Array Handle: 0x1000
                            Error Information Handle: Not Provided
                            Total Width: 72 bits
                            Data Width: 64 bits
                            Size: No Module Installed
                            Form Factor: DIMM
                            Set: 4
                            Locator: DIMM_B4
                            Bank Locator: Not Specified
                            Type: <OUT OF SPEC>
                            Type Detail: Synchronous
                            Speed: Unknown
                            Manufacturer:
                            Serial Number:
                            Asset Tag:
                            Part Number:


                    root@ccmaster:/home/s-cloud# dmidecode -t bios
                    # dmidecode 2.9
                    SMBIOS 2.6 present.

                     

                    Handle 0x0000, DMI type 0, 24 bytes
                    BIOS Information
                            Vendor: Dell Inc.
                            Version: 1.5.3
                            Release Date: 10/25/2010
                            Address: 0xF0000
                            Runtime Size: 64 kB
                            ROM Size: 4096 kB
                            Characteristics:
                                    ISA is supported
                                    PCI is supported
                                    PNP is supported
                                    BIOS is upgradeable
                                    BIOS shadowing is allowed
                                    Boot from CD is supported
                                    Selectable boot is supported
                                    EDD is supported
                                    Japanese floppy for Toshiba 1.2 MB is supported (int 13h)
                                    5.25"/360 KB floppy services are supported (int 13h)
                                    5.25"/1.2 MB floppy services are supported (int 13h)
                                    3.5"/720 KB floppy services are supported (int 13h)
                                    8042 keyboard services are supported (int 9h)
                                    Serial services are supported (int 14h)
                                    CGA/mono video services are supported (int 10h)
                                    ACPI is supported
                                    USB legacy is supported
                                    BIOS boot specification is supported
                                    Function key-initiated network boot is supported
                                    Targeted content distribution is supported
                            BIOS Revision: 1.5

                     

                    Handle 0x0D00, DMI type 13, 22 bytes
                    BIOS Language Information
                            Installable Languages: 1
                                    en|US|iso8859-1
                            Currently Installed Language: en|US|iso8859-1

                    • 7. Re: Interrupt not assigned to a VF attached to a KVM instance
                      Patrick_Kutch

                      We tried to reproduce this on a Dell R710 and could not make it fail.

                       

                      How many Intel 82599 devices do you have in the system?  If you have a few, all of them will get VFs created on them, in which case you could potentially run out of available interrupts.

                      • 8. Re: Interrupt not assigned to a VF attached to a KVM instance
                        alex@zadarastorage.com

                        We have one dual-port 82599 device, so we have 2 PFs in total. Each PF spawns 22 VFs, for a total of 44 VFs. Four of the VFs are reserved for the physical machine itself; the rest are available for VMs. In my test I create 8 (sometimes 9) VMs, each one receiving 4 VFs.

                         

                        Patrick, it doesn't seem that I am running out of interrupts. This also happened when very few VMs were running on the node; with more VMs it is just easier to reproduce. Also, as I mentioned, for the "bad" VF I usually see a single print saying that it received an IRQ, but later it ends up without the IRQ.

                         

                        Thanks for your assistance, Patrick. If you are willing to help us debug this further, please let me know what additional info is needed.

                         

                        Alex.

                        • 9. Re: Interrupt not assigned to a VF attached to a KVM instance
                          Patrick_Kutch

                          Well, it was just a shot in the dark :-)

                           

                          Do you have this issue with a different OS (say Red Hat)?  You might also, as a test, download the latest kernel and give it a try.

                           

                          You may also try getting the latest BIOS on your system.  As you are aware, SR-IOV requires many ingredients to function properly: BIOS, platform HW, OS and NIC.

                           

                          We know that the NIC and OS work for us on a different server with the same configuration, so I'd suggest good old fashioned debugging by process of elimination.

                           

                          Hopefully somebody out there has experienced something like this also and will post here.

                           

                          If you do find something, please report back.  Likewise if I stumble across anything I will post it here.

                          • 10. Re: Interrupt not assigned to a VF attached to a KVM instance
                            alex@zadarastorage.com

                            Hi Patrick,

                            We don't use any other OS. As you probably understand, we don't have the bandwidth to test all the different combinations of kernel, BIOS and drivers. For now, we have downgraded to version 3.2.9 of the driver, and the issue does not reproduce.

                             

                            Thanks for your efforts to repro the issue.

                            Alex.

                            • 11. Re: Interrupt not assigned to a VF attached to a KVM instance
                              Patrick_Kutch

                              That is good data.  I'll pass that along, thanx!

                              • 12. Re: Interrupt not assigned to a VF attached to a KVM instance
                                alex@zadarastorage.com

                                Hello Patrick,

                                We have moved to Ubuntu Precise (3.2.0-29-generic #46), and we are seeing this issue again with PF drivers 3.7.21, 3.8.21, 3.9.15, 3.9.17 and 3.11.33. (Versions 3.10.17 and 3.10.16 have not been tested yet.) However, this time I debugged deeper, and the problem might not be related to the PF drivers. It seems to happen in the path where KVM asks to allocate IRQs to VFs. Specifically, I see that the pci_enable_msix() kernel function fails (it is called from KVM's assigned_device_enable_host_msix() function). Once it failed with EINVAL, and another time with ENOMEM. Can you perhaps check with your devs what might be the cause and how to debug this further?

                                 

                                Thanks,

                                Alex.

                                • 13. Re: Interrupt not assigned to a VF attached to a KVM instance
                                  Patrick_Kutch

                                  Hi Alex,

                                   

                                  My Guru says that there are some recent changes to the kernel and that some of the distros may not have grabbed all the changes.  Suggest you try applying this:

                                   

                                  http://us.generation-nt.com/patch-kvm-fix-device-assignment-threaded-irq-handler-help-208140031.html

                                  • 14. Re: Interrupt not assigned to a VF attached to a KVM instance
                                    alex@zadarastorage.com

                                    Hi Patrick,

                                     

                                    We have debugged the issue deeper, and it looks like the different errors we are seeing all stem from the fact that the VF PCI device sometimes (not always) does not report the capability PCI_CAP_ID_MSIX.

                                     

                                    We see that the Linux kernel code calls pci_find_capability(dev, PCI_CAP_ID_MSIX), which in turn calls __pci_find_next_cap_ttl(). This function reads the PCI configuration space of the VF PCI device and looks for the required ID. Sometimes it does not find the PCI_CAP_ID_MSIX ID, and then the function returns 0. This is the root cause of the errors we are seeing: the different code paths all call pci_find_capability(dev, PCI_CAP_ID_MSIX).

                                     

                                    We tried sleeping for 100 ms after such a failure and then checking for the PCI_CAP_ID_MSIX capability again, and it was reported properly. So it looks like a transient HW issue of not reporting the capability.

                                     

                                    Can you please check with your hw/sw engineers what could be the cause of HW not reporting this capability?

                                    FYI, the detailed analysis is here: http://www.spinics.net/lists/linux-pci/msg19014.html

                                     

                                    Thanks!

                                     

                                    The code of __pci_find_next_cap_ttl():

                                     

                                    static int __pci_find_next_cap_ttl(struct pci_bus *bus, unsigned int devfn,
                                                                       u8 pos, int cap, int *ttl)
                                    {
                                        u8 id;

                                        while ((*ttl)--) {
                                            pci_bus_read_config_byte(bus, devfn, pos, &pos);
                                            if (pos < 0x40)
                                                break;
                                            pos &= ~3;
                                            pci_bus_read_config_byte(bus, devfn, pos + PCI_CAP_LIST_ID,
                                                                     &id);
                                            if (id == 0xff)
                                                break;
                                            if (id == cap)
                                                return pos;
                                            pos += PCI_CAP_LIST_NEXT;
                                        }
                                        return 0;
                                    }
