9 Replies Latest reply on Jul 20, 2012 2:27 PM by Patrick_Kutch

    SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational


      We have a strange problem on one of our servers, which uses an Intel 82599 SRIOV NIC. The server ran fine for almost 8 months with the SRIOV PFs and VFs all working. Suddenly we ran into an issue where one of the PFs doesn't seem to be working. We need help determining whether the SRIOV PF has failed in hardware or whether this is a software problem.


      Running the ethtool offline tests currently reports PASS, but leaves the message below in dmesg:

      # ethtool -t eth103 offline
      The test result is PASS
      The test extra info:
      Register test  (offline)         0
      Eeprom test    (offline)         0
      Interrupt test (offline)         0
      Loopback test  (offline)         0
      Link test   (on/offline)         0


      [895552.667586] ixgbe: eth103: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 64 not cleared within the polling period


      Also, --show-ring reports:

      # ethtool --show-ring eth103
      Ring parameters for eth103:
      Pre-set maximums:
      RX:             4096
      RX Mini:        0
      RX Jumbo:       0
      TX:             4096
      Current hardware settings:
      RX:             64
      RX Mini:        0
      RX Jumbo:       0
      TX:             64


      The RX and TX rings are now only 64 descriptors each, whereas this previously showed 512.
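For anyone who wants to watch for this symptom on other hosts, here is a minimal Python sketch that parses `ethtool --show-ring` output and reports ring sizes that differ from an expected value. The function names are hypothetical, the sample text is the output quoted above, and the expected value of 512 is simply the figure this host used to report; adjust it for your own setup.

```python
def parse_ring_params(text):
    """Parse `ethtool --show-ring` output into {'max': {...}, 'current': {...}}."""
    sections = {"max": {}, "current": {}}
    section = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Pre-set maximums"):
            section = "max"
        elif line.startswith("Current hardware settings"):
            section = "current"
        elif section and ":" in line:
            key, _, value = line.partition(":")
            value = value.strip()
            if value.isdigit():
                sections[section][key.strip()] = int(value)
    return sections


def ring_mismatches(params, expected):
    """Return {name: (current, expected)} for ring sizes that differ from expected."""
    current = params["current"]
    return {k: (current.get(k), v) for k, v in expected.items() if current.get(k) != v}


# Output quoted in this post (blank lines removed).
SAMPLE = """\
Ring parameters for eth103:
Pre-set maximums:
RX:             4096
RX Mini:        0
RX Jumbo:       0
TX:             4096
Current hardware settings:
RX:             64
RX Mini:        0
RX Jumbo:       0
TX:             64
"""

if __name__ == "__main__":
    params = parse_ring_params(SAMPLE)
    # 512 is the value this host reported before the problem appeared.
    print(ring_mismatches(params, {"RX": 512, "TX": 512}))
```

In practice the text would come from running `ethtool --show-ring <interface>` on the host rather than from a hard-coded sample.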


      We have some VMs with SRIOV VFs from this bad SRIOV PF PCI-assigned to them. They run into the same issue. We added some debug prints to the ixgbevf driver and saw that ixgbevf_reset_hw_vf(), which is called at init, fails at

              ret_val = mbx->ops.read_posted(hw, msgbuf, IXGBE_VF_PERMADDR_MSG_LEN);

      with the following error

      [    3.484162] ixgbevf: read_posted retval:-100 (IXGBE_ERR_MBX)


      The link status of the SRIOV PF seems to be fine

      # ip link show dev eth103
      5: eth103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
          link/ether 00:1b:21:a3:94:39 brd ff:ff:ff:ff:ff:ff
          vf 0 MAC 02:17:3e:67:a0:f8
          vf 1 MAC 02:17:3e:45:bf:4a
          vf 2 MAC 02:17:3e:78:d2:d7
          vf 3 MAC 02:17:3e:1a:fb:c6
          vf 4 MAC 02:17:3e:58:35:8d
          vf 5 MAC 02:17:3e:52:ae:4c
          vf 6 MAC 02:17:3e:62:2d:b9
          vf 7 MAC 02:17:3e:24:ae:e3
          vf 8 MAC 02:17:3e:22:35:2b
          vf 9 MAC 02:17:3e:59:86:40
          vf 10 MAC 02:17:3e:6f:9c:de
          vf 11 MAC 02:17:3e:13:0a:c1
          vf 12 MAC 02:17:3e:24:b5:79
          vf 13 MAC 02:17:3e:2d:e1:2a
          vf 14 MAC 02:17:3e:0c:11:df
          vf 15 MAC 02:17:3e:7b:82:d2
          vf 16 MAC 02:17:3e:43:5c:8d
          vf 17 MAC 02:17:3e:54:ed:b2
          vf 18 MAC 02:17:3e:70:8f:53
          vf 19 MAC 02:17:3e:55:8d:2f
          vf 20 MAC 02:17:3e:72:18:20
          vf 21 MAC 02:17:3e:12:ff:95
          vf 22 MAC 02:17:3e:71:d8:4d
          vf 23 MAC 02:17:3e:27:eb:9f
          vf 24 MAC 02:17:3e:29:7a:ad
          vf 25 MAC 02:17:3e:2c:e9:4e
          vf 26 MAC 02:17:3e:15:ce:57
          vf 27 MAC 02:17:3e:6d:61:2c
          vf 28 MAC 02:17:3e:4c:24:4d
          vf 29 MAC 02:17:3e:4c:ab:7e
          vf 30 MAC 1e:f8:b3:79:75:b2
          vf 31 MAC 02:02:2f:eb:73:1e


      So, essentially, neither the mailbox nor the tx/rx queues appear to work.
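To check that the PF still lists every VF it should, the `ip link show` output above can be parsed with a short script. This is a minimal Python sketch with hypothetical function names; the sample is a subset of the listing quoted above. Grouping by the first three MAC octets is only a convenience for spotting VFs whose addresses were assigned differently, and it says nothing about whether a VF is healthy.

```python
import re

# Matches lines such as "    vf 30 MAC 1e:f8:b3:79:75:b2".
VF_LINE = re.compile(r"^\s*vf (\d+) MAC ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.I)


def parse_vfs(text):
    """Return {vf_index: mac} from `ip link show dev <pf>` output."""
    vfs = {}
    for line in text.splitlines():
        m = VF_LINE.match(line)
        if m:
            vfs[int(m.group(1))] = m.group(2).lower()
    return vfs


def group_by_prefix(vfs, octets=3):
    """Group VF indices by the leading octets of their MAC address."""
    groups = {}
    for idx, mac in sorted(vfs.items()):
        prefix = ":".join(mac.split(":")[:octets])
        groups.setdefault(prefix, []).append(idx)
    return groups


# Subset of the listing quoted in this post.
SAMPLE = """\
5: eth103: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a3:94:39 brd ff:ff:ff:ff:ff:ff
    vf 0 MAC 02:17:3e:67:a0:f8
    vf 29 MAC 02:17:3e:4c:ab:7e
    vf 30 MAC 1e:f8:b3:79:75:b2
    vf 31 MAC 02:02:2f:eb:73:1e
"""

if __name__ == "__main__":
    vfs = parse_vfs(SAMPLE)
    print(len(vfs), "VFs listed")
    print(group_by_prefix(vfs))
```

Run against the full listing, `len(parse_vfs(...))` should equal the number of VFs configured on the PF (32 in the output above).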


      Dump of all registers with ethtool on this PF can be found here



      Our setup:

      # Physical servers run ubuntu-natty (11.04) with linux-kernel 2.6.38-8-server. We run ixgbe driver 3.2.9, which we compiled locally to disable MAC anti-spoofing (essentially, we always call hw->mac.ops.set_mac_anti_spoofing with the disabled flag). We did this to enable bonding of SRIOV VFs within VMs.

      # At the physical server level we use ixgbevf 1.0.19-k0 and use a couple of SRIOV VFs locally within the physical server for bonding. Specifically, we set up a Linux active-backup bond across SRIOV VFs from two different SRIOV PFs.

      # We run several KVM VMs on these servers. They run ubuntu-precise (12.04) with linux-kernel 3.2.0-25-generic and ixgbevf driver version 2.2.0-k. These VMs have SRIOV VFs PCI-attached, and they in turn set up active-backup bonds across VFs from different SRIOV PFs.

      # We set up bonds primarily for failover, and at the same time use SRIOV for performance.


      We don't know whether this problem will go away after a power cycle of the server. We are keeping the server in its current state in case more live state information is required. Please let us know if any further state information would help in isolating this problem.


      Any help appreciated.




        • 1. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

          I will do some digging and see if I can find anything.


          Did you happen to have any recent updates applied to your OS?





          • 2. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

            Thanks Patrick. No, we didn't apply any recent updates at the ixgbe driver level. We had some issues moving to 3.2.17 (sometimes SRIOV VFs were spawned without IRQs attached), so we moved back down to 3.2.9, which was stable.

            • 3. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

              I was actually thinking along the lines of any OS/kernel updates; however, since you haven't rebooted in so long, that would be unlikely.


              My experts are not sure what the source of your problem is. The error, as you pointed out, is that the mailbox communication stopped working. We can't tell from the description whether it is a software (driver or kernel) or a hardware problem.


              All we can suggest is to save the kernel and dmesg logs and reboot. If the PF and VFs work after the reboot, we are more inclined to believe it is a software problem of some sort; otherwise, a hardware failure.


              Also, before you reboot, please dump the registers with the ethregs tool:



              If you post it, I'll see if it provides any more useful information.


              Wish I had a magic bullet for you.





              • 4. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

                Hi Patrick,

                please find the output of ethregs in:



                I just ran it on eth103 without any options. Please let us know if you need us to re-run it with any particular options.


                Thanks for your help,


                • 6. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

                  Hi Patrick,

                  I think the PF dump is also available in the attached file (it has all the VFs and then the PF). Otherwise, please let us know exactly which command you need us to run:



                  03:00.0 (8086:10fb)
                  Intel Corporation 82599EB 10-Gigabit SFI/SFP+ Network Connection
                      Name                  Value
                      ~~~~                  ~~~~~
                      CTRL                  00000000
                      STATUS                000c8000
                      CTRL_EXT              10010000
                      ESDP                  00000876
                      I2CCTL                0000000f
                      FRTIMER               30bef1b6
                      TCPTIMER              00000000
                      PFVFLRE[1]            00000000
                      LEDCTL                45444140
                      PFVFLRE[0]            00000000
                      PFVFLREC[0]           deadbeef
                      PFVFLREC[1]           deadbeef
                      PFVFLREC[2]           deadbeef
                      PFVFLREC[3]           deadbeef
                      PFMBICR[0]            00000000
                      PFMBICR[1]            00000000
                      PFMBICR[2]            00000000
                      PFMBICR[3]            00000000
                      PFMBIMR[0]            ffffffff
                      PFMBIMR[1]            ffffffff
                      PFMBIMR[2]            ffffffff
                      PFMBIMR[3]            ffffffff
                      EICS                  00000000
                      EIAC                  4000ffff
                      EITR[000]             000001e8
                      EITR[001]             000003d0
                      EITR[002]             00000798
                      EITR[003]             00000000
                      EITR[004]             00000000
                      EITR[005]             00000000
                      EITR[006]             00000000
                      EITR[007]             00000000
                      EITR[008]             00000000
                      EITR[009]             00000000
                      EITR[010]             00000000
                      EITR[011]             00000000
                      EITR[012]             00000000
                      EITR[013]             00000000
                      EITR[014]             00000000
                      EITR[015]             00000000
                      EITR[016]             00000000
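A dump like this is easier to compare (for example, a bad PF against a healthy one) once it is in a dictionary. Below is a minimal Python sketch, with hypothetical function names, that parses the name/value lines and lists indexed register families in which every entry holds the same value. The sample is a subset of the dump above. The sketch only surfaces patterns such as the all-`ffffffff` PFMBIMR entries; what those values mean is not established in this thread and would need to be checked against the 82599 datasheet.

```python
import re
from collections import defaultdict

# A register line is a name (optionally indexed) followed by 8 hex digits.
REG_LINE = re.compile(
    r"^\s*([A-Za-z_][A-Za-z0-9_]*(?:\[\d+\])?)\s+([0-9a-fA-F]{8})\s*$"
)


def parse_ethregs(text):
    """Return {register_name: int_value} from an ethregs-style dump."""
    regs = {}
    for line in text.splitlines():
        m = REG_LINE.match(line)
        if m:
            regs[m.group(1)] = int(m.group(2), 16)
    return regs


def uniform_families(regs):
    """Return {family: value} for indexed registers that all share one value."""
    families = defaultdict(set)
    for name, value in regs.items():
        if "[" in name:
            families[name.split("[")[0]].add(value)
    return {fam: next(iter(vals)) for fam, vals in families.items() if len(vals) == 1}


# Subset of the dump quoted in this post.
SAMPLE = """\
    Name                  Value
    ~~~~                  ~~~~~
    CTRL                  00000000
    STATUS                000c8000
    PFVFLREC[0]           deadbeef
    PFVFLREC[1]           deadbeef
    PFMBIMR[0]            ffffffff
    PFMBIMR[1]            ffffffff
    EITR[000]             000001e8
    EITR[001]             000003d0
"""

if __name__ == "__main__":
    regs = parse_ethregs(SAMPLE)
    for family, value in uniform_families(regs).items():
        print(f"{family}[*] = {value:08x}")
```

Parsing two dumps this way and diffing the resulting dictionaries is one way to narrow down which registers changed between a working and a non-working state.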


                  • 7. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

                    Hi Patrick,

                    We rebooted the server and the SRIOV PF and VFs are now working fine, so it looks like a software issue. Could you please check whether the ethregs/ethtool dump above provides any further information on the issue?




                    • 8. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

                      Hi Patrick,

                      One more question on this. Is there a version compatibility requirement between ixgbe and ixgbevf?


                      We are currently on ixgbe 3.2.9 and we have two versions of ixgbevf interfacing with the same NIC: ixgbevf 1.0.19-k0 and ixgbevf 2.2.0-k. Would it be a problem to have differing versions of ixgbevf working with the same NIC simultaneously?


                      One thing we observed is that ixgbevf 2.2.0-k has a new message code, 0x6.

                      Although we don't use MAC VLAN, the VF driver clears the MAC VLAN setting as shown below:

                              ixgbevf_set_uc_addr_vf (IXGBE_VF_SET_MACVLAN)

                              hw->mac.ops.set_uc_addr(hw, 0, NULL);


                      However, ixgbe 3.2.9 doesn't understand IXGBE_VF_SET_MACVLAN and prints a message like this:

                      [1020846.780262] ixgbe: eth103: ixgbe_rcv_msg_from_vf: Unhandled Msg 00000006


                      This happens very frequently (for some reason ixgbevf repeats the request almost every 2 seconds), and ixgbe prints this message each time.
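To quantify this from a saved dmesg log, here is a minimal Python sketch that counts the "Unhandled Msg" lines per device and opcode and estimates the interval between them. Several things are assumptions: the opcode names are taken from the mainline ixgbe mailbox header (ixgbe_mbx.h) and may not match the locally built 3.2.9 driver, the opcode is assumed to sit in the low 16 bits of the printed word, and the function names are hypothetical. The first sample line is the one quoted above; the second is a made-up follow-up two seconds later, included only to exercise the interval calculation.

```python
import re

# Opcode names from the mainline ixgbe mailbox header (assumption: the same
# numbering applies to the out-of-tree drivers discussed here).
VF_MSG_NAMES = {
    0x01: "IXGBE_VF_RESET",
    0x02: "IXGBE_VF_SET_MAC_ADDR",
    0x03: "IXGBE_VF_SET_MULTICAST",
    0x04: "IXGBE_VF_SET_VLAN",
    0x05: "IXGBE_VF_SET_LPE",
    0x06: "IXGBE_VF_SET_MACVLAN",
}

UNHANDLED = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+ixgbe: (?P<dev>\S+): ixgbe_rcv_msg_from_vf: "
    r"Unhandled Msg (?P<msg>[0-9a-fA-F]{8})"
)


def collect_unhandled(log_text):
    """Return {(device, opcode): [timestamps]} for 'Unhandled Msg' lines."""
    seen = {}
    for m in UNHANDLED.finditer(log_text):
        opcode = int(m.group("msg"), 16) & 0xFFFF  # assumed opcode field
        seen.setdefault((m.group("dev"), opcode), []).append(float(m.group("ts")))
    return seen


def mean_interval(timestamps):
    """Average gap in seconds between consecutive timestamps, or None."""
    if len(timestamps) < 2:
        return None
    ts = sorted(timestamps)
    return (ts[-1] - ts[0]) / (len(ts) - 1)


SAMPLE = (
    # Line quoted in this post.
    "[1020846.780262] ixgbe: eth103: ixgbe_rcv_msg_from_vf: Unhandled Msg 00000006\n"
    # Hypothetical second line, two seconds later (not from the real log).
    "[1020848.780262] ixgbe: eth103: ixgbe_rcv_msg_from_vf: Unhandled Msg 00000006\n"
)

if __name__ == "__main__":
    for (dev, opcode), stamps in collect_unhandled(SAMPLE).items():
        name = VF_MSG_NAMES.get(opcode, "unknown")
        print(dev, hex(opcode), name, len(stamps), mean_interval(stamps))
```

Any opcode that shows up here other than 0x06 would point to a further mismatch between the PF and VF driver versions.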


                      We don't know whether there are other incompatibilities like this that could result in this behaviour. Any insights appreciated.




                      • 9. Re: SRIOV PF/VFs suddenly stopped working & tx/rx queues doesnt seem to be operational

                        We are not sure why your PF seemed to freeze.  We will keep an eye out for such behavior, thanks for bringing it to our attention.


                        As for your PF/VF alignment: the two drivers are fairly tightly coupled. The VF driver communicates with the PF driver through messages in the mailbox. If one side doesn't understand the other, such an error will occur.
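To illustrate the coupling, here is a toy Python model of a PF answering VF mailbox requests. It is a sketch under stated assumptions and not the driver's code: the `ToyPF` class is hypothetical, the ACK/NACK bit values are those of the mainline ixgbe mailbox header, the opcode is assumed to sit in the low 16 bits of the message word, and the set of opcodes the older PF understands (0x01 to 0x05) is an assumption based on opcode 0x06 being reported as unhandled in this thread.

```python
# Reply-type bits as defined in the mainline ixgbe mailbox header (assumption
# that the drivers discussed here use the same values).
IXGBE_VT_MSGTYPE_ACK = 0x80000000
IXGBE_VT_MSGTYPE_NACK = 0x40000000


class ToyPF:
    """Toy PF that ACKs opcodes it knows and NACKs everything else."""

    def __init__(self, supported_opcodes):
        self.supported = set(supported_opcodes)
        self.unhandled = []  # opcodes this PF did not understand

    def handle(self, msg_word):
        opcode = msg_word & 0xFFFF  # assumed opcode field
        if opcode in self.supported:
            return msg_word | IXGBE_VT_MSGTYPE_ACK
        self.unhandled.append(opcode)
        return msg_word | IXGBE_VT_MSGTYPE_NACK


if __name__ == "__main__":
    # Assumed: an older PF that predates opcode 0x06.
    old_pf = ToyPF(supported_opcodes=range(0x01, 0x06))

    # A newer VF sends opcode 0x06, which this PF does not know.
    reply = old_pf.handle(0x06)
    print("NACK" if reply & IXGBE_VT_MSGTYPE_NACK else "ACK")
    print("unhandled opcodes:", [hex(op) for op in old_pf.unhandled])
```

The point of the model is only that a newer VF can send requests an older PF has no handler for, so each such request is rejected rather than acted on.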


                        I’d recommend updating both the PF and VF drivers to the latest versions available from our SourceForge site. URL below:




                        PF Driver - latest ixgbe version is 3.10.16

                        VF Driver - latest ixgbevf version is 2.6.2
