14 replies. Latest reply on Sep 24, 2018 11:12 AM by Intel Ethernet. Branched to a new discussion.

    Intel X710 / i40en driver on VMware - any fix?

    ispcolohost

      I just received a few Dell servers to test as replacements for the vendor we currently use for vSphere. I installed ESXi 6.5 U1 and patched it for Spectre/Meltdown, so I'm now on build 7967591. The servers connect to storage over multiple 10 GbE paths handled by dual-port X710 NICs:

       

      [root@vm4:~] esxcli network nic get -n vmnic0
         Advertised Auto Negotiation: false
         Advertised Link Modes: 10000BaseT/Full
         Auto Negotiation: false
         Cable Type: DA
         Current Message Level: -1
         Driver Info:
               Bus Info: 0000:18:00:0
               Driver: i40en
               Firmware Version: 6.00 0x800034e6 18.3.6
               Version: 1.3.1
         Link Detected: true
         Link Status: Up
         Name: vmnic0
         PHYAddress: 0
         Pause Autonegotiate: false
         Pause RX: true
         Pause TX: true
         Supported Ports: DA
         Supports Auto Negotiation: false
         Supports Pause: true
         Supports Wakeon: true
         Transceiver:
         Virtual Address: 00:50:56:5a:d4:93
         Wakeon: MagicPacket(tm)
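If you need to compare driver and firmware versions across several hosts, the text output above can be parsed mechanically. What follows is a minimal Python sketch, assuming the indented `Key: Value` layout shown in this post (this is not a documented output format, and `parse_nic_get` is a hypothetical helper name). Nested headers such as `Driver Info:` are flattened into `Driver Info/...` keys.

```python
from typing import Dict


def parse_nic_get(text: str) -> Dict[str, str]:
    """Flatten `esxcli network nic get` text output into a dict.

    Assumes the indented 'Key: Value' layout quoted in this post. A key with
    an empty value followed by more-indented lines is treated as a section
    header, and its children are stored as 'Section/Key'.
    """
    fields: Dict[str, str] = {}
    section, section_indent = None, -1
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue  # ignore blank lines
        indent = len(raw) - len(raw.lstrip())
        # Split on the first colon only, so values such as MAC addresses
        # or PCI bus addresses keep their own colons.
        key, _, value = stripped.partition(":")
        key, value = key.strip(), value.strip()
        if section is not None and indent > section_indent:
            fields.pop(section, None)  # the header really was a section
            fields[f"{section}/{key}"] = value
            continue
        section = None
        if not value:
            # Either a section header ("Driver Info:") or an empty field
            # ("Transceiver:"); the following line decides which.
            section, section_indent = key, indent
        fields[key] = value
    return fields


if __name__ == "__main__":
    sample = """\
   Cable Type: DA
   Driver Info:
         Bus Info: 0000:18:00:0
         Driver: i40en
         Firmware Version: 6.00 0x800034e6 18.3.6
         Version: 1.3.1
   Link Status: Up
   Name: vmnic0
   Transceiver:
   Virtual Address: 00:50:56:5a:d4:93
"""
    info = parse_nic_get(sample)
    print(info["Name"], info["Driver Info/Driver"], info["Driver Info/Version"])
```

Feed it only the command's output, not the `[root@vm4:~]` prompt line, since that line also contains a colon and would be parsed as a bogus key.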

       

      Almost immediately after I put any real load on them, the NIC simply stops passing traffic. What's worse, it stays in the Up state, so VMware never fails over to the other link. This is what I see on the VMware side:

       

      2018-04-01T11:30:41.265Z cpu1:65925)StorageApdHandlerEv: 117: Device or filesystem with identifier [e87ff85e-cb1d8034] has exited the All Paths Down state.
      2018-04-01T12:39:19.849Z cpu46:66166)i40en: i40en_HandleMddEvent:6495: Malicious Driver Detection event 0x02 on TX queue 0 PF number 0x00 VF number 0x00
      2018-04-01T12:39:19.849Z cpu46:66166)i40en: i40en_HandleMddEvent:6521: TX driver issue detected, PF reset issued
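To see how often this happens, and whether it is always the same queue, the vmkernel log can be scanned for these two i40en messages. A minimal Python sketch, assuming the log has been copied off the host (on ESXi it is normally `/var/log/vmkernel.log`) and that the messages keep the exact wording quoted above; other i40en versions may word them differently. `summarize_mdd` and the script name are hypothetical.

```python
import re
import sys
from collections import Counter
from typing import Iterable, Tuple

# Patterns modeled on the two i40en lines quoted above (driver 1.3.1);
# the exact wording is an assumption and may differ in other driver versions.
MDD_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)i40en: i40en_HandleMddEvent:\d+: "
    r"Malicious Driver Detection event (?P<event>0x[0-9a-fA-F]+) "
    r"on (?P<dir>TX|RX) queue (?P<queue>\d+) PF number (?P<pf>0x[0-9a-fA-F]+)"
)
RESET_RE = re.compile(r"i40en_HandleMddEvent:\d+: .*PF reset issued")


def summarize_mdd(lines: Iterable[str]) -> Tuple[Counter, int]:
    """Count MDD events per (direction, queue, PF, event code) and PF resets."""
    per_queue: Counter = Counter()
    resets = 0
    for line in lines:
        line = line.strip()
        m = MDD_RE.match(line)
        if m:
            per_queue[(m["dir"], int(m["queue"]), m["pf"], m["event"])] += 1
        elif RESET_RE.search(line):
            resets += 1
    return per_queue, resets


if __name__ == "__main__":
    # Usage: python mdd_summary.py vmkernel.log [more logs...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as log:
            counts, resets = summarize_mdd(log)
        print(f"{path}: {resets} PF reset(s)")
        for (direction, queue, pf, event), n in sorted(counts.items()):
            print(f"  {direction} queue {queue} PF {pf} event {event}: {n}")
```

Because the port stays Up through these resets, link-state-based teaming never sees a failure; a count like this at least shows how often the driver is tripping under load.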

       

      That, of course, led me to the following two closed threads in this forum:

       

      https://communities.intel.com/thread/117035

      https://communities.intel.com/thread/117076

       

      Is it safe to assume this NIC is still broken and will never see a fix, with both sides blaming each other? I can't say this leaves me very happy with Dell either: they knew we were running vSphere on these servers, yet they still bundled these NICs.