We're having problems with an Intel X710-DA4 retail card on VMware ESXi 6.5u1. After some time (usually minutes to hours) of sustained traffic on the NIC, we see the following in vmkernel.log:
2017-08-11T12:26:02.554Z cpu18:66233)i40en: i40en_HandleMddEvent:6495: Malicious Driver Detection event 0x02 on TX queue 0 PF number 0x03 VF number 0x00
2017-08-11T12:26:02.554Z cpu18:66233)i40en: i40en_HandleMddEvent:6521: TX driver issue detected, PF reset issued
The network port in question is then apparently shut down, although the link stays up, and it does not pass any more network traffic. Only a reboot of the server will reset the network port and allow traffic to flow through it again.
The traffic pattern that triggers the issue is usually TCP traffic of more than 300 Mbit/s passing through a firewall virtual machine, entering on one virtual interface and exiting through another.
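In case it helps anyone correlate these events across hosts, here is a minimal Python sketch that pulls the TX MDD events out of vmkernel.log lines and counts them per PF. The regular expression is modelled only on the log line quoted above, so other driver versions (or RX events) may be worded differently and would not match. The names parse_mdd_events and count_by_pf are our own, purely for illustration.

```python
import re
from collections import Counter

# Assumption: pattern derived from the single TX MDD line quoted in this post.
MDD_RE = re.compile(
    r"^(?P<ts>\S+)\s+cpu\d+:\d+\)i40en: i40en_HandleMddEvent:\d+: "
    r"Malicious Driver Detection event (?P<event>0x[0-9a-fA-F]+) "
    r"on TX queue (?P<queue>\d+) "
    r"PF number (?P<pf>0x[0-9a-fA-F]+) VF number (?P<vf>0x[0-9a-fA-F]+)"
)

def parse_mdd_events(lines):
    """Return one dict per TX MDD event found in the given log lines."""
    events = []
    for line in lines:
        m = MDD_RE.match(line.strip())
        if m is None:
            continue  # not an MDD event line (or a format we do not know)
        events.append({
            "timestamp": m.group("ts"),
            "event": int(m.group("event"), 16),
            "queue": int(m.group("queue")),
            "pf": int(m.group("pf"), 16),
            "vf": int(m.group("vf"), 16),
        })
    return events

def count_by_pf(events):
    """Count events per physical function number."""
    return Counter(e["pf"] for e in events)

# The two lines from vmkernel.log quoted above.
SAMPLE = [
    "2017-08-11T12:26:02.554Z cpu18:66233)i40en: i40en_HandleMddEvent:6495: "
    "Malicious Driver Detection event 0x02 on TX queue 0 PF number 0x03 VF number 0x00",
    "2017-08-11T12:26:02.554Z cpu18:66233)i40en: i40en_HandleMddEvent:6521: "
    "TX driver issue detected, PF reset issued",
]

for ev in parse_mdd_events(SAMPLE):
    print(ev["timestamp"], "PF", ev["pf"], "queue", ev["queue"], "event", hex(ev["event"]))
```

To run it against a real log, the lines of a copied vmkernel.log can be passed to parse_mdd_events in place of SAMPLE.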
We are using ESXi 6.5u1 with the built-in i40en driver, as well as the latest NVM firmware version 5.05:
0000:82:00.0 8086:1572 8086:0004 vmkernel vmnic2
0000:82:00.1 8086:1572 8086:0000 vmkernel vmnic3
0000:82:00.2 8086:1572 8086:0000 vmkernel vmnic4
0000:82:00.3 8086:1572 8086:0000 vmkernel vmnic5
esxcli network nic get -n vmnic3
Advertised Auto Negotiation: false
Advertised Link Modes: 10000BaseSR/Full
Auto Negotiation: false
Cable Type: FIBRE
Current Message Level: -1
Bus Info: 0000:82:00:1
Firmware Version: 5.05 0x80002898 1.1568.0
Link Detected: true
Link Status: Up
More details to narrow down the problem:
- We are not using SR-IOV.
- The exact driver version is i40en 1.3.1-5vmw.6220.127.116.1169303. We have observed the same issue with a previous driver version 1.3.1-1OEM.600.0.0.2768847.
- The issue happens on multiple hosts, all with the same Intel X710-DA4 adapter.
VMware Support has not been able to resolve the issue for us. They say they have been observing issues with all current X710 drivers and cannot point us in any specific direction, other than asking us to turn to Intel for support.
Honestly, at this point we're at our wits' end and do not know how to proceed, other than switching to a different manufacturer's network hardware altogether.
Thank you for any helpful advice.