6 Replies Latest reply on Aug 17, 2017 9:09 PM by Intel Corporation Branched from an earlier discussion.

    Intel X710-DA4 / VMware ESXi 6.5u1 - Malicious Driver Detection Event Occurred - can't even get a VM to boot

    DHekimian

      I can't even get a VM to boot when I use the i40en driver v1.3.1 under ESX v6.0u2. As soon as I power on a VM, the system crashes with a Malicious Driver Detection event and all traffic stops.

      I've had to fall back to the i40e driver v2.0.6 (https://my.vmware.com/web/vmware/details?downloadGroup=DT-ESXI60-INTEL-I40E-206&productId=491).

      Just as you said, with v1.3.1 any decent amount of network traffic can trigger this issue, which stops ALL network traffic and requires a reboot.

      2017-08-11T23:59:52.735Z cpu38:33417)i40en: i40en_HandleMddEvent:6484: Malicious Driver Detection event 0x01 on TX queue 1 PF number 0x02 VF number 0x1e

      2017-08-11T23:59:52.735Z cpu38:33417)i40en: i40en_HandleMddEvent:6510: TX driver issue detected, PF reset issued

      2017-08-12T00:00:00.235Z cpu38:33417)i40en: i40en_HandleMddEvent:6484: Malicious Driver Detection event 0x02 on TX queue 0 PF number 0x02 VF number 0x00

      2017-08-12T00:00:00.235Z cpu38:33417)i40en: i40en_HandleMddEvent:6510: TX driver issue detected, PF reset issued
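To see which TX queues, PFs and VFs the MDD events hit, the i40en lines above can be parsed mechanically. This is a minimal Python sketch that assumes the lines were copied verbatim from the host's kernel log in exactly the format quoted here. The regex, `MddEvent` and `parse_mdd_line` are hypothetical helpers, not part of any Intel or VMware tool, and other driver versions may word these messages differently.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern written to match the i40en MDD lines quoted in this thread:
# "<ts> cpuNN:NNNNN)i40en: i40en_HandleMddEvent:NNNN: Malicious Driver Detection
#  event 0xNN on TX queue N PF number 0xNN VF number 0xNN"
MDD_RE = re.compile(
    r"^(?P<ts>\S+) cpu\d+:\d+\)i40en: i40en_HandleMddEvent:\d+: "
    r"Malicious Driver Detection event (?P<event>0x[0-9a-fA-F]+) "
    r"on TX queue (?P<queue>\d+) "
    r"PF number (?P<pf>0x[0-9a-fA-F]+) VF number (?P<vf>0x[0-9a-fA-F]+)"
)


@dataclass
class MddEvent:
    timestamp: str  # raw ISO-8601 timestamp string from the log line
    event: int      # MDD event code (hex in the log)
    tx_queue: int   # TX queue number (decimal in the log)
    pf: int         # PF number (hex in the log)
    vf: int         # VF number (hex in the log)


def parse_mdd_line(line: str) -> Optional[MddEvent]:
    """Return an MddEvent for a matching line, or None for any other line."""
    m = MDD_RE.match(line.strip())
    if m is None:
        return None
    return MddEvent(
        timestamp=m["ts"],
        event=int(m["event"], 16),
        tx_queue=int(m["queue"]),
        pf=int(m["pf"], 16),
        vf=int(m["vf"], 16),
    )


if __name__ == "__main__":
    sample = (
        "2017-08-11T23:59:52.735Z cpu38:33417)i40en: i40en_HandleMddEvent:6484: "
        "Malicious Driver Detection event 0x01 on TX queue 1 PF number 0x02 VF number 0x1e"
    )
    print(parse_mdd_line(sample))
    # MddEvent(timestamp='2017-08-11T23:59:52.735Z', event=1, tx_queue=1, pf=2, vf=30)
```

Grouping the parsed events by `tx_queue` or `vf` is one way to check whether the resets cluster on particular queues or VFs.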

      With v2.0.6, traffic hiccups while the driver resets (>1 sec) but then keeps flowing, which usually doesn't cause an issue. This happens about 100 times a day across my 8-node VMware cluster.
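A reset rate like this can be tallied straight from the logs. This sketch assumes each host's kernel log (typically `/var/log/vmkernel.log` on ESXi) has been exported as plain text with one entry per line, and counts the lines containing the reset message that both drivers print in this thread. `resets_per_day` is a hypothetical helper, and the day boundary is taken from the UTC timestamp prefix of each line.

```python
from collections import Counter
from typing import Iterable

# Both driver generations quoted in this thread log the same phrase on a reset:
#   i40en: "...i40en_HandleMddEvent:6510: TX driver issue detected, PF reset issued"
#   i40e:  "...<6>i40e 0000:05:00.2: TX driver issue detected, PF reset issued"
RESET_MARKER = "TX driver issue detected, PF reset issued"


def resets_per_day(lines: Iterable[str]) -> Counter:
    """Count PF-reset log lines per calendar day (UTC, from the ISO timestamp prefix)."""
    counts: Counter = Counter()
    for line in lines:
        if RESET_MARKER in line:
            day = line.strip()[:10]  # "YYYY-MM-DD" out of "YYYY-MM-DDTHH:MM:SS.mmmZ"
            counts[day] += 1
    return counts


if __name__ == "__main__":
    log = [
        "2017-08-11T23:59:52.735Z cpu38:33417)i40en: i40en_HandleMddEvent:6510: TX driver issue detected, PF reset issued",
        "2017-08-12T00:00:00.235Z cpu38:33417)i40en: i40en_HandleMddEvent:6510: TX driver issue detected, PF reset issued",
        "2017-05-26T16:01:05.347Z cpu11:33354)<6>i40e 0000:05:00.2: TX driver issue detected, PF reset issued",
        "2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 1 not allocated",
    ]
    for day, n in sorted(resets_per_day(log).items()):
        print(day, n)
```

Summing the per-day counts from all eight hosts gives a cluster-wide figure to compare against the roughly 100 resets a day mentioned above.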

      Occasionally, though, the "TX driver issue detected, PF reset issued" reset repeats continuously and ends up causing an outage.

      2017-05-26T16:01:05.347Z cpu11:33354)<6>i40e 0000:05:00.2: TX driver issue detected, PF reset issued

      2017-05-26T16:01:05.538Z cpu38:33367)<6>i40e 0000:05:00.2: i40e_open: Registering netqueue ops

      2017-05-26T16:01:05.547Z cpu38:33367)IntrCookie: 1915: cookie 0x38 moduleID 4111 <i40e-vmnic4-TxRx-0> exclusive, flags 0x25

      2017-05-26T16:01:05.556Z cpu38:33367)IntrCookie: 1915: cookie 0x39 moduleID 4111 <i40e-vmnic4-TxRx-1> exclusive, flags 0x25

      2017-05-26T16:01:05.566Z cpu38:33367)IntrCookie: 1915: cookie 0x3a moduleID 4111 <i40e-vmnic4-TxRx-2> exclusive, flags 0x25

      2017-05-26T16:01:05.575Z cpu38:33367)IntrCookie: 1915: cookie 0x3b moduleID 4111 <i40e-vmnic4-TxRx-3> exclusive, flags 0x25

      2017-05-26T16:01:05.585Z cpu38:33367)IntrCookie: 1915: cookie 0x3c moduleID 4111 <i40e-vmnic4-TxRx-4> exclusive, flags 0x25

      2017-05-26T16:01:05.594Z cpu38:33367)IntrCookie: 1915: cookie 0x3d moduleID 4111 <i40e-vmnic4-TxRx-5> exclusive, flags 0x25

      2017-05-26T16:01:05.604Z cpu38:33367)IntrCookie: 1915: cookie 0x3e moduleID 4111 <i40e-vmnic4-TxRx-6> exclusive, flags 0x25

      2017-05-26T16:01:05.613Z cpu38:33367)IntrCookie: 1915: cookie 0x3f moduleID 4111 <i40e-vmnic4-TxRx-7> exclusive, flags 0x25

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 1 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 2 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 3 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 4 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 5 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 6 not allocated

      2017-05-26T16:01:05.659Z cpu26:32886)<6>i40e 0000:05:00.2: Tx netqueue 7 not allocated

      2017-05-26T16:01:05.660Z cpu26:32886)<6>i40e 0000:05:00.2: Netqueue features supported: QueuePair  Latency Dynamic Pre-Emptible

      2017-05-26T16:01:05.660Z cpu26:32886)<6>i40e 0000:05:00.2: Supporting next generation VLANMACADDR filter
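One way to catch the continuous-reset case before it turns into an outage is to look for bursts of resets in the same logs. This sketch groups reset lines whose timestamps are no more than `max_gap` apart and reports runs of at least `min_count` resets. Both thresholds and all helper names are hypothetical and would need tuning against a host's normal background rate; it also assumes every line starts with an ISO-8601 UTC timestamp as in the excerpts above.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

RESET_MARKER = "TX driver issue detected, PF reset issued"


def parse_ts(line: str) -> datetime:
    """Parse the leading ISO-8601 UTC timestamp, e.g. 2017-05-26T16:01:05.347Z."""
    return datetime.strptime(line.split()[0], "%Y-%m-%dT%H:%M:%S.%fZ")


def find_reset_storms(
    lines: List[str],
    max_gap: timedelta = timedelta(seconds=30),  # hypothetical threshold
    min_count: int = 5,                          # hypothetical threshold
) -> List[Tuple[datetime, datetime, int]]:
    """Return (first, last, count) for each run of PF resets spaced <= max_gap apart
    that contains at least min_count resets."""
    times = sorted(parse_ts(line) for line in lines if RESET_MARKER in line)
    storms: List[Tuple[datetime, datetime, int]] = []
    group: List[datetime] = []
    for t in times:
        # A gap larger than max_gap closes the current run.
        if group and t - group[-1] > max_gap:
            if len(group) >= min_count:
                storms.append((group[0], group[-1], len(group)))
            group = []
        group.append(t)
    # Flush the final run.
    if len(group) >= min_count:
        storms.append((group[0], group[-1], len(group)))
    return storms
```

Grouping by gap rather than by fixed clock windows keeps one long storm from being split into several reports just because it crosses a window boundary.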

      Intel Support has not been helpful in resolving these issues. They suggested disabling TSO/LRO, but that didn't make a noticeable difference.

      Maybe one day Intel will take the VMware i40e/i40en driver issues seriously and attempt to fix them. I've been dealing with this for 2+ years with no end in sight.