13 Replies Latest reply on May 10, 2017 6:16 PM by Intel Corporation

    Intel X710 vs VMWare ESX: crash and reboot

    drookie

      Hi,

      I have a bunch (around 50 boards, actually) of Intel X710-DA2 adapters and a similar number of servers running ESX 6.0. The problem is that as soon as a server starts exchanging traffic over the X710, it reboots. Why I'm writing here instead of to VMware support: when the adapter stays idle (we use the onboard copper gigabit I350 adapters as a workaround), the server is rock stable. With the Mellanox ConnectX-3 EN boards (we recently acquired a couple for testing), the server doesn't crash either. So I'm quite sure it's either the board or its driver.

      As for the Intel drivers for ESX: the problem persists across all available driver versions from 1.2.48 to 2.0.6 (we also tried 1.4.28 in between). Updating the NVM firmware doesn't seem to help either: today we tested NVM 5.05 with the 2.0.6 driver, and the server rebooted after only a couple of minutes of uptime. I've also tried disabling TSO and LRO, but that didn't change the result.

      I would greatly appreciate it if someone could help me resolve this issue, because right now our only option is switching to Mellanox boards, which is quite expensive given how many servers we have.

      Thanks.

        • 1. Re: Intel X710 vs VMWare ESX: crash and reboot
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hi drookie,

           Thank you for the post. Can you share the information below?
          1) What server system is used? (brand and model)
          2) What is the brand and model of the fiber module or SFP+ module used on the X710-DA2?
          3) Is the X710-DA2 embedded in the system, or is it a separate adapter that can be plugged in and removed?

          Thanks,
          wb

           

          • 2. Re: Intel X710 vs VMWare ESX: crash and reboot
            drookie

            Sure.

             

            1) All of the servers are Supermicro SYS-1028GR-TR systems, the motherboard is X10DRG-H.

            2) All of the servers are plugged into a Juniper EX4600 with passive DAC cables. We use the same cables to connect the Mellanox boards. The SFPs on the Intel end are flashed with Intel firmware, but the origin and version of that firmware are unknown, and so is the exact DAC manufacturer. I guess this is the famous Chinese "Noname" brand.

            3) These are discrete adapters, plugged into a PCI-E x8 slot.

             

            Recently we found a couple of stable servers using X710 boards with uptimes measured in dozens of days (usually they crash within minutes). One thing is common to both (we found two): they are running NVM version 4.25. Right now the lowest version on the Intel site is 4.42, so is there any chance we could get the 4.25 tarball? It seems to have vanished from the internet. I am aware that the supported downgrade from 5.05 is to 4.42, but some of my boards report NVM version "0.00", so I'm pretty sure I can flash them with 4.25.

             

            And one more thing: while I was flashing one of the boards to version 4.42, my session got disconnected, and now the NVM utility reports "Access error" every time it examines the board. Does this mean the board is now broken?

             

            Thanks.

            • 3. Re: Intel X710 vs VMWare ESX: crash and reboot
              drookie

              Update: at least some of the boards reporting NVM version 0.00 refuse to flash:

              [root@hv07:/tmp/ESXi_x64_442] ./nvmupdate64e

              Intel(R) Ethernet NVM Update Tool
              NVMUpdate version 1.28.19.4
              Copyright (C) 2013 - 2016 Intel Corporation.

              WARNING: To avoid damage to your device, do not stop the update or reboot or power off the system during this update.
              Inventory in progress. Please wait [|.........]

              Num Description                               Ver. DevId S:B    Status
              === ======================================== ===== ===== ====== ===============
              01) Intel(R) I350 Gigabit Network Connection        1521 00:001 Update not
                                                                              available
              02) Intel(R) Ethernet Converged Network       0.00  1572 00:129 Update not
                  Adapter X710-2                                              available

              Tool execution completed with the following status: Device not found
              Press any key to exit.
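To check the NVM-version pattern described earlier in the thread (4.25 on the stable boards, 0.00 on boards that refuse to flash) across all ~50 hosts, the inventory table can be scraped from saved nvmupdate64e output. The following is a minimal Python sketch, assuming the column layout stays as shown in this output; `parse_inventory` and `suspicious` are hypothetical helper names, not part of Intel's tooling, and wrapped continuation lines (such as "Adapter X710-2") are deliberately ignored.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One inventory row: "NN) <description> [<NVM ver>] <DevId> <S:B> <status>".
# The Ver. column is blank for devices the tool cannot update (e.g. the I350 row above).
ROW = re.compile(
    r"^\s*(?P<num>\d+)\)\s+(?P<desc>.*?)\s+"
    r"(?:(?P<ver>\d+\.\d+)\s+)?"
    r"(?P<devid>[0-9A-Fa-f]{4})\s+"
    r"(?P<loc>\d+:\d+)\s+(?P<status>.*?)\s*$"
)


@dataclass
class InventoryRow:
    num: int
    description: str            # first line of the description only
    nvm_version: Optional[str]  # None when the Ver. column is blank
    device_id: str
    location: str               # the "S:B" column
    status: str                 # first line of the status only


def parse_inventory(text: str) -> List[InventoryRow]:
    """Extract device rows from saved nvmupdate64e inventory output."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # headers, separators and wrapped lines do not match
            rows.append(InventoryRow(
                num=int(m["num"]),
                description=m["desc"],
                nvm_version=m["ver"],
                device_id=m["devid"].upper(),
                location=m["loc"],
                status=m["status"],
            ))
    return rows


def suspicious(rows: List[InventoryRow]) -> List[InventoryRow]:
    """Rows whose NVM version reads 0.00, like the boards that refused to flash."""
    return [r for r in rows if r.nvm_version == "0.00"]
```

Running this over one saved output file per host and grouping by `nvm_version` would make it easy to line up NVM versions against the hosts that crash.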

              • 4. Re: Intel X710 vs VMWare ESX: crash and reboot
                Intel Corporation
                This message was posted on behalf of Intel Corporation


                Hi Drookie,

                Thank you for providing the detailed information. For the X710-DA2, please use the supported fiber modules that we recommend on our website at http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html.

                Can you check whether you can use a supported fiber module?

                Thanks,
                wb

                • 5. Re: Intel X710 vs VMWare ESX: crash and reboot
                  drookie

                  Well, I have an update and more questions.

                   

                  1) I was told we are using Juniper-branded DACs there, so definitely no "made in China by Noname" brand.

                  2) I'm sorry for misinforming you; I was misinformed myself. It now seems possible that _some_ of the boards aren't Intel-manufactured, especially the ones reporting NVM version 0.00; these may have been made by other vendors. Some boards, however, are definitely Intel BLKs: I saw the Intel stickers myself in the photo our field engineer sent from the site. We are investigating further. Separately, one of the adapters with NVM version 4.53 refuses to flash 5.05, failing with "Access error", although it still works.

                   

                  3) Could you please clarify what the phrase "Other brands of SFP optical modules do not work with the Intel® Ethernet CNA X710 Series." means? Do such modules simply fail to establish physical connectivity, or can these modules/DACs lead to the ESX crash? At the same time, these DACs work just fine with Mellanox. As for whether we can use SFP+ modules instead of the DAC cables: I guess we cannot, because DAC cables are way cheaper than a pair of SFP+ modules, so this doesn't seem to be an option. As for the DACs, the document referenced at the link you provided states that only Leoni and Amphenol DACs aren't supported; I guess that leaves Juniper DACs as supported, right?

                   

                  In conclusion: we still have the problem, and my initial statement about the Intel X710 boards still stands, because the board ESX 6.0 was crashing with has been confirmed to be an Intel-manufactured X710 adapter.

                  Looking forward to hearing from you about whether it can be solved.

                  • 6. Re: Intel X710 vs VMWare ESX: crash and reboot
                    Intel Corporation
                    This message was posted on behalf of Intel Corporation

                    Hi Drookie,
                    Thank you for the additional information.

                    1) You can use any direct attach cable that complies with the SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Please refer to the FAQ: http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html
                    " Any SFP passive or active limiting direct attach copper cable that complies with the SFF-8431 v4.1 and SFF-8472 v10.4 specifications is compatible. We participate in testing with other members of the Ethernet Alliance to make sure there is interoperability between cables and host ports that meet these specifications."

                    2) The statement about SFP optical modules refers to optical connections, which I think do not apply in your case, since you mentioned you need to use DACs. The supported SFP+ optical modules are the ones listed on our website:
                    http://www.intel.com/content/www/us/en/support/network-and-i-o/ethernet-products/000007045.html

                    3) With regards to the firmware upgrade, please reply to my private message.

                    Thanks,
                    wb
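Whether a given DAC is "Juniper-branded" or something else can be read from the cable itself: SFF-8472, cited above, defines a serial-ID area (the A0h page) holding the vendor name, OUI, part number and a passive/active cable flag. Below is a minimal Python sketch that decodes those fields from a raw dump, assuming the dump was taken on a Linux host whose driver supports module EEPROM reads, e.g. with `ethtool -m ens5f0 raw on > ens5f0-a0.bin` (the file name and the helper names here are hypothetical). Plain `ethtool -m ens5f0` already prints these fields decoded; the sketch only helps when scripting across many ports. Offsets follow SFF-8472; double-check them against the revision you have.

```python
from dataclasses import dataclass

# Serial-ID fields in the SFF-8472 A0h page (byte offsets, end-exclusive slices).
IDENTIFIER = 0               # 0x03 = SFP/SFP+
CABLE_TECH = 8               # bit 2 = passive cable, bit 3 = active cable
VENDOR_NAME = slice(20, 36)
VENDOR_OUI = slice(37, 40)
VENDOR_PN = slice(40, 56)
VENDOR_REV = slice(56, 60)
VENDOR_SN = slice(68, 84)


@dataclass
class ModuleId:
    is_sfp: bool
    passive_cable: bool
    active_cable: bool
    vendor_name: str
    vendor_oui: str
    vendor_pn: str
    vendor_rev: str
    vendor_sn: str


def _text(field: bytes) -> str:
    # Vendor strings are ASCII padded with spaces; unprogrammed bytes may be NULs.
    return field.decode("ascii", errors="replace").strip(" \x00")


def decode_a0(eeprom: bytes) -> ModuleId:
    """Decode the identifying fields from the first bytes of an A0h dump."""
    if len(eeprom) < 96:
        raise ValueError("need at least the first 96 bytes of the A0h page")
    tech = eeprom[CABLE_TECH]
    return ModuleId(
        is_sfp=eeprom[IDENTIFIER] == 0x03,
        passive_cable=bool(tech & 0x04),
        active_cable=bool(tech & 0x08),
        vendor_name=_text(eeprom[VENDOR_NAME]),
        vendor_oui=eeprom[VENDOR_OUI].hex(":"),  # bytes.hex(sep) needs Python 3.8+
        vendor_pn=_text(eeprom[VENDOR_PN]),
        vendor_rev=_text(eeprom[VENDOR_REV]),
        vendor_sn=_text(eeprom[VENDOR_SN]),
    )
```

Usage would be along the lines of `decode_a0(open("ens5f0-a0.bin", "rb").read())`, then comparing `vendor_name` and `vendor_pn` between ports that crash and ports that don't.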



                     

                    • 7. Re: Intel X710 vs VMWare ESX: crash and reboot
                      Intel Corporation
                      This message was posted on behalf of Intel Corporation

                      Hi Drookie,

                        Please run the commands below and provide the output.
                      1) lspci -vv | grep -i "ethernet controller"

                      2) ethtool -i <interface>

                      Thanks,
                      wb
                       

                      • 8. Re: Intel X710 vs VMWare ESX: crash and reboot
                        drookie

                        Hello,

                         

                        We decided to give KVM a chance on this very same machine, and replaced a failed adapter with a new Intel X710 (its NVM version is 4.53, but as we saw earlier, this doesn't affect ESX stability in any way). Under Linux I get:

                        [root@kvm15 Linux_x64]# lspci -vv | grep -i "ethernet controller"
                        01:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                        01:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
                        81:00.0 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
                        81:00.1 Ethernet controller: Intel Corporation Ethernet Controller X710 for 10GbE SFP+ (rev 01)
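lspci prints the device class as "Ethernet controller" with a capital E, so a case-sensitive match on "ethernet controller" finds nothing; that is why `grep -i` is needed here. When collecting the same information from many hosts, a small parser avoids that pitfall. A minimal Python sketch, assuming plain (non-verbose) lspci lines of the form shown above; `ethernet_functions` and `x710_slots` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# "<slot> <class>: <description>"; the slot may carry a domain prefix (0000:81:00.0).
LSPCI_LINE = re.compile(r"^(?P<slot>[0-9A-Fa-f:.]+)\s+(?P<cls>[^:]+):\s+(?P<desc>.+)$")


def ethernet_functions(text: str) -> List[Tuple[str, str]]:
    """Return (slot, description) for each PCI function of class Ethernet controller."""
    found = []
    for line in text.splitlines():
        m = LSPCI_LINE.match(line.strip())
        # Compare the class case-insensitively, as grep -i does.
        if m and m["cls"].strip().lower() == "ethernet controller":
            found.append((m["slot"], m["desc"]))
    return found


def x710_slots(text: str) -> List[str]:
    """PCI slots whose description mentions the X710 controller."""
    return [slot for slot, desc in ethernet_functions(text) if "X710" in desc]
```

For the output above this would yield the two X710 functions at 81:00.0 and 81:00.1.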

                        [root@kvm15 Linux_x64]# ethtool ens5f0
                        Settings for ens5f0:
                                Supported ports: [ FIBRE ]
                                Supported link modes:   10000baseT/Full
                                Supported pause frame use: Symmetric
                                Supports auto-negotiation: No
                                Advertised link modes:  Not reported
                                Advertised pause frame use: No
                                Advertised auto-negotiation: No
                                Speed: 10000Mb/s
                                Duplex: Full
                                Port: Direct Attach Copper
                                PHYAD: 0
                                Transceiver: external
                                Auto-negotiation: off
                                Supports Wake-on: d
                                Wake-on: d
                                Current message level: 0x0000000f (15)
                                                       drv probe link timer
                                Link detected: yes

                        [root@kvm15 Linux_x64]# ethtool ens5f1
                        Settings for ens5f1:
                                Supported ports: [ ]
                                Supported link modes:   1000baseT/Full
                                                        10000baseT/Full
                                Supported pause frame use: Symmetric
                                Supports auto-negotiation: Yes
                                Advertised link modes:  1000baseT/Full
                                                        10000baseT/Full
                                Advertised pause frame use: No
                                Advertised auto-negotiation: Yes
                                Speed: Unknown!
                                Duplex: Unknown! (255)
                                Port: Other
                                PHYAD: 0
                                Transceiver: external
                                Auto-negotiation: off
                                Supports Wake-on: d
                                Wake-on: d
                                Current message level: 0x0000000f (15)
                                                       drv probe link timer
                                Link detected: no
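The two ports above differ in the fields that matter most here: ens5f0 reports a Direct Attach Copper port at 10000Mb/s with link, while ens5f1 reports "Port: Other", unknown speed and no link. To compare these fields across many ports, the `ethtool <iface>` output can be turned into a dictionary. A minimal Python sketch; the heuristic (a line containing a colon starts a field, a line without one continues the previous field) is an assumption that holds for the output shown, and `parse_ethtool` / `link_up` are hypothetical helper names.

```python
from typing import Dict, Optional


def parse_ethtool(text: str) -> Dict[str, str]:
    """Parse `ethtool <iface>` output into {field: value}.

    Continuation lines (extra link modes, message-level flags) are appended
    to the previous field, separated by a space.
    """
    fields: Dict[str, str] = {}
    key: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("Settings for"):
            continue
        if ":" in line:
            name, _, value = line.partition(":")
            key = name.strip()
            fields[key] = value.strip()
        elif key is not None:
            fields[key] = (fields[key] + " " + line).strip()
    return fields


def link_up(fields: Dict[str, str]) -> bool:
    """True when ethtool reports "Link detected: yes"."""
    return fields.get("Link detected") == "yes"
```

Keeping `Port`, `Speed` and `Link detected` per host next to the NVM version would show at a glance which ports are actually carrying traffic when a crash happens.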

                        • 9. Re: Intel X710 vs VMWare ESX: crash and reboot
                          Intel Corporation
                          This message was posted on behalf of Intel Corporation

                          Hi Drookie,

                          Thank you for the information provided. 

                          Rgds,
                          wb
                           

                          • 10. Re: Intel X710 vs VMWare ESX: crash and reboot
                            Intel Corporation
                            This message was posted on behalf of Intel Corporation

                            Hi Drookie,

                            Since you mentioned there is no issue when using firmware 4.25, can you
                            provide the markings (serial numbers) of the working X710s versus the non-working ones?
                            The serial number is on the white sticker on the physical network adapter.
                            Format: 15 digits + 6 digits + 6-3


                            You can try using the SSU tool below to extract the system information:
                             
                            https://downloadcenter.intel.com/download/26735/Intel-System-Support-Utility-for-the-Linux-Operating-System
                             
                            regards,
                            wb
                             

                            • 11. Re: Intel X710 vs VMWare ESX: crash and reboot
                              Intel Corporation
                              This message was posted on behalf of Intel Corporation

                              Hi Drookie,

                               Please feel free to provide the information.

                              Rgds,
                              wb
                               

                              • 12. Re: Intel X710 vs VMWare ESX: crash and reboot
                                Hypervision

                                FYI: we had to replace every single Juniper-branded DAC cable with Tripp Lite cables; with the Juniper ones we got either no link or strange results, and the HPE UEFI error log reported the SFP as incompatible. The Tripp Lite cables worked just fine. I think we had to do this after updating the NVM/driver, possibly to 4.53.

                                • 13. Re: Intel X710 vs VMWare ESX: crash and reboot
                                  Intel Corporation
                                  This message was posted on behalf of Intel Corporation

                                  Hi Hypervision,

                                   Thank you for sharing the information. 

                                  Rgds,
                                  wb