1 Reply Latest reply on Jan 6, 2016 7:06 PM by wb_Intel

    ixgbe: link flapping on two directly connected servers with 82599ES adapter

    pva

      Hello,

       

      We experience link flapping when connecting two Supermicro servers with 82599ES adapters:

       

      [Wed Jan  6 07:10:14 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

       

      [Wed Jan  6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

      [Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

      [Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

      [Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

      [Wed Jan  6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
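To put a number on the flapping, the log excerpt above can be summarized with a small script. This is only a sketch: it assumes the message format is exactly the one shown in the excerpt, and the names `LINK_RE`, `summarize_flaps` and `SAMPLE` are made up for illustration.

```python
import re
from collections import Counter

# Matches the ixgbe link messages in the excerpt; captures the timestamp,
# interface name and the new link state.
LINK_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+ixgbe\s+\S+\s+(?P<iface>\S+): NIC Link is (?P<state>Up|Down)"
)

def summarize_flaps(lines):
    """Count 'Link is Down' events per interface and per (1-second) timestamp."""
    per_iface = Counter()
    per_ts = Counter()
    for line in lines:
        m = LINK_RE.match(line.strip())
        if m and m.group("state") == "Down":
            per_iface[m.group("iface")] += 1
            per_ts[m.group("ts")] += 1
    return per_iface, per_ts

# The ten lines quoted in the post.
SAMPLE = """\
[Wed Jan  6 07:10:14 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan  6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan  6 07:10:15 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan  6 07:10:16 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[Wed Jan  6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down
[Wed Jan  6 07:10:17 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
"""

if __name__ == "__main__":
    per_iface, per_ts = summarize_flaps(SAMPLE.splitlines())
    for iface, count in per_iface.items():
        print(f"{iface}: {count} link-down events")
    for ts, count in per_ts.items():
        print(f"{ts}: {count} down")
```

On a live system the same function could be fed the output of `dmesg -T` instead of the pasted sample.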

       

      We are trying to connect two Supermicro servers directly, each with an E10G42BTDA (X520-DA2) controller:

      81:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

       

      One controller has the following SFP+ module:

       

      # ethtool --module-info enp129s0f1

      Identifier                                : 0x03 (SFP)

      Extended identifier                      : 0x04 (GBIC/SFP defined by 2-wire interface ID)

      Connector                                : 0x07 (LC)

      Transceiver codes                        : 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00

      Transceiver type                          : 10G Ethernet: 10G Base-LR

      Encoding                                  : 0x06 (64B/66B)

      BR, Nominal                              : 10300MBd

      Rate identifier                          : 0x00 (unspecified)

      Length (SMF,km)                          : 20km

      Length (SMF)                              : 20000m

      Length (50um)                            : 0m

      Length (62.5um)                          : 0m

      Length (Copper)                          : 0m

      Length (OM3)                              : 0m

      Laser wavelength                          : 1330nm

      Vendor name                              : GIGALINK       

      Vendor OUI                                : 00:90:65

      Vendor PN                                : GL-OT-ST12LC1-13

      Vendor rev                                : A 

      Optical diagnostics support              : Yes

      Laser bias current                        : 37.020 mA

      Laser output power                        : 0.9654 mW / -0.15 dBm

      Receiver signal average optical power    : 0.4904 mW / -3.09 dBm

      Module temperature                        : 44.23 degrees C / 111.61 degrees F

      Module voltage                            : 3.2836 V

      Alarm/warning flags implemented          : Yes

      Laser bias current high alarm            : Off

      Laser bias current low alarm              : Off

      Laser bias current high warning          : Off

      Laser bias current low warning            : Off

      Laser output power high alarm            : Off

      Laser output power low alarm              : Off

      Laser output power high warning          : Off

      Laser output power low warning            : Off

      Module temperature high alarm            : Off

      Module temperature low alarm              : Off

      Module temperature high warning          : Off

      Module temperature low warning            : Off

      Module voltage high alarm                : Off

      Module voltage low alarm                  : Off

      Module voltage high warning              : Off

      Module voltage low warning                : Off

      Laser rx power high alarm                : Off

      Laser rx power low alarm                  : Off

      Laser rx power high warning              : Off

      Laser rx power low warning                : Off

      Laser bias current high alarm threshold  : 85.000 mA

      Laser bias current low alarm threshold    : 10.000 mA

      Laser bias current high warning threshold : 80.000 mA

      Laser bias current low warning threshold  : 12.000 mA

      Laser output power high alarm threshold  : 3.1623 mW / 5.00 dBm

      Laser output power low alarm threshold    : 0.3162 mW / -5.00 dBm

      Laser output power high warning threshold : 2.5119 mW / 4.00 dBm

      Laser output power low warning threshold  : 0.3981 mW / -4.00 dBm

      Module temperature high alarm threshold  : 85.00 degrees C / 185.00 degrees F

      Module temperature low alarm threshold    : -10.00 degrees C / 14.00 degrees F

      Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F

      Module temperature low warning threshold  : -5.00 degrees C / 23.00 degrees F

      Module voltage high alarm threshold      : 3.7000 V

      Module voltage low alarm threshold        : 2.9000 V

      Module voltage high warning threshold    : 3.6000 V

      Module voltage low warning threshold      : 3.0000 V

      Laser rx power high alarm threshold      : 1.0000 mW / 0.00 dBm

      Laser rx power low alarm threshold        : 0.0200 mW / -16.99 dBm

      Laser rx power high warning threshold    : 0.7943 mW / -1.00 dBm

      Laser rx power low warning threshold      : 0.0251 mW / -16.00 dBm

       

      The other controller has:

       

      # ethtool --module-info enp129s0f1

      Identifier                                : 0x03 (SFP)

      Extended identifier                      : 0x04 (GBIC/SFP defined by 2-wire interface ID)

      Connector                                : 0x07 (LC)

      Transceiver codes                        : 0x20 0x00 0x00 0x00 0x00 0x00 0x00 0x00

      Transceiver type                          : 10G Ethernet: 10G Base-LR

      Encoding                                  : 0x06 (64B/66B)

      BR, Nominal                              : 10300MBd

      Rate identifier                          : 0x00 (unspecified)

      Length (SMF,km)                          : 20km

      Length (SMF)                              : 20000m

      Length (50um)                            : 0m

      Length (62.5um)                          : 0m

      Length (Copper)                          : 0m

      Length (OM3)                              : 0m

      Laser wavelength                          : 1270nm

      Vendor name                              : GIGALINK

      Vendor OUI                                : 00:90:65

      Vendor PN                                : GL-OT-ST12LC1-12

      Vendor rev                                : A

      Option values                            : 0x00 0x1a

      Option                                    : RX_LOS implemented

      Option                                    : TX_FAULT implemented

      Option                                    : TX_DISABLE implemented

      BR margin, max                            : 0%

      BR margin, min                            : 0%

      Vendor SN                                : G201511300616

      Date code                                : 151118

      Optical diagnostics support              : Yes

      Laser bias current                        : 35.120 mA

      Laser output power                        : 0.7842 mW / -1.06 dBm

      Receiver signal average optical power    : 0.5695 mW / -2.45 dBm

      Module temperature                        : 45.68 degrees C / 114.22 degrees F

      Module voltage                            : 3.2440 V

      Alarm/warning flags implemented          : Yes

      Laser bias current high alarm            : Off

      Laser bias current low alarm              : Off

      Laser bias current high warning          : Off

      Laser bias current low warning            : Off

      Laser output power high alarm            : Off

      Laser output power low alarm              : Off

      Laser output power high warning          : Off

      Laser output power low warning            : Off

      Module temperature high alarm            : Off

      Module temperature low alarm              : Off

      Module temperature high warning          : Off

      Module temperature low warning            : Off

      Module voltage high alarm                : Off

      Module voltage low alarm                  : Off

      Module voltage high warning              : Off

      Module voltage low warning                : Off

      Laser rx power high alarm                : Off

      Laser rx power low alarm                  : Off

      Laser rx power high warning              : Off

      Laser rx power low warning                : Off

      Laser bias current high alarm threshold  : 85.000 mA

      Laser bias current low alarm threshold    : 10.000 mA

      Laser bias current high warning threshold : 80.000 mA

      Laser bias current low warning threshold  : 12.000 mA

      Laser output power high alarm threshold  : 3.1623 mW / 5.00 dBm

      Laser output power low alarm threshold    : 0.3162 mW / -5.00 dBm

      Laser output power high warning threshold : 2.5119 mW / 4.00 dBm

      Laser output power low warning threshold  : 0.3981 mW / -4.00 dBm

      Module temperature high alarm threshold  : 85.00 degrees C / 185.00 degrees F

      Module temperature low alarm threshold    : -10.00 degrees C / 14.00 degrees F

      Module temperature high warning threshold : 80.00 degrees C / 176.00 degrees F

      Module temperature low warning threshold  : -5.00 degrees C / 23.00 degrees F

      Module voltage high alarm threshold      : 3.7000 V

      Module voltage low alarm threshold        : 2.9000 V

      Module voltage high warning threshold    : 3.6000 V

      Module voltage low warning threshold      : 3.0000 V

      Laser rx power high alarm threshold      : 1.0000 mW / 0.00 dBm

      Laser rx power low alarm threshold        : 0.0200 mW / -16.99 dBm

      Laser rx power high warning threshold    : 0.7943 mW / -1.00 dBm

      Laser rx power low warning threshold      : 0.0251 mW / -16.00 dBm
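As a sanity check on the optical levels, the two diagnostics dumps above can be cross-checked against each other. The sketch below assumes that the two modules face each other over the same fiber and that both readings were taken at roughly the same time; the helper names are hypothetical, and the numbers are copied from the ethtool output.

```python
import math

def mw_to_dbm(mw):
    """Convert optical power from milliwatts to dBm."""
    return 10.0 * math.log10(mw)

def path_loss_db(tx_dbm, rx_dbm):
    """Loss between one module's transmitter and the far-end receiver."""
    return tx_dbm - rx_dbm

def rx_within_warning(rx_dbm, low_warn_dbm, high_warn_dbm):
    """True if the received power lies strictly inside the warning thresholds."""
    return low_warn_dbm < rx_dbm < high_warn_dbm

# Readings from the two dumps (module A: 1330nm, module B: 1270nm).
A_TX_DBM, A_RX_DBM = -0.15, -3.09
B_TX_DBM, B_RX_DBM = -1.06, -2.45
RX_LOW_WARN_DBM, RX_HIGH_WARN_DBM = -16.00, -1.00

if __name__ == "__main__":
    print(f"A -> B loss: {path_loss_db(A_TX_DBM, B_RX_DBM):.2f} dB")
    print(f"B -> A loss: {path_loss_db(B_TX_DBM, A_RX_DBM):.2f} dB")
    for name, rx in (("A", A_RX_DBM), ("B", B_RX_DBM)):
        ok = rx_within_warning(rx, RX_LOW_WARN_DBM, RX_HIGH_WARN_DBM)
        print(f"module {name} RX {rx} dBm within warning range: {ok}")
```

Under those assumptions the loss is about 2.3 dB in one direction and about 2.0 dB in the other, and both receive levels sit inside the warning thresholds, which is consistent with all alarm and warning flags being Off.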

       

       

      We've checked both SFP+ modules and the patch cord in other hardware, and the link worked fine there. As for the ixgbe driver version, we've tried both 4.1.5 and 4.3.13 and see the problem with both. I built the v4.3.13 module with printk debugging enabled and found the following messages in dmesg:

       

      [Wed Jan  6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 06:04:02 2016] ixgbe_get_media_type_82599

      [Wed Jan  6 06:04:02 2016] ixgbe_check_mac_link_generic

      [Wed Jan  6 06:04:02 2016] ixgbe_get_media_type_82599

      [Wed Jan  6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

      [Wed Jan  6 06:04:02 2016] ixgbe_check_mac_link_genericixgbe_fc_enable_genericixgbe_fc_autonegixgbe_check_mac_link_generic

      [Wed Jan  6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

      [Wed Jan  6 06:04:02 2016] ixgbe_get_media_type_82599

      [Wed Jan  6 06:04:02 2016] ixgbe_check_mac_link_generic

      [Wed Jan  6 06:04:02 2016] ixgbe_get_media_type_82599

      [Wed Jan  6 06:04:02 2016] ixgbe 0000:81:00.1 enp129s0f1: NIC Link is Down

       

      Does this help explain the reason for this behaviour?

       

      Another thing puzzles me. In Documentation/networking/ixgbe.txt I found the following information:

       

      82599-BASED ADAPTERS


      NOTES: If your 82599-based Intel(R) Network Adapter came with Intel optics, or

      is an Intel(R) Ethernet Server Adapter X520-2, then it only supports Intel

      optics and/or the direct attach cables listed below.


      When 82599-based SFP+ devices are connected back to back, they should be set to

      the same Speed setting via ethtool. Results may vary if you mix speed settings.

      82598-based adapters support all passive direct attach cables that comply

      with SFF-8431 v4.1 and SFF-8472 v10.4 specifications. Active direct attach

      cables are not supported.

       

      Yet any attempt to set speed fails:

       

      # ethtool -s enp129s0f1 speed 10000

       

      Cannot set new settings: Invalid argument

        not setting speed

       

      Why is that? That said, I found a suggestion to set the advertised link mode instead:

       

      ethtool -s enp129s0f1 advertise 0x1000

       

      and this worked, but the link flapping continues.
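For reference, the `advertise` argument is a bitmask of link modes, and 0x1000 corresponds to 10000baseT/Full, the only mode listed under "Supported link modes" for this port. A minimal sketch of decoding such a mask follows; the table contains only the values documented in the ethtool man page that are relevant here, and `ADVERTISE_BITS` and `decode_advertise` are illustrative names.

```python
# Subset of the advertise mask values listed in the ethtool man page.
ADVERTISE_BITS = {
    0x001: "10baseT/Half",
    0x002: "10baseT/Full",
    0x004: "100baseT/Half",
    0x008: "100baseT/Full",
    0x010: "1000baseT/Half",
    0x020: "1000baseT/Full",
    0x1000: "10000baseT/Full",
}

def decode_advertise(mask):
    """Return (names of known modes set in mask, leftover bits not in the table)."""
    known = 0
    for bit in ADVERTISE_BITS:
        known |= bit
    names = [name for bit, name in sorted(ADVERTISE_BITS.items()) if mask & bit]
    return names, mask & ~known

if __name__ == "__main__":
    print(decode_advertise(0x1000))  # (['10000baseT/Full'], 0)
    print(decode_advertise(0x1020))  # (['1000baseT/Full', '10000baseT/Full'], 0)
```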

       

      Yes, I realize that these modules are unsupported, and to make them work I had to load the driver with allow_unsupported_sfp=1,1. The main problem is that I failed to find any Intel-recommended WDM SFP+ module. We can use only a single-fiber cable, since in our final configuration both servers must connect to our provider, who highly recommends WDM connections. If these modules are not going to work, is there any WDM module that will?

       

       

      # ethtool -i enp129s0f1

      driver: ixgbe

      version: 4.1.5

      firmware-version: 0x61c10001

      bus-info: 0000:81:00.1

      supports-statistics: yes

      supports-test: yes

      supports-eeprom-access: yes

      supports-register-dump: yes

      supports-priv-flags: no

       

       

      # ethtool enp129s0f1

      Settings for enp129s0f1:

      Supported ports: [ FIBRE ]

      Supported link modes:  10000baseT/Full

      Supported pause frame use: No

      Supports auto-negotiation: No

      Advertised link modes:  10000baseT/Full

      Advertised pause frame use: Symmetric

      Advertised auto-negotiation: No

      Speed: 10000Mb/s

      Duplex: Full

      Port: FIBRE

      PHYAD: 0

      Transceiver: external

      Auto-negotiation: off

      Supports Wake-on: d

      Wake-on: d

      Current message level: 0x00000007 (7)

            drv probe link

      Link detected: yes
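Note that "Link detected: yes" is only a snapshot taken between two flaps. To measure the flap rate over time, the kernel's per-interface carrier change counter can be sampled. The sketch below assumes a kernel that exposes `/sys/class/net/<iface>/carrier_changes` (Linux 3.15 or later); the function names and the injectable `sleep` parameter are hypothetical conveniences.

```python
import time
from pathlib import Path

def read_carrier_changes(iface, sysfs_root="/sys/class/net"):
    """Read the number of carrier up/down transitions recorded for iface."""
    path = Path(sysfs_root) / iface / "carrier_changes"
    return int(path.read_text().strip())

def flap_rate(iface, interval_s=10.0, sysfs_root="/sys/class/net", sleep=time.sleep):
    """Return carrier transitions per second observed over interval_s seconds."""
    before = read_carrier_changes(iface, sysfs_root)
    sleep(interval_s)
    after = read_carrier_changes(iface, sysfs_root)
    return (after - before) / interval_s

if __name__ == "__main__":
    # Interface name taken from the post; adjust for the system at hand.
    print(f"{flap_rate('enp129s0f1'):.2f} carrier changes per second")
```

Each up and each down transition increments the counter, so one full flap counts as two changes.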

       

      We've struggled with this problem for a few days already, so any suggestions are more than welcome.

       

      Thanks in advance for any help.