6 Replies Latest reply on Oct 30, 2012 8:43 PM by Semphony

    82598EB 10-Gigabit AT CX4 problem with new drivers

      Hi All

       

      Server: HP 360DL G6

      Network: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

      OS: CentOS 5.5

      Throughput: from 50 Kpps up to 250 Kpps at peak times

      Driver: ixgbe

       

      With the new drivers 3.1.15 and 3.1.17, I lose the connection approximately 1 hour after booting.

      The driver was compiled with CFLAGS_EXTRA="-DIXGBE_NO_LRO"

       

      With the driver that comes with the OS (ixgbe version 2.0.44), it works without problems.

       

      Does anyone know how to fix it?

       

      Thanks in advance

       

      --

      From the /var/log/messages

       

      Dec 18 05:58:36 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_down: NIC Link is Down
      Dec 18 05:58:37 localhost kernel: Uhhuh. NMI received for unknown reason a0 on CPU 0.
      Dec 18 05:58:37 localhost kernel: You probably have a hardware problem with your RAM chips
      Dec 18 05:58:37 localhost kernel: Dazed and confused, but trying to continue
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
      Dec 18 05:58:38 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_reset: Hardware Error: -15
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
      Dec 18 05:58:44 localhost kernel: ixgbe: eth6: ixgbe_watchdog_link_is_up: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      Dec 18 06:00:30 localhost shutdown[4998]: shutting down for system reboot

       

       

       

       

      [root@localhost ixgbe-3.1.17]# lspci|grep 82598EB

      0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

      0b:00.1 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)

      [root@localhost ixgbe-3.1.17]# lspci -v -v -s 0b:00.0
      0b:00.0 Ethernet controller: Intel Corporation 82598EB 10-Gigabit AT CX4 Network Connection (rev 01)
              Subsystem: Super Micro Computer Inc Unknown device af80
              Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR- FastB2B-
              Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR-
              Latency: 0, Cache Line Size: 64 bytes
              Interrupt: pin A routed to IRQ 169
              Region 0: Memory at fcee0000 (32-bit, non-prefetchable) [size=128K]
              Region 1: Memory at fce80000 (32-bit, non-prefetchable) [size=256K]
              Region 2: I/O ports at 5000 [size=32]
              Region 3: Memory at fce70000 (32-bit, non-prefetchable) [size=16K]
              [virtual] Expansion ROM at c2000000 [disabled] [size=256K]
              Capabilities: [40] Power Management version 3
                      Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold-)
                      Status: D0 PME-Enable- DSel=0 DScale=1 PME-
              Capabilities: [50] Message Signalled Interrupts: 64bit+ Queue=0/0 Enable-
                      Address: 0000000000000000  Data: 0000
              Capabilities: [60] MSI-X: Enable+ Mask- TabSize=18
                      Vector table: BAR=3 offset=00000000
                      PBA: BAR=3 offset=00002000
              Capabilities: [a0] Express Endpoint IRQ 0
                      Device: Supported: MaxPayload 256 bytes, PhantFunc 0, ExtTag-
                      Device: Latency L0s <512ns, L1 <64us
                      Device: AtnBtn- AtnInd- PwrInd-
                      Device: Errors: Correctable+ Non-Fatal+ Fatal+ Unsupported+
                      Device: RlxdOrd+ ExtTag- PhantFunc- AuxPwr- NoSnoop+
                      Device: MaxPayload 256 bytes, MaxReadReq 4096 bytes
                      Link: Supported Speed 2.5Gb/s, Width x8, ASPM L0s L1, Port 0
                      Link: Latency L0s <4us, L1 <64us
                      Link: ASPM Disabled RCB 64 bytes CommClk- ExtSynch-
                      Link: Speed 2.5Gb/s, Width x8
              Capabilities: [100] Advanced Error Reporting
              Capabilities: [140] Device Serial Number 64-8c-04-ff-ff-90-25-00
        • 1. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
          dong xiao

          I have run into the same problem.

           

          Does anyone know how to fix it?

          • 2. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
            mark_h_@intel

            @xiaomdoNG

            The driver issues reported above are about two years old.

             

            What driver versions are you using? What kernel? What distribution? What flags / options are you configuring with the driver? What are the details of the disconnects you are experiencing?

             

            Make sure you are using the latest driver, version 3.10.17. You can download the driver at http://downloadcenter.intel.com/Detail_Desc.aspx?DwnldID=14687.

             

            Mark H

            • 3. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
              dong xiao

              @Mark H

              I have run into the same problem on kernel version 2.6.28 with driver ixgbe-3.10.15.
              Could you give me some advice?

               

              Message as follows:

              Sep 26 17:45:30 cwcos user.warn kernel: WARNING: at net/sched/sch_generic.c:226 dev_watchdog+0x110/0x194()
              Sep 26 17:45:30 cwcos user.info kernel: NETDEV WATCHDOG: eth4 (ixgbe): transmit timed out
              Sep 26 17:45:30 cwcos user.warn kernel: Modules linked in:
              Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_tftp
              Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ftp
              Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack_ipv4
              Sep 26 17:45:30 cwcos user.info kernel: nf_defrag_ipv4
              Sep 26 17:45:30 cwcos user.info kernel: xt_state
              Sep 26 17:45:30 cwcos user.info kernel: nf_conntrack
              Sep 26 17:45:30 cwcos user.info kernel: nfnetlink
              Sep 26 17:45:30 cwcos user.info kernel: iptable_filter
              Sep 26 17:45:30 cwcos user.info kernel: ip_tables
              Sep 26 17:45:30 cwcos user.info kernel: xt_tcpudp
              Sep 26 17:45:30 cwcos user.info kernel: xt_limit xt_multiport x_tables ixgbe igb tg3 e1000e
              Sep 26 17:45:30 cwcos user.info kernel: e1000 e100 sd_mod pata_jmicron ata_generic libata uhci_hcd
              Sep 26 17:45:30 cwcos user.info kernel: ohci_hcd ehci_hcd
              Sep 26 17:45:30 cwcos user.warn kernel: Pid: 0, comm: swapper Not tainted 2.6.28.3cwcos_kernel_v1.0.0.1c1 #36
              Sep 26 17:45:30 cwcos user.warn kernel: Call Trace:
              Sep 26 17:45:30 cwcos user.warn kernel: warn_slowpath+0x61/0x78
              Sep 26 17:45:30 cwcos user.warn kernel: apm+0x3c9/0x51b
              Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
              Sep 26 17:45:30 cwcos user.warn kernel: smp_reschedule_interrupt+0x10/0x21
              Sep 26 17:45:30 cwcos user.warn kernel: reschedule_interrupt+0x28/0x30
              Sep 26 17:45:30 cwcos user.warn kernel: next_cpu+0x12/0x21
              Sep 26 17:45:30 cwcos user.warn kernel: find_busiest_group+0x23e/0x671
              Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x110/0x194
              Sep 26 17:45:30 cwcos user.warn kernel: rebalance_domains+0x124/0x33d
              Sep 26 17:45:30 cwcos user.warn kernel: dev_watchdog+0x0/0x194
              Sep 26 17:45:30 cwcos user.warn kernel: run_timer_softirq+0xf5/0x14a
              Sep 26 17:45:30 cwcos user.warn kernel: do_softirq+0x76/0x113
              Sep 26 17:45:30 cwcos user.warn kernel: __do_softirq+0x0/0x113
              Sep 26 17:45:30 cwcos user.warn kernel: irq_exit+0x35/0x73
              Sep 26 17:45:30 cwcos user.warn kernel: smp_apic_timer_interrupt+0x6e/0x78
              Sep 26 17:45:30 cwcos user.warn kernel: apic_timer_interrupt+0x28/0x30
              Sep 26 17:45:30 cwcos user.warn kernel: acpi_ex_prep_field_value+0x131/0x1aa
              Sep 26 17:45:30 cwcos user.warn kernel: acpi_safe_halt+0x18/0x25
              Sep 26 17:45:30 cwcos user.warn kernel: acpi_idle_enter_c1+0x9a/0xf0
              Sep 26 17:45:30 cwcos user.warn kernel: cpuidle_idle_call+0x5c/0x94
              Sep 26 17:45:30 cwcos user.warn kernel: cpu_idle+0x68/0x98
              Sep 26 17:45:30 cwcos user.warn kernel: --- end trace 34ba8c1b33fd6912 ---
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 0 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 1 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 2 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 3 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 4 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 5 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 6 not cleared within the polling period
              Sep 26 17:45:30 cwcos user.err kernel: ixgbe: eth4: ixgbe_disable_rx_queue: RXDCTL.ENABLE on Rx queue 7 not cleared within the polling period
              Sep 26 17:45:37 cwcos user.err kernel: ixgbe: eth4: ixgbe_reset: Hardware Error: -15 !

              • 4. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
                mark_h_@intel

                Troubleshooting this issue is beyond what I know, so I contacted a Linux driver developer for his suggestions. He cannot tell what the issue is from this information. He thinks that something might be happening with the OS that then causes the problem shown in the log you posted. Possibly something shows up earlier in the log that leads up to the watchdog warning. Please supply the preceding log messages.
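
One possible way to pull out the messages that lead up to the watchdog warning is sketched below. It is only an illustration: the function name `preceding_log_lines` is hypothetical, and the log path and pattern in the usage line are assumptions that may not match your system.

```shell
#!/bin/sh
# Print the first line matching PATTERN together with up to N lines before it.
# Usage: preceding_log_lines LOGFILE PATTERN [N]
# (hypothetical helper; N defaults to 200 lines of context)
preceding_log_lines() {
    logfile=$1
    pattern=$2
    count=${3:-200}
    # -m1 stops at the first match; -B prints the lines that precede it.
    grep -m1 -B "$count" -e "$pattern" "$logfile"
}
```

For example, `preceding_log_lines /var/log/messages 'NETDEV WATCHDOG' 200 > before-watchdog.txt` would save the 200 lines before the first transmit timeout, assuming the kernel log is written to /var/log/messages.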

                 

                He would also like to see some other information that might help with troubleshooting:
                Please provide the Ethregs register dump of the system in the failure state. You can get the tool from SourceForge at http://sourceforge.net/projects/e1000/files/Ethregs%20-%20Register%20Dump%20Tool/.


                What are the NIC’s stats from “ethtool -S”?
                What is the hardware you are running on?
                What is the output of lspci -vvv?
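
A rough sketch of how the statistics and PCI details could be gathered in one pass follows. The function name `collect_nic_diagnostics` and the output file names are made up for illustration; the Ethregs dump is not included because its invocation is not described here. Note that statistics come from `ethtool -S` (capital S), and `ethtool -i` reports the driver name and version.

```shell
#!/bin/sh
# Save basic NIC diagnostics for interface IFACE into directory OUTDIR.
# Usage: collect_nic_diagnostics IFACE OUTDIR
# (hypothetical helper; file names are arbitrary)
collect_nic_diagnostics() {
    iface=$1
    outdir=$2
    mkdir -p "$outdir" || return 1

    # Running kernel release.
    uname -r > "$outdir/kernel.txt"

    # Driver name/version and NIC statistics; skipped if ethtool is not installed.
    if command -v ethtool >/dev/null 2>&1; then
        ethtool -i "$iface" > "$outdir/ethtool-i.txt" 2>&1 || true
        ethtool -S "$iface" > "$outdir/ethtool-S.txt" 2>&1 || true
    fi

    # Verbose PCI listing for all devices; skipped if lspci is not installed.
    if command -v lspci >/dev/null 2>&1; then
        lspci -vvv > "$outdir/lspci-vvv.txt" 2>&1 || true
    fi
    return 0
}
```

Running it as root while the link is in the failed state, for example `collect_nic_diagnostics eth4 /tmp/ixgbe-diag`, should leave a set of text files that can be attached to a reply.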

                 

                With the additional information, he might be able to get closer to the cause and give suggestions.

                 

                Mark H

                • 5. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
                  vegan

                  Given the mess with the Ethernet stack, it might be reasonable to reinstall the OS.

                   

                  Make sure you are using the latest distribution of CentOS so that you have the current kernel etc.

                   

                  I use a different distribution, but the same advice applies to CentOS as to any other.

                  • 6. Re: 82598EB 10-Gigabit AT CX4 problem with new drivers
                    Semphony

                    Further help on this would help me too.