2 Replies Latest reply on Nov 23, 2009 8:29 PM by brianklowe

    Teaming Issue with 82575EB NICs

    brianklowe

      We've been receiving Nehalem-based Sun Fire x64 servers for the past few months. The new X4170 and X4270 models now ship with four built-in Intel 82575EB NICs. We use the Intel ANS teaming software and drivers provided by Sun on their Tools and Drivers DVD, and create a single two-adapter team in Adapter Fault Tolerance (AFT) mode. As a practice, we also designate the NIC in the team with the lowest physical MAC address as the team's Primary Adapter. Everything works as desired until we test with the following procedure:

       

      1. Open an inbound MS Terminal Services session through the NIC team.
      2. Open console redirection to the server's OS using the Service Processor (ILOM).
      3. Within the Terminal Services session, start a 'ping -t' to a third host to watch what happens to the response time.
      4. From the console (ILOM):
        1. Disable Primary Adapter (failover process activates the Standby Adapter)
        2. Re-enable Primary Adapter (this should trigger the fail-back process, since a Primary Adapter was previously identified, and is now back online).
        3. Disable Standby Adapter (should have no effect).
        4. Enable Standby Adapter (should have no effect).
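      The four console steps amount to a small state machine, and writing it down makes the expected result of each step explicit. Below is a minimal Python sketch of what the test expects from an AFT team with a designated Primary Adapter. It is a hypothetical model (AftTeam and set_link are illustrative names), not Intel ANS code, and it ignores timers and link-detection delays. In the reported failure, the trouble starts at step 2, where this model expects a clean fail-back to the Primary.

```python
from dataclasses import dataclass, field


@dataclass
class AftTeam:
    """Hypothetical model of an AFT team with a designated Primary Adapter."""

    primary: str
    standby: str
    link_up: dict = field(init=False)
    active: str = field(init=False)

    def __post_init__(self) -> None:
        # Both links start up, and traffic starts on the Primary Adapter.
        self.link_up = {self.primary: True, self.standby: True}
        self.active = self.primary

    def _reselect(self) -> None:
        # A designated Primary always wins when its link is up (fail-back);
        # otherwise traffic moves to the Standby (failover).
        if self.link_up[self.primary]:
            self.active = self.primary
        elif self.link_up[self.standby]:
            self.active = self.standby
        else:
            self.active = ""  # no usable adapter: the team is down

    def set_link(self, adapter: str, up: bool) -> str:
        """Change one adapter's link state and return the active adapter."""
        self.link_up[adapter] = up
        self._reselect()
        return self.active


if __name__ == "__main__":
    team = AftTeam(primary="Net1", standby="Net3")
    # The four console steps from the test procedure.
    steps = [("Net1", False), ("Net1", True), ("Net3", False), ("Net3", True)]
    for adapter, up in steps:
        action = "enable" if up else "disable"
        print(f"{action} {adapter}: active -> {team.set_link(adapter, up)}")
```

      Under this model the active adapter goes Net3, Net1, Net1, Net1 across the four steps, so steps 3 and 4 should indeed have no effect on traffic.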

       

      When we re-enable the Primary Adapter to invoke fail-back, the Terminal Services session is immediately broken. Further analysis shows that the team now answers all inbound packets other than ICMP with TCP resets, while inbound ICMP to the team IP address continues to get replies. Only a server reboot clears the issue and restores connectivity. Until the reboot, netstat shows each inbound TCP/IP connection being set up and immediately torn down.

       

      Any suggestions would be appreciated.

       

       

      Thanks,

       

      Brian

        • 1. Re: Teaming Issue with 82575EB NICs
          mark_h_@intel

          I am not sure why your Adapter Fault Tolerance team is failing. You said you were using the software and drivers from the DVD; you might want to try the latest drivers and software from the Intel Download Center.

           

          You might be able to work around the issue by configuring a different team mode that still gives you the fault tolerance you need. If your switch supports one of the link aggregation modes, consider one of those team types: the team should keep working if one port fails, and as a bonus it gains extra bandwidth.

           

          If your switch does not support link aggregation, you could try Adaptive Load Balancing (ALB) mode. The team balances connections across the ports while more than one port is available, and still sends and receives traffic over the remaining port if one port fails.
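          The load-balancing behavior described here can be illustrated with a deliberately simplified Python sketch: connections are spread over whichever ports have link, and all of them land on the survivor when one port fails. This is a toy under stated assumptions (round-robin per connection, a hypothetical assign_connections helper); Intel's actual ALB algorithm is not described in this thread and likely differs.

```python
from itertools import cycle


def assign_connections(connections, ports_up):
    """Spread connections round-robin over the ports whose link is up.

    Simplified, hypothetical model of load balancing; not the actual
    Intel ALB algorithm.
    """
    if not ports_up:
        raise RuntimeError("no team member is up")
    ports = cycle(sorted(ports_up))  # sort for a deterministic order
    return {conn: next(ports) for conn in connections}


if __name__ == "__main__":
    conns = ["c1", "c2", "c3", "c4"]
    # Both ports up: connections alternate between Net1 and Net3.
    print(assign_connections(conns, {"Net1", "Net3"}))
    # Net1 fails: every connection lands on the surviving port.
    print(assign_connections(conns, {"Net3"}))
```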

           

          Message was edited by: Mark H @ Intel

          • 2. Re: Teaming Issue with 82575EB NICs
            brianklowe

            Mark,

             

            To date, we've tried every available driver from Intel and Sun that supports the built-in 82575EB NICs on the X4170 and X4270 servers, with no change in the behavior I've described. Additionally, the team members are cabled to two different Cisco Catalyst 6509 switch chassis, so 802.3ad (LACP) is currently not feasible.

             

            However, we have noticed a pattern in the failures. On the Sun servers, the four built-in NICs are labeled Net0 through Net3: Net0 and Net1 are on the first controller chip, and Net2 and Net3 are on the second. When the AFT team does NOT span controller chips, teaming works as desired. However, as standard practice to extend server resilience beyond a single controller failure, we consistently use Net1 and Net3 as the members of our AFT team. In this configuration, where the team DOES span controller chips, we consistently see the teaming failure. We have never experienced this failure with any other Intel NIC model.
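            The pattern comes down to one check: do the team members sit on different controller chips? A small Python sketch of that check follows, using the Net0 through Net3 layout described above. The mapping is hard-coded from this post rather than read from the OS, and CONTROLLER_OF and spans_controllers are illustrative names. Something like this could help flag which existing teams match the failing configuration.

```python
# Port-to-controller layout as described for the X4170/X4270 built-in NICs:
# Net0/Net1 on the first 82575EB chip, Net2/Net3 on the second.
CONTROLLER_OF = {"Net0": 0, "Net1": 0, "Net2": 1, "Net3": 1}


def spans_controllers(members):
    """Return True if the team members sit on more than one controller chip."""
    return len({CONTROLLER_OF[m] for m in members}) > 1


if __name__ == "__main__":
    for team in (["Net0", "Net1"], ["Net2", "Net3"], ["Net1", "Net3"]):
        print(team, "spans controllers:", spans_controllers(team))
```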

             

            Sun Support believes this to be a bug in the NIC driver, and is currently gathering more data to formally file a bug report with Intel.