Assume you have four NICs on each compute module: two onboard NICs and two on the mezzanine card. Note that the two onboard NICs go to the first switch, and the two from the mezzanine card go to the second switch. You can configure fault tolerance between the NICs, or connect them to different networks/VLANs, but they should not be connected to the same network. What's your network configuration? Check whether there is a loop in the network; if there is, STP may disable the port to break the loop.
See http://www.intel.com/support/motherboards/server/mfsys25/sb/CS-028603.htm for more details.
Yes, we have the NIC mezzanine card on each host hooked up to the second switch. The problem is only happening on the host interfaces connected to the second switch.
Both switches are hooked up to the same upstream switch (same VLAN/network), but not in such a way that would cause a bridge loop (i.e. the MFSYS switches are not hooked up to each other, and the hosts are not bridging across interfaces). When the problem is happening, the interface shows 'forwarding' as the STP status as expected. Also, if the interface were disabled by STP I would not expect it to flap up and down rapidly like this.
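To make the loop reasoning above concrete, here is a minimal Python sketch that models the described topology as a graph and checks it for cycles with union-find. The node names (upstream, sw1, sw2, h1, onboard0, mezz0, etc.) and the helpers has_bridge_loop and build_links are hypothetical. The key assumption is that a host which does not bridge its interfaces cannot close a loop, so each of its NICs is modelled as a separate leaf node. A bridging host, by contrast, is a single node, and its two links to the same switch already form a cycle.

```python
"""Sketch: check a hypothetical Layer 2 topology for bridge loops.

Assumption: only devices that forward frames between ports (switches, or
hosts with bridging enabled) can close a loop. A host that does not bridge
is modelled as one independent leaf node per NIC.
"""


def has_bridge_loop(links):
    """Return True if the undirected multigraph of links contains a cycle.

    Union-find: a link whose two endpoints are already connected closes a loop.
    """
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this link is a loop
        parent[root_a] = root_b
    return False


def build_links(hosts, host_bridges):
    """Hypothetical topology: sw1 and sw2 both uplink to one upstream switch,
    and each host has two onboard NICs on sw1 and two mezzanine NICs on sw2."""
    links = [("upstream", "sw1"), ("upstream", "sw2")]
    nics = {"onboard0": "sw1", "onboard1": "sw1", "mezz0": "sw2", "mezz1": "sw2"}
    for host in hosts:
        for nic, switch in nics.items():
            # A bridging host is one forwarding node; otherwise each NIC is its own leaf.
            node = host if host_bridges else f"{host}:{nic}"
            links.append((node, switch))
    return links


if __name__ == "__main__":
    print(has_bridge_loop(build_links(["h1", "h2"], host_bridges=False)))
    print(has_bridge_loop(build_links(["h1"], host_bridges=True)))
    print(has_bridge_loop(build_links(["h1"], host_bridges=False) + [("sw1", "sw2")]))
```

With non-bridging hosts and no link between sw1 and sw2, the graph is a tree and no loop is reported, which matches the argument that STP should have nothing to block in this setup.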
Furthermore, I just disabled (admin down) all external ports on the second switch and the problem still happens. Disabling port-fast on the host's ports likewise has no effect. Doesn't seem to be an issue with STP.
Any other suggestions?
If you disconnect switch 1, do you still have the same problem on switch 2?
I've reset switch 1 and it had no effect on the flapping interfaces on switch 2. Still unable to bring up the interface, and existing interfaces that were flapping continued to flap.
Physically removing the switch is a bit more involved since it's remote, and there's no way to shut the switch down from the management interface. I can do that if you think it will be helpful, but if this is just a way to determine whether it's an issue with a bridge loop, then I think resetting the switch should be equivalent...
I had the NOC physically remove the switch for a few minutes, and there was no change. Interfaces on the second switch still flapped, and any that were not already up would not come up.
I was wondering whether you could physically swap switches 1 and 2 to see if that makes any difference... Also, have you tried a different OS or driver version?
Replacing the switch is the first thing I tried... I swapped both switch 1 and switch 2 with replacements a few weeks back, and the issue remained on interfaces connected to switch 2.
I have not tried different driver versions. Seems unlikely that a particular driver would have problems only on one of two identical switches. Something to try if there are no better avenues, I guess...