We added a new node to our cluster, and the server has 2 of the Intel X710-T4 cards. Both of our SANs are currently only 1Gb (a Tegile T3100 and a Lenovo px12-450r). Whenever I try to set up SAN connections using one of the ports on these NICs, it causes all sorts of issues: CSVs go offline and report being corrupted and unreadable, we've lost VM configs, and VMs stop booting from any volumes hosted by this node (2016 Hyper-V cluster). The odd part is that this same node was working fine in our 2012 cluster before we upgraded/migrated to 2016. I have the latest drivers for the adapter. Here are some things that have happened:
- When I initially brought the node into the cluster I configured all 3 connections to the SANs using the X710-T4 (2 to the Tegile, one for each independent switch/controller, and 1 to the Lenovo, which is directly connected). Right off the bat, each time this node took ownership of either volume on the Lenovo, the VMs on that volume would no longer boot. They would give various odd boot errors, and eventually it would even hard-lock the Lenovo SAN itself and I'd have to hard-boot it. I moved that connection down to the onboard Broadcom 1Gb NIC and that solved those issues. The Tegile was still connected via the X710-T4, and while it continued to operate, some odd things happened there as well. Sometimes the list of connected iSCSI devices on that node would just be blank even though it was still operating. In the latest case, the node took ownership of a LUN from the Tegile and immediately all the VMs on that LUN stopped working and the CSV reported as corrupt and unreadable. I moved the CSV to another node and after a while it finally started working again. The problem is that this node insists on taking ownership of CSVs and there doesn't appear to be a way to stop it (I can't set node preferences on CSVs). So right now I'm scared to unpause this node and am contemplating just moving the Tegile connections onto the Broadcom as well to hopefully avoid all the hassle... but when we eventually upgrade our SANs to 10Gb I don't want to run into this issue again.
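On the CSV ownership part: Failover Cluster Manager doesn't seem to expose preferred owners for CSVs, but from what I've read the underlying settings can still be reached from PowerShell. This is roughly what I'm considering trying — the cluster disk and node names below are placeholders, and I haven't actually tested any of this on 2016 yet:

```powershell
# See which node currently owns (coordinates) each CSV
Get-ClusterSharedVolume | Format-Table Name, OwnerNode

# Set preferred owners on a CSV so the problem node is left out
# ("Cluster Disk 1", NODE1 and NODE2 are placeholder names)
Get-ClusterSharedVolume "Cluster Disk 1" | Set-ClusterOwnerNode -Owners NODE1, NODE2

# Manually move a CSV's ownership off the problem node
Move-ClusterSharedVolume -Name "Cluster Disk 1" -Node NODE1

# 2012 R2 and later auto-balance CSV ownership across nodes; supposedly
# that can be turned off cluster-wide (0 = disabled)
(Get-Cluster).CsvBalancer = 0
```

If the auto-balancer is what keeps handing ownership back to this node, disabling it (or setting preferred owners) might at least keep things stable until the NIC issue is sorted out.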
I realize this is probably incredibly hard to decipher... at this point I'm just looking for suggestions. Is it one of the adapter properties? The adapters that connect to the SANs have all protocols except IPv4 unchecked, they have jumbo frames set to 9014, and they are set to not allow the OS to turn them off (the power-saving setting). Aside from that they are basically at default settings. I could probably also disable SR-IOV on these adapters, but could that be causing my issue? (I doubt it.) Let me know what you think!
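For completeness, here's roughly what those adapter settings look like when applied from PowerShell, in case someone spots something wrong with them (the adapter name is a placeholder for one of the X710-T4 ports):

```powershell
# Placeholder name for one of the X710-T4 SAN-facing ports
$nic = "SAN-X710-P1"

# Jumbo frames at 9014 bytes, matching the adapter properties dialog
Set-NetAdapterAdvancedProperty -Name $nic -RegistryKeyword "*JumboPacket" -RegistryValue 9014

# Leave only IPv4 bound; unbind everything else on the SAN adapters
"ms_tcpip6", "ms_msclient", "ms_server", "ms_lldp", "ms_lltdio", "ms_rspndr" |
    ForEach-Object { Disable-NetAdapterBinding -Name $nic -ComponentID $_ }

# Don't let the OS power the NIC down to save power
Disable-NetAdapterPowerManagement -Name $nic

# The SR-IOV thing I'm wondering about disabling
Disable-NetAdapterSriov -Name $nic
```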