I want to bring up something in the Forum that has been bugging us for some time. Sometimes we see cores lose connectivity and then regain it. People have come to call this phenomenon "vampire cores." 1.4.0 exhibited it a lot. 188.8.131.52 fixed it, but sometimes it comes back.
We keep at least one system here, called marc101, as a standalone system. It is not part of the SCC Data Center network, although it sits behind the Data Center firewall. It has never exhibited vampires.
If you have your own MCPC/SCC system, have you seen this issue? It appears in various bugs in our Bugzilla, but I think the best description is the one by Randolf Rotta in http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=358, so we have designated that bug the mothership.
We're beginning to think this problem might exist only in our SCC Data Center. We had a system that exhibited vampires; we removed it from the SCC Data Center network, and the vampires went away. This does not necessarily indicate a configuration issue, although it may.
A standalone system is set up as follows. Two Ethernet cables run from the SCC into a 1 Gbit switch. Two Ethernet cables come from the MCPC, from the interfaces eth0 and eth1; eth0 goes to the Internet. When we telnet to the BMC, we go over eth1:1. eMAC traffic goes over eth1.
Now in the SCC Data Center, it's much the same, except that the switch is much bigger (actually several large switches that plug into a master switch).
Removing a system from the Data Center simply meant disconnecting its Ethernet cables from the big switch (except for eth0, which is not shown in the second figure) and connecting them instead to a small switch dedicated to that system, so that it looks like the first figure. The vampires went away. We have several theories about why this might be the case. If you want to contribute, please take a look at Bug 358.
We believe we have solved this problem. We thank Jan-Arne Sobania of HPI for his consultation.
This problem does not occur for standalone systems. It only occurs when you network MCPC/SCC systems together. And it's a simple fix, once you know how.
In systemSettings.ini, the MAC address of the first core is assigned with a line such as sccFirstMac=00:45:4D:41:44:31. There are 48 cores, and each core gets sccFirstMac plus its core number as its MAC address, so one system occupies 48 consecutive addresses. When you assign sccFirstMac on another MCPC/SCC system, you must not use a number in that range: the 48 addresses of one system must not overlap those of any other system on the same network. Making sccFirstMac unique for each system is not enough. The problem occurs when you forget that.
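To make the overlap rule concrete, here is a minimal Python sketch that expands each system's sccFirstMac into its 48 per-core addresses and reports pairs of systems whose ranges collide. It assumes the MAC is treated as a 48-bit integer, that the first core is core 0 (offset 0), and that adding the core number may carry into higher octets; the post does not specify carry behavior. The helper names and the system names system_a, system_b and system_c are hypothetical and only for illustration.

```python
from __future__ import annotations

NUM_CORES = 48  # cores per SCC, as stated above


def parse_mac(mac: str) -> int:
    """Convert a colon-separated MAC such as '00:45:4D:41:44:31' to an int."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"expected 6 octets, got {mac!r}")
    return int("".join(parts), 16)


def format_mac(value: int) -> str:
    """Convert a 48-bit int back to colon-separated upper-case hex."""
    raw = f"{value:012X}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def core_macs(first_mac: str, num_cores: int = NUM_CORES) -> list[str]:
    """Per-core MACs, ASSUMING core i gets sccFirstMac + i (with carry)."""
    base = parse_mac(first_mac)
    if base + num_cores - 1 > 0xFFFFFFFFFFFF:
        raise ValueError("range runs past the largest 48-bit MAC")
    return [format_mac(base + i) for i in range(num_cores)]


def overlapping_systems(first_macs: dict[str, str],
                        num_cores: int = NUM_CORES) -> list[tuple[str, str]]:
    """Return pairs of systems whose per-core MAC ranges intersect."""
    ranges = sorted((parse_mac(mac), name) for name, mac in first_macs.items())
    conflicts = []
    for i, (start_a, name_a) in enumerate(ranges):
        for start_b, name_b in ranges[i + 1:]:
            # Sorted, so start_b >= start_a; overlap if b starts before a ends.
            if start_b < start_a + num_cores:
                conflicts.append((name_a, name_b))
    return conflicts


# Hypothetical network of three MCPC/SCC systems.
systems = {
    "system_a": "00:45:4D:41:44:31",  # cores use ...:31 through ...:60
    "system_b": "00:45:4D:41:44:32",  # unique first MAC, but overlaps system_a
    "system_c": "00:45:4D:41:44:62",  # starts right after system_b's last core
}
print(overlapping_systems(systems))
```

Here system_b's sccFirstMac differs from system_a's, yet 47 of its core addresses collide with system_a's, which is exactly the situation described above. Keeping the sccFirstMac values of all networked systems at least 48 apart avoids it.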