Any ideas where I can get help analyzing the MCEs below? I'm running an "unqualified" OS (CentOS) that my OEM vendor doesn't support, so I don't have the vendor's support pack tools that hook into the OS for analysis. They suggested I "ask Intel" for an analysis of which part of the subsystem may be having the problem. The OEM vendor is suggesting this may not be strictly a hardware error despite what the MCE says, and might actually be an interop problem between the OS and the hardware. These are Intel 64 (x86_64) systems (CPUID family 6, model 44, per the logs below), and I'm seeing the errors regularly on multiple machines.
Thanks in advance,
-Rob
HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 12
CPU 0 BANK 8
MISC 14a6688000011080 ADDR 8e41d65c0
TIME 1340190061 Wed Jun 20 11:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 0
CPU 1 BANK 8
MISC 4702108000016000
TIME 1340154061 Wed Jun 20 01:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR
Transaction: Memory scrubbing error
STATUS 88000040000200cf MCGSTATUS 0
MCGCAP 1c09 APICID 20 SOCKETID 1
CPUID Vendor Intel Family 6 Model 44
HARDWARE ERROR. This is NOT a software problem!
Please contact your hardware vendor
MCE 31
CPU 0 BANK 8
MISC d847010400011287 ADDR 87bc2aac0
TIME 1340215261 Wed Jun 20 18:01:01 2012
MCG status:
MCi status:
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
Transaction: Memory read error
STATUS 8c0000400001009f MCGSTATUS 0
MCGCAP 1c09 APICID 0 SOCKETID 0
CPUID Vendor Intel Family 6 Model 44
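
For anyone who wants to cross-check the decoded lines against the raw STATUS values above, here is a rough Python sketch. The bit positions are my reading of the IA32_MCi_STATUS layout in the Intel SDM, so treat them as assumptions rather than a verified decoder. The function name `decode_status` and the dictionary keys are made up for illustration, and the corrected-error count field is only meaningful if MCG_CAP bit 10 is set (it appears to be in MCGCAP 1c09).

```python
# Sketch: decode a few fields of an IA32_MCi_STATUS value as printed by mcelog.
# Bit positions are assumed from the Intel SDM; names here are illustrative only.

# MMM sub-field of a memory controller error code (0000 0000 1MMM CCCC)
MMM = {
    0: "generic/undefined",
    1: "memory read",
    2: "memory write",
    3: "address/command",
    4: "memory scrubbing",
}

def decode_status(status):
    """Return a dict of selected fields from a 64-bit MCi_STATUS value."""
    code = status & 0xFFFF  # architectural MCA error code, bits 15:0
    info = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "enabled": bool((status >> 60) & 1),
        "miscv": bool((status >> 59) & 1),   # MCi_MISC register valid
        "addrv": bool((status >> 58) & 1),   # MCi_ADDR register valid
        "pcc": bool((status >> 57) & 1),     # processor context corrupt
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "model_specific_code": (status >> 16) & 0xFFFF,
        "mca_code": code,
    }
    # Memory controller errors: ignore the filter bit (bit 12), require bit 7 set
    # and bits 15:13 and 11:8 clear.
    if (code & 0xEF80) == 0x0080:
        info["request"] = MMM.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        info["channel"] = "unspecified" if channel == 0xF else channel
    return info

if __name__ == "__main__":
    # The two distinct STATUS values from the logs above
    for raw in (0x8C0000400001009F, 0x88000040000200CF):
        print(hex(raw), decode_status(raw))
```

If that layout is right, both values decode with the uncorrected bit clear, which would make these corrected memory errors rather than fatal ones.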