Hi,
We are running Red Hat Linux 6.3 on a new Dell R720xd with two CPUs and see the following errors.
model name : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz. Is this a CPU issue or a memory issue? The output does say memory read error.
Thanks
[root@chips log]# mcelog --p4 --ascii < chips-mce-error.log
CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090
Hardware event. This is not a software error.
CPU 1 BANK 5
MISC 425c3c86 ADDR 542cc61180
TIME 1345180830 Fri Aug 17 17:20:30 2012
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8c00004000010090 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 45
TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20
CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090
Hardware event. This is not a software error.
CPU 1 BANK 5
MISC 425c3c86 ADDR 542cc61180
TIME 1345180830 Fri Aug 17 17:20:30 2012
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8c00004000010090 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 45
TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20
CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090
Hardware event. This is not a software error.
CPU 1 BANK 8
MISC 5229410001000a00
TIME 1345180830 Fri Aug 17 17:20:30 2012
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8800004500800090 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 45
TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20
CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090
Hardware event. This is not a software error.
CPU 1 BANK 8
MISC 5229410001000a00
TIME 1345180830 Fri Aug 17 17:20:30 2012
MCG status:
MCi status:
Corrected error
MCi_MISC register valid
MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR
Transaction: Memory read error
STATUS 8800004500800090 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 45
TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20
EDAC MC1: CE - no information available: Can't discover the TAD target
EDAC MC0: CE row 0, channel 0, label "CPU_SrcID#0_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=1 Err=0080:0090 (ch=0), addr = 0x00000000 => socket=0, Channel=0(mask=3), rank=0
I just got a new Xeon E5 workstation. In Ubuntu 12.10 and 12.04 LTS, /var/log/syslog fills up with messages of the form:
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111662] sbridge: HANDLING MCE MEMORY ERROR
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111665] CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111667] TSC 0 ADDR f745ee640 MISC 21403cbc86 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111688] sbridge: HANDLING MCE MEMORY ERROR
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111690] CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010091
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111692] TSC 0 ADDR f745da540 MISC 2140444486 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20
Mar 26 16:13:23 rutcor01 kernel: [ 1127.111718] sbridge: HANDLING MCE MEMORY ERROR
They all decode to output like the following:
sbridge: HANDLING MCE MEMORY ERROR
CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091
Hardware event. This is not a software error.
CPU 8 BANK 5
MISC 4034b486 ADDR 10231c5040
TIME 1364328295 Tue Mar 26 16:04:55 2013
MCG status:
MCi status:
Error overflow
Corrected error
MCi_MISC register valid
MCi_ADDR register valid
MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR
Transaction: Memory read error
STATUS cc0000c000010091 MCGSTATUS 0
CPUID Vendor Intel Family 6 Model 45
TSC 0 ADDR 10231c5040 MISC 4034b486 PROCESSOR 0:206d7 TIME 1364328295 SOCKET 1 APIC 20
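For anyone who wants to check what mcelog is reporting, the flags can be recovered by hand from the STATUS value. This is only a rough sketch, assuming the architectural IA32_MCi_STATUS bit layout described in the Intel SDM (flag bits 63 to 57, corrected error count in bits 52:38 on processors that implement it, MCA error code in bits 15:0, and the memory controller code format 0000 0000 1MMM CCCC). The function and table names are my own, not anything from mcelog.

```python
# Sketch only: assumes the architectural IA32_MCi_STATUS layout from the Intel SDM.
# FLAG_BITS, MEM_OPS and decode_status are illustrative names, not part of mcelog.

FLAG_BITS = {
    63: "VAL",    # register contains a valid error
    62: "OVER",   # error overflow: another error arrived before this one was cleared
    61: "UC",     # uncorrected error (clear means corrected)
    60: "EN",     # error reporting enabled
    59: "MISCV",  # MCi_MISC register valid
    58: "ADDRV",  # MCi_ADDR register valid
    57: "PCC",    # processor context corrupt
}

# MMM field of a memory controller error code (0000 0000 1MMM CCCC)
MEM_OPS = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_status(status_hex):
    """Split a hex MCi_STATUS string into its main fields."""
    status = int(status_hex, 16)
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    mca_code = status & 0xFFFF
    result = {
        "flags": flags,
        "corrected": "VAL" in flags and "UC" not in flags,
        "mca_code": mca_code,
        "model_specific_code": (status >> 16) & 0xFFFF,
        # Only meaningful on processors that implement the corrected error count.
        "corrected_count": (status >> 38) & 0x7FFF,
    }
    # Memory controller errors have bit 7 set and bits 15:8 clear.
    if (mca_code & 0xFF80) == 0x0080:
        result["mem_op"] = MEM_OPS.get((mca_code >> 4) & 0x7, "reserved")
        channel = mca_code & 0xF
        result["channel"] = None if channel == 0xF else channel  # 0xF: not specified
    return result


if __name__ == "__main__":
    for value in ("8c00004000010090", "8800004500800090", "cc0000c000010091"):
        print(value, decode_status(value))
```

Run against the three STATUS values quoted in this thread, it agrees with the mcelog output: all are corrected memory read errors, and only the cc0000c000010091 value has the overflow flag set.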
It's very similar to what you have. Within an hour of powering the system on, /var/log/syslog had grown to 20GB, and the mcelog and rsyslog processes were running constantly. I tried installing Windows instead, and it gives no indication that anything is wrong. I also ran the Intel Processor Diagnostic Tool, and everything passes.
Is this a Linux kernel compatibility problem with E5-26xx processors, or is it a real hardware error? If it is real, how can I make it visible in Windows? My system vendor was not clear about this up front, but they will only fix the problem if it is visible in Windows.
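With a log that size it may help to know whether the flood is a few addresses repeating or errors scattered across memory. Here is a rough sketch that tallies the raw kernel lines by CPU, bank, status and reported address. It assumes the line format shown in the syslog excerpt above; the regular expressions and the tally name are my own guesses at a reasonable parser, nothing official.

```python
import re
from collections import Counter

# Assumed formats, taken from the kernel lines quoted in this thread.
MCE_RE = re.compile(
    r"CPU (\d+): Machine Check Exception:\s*\S+ Bank (\d+): ([0-9a-fA-F]+)"
)
ADDR_RE = re.compile(r"\bADDR ([0-9a-fA-F]+)")


def tally(lines):
    """Count MCE events per (cpu, bank, status) and per reported address."""
    events = Counter()
    addrs = Counter()
    for line in lines:
        mce = MCE_RE.search(line)
        if mce:
            events[(int(mce.group(1)), int(mce.group(2)), mce.group(3))] += 1
        addr = ADDR_RE.search(line)
        if addr:
            addrs[addr.group(1)] += 1
    return events, addrs


if __name__ == "__main__":
    import sys

    # Usage: python tally_mce.py /var/log/syslog  (script name is hypothetical)
    with open(sys.argv[1], errors="replace") as log:
        events, addrs = tally(log)
    for key, count in events.most_common(10):
        print(count, key)
    for address, count in addrs.most_common(10):
        print(count, address)
```

This is meant for the raw kernel lines only. Decoded mcelog output prints ADDR twice per event, so the address counts would be doubled there.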
Did you ever get your problem resolved? HELP!
Jonathan

