2 Replies Latest reply: Jun 3, 2013 1:46 PM by kevin_intel

Machine Check Exception

shaunmnv Community Member

Hi,

 

We are running Red Hat Linux 6.3 on a new Dell R720xd with two CPUs and see the following errors.

model name    : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

Is this a CPU or a memory issue? The log does report a memory read error.

 

thanks

 

[root@chips log]# mcelog --p4 --ascii < chips-mce-error.log

 

CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090

Hardware event. This is not a software error.

CPU 1 BANK 5

MISC 425c3c86 ADDR 542cc61180

TIME 1345180830 Fri Aug 17 17:20:30 2012

MCG status:

MCi status:

Corrected error

MCi_MISC register valid

MCi_ADDR register valid

MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

Transaction: Memory read error

STATUS 8c00004000010090 MCGSTATUS 0

CPUID Vendor Intel Family 6 Model 45

TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

 

CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090

Hardware event. This is not a software error.

CPU 1 BANK 5

MISC 425c3c86 ADDR 542cc61180

TIME 1345180830 Fri Aug 17 17:20:30 2012

MCG status:

MCi status:

Corrected error

MCi_MISC register valid

MCi_ADDR register valid

MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

Transaction: Memory read error

STATUS 8c00004000010090 MCGSTATUS 0

CPUID Vendor Intel Family 6 Model 45

TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

 

CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090

Hardware event. This is not a software error.

CPU 1 BANK 8

MISC 5229410001000a00

TIME 1345180830 Fri Aug 17 17:20:30 2012

MCG status:

MCi status:

Corrected error

MCi_MISC register valid

MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

Transaction: Memory read error

STATUS 8800004500800090 MCGSTATUS 0

CPUID Vendor Intel Family 6 Model 45

TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

 

CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090

Hardware event. This is not a software error.

CPU 1 BANK 8

MISC 5229410001000a00

TIME 1345180830 Fri Aug 17 17:20:30 2012

MCG status:

MCi status:

Corrected error

MCi_MISC register valid

MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

Transaction: Memory read error

STATUS 8800004500800090 MCGSTATUS 0

CPUID Vendor Intel Family 6 Model 45

TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

 

EDAC MC1: CE - no information available: Can't discover the TAD target

EDAC MC0: CE row 0, channel 0, label "CPU_SrcID#0_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=1 Err=0080:0090 (ch=0), addr = 0x00000000 => socket=0, Channel=0(mask=3), rank=0
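
The STATUS values in the output above can also be decoded by hand, which helps when checking what mcelog reports. The following is a minimal sketch in Python, assuming the architectural IA32_MCi_STATUS bit layout described in the machine-check chapter of the Intel SDM (bit 63 valid, 62 overflow, 61 uncorrected, 59 MISC valid, 58 ADDR valid, bits 52:38 corrected error count on processors that implement that field, bits 15:0 MCA error code). The function name `decode_mci_status` is hypothetical and is not part of mcelog.

```python
# Sketch only: decode architectural fields of an IA32_MCi_STATUS value.
# Bit positions are taken from the Intel SDM machine-check chapter; the
# corrected-count field is assumed to be implemented on this processor.

# Memory transaction types for memory controller error codes.
MEM_TXN = {0: "generic", 1: "read", 2: "write", 3: "address/command", 4: "scrub"}


def decode_mci_status(status: int) -> dict:
    """Return the main flags and the MCA error code of a status value."""
    code = status & 0xFFFF  # bits 15:0, MCA error code
    out = {
        "valid": bool((status >> 63) & 1),
        "overflow": bool((status >> 62) & 1),
        "uncorrected": bool((status >> 61) & 1),
        "misc_valid": bool((status >> 59) & 1),
        "addr_valid": bool((status >> 58) & 1),
        "corrected_count": (status >> 38) & 0x7FFF,  # bits 52:38
        "mca_code": code,
    }
    # Memory controller errors have the form 0000 0000 1MMM CCCC:
    # MMM is the transaction type, CCCC the channel (1111 = not specified).
    if (code & 0xFF80) == 0x0080:
        out["transaction"] = MEM_TXN.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        out["channel"] = None if channel == 0xF else channel
    return out


if __name__ == "__main__":
    # STATUS values copied from the mcelog output in this thread.
    for value in (0x8C00004000010090, 0x8800004500800090):
        print(hex(value), decode_mci_status(value))
```

For the Bank 5 value, the uncorrected bit is clear and the error code 0x0090 is a memory read on channel 0, which agrees with the "Corrected error" and "RD_CHANNEL0_ERR" lines that mcelog printed.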

  • 1. Re: Machine Check Exception
    Jonathan Eckstein Community Member

    I just got a new Xeon E5 workstation.  In Ubuntu 12.10 and 12.04LTS, /var/log/syslog fills up with messages of the form

     

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111662] sbridge: HANDLING MCE MEMORY ERROR

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111665] CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111667] TSC 0 ADDR f745ee640 MISC 21403cbc86 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111688] sbridge: HANDLING MCE MEMORY ERROR

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111690] CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010091

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111692] TSC 0 ADDR f745da540 MISC 2140444486 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20

    Mar 26 16:13:23 rutcor01 kernel: [ 1127.111718] sbridge: HANDLING MCE MEMORY ERROR

     

    These all decode to

     

    sbridge: HANDLING MCE MEMORY ERROR

    CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091

    Hardware event. This is not a software error.

    CPU 8 BANK 5

    MISC 4034b486 ADDR 10231c5040

    TIME 1364328295 Tue Mar 26 16:04:55 2013

    MCG status:

    MCi status:

    Error overflow

    Corrected error

    MCi_MISC register valid

    MCi_ADDR register valid

    MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR

    Transaction: Memory read error

    STATUS cc0000c000010091 MCGSTATUS 0

    CPUID Vendor Intel Family 6 Model 45

    TSC 0 ADDR 10231c5040 MISC 4034b486 PROCESSOR 0:206d7 TIME 1364328295 SOCKET 1 APIC 20

     

    It's very similar to what you have.  In just an hour of the system being on, /var/log/syslog has grown to 20GB and the mcelog and rsyslog processes are constantly running.  I tried installing Windows instead, and there is no indication anything is wrong.  I tried the Intel Processor Diagnostic Tool and everything passes.

     

    Is this a compatibility problem between the Linux kernel and E5-26xx processors?  Is it a real hardware error?  If so, how can I make it visible in Windows?  They were not clear about it up front, but my system vendor will only fix the problem if it is visible in Windows.

     

    Did you ever get your problem resolved?  HELP!

     

      Jonathan
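
When syslog grows this quickly, it can help to summarize which CPU, bank, and STATUS value the records come from before reading them one by one. The following is a rough sketch in Python that tallies the kernel "Machine Check Exception" lines in the format quoted above. The function name `tally_mce` and the regular expression are illustrative assumptions based only on the lines shown in this thread, not on any mcelog or kernel interface.

```python
# Sketch only: count kernel MCE records per (CPU, bank, STATUS).
# The pattern is derived from the syslog lines quoted in this thread and
# may need adjusting for other kernel versions.
import re
from collections import Counter

MCE_RE = re.compile(
    r"CPU (\d+): Machine Check Exception:\s*\w+ Bank (\d+): ([0-9a-fA-F]+)"
)


def tally_mce(lines):
    """Return a Counter keyed by (cpu, bank, status_hex)."""
    counts = Counter()
    for line in lines:
        match = MCE_RE.search(line)
        if match:
            cpu, bank, status = match.groups()
            counts[(int(cpu), int(bank), status.lower())] += 1
    return counts


if __name__ == "__main__":
    # Sample lines copied from the post above; a real run would instead
    # iterate over an open log file such as /var/log/syslog.
    sample = [
        "Mar 26 16:13:23 rutcor01 kernel: [ 1127.111662] sbridge: HANDLING MCE MEMORY ERROR",
        "Mar 26 16:13:23 rutcor01 kernel: [ 1127.111665] CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091",
        "Mar 26 16:13:23 rutcor01 kernel: [ 1127.111690] CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010091",
    ]
    for key, count in tally_mce(sample).most_common():
        print(key, count)
```

If nearly all records share one CPU, bank, and channel, that is consistent with a single memory channel or DIMM being the source, although the tally alone does not prove it.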

  • 2. Re: Machine Check Exception
    kevin_intel Community Member

    Hello All,

     

    I am afraid that Intel® limits its support for Intel® products to Windows*. Since you are having this issue with an OEM system, you need to contact the OEM directly for further instructions, or contact the Linux* communities.

     

    Thanks.
