2 Replies. Latest reply: Jun 3, 2013 1:46 PM by kevin_intel

    Machine Check Exception

    shaunmnv

      Hi,

       

      We are running Red Hat Linux 6.3 on a new Dell R720xd with two CPUs and are seeing the following errors. The CPU model is:

      model name    : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

      Is this a CPU issue or a memory issue? The log does say "Memory read error".

       

      thanks

       

      [root@chips log]# mcelog --p4 --ascii < chips-mce-error.log

       

      CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090

      Hardware event. This is not a software error.

      CPU 1 BANK 5

      MISC 425c3c86 ADDR 542cc61180

      TIME 1345180830 Fri Aug 17 17:20:30 2012

      MCG status:

      MCi status:

      Corrected error

      MCi_MISC register valid

      MCi_ADDR register valid

      MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

      Transaction: Memory read error

      STATUS 8c00004000010090 MCGSTATUS 0

      CPUID Vendor Intel Family 6 Model 45

      TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

       

      CPU 1: Machine Check Exception: 0 Bank 5: 8c00004000010090

      Hardware event. This is not a software error.

      CPU 1 BANK 5

      MISC 425c3c86 ADDR 542cc61180

      TIME 1345180830 Fri Aug 17 17:20:30 2012

      MCG status:

      MCi status:

      Corrected error

      MCi_MISC register valid

      MCi_ADDR register valid

      MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

      Transaction: Memory read error

      STATUS 8c00004000010090 MCGSTATUS 0

      CPUID Vendor Intel Family 6 Model 45

      TSC 0 ADDR 542cc61180 MISC 425c3c86 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

       

      CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090

      Hardware event. This is not a software error.

      CPU 1 BANK 8

      MISC 5229410001000a00

      TIME 1345180830 Fri Aug 17 17:20:30 2012

      MCG status:

      MCi status:

      Corrected error

      MCi_MISC register valid

      MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

      Transaction: Memory read error

      STATUS 8800004500800090 MCGSTATUS 0

      CPUID Vendor Intel Family 6 Model 45

      TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

       

      CPU 1: Machine Check Exception: 0 Bank 8: 8800004500800090

      Hardware event. This is not a software error.

      CPU 1 BANK 8

      MISC 5229410001000a00

      TIME 1345180830 Fri Aug 17 17:20:30 2012

      MCG status:

      MCi status:

      Corrected error

      MCi_MISC register valid

      MCA: MEMORY CONTROLLER RD_CHANNEL0_ERR

      Transaction: Memory read error

      STATUS 8800004500800090 MCGSTATUS 0

      CPUID Vendor Intel Family 6 Model 45

      TSC 0 ADDR 0 MISC 5229410001000a00 PROCESSOR 0:206d7 TIME 1345180830 SOCKET 1 APIC 20

       

      EDAC MC1: CE - no information available: Can't discover the TAD target

      EDAC MC0: CE row 0, channel 0, label "CPU_SrcID#0_Channel#0_DIMM#0": 1 Unknown error(s): memory read on FATAL area : cpu=1 Err=0080:0090 (ch=0), addr = 0x00000000 => socket=0, Channel=0(mask=3), rank=0
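For readers wondering where mcelog's "Corrected error" and "RD_CHANNEL0_ERR" lines come from, here is a minimal Python sketch that decodes the STATUS values above by hand. It assumes the architectural IA32_MCi_STATUS bit layout from the Intel SDM, Volume 3B (bit 63 VAL, 62 OVER, 61 UC, 59 MISCV, 58 ADDRV, bits 52:38 corrected error count, bits 15:0 MCA error code). The names decode_status and describe_mca_code are hypothetical. The sketch ignores the model-specific bits that mcelog and the kernel's EDAC driver also interpret, so it cannot tell you which DIMM is affected.

```python
"""Hand-decode an Intel IA32_MCi_STATUS value (sketch; layout assumed from the Intel SDM)."""

from dataclasses import dataclass


@dataclass
class McStatus:
    valid: bool           # bit 63 VAL: the register holds a logged error
    overflow: bool        # bit 62 OVER: a further error arrived before this one was read
    uncorrected: bool     # bit 61 UC: hardware did NOT correct the error
    misc_valid: bool      # bit 59 MISCV: MCi_MISC holds extra information
    addr_valid: bool      # bit 58 ADDRV: MCi_ADDR holds the failing address
    corrected_count: int  # bits 52:38: corrected error count (when supported)
    mca_code: int         # bits 15:0: architectural MCA error code


def bit(value: int, n: int) -> bool:
    """Return True if bit n of value is set."""
    return bool((value >> n) & 1)


def decode_status(status: int) -> McStatus:
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    return McStatus(
        valid=bit(status, 63),
        overflow=bit(status, 62),
        uncorrected=bit(status, 61),
        misc_valid=bit(status, 59),
        addr_valid=bit(status, 58),
        corrected_count=(status >> 38) & 0x7FFF,
        mca_code=status & 0xFFFF,
    )


def describe_mca_code(code: int) -> str:
    """Describe memory-controller codes of the form 0000 0000 1MMM CCCC."""
    if (code & 0xFF80) == 0x0080:
        # MMM = transaction type, CCCC = channel (0xF means unspecified).
        transaction = {0: "generic", 1: "read", 2: "write",
                       3: "address/command", 4: "scrub"}.get((code >> 4) & 0x7, "reserved")
        channel = code & 0xF
        where = "unspecified channel" if channel == 0xF else f"channel {channel}"
        return f"memory controller {transaction} error, {where}"
    return f"other MCA error code 0x{code:04x}"


# The two distinct STATUS values from the log above.
for raw in (0x8C00004000010090, 0x8800004500800090):
    s = decode_status(raw)
    kind = "uncorrected" if s.uncorrected else "corrected"
    print(f"{raw:016x}: {kind}, {describe_mca_code(s.mca_code)}, count={s.corrected_count}")
# 8c00004000010090: corrected, memory controller read error, channel 0, count=1
# 8800004500800090: corrected, memory controller read error, channel 0, count=1
```

On this decoding, both distinct STATUS values are valid, corrected (UC clear) memory-controller read errors on channel 0. The second value has ADDRV clear, which matches its ADDR 0. The value 0xcc0000c000010091 in the reply below additionally has OVER set, which is where its "Error overflow" line comes from.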

        • 1. Re: Machine Check Exception
          Jonathan Eckstein

          I just got a new Xeon E5 workstation. In Ubuntu 12.10 and 12.04 LTS, /var/log/syslog fills up with messages of the following form:

           

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111662] sbridge: HANDLING MCE MEMORY ERROR

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111665] CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111667] TSC 0 ADDR f745ee640 MISC 21403cbc86 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111688] sbridge: HANDLING MCE MEMORY ERROR

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111690] CPU 8: Machine Check Exception: 0 Bank 5: 8c00004000010091

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111692] TSC 0 ADDR f745da540 MISC 2140444486 PROCESSOR 0:206d7 TIME 1364328803 SOCKET 1 APIC 20

          Mar 26 16:13:23 rutcor01 kernel: [ 1127.111718] sbridge: HANDLING MCE MEMORY ERROR

           

          These all decode to output like this:

           

          sbridge: HANDLING MCE MEMORY ERROR

          CPU 8: Machine Check Exception: 0 Bank 5: cc0000c000010091

          Hardware event. This is not a software error.

          CPU 8 BANK 5

          MISC 4034b486 ADDR 10231c5040

          TIME 1364328295 Tue Mar 26 16:04:55 2013

          MCG status:

          MCi status:

          Error overflow

          Corrected error

          MCi_MISC register valid

          MCi_ADDR register valid

          MCA: MEMORY CONTROLLER RD_CHANNEL1_ERR

          Transaction: Memory read error

          STATUS cc0000c000010091 MCGSTATUS 0

          CPUID Vendor Intel Family 6 Model 45

          TSC 0 ADDR 10231c5040 MISC 4034b486 PROCESSOR 0:206d7 TIME 1364328295 SOCKET 1 APIC 20

           

          It's very similar to what you have. After just an hour of uptime, /var/log/syslog has grown to 20 GB, and the mcelog and rsyslog processes are constantly running. I tried installing Windows instead, and it gives no indication that anything is wrong. I also ran the Intel Processor Diagnostic Tool, and everything passes.
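When syslog is flooding like this, a summary is easier to act on than the raw lines. Here is a minimal Python sketch that counts MCE lines per CPU, bank, and low 16 bits of the status value (the MCA error code), so that, for example, errors concentrated on one memory channel stand out. It assumes the kernel line format quoted above. The regular expression is inferred only from those lines and may not match other kernel versions, and tally_mces is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Iterable

# Pattern inferred from the kernel lines quoted in this thread (assumption:
# other kernel versions may format MCE lines differently).
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check Exception: \S+ Bank (\d+): ([0-9a-fA-F]{16})"
)


def tally_mces(lines: Iterable[str]) -> Counter:
    """Count MCE lines keyed by (cpu, bank, low 16 bits of MCi_STATUS)."""
    counts: Counter = Counter()
    for line in lines:
        m = MCE_LINE.search(line)
        if m is None:
            continue  # not an MCE line (e.g. "sbridge: HANDLING MCE MEMORY ERROR")
        cpu = int(m.group(1))
        bank = int(m.group(2))
        status = int(m.group(3), 16)
        counts[(cpu, bank, status & 0xFFFF)] += 1
    return counts
```

For example, `with open("/var/log/syslog", errors="replace") as f: print(tally_mces(f).most_common(10))` lists the ten most frequent combinations. On the two MCE lines quoted above, both entries fall under CPU 8, bank 5, code 0x0091 (a memory read error on channel 1).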

           

          Is this a compatibility problem between the Linux kernel and E5-26xx processors, or is it a real hardware error? If it is a hardware error, how can I make it visible in Windows? My system vendor was not clear about this up front, but they will only fix the problem if it is visible in Windows.

           

          Did you ever get your problem resolved?  HELP!

           

            Jonathan

          • 2. Re: Machine Check Exception
            kevin_intel

            Hello All,

             

            I am afraid that Intel® limits support for Intel® products to Windows*. Since you are having this issue with an OEM system, you need to contact the OEM directly for further instructions, or contact the Linux* community.

             

            Thanks.