2 Replies Latest reply: Jun 28, 2012 9:13 AM by Rob

    MCE Analysis Help

    Rob

      Any ideas where I can get help analyzing the MCEs below? I'm running an "unqualified" OS (CentOS) that my OEM vendor doesn't support, so the vendor's support pack tools that hook into the OS for analysis aren't available. They suggested I "ask Intel" for an analysis of which part of the subsystem may be having the problem. The OEM vendor is suggesting this may not be strictly a hardware error despite what the MCE says, and might actually be an interop problem between the OS and the hardware. These are Intel 64 (x86-64) systems, and I'm seeing the errors occur regularly on multiple machines.

       

      Thanks in advance,

       

      -Rob

       

      HARDWARE ERROR. This is NOT a software problem!
      Please contact your hardware vendor
      MCE 12
      CPU 0 BANK 8
      MISC 14a6688000011080 ADDR 8e41d65c0
      TIME 1340190061 Wed Jun 20 11:01:01 2012
      MCG status:
      MCi status:
      MCi_MISC register valid
      MCi_ADDR register valid
      MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
      Transaction: Memory read error
      STATUS 8c0000400001009f MCGSTATUS 0
      MCGCAP 1c09 APICID 0 SOCKETID 0
      CPUID Vendor Intel Family 6 Model 44

       

      HARDWARE ERROR. This is NOT a software problem!
      Please contact your hardware vendor
      MCE 0
      CPU 1 BANK 8
      MISC 4702108000016000
      TIME 1340154061 Wed Jun 20 01:01:01 2012
      MCG status:
      MCi status:
      MCi_MISC register valid
      MCA: MEMORY CONTROLLER MS_CHANNELunspecified_ERR
      Transaction: Memory scrubbing error
      STATUS 88000040000200cf MCGSTATUS 0
      MCGCAP 1c09 APICID 20 SOCKETID 1
      CPUID Vendor Intel Family 6 Model 44

       

      HARDWARE ERROR. This is NOT a software problem!
      Please contact your hardware vendor
      MCE 31
      CPU 0 BANK 8
      MISC d847010400011287 ADDR 87bc2aac0
      TIME 1340215261 Wed Jun 20 18:01:01 2012
      MCG status:
      MCi status:
      MCi_MISC register valid
      MCi_ADDR register valid
      MCA: MEMORY CONTROLLER RD_CHANNELunspecified_ERR
      Transaction: Memory read error
      STATUS 8c0000400001009f MCGSTATUS 0
      MCGCAP 1c09 APICID 0 SOCKETID 0
      CPUID Vendor Intel Family 6 Model 44
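For anyone reading the STATUS values in the logs above by hand, here is a minimal sketch of how the architectural fields of an IA32_MCi_STATUS value can be unpacked. The function name `decode_mci_status` and the dictionary keys are my own (hypothetical, not part of mcelog). The bit positions follow the machine-check architecture chapter of the Intel 64 and IA-32 SDM as I understand it, and the corrected-error count in bits 52:38 is only meaningful when MCG_CAP bit 10 is set (it is in MCGCAP 1c09). Treat this as an aid for reading the log, not a diagnosis.

```python
# Sketch: unpack architectural fields of IA32_MCi_STATUS (names are illustrative).
MMM_NAMES = {
    0: "generic undefined request",
    1: "memory read error",
    2: "memory write error",
    3: "address/command error",
    4: "memory scrubbing error",
}

def decode_mci_status(status: int) -> dict:
    def bit(n: int) -> bool:
        return bool((status >> n) & 1)

    mca_code = status & 0xFFFF  # bits 15:0, architectural MCA error code
    info = {
        "valid": bit(63),            # VAL
        "overflow": bit(62),         # OVER
        "uncorrected": bit(61),      # UC (False means a corrected error)
        "enabled": bit(60),          # EN
        "misc_valid": bit(59),       # MISCV
        "addr_valid": bit(58),       # ADDRV
        "context_corrupt": bit(57),  # PCC
        # Assumption: bits 52:38 hold the corrected error count (needs MCG_CAP[10]).
        "corrected_count": (status >> 38) & 0x7FFF,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_code": mca_code,
    }
    # Memory controller errors have the form 000F 0000 1MMM CCCC;
    # bit 12 (F) is ignored by the mask below.
    if mca_code & 0xEF80 == 0x0080:
        mmm = (mca_code >> 4) & 0x7
        cccc = mca_code & 0xF
        info["transaction"] = MMM_NAMES.get(mmm, "reserved")
        info["channel"] = None if cccc == 0xF else cccc  # 0xF: channel not specified
    return info

if __name__ == "__main__":
    for raw in (0x8C0000400001009F, 0x88000040000200CF):
        print(hex(raw), decode_mci_status(raw))
```

With the two distinct STATUS values from the logs, UC comes out clear in both cases, which suggests these were logged as corrected errors rather than uncorrected ones. That reading rests on the bit layout assumed above.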

        • 1. Re: MCE Analysis Help
          Adolfo_Intel

          Which kernel version are you using?

          Please also let me know the processor model and, if possible, the system configuration (hardware components).

           

          Please bear in mind that Intel desktop motherboards do not support Linux operating systems, so this type of issue should be checked on Linux forums.

          • 2. Re: MCE Analysis Help
            Rob

            Adolfo,

             

            Thanks for your response. I'm getting the exact kernel version of the CentOS build now and will reply with it shortly. The system is an HP ProLiant DL360cG7 with an Intel 64 (Westmere) Xeon processor, so it is not a desktop motherboard. Red Hat Linux is a supported OS on this box, and CentOS is essentially an open-source version of it, but not one that HP officially supports, which is why I'm posting here. I'll have more information on the exact configuration of the box shortly.

             

            I did find the following document covering Intel 64 MCE codes and am working through it now: http://www.intel.com/Assets/ja_JP/PDF/manual/253668.pdf

             

            Regards,

            -Rob