1 Reply Latest reply on Dec 2, 2014 12:05 PM by Salem_Intel

    Catastrophic Error - Modular Server


      We have MFS2600KI Compute Modules, all running ESXi 5.5. Intermittently, a catastrophic error occurs and the affected blade reboots. It appears completely random and can happen on any of the 6 blades. Here is an example of the error:


      ID: 2101
      Type: IPMI
      Detailed Description: A catastrophic error has occurred. The system has halted.
      Cause: An uncorrectable memory error is often the cause.
      Action: Check for other events that occurred near the same time which may help identify the cause or potential hardware failure.
      Extra Data: s:68:"Raw IPMI (hex): Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01";
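      For anyone trying to read the raw hex in the Extra Data field, here is a minimal Python sketch that splits it into bytes and decodes the event direction/type byte and the Event Data 1 bit fields. It assumes the Modular Server's labels map onto the standard IPMI v2.0 SEL event fields (Gen = generator ID, Num = sensor number, Type = sensor type, EDir = event direction/type, ED1 to ED3 = event data bytes). The function names are hypothetical, and the decode alone does not pinpoint the root cause.

```python
import re

# Hypothetical sketch: assumes the Modular Server's "Raw IPMI (hex)" labels map
# onto standard IPMI v2.0 SEL event fields (Gen = generator ID, Num = sensor
# number, Type = sensor type, EDir = event dir/type, ED1-ED3 = event data).

# Small subset of IPMI sensor type codes, for illustration only.
SENSOR_TYPES = {0x07: "Processor", 0x0C: "Memory"}

# Event Data 1 bits [7:6] (byte 2) and [5:4] (byte 3) for discrete event types.
ED2_USAGE = {0: "unspecified", 1: "previous state/severity",
             2: "OEM code", 3: "sensor-specific extension"}
ED3_USAGE = {0: "unspecified", 1: "reserved",
             2: "OEM code", 3: "sensor-specific extension"}


def parse_raw(raw: str) -> dict:
    """Turn 'Key:hex' pairs such as 'ED1:a1' into {'ED1': 0xA1, ...}."""
    return {key: int(value, 16)
            for key, value in re.findall(r"(\w+):([0-9A-Fa-f]+)", raw)}


def decode_event(fields: dict) -> dict:
    """Decode the direction/type byte and the Event Data 1 bit fields."""
    edir, ed1 = fields["EDir"], fields["ED1"]
    event_type = edir & 0x7F  # bits [6:0]: event/reading type code
    decoded = {
        "sensor_number": fields["Num"],
        "sensor_type": SENSOR_TYPES.get(fields["Type"], f"0x{fields['Type']:02X}"),
        "direction": "deassertion" if edir & 0x80 else "assertion",  # bit 7
        "event_type": event_type,
        "offset": ed1 & 0x0F,  # bits [3:0]: offset within the event type
        "ed2": fields["ED2"],
        "ed3": fields["ED3"],
    }
    # Event type 0x01 is threshold-based; its ED1 usage bits mean something
    # else, so only interpret them for non-threshold (discrete) events here.
    if event_type != 0x01:
        decoded["ed2_usage"] = ED2_USAGE[(ed1 >> 6) & 0x3]
        decoded["ed3_usage"] = ED3_USAGE[(ed1 >> 4) & 0x3]
    return decoded


if __name__ == "__main__":
    raw = "Raw IPMI (hex): Gen:3000 Num:80 Type:07 EDir:83 ED1:a1 ED2:01 ED3:01"
    for name, value in decode_event(parse_raw(raw)).items():
        print(f"{name}: {value}")
```

      Under that assumption, the record above would be a deassertion event from sensor 80h (sensor type 07h, Processor), event type 03h, offset 1, with OEM-specific codes in ED2 and ED3. Those OEM codes would need Intel's documentation for this platform to interpret.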


      The error points to a possible memory problem, but Intel support has been unable to identify the exact cause. We replaced one module entirely, but the others are still throwing these errors. Has anyone seen this before, and does anyone know of a possible resolution?