2 Replies Latest reply on Aug 31, 2012 1:33 PM by Lukasz

    SR2612UR with Debian Linux resets about once a day. Why?

    Lukasz

      I have a few servers that reboot unexpectedly and ungracefully.

       

      There are no errors in the logs.

      $ last

       

      shows me:

      reboot   system boot  2.6.32-5-amd64   Sun Aug 12 20:53 - 12:25 (18+15:32)

       

      As if they were legitimate reboots. The server reboots, and comes back online.
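A `reboot` line in `last` is written at every boot, so on its own it does not say whether the previous shutdown was clean. One way to tell the two apart, assuming `last -x` on the affected machine also lists `shutdown` records, is to check whether each boot is preceded by one. The sketch below is only an illustration: the sample text is made up for the example (its oldest line reuses the timestamp from the post), and `classify_boots` is a hypothetical helper name.

```python
# Hypothetical `last -x` excerpt, newest entry first. Only the boot time of
# the oldest line is taken from the post; everything else is invented.
SAMPLE = """\
reboot   system boot  2.6.32-5-amd64   Tue Aug 14 09:10 - 12:25  (03:15)
reboot   system boot  2.6.32-5-amd64   Mon Aug 13 08:02 - 09:10 (1+01:08)
shutdown system down  2.6.32-5-amd64   Mon Aug 13 08:00 - 08:02  (00:02)
reboot   system boot  2.6.32-5-amd64   Sun Aug 12 20:53 - 08:00  (11:07)
"""


def classify_boots(last_x_output):
    """Return (boot_line, status) pairs, oldest boot first.

    status is "clean" if a shutdown record precedes the boot, "unclean" if
    none does, and "unknown" for the oldest boot, because earlier records
    may have been rotated out of wtmp.
    """
    results = []
    shutdown_seen = False
    first_boot = True
    # `last` prints the newest entries first, so walk the lines in reverse.
    for line in reversed(last_x_output.splitlines()):
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "shutdown":
            shutdown_seen = True
        elif fields[0] == "reboot":
            if shutdown_seen:
                status = "clean"
            elif first_boot:
                status = "unknown"
            else:
                status = "unclean"
            results.append((line, status))
            shutdown_seen = False
            first_boot = False
    return results


if __name__ == "__main__":
    for boot_line, status in classify_boots(SAMPLE):
        print(status.ljust(8), boot_line)
```

On a real node the input would be the text printed by `last -x` rather than `SAMPLE`. A boot marked "unclean" only means no shutdown record was found before it, which is consistent with a power loss or hard reset but does not prove one.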

       

      Any idea why this is happening?

      Running Debian Squeeze on SR2612UR with SAS drives.

        • 1. Re: SR2612UR with Debian Linux resets about once a day. Why?
          john_s@intel

          Lukasz,

           

          A few possible causes:

          • The processor may be overheating
            • Make sure your vents are not blocked by dust. Dust can accumulate over time
          • A faulty Power Supply Unit
          • It could be because of operating system corruption
          • It could be a memory error
            • If faulty memory is the cause, the modules may need to be reseated, cleaned, or replaced
          • A faulty motherboard

           

          I'm not familiar enough with Debian to say what more you can do there to detect software events.

           

          You could use the SEL Viewer for UEFI/Windows*/Linux* for S5500 and S5520 boards to see if hardware errors are detected there.

           

          Can you boot from a DOS USB stick, or into UEFI, and let it sit for a while to see whether it still reboots?

           

          Regards,

          John

          • 2. Re: SR2612UR with Debian Linux resets about once a day. Why?
            Lukasz

            - Vents not blocked.

                 - Shouldn't this show up in the RMM3?

            - Power Supply Unit

                 - Possibly.

            - Operating system.

                 - I tried reinstalling it.

                 - This same OS works on other nodes (exact same server type and config)

            - Memory error.

                 - Shouldn't this show up in some logs?

            - Faulty motherboard

                 - Shouldn't this indicate some type of error?

             

            I have bought about 50 of these servers so far, and about 10 of them have had this problem.

            Since I ship them across the world, it's not very convenient to 'plug in a USB key' or 'reinsert memory sticks'.

             

            I don't understand why the quality of these is so low.

            I will also try to connect the SEL Viewer to see if it shows anything.

             

            RMM3 should show me every hardware problem with the system, but it doesn't.

             

             

            I also test each one in my lab for about a week to make sure it's fine. Then I ship it on-site, and it's faulty.

             

            I did see in the ssh session of RMM3:

            ufip=/system1/sp1/logs1/record121

              Properties:

                  LogCreationClassName=CIM_LogRecord

                  LogName=IPMI SEL

                  CreationClassName=CIM_LogRecord

                  RecordID=121

                  MessageTimeStamp=13:56:12,January 15,1970

                  RecordData=System Event - OEM System Boot Event - Asserted

                  identity=SEL ENTRY

             

            ufip=/system1/sp1/logs1/record123

              Properties:

                  LogCreationClassName=CIM_LogRecord

                  LogName=IPMI SEL

                  CreationClassName=CIM_LogRecord

                  RecordID=123

                  MessageTimeStamp=13:57:59,January 15,1970

                  RecordData=Power Unit - Power Unit Failure detected - Asserted

                  identity=SEL ENTRY

             

            ufip=/system1/sp1/logs1/record124

              Properties:

                  LogCreationClassName=CIM_LogRecord

                  LogName=IPMI SEL

                  CreationClassName=CIM_LogRecord

                  RecordID=124

                  MessageTimeStamp=13:57:59,January 15,1970

                  RecordData=Power Unit - Power Off / Power Down - Deasserted

                  identity=SEL ENTRY

             

            ufip=/system1/sp1/logs1/record125

              Properties:

                  LogCreationClassName=CIM_LogRecord

                  LogName=IPMI SEL

                  CreationClassName=CIM_LogRecord

                  RecordID=125

                  MessageTimeStamp=13:57:59,January 15,1970

                  RecordData=Power Unit - Power Unit Failure detected - Deasserted

                  identity=SEL ENTRY

             

            ufip=/system1/sp1/logs1/record126

              Properties:

                  LogCreationClassName=CIM_LogRecord

                  LogName=IPMI SEL

                  CreationClassName=CIM_LogRecord

                  RecordID=126

                  MessageTimeStamp=13:57:59,January 15,1970

                  RecordData=OEM - Asserted

                  identity=SEL ENTRY
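With around 50 servers, reading these records by hand over ssh gets tedious. A minimal sketch of how the RMM3 output above could be parsed and filtered for asserted Power Unit events follows. It assumes the `ufip=` / `key=value` layout shown in the records above; `parse_records` and `asserted_power_events` are hypothetical helper names, and the sample is an abridged copy of records 123 and 124.

```python
# Abridged copy of records 123 and 124 from the RMM3 session above.
SAMPLE = """\
ufip=/system1/sp1/logs1/record123
  Properties:
      LogName=IPMI SEL
      RecordID=123
      MessageTimeStamp=13:57:59,January 15,1970
      RecordData=Power Unit - Power Unit Failure detected - Asserted
ufip=/system1/sp1/logs1/record124
  Properties:
      LogName=IPMI SEL
      RecordID=124
      MessageTimeStamp=13:57:59,January 15,1970
      RecordData=Power Unit - Power Off / Power Down - Deasserted
"""


def parse_records(text):
    """Split the log text into one dict per record, keyed by property name."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("ufip="):
            # A new record starts at every ufip= line.
            records.append({"ufip": line[len("ufip="):]})
        elif "=" in line and records:
            key, value = line.split("=", 1)
            records[-1][key] = value
    return records


def asserted_power_events(records):
    """Keep records whose RecordData is a Power Unit event that was asserted."""
    hits = []
    for rec in records:
        parts = rec.get("RecordData", "").split(" - ")
        # Exact match on the last field so "Deasserted" is not picked up.
        if parts[0] == "Power Unit" and parts[-1] == "Asserted":
            hits.append(rec)
    return hits


if __name__ == "__main__":
    for rec in asserted_power_events(parse_records(SAMPLE)):
        print(rec["RecordID"], rec["MessageTimeStamp"], rec["RecordData"])
```

The 1970 dates in `MessageTimeStamp` suggest the BMC clock was never set, so these events cannot be lined up with the reboot times from `last` until it is. The "Power Unit Failure detected" entry does fit the power supply theory from the first reply.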