12 Replies Latest reply on Dec 21, 2017 5:34 PM by Intel Corporation

    s2600cp BMC error help

    drock

      I picked up an S2600CP that I am trying to get running. I have updated the BIOS to the latest version (02.06.0006) after stepping through many upgrades. I have three 4-pin fans plugged into system fan 1, 2 and 3. I ran frusdr and configured those fans, set all other fans to off, and answered no for chassis intrusion. Upon completing the config update the fans immediately spin down, but after the BMC reinitializes they ramp back up to full speed. I ran sysinfo and, after digging through the log, saw this:

      Management Subsystem Health, BMC FW Health (#0x10) Warning event: BMC FW Health reports the sensor has failed and may not be providing a valid reading.

      I dug deeper and found that it started popping up somewhere during the BIOS updates (the version I started with was so old that I stepped through maybe 12 versions to be safe). After seeing this I reflashed the BMC and reflashed frusdr, with no change.

       

      I have found very little after scouring the net for this error. I have seen some people say this means the board is failing, and I have seen some people say it means it's reading something wrong. Either way, both explanations seem a bit vague.

       

      I am fairly certain this is the cause of the full speed fans as the system status LED is flashing green and I believe a fault like this will trigger fans to run at max speed.

       

      Also the system seems to be running fine other than this. Does anyone have any information that could shed some light on this?  Thanks

        • 1. Re: s2600cp BMC error help
          Intel Corporation
          This message was posted on behalf of Intel Corporation

          Hello drock,

          Could you please tell us which chassis manufacturer and model you are using with this board? Is it an Intel chassis or a 3rd party chassis?

          The fans issue could be related to an unsupported chassis.

          This is a list of supported and tested chassis models and manufacturers that you could use as reference.

          P4308CP4MHEN: S2600CP4 board in an Intel P4000 pedestal chassis, rackable
          P4308CP4MHGC: S2600CP4 board in an Intel P4000 pedestal chassis, rackable
          P4208CP4MHGC: S2600CP4 board in an Intel P4000 pedestal chassis, rackable

          Reference chassis list
          The reference chassis list includes third-party chassis tested for the Intel® Server Board S2600CP Family. Chassis are tested to see if they provide adequate airflow to meet individual manufacturer temperature specifications.

          Vendor      Model              Chassis Type  Power Supply  Unpackage Shock Test  Thermal Test Level  Driver Support
          Chenbro*    SR112              Pedestal      Single        Pass with 25G         1                   A and B
          Chenbro     SR105              Pedestal      Single        Pass with 25G         2                   A and B
          Chenbro     RM137              Rack/1U       Single        Pass with 25G         2                   A and B
          Chenbro     RM13604H01*13114   Rack/1U       Single        N/A                   3                   A and B
          Chenbro     RM417              Rack/4U       H/S           N/A                   1                   A and B
          Chenbro     RM13704-500CP      Rack/1U       Single        Pass with 25G         2                   A and B
          In-win*     PV689              Pedestal      Single        Pass with 25G         1                   A and B
          In-win      PP689              Pedestal      Single        Pass with 25G         2                   A and B
          TST*        ESR316             Rack/3U       H/S           Pass with 25G         1                   A and B
          ****        ST104A-HB-L-I2600  Rack/1U       Single        Pass with 25G         1                   A and B
          CI-Design*  NSR224             Rack/2U       H/S           Pass with 25G         1                   A and B

          Please reply with your chassis model so we can continue with the troubleshooting process.

          Hope this helps

          Jose A.

          • 2. Re: s2600cp BMC error help
            drock

            Thanks for the reply. The chassis is a Chenbro RM23212 2U rack. It has three 80mm fans.

            • 3. Re: s2600cp BMC error help
              Intel Corporation

              Hello drock,

              Thanks for the info.

              Since the chassis is not on the tested list, could you please attach a sysinfo log so we can try to determine whether the issue is chassis sensor related or board sensor related? You can download the sysinfo utility from the following URL: https://downloadcenter.intel.com/download/26988/System-Event-Log-SEL-Viewer-Utility?product=61088

              Regards

              Jose A.

              • 4. Re: s2600cp BMC error help
                drock

                Log is attached. I had some issues after the first BIOS update and cleared the CMOS, which is why the date jumps back to 2005 for a bit.

                 

                System fans 1, 2, 3 are connected.

                • 5. Re: s2600cp BMC error help
                  Intel Corporation

                  Hello drock,

                  Thanks for attaching the logs. I looked at them and found some errors from 2016. Is that the date on the server or are the errors back from a year ago?

                  The errors found say "PCI Express Receiver error" which might be related to a PCI riser card not correctly installed. Another error says "Mmry ECC Sensor reports uncorrectable error. There has been an uncorrectable ECC or other uncorrectable memory error for the memory module  RANK_0, CPU_2, Channel = A, DIMM_1."

                  A third one says "SPS FW Health reports SPS Health event type FW status. Internal error. Operational image shall be updated or hardware board repair is needed(if error is persistent)" which seems to be related to a BIOS or board related error.

                  Let me know if you have corrected any of these errors, for example by replacing memory or reseating the PCI riser card.

                  Regards

                  Jose A.

                  • 6. Re: s2600cp BMC error help
                    Intel Corporation

                    Hello drock,

                    Do you have any updates, questions or comments regarding this issue?

                    Please do not hesitate to contact us.

                    If you consider the issue to be completed please let us know so we can proceed to mark this thread as resolved.

                    Regards

                    Jose A.

                    • 7. Re: s2600cp BMC error help
                      drock

                      Hi Jose,

                       

                      I apologize, I have been busy, but I wanted to go through the log entries you pointed out. I did not have the board until 2017. It does not have any PCIe riser cards in it, nor has it thrown that error since I have had the board. The same goes for the memory error. I did take out a stick that was causing a fuss in the H1/2 bank because someone threw non-matching sticks in it.

                       

                      As for the last error you mentioned, "SPS FW Health reports SPS Health event type FW status....", it stopped throwing that after one of the BIOS updates. The dates jump to 2005 after I cleared the CMOS on 11/25/17.

                      • 8. Re: s2600cp BMC error help
                        Intel Corporation

                        Hello drock,

                        A couple more questions: are the log entries with the 2005 date actually newer? I was able to see some errors related to the flash device.

                        At this moment, are you still getting the original "BMC FW Health reports the sensor has failed and may not be providing a valid reading" error message?

                        Regards

                        Jose A.

                        • 9. Re: s2600cp BMC error help
                          drock

                          Correct, the log entries dated 2005 that are embedded among the later 2017 entries are newer. The entries are recorded in the order they happened. The BMC error is still happening. I believe the flash error came from a flash drive I plugged in that could be failing.
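
To illustrate reading a log whose clock was reset partway through, here is a minimal Python sketch. It assumes each entry carries a record number that grows as entries are written, so record order rather than the timestamp shows what happened first. The `SelEntry` type, the `find_clock_resets` helper and the sample entries are all hypothetical; none of them come from the attached log or from an Intel utility.

```python
from datetime import datetime
from typing import NamedTuple


class SelEntry(NamedTuple):
    record_id: int       # assumed to increase as entries are written
    timestamp: datetime  # clock value stored with the entry
    message: str


def find_clock_resets(entries):
    """Return entries whose timestamp is earlier than the previous entry's.

    Entries are walked in record order, so each hit marks a point where the
    clock jumped backwards (for example after a CMOS clear).
    """
    ordered = sorted(entries, key=lambda e: e.record_id)
    resets = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.timestamp < prev.timestamp:
            resets.append(cur)
    return resets


# Hypothetical sample data, made up for this sketch.
sample = [
    SelEntry(1, datetime(2017, 11, 25, 10, 0), "BIOS update started"),
    SelEntry(2, datetime(2005, 1, 1, 0, 5), "first entry after CMOS clear"),
    SelEntry(3, datetime(2005, 1, 1, 0, 9), "BMC FW Health warning"),
    SelEntry(4, datetime(2017, 11, 25, 12, 30), "clock set again"),
]

for entry in find_clock_resets(sample):
    print(entry.record_id, entry.timestamp.isoformat(), entry.message)
```

With the made-up sample, only record 2 is flagged, which is where the date falls back to 2005; the entries after it are newer than record 1 even though their timestamps look older.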

                          • 10. Re: s2600cp BMC error help
                            Intel Corporation

                            Hello drock,

                            I think the original error message "BMC FW Health reports the sensor has failed and may not be providing a valid reading" is related to the fact that you integrated this server board into a non-validated, untested 3rd party chassis. The original chassis and the tested ones have sensors that interact with the BMC, and the BMC will throw an error if a specific sensor is not found.

                            I would recommend doing the FW update one more time, and this time taking your time to complete the chassis info once the flash is completed.

                            Let me know how it goes.

                            Jose A.

                            • 11. Re: s2600cp BMC error help
                              Intel Corporation

                              Hello drock,

                              Do you have any updates, questions or comments regarding this issue?

                              Please do not hesitate to contact us.

                              If you consider the issue to be completed please let us know so we can proceed to mark this thread as resolved.

                              Regards

                              Jose A.

                              • 12. Re: s2600cp BMC error help
                                Intel Corporation

                                Hello drock,

                                We will proceed to mark this thread as resolved. If you have further issues or questions just create a new topic.

                                Jose A.