9 Replies Latest reply on Mar 24, 2015 2:28 PM by Dan_Intel

    S2600CO4 with (2) E5-2643 not reporting processor thermal margin

    wpmcnamara

      I have two S2600CO4 boards and neither of them reports processor thermal margin sensors.

       

      P1 Therm Margin  | na     | degrees C  | na| na    | na    | na    | na    | na    | na  
      P2 Therm Margin  | na     | degrees C  | na| na    | na    | na    | na    | na    | na  
      P1 Therm Ctrl %  | na     | percent| na| na    | na    | na    | 30.000| 50.000| na  
      P2 Therm Ctrl %  | na     | percent| na| na    | na    | na    | 30.000| 50.000| na  

       

      Nor do they report DIMM thermal margins:

      DIMM Thrm Mrgn 1 | na     | degrees C  | na| na    | na    | na    | 5.000 | 10.000| na  
      DIMM Thrm Mrgn 2 | na     | degrees C  | na| na    | na    | na    | 5.000 | 10.000| na  
      DIMM Thrm Mrgn 3 | na     | degrees C  | na| na    | na    | na    | 5.000 | 10.000| na  
      DIMM Thrm Mrgn 4 | na     | degrees C  | na| na    | na    | na    | 5.000 | 10.000| na  

       

      The end result is that all the fans in the system run at full RPM.  The system itself is fully functional otherwise.  BIOS is at version 2.04.0003, BMC is at 1.22.6890, ME is at 02.01.07.328 and FRU/SDR is at 1.12.  Update was done from the latest EFI package for the board.  The system event log has entries noting the processor thermal margin sensor failures.

       

      I have reflashed all the images individually.  I have verified that the SDR table is getting updated -- I slightly changed the name of the processor thermal sensors and verified that the new name showed up in the RMM sensor table as well as in the output of "ipmitool sensor".
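
For anyone wanting to check the same thing on their own board, here is a minimal sketch that scans "ipmitool sensor" output for sensors whose reading is "na". It assumes the pipe-separated layout shown in the paste above, with the sensor name in the first column and the reading in the second. The function name and the SAMPLE text are made up for illustration and are not part of ipmitool.

```python
# Sample rows copied from the "ipmitool sensor" paste in this thread.
SAMPLE = """\
P1 Therm Margin  | na     | degrees C  | na| na    | na    | na    | na    | na    | na
P1 Therm Ctrl %  | na     | percent| na| na    | na    | na    | 30.000| 50.000| na
DIMM Thrm Mrgn 1 | na     | degrees C  | na| na    | na    | na    | 5.000 | 10.000| na
"""


def unavailable_sensors(text):
    """Return the names of sensors whose reading column is 'na'.

    Hypothetical helper: assumes column 0 is the sensor name and
    column 1 is the current reading, as in the paste above.
    """
    names = []
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        name, reading = fields[0], fields[1]
        if reading.lower() == "na":
            names.append(name)
    return names


if __name__ == "__main__":
    for name in unavailable_sensors(SAMPLE):
        print(name)
```

To run it against a live BMC rather than the sample, the text could come from subprocess.run(["ipmitool", "sensor"], capture_output=True, text=True).stdout, assuming ipmitool is installed and can reach the BMC.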

       

      The system is in a non-Intel chassis.  Fans are connected for system fans 1-3, rear fan, and CPU fans 1 and 2.  When updating the FRU/SDR records, the update script properly identifies it as an "other" chassis and prompts me for the various fan connections.

       

      I'm looking for suggestions on what I might be missing here.

        • 1. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
          Dan_Intel

          Hi,

           

          In order to better understand your issue, please let us know what console you are using to extract this information for the thermal margin.

          • 2. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
            wpmcnamara

            The pasted output is from "ipmitool sensor".  The  pertinent section of the RMM sensor page is below.

             

            P1 Status         | reports the processor's presence has been detected | OK      | 0x0080
            P2 Status         | reports the processor's presence has been detected | OK      | 0x0080
            P1 Therm Margin   | All deasserted | Unknown | Not Available
            P2 Therm Margin   | All deasserted | Unknown | Not Available
            P1 Therm Ctrl %   | All deasserted | Unknown | Not Available
            P2 Therm Ctrl %   | All deasserted | Unknown | Not Available
            P1 ERR2           | All deasserted | OK      | 0x0000
            P2 ERR2           | All deasserted | OK      | 0x0000
            CATERR            | All deasserted | OK      | 0x0000
            P1 MSID Mismatch  | All deasserted | OK      | 0x0000
            CPU Missing       | All deasserted | OK      | 0x0000
            P1 DTS Therm Mgn  | All deasserted | Unknown | Not Available
            P2 DTS Therm Mgn  | All deasserted | Unknown | Not Available
            P2 MSID Mismatch  | All deasserted | OK      | 0x0000
            P1 VRD Hot        | All deasserted | OK      | 0x0000
            P2 VRD Hot        | All deasserted | OK      | 0x0000
            P1 MEM01 VRD Hot  | All deasserted | OK      | 0x0000
            P1 MEM23 VRD Hot  | All deasserted | OK      | 0x0000
            P2 MEM01 VRD Hot  | All deasserted | OK      | 0x0000
            P2 MEM23 VRD Hot  | All deasserted | OK      | 0x0000
            DIMM Thrm Mrgn 1  | All deasserted | Unknown | Not Available
            DIMM Thrm Mrgn 2  | All deasserted | Unknown | Not Available
            DIMM Thrm Mrgn 3  | All deasserted | Unknown | Not Available
            DIMM Thrm Mrgn 4  | All deasserted | Unknown | Not Available
            • 3. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
              Doc_SilverCreek

              What is the part number of the CPUs?

               

              It should be something like SR0L7 or QBxx.

              • 4. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                wpmcnamara

                Apologies for the delay.  I had to wait for a time when I could tear the machine down.

                 

                The CPU part numbers are SR0L7.  While I had the system apart, I put these two CPUs in an S2600GZ board to check whether they report processor thermal margin values.  They do, as expected.  I also took the opportunity to try a pair of E5-2609s (SR0LA) in the S2600CO4 board.  These processors have also been verified to report thermal margin values elsewhere.  As with the others, in the S2600CO4 board the sensors report as unavailable.

                 

                This would certainly point the finger at the S2600CO board as being misconfigured or having something wrong with it.

                • 5. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                  Doc_SilverCreek

                  You could try reflashing the complete firmware stack, especially the ME, BMC and SDRs.

                   

                  I would also restore the BMC defaults, which can be done with the syscfg -rbfd (I think) command. (You may need to run syscfg /? to get the help and find the command that restores the BMC defaults.)

                   

                  I would not give this very high odds of working, as it is more likely a damaged CPU pin or damage on the motherboard.

                  • 6. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                    wpmcnamara

                    I am fairly certain it is not damage per se, as I have two boards that behave exactly the same way.  However, it might be the boards.  Prompted by your damage comment, I was looking at the second board in detail, just giving it a good looking over.  Turns out it is an engineering sample board.  Turns out both boards are.  Now, I wouldn't normally expect that to be the cause.  In the past, the engineering sample equipment we have gotten from Intel has been fully functional, if not at its final hardware rev.  Usually it just means that we got it before it had completed certifications.  I suppose that these boards could have not been fully functional yet, or that the ME connection to the processors could have been changed slightly such that release firmware expects things to be different.  If that is the case, it will be disappointing.  I hate to trash a couple of otherwise functional boards.

                     

                    I will give the BMC reset a try, just to cover all the bases.

                    • 7. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                      Dan_Intel

                      Hello,

                      Engineering samples are meant for OEMs (Original Equipment Manufacturers), and Intel provides them for testing purposes only. They may lack features that the production units will have. We strongly recommend returning these to the place of purchase or your Intel representative and requesting production units instead.

                      • 8. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                        wpmcnamara

                        Well, that would require returning them directly to Intel as, at the time we purchased these boards, we were an Intel OEM.  In cleaning up recently, these boards were discovered.  We have a number of other engineering sample systems we use for various purposes in our hardware lab and it was decided to see if these boards could be put to use.  It seems the answer is "sort of" as they appear to be fully functional, other than the broken CPU thermal sensors.  While I expected some features to be missing or to not work, something as basic as CPU thermal sensors wasn't expected.  They will just have to be used where the extra fan noise is not an issue.  Not ideal, but not worth putting much effort into either.

                        • 9. Re: S2600CO4 with (2) E5-2643 not reporting processor thermal margin
                          Dan_Intel

                          We at Intel appreciate your feedback on this matter. Thank you for taking the time to communicate this issue to us.