9 Replies Latest reply on Apr 15, 2012 2:24 PM by kolotyluk

    Power Dissipation Imbalance

    kolotyluk

      I have an S5520SC motherboard with two Xeon 5580 processors.

       

      When running CoreTemp I notice that processor 1 tends to run about 10C hotter than processor 0. Presumably this is because I have a water cooling rig and processor 1 is downstream of processor 0. Processor 1 seems to average about 60C and max out at 80C, while processor 0 maxes out at about 60C.

       

      Tj.max is 97C

       

      Under load both processors consume about 100W (not more) and processor 0 maxes out under 70C, but processor 1 hits 97C and does not go higher.
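For context, tools such as Core Temp derive core temperatures from the on-die digital thermal sensor, which reports how far a core is below Tj.max, so the useful number is the remaining headroom. A minimal Python sketch of that arithmetic, using the Tj.max of 97C quoted above; the function name is hypothetical, and 70C is used as an upper bound for processor 0 since the post only says "under 70C":

```python
TJ_MAX_C = 97  # Tj.max as quoted in the post

def headroom_c(core_temp_c, tj_max_c=TJ_MAX_C):
    """Return the degrees Celsius remaining before a core reaches Tj.max."""
    return tj_max_c - core_temp_c

# Load readings from the post (70 is an assumed upper bound for processor 0).
readings = {"processor 0": 70, "processor 1": 97}

for name, temp_c in readings.items():
    print(f"{name}: {temp_c} C, headroom {headroom_c(temp_c)} C")
```

A headroom of zero is consistent with a core sitting at its throttle point, which would explain a reading that reaches 97C and does not go higher.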

       

      Should I have any concerns about processor 1 failing prematurely? Do I need a better cooling solution?

       

      Cheers, Eric

        • 1. Re: Power Dissipation Imbalance
          Adolfo_Intel

          Make sure that the temperature you are reading is not the core temperature, but the CPU temperature itself.

          If the CPU temperature is higher than 67 degrees Celsius, the processor is overheating.

          You can double check this at:

          http://ark.intel.com/products/37113/Intel-Xeon-Processor-W5580-(8M-Cache-3_20-GHz-6_40-GTs-Intel-QPI)

           

          It shows as “TCase: 67”

           

          I would suggest installing the latest BIOS version for your motherboard to make sure that the temperature readings are the right ones.

          • 2. Re: Power Dissipation Imbalance
            kolotyluk

            OK, now my heating problems seem worse.

             

            Adolfo, what was I supposed to find at the link you provided? I do not find anything relevant there.

             

            Is there some way to read the CPU temperature from Windows? Is there some application or utility I can run?

             

            For some reason, CPU #1 consistently draws more power (10 or 20 Watts more) than CPU #0, even though the system is idle.

             

            All the cores on CPU #1 seem to be running close to 100 degrees Celsius now, even though they are idle. What is up with that? Why are they running so hot when they are doing no work? Why is CPU #1 drawing so much power when the system is idle?

             

            Does this indicate that CPU # 1 is damaged and malfunctioning?

             

            Cheers, Eric

            • 3. Re: Power Dissipation Imbalance
              kolotyluk

              I turned off my system this morning and gave it 6 hours to cool down. After powering on my system the cores on Processor #1 quickly ramped up to almost 100 degrees again, and stayed that way for about 15 minutes, but now they seem to have stabilized at around 65 - 70. Every now and then they seem to spike to 100 again and I cannot figure out why. On Processor #0 the cores seem to stay around 40 - 45.

               

              I still cannot figure out why Processor #1 consistently draws 20 more Watts than Processor #0. Is there some reason why a hotter chip would draw more power for the same work load, or is it simply hot because it is drawing more power? Could there be some sort of voltage regulation problem, and is there a way for me to check that?

               

              I enabled the Overheating Protection on CoreTemp to shut down my system if things get critical again.

              • 4. Re: Power Dissipation Imbalance
                kolotyluk

                Some more thoughts I am having.

                 

                My Xeon processors are cooled with a water cooling system. Some years ago the bearing started failing in the water pump so I sent my system back to the factory and they replaced it with a newer water cooling rig.

                 

                Now when I got the first water cooling system I noticed they had slathered a thick layer of thermal grease on the water blocks. I promptly removed this mess, polished the surfaces of the water blocks and CPUs, then applied a small quantity of Arctic Ceramic thermal paste, per the directions. All the literature I read said that with thermal grease/paste, less is more. I remember having to remove the water blocks once because my S5520SC failed when a BIOS bug fried the 5520 chipset. I was rather proud when I removed the water block to see a perfectly thin, uniform footprint of thermal paste, exactly like all the pictures showed it should look.

                 

                I suspect when I got the new water cooler back from the factory they just slathered on another thick layer of grease - which does not make as good a thermal bridge as my method. For all I know they did not do it evenly and so one processor has a worse thermal bridge than the other.

                 

                I was planning on replacing my two 5580 processors with 5690 processors some day. Now I may have to do that sooner rather than later. When I do I will be careful to do the thermal paste properly - and never let the factory idiots do it again.

                • 5. Re: Power Dissipation Imbalance
                  kolotyluk

                  Now this is really interesting...

                   

                  So I fire up my DivX to transcode a video file. For some reason Windows is running all the cores on the hot processor at 100% CPU, while it is only scheduling the cores/threads on the cool CPU at about 25% to 50%.

                   

                  Obviously Windows does not have the same capabilities as programs like CoreTemp and SpeedFan to detect core temperatures. Or if it does, the Microsoft kernel developers know absolutely nothing about hardware or physics, because if they did, then the kernel would be smart enough to assign more work to the coolest running cores. An exceedingly clever kernel developer would write code to not schedule any work to a hot processor and find some way to notify the user or administrator that there is a problem. On the other hand, why is the BMC not smart enough to deal with this situation responsibly?

                   

                  In any case, I find this an astonishing revelation as I cannot imagine why Windows seems to be favoring one CPU and abusing the other???

                   

                  Is there some other explanation for this? Is this an S5520SC BIOS problem?

                  • 6. Re: Power Dissipation Imbalance
                    Adolfo_Intel

                    Keep in mind that the usage of the cores or processors on a server system is determined by the operating system itself.

                    So basically the operating system determines what core or what processor to use for a particular task.

                    It is very common to see one processor working harder than the other processor, since that is the way the operating system assigns tasks to different processors or cores.

                     

                    Some newer applications (software) are capable of assigning tasks to multiple cores or processors at the same time, but that is totally handled at a software level, not at a hardware level.

                     

                    Regarding the core temperatures, please keep in mind that you do not need to worry about the core temperatures; you need to pay attention to the CPU temperature itself. Please read the following information:

                    The TCase for the Intel® Xeon® processor W5580 is 67 degrees Celsius.

                     

                    The TCase is a number established by Intel® as a point of reference in order to understand what could be expected as per normal processor temperature.

                     

                    Anything at or below the Tcase is the expected temperature of the processor in normal use, meaning anything that doesn't stress the processor (watching movies, burning CDs, browsing the internet, creating documents, etc.). When the processor is stressed, meaning that you are running heavy applications that use the CPU at 100%, the temperature will go beyond the Tcase. It can reach 80 to 85 degrees and the processor will still be OK. The cooling fan is in charge of keeping the temperature there.

                     

                    If the processor temperature reaches 100 degrees or more, it will send a signal to the motherboard to shut down to prevent major damage, and most likely it won't be possible to turn the computer back on until it cools down.

                     

                    The normal processor temperature will depend on the chassis type, the hardware involved and the location of the computer, and it usually is lower than the Tcase.

                    • 7. Re: Power Dissipation Imbalance
                      kolotyluk

                      I would very much appreciate it, Adolfo, if you could tell me the best way to determine the actual CPU temperature rather than just talk about it.

                       

                      If you do not know how to answer this, then please just say so.

                       

                      Cheers, Eric

                      • 8. Re: Power Dissipation Imbalance
                        kolotyluk

                        More interesting facts...

                         

                        According to Core Temp (the Windows utility) the clock on CPU #0 is averaging 2.3 GHz, while the clock on CPU #1 is averaging about 1.5 GHz.

                         

                        Seems to me the chip logic is deliberately scaling back the clock to limit the power utilization. This makes good design sense.

                         

                        Neither processor is using more than 100 watts, even though they are rated at 130. This seems to be an indication that they are not receiving adequate cooling.

                         

                        Currently both CPUs are using about 80 watts, but CPU #1 is running over 20 degrees hotter than CPU #0 - there is definitely something wrong with CPU #1.
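One rough way to put a number on this: when two CPUs dissipate about the same power, the extra temperature divided by that power gives the extra thermal resistance somewhere between the die and the coolant. A minimal Python sketch using the 80 watt and 20 degree figures above; the function name is hypothetical, and the calculation assumes the two sets of core readings are comparable and ignores any warming of the coolant between the two water blocks:

```python
def extra_thermal_resistance(delta_t_c, power_w):
    """Return the extra thermal resistance in degrees Celsius per watt."""
    if power_w <= 0:
        raise ValueError("power_w must be positive")
    return delta_t_c / power_w

# Figures from the post: about 80 W per CPU, CPU #1 about 20 degrees hotter.
extra = extra_thermal_resistance(delta_t_c=20.0, power_w=80.0)
print(f"Extra thermal resistance on CPU #1: {extra:.2f} C/W")  # 0.25 C/W
```

An extra quarter of a degree per watt would point toward the thermal interface or water block contact on CPU #1 rather than the silicon itself, although it does not rule out a sensor or voltage issue.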

                        • 9. Re: Power Dissipation Imbalance
                          kolotyluk

                          More Data . . .

                           

                          OK, I installed Intel Active System Console - but unfortunately it does not seem to report CPU temperatures anywhere.

                           

                          The Baseboard temp is 43 degrees Celsius.

                           

                          The water pumps on both CPUs are running at a nominal rate of 4,000 RPM, and all the fan speeds seem reasonable.

                           

                          It is interesting to note that Vccp for CPU #0 is 1.24 Volts, while it is 0.92 for CPU #1. Why that difference? Is this a CPU problem or a Baseboard problem?

                           

                          I checked with the company that makes the water cooling system, and they said that even though one CPU is running downstream of the other CPU, a temperature difference of 20 degrees Celsius is very unusual, and any difference should be closer to 5 degrees.
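The vendor's point can be sanity-checked with the steady-state heat balance, where the coolant temperature rise across one water block is the power divided by the product of mass flow and specific heat. A minimal Python sketch; the 1 litre per minute flow rate is purely an assumption for illustration (the actual flow of this loop is not known), the coolant is assumed to be plain water, and the function name is hypothetical:

```python
WATER_CP_J_PER_KG_K = 4186.0   # specific heat of water, approximate
WATER_DENSITY_KG_PER_L = 1.0   # approximate density of water

def coolant_rise_c(power_w, flow_l_per_min,
                   density_kg_per_l=WATER_DENSITY_KG_PER_L,
                   cp_j_per_kg_k=WATER_CP_J_PER_KG_K):
    """Return the coolant temperature rise across one block in degrees Celsius."""
    if flow_l_per_min <= 0:
        raise ValueError("flow_l_per_min must be positive")
    mass_flow_kg_per_s = flow_l_per_min * density_kg_per_l / 60.0
    return power_w / (mass_flow_kg_per_s * cp_j_per_kg_k)

# 100 W is the load figure from the first post; 1 L/min is an assumed flow rate.
rise = coolant_rise_c(power_w=100.0, flow_l_per_min=1.0)
print(f"Coolant warms by about {rise:.1f} C before reaching the second CPU")
```

Under these assumptions the water reaching the downstream CPU is only a degree or two warmer, so the loop order alone would not account for a 20 degree gap.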