<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:clearspace="http://www.jivesoftware.com/xmlns/clearspace/rss" xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0">
  <channel>
    <title>Intel Communities: Message List - Temperature Sensors</title>
    <link>http://communities.intel.com/community/marc?view=discussions</link>
    <description>Most recent forum messages</description>
    <language>en</language>
    <pubDate>Thu, 31 Mar 2011 15:15:44 GMT</pubDate>
    <generator>Jive SBS 5.0.2.0  (http://jivesoftware.com/products/clearspace/)</generator>
    <dc:date>2011-03-31T15:15:44Z</dc:date>
    <dc:language>en</dc:language>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/119762?tstart=0#119762</link>
      <description>&lt;!-- [DocumentBodyStart:10d9419b-a9fb-4fd0-8be8-3657ae1f04ef] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Devendra,&lt;/p&gt;&lt;p&gt;Looking at the quick-and-dirty plots you attached, I see that the noise level looks consistent (the scale of the plots varies, so it only seems wider on some) and that the magnitude and noise are not surprising.&amp;nbsp; I believe you'll find the noise is Gaussian white noise; you can run a statistical normality test and an FFT to confirm. You report the higher value as the one closer to the core, which is as expected.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;What are you using as the workload?&amp;nbsp; Workloads that access memory beyond the cache will impact routers on the path to the iMC and will stall the pipeline waiting for memory.&amp;nbsp; Also, how are you observing the value of the sensor?&amp;nbsp; Since you were not reading from the MCPC, how were you capturing the values?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;The comment by Andrea from Univ Bologna shows a systematic approach to characterizing the sensors.&amp;nbsp; I look forward to their results.&lt;/p&gt;&lt;p&gt;-Jim&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:10d9419b-a9fb-4fd0-8be8-3657ae1f04ef] --&gt;</description>
      <pubDate>Thu, 31 Mar 2011 15:15:04 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/119762?tstart=0#119762</guid>
      <dc:date>2011-03-31T15:15:04Z</dc:date>
      <clearspace:dateToText>2 years, 1 month ago</clearspace:dateToText>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/119724?tstart=0#119724</link>
      <description>&lt;!-- [DocumentBodyStart:92a94b94-d7c2-456e-91d5-1f9aa0def135] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Hi Yang and Devendra,&lt;/p&gt;&lt;p&gt;At the University of Bologna we have worked extensively over the last months on characterizing and reverse engineering the thermal sensors of the SCC.&lt;/p&gt;&lt;p&gt;First of all, in our tests, under uniform stress conditions and identical sensor settings, we see large differences in the sensor outputs. The differences are not clustered, so they could be taken to be noise. In addition, a series of readings shows high noise in the output of the sensors.&lt;/p&gt;&lt;p&gt;We then executed a series of stress tests and observed a negative sensitivity to temperature.&lt;/p&gt;&lt;p&gt;Thus, assuming a linear dependency, the sensor output is SO = A + BT, where T is the temperature and A and B are coefficients. A is positive and B is negative.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;That said, we performed a test on the time window (a.k.a. integration time)&amp;nbsp; of each sensor. We discovered that when the time window is increased while the core stress is kept constant, the counter value increases, then overflows, then increases and overflows again. Increasing the time window increases the error but also increases the absolute value of the reading (if you manually account for the overflow), so a longer time window improves the signal-to-noise ratio.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;We are also currently developing a strategy to characterize the thermal sensors using a combination of stress patterns and least-squares optimization. 
We are trying to make it consistent and will soon make it public to the marc community.&lt;/p&gt;&lt;p&gt;The main objective of this strategy is to overcome the problem of the two-point characterization suggested in ReadingSensor.pdf. With that two-point approach you are forced to assume that, at maximum power consumption, all the cores are at the same maximum temperature. We cross-checked this with a HotSpot simulation and saw that the center cores should be hotter in this case, which leads to a loss of accuracy.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;As early results, we assembled a set of videos of the thermal response of the SCC under different stress patterns. Here is the link:&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;a class="jive-link-external-small" href="http://www.youtube.com/watch?v=Ic43Cd4yn7s&amp;amp;feature=related" target="_blank"&gt;http://www.youtube.com/watch?v=Ic43Cd4yn7s&amp;amp;feature=related&lt;/a&gt;&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Of course this is still work in progress, which is why we have not released it yet.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Best,&lt;/p&gt;&lt;p&gt;Andrea&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:92a94b94-d7c2-456e-91d5-1f9aa0def135] --&gt;</description>
      <pubDate>Thu, 31 Mar 2011 09:13:40 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/119724?tstart=0#119724</guid>
      <dc:date>2011-03-31T09:13:40Z</dc:date>
      <clearspace:dateToText>2 years, 1 month ago</clearspace:dateToText>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/118280?tstart=0#118280</link>
      <description>&lt;!-- [DocumentBodyStart:0c18e71b-eec1-4a06-9c53-d43847911cf6] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Hello Everyone,&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Partly to help others who are trying to make sense of temperature sensors, and partly to get a quick sanity check on the results, I am attaching some graphs that I printed while measuring the thermal response of the system.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;(1). All cores were cooled by resetting the SCC and not booting for about 20 minutes. (I know there is a better way to do it, I am working on it).&lt;/p&gt;&lt;p&gt;(2). Assuming that the cores reached a stable temperature, I ran a CPU-intensive application on core00 and core01 (Tile 00). The application consistently consumes ~95% CPU for about 20 minutes or so.&lt;/p&gt;&lt;p&gt;(3). Measured the tile sensors (or rather, counts).&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;We know from the documents elsewhere on marc that count is sort of inversely proportional to temperature. So, a lower count will mean a higher temperature.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;(4). Measurements were started 3 minutes before the application was activated, to capture the background noise. The measurement continued till after the application finished.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;(5). We do not have any control over the sensors, and each sensor will have a bias and a noise. Can someone help with how to model this noise?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;I have still not done any cleaning of data, and what I present is just quick graphing of the sensor reads. 
In each graph, there are two plots, one for the sensor close to the cores (the upper one) and the other for the sensor close to the network switch. The graph colors are *not* consistent. Again, before putting a lot of effort into this, I would like a sanity check on what I am doing.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;In the attachment, there are several jpeg plots, with the name Tilexy.jpg, where x and y are the co-ordinates of the tile where the sensors are located. Again, the load is running only on tile 00 (cores 0, 1). The load starts executing at exactly the same time on both cores.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;So, here are my questions:&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;(a). I see an observable difference in the counter readings only for Tile 00. Also, I had expected the "difference" to be rather large.&lt;/p&gt;&lt;p&gt;(b). How large would "large" be? Ah, the Intel document explains how to set the starting point for calibration, but I will need at least one more co-ordinate to determine the slope. So, I have no idea how "large" large actually is.&lt;/p&gt;&lt;p&gt;(c). Can someone who has more experience tell me if the results are sane? The readings were done by a script that I wrote, and I compared my results to the Intel-provided "sccTherm". 
sccTherm does not provide time-stamps, nor does it provide any control over the frequency of the recorded observations; it also uses the PCIe connection, which I think is best not used.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;If you need numerical data, please drop me a line.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Thanks for any help.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Devendra Rai&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:0c18e71b-eec1-4a06-9c53-d43847911cf6] --&gt;</description>
      <pubDate>Mon, 14 Mar 2011 22:35:26 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/118280?tstart=0#118280</guid>
      <dc:date>2011-03-14T22:35:26Z</dc:date>
      <clearspace:dateToText>2 years, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>1</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/117840?tstart=0#117840</link>
      <description>&lt;!-- [DocumentBodyStart:6d18d9c2-f9f5-40e2-b880-1a150cba2acc] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Thank you very much for your reply. I read the counters when all tiles run at the same frequency and use sccTherm &amp;#8211;initTherm to initialize the refresh rate first, so I think the settings should be the same for all tiles. I have this question because I found that the difference between the readings of two counters can be very large. For example, the difference between the two counters on tile 0 can be as large as 500 when there are no programs running on tile 0 (the frequency is 533 MHz and the sampling rate is 2.56 us). On the other hand, the difference in sensor readings for a tile running at two different frequencies is not that significant. For example, the difference in sensor readings between tile 0 running at 533 MHz and at 100 MHz is less than 50. It is a little strange to me that the temperature difference caused by a power change is much smaller than the one caused by location. I was wondering if someone has had a similar experience.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Best,&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Yang&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:6d18d9c2-f9f5-40e2-b880-1a150cba2acc] --&gt;</description>
      <pubDate>Tue, 08 Mar 2011 16:39:44 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/117840?tstart=0#117840</guid>
      <dc:date>2011-03-08T16:39:44Z</dc:date>
      <clearspace:dateToText>2 years, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>3</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/117809?tstart=0#117809</link>
      <description>&lt;!-- [DocumentBodyStart:50a8b933-4725-42b3-ac0f-696006022a08] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;Hello&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;Yes, I guess so. The count is sort of inversely proportional to temperature. When you read the count, make sure that other parameters like your register refresh rate remain constant. Otherwise, you will be comparing apples to oranges.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;HTH.&lt;/p&gt;&lt;p&gt;&lt;br/&gt;Devendra&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:50a8b933-4725-42b3-ac0f-696006022a08] --&gt;</description>
      <pubDate>Tue, 08 Mar 2011 09:37:40 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/117809?tstart=0#117809</guid>
      <dc:date>2011-03-08T09:37:40Z</dc:date>
      <clearspace:dateToText>2 years, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>4</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/117770?tstart=0#117770</link>
      <description>&lt;!-- [DocumentBodyStart:58d59cc4-c629-48ee-87ad-0a6da9039940] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I am wondering if the reading of one temperature sensitive counter is comparable to that of other temperature sensitive counters? For example, if the sensor readings on tile 0 are smaller than readings on tile 1, can I say tile 0 is hotter than tile 1?&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:58d59cc4-c629-48ee-87ad-0a6da9039940] --&gt;</description>
      <pubDate>Tue, 08 Mar 2011 04:04:20 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/117770?tstart=0#117770</guid>
      <dc:date>2011-03-08T04:04:20Z</dc:date>
      <clearspace:dateToText>2 years, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>5</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/117545?tstart=0#117545</link>
      <description>&lt;!-- [DocumentBodyStart:e0833f17-7f10-4a38-9852-d0fcd2003063] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;There are no temperature sensors on the SCC chip or board. We do have counters on the chip that are temperature sensitive. You have to do the calibration yourself. The file "How to Read the Thermal Sensor Registers" describes how to use and read these counters. It also provides a link to an sccTherm program that we used internally. Is there any additional information we can provide?&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p&gt;&lt;a class="jive-link-wiki-small" data-containerId="2267" data-containerType="14" data-objectId="5997" data-objectType="102" href="http://communities.intel.com/docs/DOC-5997"&gt;http://communities.intel.com/docs/DOC-5997&lt;/a&gt;&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:e0833f17-7f10-4a38-9852-d0fcd2003063] --&gt;</description>
      <pubDate>Fri, 04 Mar 2011 18:22:08 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/117545?tstart=0#117545</guid>
      <dc:date>2011-03-04T18:22:08Z</dc:date>
      <clearspace:dateToText>2 years, 2 months ago</clearspace:dateToText>
      <clearspace:replyCount>6</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/113112?tstart=0#113112</link>
      <description>&lt;!-- [DocumentBodyStart:1b90aa58-c54b-46c8-9516-d9007d1e957f] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p style="text-align: left;"&gt;First of all, thanks for all the useful information and all the well-done documents and suggestions you keep posting in this community.&lt;/p&gt;&lt;p style="text-align: left;"&gt;We have access to a local SCC system. As you suggest, to account for changes in the surrounding temperature it would probably be useful to probe the temperature inside the case. Do you know if there are any temperature sensors already built into the&amp;nbsp; Rocky Lake board that could be used for this purpose?&amp;nbsp; I will run a few tests on the thermal sensors in the coming weeks; our research topic is thermal and energy management, so using the thermal sensors in a reliable way is of primary importance for us.&lt;/p&gt;&lt;p style="text-align: left;"&gt;Regarding the performance counters, I'm now trying to use them from inside the Linux kernel. I'm patching it to create a kernel module that, through a set of I/O control calls, can start, stop, and read them. I am still debugging it. Unfortunately I'm not an expert on PAPI; I have never used it before. In the past I have always worked with my own modifications to the Linux kernel to use the performance counters.&amp;nbsp; I like to have low-level control over what I see.&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:1b90aa58-c54b-46c8-9516-d9007d1e957f] --&gt;</description>
      <pubDate>Fri, 14 Jan 2011 14:40:06 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/113112?tstart=0#113112</guid>
      <dc:date>2011-01-14T14:40:06Z</dc:date>
      <clearspace:dateToText>2 years, 4 months ago</clearspace:dateToText>
      <clearspace:replyCount>7</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/113100?tstart=0#113100</link>
      <description>&lt;!-- [DocumentBodyStart:045bfbbf-d8a7-4a5a-b8dc-a187656b002b] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p style="margin-left: 0.5in;"&gt;Are you running remotely? It's very difficult even when running locally to get quantitative results from this calibration. You have to calibrate with the chip in a temperature-controlled environment.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p style="margin-left: 0.5in;"&gt;The voltage changes do affect the calibration, but I think it&amp;#8217;s small compared to other effects. Random changes in the ambient temperature of the data center may be even more significant. When you are working remotely, the sensor registers are useful, but mostly in a qualitative sense.&lt;/p&gt;&lt;p style="min-height: 8pt; height: 8pt; padding: 0px;"&gt;&amp;nbsp;&lt;/p&gt;&lt;p style="margin-left: 0.5in;"&gt;Are you also using the performance counters? Are you using PAPI? Did you make a new kernel with PAPI included? If you have information to report about how to use the performance counters, that would be much appreciated. Did you go back in the PAPI archives for the last version which supported P54C?&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:045bfbbf-d8a7-4a5a-b8dc-a187656b002b] --&gt;</description>
      <pubDate>Fri, 14 Jan 2011 06:26:52 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/113100?tstart=0#113100</guid>
      <dc:date>2011-01-14T06:26:52Z</dc:date>
      <clearspace:dateToText>2 years, 4 months ago</clearspace:dateToText>
      <clearspace:replyCount>8</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
    <item>
      <title>Re: Temperature Sensors</title>
      <link>http://communities.intel.com/message/113018?tstart=0#113018</link>
      <description>&lt;!-- [DocumentBodyStart:63f586fa-1667-44fd-8d90-6fd34c957355] --&gt;&lt;div class="jive-rendered-content"&gt;&lt;p&gt;I have a question. I read the ReadSensor.pdf document and I am worried about how the supply voltage affects the relation between the temperature sensor reading and the real temperature. My question is: if I do not change the voltage explicitly, does the tile voltage change anyway? Should I take it into account, or can I forget about it?&lt;/p&gt;&lt;p&gt;I ask because when I execute sccBMC -c status with all the cores running Linux idle, I see small fluctuations in the voltage level. Are these significant in terms of error in the interpretation of the temperature sensor value?&lt;/p&gt;&lt;p&gt;Thanks,&lt;/p&gt;&lt;p&gt;Andrea&lt;/p&gt;&lt;/div&gt;&lt;!-- [DocumentBodyEnd:63f586fa-1667-44fd-8d90-6fd34c957355] --&gt;</description>
      <pubDate>Thu, 13 Jan 2011 16:34:01 GMT</pubDate>
      <author>webadmin@intel.com</author>
      <guid>http://communities.intel.com/message/113018?tstart=0#113018</guid>
      <dc:date>2011-01-13T16:34:01Z</dc:date>
      <clearspace:dateToText>2 years, 4 months ago</clearspace:dateToText>
      <clearspace:replyCount>9</clearspace:replyCount>
      <clearspace:objectType>0</clearspace:objectType>
    </item>
  </channel>
</rss>

