Regarding setting the STPCLK bit to zero: after changing it, did you try reading it back?
I tried the same thing (using Send Flit via SystemIF) and the reading on the meter did not change, so I read the value in register GLCFG0 and found that the STPCLK bit was still 1.
P.S. How did you access the STPCLK bit from your RCCE program?
Thank you very much for your reply! I can set STPCLK to 0, and if I read it back immediately, it is zero. The default value of the configuration register is 348df8 on my machine, and it changes to 348db8 after I set STPCLK to 0. However, if I run a program on a core that I turned off earlier, STPCLK automatically changes back to 1. Have you seen similar behavior?
I use mmap in my RCCE program to change the core configuration register values. It is very similar to the second example, readTileID.c, in the SCCProgrammersGuide (pages 42-43). (http://communities.intel.com/docs/DOC-5684)
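For readers who want to try the mmap approach, here is a minimal sketch of the read, clear, read-back sequence described above. It is not the code from the Programmer's Guide: the function name clear_reg_bits is made up for illustration, and the device node, page base address, and register offset are left as parameters because the real values have to come from the SCC documentation.

```c
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Clear the bits in `mask` in the 32-bit register at byte offset `reg_off`
 * inside the page that starts at `page_base` of the file or device `path`.
 * `page_base` must be page aligned and `reg_off` 4-byte aligned.
 * Stores the value seen before and after the write.
 * Returns 0 on success, -1 on failure.
 * (Hypothetical helper; path and offsets are assumptions supplied by the caller.)
 */
int clear_reg_bits(const char *path, off_t page_base, size_t reg_off,
                   uint32_t mask, uint32_t *before, uint32_t *after)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || reg_off % 4 != 0 || reg_off + 4 > (size_t)page)
        return -1;

    int fd = open(path, O_RDWR | O_SYNC);
    if (fd < 0)
        return -1;

    void *map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, page_base);
    close(fd);                      /* the mapping stays valid after close */
    if (map == MAP_FAILED)
        return -1;

    volatile uint32_t *reg = (volatile uint32_t *)((char *)map + reg_off);
    *before = *reg;                 /* read the current value */
    *reg = *before & ~mask;         /* clear only the requested bits */
    *after = *reg;                  /* read back to confirm the write */

    munmap(map, (size_t)page);
    return 0;
}
```

Reading the register back right after the write, as this does, only confirms the write itself. It will not catch a later overwrite by other software.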
You have three configuration bits that can control the clock. And if you want to shut off power to a core, you can use all three.
You have the STPCLK bit in the Core Configuration Register (look in Table 7 of the SCC EAS), and you have two bits in the L2 Cache Configuration Register. These are the two clock gating bits called stopl2ccclk and stopl2arrayclk (look in Table 9 of the SCC EAS).
STPCLK will gate the clock to the CPU. Even with this gating, you still have leakage current and hence some leakage power. But if all you use is STPCLK, you'll have both dynamic power and leakage power from L2. By setting the other two bits, you can avoid dynamic power from L2.
I haven't actually tried this. My answer is from reading the SCC EAS and talking to some hardware folks a couple of offices over.
A difference of 20 W in power with the SCC at the same frequency and different voltages is reasonable (P = C*V**2*f + I_leak*V). It's due to both dynamic and leakage power. You can avoid the dynamic power contribution by gating the clock.
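To make the formula above concrete, here is a small sketch of the two terms. The function names are made up for illustration, and the model is deliberately simple: it treats the effective capacitance and the leakage current as constants, whereas on real silicon leakage current itself varies with voltage and temperature.

```c
/* Dynamic (switching) power: C * V^2 * f. */
double dynamic_power(double c_eff, double volts, double freq_hz)
{
    return c_eff * volts * volts * freq_hz;
}

/* Leakage power: I_leak * V. It does not depend on the clock. */
double leakage_power(double i_leak, double volts)
{
    return i_leak * volts;
}

/*
 * Total power under the simple model above.
 * Pass clock_gated != 0 to model a gated clock: the dynamic term
 * disappears and only the leakage term remains.
 */
double total_power(double c_eff, double i_leak, double volts,
                   double freq_hz, int clock_gated)
{
    double p = leakage_power(i_leak, volts);
    if (!clock_gated)
        p += dynamic_power(c_eff, volts, freq_hz);
    return p;
}
```

At a fixed frequency, raising the voltage increases both terms, the dynamic one quadratically. Gating the clock removes only the dynamic term.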
I have tried running the RCCE program to read the values, and I am getting a different value each time I run it for the same core (either 340df8 or 348df8). Have you observed the same thing?
When I run it on all 48 cores the results are mixed as well: some cores print 340df8 and some print 348df8.
As I understand it, 340df8 means STPCLK is 0 and
348df8 means STPCLK is 1.
Thanks for sharing your information. I have only seen two readings, 348db8 and 348df8, and I have never seen 340df8. STPCLK is the 6th bit in the configuration register according to Table 7 in the EAS doc, so when the clock is off, I think the value should be 348db8, not 340df8.
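A quick way to settle which bit changed between two readings is to XOR them. The sketch below (helper names are made up for illustration) shows that 348df8 and 348db8 differ only in bit 6, counting from 0, while 348df8 and 340df8 differ only in bit 15. It only tells you which bits differ. What each bit means still has to come from Table 7 of the EAS.

```c
#include <stdint.h>

/* Mask of the bits that differ between two register readings. */
uint32_t diff_mask(uint32_t a, uint32_t b)
{
    return a ^ b;
}

/*
 * Index of the lowest set bit in mask (0 = least significant),
 * or -1 if mask is 0.
 */
int lowest_set_bit(uint32_t mask)
{
    if (mask == 0)
        return -1;
    int i = 0;
    while ((mask & 1u) == 0) {
        mask >>= 1;
        i++;
    }
    return i;
}
```

For example, diff_mask(0x348df8, 0x348db8) is 0x40 and diff_mask(0x348df8, 0x340df8) is 0x8000, so the two readings reflect changes in two different fields.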
Thanks for pointing this out! I followed your instructions and set stopl2ccclk and stopl2arrayclk to 1 to clock gate them (it is a little odd that setting them to 1 disables the clocks). What I found is that the power consumption of the SCC goes up quite a bit, by more than 10 W compared to not gating them. The power consumption then equals the power drawn when no OS is running on the cores. It seems that gating them completely removes the OS from the cores: I cannot ssh to the cores, and the power meter turns gray in the GUI. Setting these bits back to 0 does not bring the power down or the OS back; I need to reload the OS from scratch. Has anyone had a similar experience?
P.S. Is it possible to set the voltage to 0 by using some API?
Thank you very much!
"power consumption of the SCC goes up"
Increased power consumption with the clock stopped doesn't make sense to me. Are you certain power goes "UP"?
By "power meter" do you mean the total reported power, or the CPU utilization meters? (they indicate core utilization, not power).
Did you stop the core before stopping the L2 cache clocks? If not, wouldn't you expect that to cause an OS crash?
The core will time out on its request for data.
"I found when the SCC is running at the same frequency but at different voltage (without running any program),
the power consumption difference can be as high as 20W. I was wondering if the difference could
be the leakage power?"
There may be some increase in leakage, but I believe most of the increase is active power you are wasting by running at a higher voltage than necessary.
You may not be running a program, but the OS is running the idle thread and the clock. Not very busy, since most of the core is not working, but the clocks are running and some work is being done.
In CMOS circuits active power is proportional to frequency and the square of the voltage. Running at higher frequency requires a higher voltage, but going higher than necessary just burns extra power as your experiment is doing.
See the ISSCC paper for a plot of the proportion of leakage to active power at various voltages.
I checked with the SCC team in Germany and they found the source of the STPCLK bit reset. The network driver doesn't do a read/modify/write of the control register when it sets the interrupt bit for the core, so it implicitly resets the STPCLK bit when sending a packet to the core.
They also report seeing a clear drop in power on setting the STPCLK (98W down to 60W) as expected.
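The difference between a blind write and a read/modify/write can be sketched as follows. This is an illustration only, not the actual network driver code: the interrupt bit position (IRQ_MASK) is a hypothetical placeholder, STPCLK is taken to be bit 6 based on the readings earlier in this thread, and the function names are made up.

```c
#include <stdint.h>

#define STPCLK_MASK 0x00000040u  /* bit 6, per the readings in this thread */
#define IRQ_MASK    0x00000002u  /* hypothetical interrupt bit, illustration only */
#define CFG_DEFAULT 0x00348df8u  /* default reading reported in this thread */

/* Returns nonzero if the STPCLK bit is set in a register value. */
int stpclk_is_set(uint32_t value)
{
    return (value & STPCLK_MASK) != 0;
}

/*
 * Blind write: rebuilds the whole register from a constant.
 * Any bit changed earlier (such as a cleared STPCLK) is overwritten.
 */
void set_irq_blind(volatile uint32_t *reg)
{
    *reg = CFG_DEFAULT | IRQ_MASK;
}

/*
 * Read/modify/write: reads the current value, changes only the
 * interrupt bit, and writes the result back. Other bits are preserved.
 */
void set_irq_rmw(volatile uint32_t *reg)
{
    uint32_t value = *reg;
    value |= IRQ_MASK;
    *reg = value;
}
```

Starting from 348db8 (STPCLK cleared), the blind version leaves STPCLK set again, which matches the behavior reported earlier in this thread. The read/modify/write version keeps it cleared.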
Thank you very much for your reply!
Now I understand why power goes up when clock gating the core and L2 cache. I did it in the wrong order: I stopped the L2 clocks before stopping the core, so, as you said, the OS crashed. The core consumes more power when there is no OS running on it. On the machine in our lab, the power consumption is about 65 W with no OS running and about 47 W with the OS running (in idle state).
However, I still do not see a significant power change when I disable all clocks (cores and caches). Please see the attachment for my source code. There is a README file explaining how to use it.
Thank you very much!
clockgating.tar 640.0 K
I looked briefly at the SCC Linux kernel, and it appears that it already uses HLT in idle. So a core will be at very low power when it is running the OS but no program. When held in reset, the core is ready to start and burns more power than in HLT.
The power figures you cite seem about right for cores active versus cores in low power mode. The 47W looks about right for the balance of the system, i.e. power used by all 4 memory controllers and the mesh routers, as they are still running. Again, please look at the ISSCC paper for a breakdown of these other elements of the SCC processor and their power consumption. Their contribution can be reduced when the SCC is configured, which is when the mesh, tile and memory clocks are set. Only the tile power contribution can be changed at runtime: per core with HLT and STPCLK, per tile with the frequency divider, and per voltage domain (4 tiles) through the Voltage Regulator Controller.
If you want lower power, be certain to figure in the total SCC power and the performance balance of the components based on your workload demands.