In emulator mode, RCCE_wtime() directly maps to omp_get_wtime(), which works with a resolution of either 1us or 1ns, depending on the availability of clock_gettime() on your system. I suspect you end up with the lower resolution here... See if you can use this function instead of RCCE_wtime():
struct timespec ts;
clock_gettime(CLOCK_MONOTONIC, &ts);
return ts.tv_sec + ts.tv_nsec/1e9;
I tried and it still gives me a different result each run (always around 0.01 second now).
Clock resolution (given by clock_getres()) is
I have a feeling this is not what I need.
I am trying to understand how many clock-cycles it takes the SCC to finish a code block of mine.
What tools do you have for performance measuring?
Unless you find a way to count the number of processor cycles, it's probably the best bet to use clock_gettime with a high-resolution time source. It looks like the clock you're using is tied to the resolution provided by the kernel's system timer. One additional thing you could try out is replace CLOCK_MONOTONIC with CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID. Both should have a higher resolution (1ns on my system). Repeat your measurements a couple times and see how long it takes on average.
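To make the clock comparison above concrete, here is a minimal sketch, assuming a POSIX system where clock_gettime() and clock_getres() are available. The helper names (clock_seconds, clock_resolution_ns, average_seconds, sample_work) are made up for illustration and are not part of RCCE.

```c
#define _POSIX_C_SOURCE 200809L  /* ask for clock_gettime()/clock_getres() */
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: current reading of clock `id` in seconds, -1.0 on error. */
static double clock_seconds(clockid_t id)
{
    struct timespec ts;
    if (clock_gettime(id, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Hypothetical helper: resolution reported by clock_getres() in ns, -1 on error. */
static long long clock_resolution_ns(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1;
    return (long long)res.tv_sec * 1000000000LL + res.tv_nsec;
}

/* Placeholder workload; substitute the code block you want to time. */
static void sample_work(void)
{
    volatile unsigned long acc = 0;
    for (unsigned long i = 0; i < 100000; i++)
        acc += i;
}

/* Average seconds per call of work() over `reps` repetitions on clock `id`. */
static double average_seconds(clockid_t id, void (*work)(void), int reps)
{
    double start = clock_seconds(id);
    for (int i = 0; i < reps; i++)
        work();
    return (clock_seconds(id) - start) / reps;
}
```

Calling clock_resolution_ns() for CLOCK_MONOTONIC, CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID shows which clock is the finest on a given system, and average_seconds(CLOCK_PROCESS_CPUTIME_ID, sample_work, 1000) gives the averaged measurement. On older glibc versions you may need to link with -lrt.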
Have you actually tried to use RCCE_wtime() on the SCC?
Regarding performance tools, I've seen some discussion about porting PAPI to the SCC. Perhaps you can look into that.
Why aren't you using the instruction to read the timestamp register? Read before, read after; the difference is the number of clocks for execution. This will work on the SCC and on the emulator. Take the lowest of several passes through the code to deal with interrupts that may fall within your block.
Good point. This is what clock_gettime does on x86, I think. I wasn't sure about the frequency of the timestamp counter. Is it incremented on every processor clock cycle?
Yes, by definition incremented each clock cycle. From the manual:
The Pentium processor maintains a 64-bit Time Stamp Counter (EDX[4:4]) that increments
every clock cycle. The RDTSC instruction copies the content of the Time Stamp Counter
into EDX:EAX. EDX is loaded with the high-order 32 bits, and EAX is loaded with the low-order 32 bits.
I didn't know about this clk counter. It sounds great for my needs.
I just don't understand how to use it.. Maybe you can help:
I tried using the manual but it left me confused.
How can one invoke RDTSC during runtime?
How can one then read EDX:EAX, or any other register?
Doc SCCProgrammersGuide, chapter 6.2 - It says I need a 'config.h' file in order to read .
I am using RCCE_V1.0.13 emulation and I don't have this file. Instead, I see the file 'SCC_API.h' which is documented as if it was 'config.h'.
Assuming this is what I need to include, I added 'SCC_API.h' to the Makefile and I tried to use its 'ReadConfigReg()' function, but I get the error message:
try.o:try.c:(.text+0x1132): undefined reference to `ReadConfigReg(unsigned int)'
although the function is declared right there.
RDTSC is an asm instruction, and there are many examples available on the Internet, cf. Wikipedia for example: http://en.wikipedia.org/wiki/Time_Stamp_Counter#C.2B.2B. Examples with more explanation are available elsewhere; just google for it.
If you are doing inline assembly and using RCCE, you should use the gas syntax (also called the AT&T syntax), not the MASM syntax. RCCE already has some of that AT&T syntax, and I found that you can't reliably mix MASM with it.
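For anyone unfamiliar with the gas (AT&T) syntax, here is a small sketch using GCC extended inline assembly. In AT&T syntax the source operand comes first and the destination last, immediates are prefixed with $, and registers with %. The function name add_five is made up for illustration, and the non-x86 branch is only a fallback so the snippet compiles elsewhere.

```c
#include <stdio.h>

/* Hypothetical example: add 5 to x using AT&T-syntax inline assembly. */
static int add_five(int x)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "addl $5, %0": source ($5) first, destination (%0) last.
       "+r" means x is both read and written, in a register. */
    __asm__ ("addl $5, %0" : "+r" (x) : : "cc");
#else
    x += 5;  /* plain C fallback on non-x86 targets */
#endif
    return x;
}
```

The same instruction in MASM/Intel syntax would be written with the destination first (add reg, 5), which is why mixing the two styles in one file gets confusing.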
Sorry I just don't get it.
Maybe explain by an example. I will then search for more info about it. What would you do if you wanted to measure how many cycles it takes the processor to compute x=x+5?
int func(int x)
{
    <$> x = x + 5; <$>
}
What do I have to write instead of <$> ?
Here is an example:
static inline unsigned long long getticks(void)
{
    unsigned int lo, hi;
    // RDTSC copies contents of 64-bit TSC into EDX:EAX
    asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((unsigned long long)hi << 32) | lo;
}
Use it like this:
unsigned long long start, end;
start = getticks();
/* ... code block to be measured goes here ... */
end = getticks();
printf("Code took %llu cycles\n", end - start);
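Combining this with the earlier advice to take the lowest of several passes, a minimal sketch could look like the following. The names read_ticks, min_ticks and block_under_test are made up for illustration. On non-x86 targets the sketch falls back to clock_gettime() and reports nanoseconds instead of cycles, which is an assumption added here so the code compiles everywhere.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Read the time stamp counter (x86), or a nanosecond clock elsewhere. */
static inline uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    /* RDTSC copies the 64-bit TSC into EDX:EAX */
    __asm__ __volatile__ ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    /* Fallback (assumption): monotonic clock in nanoseconds, not cycles */
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Placeholder for the code you want to measure; volatile keeps the
   compiler from optimizing the addition away. */
static volatile int sink;
static void block_under_test(void)
{
    sink = sink + 5;
}

/* Run fn() `passes` times and return the smallest tick count observed,
   which filters out passes disturbed by interrupts. */
static uint64_t min_ticks(void (*fn)(void), int passes)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < passes; i++) {
        uint64_t start = read_ticks();
        fn();
        uint64_t end = read_ticks();
        if (end - start < best)
            best = end - start;
    }
    return best;
}
```

Calling min_ticks(block_under_test, 100) returns the best case of 100 passes. Note that the result still includes the cost of the two counter reads and the function call, so for a block as small as x = x + 5 that overhead will dominate; measuring an empty function the same way and subtracting gives a rough correction.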
Yes, config.h got changed to SCC_config.h.
ReadConfigReg() is in SCC_API.c, which is not included when you build the emulator. Note the following in the Makefile.
The RCCE emulator is intended to emulate application programs, not to simulate the chip. The RCCE emulator doesn't provide you simulated access to the SCC configuration registers.
In fact, I think that's why the name change occurred. The prepended SCC_ is meant to indicate that the file is for use on SCC hardware.
About the emulator ... I should add that we are interested in finding out what functions users would like to see in the emulator. So please bring up requests. And note that RCCE source code is available. If you have contributions or bug fixes, please share them.
The emulator cannot of course let you pretend you have an SCC chip, albeit slower. It is intended primarily to emulate applications. But there is some limited simulation, especially in the area of power management. Look under #ifndef COPPERRIDGE in RCCE_power_management.c.
In the RCCE trunk, COPPERRIDGE has been changed to SCC. COPPERRIDGE was the name of the previous board containing the RCK chip. Rocky Lake is the name of the current board. Using SCC gets us away from the names of individual boards.
I understand now.
I can't simulate the HW...
I can only measure performance when I am running on real SCC, from remote. True?