15 Replies Latest reply on Feb 28, 2011 3:03 PM by tedk

    Emulator runtime is different each run

    ohntz

      Hi,

       

      I am running a program of mine on the RCCE emulator. The run uses a single CPU.

       

      I measure the runtime of a code block with RCCE_wtime() at the beginning and end.

      Nothing is random in my code, so I expect the results to be deterministic - each run should have the same performance.

       

      Nevertheless, I get different results in each run:

      Most of the time the results are around 1uSec.

      Sometimes the results are around 2uSec (it just looks double, results are never really in-between 1uSec and 2uSec).

       

      My Qs:

      • What is the reason for this variance in results?
      • Even if results were always around 1uSec, why aren't they the same?

       

      Thanks,

      Ohn

        • 1. Re: Emulator runtime is different each run
          aprell

          In emulator mode, RCCE_wtime() directly maps to omp_get_wtime(), which works with a resolution of either 1us or 1ns, depending on the availability of clock_gettime() on your system. I suspect you end up with the lower resolution here... See if you can use this function instead of RCCE_wtime():

           

          #include <time.h>

          /* Wall-clock time in seconds, read from the monotonic clock */
          double get_wtime(void)
          {
               struct timespec ts;

               clock_gettime(CLOCK_MONOTONIC, &ts);
               return ts.tv_sec + ts.tv_nsec / 1e9;
          }

          • 2. Re: Emulator runtime is different each run
            ohntz

            I tried and it still gives me a different result each run (always around 0.01 second now).

             

            Clock resolution (given by clock_getres()) is

            SEC     0

            NSEC   1,000,000

             

            I have a feeling this is not what I need.

            I am trying to understand how many clock-cycles it takes the SCC to finish a code block of mine.

            What tools do you have for performance measuring?

             

            Thanks!

            Ohn

            • 3. Re: Emulator runtime is different each run
              aprell

              Unless you find a way to count the number of processor cycles, your best bet is probably to use clock_gettime with a high-resolution time source. It looks like the clock you're using is tied to the resolution provided by the kernel's system timer. One additional thing you could try is to replace CLOCK_MONOTONIC with CLOCK_PROCESS_CPUTIME_ID or CLOCK_THREAD_CPUTIME_ID. Both should have a higher resolution (1ns on my system). Repeat your measurements a couple of times and see how long it takes on average.
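              A minimal sketch of that suggestion, assuming a POSIX system where clock_getres() and clock_gettime() are available. The helper names (clock_resolution, average_seconds, sample_work) are made up for illustration and are not part of RCCE:

```c
#define _POSIX_C_SOURCE 199309L
#include <time.h>

/* Resolution the system reports for a clock, in seconds; negative on error. */
static double clock_resolution(clockid_t id)
{
    struct timespec res;
    if (clock_getres(id, &res) != 0)
        return -1.0;
    return (double)res.tv_sec + (double)res.tv_nsec / 1e9;
}

/* Seconds elapsed from reading a to reading b. */
static double elapsed_sec(struct timespec a, struct timespec b)
{
    return (double)(b.tv_sec - a.tv_sec) + (double)(b.tv_nsec - a.tv_nsec) / 1e9;
}

/* Hypothetical workload standing in for the code block being measured. */
static void sample_work(void)
{
    volatile int x = 0;
    for (int i = 0; i < 1000; i++)
        x = x + 5;
}

/* Average time of one call to fn() over `runs` calls, in seconds;
 * negative if runs <= 0 or the clock cannot be read. */
static double average_seconds(clockid_t id, void (*fn)(void), int runs)
{
    struct timespec t0, t1;
    if (runs <= 0 || clock_gettime(id, &t0) != 0)
        return -1.0;
    for (int i = 0; i < runs; i++)
        fn();
    if (clock_gettime(id, &t1) != 0)
        return -1.0;
    return elapsed_sec(t0, t1) / runs;
}
```

              Calling clock_resolution() for CLOCK_MONOTONIC and for CLOCK_PROCESS_CPUTIME_ID shows which clock is finer on a given system, and average_seconds(CLOCK_PROCESS_CPUTIME_ID, sample_work, 1000) spreads one pair of clock readings over many runs, so a coarse clock matters less.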

               

              Have you actually tried to use RCCE_wtime() on the SCC?

               

              Regarding performance tools, I've seen some discussion about porting PAPI to the SCC. Perhaps you can look into that.

              • 4. Re: Emulator runtime is different each run
                jheld

                Why aren't you using the instruction to read the timestamp register? Read before, read after; the difference is the number of clocks for execution. This will work on the SCC and on the emulator. Take the lowest of several passes through the code to deal with interrupts that may fall within your block.

                • 5. Re: Emulator runtime is different each run
                  aprell

                  Good point. This is what clock_gettime does on x86, I think. I wasn't sure about the frequency of the timestamp counter. Is it incremented on every processor clock cycle?

                  • 6. Re: Emulator runtime is different each run
                    jheld

                    Yes, by definition incremented each clock cycle. From the manual:

                     

                    The Pentium processor maintains a 64-bit Time Stamp Counter (EDX[4:4]) that increments every clock cycle. The RDTSC instruction copies the content of the Time Stamp Counter into EDX:EAX. EDX is loaded with the high-order 32 bits, and EAX is loaded with the low-order 32 bits.
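
                    To make the EDX:EAX layout concrete, here is a small sketch in plain C of how the two 32-bit halves form one 64-bit count and how two readings are subtracted. The function names combine_tsc and cycles_between are invented for this example:

```c
#include <stdint.h>

/* Join the halves RDTSC leaves in EDX (high 32 bits) and EAX (low 32 bits).
 * The high half is widened to 64 bits before shifting so no bits are lost. */
static uint64_t combine_tsc(uint32_t edx_hi, uint32_t eax_lo)
{
    return ((uint64_t)edx_hi << 32) | eax_lo;
}

/* Cycles between two readings. Unsigned subtraction is modulo 2^64,
 * so the result is still right if the counter wrapped once in between. */
static uint64_t cycles_between(uint64_t start, uint64_t end)
{
    return end - start;
}
```

                    For instance, EDX = 1 and EAX = 0 correspond to a count of 2^32.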

                    • 7. Re: Emulator runtime is different each run
                      ohntz

                      Hi Jim,

                       

                      I didn't know about this clk counter. It sounds great for my needs.

                      I just don't understand how to use it.. Maybe you can help:

                       

                      1.

                      I tried using the manual but it left me confused.

                      How can one invoke RDTSC during runtime?

                      How can one then read EDX:EAX, or any other register?

                       

                      2.

                      Doc SCCProgrammersGuide, chapter 6.2 - It says I need a 'config.h' file in order to read the configuration registers.

                      I am using RCCE_V1.0.13 emulation and I don't have this file. Instead, I see the file 'SCC_API.h' which is documented as if it was 'config.h'.

                      Assuming this is what I need to include, I added 'SCC_API.h' to the Makefile and I tried to use its 'ReadConfigReg()' function, but I get the error message:

                           try.o:try.c:(.text+0x1132): undefined reference to `ReadConfigReg(unsigned int)'

                      although the function is declared right there.

                       

                      Thanks,

                      Ohn

                      • 8. Re: Emulator runtime is different each run
                        philippg

                        RDTSC is an asm instruction, there are many examples available on the Internet - cf. wikipedia for example: http://en.wikipedia.org/wiki/Time_Stamp_Counter#C.2B.2B although there are a lot of examples with more explanation available elsewhere. Just google for it.

                        • 9. Re: Emulator runtime is different each run
                          tedk

                          If you are doing inline assembly and using RCCE, you should use the gas syntax (also called the AT&T syntax), not the MASM syntax. RCCE already has some of that AT&T syntax, and I found that you can't reliably mix MASM with it.

                          • 10. Re: Emulator runtime is different each run
                            ohntz

                            Sorry I just don't get it.

                             

                            Maybe explain by an example. I will then search for more info about it. What would you do if you wanted to measure how many cycles it takes the processor to compute   x=x+5?

                             

                            int func(int x)

                            {

                            <$>

                            x = x + 5;

                            <$>

                            return x;

                            }

                             

                            What do I have to write instead of <$> ?

                             

                            Thanks.

                            • 11. Re: Emulator runtime is different each run
                              aprell

                              Here is an example:

                               

                              static inline unsigned long long getticks(void)
                              {

                                   unsigned int lo, hi;

                                   // RDTSC copies contents of 64-bit TSC into EDX:EAX

                                   asm volatile ("rdtsc" : "=a" (lo), "=d" (hi));

                                   return (unsigned long long)hi << 32 | lo;
                              }

                               

                              Use it like this:

                               

                              unsigned long long start, end;

                               

                              start = getticks();

                              ...

                              end = getticks();

                               

                              printf("Code took %llu cycles\n", end - start);
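
                              Combining this with the earlier advice in the thread to take the lowest of several passes, a rough sketch for the x = x + 5 question could look as follows. It assumes GCC-style inline assembly on x86; the names read_ticks, min_sample, min_ticks_add5 and PASSES are illustrative only, and the non-x86 branch is a nanosecond stand-in, not a cycle count:

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Read a tick counter: the TSC on x86. On other targets this falls back
 * to a nanosecond clock, which is only a stand-in and NOT clock cycles. */
static inline uint64_t read_ticks(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ volatile ("rdtsc" : "=a" (lo), "=d" (hi));
    return ((uint64_t)hi << 32) | lo;
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
#endif
}

/* Smallest value in samples[0..n-1]; n must be greater than 0. */
static uint64_t min_sample(const uint64_t *samples, size_t n)
{
    uint64_t best = samples[0];
    for (size_t i = 1; i < n; i++)
        if (samples[i] < best)
            best = samples[i];
    return best;
}

#define PASSES 16  /* arbitrary number of passes chosen for this sketch */

/* Time x = x + 5 over several passes and return the lowest tick count.
 * volatile keeps the compiler from optimizing the addition away. */
static uint64_t min_ticks_add5(int x)
{
    uint64_t samples[PASSES];
    volatile int v = x;

    for (size_t i = 0; i < PASSES; i++) {
        uint64_t start = read_ticks();
        v = v + 5;
        uint64_t end = read_ticks();
        samples[i] = end - start;
    }
    return min_sample(samples, PASSES);
}
```

                              The returned value includes the cost of reading the counter itself, so for a block this small it mostly measures that overhead; timing an empty block the same way and subtracting gives a better estimate. On newer x86 processors the TSC may tick at a constant rate independent of the core clock, so there the result is better read as ticks than as core cycles.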

                              • 12. Re: Emulator runtime is different each run
                                tedk

                                Yes, config.h got changed to SCC_config.h.

                                ReadConfigReg() is in SCC_API.c, which is not included when you build the emulator. Note the following in the Makefile.

                                 

                                ifeq ($(OMP_EMULATOR),0)
                                  PLATFORMOBJS=SCC_API.o
                                else
                                  PLATFORMOBJS=RCCE_emulator_driver.o
                                endif

                                 

                                The RCCE emulator is intended to emulate application programs, not to simulate the chip. The RCCE emulator doesn't provide you simulated access to the SCC configuration registers.

                                 

                                In fact, I think that's why the name change occurred. The prepended SCC_ is meant to indicate that the file is for use on SCC hardware.

                                • 13. Re: Emulator runtime is different each run
                                  tedk

                                  About the emulator ... I should add that we are interested in finding out what functions users would like to see in the emulator. So please bring up requests. And note that RCCE source code is available. If you have contributions or bug fixes, please share them.

                                   

                                  The emulator cannot of course let you pretend you have an SCC chip, albeit slower. It is intended primarily to emulate applications. But there is some limited simulation, especially in the area of power management. Look under #ifndef COPPERRIDGE in RCCE_power_management.c.

                                   

                                  In the RCCE trunk COPPERRIDGE has been changed to SCC. COPPERRIDGE was the name of the previous board containing the RCK chip. RockyLake is the name of the current board. Using SCC gets us away from the names of individual boards.

                                  • 14. Re: Emulator runtime is different each run
                                    ohntz

                                    I understand now.

                                    I can't simulate the HW...

                                    I can only measure performance when I am running on real SCC, from remote. True?
