13 Replies Latest reply on Oct 2, 2011 3:59 PM by ohntz

    marc011 - rck00 slower than the other cores?




      I am running an application on all 48 cores of marc011 and measuring how long the program takes to finish on each core.

      For some reason rck00 is consistently about 2x slower than the others.


      Is it because it is running another process (and therefore it gives my app only 1/2 of its resources)?


      What can I do about this?




        • 1. Re: marc011 - rck00 slower than the other cores?

          I forgot to add that this is a prototype: the app is symmetric across all cores, and no messages are in use.

          (so, rck00 never waits for anything during the app)

          • 2. Re: marc011 - rck00 slower than the other cores?

            rck00 is not doing anything special unless you tell it to ... like with a

            if(iam == 0) {...}


            I haven't seen rck00 running at about 1/2 the speed of the other cores. If you tell me where your app is located, I'll run it on another marc system and see if rck00 also runs slower there.

            • 3. Re: marc011 - rck00 slower than the other cores?

              I just made a try.c program that counts to 4,000,000,000 and prints the elapsed time.


              Ran it on all 48 cores in marc011.


              All cores finished in 1.6 seconds,

              except for rck00 - it finished in 3.2 seconds.


              You can find it in:


              • 4. Re: marc011 - rck00 slower than the other cores?

                I didn't see the try.c anywhere but I did find the try executable.

                I ran it on marc011 (which has 1.3.0)  and on marc101 (which has

                I attached script files for each run. I'm not sure how exactly to interpret your output. Can you take a look at the two output files and see if your issue is unique to marc011 or not? If it does look like a marc011 problem, please file a bug.

                • 5. Re: marc011 - rck00 slower than the other cores?

                  Hi Ted,



                  I didn't put try.c there but it's there now. It's the simplest code.


                  You attached two outputs. Grepping them for the word "time" gives this:

                  Running on marc011 you got that rck01-rck47 ran for 3 seconds and rck00 ran for 6 seconds.

                  Running on marc101 you got that rck00-rck47 all ran for 3 seconds.


                  So it seems that the problem does not occur in every version / board.


                  By the way -

                  When I run it it takes 1.6 seconds and 3.2 seconds.

                  When you run it it takes 3 seconds and 6 seconds.

                  Why is that?!


                  Thanks again!

                  • 6. Re: marc011 - rck00 slower than the other cores?




                    is it possible for you to share source code?

                    • 7. Re: marc011 - rck00 slower than the other cores?

                      When you run the same executable on marc011 you get 1.6 sec (rck01-rck47) / 3.2 sec (rck00), and I get 3.0 sec (rck01-rck47) / 6.0 sec (rck00). This seems strange ... I did not even recompile. Are you using rccerun to execute your program? Did you change the CPU frequency?


                      There's nothing special about the source code. As Ohn says, it's pretty simple: call a barrier, start a timer, do a sum in nested for loops, end the timer, print results. But it's Ohn's code and Ohn's decision whether to post it or not.


                      Has rck00 always been slower or is this a new development? If you want us to replace the CPU chip, please file a bug.

                      • 8. Re: marc011 - rck00 slower than the other cores?

                        Hi Nil, Ted,


                        The source code:


                        The executable:



                        "try.c" was made just to show you the problem. You are welcome to use the code however you like. Maybe use it as a test to see if the problem occurs in other stations?



                        I don't change the frequency or do anything exotic there.

                        I don't know whether this always happened or whether it is new. I just noticed it.

                        I run it with:

                        /home/ohntz/RCCE_V1.0.13/rccerun -nue 48 -f /home/ohntz/RCCE_V1.0.13/hosts/rc.hosts try


                        rc.hosts is the default thing (00, 01, 02, ..., 47). I never touched it.


                        I don't want to file a bug and replace the chip. It doesn't feel like a HW problem to me yet.


                        Maybe the sccKit needs an upgrade?

                        What else can it be? Maybe some phantom process that consumes 50% of rck00?

                        Can you try this little benchmark in more stations? More versions?

                        Maybe we will close on the problem once we know more.



                        • 9. Re: marc011 - rck00 slower than the other cores?

                          There is a minor difference ... I use rccerun from the trunk not the latest tag as you do.

                          But we saw a timing difference when I ran on marc011, so on marc011 we are using the same hw and the same executable. The version of Linux and the sccKit is irrelevant.


                          We could take rccerun out of the picture entirely. To do this, invoke sccKonsole on all cores (this requires a desktop, usually through VNC). Then set rck00 to broadcast its input to all cores, and copy the executable from /shared to /root. You might get slightly better performance by not going to a mounted directory for the executable.


                          Then, issue

                          try 48 0.533 00 01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47

                          This line shows the command-line arguments that rccerun puts in for you. The 48 is the number of cores; the 0.533 is the frequency (this doesn't actually change the frequency; it just tells rcce what value to use when calculating times) and then the list of all the cores on which to run.


                          If you want to upgrade from 1.3.0 to, please file a bug under "Marc Administration Needed". The performance baseline should change with this upgrade (for the better). Because the baseline changes, we've been requiring the approval of the PI. So if that's not you, having the PI comment on the upgrade bug would be sufficient.


                          Note that no one but you has access to your directory under /shared. Well, I do because I have marcadmin privileges, but Nil is not going to be able to see the code unless you post it here.

                          • 10. Re: marc011 - rck00 slower than the other cores?

                            Nil, here is the code:


                            #include <stdio.h>
                            #include <stdlib.h>
                            #include <assert.h>
                            #include <string.h>
                            #include <pthread.h>
                            #include <errno.h>
                            #include <sched.h>
                            #include <stdint.h>
                            #include "RCCE.h"


                            int RCCE_APP(int argc, char **argv) {


                              double timer;
                              int i, j;
                              unsigned int sum;


                              RCCE_init(&argc, &argv);


                              sum = 0;


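                              // synchronize before starting the timer
                              RCCE_barrier(&RCCE_COMM_WORLD);


                              // Get time
                              timer = RCCE_wtime();


                              // 400 * 10,000,000 = 4,000,000,000 increments; still fits in a 32-bit unsigned int
                              for (i=0; i<400;i++) {
                                for (j=0; j<10000000; j++)
                                  sum += 1;
                              }


                              // Get time
                              timer = RCCE_wtime() - timer;
                              printf ("UE %d - Time = %G, sum = %u\n", RCCE_ue(), timer, sum);


                              // Exit the test
                              RCCE_finalize();
                              return 0;


                            } // end RCCE_APP()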

                            • 11. Re: marc011 - rck00 slower than the other cores?



                              @ Ohn


                              Thank you for the code.


                              I have tried it on marc022 and I see no difference in runtimes between cores.

                              output attached.


                              RCCE (v 213) from trunk and sccKit 1.0.4 (not sure if any patch applied or not).




                              • 12. Re: marc011 - rck00 slower than the other cores?

                                Thanks, Nil. Interesting that you see 3.0 seconds also.

                                Ohn, do you know the last time marc011 was power-cycled? If there is some strange process on rck00, a power cycle and reboot may get rid of it.

                                • 13. Re: marc011 - rck00 slower than the other cores?

                                  Sorry for the late response.

                                  I just got back to this subject.


                                  After more than a month, the problem is no more.


                                  If no one actually fixed anything - I guess that the many resets did the job.


                                  Resets solve many problems... Also for my PC at home