14 Replies Latest reply on Jan 17, 2012 3:42 AM by aprell

    Flags and performance issues




      1. Is RCCE_flag_read() as fast as (or as slow as) RCCE_get() from the MPB to a local buffer? I can't find any documentation on the speed of flag accesses.
      2. Same question for RCCE_flag_write() and RCCE_put().
      3. When SINGLEBITFLAGS=0, flags sit byte by byte. Can we exploit this to read several flags in one access?
        When I try RCCE_get(32 bytes) from the location of the first flag, I get a "source error" because the flags are out of bounds for RCCE_get(). It would be great if there were a way to read 32 flags in one shot.


      Thanks and happy holidays!


        • 1. Re: Flags and performance issues

          1+2. RCCE_flag_read()/RCCE_flag_write() basically read/write a single byte, whereas RCCE_get()/RCCE_put() move an entire cache line or multiple cache lines at once. Well, it depends on what you want to do... If you're referring to 3., I think RCCE_get() would be the better choice.


          3. In principle, yes. Have a look at the following example (tested with emulator). Note that RCCE_init() allocates two flags in gory mode, so the first flag allocated by the user is actually the third flag in the corresponding cache line.
          #include <stdio.h>
          #include <stdlib.h>
          #include <assert.h>
          #include <unistd.h>
          #include "RCCE.h"
          #include "RCCE_lib.h"
          #define WORKER(i) if (ID == (i))
          #define LOG(...) { printf(__VA_ARGS__); fflush(stdout); }
          #define NFLAGS 32
          extern RCCE_FLAG_LINE RCCE_flags;
          int RCCE_APP(int argc, char *argv[])
          {
               int ID, i;
               RCCE_FLAG flags[NFLAGS];
               dup2(STDOUT_FILENO, STDERR_FILENO);
               RCCE_init(&argc, &argv);
               // RCCE_comm_split() in RCCE_init() allocates two
               // synchronization flags!
               assert(RCCE_flags.members == 2);
               ID = RCCE_ue();
               // Allocate the remaining 30 flags of the line
               // (loop body missing from the post; RCCE_flag_alloc() assumed)
               for (i = 0; i < NFLAGS-2; i++) {
                    RCCE_flag_alloc(&flags[i]);
               }
               // Now we have a cache line full of flags
               // The first two flags in the line are internal flags!
               assert(RCCE_flags.members == NFLAGS);
               // Set user flags 1, 2, 5, and 29 on core 2
               WORKER(0) {
                    RCCE_flag_write(&flags[1], RCCE_FLAG_SET, 2);
                    RCCE_flag_write(&flags[2], RCCE_FLAG_SET, 2);
                    RCCE_flag_write(&flags[5], RCCE_FLAG_SET, 2);
                    RCCE_flag_write(&flags[29], RCCE_FLAG_SET, 2);
               }
               // Wait until core 0 has written the flags
               // (barrier assumed; not visible in the post)
               RCCE_barrier(&RCCE_COMM_WORLD);
               WORKER(1) {
                    unsigned char f[NFLAGS];
                    // Get all 32 flags
                    RCCE_get(f, flags[0].line_address, NFLAGS, 2);
                    // Only print user flags
                    for (i = 2; i < NFLAGS; i++) {
                         LOG("Flag %2d: %d\n", i-2, f[i]);
                    }
               }
               // Release the user flags
               // (loop body missing from the post; RCCE_flag_free() assumed)
               for (i = 0; i < NFLAGS-2; i++) {
                    RCCE_flag_free(&flags[i]);
               }
               RCCE_finalize();
               return 0;
          }
          • 2. Re: Flags and performance issues

            The error returned from RCCE_get() was my mistake. Let's forget about it...


            Anyway, I am interested in SINGLEBITFLAGS=0 and not 1. Therefore I can't use the .line_address as you used in your example. I need to cast the flag array address to (t_vcharp).

            I have:

            RCCE_get(f, (t_vcharp)flags[0], NFLAGS, 2);

            instead of your line:

            RCCE_get(f, flags[0].line_address, NFLAGS, 2);


            I ran this example with that line change on marc011, a real chip. According to the output, all flags are UNSET.

            When I try RCCE_flag_read(), I see that flags 1, 2, 5, and 29 are SET. So I know the flags hold the correct values, but I just can't read them with RCCE_get().


            Your example did not work for me with SINGLEBITFLAGS=1 either.


            Can you please try it on real hardware?



            • 3. Re: Flags and performance issues

              Yes, SINGLEBITFLAGS=0 means byte flags, right? That line_address is important because you need to get the whole cache line with all the flags. It should work regardless of flag type. Taking just &flags[0], on the other hand, doesn't work if flags[0] is not the first flag in the cache line, as in this example. Can you check whether the return code of RCCE_get() is non-zero? I can test it on real hardware, but unfortunately not before next week. I'll be away for the rest of the week.

              • 4. Re: Flags and performance issues

                Yes I am interested in BYTE flags, not BIT flags.


                From RCCE.h:

                #ifdef SINGLEBITFLAGS
                     typedef struct {
                        int  location;      /* location of bit within line (0-255)  */
                        t_vcharp line_address; /* start of cache line containing flag  */
                     }  RCCE_FLAG;
                #else
                     typedef volatile int *RCCE_FLAG;
                #endif


                So line_address is only defined for BIT flags (RCCE_get() is not getting the line for me in BIT-flags mode either, but that's a different story, though probably an important one too).


                I also tried RCCE_get() from the address of &flags[0] aligned down to a cache line using "& 0xFFFFFFE0" (the address ends with C0 after alignment), but all I get is junk, not the flags we set.


                Is there any other way to get the flag's address in MPB, in BYTE-flags mode?


                The return code of RCCE_get() in all examples is 0 - success.



                • 5. Re: Flags and performance issues

                  Oh, I see. Which version of RCCE are you using? I tested against the trunk, which has this definition for RCCE_FLAG:


                  typedef struct {
                       int location;          /* location of flag within line (0-31 or 0-255) */
                       t_vcharp flag_addr;    /* address of byte containing flag inside cache line */
                       t_vcharp line_address; /* start of cache line containing flag */
                  } RCCE_FLAG;


                  Is it possible for you to update?

                  • 6. Re: Flags and performance issues

                    I was using RCCE_V1.0.13.

                    In RCCE_V1.0.13 it doesn't work for SINGLEBITFLAGS=1 either, where .line_address is defined in a similar manner to the trunk version.


                    With the trunk version it works all right: I can read a bunch of flags in one shot.


                    This feature is useful for applications in which cores can buffer more than a single new message, and sometimes the cores need to check for new messages. I wonder why this issue with V1.0.13 was not raised here earlier.


                    Andreas, thanks for the support!



                    • 7. Re: Flags and performance issues

                      Just to verify ... are you saying that 1.0.13 is broken? And the trunk is working? If so I'll make another tag. About time to call it V2.0.

                      • 8. Re: Flags and performance issues

                        I am not sure what you call "broken".

                        Without SINGLEBITFLAGS: the fields of the struct RCCE_FLAG simply do not exist in 1.0.13.

                        With SINGLEBITFLAGS: some fields exist, but when I use them I don't get what I wanted (reading 256 flags with one MPB access).


                        If you need to decide whether 1.0.13 is "broken" or not, then maybe it should be investigated some more (and maybe there is a workaround to the above problems).


                        For me - the problem was solved by migrating to the trunk version.

                        • 9. Re: Flags and performance issues

                          I just checked 1.0.13... This version of RCCE still implements cache line flags instead of byte flags. The single-bit flags code has also changed significantly over time. Does everything work for you now in the current version? With both flag types?

                          • 10. Re: Flags and performance issues

                            We don't want our latest release doing cacheline flags. I made a RCCE 2.0 from the trunk.


                            Our release tags are snapshots. We don't update them.


                            I also changed the lower limit of RC_V_MHz_cap[] to 0.8 from 0.7. The 0.7 was causing some instability on some (not all) chips.

                            So right now the trunk and RCCE_V2.0 are identical.

                            • 11. Re: Flags and performance issues

                              Hi guys,


                              I think there are still problems with RCCE_get() from MPB to local memory buffer when you read flags.


                              Attached is flags.c, which is a simple experiment of how RCCE_get() gets flags with 1 access.

                              A few days ago I confirmed it works, but today I wanted to test the flag allocation behavior. I noticed that RCCE_get() does not always actually get the cache line.

                              The program allocates and sets some flags and then reads the cache line with RCCE_get(). I print the line_address to be sure there is no mistake in address.


                              I ran it again and again. Most of the time the result is the cache line with my 30 flags and 2 reserved bytes, as expected.

                              But at other times RCCE_get() simply gets all zeros.


                              The same behavior is observed with bit flags and with byte flags.

                              I tried booting the SCC several times, but it didn't help.


                              I am running from marc011 and the RCCE version is the trunk version from January 9th.


                              Can anyone explain what I see?

                              Do you get the same?




                              • 12. Re: Flags and performance issues

                                Hi Ohn,


                                I just opened your example and saw that the barrier was commented out. I guess unintentionally? There is a line-continuation \ at the end of your comment, just above the barrier.

                                • 13. Re: Flags and performance issues

                                  Ohhh it would have taken me years to discover that... It explains the randomness which I saw.

                                  That line was copied from another syntax.


                                  Thanks and sorry about that.

                                  RCCE_get() seems to work perfectly with the flags now.

                                  • 14. Re: Flags and performance issues

                                    No problem. This is one case where it pays off to have good syntax highlighting. Otherwise it's really hard to spot.