9 Replies Latest reply on Jan 9, 2012 9:12 AM by tedk

    RCCE non-singlebitflags


      Hi all,

        I've got a question about the RCCE implementation when SINGLEBITFLAGS=0 and the non-gory interface is used.


      Reading all the documentation (including the RCCE paper by Wijngaart et al.), it seems to me that the implementation reserves a whole cache line (32 bytes) for each synchronization flag (the "sent" and "ready" arrays). If we run an application on 48 cores, each core needs 48*2 = 96 flags (all of them allocated in each core's MPB at exactly the same offsets), so 96*32 bytes = 3072 bytes = 3 KB (these figures are reported in various documents).
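The arithmetic above can be written as a small helper. This is only a sketch: `flag_footprint_bytes` and its parameter names are hypothetical and are not part of RCCE.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of MPB taken by the per-peer flags on one core.
 * cores:          number of participating cores (UEs)
 * flags_per_peer: flags kept per peer (2 here: "sent" and "ready")
 * bytes_per_flag: storage reserved for each flag */
size_t flag_footprint_bytes(size_t cores, size_t flags_per_peer,
                            size_t bytes_per_flag)
{
    assert(cores > 0 && flags_per_peer > 0 && bytes_per_flag > 0);
    return cores * flags_per_peer * bytes_per_flag;
}
```

With 48 cores, 2 flags per peer and 32 bytes per flag, `flag_footprint_bytes(48, 2, 32)` evaluates to 3072. Passing 1 as the last argument gives the footprint for one byte per flag instead.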


      Nevertheless, I studied the RCCE code and I'm pretty sure that the routine RCCE_flag_alloc actually reserves 1 byte per flag (and not 32 bytes per flag).

      To be sure of that, I simply printed the address (which is a core virtual address of an MPB location) of each flag in the arrays "sent" and "ready":


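      /* print the MPB address of every "sent" flag */
      for ( ue=0; ue<RCCE_NP; ue++ )
        printf( "%p\n", (void *)RCCE_flag_sent[ue].flag_addr );

      /* print the MPB address of every "ready" flag */
      for ( ue=0; ue<RCCE_NP; ue++ )
        printf( "%p\n", (void *)RCCE_flag_ready[ue].flag_addr );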


      and I found that only one byte is allocated for each flag (note that neither the translation of a virtual address to a physical address nor the LUT translation of a physical address to a system address changes the least significant bits, so I can reason with virtual addresses).
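One way to turn the printed addresses into a check is to look at the spacing between consecutive flags and at each flag's offset inside its 32-byte line. The sketch below assumes the addresses have been collected into an array; `line_offset` and `uniform_stride` are hypothetical helpers, not RCCE functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u  /* cache line size in bytes, as stated in the thread */

/* Offset of an address inside its 32-byte line (the low 5 bits). */
unsigned line_offset(uintptr_t addr)
{
    return (unsigned)(addr & (LINE_SIZE - 1u));
}

/* Returns the common distance in bytes between consecutive addresses,
 * or 0 if the spacing is not uniform. */
size_t uniform_stride(const uintptr_t *addrs, size_t n)
{
    assert(addrs != NULL && n >= 2);
    size_t stride = (size_t)(addrs[1] - addrs[0]);
    for (size_t i = 2; i < n; i++)
        if ((size_t)(addrs[i] - addrs[i - 1]) != stride)
            return 0;
    return stride;
}
```

A stride of 1 would correspond to one byte per flag, while a stride of 32 would correspond to one line per flag. Because only the low bits matter, the same check works on virtual addresses.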


      So, did I make an interpretation mistake reading the documentation? If one byte is allocated for each flag, considering that RCCE_flag_write must be atomic, why don't we have to use locks when writing to a flag (even if SINGLEBITFLAGS=0)?





        • 1. Re: RCCE non-singlebitflags

          I will have to go back to the code and check what was actually done (Rob did the actual work of implementing this feature). My guess is that we allocate one byte aligned at the beginning of a cache line. But that's just a guess. I need to look at the code to be sure.


          As for the atomics (or lack thereof) remember that the P54C only allows one outstanding write transaction.  We were able to use that to get the safe updates we needed without atomics. 



          • 2. Re: RCCE non-singlebitflags

            I think there have been improvements to RCCE since the original documentation (which admittedly should be updated).


            I look in RCCE_lib.h and see

                 #ifdef SINGLEBITFLAGS
                 #define RCCE_FLAGS_PER_BYTE 8
                 #else
                 #define RCCE_FLAGS_PER_BYTE 1
                 #endif


            The RCCE_LINE_SIZE is the cache line = 32 bytes. But then I think a second flag is intended to be in the same line when SINGLEBITFLAGS is defined.
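To make the consequence of those two constants concrete, here is a minimal sketch of how many flags fit in one line for each setting. `flags_per_line` is a hypothetical helper, and the local definition of `RCCE_LINE_SIZE` simply restates the 32-byte value mentioned above.

```c
#include <assert.h>

#define RCCE_LINE_SIZE 32  /* cache line size in bytes, per the thread */

/* Hypothetical helper, not part of RCCE: number of flags that fit in
 * one line when each byte holds flags_per_byte flags (1 or 8). */
int flags_per_line(int flags_per_byte)
{
    assert(flags_per_byte == 1 || flags_per_byte == 8);
    return RCCE_LINE_SIZE * flags_per_byte;
}
```

With one flag per byte this gives 32 flags per line, and with eight flags per byte it gives 256.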

            • 3. Re: RCCE non-singlebitflags

              Vincenzo wrote ...

              Nevertheless, I studied the RCCE code and I'm pretty sure that the  routine RCCE_flag_alloc reserves actually 1 byte per flag (and not 32  bytes per flag).

              This is true. But I do see in RCCE_flag_alloc()

                   // if this is a new flag line, need to allocate MPB for it
                   if (!flagp->line_address) flagp->line_address = RCCE_malloc(RCCE_LINE_SIZE);


              So it seems that when you alloc a new flag, you do malloc() an entire line, even though the flag only takes up a byte of that line. And could this be what's meant by a flag taking up a line?
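The pattern in that excerpt (allocate a line only if there is none yet) can be illustrated with a toy model. This is not the RCCE source: `flag_pool` and `toy_flag_alloc` are hypothetical, `calloc` stands in for `RCCE_malloc`, and the assumption that a line is filled with byte flags before a new one is requested is mine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LINE_SIZE 32  /* bytes per MPB line, per the thread */

/* Hypothetical bookkeeping for the line currently being filled. */
typedef struct {
    unsigned char *line_address; /* NULL until a line has been allocated */
    size_t used;                 /* byte flags already handed out from it */
    size_t lines_allocated;      /* total lines requested so far */
} flag_pool;

/* Hands out one byte flag and allocates a fresh line only when needed.
 * Lines are never freed in this sketch. */
unsigned char *toy_flag_alloc(flag_pool *pool)
{
    assert(pool != NULL);
    if (pool->line_address == NULL || pool->used == LINE_SIZE) {
        pool->line_address = calloc(LINE_SIZE, 1);
        if (pool->line_address == NULL)
            return NULL;
        pool->used = 0;
        pool->lines_allocated++;
    }
    return pool->line_address + pool->used++;
}
```

In this model the first call allocates a whole line, the next 31 calls return neighbouring bytes of that same line, and only the 33rd call allocates a second line.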

              • 4. Re: RCCE non-singlebitflags

                Yes, but if you look at the code, 32 different flags are put in the same line (you can check this experimentally as I explained before). So several flags are allocated in the same line, and this behaviour differs from the documentation. Of course it's not a problem for me (or anyone) if things work this way.


                I just want to understand how flag writes can be atomic. We have to write only one byte in an MPB line, leaving the other flags in the same line unaltered.

                Since there is a write combine buffer (wcb), that could be a problem. The wcb flushes its 32 bytes either when a whole cache line has been written (with successive MOVEs to successive MPB addresses) or when we write to a different memory line (right?). If locks aren't used for non-SINGLEBITFLAGS RCCE, I guess we have to choose the second option. So we have to issue a write to the desired flag (MOVB) and then issue a second write to a different line, just to "fool" the wcb.

                Do these considerations make any sense, or am I on the wrong track?


                Thank you for your attention,


                • 5. Re: RCCE non-singlebitflags

                  Yes, and that's exactly what happens. RCCE_flag_write calls RCCE_put_char, which does


                  *target = *source;
                  *(int *)RCCE_fool_write_combine_buffer = 1;
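The effect of that second store can be illustrated with a small software model of a write combine buffer. This is a sketch under stated assumptions, not a description of the SCC hardware: the model flushes only when a store targets a different line, it writes back only the bytes that were stored, and all names (`wcb_t`, `store_byte`, `put_flag`) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32
#define MEM_LINES 4

/* Modelled write combine buffer: pending bytes for a single line. */
typedef struct {
    int valid;                /* does the buffer hold pending data? */
    size_t line;              /* line the pending bytes belong to */
    uint8_t data[LINE_SIZE];  /* pending values */
    uint8_t mask[LINE_SIZE];  /* 1 where a byte is pending */
} wcb_t;

static uint8_t memory[MEM_LINES * LINE_SIZE]; /* stands in for the MPB */
static wcb_t wcb;

/* Write the pending bytes (and only those) back to memory. */
static void wcb_flush(void)
{
    if (!wcb.valid)
        return;
    for (size_t i = 0; i < LINE_SIZE; i++)
        if (wcb.mask[i])
            memory[wcb.line * LINE_SIZE + i] = wcb.data[i];
    memset(&wcb, 0, sizeof wcb);
}

/* A store lands in the buffer; touching another line flushes first. */
static void store_byte(size_t addr, uint8_t value)
{
    assert(addr < sizeof memory);
    size_t line = addr / LINE_SIZE;
    if (wcb.valid && wcb.line != line)
        wcb_flush();
    wcb.valid = 1;
    wcb.line = line;
    wcb.data[addr % LINE_SIZE] = value;
    wcb.mask[addr % LINE_SIZE] = 1;
}

/* Flag write followed by a dummy store to a different line,
 * which pushes the flag byte out of the buffer. */
static void put_flag(size_t flag_addr, uint8_t value, size_t dummy_addr)
{
    store_byte(flag_addr, value);
    store_byte(dummy_addr, 1);
}
```

In this model a lone `store_byte` leaves memory unchanged until something else forces a flush, whereas `put_flag` with a dummy address in another line makes the flag byte visible immediately and leaves its neighbours in the line untouched.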
                  • 6. Re: RCCE non-singlebitflags

                    Where are the docs incorrect about flags? The current version of the SCC Programmer's  Guide discusses byte flags and single bit flags in Section 7.

                    The first choice (lower latency, higher memory use) occurs when you specify  SINGLEBITFLAGS=0 on the make command line. With bigflags, each flag takes up a byte; there are eight flags per 32-byte cache line. The second choice (higher latency, lower memory use) occurs when you specify SINGLEBITFLAGS=1 on the make command line. With singlebitflags, flags are stored as a single bit.

                    I did notice the README mentioning  that a flag can take up an entire cacheline and I just fixed that in the trunk. Are there any other locations? Is this what you were referring to?

                    • 7. Re: RCCE non-singlebitflags

                      Oh, I should mention that at one time big flags did take up a whole cacheline. This was changed. Now big flags take up a byte. So an older paper might still refer to 32-byte flags because papers are not updated. But our manuals and guides should be updated. And if they are not, please let me know.

                      • 8. Re: RCCE non-singlebitflags

                        Shouldn't there be up to 32 flags per 32-byte cache line?

                        • 9. Re: RCCE non-singlebitflags

                          Currently, yes. Back in the early days of RCCE, there was a version where one flag took up an entire cache line. There may be some old docs that still refer to that version. The manuals and guides should be updated, but a published paper may still retain the old information.