I will have to go back to the code to see what was actually done (Rob did the actual work of implementing this feature). My guess is that we allocate one byte aligned at the beginning of a cache line. But that's just a guess. I need to look at the code to be sure.
As for the atomics (or lack thereof), remember that the P54C allows only one outstanding write transaction. We were able to use that to get the safe updates we needed without atomics.
I think there have been improvements to RCCE since the original documentation (which admittedly should be updated).
I look in RCCE_lib.h and see
#ifdef SINGLEBITFLAGS
#define RCCE_FLAGS_PER_BYTE 8
#else
#define RCCE_FLAGS_PER_BYTE 1
#endif
#define RCCE_FLAGS_PER_LINE (RCCE_LINE_SIZE*RCCE_FLAGS_PER_BYTE)
RCCE_LINE_SIZE is the cache line size, 32 bytes. But then I think a second flag is intended to go in the same line when SINGLEBITFLAGS is defined.
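To make the arithmetic behind those macros concrete, here is a minimal sketch. It assumes the 32-byte line size stated above; `LINE_SIZE` and `flags_per_line` are hypothetical names used only for illustration and are not part of RCCE.

```c
#include <assert.h>

/* Assumption: 32-byte MPB cache line, as stated in the post. */
#define LINE_SIZE 32

/* Hypothetical helper, not an RCCE function: how many flags fit in one
   line for a given flags-per-byte setting (8 = single-bit flags,
   1 = byte flags). */
static int flags_per_line(int flags_per_byte)
{
    assert(flags_per_byte == 1 || flags_per_byte == 8);
    return LINE_SIZE * flags_per_byte;
}
```

Under these assumptions, byte flags give 32 flags per line and single-bit flags give 256.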
Vincenzo wrote ...
Nevertheless, I studied the RCCE code and I'm pretty sure that the routine RCCE_flag_alloc actually reserves 1 byte per flag (and not 32 bytes per flag).
This is true. But I do see in RCCE_flag_alloc()
// if this is a new flag line, need to allocate MPB for it
if (!flagp->line_address) flagp->line_address = RCCE_malloc(RCCE_LINE_SIZE);
So it seems that when you alloc a new flag, you do malloc() an entire line, even though the flag only takes up a byte of that line. And could this be what's meant by a flag taking up a line?
Yes, but if you look at the code, 32 different flags are put in the same line (you can check this experimentally, as I explained before). So several flags are allocated in the same line, and this behaviour differs from the documentation. Of course it's not a problem for me (or anyone) if things work this way.
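The packing behaviour described here (a line is allocated once, then byte flags are handed out from it one at a time) can be sketched roughly as follows. This is not RCCE's code: the `flag_line` struct and `flag_alloc` function are hypothetical, `calloc` stands in for `RCCE_malloc`, and the sketch simply returns NULL when the line is full instead of moving on to a new line.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define LINE_SIZE 32   /* assumed MPB line size in bytes */

/* Hypothetical bookkeeping for one flag line; not RCCE's actual struct. */
typedef struct {
    unsigned char *line_address;  /* start of the line, NULL until needed */
    int used;                     /* byte slots already handed out */
} flag_line;

/* Hand out the next free byte in the line, allocating the line on first
   use. Returns NULL when the line is full or the allocation fails. */
static unsigned char *flag_alloc(flag_line *fl)
{
    assert(fl != NULL);
    if (!fl->line_address) {
        /* Stand-in for RCCE_malloc(RCCE_LINE_SIZE). */
        fl->line_address = calloc(LINE_SIZE, 1);
        if (!fl->line_address) return NULL;
        fl->used = 0;
    }
    if (fl->used == LINE_SIZE) return NULL;
    return fl->line_address + fl->used++;
}
```

Starting from a zero-initialized `flag_line`, consecutive calls return adjacent bytes of the same line, so 32 flags share one 32-byte allocation.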
I just want to understand how flag writes can be atomic. We have to write only one byte in an MPB line, leaving the other flags in the same line unaltered.
Since there is a write combine buffer (wcb), that could be a problem. The wcb flushes its 32 bytes either when a whole cache line has been written (with successive MOVEs to successive MPB addresses) or when we write to a different memory line (right?). If locks aren't used for non-SINGLEBITFLAGS RCCE, I guess we have to choose the second option. So we have to issue a write to the desired flag (MOVB) and then issue a second write to a different line, just to "fool" the wcb.
Do these considerations make any sense, or am I on the wrong track?
Thank you for your attention,
Yes, and that's exactly what happens. RCCE_flag_write calls RCCE_put_char, which does
RC_cache_invalidate();
*target = *source;
*(int *)RCCE_fool_write_combine_buffer = 1;
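For anyone who wants to see why the dummy write matters, here is a toy host-side model of the behaviour discussed above. It is only a sketch under stated assumptions: the buffer holds one pending line, records which bytes were written, and drains only when a write targets a different line (the full-line flush case is left out). The names `wcb_model`, `wcb_write_byte`, `wcb_flush` and `flag_write` are made up for this example and say nothing about how the SCC hardware or RCCE is actually implemented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE 32   /* assumed line size in bytes */

/* Toy write combine buffer: one pending line plus per-byte "written" marks. */
typedef struct {
    uint8_t *mem;              /* backing memory standing in for the MPB */
    size_t   line;             /* index of the line currently buffered */
    int      valid;            /* nonzero if bytes are pending */
    uint8_t  data[LINE_SIZE];  /* pending byte values */
    uint8_t  mask[LINE_SIZE];  /* 1 where a byte is pending */
} wcb_model;

/* Drain pending bytes to memory; bytes never written stay untouched. */
static void wcb_flush(wcb_model *w)
{
    if (!w->valid) return;
    for (size_t i = 0; i < LINE_SIZE; i++)
        if (w->mask[i]) w->mem[w->line * LINE_SIZE + i] = w->data[i];
    memset(w->mask, 0, sizeof w->mask);
    w->valid = 0;
}

/* Buffer a one-byte write; a write to a different line drains the old one. */
static void wcb_write_byte(wcb_model *w, size_t addr, uint8_t value)
{
    size_t line = addr / LINE_SIZE;
    if (w->valid && w->line != line) wcb_flush(w);
    w->line = line;
    w->valid = 1;
    w->data[addr % LINE_SIZE] = value;
    w->mask[addr % LINE_SIZE] = 1;
}

/* Flag write followed by a dummy write to another line, mirroring the
   "fool the write combine buffer" step quoted above. */
static void flag_write(wcb_model *w, size_t flag_addr, uint8_t value,
                       size_t dummy_addr)
{
    assert(flag_addr / LINE_SIZE != dummy_addr / LINE_SIZE);
    wcb_write_byte(w, flag_addr, value);
    wcb_write_byte(w, dummy_addr, 1);
}
```

In this model a lone `wcb_write_byte` leaves the flag sitting in the buffer, while `flag_write` makes it visible in `mem` immediately and leaves the neighbouring bytes of the flag's line as they were.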
Where are the docs incorrect about flags? The current version of the SCC Programmer's Guide discusses byte flags and single bit flags in Section 7.
The first choice (lower latency, higher memory use) occurs when you specify SINGLEBITFLAGS=0 on the make command line. With bigflags, each flag takes up a byte; there are eight flags per 32-byte cache line. The second choice (higher latency, lower memory use) occurs when you specify SINGLEBITFLAGS=1 on the make command line. With singlebitflags, flags are stored as a single bit.
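As an illustration of the single-bit option, the sketch below packs flags one per bit. The helpers `bit_flag_write` and `bit_flag_read` are hypothetical, not RCCE functions. Note that setting one bit means reading the byte, modifying it and writing it back; my assumption is that this extra step, plus whatever protection it needs against concurrent writers, contributes to the higher latency mentioned above, but the guide text quoted here does not spell that out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers, not RCCE functions: flags packed one per bit. */

/* Set or clear flag number `index`: a read-modify-write of one byte. */
static void bit_flag_write(unsigned char *line, size_t index, int value)
{
    assert(line != NULL);
    unsigned char mask = (unsigned char)(1u << (index % 8));
    if (value)
        line[index / 8] |= mask;
    else
        line[index / 8] &= (unsigned char)~mask;
}

/* Return the value (0 or 1) of flag number `index`. */
static int bit_flag_read(const unsigned char *line, size_t index)
{
    assert(line != NULL);
    return (line[index / 8] >> (index % 8)) & 1;
}
```

With a 32-byte buffer this addresses flag indices 0 through 255, which matches RCCE_LINE_SIZE*8 from the macros quoted earlier in the thread.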
I did notice the README mentioning that a flag can take up an entire cacheline and I just fixed that in the trunk. Are there any other locations? Is this what you were referring to?
Oh, I should mention that at one time big flags did take up a whole cacheline. This was changed. Now big flags take up a byte. So an older paper might still refer to 32-byte flags because papers are not updated. But our manuals and guides should be updated. And if they are not, please let me know.
Shouldn't there be up to 32 flags per 32-byte cache line?
Currently, yes. Back in the early days of RCCE, there was a version where one flag took up an entire cache line. There may be some old docs that still refer to that version. The manuals and guides should be updated, but a published paper may still retain the old information.