I wrote a simple program for this scenario:
Core A writes to a variable X located in the MPB of core B. Later, when core B reads X, it sometimes sees the old value rather than the latest value written by core A.
I am using the emulator, and I put "#pragma omp flush" immediately after the write to X by core A, to commit the change and make it visible to all threads, and again before the read of X by core B. But it doesn't seem to work properly.
Any ideas? I can provide a simple test program if you wish.
Thanks,
Omid
Yes, please post the code. Are you putting a pragma in the middle of your executing code?
Yes Ted, I am using the pragma in the middle of the code. A simple test program is attached. You will see that a variable called 'size'
in core B is incremented by core A, but sometimes the change is not visible to core A on a later access. Test the code with 4 cores.
Omid
Any ideas? Also, is there anything wrong with using a 'pragma' in the middle of executing code?
Omid
I guess it's pointer related... The following example is adapted from your code. Could you check it?
char *p;
int ID, next, prev;
RCCE_init(&argc, &argv);
ID = RCCE_ue();
next = (ID + 1) % RCCE_num_ues();
prev = (ID - 1 >= 0) ? ID - 1 : RCCE_num_ues() - 1;
p = RCCE_malloc(32);
*(int *)p = 0;
RCCE_barrier(&RCCE_COMM_WORLD);
RCCE_acquire_lock(0);
printf("Core %d writes to the MPB of core %d\n", ID, next);
*(int *)(p - (char *)RCCE_comm_buffer[ID] + (char *)RCCE_comm_buffer[next]) = ID;
RCCE_release_lock(0);
RCCE_barrier(&RCCE_COMM_WORLD);
assert(*(int *)p == prev);
RCCE_finalize();
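For readers following the pointer arithmetic in the snippet above: the expression subtracts the base of the local buffer from p to get an offset, then adds that offset to the base of the neighbour's buffer. The sketch below reproduces only that idea in plain C, without RCCE. The names fake_mpb, translate and ring_exchange_ok, and the sizes NUM_CORES and MPB_BYTES, are hypothetical stand-ins chosen for illustration; they are not part of RCCE. The writes happen sequentially in one thread, so the sketch checks the address translation only and says nothing about synchronization or caching.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NUM_CORES 4   /* hypothetical core count for this sketch */
#define MPB_BYTES 64  /* hypothetical per-core buffer size */

/* Stand-in for RCCE_comm_buffer[]: one plain byte array per core. */
static char fake_mpb[NUM_CORES][MPB_BYTES];

/* Map a pointer into core `from`'s buffer to the same offset
   inside core `to`'s buffer. */
char *translate(char *p, int from, int to) {
    ptrdiff_t offset = p - fake_mpb[from];
    assert(offset >= 0 && offset < MPB_BYTES);
    return fake_mpb[to] + offset;
}

/* Each core writes its ID into its right-hand neighbour's slot; afterwards
   every slot must hold the left-hand neighbour's ID. Returns 1 on success. */
int ring_exchange_ok(size_t slot_offset) {
    memset(fake_mpb, 0, sizeof fake_mpb);
    for (int id = 0; id < NUM_CORES; id++) {
        int next = (id + 1) % NUM_CORES;
        char *local = fake_mpb[id] + slot_offset;
        char *remote = translate(local, id, next);
        memcpy(remote, &id, sizeof id);  /* write ID into neighbour's slot */
    }
    for (int id = 0; id < NUM_CORES; id++) {
        int prev = (id + NUM_CORES - 1) % NUM_CORES;
        int seen;
        memcpy(&seen, fake_mpb[id] + slot_offset, sizeof seen);
        if (seen != prev) return 0;
    }
    return 1;
}
```

Calling ring_exchange_ok(0) should return 1 if the translation is right. If it does, the offset arithmetic itself is probably not the source of the intermittent failures.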
Thanks for the modified code. Running your code on the emulator leads to 3 possible results:
(1) It works as we expect (each core copies its ID to its right-hand neighbor in a round-robin manner).
(2) Sometimes just one core copies its ID to the neighbor core, and then the assert fails. (This behaviour is not expected.)
(3) Sometimes it crashes in the middle of execution. (This is also unexpected.)
Could anyone test this RCCE code on the SCC platform, since we haven't got the platform yet? I want to understand whether the fault lies with the emulator, the RCCE library, or maybe the code itself.
Thanks,
Omid
Hmm, it seems to work. No problems when I run it on the emulator (older, revision 90)...
I will test a version of the code on the actual hardware.
Also no problems on the hardware... The code is slightly different, with puts and gets to avoid caching and write-combining issues.
Could you post your code again?
This is exactly the code I run on the emulator:
#include <stdio.h>
#include <stdlib.h>
#include "../../include/RCCE_lib.h"
#include "../../include/RCCE.h"
#include <assert.h>
int RCCE_APP(int argc, char** argv) {
char *p;
int ID, next, prev;
RCCE_init(&argc, &argv);
ID = RCCE_ue();
next = (ID + 1) % RCCE_num_ues();
prev = (ID - 1 >= 0) ? ID - 1 : RCCE_num_ues() - 1;
p = (char *)RCCE_malloc(32);
*(int *)p = 0;
RCCE_barrier(&RCCE_COMM_WORLD);
RCCE_acquire_lock(0);
printf("Core %d writes ' %d ' to the MPB of core %d\n", ID, ID, next);
*(int *)(p - (char *)RCCE_comm_buffer[ID] + (char *)RCCE_comm_buffer[next]) = ID;
RCCE_release_lock(0);
RCCE_barrier(&RCCE_COMM_WORLD);
assert(*(int *)p == prev);
RCCE_finalize();
return 0; /* RCCE_APP is declared to return int */
}
The RCCE version is "1.0.13.x", and I am using the trunk revision.
I still see the same strange behaviour. Are you sure no "#pragma omp flush" is needed (before reads or after writes) to make changes that one thread makes to shared variables visible to other threads?
Omid
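On the flush question, one fact from the OpenMP specification is relevant: a barrier implies a flush, so writes made before "#pragma omp barrier" are visible to all threads of the team after it. The sketch below shows the same ring exchange in plain OpenMP, assuming (as this thread suggests) that the emulator runs each core as an OpenMP thread. The names slot and ring_with_barriers are hypothetical. It is not RCCE code, and it does not show how RCCE_barrier is implemented in the emulator.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4  /* hypothetical upper bound on team size */

/* Each thread writes its ID into its right-hand neighbour's slot, waits at a
   barrier, then checks its own slot. Returns the number of threads that did
   not see the expected value (0 means every write was visible). */
int ring_with_barriers(void) {
    static int slot[MAX_THREADS];  /* static storage: shared by all threads */
    int failures = 0;

    #pragma omp parallel num_threads(MAX_THREADS) reduction(+:failures)
    {
#ifdef _OPENMP
        int id = omp_get_thread_num();
        int n = omp_get_num_threads();
#else
        int id = 0, n = 1;  /* serial fallback when built without OpenMP */
#endif
        int next = (id + 1) % n;
        int prev = (id + n - 1) % n;

        slot[id] = -1;        /* initialise own slot */
        #pragma omp barrier   /* all slots initialised before any remote write */

        slot[next] = id;      /* "remote" write into neighbour's slot */
        #pragma omp barrier   /* implies a flush: the write is now visible */

        if (slot[id] != prev)
            failures++;
    }
    return failures;
}
```

Built with OpenMP enabled (for example gcc -fopenmp), ring_with_barriers() should return 0 without any explicit "#pragma omp flush", because the two barriers already order the writes and reads. A flush on its own does not make one thread wait for another, so it cannot replace the barrier here.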
I can't get it to fail... I've also checked it against the latest revision of RCCE. Note that I am using gcc to build the RCCE emulator (default is icc).
Strange. I also build the emulator with gcc.
The app behaves in one of these ways:
Sometimes it crashes -> here I suspect RCCE_barrier on the emulator.
Sometimes the assert fails -> here I am not sure how to explain it. Maybe some coherence issue between OpenMP threads.
And sometimes it works as it should.
Could you please send me the code that you ran on the actual hardware?
Thanks,
Omid
Of course, here it is. I haven't seen it fail once, on either the hardware or the emulator...
The address arithmetic is now done by RCCE_put.

