
Possible race condition in RCCE_stencil.c example in GORY mode?


 

Hi,

 

I believe I found a race condition bug in the RCCE stencil example code in GORY mode (repository version 303).

 

The code performs the following steps in each iteration (a minimal sketch of this loop follows the list):

 

1. Send fringe data to the next core.

2. Send fringe data to the previous core.

3. Accept fringe data from the next core.

4. Accept fringe data from the previous core.

5. Do the stencil processing.
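
For reference, here is a minimal sketch of that per-iteration structure, reconstructed from the GORY-mode snippets quoted later in this post. The buffer offsets (buff[0], buff[NX], a[NX], a[NX*(NY-1)]) are my illustrative assumptions, not necessarily the exact ones in RCCE_stencil.c:

    #include "RCCE.h"

    /* Sketch only: per-iteration halo exchange as described above.       */
    void stencil_iteration(float *a, float *buff, RCCE_FLAG flag0,
                           RCCE_FLAG flag1, int MY_ID, int NTILES,
                           int NX, int NY)
    {
      /* (1) send bottom fringe row to the next core                      */
      if (MY_ID != NTILES-1) {
        RCCE_flag_write(&flag0, RCCE_FLAG_UNSET, MY_ID+1); /* no handshake! */
        RCCE_put((t_vcharp)&buff[0], (t_vcharp)&a[NX*(NY-2)],
                 NX*sizeof(float), MY_ID+1);
        RCCE_flag_write(&flag0, RCCE_FLAG_SET, MY_ID+1);
      }
      /* (2) send top fringe row to the previous core                     */
      if (MY_ID != 0) {
        RCCE_flag_write(&flag1, RCCE_FLAG_UNSET, MY_ID-1); /* no handshake! */
        RCCE_put((t_vcharp)&buff[NX], (t_vcharp)&a[NX],
                 NX*sizeof(float), MY_ID-1);
        RCCE_flag_write(&flag1, RCCE_FLAG_SET, MY_ID-1);
      }
      /* (3) accept fringe data from the next core (it sets our flag1)    */
      if (MY_ID != NTILES-1) {
        RCCE_wait_until(flag1, RCCE_FLAG_SET);
        RCCE_get((t_vcharp)&a[NX*(NY-1)], (t_vcharp)&buff[NX],
                 NX*sizeof(float), MY_ID);
      }
      /* (4) accept fringe data from the previous core (it sets flag0)    */
      if (MY_ID != 0) {
        RCCE_wait_until(flag0, RCCE_FLAG_SET);
        RCCE_get((t_vcharp)&a[0], (t_vcharp)&buff[0],
                 NX*sizeof(float), MY_ID);
      }
      /* (5) stencil update on the interior rows would go here            */
    }

Note that nothing in steps (1) and (2) waits for the previous iteration's transfer to be consumed, which is exactly the hole described next.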

 

However, the code doesn't actually wait, in steps (1) and (2) of the next iteration, for the data it sent to the neighbor cores in the previous iteration to be consumed. The cores can therefore drift out of sync, and data generated in the next iteration may overwrite the data sent in the previous iteration before the neighbor core has had a chance to pick it up, leading to data corruption.

 

This is hard to notice in the stencil example itself for two reasons: (1) the floating-point filter will likely converge despite small numeric errors, and, more importantly, (2) the stencil processing time is very likely the same on each core, making the race much less likely to happen.

 

I used the logic from this example in my own code, which (1) works on integers, where a single-bit error will be detected, and (2) performs a more complex, data-dependent computation that can take a different amount of time on each core, making the race much more likely to happen.

 

A race scenario is as follows: suppose we have just started processing iteration i simultaneously, with all cores in sync, and the following occurs:

 

1. Core 5 finishes processing and sends its iteration-i data to cores 4 and 6.

2. Core 4 finishes processing and sends its iteration-i data to cores 3 and 5.

3. Core 6 finishes processing and sends its iteration-i data to cores 5 and 7.

4. Core 7 is working on a hard computation and has not yet finished iteration i.

5. Core 5 waits for data from cores 4 and 6. Since both have transferred their data, it happily consumes the iteration-i data and proceeds to the computation of iteration i+1.

6. Core 6 waits for data from core 7; since core 7 is still busy computing iteration i, the data is not yet available. Core 6 is therefore stuck waiting on flag1.

7. Core 5 happens to finish computing i+1 quickly and starts the next loop iteration. Here the bug occurs: it does not wait for core 6 to consume the data it sent in iteration i, which core 6 will only do after getting data from core 7, and core 7 is not ready! Core 5 therefore happily overwrites the old data it sent to core 6 with new data from iteration i+1.

8. Core 7 now finishes, releasing core 6, which gets its data from iteration i.

9. Core 6 now reads the data from core 5, but it gets the corrupted iteration-i+1 data that core 5 overwrote, instead of the iteration-i data it expected.

 

The bug can also be seen in the sender code, in the following lines:

 

    /* start with copying fringe data to neighboring tiles               */
    if (MY_ID!=NTILES-1) {
      /* Initialize neighbor flag to zero                                */
      RCCE_flag_write(&flag0, RCCE_FLAG_UNSET, MY_ID+1);
      /* copy private data to shared comm buffer of neighbor             */
      RCCE_put((t_vcharp)(&buff[0]), (t_vcharp)(&a[NX*(NY-2)]), NX*sizeof(float), MY_ID+1);
      RCCE_flag_write(&flag0, RCCE_FLAG_SET, MY_ID+1);
    }

 

The first RCCE_flag_write, which unsets the flag on ID+1, doesn't actually do anything useful: if ID+1 hasn't consumed its data, we never find out; we just overwrite the flag state, and the data is lost. Writing UNSET doesn't protect against anything.

 

Attached are a diff file and a new version of RCCE_stencil.c with a fix for the bug. The same logic was tested in my application, which was sensitive to the bug, and there it fixed the problem.

 

The fix works by adding two flags alongside flag0 and flag1, named flag0a (for "flag0 ack") and flag1a, which are used for a two-way handshake.
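
For completeness, the two extra flags are declared and allocated the same way as flag0 and flag1. A minimal sketch, assuming the standard GORY-mode RCCE_flag_alloc call; note that the ack flags must start out SET so the very first send, which has no earlier transfer to wait for, doesn't block (whatever form the attached patch uses, an equivalent initialization is needed):

    RCCE_FLAG flag0, flag1;    /* original "data ready" flags            */
    RCCE_FLAG flag0a, flag1a;  /* new "data consumed" ack flags          */

    /* once, during setup, on every core:                                */
    RCCE_flag_alloc(&flag0);   RCCE_flag_alloc(&flag1);
    RCCE_flag_alloc(&flag0a);  RCCE_flag_alloc(&flag1a);

    /* ack flags start SET so the first-ever send can proceed            */
    RCCE_flag_write(&flag0a, RCCE_FLAG_SET, RCCE_ue());
    RCCE_flag_write(&flag1a, RCCE_FLAG_SET, RCCE_ue());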

 

When we send data to the next tile, we set flag0 as before. However, the next tile must now set the ack flag in the sender's MPB area once it has taken the data, and it has to unset flag0 in its own tile, rather than the sender doing it as in the original code:

 

 

     if (MY_ID!=0) {

       RCCE_wait_until(flag0, RCCE_FLAG_SET);

+      RCCE_flag_write(&flag0, RCCE_FLAG_UNSET, MY_ID);

       RCCE_get((t_vcharp)(&a[0]), (t_vcharp)(&buff[0]), NX*sizeof(float),MY_ID);

+      RCCE_flag_write(&flag0a, RCCE_FLAG_SET, MY_ID-1);

 

Next, the sender, prior to sending the data, has to do RCCE_wait_until(flag0a...) to make sure the next tile has actually taken the previous data:

 

     if (MY_ID!=NTILES-1) {

-      /* Initialize neighbor flag to zero                                */

-      RCCE_flag_write(&flag0, RCCE_FLAG_UNSET, MY_ID+1);

+      /* Wait for next neighbor to take the data, and unset the flag     */

+      RCCE_wait_until(flag0a, RCCE_FLAG_SET);

+      RCCE_flag_write(&flag0a, RCCE_FLAG_UNSET, MY_ID);

       /* copy private data to shared comm buffer of neighbor             */

       RCCE_put((t_vcharp)(&buff[0]), (t_vcharp)(&a[NX*(NY-2)]), NX*sizeof(float), MY_ID+1);

       RCCE_flag_write(&flag0, RCCE_FLAG_SET, MY_ID+1);

     }

And the same kind of code applies for flag1a; a sketch of that path follows below.
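
In case the attachment isn't handy, here is roughly what that symmetric flag1/flag1a path looks like. This is a sketch; the buffer offsets (buff[NX], a[NX], a[NX*(NY-1)]) are my assumptions rather than lines taken from the attached file:

    /* receive side: accept the fringe row the next core pushed to us    */
    if (MY_ID != NTILES-1) {
      RCCE_wait_until(flag1, RCCE_FLAG_SET);
      RCCE_flag_write(&flag1, RCCE_FLAG_UNSET, MY_ID);   /* we unset it   */
      RCCE_get((t_vcharp)&a[NX*(NY-1)], (t_vcharp)&buff[NX],
               NX*sizeof(float), MY_ID);
      RCCE_flag_write(&flag1a, RCCE_FLAG_SET, MY_ID+1);  /* ack the sender */
    }

    /* send side: first wait until our previous transfer was consumed    */
    if (MY_ID != 0) {
      RCCE_wait_until(flag1a, RCCE_FLAG_SET);
      RCCE_flag_write(&flag1a, RCCE_FLAG_UNSET, MY_ID);
      RCCE_put((t_vcharp)&buff[NX], (t_vcharp)&a[NX],
               NX*sizeof(float), MY_ID-1);
      RCCE_flag_write(&flag1, RCCE_FLAG_SET, MY_ID-1);
    }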
Thanks,
Gadi

 

