3 Replies Latest reply on Apr 19, 2013 6:15 AM by Hayder

    Memory consistency issue using RCCE

    nmelot

      Hi,

       

      I experienced some issues sending messages from one core to another using RCCE_get() and RCCE_put(). Each message sent is followed by a flag signaling that a message was sent. When a core has received and processed a message (where processing means checking that all values are the expected ones, then resetting the buffer to another "marker" value), then and only then does it send an acknowledgement to the sender core. The process is asynchronous: the sender core does not wait for an acknowledgement to finish its send operation, but checks for it later, before sending any other message; the receiver follows a similar procedure. All N cores (from 2 to 48) play this scenario, where core n sends to core n - 1. A cyclic variant allows core 0 to send to core N - 1.
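The handshake described above can be sketched in plain C for a single sender/receiver pair. This is only a single-threaded model in ordinary memory, not the attached test program: the names link_t, link_init, sender_try_send and receiver_try_receive, as well as the message length MSG_WORDS, are assumptions made for illustration, and no RCCE call is involved.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MSG_WORDS 32           /* hypothetical message length, in 4-byte words */
#define PAYLOAD   1u           /* value carried by every word of a message */
#define MARKER    0xdeadbeefu  /* value the receiver resets its buffer to */

/* One directed link from a sender to its neighbour, modelled in plain memory. */
typedef struct {
    uint32_t buf[MSG_WORDS]; /* receive buffer (in the real test: the receiver's MPB) */
    int msg_flag;            /* raised by the sender after writing the message */
    int ack_flag;            /* raised by the receiver after processing it */
} link_t;

static void link_init(link_t *l) {
    assert(l != NULL);
    for (size_t i = 0; i < MSG_WORDS; ++i) l->buf[i] = MARKER;
    l->msg_flag = 0;
    l->ack_flag = 1; /* nothing outstanding, so the first send may proceed */
}

/* Returns 1 if a message was sent, 0 if the previous one is still unacknowledged. */
static int sender_try_send(link_t *l) {
    assert(l != NULL);
    if (!l->ack_flag) return 0;          /* deferred acknowledgement check */
    l->ack_flag = 0;
    for (size_t i = 0; i < MSG_WORDS; ++i) l->buf[i] = PAYLOAD; /* message first */
    l->msg_flag = 1;                     /* flag second */
    return 1;
}

/* Returns 1 if a valid message was processed, 0 if none is pending,
 * -1 if the flag is raised but a word does not hold the expected payload. */
static int receiver_try_receive(link_t *l) {
    assert(l != NULL);
    if (!l->msg_flag) return 0;
    for (size_t i = 0; i < MSG_WORDS; ++i)
        if (l->buf[i] != PAYLOAD) return -1;
    for (size_t i = 0; i < MSG_WORDS; ++i) l->buf[i] = MARKER; /* reset to marker */
    l->msg_flag = 0;
    l->ack_flag = 1;                     /* acknowledge only after processing */
    return 1;
}
```

In this model the -1 case can only be produced by corrupting the buffer by hand, because ordinary memory on one thread preserves write order; on the chip it corresponds to the flag being visible before the whole message.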

       

      I implemented this scenario on Marc038 using RCCE_put() and RCCE_get() for messages, and either RCCE_put() and RCCE_get() or RCCE_flag_write() and RCCE_flag_read() for flags (depending on the implementation variant). Messages are a sequence of 1s, each coded in a 4-byte int, and markers are the hexadecimal value 0xdeadbeef. Find the simple_skip_msg source files attached to this post and never mind skip_msg, as I think that implementation is buggy. If I run it with all cores, I almost always end up with an error after a few dozen or a few hundred thousand messages sent: some core finds 0xdeadbeef instead of 1111111111 in the message; most of the time it is a sequence of 32*n 0xdeadbeef values, but sometimes, exceptionally, fewer than 32. The 0xdeadbeef values can be at the beginning of the message, at its end or in the middle. This shows that the flag arrived before the message, even though its corresponding RCCE_put()/flag_write() was run after the one for the actual message. Note that the Rocky Creek chip of marc038 was replaced recently and all cores now run the mpbtest perfectly (the one from Pablo Rebble, as discussed in bug report 487 http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=487). After I reported the bug related to the scenario I describe here (http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=495), my chip was replaced once again, with no better success.

       

      You can compile the source files using the usual icc compilation tools. Use the command line below, where $PELIB_HOME points to the root directory of the RCCE sources and where libRCCE_bigflags_gory_nopwrmgmt.a as well as mpb are compiled in bin/SCC_LINUX.

       

      make MPB_SIZE=8128 PELIB_HOME=$PELIB_HOME

       

      Run the test case using the command below:

       

      rccerun -nue 48 -f hostfile -clock 0.533 simple_skip_msg

       

      Where hostfile lists all cores from 00 to 47, in natural order. If you don't do anything, the program will run indefinitely and output faulty buffers when it finds some (as well as some more information). If you want to stop them all in an easy way, help yourself with the command below:

       

      for i in `seq 0 47`; do ( ( ssh root@rck`printf %02d $i` killall -s INT simple_skip_msg; echo Core $i done. ) & ) 2>/dev/null; done; wait 2>/dev/null

       

      It seems to me that the memory consistency is not the one I expected (I expect that what you write first is what reaches the destination first, as the documentation says about the on-chip network). I couldn't find any such discussion on communities.intel.com or any more detail about it in the documentation. Have you experienced the same issue? Did I misunderstand anything about the SCC and RCCE? Can you see any flaw in my test?

       

      As a follow-up, I upgraded my test so that each time it detects a wrong message (containing some instances of 0xdeadbeef), it waits one second and checks the buffer again; it then always reads the expected value. This shows that the message had not yet arrived when the flag reached the receiving core, and argues for a weak memory consistency. It also means one can work around this issue by checking the whole received message for any instance of 0xdeadbeef again and again until no marker is detected anymore, then processing the message and resetting it to 0xdeadbeef in order to take the next transmission. This has the cost of reading the whole buffer at least once and writing it once, before and after doing anything else. Does anyone have an alternative solution?
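A minimal sketch of that workaround, assuming the received message has already been copied into a local buffer of 4-byte words. The helper names count_markers, wait_for_full_message and reset_to_marker and the max_polls bound are hypothetical; in the real test the buffer would presumably have to be re-read from the MPB (for instance with RCCE_get()) before each scan, which this plain-memory version does not do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MARKER 0xdeadbeefu /* value the receiver resets its buffer to */

/* Count the words that still hold the marker, i.e. have not been overwritten yet. */
static size_t count_markers(const volatile uint32_t *buf, size_t n) {
    assert(buf != NULL);
    size_t found = 0;
    for (size_t i = 0; i < n; ++i)
        if (buf[i] == MARKER) ++found;
    return found;
}

/* Re-scan the buffer up to max_polls times.
 * Returns 1 as soon as no marker remains, 0 if markers are still present at the end. */
static int wait_for_full_message(const volatile uint32_t *buf, size_t n,
                                 unsigned max_polls) {
    for (unsigned p = 0; p < max_polls; ++p)
        if (count_markers(buf, n) == 0) return 1;
    return 0;
}

/* After processing, restore the marker so the next transmission can be checked. */
static void reset_to_marker(volatile uint32_t *buf, size_t n) {
    assert(buf != NULL);
    for (size_t i = 0; i < n; ++i) buf[i] = MARKER;
}
```

This only works because the payload never legitimately contains 0xdeadbeef; a message that could carry that value would need a different completeness check.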

       

      Best,

       

      Nicolas

        • 1. Re: Memory consistency issue using RCCE
          Hayder

          Hi,

           

          I have run into a similar issue before in my work. My impression is that the problem is a conflict between RCCE_flag_write() and RCCE_put(), because RCCE_put() does not flush the WCB to ensure the data gets written out.

          My suggestion is to add this:

          *(int *)RCCE_fool_write_combine_buffer = 1;

          after each RCCE_put() to make sure the WCB gets flushed, or implement your application based on non-cache mode.

           

          I hope this helps.

           

          Hayder

          • 2. Re: Memory consistency issue using RCCE
            nmelot

            Hi Hayder

             

            Thanks for your help.

             

            I am not sure what you refer to with WCB. Are these memory addresses marked as shared, thus invalidated by CL1FLUSHMB (MPBT-marked memory)?

            Anyway, I tried to insert "*(int *)RCCE_fool_write_combine_buffer = 1;" after each RCCE_put() line in the test program, but it doesn't seem to help. Since, when an error happens, waiting some time before reading the buffer again with RCCE_get() on the local MPB fixes the issue, I am not convinced this is a cache issue.

            Also, to the best of my knowledge, one can deactivate caching for L2-cached shared off-chip memory, but this is not possible with L1 and MPB.

             

            Best,

             

            Nicolas

            • 3. Re: Memory consistency issue using RCCE
              Hayder

              Hi Nicolas,

               

              WCB = Write combine buffer.

              If you look at the RCCE_put.c code, you will find that RCCE_put() does not use RCCE_fool_write_combine_buffer to flush the data, while RCCE_put_char() (which is used by RCCE_flag_write()) does flush the data using RCCE_fool_write_combine_buffer, to make sure the data has already been written to the MPB.

              The other way, I think, is to implement those functions based on no-cache mode.
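The flush suggested in this thread amounts to following every put with a dummy write to a separate location. The sketch below shows only the shape of such a wrapper and compiles without RCCE: fake_put and fake_wcb_word are stand-ins invented here for RCCE_put() and RCCE_fool_write_combine_buffer, the wrapper name put_then_flush is hypothetical, and the stand-in signature is simplified compared with the real library call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for the location RCCE_fool_write_combine_buffer points to (assumption:
 * in real code this symbol is provided by RCCE itself). */
static volatile int fake_wcb_word;
static volatile int *const wcb_dummy = &fake_wcb_word;

/* Stand-in for RCCE_put(): copies size bytes to target and returns 0 on success. */
static int fake_put(unsigned char *target, const unsigned char *source, size_t size) {
    memcpy(target, source, size);
    return 0;
}

/* Put followed by a dummy write, mirroring
 * "*(int *)RCCE_fool_write_combine_buffer = 1;" placed after each put. */
static int put_then_flush(unsigned char *target, const unsigned char *source,
                          size_t size) {
    assert(target != NULL && source != NULL);
    int err = fake_put(target, source, size);
    *wcb_dummy = 1; /* write elsewhere so that pending combined data is pushed out */
    return err;
}
```

On ordinary hardware the dummy write has no visible effect; it only matters on a platform with a write-combine buffer, and as reported earlier in the thread it did not make the errors disappear in the test program.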

               

              Regards

               

              Hayder