3 Replies Latest reply on Apr 11, 2016 3:35 PM by mark_h_@intel

    Can I read/write shared 128-bit floats without locks?

    PineTree

      I am writing a program with two threads. One thread both reads and writes the shared variables; the other thread only reads them. The read-only thread is fine with seeing the shared variables either before or after the writer modifies them, but it will be a problem if it ever reads partially modified data.

       

      I am using 128-bit variables. Below is some sample code for reading and writing.

       

      Function to write :

       

      void setVar(float a, float b, float c, float d) {
          // sharedArray[4] is an array of floats. I can guarantee it will not be optimized away.
          sharedArray[0] = a;
          sharedArray[1] = b;
          sharedArray[2] = c;
          sharedArray[3] = d;

          shared128bitFloat = _mm_loadu_ps((float*)sharedArray);
      }
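      As an aside, the staging array is not strictly required to build the value. The following is a minimal sketch, assuming shared128bitFloat is a file-scope __m128 (its declaration is not shown in the post, so the one below is a guess). _mm_setr_ps places its first argument in the lowest lane, which matches loading sharedArray[0..3] with _mm_loadu_ps. It does not by itself make the store atomic; it only removes the intermediate array.

```cpp
#include <xmmintrin.h>  // SSE intrinsics: __m128, _mm_setr_ps

// Assumed declaration; the original post does not show it.
// __m128 is naturally 16-byte aligned, so the store below never
// straddles a cache line.
static_assert(alignof(__m128) == 16, "__m128 is expected to be 16-byte aligned");
static __m128 shared128bitFloat;

// Builds the four lanes in a register and writes them with one assignment.
// Lane order: a is the lowest lane, d the highest (same as the array version).
void setVar(float a, float b, float c, float d) {
    shared128bitFloat = _mm_setr_ps(a, b, c, d);
}

// Reads the whole 128-bit value with one assignment.
__m128 getVar() {
    return shared128bitFloat;
}
```

      Whether the compiler emits a single store for the assignment still has to be checked in the generated assembly, as was done for the original version.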

       

       

      Function to read :

       

       

      __m128 getVar() {
          return shared128bitFloat;
      }

       

       

      I looked at the generated assembly and found that the read from and the write to shared128bitFloat each use only one assembly instruction.



      The code for writing to the 128-bit float, i.e.:


      shared128bitFloat = _mm_loadu_ps((float*)sharedArray);


      became :


      vmovups xmm0, XMMWORD PTR [rax]

      vmovups XMMWORD PTR [rdi+560], xmm0      // (only one instruction is used for modifying the data)





      The code to read got inlined and became :

       

           vmovups xmm0, XMMWORD PTR [rdx+560]


       

          

      Given that a single instruction is used for each read of and write to the 128-bit float, is it okay to assume that I do not need any lock to read the data? The reader is fine with seeing either the unmodified data or the fully modified data, just not partially modified data.

       

      In essence, I think what I am asking is whether reads and writes of 128-bit variables in memory can be interleaved. I assume the bus width is 64 bits, so if two instructions are executing on two different CPU cores, one core could be issuing two reads (128 / 64 = 2) while the other issues two writes, and they could interleave.

       

       

      Related questions.

       

       

      Am I right in assuming that a context switch cannot happen while an assembly instruction is only partially executed? A question was raised about what happens when both the reader and the writer are running on the same CPU core and a context switch occurs halfway through a read or a write. As far as I know, a context switch cannot happen during the execution of an instruction's micro-instructions. It would be helpful if someone could confirm that.

       

       

      Can hyper-threading cause an issue here? If instructions from both the reader and the writer are in the pipeline, could there be issues?

       

       

      If the above approach cannot work, is there any way I could make the reads and writes atomic by using some kind of compiler intrinsic to lock the bus? I do not want to use a lock from any library.
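      For comparison, one lock-free pattern that fits a single writer and a read-only thread is a sequence counter (often called a seqlock). The sketch below is only an illustration under assumptions: it uses C++11 std::atomic rather than a bus-lock intrinsic, it stores the four floats as separate atomics instead of one __m128, and the names Vec4, SeqLockVec4 and count_torn_snapshots are hypothetical. The writer makes the counter odd while it is updating; the reader retries whenever the counter was odd or changed during its read, so it never returns a mixed snapshot.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical plain snapshot type returned to the reader.
struct Vec4 { float v[4]; };

// Hypothetical single-writer seqlock around four floats.
class SeqLockVec4 {
public:
    SeqLockVec4() : seq_(0) {
        for (auto& x : data_) x.store(0.0f, std::memory_order_relaxed);
    }

    // Must be called from the one writer thread only.
    void set(float a, float b, float c, float d) {
        const unsigned s = seq_.load(std::memory_order_relaxed);
        assert((s & 1u) == 0 && "only one writer is supported");
        seq_.store(s + 1, std::memory_order_relaxed);   // odd: write in progress
        std::atomic_thread_fence(std::memory_order_release);
        data_[0].store(a, std::memory_order_relaxed);
        data_[1].store(b, std::memory_order_relaxed);
        data_[2].store(c, std::memory_order_relaxed);
        data_[3].store(d, std::memory_order_relaxed);
        seq_.store(s + 2, std::memory_order_release);   // even: write complete
    }

    // Safe to call concurrently with set(); retries until the snapshot is consistent.
    Vec4 get() const {
        Vec4 out;
        unsigned s0, s1;
        do {
            s0 = seq_.load(std::memory_order_acquire);
            for (int i = 0; i < 4; ++i)
                out.v[i] = data_[i].load(std::memory_order_relaxed);
            std::atomic_thread_fence(std::memory_order_acquire);
            s1 = seq_.load(std::memory_order_relaxed);
        } while (s0 != s1 || (s0 & 1u));
        return out;
    }

private:
    std::atomic<unsigned> seq_;
    std::atomic<float> data_[4];
};

// Stress helper: the writer always stores four equal lanes, so any snapshot
// with unequal lanes would be a torn read. Returns how many were seen.
int count_torn_snapshots(int iterations) {
    SeqLockVec4 shared;
    std::atomic<bool> done(false);
    int torn = 0;  // written only by the reader, read after join()
    std::thread reader([&] {
        while (!done.load(std::memory_order_acquire)) {
            const Vec4 s = shared.get();
            if (!(s.v[0] == s.v[1] && s.v[1] == s.v[2] && s.v[2] == s.v[3])) ++torn;
        }
    });
    for (int i = 1; i <= iterations; ++i) {
        const float f = static_cast<float>(i);
        shared.set(f, f, f, f);
    }
    done.store(true, std::memory_order_release);
    reader.join();
    return torn;
}
```

      The trade-off is that the reader may spin while a write is in progress, and the scheme supports exactly one writer. It does not answer whether a single 128-bit vmovups is atomic in hardware; it avoids depending on that.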

       

       

      Thanks,
      Anil Mahmud.