0 Replies Latest reply on Jul 23, 2010 7:04 AM by AlphaOrion

    SSE problems with -m32

    AlphaOrion

      Hi,

       

      I wrote this code:

       

      #include <stdlib.h>
      #include <stdio.h>
      #include <iostream>
      #include <xmmintrin.h>

       

      int main()
      {
        float *outIt;
        __m128 image4 ;
        int k;
        float sum;
       
        posix_memalign( (void**)&outIt, 16, sizeof(float)*8 );
        sum=0;

       

        printf("sum: %f\n", sum );
        for( k=0; k<8; k+=4 )
        {
          image4 = _mm_cvtpu8_ps( _mm_setr_pi8 ( 1+k, 2+k, 3+k, 4+k, 5+k, 6+k, 7+k, 8+k ) );
          _mm_store_ps( outIt, image4 );
          std::cerr <<  "16-byte aligned: " << outIt << std::endl;
          printf( "outIt[0]: %f, outIt[1]: %f, outIt[3]: %f, outIt[3]: %f\n", outIt[0], outIt[0], outIt[2], outIt[3] );
          printf( "outIt[0]: %f, outIt[1]: %f, outIt[3]: %f, outIt[3]: %f\n", outIt[0], outIt[1], outIt[2], outIt[3] );
          sum = outIt[0] + outIt[1] + outIt[2] + outIt[3];
          outIt += 4;
          printf("sum: %f\n", sum );
        }
      }

       

      When I compile this on a 64-bit (Linux) system, I get:

      sum: 0.000000
      16-byte aligned: 0x100100080
      outIt[0]: 1.000000, outIt[1]: 1.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
      outIt[0]: 1.000000, outIt[1]: 2.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
      sum: 10.000000
      16-byte aligned: 0x100100090
      outIt[0]: 5.000000, outIt[1]: 5.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
      outIt[0]: 5.000000, outIt[1]: 6.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
      sum: 26.000000

      which is what I expect. However, if I compile it for 32-bit mode (with -m32), the output is:

      sum: 0.000000
      16-byte aligned: 0x100160
      outIt[0]: nan, outIt[1]: 1.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
      outIt[0]: 1.000000, outIt[1]: 2.000000, outIt[3]: 3.000000, outIt[3]: 4.000000
      sum: 10.000000
      16-byte aligned: 0x100170
      outIt[0]: nan, outIt[1]: 5.000000, outIt[3]: 7.000000, outIt[3]: 8.000000
      outIt[0]: 5.000000, outIt[1]: 6.000000, outIt[3]: 7.000000, outIt[3]: 8.000000

       

      The first read of outIt[0] after each store always yields NaN; reading the same element again gives the correct value.

       

      Does anybody know what I am doing wrong?

       

      Thanks,

       

       

      A.O.
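
A likely explanation, offered as a sketch rather than a confirmed diagnosis: _mm_setr_pi8 and _mm_cvtpu8_ps work on __m64 values, which live in the MMX registers, and those registers alias the x87 floating-point stack. In a 32-bit build the float-to-double conversion for printf is typically done on the x87 unit, so it can run while the unit is still in MMX state. In a 64-bit build that conversion normally uses SSE registers, which would explain why the problem does not show up there. The usual remedy is to call _mm_empty() after the last MMX intrinsic and before any ordinary floating-point code. The helper name convert_block below is hypothetical and not part of the original program; it assumes an x86 compiler that provides the MMX intrinsics in xmmintrin.h.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <xmmintrin.h>  // SSE intrinsics plus the MMX ones, including _mm_empty()

// Hypothetical helper (not from the original post): converts the four small
// integers first, first+1, first+2, first+3 to floats, stores them in `out`
// (which must be 16-byte aligned) and returns their sum.
float convert_block(float *out, int first)
{
    // _mm_store_ps requires a 16-byte aligned destination.
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);

    // Only the low four bytes are converted by _mm_cvtpu8_ps.
    __m64 bytes = _mm_setr_pi8(first, first + 1, first + 2, first + 3, 0, 0, 0, 0);
    __m128 image4 = _mm_cvtpu8_ps(bytes);  // goes through the MMX registers
    _mm_store_ps(out, image4);

    // Leave MMX state before any x87 floating-point instruction can run.
    _mm_empty();

    return out[0] + out[1] + out[2] + out[3];
}
```

In the original loop, the equivalent change would be a single _mm_empty(); placed directly after the _mm_store_ps call. An alternative, assuming SSE2 is available, is to avoid __m64 altogether and build the vector with _mm_cvtepi32_ps(_mm_setr_epi32(...)), which never touches the MMX registers.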