3 Replies Latest reply on Mar 21, 2011 9:21 AM by Nil

    shared dram

    paulcockshott

      Shared DRAM is mentioned in diagrams in the documentation, but I have seen no written documentation that explains how to use it.

      1. Is it supported in the standard setup, or do you need special LUT settings?
      2. Does the RCCE fence call appropriately serialise writes to the shared DRAM so that if all writers to it do a fence call they will, after the fence call, see the results of the earlier writes?
        • 1. Re: shared dram
          michael.riepen

          In the standard setup (assuming you are working with Linux) we defined 64MB of shared memory (16MB per memory controller). It can be accessed through the physical addresses 0x80000000-0x83ffffff on each core (LUT entries 0x80 to 0x83). The content can be read and modified by each core!
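
To make the address mapping concrete, here is a minimal sketch (not taken from the SCC documentation) that derives the LUT entry from a physical address. It assumes each LUT entry covers a 16MB window, which is what the range above implies (0x80000000-0x83ffffff spread over entries 0x80 to 0x83). The names `lut_entry`, `in_default_shm` and the constants are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Taken from the post above: the default shared window (bounds inclusive). */
#define SHM_BASE   0x80000000u
#define SHM_END    0x83ffffffu
/* Assumption: one LUT entry per 16MB, i.e. 2^24 bytes. */
#define LUT_SHIFT  24

/* Compile-time check that the constants agree with LUT entries 0x80..0x83. */
static_assert((SHM_BASE >> LUT_SHIFT) == 0x80, "base should map to entry 0x80");
static_assert((SHM_END  >> LUT_SHIFT) == 0x83, "end should map to entry 0x83");

/* The top 8 bits of a 32-bit physical address select the LUT entry. */
static inline uint32_t lut_entry(uint32_t phys_addr)
{
    return phys_addr >> LUT_SHIFT;
}

/* True if the address lies inside the default 64MB shared window. */
static inline bool in_default_shm(uint32_t phys_addr)
{
    return phys_addr >= SHM_BASE && phys_addr <= SHM_END;
}
```

For example, `lut_entry(0x81200000u)` evaluates to 0x81, the second of the four 16MB windows.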

           

          However, when you're working on Linux you should be aware that some regions of this shared memory are already in use by system services (e.g. the on-die network driver that allows TCP traffic from core to core). For your convenience I'll paste the current "reserved areas" into this thread:

           

          a. SHM TTY1 & Perfmeter: 8000 0000 - 8018 0FFF (1540KB @ 0MB)
             SHM TTY2:             8100 0000 - 8118 0FFF (1540KB @ 0MB)
             SHM TTY3:             8200 0000 - 8218 0FFF (1540KB @ 0MB)
             SHM TTY4:             8300 0000 - 8318 0FFF (1540KB @ 0MB)

          b. rckpc (Host network): 8020 0000 - 802B FFFF (768KB @ 2MB)
                                   8120 0000 - 812B FFFF (768KB @ 2MB)
                                   8220 0000 - 822B FFFF (768KB @ 2MB)
                                   8320 0000 - 832B FFFF (768KB @ 2MB)

          c. rckmb (on-chip network): 8019 0000 - 801E FFFF (384KB @ 1600KB)
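
The reserved areas above can be turned into a small lookup table, so that a program can check a physical address range before writing to it. This is only a sketch: the `shm_range` type and the `overlaps_reserved` helper are hypothetical, the table simply copies the ranges listed in this post, and an address outside the table is not guaranteed to be free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reserved areas copied from the list above; both bounds are inclusive. */
struct shm_range { uint32_t first, last; };

static const struct shm_range reserved[] = {
    { 0x80000000u, 0x80180FFFu },  /* SHM TTY1 & Perfmeter */
    { 0x81000000u, 0x81180FFFu },  /* SHM TTY2 */
    { 0x82000000u, 0x82180FFFu },  /* SHM TTY3 */
    { 0x83000000u, 0x83180FFFu },  /* SHM TTY4 */
    { 0x80200000u, 0x802BFFFFu },  /* rckpc (host network) */
    { 0x81200000u, 0x812BFFFFu },
    { 0x82200000u, 0x822BFFFFu },
    { 0x83200000u, 0x832BFFFFu },
    { 0x80190000u, 0x801EFFFFu },  /* rckmb (on-chip network) */
};

/* Hypothetical helper: does [start, start + len) touch any reserved area? */
static bool overlaps_reserved(uint32_t start, uint32_t len)
{
    assert(len > 0);                    /* an empty range is a caller error */
    uint32_t last = start + (len - 1);  /* inclusive end; assumes no wrap-around */

    for (size_t i = 0; i < sizeof reserved / sizeof reserved[0]; i++) {
        /* Two inclusive ranges overlap if each starts before the other ends. */
        if (start <= reserved[i].last && last >= reserved[i].first)
            return true;
    }
    return false;
}
```

With this table, `overlaps_reserved(0x80400000u, 0x100000u)` is false, while any range that starts at 0x80000000 is reported as overlapping.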

          • 2. Re: shared dram
            sfin

            Hi,

             

            We are having similar problems with shared memory allocation. When we allocate shared memory using RCCE_shmalloc and initialize the allocated arrays, our program produces unexpected results that differ from one execution to the next. Furthermore, some cores that are not being used by our program crash. For instance, when we use cores 0 and 1 for our program, cores 12 and 20 crash and become inaccessible, so we have to reboot them. We have also checked that the size of each array we allocate is a multiple of 32 bytes. The program crashes with any number of cores, and the problem arises when we write to the shared memory region.
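
The 32-byte size check can be expressed at compile time for the matrix dimensions used in our test program (48x10, 10x8 and 48x8 ints). This is only a sketch: `SHM_GRANULE` and `round_up_granule` are names invented here, and it assumes a 4-byte `int`.

```c
#include <assert.h>
#include <stddef.h>

#define SHM_GRANULE 32u   /* the 32-byte multiple mentioned above */

/* Round a byte count up to the next multiple of SHM_GRANULE. */
static inline size_t round_up_granule(size_t bytes)
{
    return (bytes + SHM_GRANULE - 1) / SHM_GRANULE * SHM_GRANULE;
}

/* Same dimensions as the test program. */
#define ROWS_A 48
#define COLS_A 10
#define ROWS_B COLS_A
#define COLS_B 8

/* With sizeof(int) == 4 the three matrices take 1920, 320 and 1536 bytes,
 * all of which are multiples of 32. */
static_assert(sizeof(int) == 4, "sketch assumes a 4-byte int");
static_assert(ROWS_A * COLS_A * sizeof(int) % SHM_GRANULE == 0, "A is not a multiple of 32 bytes");
static_assert(ROWS_B * COLS_B * sizeof(int) % SHM_GRANULE == 0, "B is not a multiple of 32 bytes");
static_assert(ROWS_A * COLS_B * sizeof(int) % SHM_GRANULE == 0, "C is not a multiple of 32 bytes");
```

If a size were not already a multiple of 32, one option would be to pass `round_up_granule(n)` to `RCCE_shmalloc` instead of `n`.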

            We are using RCCE_v1.0.12, and the program we are executing (a tiny version that reproduces the error) is below and attached as well. The program creates three matrices, initializes them, and prints them on the screen. Sometimes the values printed are incorrect or do not match the values from the initialization. As already said, some cores that are not being used crash.

             

            Thank you for your help.

             

            --------------------------------------

             

            #include <string.h>
            #include <stdio.h>
            #include "RCCE.h"

             

            #define ROWS_A 48
            #define COLS_A 10
            #define ROWS_B COLS_A
            #define COLS_B 8

             


            int RCCE_APP(int argc, char **argv)
            {

             

                int ME;
                int UEs   = 0;
                int i, j, k;
                int *A, *B, *C;

             

             
             
                RCCE_init(&argc, &argv);

             

                // Get this UE's id and the total number of UEs
                ME = RCCE_ue();
                UEs = RCCE_num_ues();

             

                A = (int *)RCCE_shmalloc(ROWS_A * COLS_A * sizeof(int));
                if(!A)
                    printf("RCCE failed to shmalloc # A # on proc %d\n", ME);

                B = (int *)RCCE_shmalloc(ROWS_B * COLS_B * sizeof(int));
                if(!B)
                    printf("RCCE failed to shmalloc # B # on proc %d\n", ME);

                C = (int *)RCCE_shmalloc(ROWS_A * COLS_B * sizeof(int));
                if(!C)
                    printf("RCCE failed to shmalloc # C # on proc %d\n", ME);
                

                // If I am UE0 I will initialize the arrays
                if(!ME)
                {
                    for(i = 0; i < ROWS_A; i++)
                    {
                        for(j = 0; j < COLS_A; j++)
                        {
                            A[i * COLS_A + j] = i * j;
                        }
                    }
                    for(i = 0; i < ROWS_B; i++)
                    {
                        for(j = 0; j < COLS_B; j++)
                        {
                            B[i * COLS_B + j] = i * j;
                        }
                    }

                    for(i = 0; i < ROWS_A; i++)
                    {
                        for(j = 0; j < COLS_B; j++)
                        {
                            C[i * COLS_B + j] = 0;
                        }
                    }
                }

             


                RCCE_shflush();
                RCCE_barrier(&RCCE_COMM_WORLD);

             

               
                // Check that initialization was done correctly and other cores see the new values
                if(ME == 0)
                {
                    printf("I am %d\n", RCCE_ue());

             

                    printf("Array A:\n");
                    printf("========\n");
                    for(i = 0; i < ROWS_A; i++)
                    {
                        for(j = 0; j < COLS_A; j++)
                            printf("%d ", A[i * COLS_A + j]);
                        printf("\n");
                    }

                    printf("\nArray B:\n");
                    printf("========\n");
                    for(i = 0; i < ROWS_B; i++)
                    {
                        for(j = 0; j < COLS_B; j++)
                            printf("%d ", B[i * COLS_B + j]);
                        printf("\n");
                    }

                    printf("\nArray C:\n");
                    printf("========\n");
                    for(i = 0; i < ROWS_A; i++)
                    {
                        for(j = 0; j < COLS_B; j++)
                            printf("%d ", C[i * COLS_B + j]);
                        printf("\n");
                    }
                }

             

                RCCE_barrier(&RCCE_COMM_WORLD);

             


                RCCE_shfree((t_vcharp) A);
                RCCE_shfree((t_vcharp) B);
                RCCE_shfree((t_vcharp) C);
                RCCE_finalize();

             

                return(0);

             

            }
            --------------------------------------
            • 3. Re: shared dram
              Nil

              Hi,

               

              It seems that when you run the program, it uses memory that is already in use by Linux on other cores. Have a look at this thread:

               

              http://communities.intel.com/message/106642#106642

               

              I suggest you try your code with RCCE from trunk.

               

              Hope this helps.