20 Replies · Latest reply on Mar 5, 2012 5:49 AM by compres

    RCKMPI details

    cscholtes

      Good day,

       

      Here are a few questions concerning RCKMPI that might be of general interest:

       

      1.) Multiple channels: Is it possible to compile for multiple channels at once and then to select for each run separately which of the compiled channels is to be used (sccmpb, sccmulti, sock, ...)?

       

      Configuring with multiple "--with-device" options only appears to select the last channel specified (e.g.: "--with-device=ch3:sccmulti --with-device=ch3:sock" only configures for sock). Trying to give one "--with-device" option multiple values (e.g.: "--with-device=ch3:sccmulti,sock" or "--with-device=ch3:sccmulti,ch3:sock") fails with an error about a channel that does not exist.

       

      In case compilation for multiple channels is possible, how would one select the channel to be used? (OpenMPI's "mpirun" supports, e.g., something like "--mca btl tcp,self" ...)

       


      2.) Reconfiguring: After having configured, compiled and installed RCKMPI for one channel, what has to be done before configuring for another channel (to be installed to a different location)?

       

      At first sight, using "make clean" appeared to work. Is this sufficient, more than necessary, or just right?

       


      3.) Paths on cores: Is it possible to set paths (PATH and LD_LIBRARY_PATH) on the cores for RCKMPI and its prerequisites (in /shared/... instead of copying everything to /usr/bin and /usr/lib)? (Copying wastes quite a lot of memory and, thus, severely limits problem sizes. Additionally, copying takes quite some time, especially since it has to be done anew after each boot of the cores.)

       

      The key problem appears to be the limited environment provided by ssh when starting the MPI daemons: Apparently, PATH is hardcoded into ssh to contain only "/usr/bin:/bin". No apparent startup script is processed (e.g.: /etc/profile).

       

      Preparing an environment according to "man ssh" and "man sshd_config" failed, too: Create /etc/ssh/, create /etc/ssh/sshd_config with permissions 644 and contents "PermitUserEnvironment yes", create ~/.ssh/environment with permissions 700 and contents "PATH=/usr/bin:/bin:/shared/...".

       


      4.) Switching channel: What has to be done to switch from one channel to another (e.g. sock to sccmulti)?

       

      a) Is it necessary to exchange the rckmpi/bin copies on the cores (or are these generic)?

       

      b) Is it necessary to exchange the rckmpi/lib copies on the cores? Probably not, since they are all static libraries (lib*.a). -> Are they needed on the cores at all?

       

      c) Is it necessary to rebuild the MPI application? (Probably yes, since mpi functions are linked statically. Or is everything channel specific contained within rckmpi/bin/ ?)

       

      d) Supposedly, PATH on the MCPC has to be adapted to use the tools (mpicc, ...) produced for the desired channel. LD_LIBRARY_PATH is probably insignificant?

       

      e) A suitable kernel has to be running. Would a POPSHM enabled kernel be suitable for all channels?

       


      5) Detect channel:

       

      a) How can an application detect at runtime which channel it uses?

       

      b) How can one detect at compile time, for which channel mpicc compiles?

       


      6) Compilation for POPSHM: Obviously, RCKMPI can be compiled for channel sccmulti without having installed an extra library or even a POPSHM enabled kernel. Even the corresponding MPI daemons appear to work (at least running an application compiled for sccmpb). Will such a POPSHM targeted RCKMPI work as expected when using a POPSHM enabled kernel or should a POPSHM targeted RCKMPI only be compiled running a POPSHM enabled kernel?

       


      I'd be glad to learn more about any of the questions above.

       

      Thank you in advance

       

      Carsten Scholtes

        • 1. Re: RCKMPI details
          cscholtes

          Ad 1) Apparently, multiple channels can be configured using e.g.: "--with-device=ch3:sccmulti:sccmpb:sock". Unfortunately, only the first appears to be compiled (in this case: "sccmulti").

          • 2. Re: RCKMPI details
            compres

            Hello Carsten Scholtes,

             

            1. This is currently not possible.  OpenMPI has this feature, since it was designed from the beginning with this in mind via its "Open Runtime Environment", or ORTE for short.  MPICH2 has an experimental dllchan, but it is not included with RCKMPI.

             

            My understanding is that most of the development in MPICH2 is focused on their Nemesis subsystem, where you can indeed change network modules (netmods) at runtime.  Nemesis is not included with RCKMPI because its shared memory algorithms require that the system has a global address space.

             

            A solution to your problem would be to create one copy for each configured channel and to dynamically link your applications (otherwise, recompile them).  To switch to another channel, have a script overwrite the library in the SCC cores' memory.

             

             

            2. make clean && ./configure <new parameters> && make && make install

             

            You can skip the make clean step, but it does not take much time.

             

            If you do this very often, I would suggest you do what I mentioned in 1.
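For the record, a minimal sketch of that per-channel workflow (the /shared/rckmpi-&lt;channel&gt; install prefixes are hypothetical examples, not paths from RCKMPI itself); the loop only prints the commands as a dry run:

```shell
# Dry run: print one configure/build/install sequence per channel.
# The --prefix paths are illustrative; adjust them to your setup.
cmds=""
for ch in sccmpb sccmulti sock; do
  cmds="$cmds./configure --prefix=/shared/rckmpi-$ch --with-device=ch3:$ch && make && make install && make clean
"
done
printf '%s' "$cmds"
```

Dropping the `echo`-style indirection (i.e. running each line instead of collecting it) performs the actual builds.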

             

             

            3. This is a limitation of the SSH client built into the busybox image.  The client is called Dropbear, and it is intended for embedded devices.  Having it on the SCC is a sensible decision in terms of memory and performance, but it has its limitations.

             

            One of the limitations is that when you ssh into a core, the environment variables are not passed.  The actual values for the variables are hardwired in the Dropbear client.

             

            For changing PATH and LD_LIBRARY_PATH on the cores I can suggest 2 solutions:

            a. I can provide you with a patch I prepared a while ago, that you can modify to set up the variable to whatever you would prefer, or

            b. You could find a way to build in OpenSSH and replace the Dropbear client.

             

            Option a is the quicker solution, so let me know if you are interested, or if other people are as well. 

             

            I must add though, that this will result in lower performance since the RAM file system is much faster than running from the NFS mount on /shared.

             

             

            4.

            a. In the bin folder, you have the process manager and the mpiexec script.  These can be reused.

             

            b. Only if you link dynamically. 

             

            If you link statically, they don't even need to be in SCC RAM at all.  Only the libraries required by the process manager need to be there; in the case of the MPD, that is Python and all its dependencies.

             

            c. When linking statically, you will need a recompile. 

             

            d. LD_LIBRARY_PATH is also required.  The channel-specific aspects are built into the lib*.a file.

             

            It can happen that the MPICH2 library is not found.  Or even worse, if you have MPICH2 installed on the MCPC through your Linux distribution's package manager, it will link with the incorrect library.

             

            e. All new channels will run under a POPSHM kernel.  Only ch3:sock and ch3:sccmpb will work on generic kernels.

             

             

            5. Internal information about an MPI implementation is not exposed through the MPI-2 API to the application.  There are changes regarding this issue in the MPI-3 draft that make internal information accessible in a standardized way.
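As a practical workaround for 5b in the meantime: MPICH2-based builds ship an `mpich2version` tool that reports the configured device, so inspecting its output reveals which channel a given install was built for. A sketch, with the tool stubbed out since the exact output format may differ between versions:

```shell
# Stub standing in for the real mpich2version binary; the "Device:"
# line mimics typical MPICH2 output but may differ in your install.
mpich2version() {
  echo "Version:    1.3.1"
  echo "Device:     ch3:sccmpb"
}

channel=$(mpich2version | sed -n 's/^Device:[[:space:]]*//p')
echo "configured channel: $channel"
```

On a real MCPC you would simply run the `mpich2version` binary from the install whose `mpicc` you are using.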

             

             

            6. The library will compile without problems; after all, you are compiling on a different system, so no information about the target kernel is available.

             

            The daemons are independent of the channel; they work over ssh and TCP/IP, so they will also work in every case.

             

            The library should fail at initialization if you attempt to run an MPI job, on a non-POPSHM kernel, with binaries compiled with either sccshm or sccmulti as channels.  I have tested for this already, but if you run them and they fail silently or hang, please file a bug or just state the issue here in the forum.

             

            There is also a second requirement for the sccshm and sccmulti channels with regard to POPSHM: each core needs to have the same number of POPSHM pages allocated.  This is so because it allowed me to do some optimizations based on symmetry.  If you set up POPSHM with an unbalanced number of pages per core, the library should also detect it and fail at initialization.
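That symmetry requirement can be checked up front. A sketch (untested on real hardware) that compares the POPSHM page count across all 48 cores; `get_popshm_pages` is a stand-in for the actual ssh call shown in the comment:

```shell
# Stand-in for:
#   ssh root@$1 "sed -n 's/^POPSHM pages:[[:space:]]*//p' /proc/meminfo"
get_popshm_pages() { echo 1; }

first="" unbalanced=0
for i in $(seq 0 47); do
  core=$(printf 'rck%02d' "$i")
  pages=$(get_popshm_pages "$core")
  [ -n "$first" ] || first="$pages"
  [ "$pages" = "$first" ] || { echo "unbalanced POPSHM on $core: $pages vs $first"; unbalanced=1; }
done
echo "baseline POPSHM pages per core: $first"
```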

             

            Isaías A. Comprés Ureña

            • 3. Re: RCKMPI details
              cscholtes

              Hello Isaías A. Comprés Ureña,

               

              Thank you very much for this detailed answer!

               

              I'd be interested in the patch for the Dropbear client. It might allow me to run more interesting problem sizes. Since you mentioned performance problems with such a solution: I wouldn't expect to experience lower performance _after_ the mpd daemons are up. Is this a safe assumption? (Usually, I'd first execute an MPI_Barrier before starting runtime measurements.)

               

              Best regards

               

              Carsten Scholtes

              • 4. Re: RCKMPI details
                compres

                Hello again,

                 

                You are welcome

                 

                Here is the Dropbear patch.

                 

                Basically, look at the 'addnewvar' entries near the end of the file. 

                 

                You will need to replace the existing patch in your local copy with the one attached:

                http://marcbug.scc-dc.com/svn/repository/trunk/rckos/dropbear/dropbear.patch

                Then rebuild your Linux image.  After you boot the new Linux, the new environment should be there after an ssh operation:

                 

                PATH: /shared/install/bin:/usr/local/bin:/usr/local/sbin:/usr/sbin:/usr/bin:/sbin:/bin/
                LD_LIBRARY_PATH: /shared/install/lib

                 

                Notice that you can change other variables as well if you generate a new patch.

                 

                Hope this helps.

                 

                Isaías A. Comprés Ureña

                • 5. Re: RCKMPI details
                  cscholtes

                  Hello,

                   

                  Thank you again for all your help.

                   

                  Best regards

                   

                  Carsten Scholtes

                  • 6. Re: RCKMPI details
                    smeraji

                    Hi Isaias,

                     

                    Quick questions,

                     

                    Do you, by any chance, have a script to copy all the files/folders (step 12 of the RCKMPI installation tutorial) from /shared/user to the cores? I know it's not hard to write one, I am just curious whether it's already written.

                     

                    My second question: as copying files wastes quite a lot of time and memory, is there any other solution to skip the copying steps? I can see that you mentioned using a patch, but I am not sure how I can use the patch...

                     

                    Thanks

                    Sina  

                    • 7. Re: RCKMPI details
                      Nil

                      Hi,

                       

                      I have attached the script to copy all the files. You need to change the user part to your actual username (where it says "user", replace it with the username you are using). If you followed the installation tutorial, then the paths should be the same. Hope this helps.
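For anyone who cannot open the attachment, such a script boils down to a loop like the following sketch (the rck00..rck47 hostnames and the /shared/user layout follow the tutorial's conventions; the scp commands are only echoed here as a dry run — remove the `echo` to actually copy):

```shell
SRC=/shared/user/rckmpi   # replace "user" with your actual username
count=0
for i in $(seq 0 47); do
  core=$(printf 'rck%02d' "$i")
  # Dry run: prints the commands instead of executing them.
  echo scp -r "$SRC/bin/." "root@$core:/usr/bin/"
  echo scp -r "$SRC/lib/." "root@$core:/usr/lib/"
  count=$((count + 1))
done
echo "would copy to $count cores"
```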

                      • 8. Re: RCKMPI details
                        compres


                        Hi Sina,

                         

                        Nil's script will get the job done.

                         

                        You can apply the Dropbear patch posted earlier in this thread.  That patch is for the 2.6.16 kernel/busybox image, but you can check the lines changed and apply them to the newer release.  Alternatively, you can configure your newer kernel/busybox to include OpenSSH instead of Dropbear.  That way you can launch applications that are in your /shared (although with lower performance).

                         

                        - Isaías

                        • 9. Re: RCKMPI details
                          smeraji

                          Thanks  Isaías

                           

                          I have one more question for you. Does RCKMPI use the on-chip shared memory or the off-chip memory? I assume it uses the off-chip memory.

                           

                          The reason I ask is that I have a program that scales on regular multi-core machines, but if I run the same program on the SCC with RCKMPI and 2 cores, the running time is even worse than that of the sequential program (just one core). In general it does not scale very well. Tracing the program, I found out that the reason is the blocking receives that I have in my code.  So if RCKMPI uses the off-chip memory, does it use the shared off-chip memory for both cores, or does it just use the private memory of each core and then use TCP/IP (or some other protocol) to send real messages between cores?

                           

                          In any case, can you see any specific reason why I cannot get a speedup?

                           

                          Thanks

                          Sina

                          • 10. Re: RCKMPI details
                            compres


                            Hello Sina,

                             

                            It uses the MPB (on-die SRAM, 'sccmpb' channel), shared memory ('sccshm' channel, POPSHM-style shared memory: pinned pages and LUT reprogramming) or both ('sccmulti' channel), depending on which channel you configured.  You may also configure it to use the 'sock' channel; in that case it will use the TCP/IP stack.

                             

                            If you did not specify a channel with the --with-device=ch3:<channel> option at configure, then it should default to the sccmpb channel and you would be using the MPB for MPI.

                             

                            - Isaías

                            • 11. Re: RCKMPI details
                              smeraji

                              Isaías

                               

                              I understand that in order to use the "sccmulti" channel, I need to boot a patched Linux on the SCC cores. After booting it as follows

                               

                              sccBoot -l BuildSCCLinux417_002.obj 0..47

                               

                              I couldn't ping cores as I mentioned here:

                               

                              http://communities.intel.com/message/149464#149464

                               

                              Following Ted's suggestions in that link, I eventually booted BuildSCCLinux417_002.obj on the cores by doing the following under tmp/sccKit_myname:

                               

                              $ sccMerge -m 8 -n 12 -noimage -force ./linux.mt

                              $ sccReset -g

                              $ sccBoot -g ./obj

                              $ sccReset -r 0..47

                              $ ssh root@rck00

                              root@rck00:~> cat /proc/meminfo |grep POP
                              POPSHM pages:        1
                              POPSHM page size:    16384 kB
                              POPSHM buffer size:  16384 kB
                              POPSHM base address: 0x10000000

                               

                              which means that I have POPSHM on the cores. But now, when I run my MPI program, it crashes. I have tried running it with both "sccmpb" and "sccmulti" channels.

                               

                              The program runs fine on the default Linux with the sccmpb channel.

                               

                              Any thoughts?

                              • 12. Re: RCKMPI details
                                compres


                                Hello Sina,

                                 

                                 I cannot say for sure with the information provided, but it is common to run out of memory on the SCC.  Check the logs in /var/log/messages to see if your program ran out of memory. 

                                 

                                 It is, most of the time, obvious whether the problem is with the MPI library, because of MPICH's error handling and output (unless you configure with --enable-fast=all, which disables most error output).
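A quick way to spot an OOM kill in those logs; simulated here with a sample log line (the exact message format varies by kernel version), since on the SCC you would run the grep over ssh, e.g. `ssh root@rck00 "grep -i 'out of memory' /var/log/messages"`:

```shell
# Sample line mimicking a 2.6-era OOM-killer message (illustrative only).
sample="Feb 20 10:01:02 rck00 kernel: Out of memory: kill process 1234 (a.out)"
hits=$(printf '%s\n' "$sample" | grep -ic 'out of memory')
echo "OOM matches: $hits"
```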

                                 

                                - Isaías

                                • 13. Re: RCKMPI details
                                  smeraji

                                  Thanks Isaías

                                   

                                   The thing is that my program works very well with the default Linux. So I think something is wrong with "BuildSCCLinux417_002.obj".

                                   

                                   Which image file do you use to boot a Linux that supports POPSHM?

                                   

                                  Thanks

                                  Sina

                                  • 14. Re: RCKMPI details
                                    compres

                                    I was under the impression that all recent images had it enabled by default.  Perhaps someone can confirm here.

                                     

                                    I am still using an image I built a while ago, with minimal functionality to have more memory available for MPI applications.

                                     

                                    - Isaías
