8 Replies Latest reply on Dec 16, 2011 11:56 AM by tedk

    installing BLCR

    yilmaz

      Hello,

       

      I want to install BLCR (Berkeley Lab Checkpoint/Restart) on the SCC cores, so I need the System.map or vmlinuz file of rcklinux. Where can I find these files? How does sccGui boot Linux on the cores?

      To configure the MCPC, I followed the instructions in "Setting up MCPC", which means I installed sccKit_1.3.0.

       

      My second question is whether the “insmod” command works on the SCC cores. I must insert the BLCR module into the running kernel after the installation.

       

      Moreover, has anyone tried checkpointing on the SCC? I am curious whether BLCR works.

       

      Thanks in advance.

        • 1. Re: installing BLCR
          tedk

          Do you have your own MCPC/SCC? Why are you installing sccKit 1.3.0? If you follow the instructions to configure the MCPC, you will end up with sccKit 1.3.0, but I recommend that you upgrade to sccKit 1.4.1.3. The newer sccKit comes with a newer SCC Linux, which supports insmod. The SCC Linux that ships with 1.3.0 does not.
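
One rough way to tell whether a given SCC Linux image supports loadable modules is to look for /proc/modules, which a Linux kernel normally only provides when it was built with module support. The sketch below assumes a Python interpreter is available on the core, which is not a given on a buildroot image; the function names parse_proc_modules and module_support are made up for illustration.

```python
import os

def parse_proc_modules(text):
    """Return the module names listed in /proc/modules-style text."""
    names = []
    for line in text.splitlines():
        fields = line.split()
        if fields:                      # skip blank lines
            names.append(fields[0])     # first column is the module name
    return names

def module_support(path="/proc/modules"):
    """Return None if the file is absent, else the list of loaded module names."""
    if not os.path.exists(path):
        return None
    with open(path) as f:
        return parse_proc_modules(f.read())

if __name__ == "__main__":
    loaded = module_support()
    if loaded is None:
        print("no /proc/modules: this kernel probably lacks module support")
    else:
        print("module support present; loaded modules:", loaded)
```

An empty list still means module support is present; it only says that nothing has been loaded yet.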

          • 2. Re: installing BLCR
            yilmaz

            We have our own MCPC/SCC at our institute. We installed 1.3 a year ago and I am currently working with it, but we have decided to upgrade next week.

            However, I still need the vmlinuz or System.map file to install BLCR. Is it possible to find them somehow?

             

            I also tried another checkpointer, DMTCP.

            First I sourced the cross-compilation environment script and ran ./configure --prefix=/shared/**/, then make and make install. The installation was successful.

             

            To checkpoint a process with DMTCP, you must first start dmtcp_coordinator in a separate terminal window. That works as well.

             

            But if I start a program with checkpointing support (dmtcp_checkpoint ./myprogram), I get the following error:

            ERROR: ld.so: object '/shared/***/lib/dmtcp/dmtcphijack.so' from LD_PRELOAD cannot be preloaded: ignored

             

            This error generally means that there is a 32-bit/64-bit mismatch, but I built the library with the cross compiler.

            I checked the file dmtcphijack.so, and it is 32-bit.

             

            What can be the cause of this error message?
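
The 32-bit check mentioned above can be reproduced by reading the ELF header directly, which also shows the target machine. This is a minimal sketch, not part of DMTCP; describe_elf and describe_elf_file are illustrative names, and only the two machine codes relevant here (3 for x86, 62 for x86-64) are decoded. Matching bitness is necessary but may not be sufficient: the loader can report the same message for other failures, for example when the path is not visible from the core or a dependency of the library is missing.

```python
import struct

ELF_CLASSES = {1: "32-bit", 2: "64-bit"}      # EI_CLASS values
ELF_MACHINES = {3: "x86", 62: "x86-64"}       # e_machine values decoded here

def describe_elf(header):
    """Describe an ELF file from (at least) its first 20 bytes."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    ei_class, ei_data = header[4], header[5]
    byte_order = "<" if ei_data == 1 else ">"   # 1 = little-endian, 2 = big-endian
    # e_machine is a 16-bit field at offset 18 in both 32- and 64-bit headers
    (e_machine,) = struct.unpack_from(byte_order + "H", header, 18)
    return (ELF_CLASSES.get(ei_class, "unknown class"),
            ELF_MACHINES.get(e_machine, "machine %d" % e_machine))

def describe_elf_file(path):
    """Read the start of a file and describe its ELF class and machine."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))

# Usage (path elided as in the post):
#   describe_elf_file("/shared/***/lib/dmtcp/dmtcphijack.so")
```

Running the same check on the program being launched and on the C library installed on the core would show whether all three agree.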

            • 3. Re: installing BLCR
              tedk

              Do you build your own Linux? If you do, you can find vmlinux and System.map in buildroot-2011.05. See http://communities.intel.com/docs/DOC-6869 for information about how to build SCC Linux.

              $ find . -name System.map -print
              ./buildroot-2011.05/output/build/linux-2.6.38.3/System.map
              $ find . -name vmlinux -print
              ./buildroot-2011.05/output/build/linux-2.6.38.3/vmlinux
              ./buildroot-2011.05/output/build/linux-2.6.38.3/arch/x86/boot/compressed/vmlinux
              $
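
System.map is a plain-text file with one kernel symbol per line: a hexadecimal address, a one-letter type, and the symbol name. That is presumably why the BLCR build asks for it. A minimal sketch of reading it follows; parse_system_map is an illustrative name, not part of any tool mentioned in this thread.

```python
def parse_system_map(text):
    """Map symbol name -> (address, type) from System.map-style text.

    If a name occurs more than once (static symbols can repeat),
    the last occurrence wins.
    """
    symbols = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue                     # skip anything that is not "address type name"
        address, sym_type, name = parts
        symbols[name] = (int(address, 16), sym_type)
    return symbols

# Usage, with the path found by the commands above:
#   with open("./buildroot-2011.05/output/build/linux-2.6.38.3/System.map") as f:
#       symbols = parse_system_map(f.read())
```

The map only describes the kernel it was generated with, so it has to come from the same build as the kernel that is actually booted on the cores.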

              • 4. Re: installing BLCR
                tedk

                The typical way people add packages to SCC Linux is to modify its buildroot environment. There's a menu that we use to select features. I don't see an option for checkpointing in these menus, but you should be able to add one. I don't know how to do that, but someone here must. It means understanding buildroot ... there's nothing SCC-specific about it. http://buildroot.uclibc.org/

                 

                Actually I'm surprised that our cross environment supports configuring and building DMTCP. So that may be another way of getting checkpointing support.

                 

                I have not used DMTCP. I assume you just start the "dmtcp_coordinator in a separate terminal window," that is ... another ssh connection into a core. How do you know it's working?  Then you attempt to start a program with checkpointing support and get an error. Where do you run the dmtcp_checkpoint?

                 

                We haven't used shared libraries much with the SCC. There is a post dealing with shared libraries

                http://communities.intel.com/message/144900#144900

                 

                I haven't come across anyone trying to checkpoint on the SCC. I'll ask around and see if I can get more information.

                • 5. Re: installing BLCR
                  yilmaz

                  First I log into a core and start dmtcp_coordinator. I am not sure whether it works properly, but I don't get any errors and I see these messages:

                   

                  dmtcp_coordinator starting...
                      Port: 7779
                      Checkpoint Interval: disabled (checkpoint manually instead)
                      Exit on last client: 0
                  Type '?' for help.

                   

                  It seems to be working. These are the usual messages; I get the same ones on my own Linux machine, where I can checkpoint without any problem.
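
A slightly stronger check than reading the startup messages is to confirm that something accepts TCP connections on the port the coordinator reports (7779 in the output above). The sketch below assumes Python is available where the check is run; port_is_open is an illustrative name. A successful connection only shows that some process is listening on that port, not that it is the DMTCP coordinator.

```python
import socket

def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:                 # refused, unreachable, timed out, ...
        return False

if __name__ == "__main__":
    # Run on the same core as the coordinator; 7779 is the port it printed.
    print(port_is_open("localhost", 7779))
```

Running it from a second ssh session on the same core mirrors how dmtcp_checkpoint would have to reach the coordinator.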

                   

                  After that, I log into the same core from another terminal and try to start a program with checkpointing support.

                   

                  PS: Here is how DMTCP works internally (very briefly): http://dmtcp.sourceforge.net/FAQ.html#internalWorking

                  • 6. Re: installing BLCR
                    tedk

                    Can you be more specific about why you want to use checkpointing? Are you interested in "automatic checkpointing" or "application-specific, user-level checkpointing"? Are you doing research on checkpointing, or using it as a tool for some other research?

                    • 7. Re: installing BLCR
                      yilmaz

                      I want to use it for process migration in MPI applications, and I wonder whether anyone else has done this.

                      Is there any other way to do process migration in MPI applications on sccLinux?

                      • 8. Re: installing BLCR
                        tedk

                        There is certainly interest in process migration on the cores. Unfortunately, there is little interest in checkpointing. I don't know how closely the two are related. Can one do process migration without checkpointing? I would hope so. I think checkpointing is very memory-intensive, and it may save more state than is necessary for migration.