22 Replies. Latest reply: Sep 21, 2011 9:24 AM by tedk
      • 15. Re: Unable to use RCKMPI due to lack of mpdboot
        tedk

        Neither method works with sccKit 1.4.1.2 and the new SCC Linux (2.6.38 kernel). Parth, what rev of sccKit and what kernel are you using?

         

        One way to test whether the method Isaias describes will work is to ssh into a core (say rck00) and then try to ssh from rck00 to rck01. With sccKit 1.3.0 and sccKit 1.4.0 this looks like:

        tekubasx@marc002:~$ ssh root@rck00
        Warning: Permanently added 'rck00,192.168.0.1' (RSA) to the list of known hosts.
        root@rck00:~> ssh root@rck01
        root@rck01:~>

        But with 1.4.1.2 I get:

        tekubasx@marc006:~$ ssh root@rck00
        rck00:/root # ssh root@rck01
        ssh: connect to host rck01 port 22: No route to host
        rck00:/root # exit
        Connection to rck00 closed.
        tekubasx@marc006:~$ ssh root@rck01
        Warning: Permanently added 'rck01,192.168.26.2' (RSA) to the list of known hosts.
        rck01:/root # exit
        Connection to rck01 closed.
        tekubasx@marc006:~$

        If you are using the mpd daemon from MPICH2, you need to be able to ssh from one core to another. The default SCC Linux from 1.4.1.2 does not provide that capability.
        I haven't tested it yet, but I think that if you build an SCC from our SVN, it will have this capability. Has anyone done this?
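
The manual check above (log in to one core, then try to reach a second core from there) can be scripted. The following is a minimal sketch, assuming passwordless root logins as in the transcripts; the function name `can_hop` and the 5-second timeout are illustrative choices, not part of sccKit.

```shell
#!/bin/sh
# Sketch: test whether one SCC core can open an ssh session to another.
# Core names (rck00, rck01) and the root login come from the transcripts
# above; can_hop and the timeout value are hypothetical.

can_hop() {
    from=$1
    to=$2
    # The outer ssh (run on the MCPC) logs in to $from. BatchMode=yes makes
    # it fail instead of prompting; ConnectTimeout bounds the wait.
    # The quoted command runs on $from and tries to reach $to.
    ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$from" \
        "ssh root@$to true"
}

# Example use (run by hand on the MCPC):
#   if can_hop rck00 rck01; then
#       echo "core-to-core ssh works"
#   else
#       echo "core-to-core ssh failed"
#   fi
```

A non-zero exit status covers both failure modes seen in this thread: the first hop timing out and the second hop reporting "No route to host".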

         

        Message was edited by: Ted Kubaska (2.6.26 ==> 2.6.38)

        • 16. Re: Unable to use RCKMPI due to lack of mpdboot
          papandya

          @Ted: I recently wanted to collect some data off the SCC but found that ssh to rck00 no longer works. I run:

           

          cmlasu1@marc037:~$ ssh root@rck00
          ssh: connect to host rck00 port 22: Connection timed out

           

          This is the error I have been getting consistently. I am not sure how to check which version of sccKit is being used on this machine. The kernel version I found using uname -r is 2.6.32-24-generic. The peculiar thing is that when I tested on this machine some time back, the ssh capabilities were present and I was able to ssh between different cores, such as rck00 to rck01, and start the ring.

           

          Please let me know how to fix this problem.

           

          regards

           

          Parth

          • 17. Re: Unable to use RCKMPI due to lack of mpdboot
            Nil

            Hi,

             

            If I am not mistaken, you can check the sccKit version by looking in /opt/sccKit/

             

            Another thing I noticed is that the kernel version you provided seems to be for the MCPC and not the SCC.

             

             

            Hope this helps.

            • 18. Re: Unable to use RCKMPI due to lack of mpdboot
              tedk

              I logged onto marc037. I noticed that SCC Linux was not booted on the cores. I booted SCC Linux. I can ssh to rck00 and from rck00 I can ssh to rck01.

               

              tekubasx@marc037:/opt/sccKit$ ssh root@rck00

              root@rck00:~> ssh root@rck01
              root@rck01:~> exit
              root@rck00:~> exit
              Connection to rck00 closed.

               

              Please try using ssh again.

               

              I also noticed that you are running sccKit 1.3.0. This is fine ... 1.3.0 is a good stable release of sccKit. Please look at the features of 1.4.1.3 and let us know if you want to upgrade.

              • 19. Re: Unable to use RCKMPI due to lack of mpdboot
                tedk

                Oh, here's a good way to find out what sccKit version you have

                 

                tekubasx@marc037:/opt/sccKit$ ls -l current
                lrwxrwxrwx 1 root root 5 2010-08-26 09:36 current -> 1.3.0

                 

                Also, you can look at the INFO line when you issue an sccKit command

                 

                tekubasx@marc037:/opt/sccKit$ sccBoot -s
                INFO: Welcome to sccBoot 1.3.0 (build date Aug 25 2010 - 15:55:06)...
                Status: The following cores can be reached with ping (booted): All cores!
                tekubasx@marc037:/opt/sccKit$

                 

                If you upgrade to sccKit 1.4.1.3, the INFO line will still say 1.4.1 ... the .3 just means that patch 3 is applied. We do, however, arrange that current links to a directory called 1.4.1.3.
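
Because the patch level shows up only in the name of the directory that `current` points to, reading the symlink is the more precise of the two checks. A small sketch, assuming the `/opt/sccKit` layout shown in the transcript; the function name `scckit_version` is made up for illustration.

```shell
#!/bin/sh
# Sketch: report the sccKit version by resolving the "current" symlink.
# /opt/sccKit and the "current" link come from the transcript above;
# scckit_version is a hypothetical helper name.

scckit_version() {
    dir=${1:-/opt/sccKit}
    link=$dir/current
    if [ ! -L "$link" ]; then
        echo "no 'current' symlink in $dir" >&2
        return 1
    fi
    # The link target is the version directory (for example 1.3.0);
    # basename strips any leading path if the target is absolute.
    basename "$(readlink "$link")"
}

# Example use:
#   scckit_version            # inspects /opt/sccKit
#   scckit_version /some/dir  # inspects another install root
```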

                • 20. Re: Unable to use RCKMPI due to lack of mpdboot
                  papandya

                  @Nil: Thanks for the info :).

                   

                  @Ted: I tried ssh again and still could not connect. Then I ran sccBoot -s to check, and the output was:

                   

                  cmlasu1@marc037:/opt/sccKit$ sccBoot -s
                  INFO: Welcome to sccBoot 1.3.0 (build date Aug 25 2010 - 15:55:06)...
                  Status: The following cores can be reached with ping (booted): No cores!

                   

                  As you can see, none of the cores seem to be communicating. Is there any way I can resolve this myself, or does it require booting SCC Linux? If SCC Linux needs to be booted, can you outline the procedure?

                  • 21. Re: Unable to use RCKMPI due to lack of mpdboot
                    aprell

                    If I remember correctly, the status that sccBoot -s reported under sccKit 1.3.0 was not always correct. You can check with sccPerf to see which cores, if any, are up and running (indicated by green arrows).

                     

                    To boot SCC Linux on all 48 cores, simply run sccBoot -l. To boot SCC Linux on a subset of cores, say 0 through 7, use sccBoot -l 0..7.
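
The two forms of the boot command can be wrapped in one helper that validates the core range first. This is only a sketch: the `sccBoot -l` and `sccBoot -l 0..7` invocations are the ones quoted in this post, while the function name `boot_scc_linux` and the range checks (cores numbered 0 through 47) are assumptions added for illustration.

```shell
#!/bin/sh
# Sketch: boot SCC Linux on all cores or on a contiguous range.
# boot_scc_linux is a hypothetical wrapper; only "sccBoot -l" and
# "sccBoot -l FIRST..LAST" are taken from the thread.

boot_scc_linux() {
    case $# in
        0)
            # No arguments: boot every core.
            sccBoot -l
            ;;
        2)
            # Reject anything that is not a plain non-negative integer.
            case "$1$2" in
                ''|*[!0-9]*)
                    echo "core numbers must be integers" >&2
                    return 2
                    ;;
            esac
            # Assumes cores are numbered 0..47 (48 cores in total).
            if [ "$1" -gt "$2" ] || [ "$2" -gt 47 ]; then
                echo "range must satisfy 0 <= first <= last <= 47" >&2
                return 2
            fi
            sccBoot -l "$1..$2"
            ;;
        *)
            echo "usage: boot_scc_linux [first last]" >&2
            return 2
            ;;
    esac
}

# Example use:
#   boot_scc_linux        # all cores
#   boot_scc_linux 0 7    # cores 0 through 7
```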

                    • 22. Re: Unable to use RCKMPI due to lack of mpdboot
                      tedk

                      Yes, sccBoot -s sometimes doesn't return all cores. But if Linux is running it will return some. It tries only once for each core and if the core does not respond quickly enough, it doesn't get counted.
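
Since a single attempt per core can miss a slow responder, a status check that retries gives a steadier picture. The sketch below pings each core up to a few times from the MCPC. It assumes the cores are named rck00 through rck47 (the thread shows rck00, rck01 and rck10, and mentions 48 cores) and that the Linux `ping` options `-c` and `-W` are available; the function names and retry counts are illustrative.

```shell
#!/bin/sh
# Sketch: decide whether a core is up by pinging it several times,
# instead of relying on a single attempt.
# core_is_up and list_up_cores are hypothetical helper names.

core_is_up() {
    host=$1
    tries=${2:-3}
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -c 1: send one echo request; -W 2: wait up to 2 s for the reply.
        if ping -c 1 -W 2 "$host" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

list_up_cores() {
    # Assumes host names rck00 .. rck47, one per core.
    n=0
    while [ "$n" -lt 48 ]; do
        name=$(printf 'rck%02d' "$n")
        if core_is_up "$name" "${1:-3}"; then
            echo "$name"
        fi
        n=$((n + 1))
    done
}

# Example use:
#   core_is_up rck00 && echo "rck00 answers ping"
#   list_up_cores        # prints one reachable core per line
```

Note that a core answering ping is not proof that ssh will work; the first reply in the transcript below took 545 ms, which is the kind of delay a single quick probe could miss.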

                       

                      I did try sccBoot -s on marc037 just now and got "All cores" but was not able to ssh to a core or to ping a core. But I rebooted SCC Linux and then was able to ssh to rck00. When you are unable to ssh ... does this come after you've run one of your applications? Have you tried rebooting SCC Linux with sccBoot -l?

                      tekubasx@marc037:~$ ping 192.168.0.1
                      PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
                      64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=545 ms
                      64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.081 ms
                      64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=0.081 ms
                      ^C
                      --- 192.168.0.1 ping statistics ---
                      3 packets transmitted, 3 received, 0% packet loss, time 2002ms
                      rtt min/avg/max/mdev = 0.081/182.000/545.839/257.273 ms
                      tekubasx@marc037:~$ ssh root@rck00

                      root@rck00:~> exit
                      Connection to rck00 closed.
                      tekubasx@marc037:~$ ssh root@rck10

                      root@rck10:~> exit
                      Connection to rck10 closed.
                      tekubasx@marc037:~$

                       

                       
