22 Replies. Latest reply: Sep 21, 2011 9:24 AM by tedk
  • 15. Re: Unable to use RCKMPI due to lack of mpdboot
    tedk Community Member

    Neither method works with sccKit 1.4.1.2 and the new SCC Linux (2.6.38 kernel). Parth, which revision of sccKit and which kernel are you using?

     

    One way to check whether the method Isaias describes will work is to ssh into a core (say rck00) and then try to ssh from rck00 to rck01. With sccKit 1.3.0 and sccKit 1.4.0 this works:

    tekubasx@marc002:~$ ssh root@rck00
    Warning: Permanently added 'rck00,192.168.0.1' (RSA) to the list of known hosts.
    root@rck00:~> ssh root@rck01
    root@rck01:~>

    But with 1.4.1.2 you get:

    tekubasx@marc006:~$ ssh root@rck00
    rck00:/root # ssh root@rck01
    ssh: connect to host rck01 port 22: No route to host
    rck00:/root # exit
    Connection to rck00 closed.
    tekubasx@marc006:~$ ssh root@rck01
    Warning: Permanently added 'rck01,192.168.26.2' (RSA) to the list of known hosts.
    rck01:/root # exit
    Connection to rck01 closed.
    tekubasx@marc006:~$

    If you are using the mpd daemon from MPICH2, you need to be able to ssh from one core to another. The default SCC Linux in 1.4.1.2 does not provide that capability.
    I haven't tested it yet, but I think that if you build SCC Linux from our SVN, it will have this capability. Has anyone done this?
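    The core-to-core check described above can be scripted from the MCPC. This is a minimal sketch, not part of sccKit: the function names are hypothetical, and it assumes OpenSSH on the MCPC (for the BatchMode and ConnectTimeout options) and key-based root login as in the transcripts. The inner ssh runs on the core without options, since the client installed in SCC Linux may not accept them.

```python
import subprocess


def nested_ssh_cmd(first: str, second: str, user: str = "root") -> list[str]:
    """Build a command that logs into `first` and, from there, runs ssh to `second`."""
    # Outer ssh runs on the MCPC (assumed OpenSSH): fail instead of prompting
    # for a password, and give up quickly on a core that is not answering.
    outer_opts = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]
    # Inner ssh runs on the core itself; `true` simply exits 0 if the login worked.
    inner = ["ssh", f"{user}@{second}", "true"]
    return ["ssh", *outer_opts, f"{user}@{first}", *inner]


def core_to_core_ok(first: str, second: str, timeout_s: float = 30.0) -> bool:
    """Return True if `second` can be reached by ssh from inside `first`."""
    try:
        result = subprocess.run(
            nested_ssh_cmd(first, second),
            stdin=subprocess.DEVNULL,  # never sit waiting on a password prompt
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    # The outer ssh exits with the remote command's status (here, the inner
    # ssh's), or 255 if the outer connection itself fails.
    return result.returncode == 0
```

    For example, core_to_core_ok("rck00", "rck01") should return True in the sccKit 1.3.0 / 1.4.0 case and False in the 1.4.1.2 case, where rck00 reports "No route to host".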

     


  • 16. Re: Unable to use RCKMPI due to lack of mpdboot
    papandya Community Member

    @Ted: I recently wanted to collect some data from the SCC but found that ssh to rck00 no longer works. When I run

     

    cmlasu1@marc037:~$ ssh root@rck00
    ssh: connect to host rck00 port 22: Connection timed out

     

    This is the error I have been getting consistently. I am not sure how to check which version of sccKit is being used on this machine. The kernel version I found using uname -r is 2.6.32-24-generic. The peculiar thing is that when I tested on this machine some time back, ssh worked: I was able to ssh between cores, for example from rck00 to rck01, and start the ring.

     

    Please let me know how to fix this problem.

     

    regards

     

    Parth

  • 17. Re: Unable to use RCKMPI due to lack of mpdboot
    Nil Community Member

    Hi,

     

    If I am not mistaken, you can check the sccKit version by looking in /opt/sccKit/.

     

    Another thing I noticed is that the kernel version you provided seems to be the MCPC's, not the SCC's.

     

     

    Hope this helps.

  • 18. Re: Unable to use RCKMPI due to lack of mpdboot
    tedk Community Member

    I logged onto marc037 and noticed that SCC Linux was not booted on the cores, so I booted SCC Linux. Now I can ssh to rck00, and from rck00 I can ssh to rck01.

     

    tekubasx@marc037:/opt/sccKit$ ssh root@rck00

    root@rck00:~> ssh root@rck01
    root@rck01:~> exit
    root@rck00:~> exit
    Connection to rck00 closed.

     

    Please try using ssh again.

     

    I also noticed that you are running sccKit 1.3.0. That is fine; 1.3.0 is a good, stable release of sccKit. Please look at the features of 1.4.1.3 and let us know if you want to upgrade.

  • 19. Re: Unable to use RCKMPI due to lack of mpdboot
    tedk Community Member

    Oh, here's a good way to find out which sccKit version you have:

     

    tekubasx@marc037:/opt/sccKit$ ls -l current
    lrwxrwxrwx 1 root root 5 2010-08-26 09:36 current -> 1.3.0

     

    You can also look at the INFO line printed when you run an sccKit command:

     

    tekubasx@marc037:/opt/sccKit$ sccBoot -s
    INFO: Welcome to sccBoot 1.3.0 (build date Aug 25 2010 - 15:55:06)...
    Status: The following cores can be reached with ping (booted): All cores!
    tekubasx@marc037:/opt/sccKit$

     

    If you upgrade to sccKit 1.4.1.3, the INFO line will still say 1.4.1; the .3 just means that patch 3 is applied. We do, however, make the current symlink point to a directory called 1.4.1.3.
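    Both checks in this post can be wrapped in a small script, for example to record the sccKit version alongside collected data. This is a hypothetical helper, not an sccKit tool; it assumes the /opt/sccKit/current symlink layout and the sccBoot INFO line format shown above. Because the INFO line omits the patch level, the symlink is the more precise of the two sources.

```python
import os
import re
from typing import Optional

SCCKIT_ROOT = "/opt/sccKit"  # install location used in this thread

# Matches e.g. "INFO: Welcome to sccBoot 1.3.0 (build date ...)"
_INFO_RE = re.compile(r"Welcome to sccBoot (\d+(?:\.\d+)*)")


def version_from_symlink(root: str = SCCKIT_ROOT) -> str:
    """Return the directory name that <root>/current points to, e.g. '1.3.0'."""
    target = os.readlink(os.path.join(root, "current"))
    # The link target may be relative ("1.3.0") or absolute ("/opt/sccKit/1.3.0").
    return os.path.basename(target.rstrip("/"))


def version_from_info_line(line: str) -> Optional[str]:
    """Extract the version from an sccKit INFO line; None if the line doesn't match."""
    match = _INFO_RE.search(line)
    return match.group(1) if match else None
```

    On marc037 as shown above, version_from_symlink() would return "1.3.0". After an upgrade to 1.4.1.3 it should return "1.4.1.3", while the INFO line would still yield "1.4.1".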

  • 20. Re: Unable to use RCKMPI due to lack of mpdboot
    papandya Community Member

    @Nil: Thanks for the info :).

     

    @Ted: I tried ssh again and it still failed. I then ran sccBoot -s to check, and the output was:

     

    cmlasu1@marc037:/opt/sccKit$ sccBoot -s
    INFO: Welcome to sccBoot 1.3.0 (build date Aug 25 2010 - 15:55:06)...
    Status: The following cores can be reached with ping (booted): No cores!

     

    As you can see, none of the cores seem to be responding. Can I resolve this myself, or does it require booting SCC Linux? If SCC Linux needs to be booted, can you outline the procedure?

  • 21. Re: Unable to use RCKMPI due to lack of mpdboot
    aprell Community Member

    If I remember correctly, the status that sccBoot -s reported under sccKit 1.3.0 was not always correct. You can check with sccPerf to see which cores, if any, are up and running (indicated by green arrows).

     

    To boot SCC Linux on all 48 cores, simply run sccBoot -l. To boot SCC Linux on, say, cores 0 through 7, use sccBoot -l 0..7.
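    When setting up an mpd ring for RCKMPI, it can help to turn the same core range into host names. This sketch is hypothetical: it only mimics the A..B range syntax used with sccBoot -l, and it assumes core N is reachable as rckNN (two digits, zero-padded), as the rck00, rck01 and rck10 host names in this thread suggest. The output follows the one-host-per-line layout that MPICH2's mpdboot reads from an mpd.hosts file.

```python
def expand_core_range(spec: str = "", total_cores: int = 48) -> list[int]:
    """Expand an 'A..B' range (as passed to sccBoot -l) into core IDs.

    An empty spec means all cores, mirroring plain `sccBoot -l`.
    """
    if not spec:
        return list(range(total_cores))
    lo, sep, hi = spec.partition("..")
    if not sep:
        raise ValueError(f"expected a range like 0..7, got {spec!r}")
    start, end = int(lo), int(hi)
    if not 0 <= start <= end < total_cores:
        raise ValueError(f"range {spec!r} is outside cores 0..{total_cores - 1}")
    return list(range(start, end + 1))


def core_hostnames(cores: list[int]) -> list[str]:
    """Map core IDs to host names, assuming the rckNN naming seen in this thread."""
    return [f"rck{core:02d}" for core in cores]


def mpd_hosts_text(cores: list[int]) -> str:
    """Render an mpd.hosts body: one host name per line."""
    return "".join(f"{host}\n" for host in core_hostnames(cores))
```

    For example, mpd_hosts_text(expand_core_range("0..7")) produces eight lines, rck00 through rck07, which could be written to an mpd.hosts file for mpdboot once SCC Linux is up on those cores.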

  • 22. Re: Unable to use RCKMPI due to lack of mpdboot
    tedk Community Member

    Yes, sccBoot -s sometimes doesn't report all cores, but if Linux is running it will report some. It tries each core only once, and if a core does not respond quickly enough, it isn't counted.

     

    I just tried sccBoot -s on marc037 and got "All cores!", but I was not able to ssh to or ping a core. After I rebooted SCC Linux, I was able to ssh to rck00. When ssh fails for you, does it happen after you've run one of your applications? Have you tried rebooting SCC Linux with sccBoot -l?

    tekubasx@marc037:~$ ping 192.168.0.1
    PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data.
    64 bytes from 192.168.0.1: icmp_seq=1 ttl=64 time=545 ms
    64 bytes from 192.168.0.1: icmp_seq=2 ttl=64 time=0.081 ms
    64 bytes from 192.168.0.1: icmp_seq=3 ttl=64 time=0.081 ms
    ^C
    --- 192.168.0.1 ping statistics ---
    3 packets transmitted, 3 received, 0% packet loss, time 2002ms
    rtt min/avg/max/mdev = 0.081/182.000/545.839/257.273 ms
    tekubasx@marc037:~$ ssh root@rck00

    root@rck00:~> exit
    Connection to rck00 closed.
    tekubasx@marc037:~$ ssh root@rck10

    root@rck10:~> exit
    Connection to rck10 closed.
    tekubasx@marc037:~$
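    Since sccBoot -s probes each core only once, a status check that retries may be less misleading. In the ping above, the first reply took 545 ms and the following ones 0.081 ms, so a short single-shot timeout could plausibly miss a core that is actually up. This is a hypothetical helper, not an sccKit tool; it assumes Linux iputils ping on the MCPC (-c for the packet count, -W for the reply timeout in seconds). The probe is a parameter, so it could be replaced by, for example, an ssh attempt.

```python
import subprocess
from typing import Callable, Iterable


def ping_once(host: str, timeout_s: int = 2) -> bool:
    """Send a single ICMP echo; assumes Linux iputils ping (-c count, -W seconds)."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reachable(
    hosts: Iterable[str],
    attempts: int = 3,
    probe: Callable[[str], bool] = ping_once,
) -> list[str]:
    """Return the hosts that answer `probe` at least once within `attempts` tries."""
    up = []
    for host in hosts:
        # any() stops at the first successful probe, so a healthy core costs one try.
        if any(probe(host) for _ in range(attempts)):
            up.append(host)
    return up
```

    Assuming the rckNN host names used in this thread, reachable(f"rck{i:02d}" for i in range(48)) would list the cores that answer ping within three tries.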

     

     
