12 Replies Latest reply on Dec 8, 2010 3:00 AM by Nil

    Error code 127

    Nil

      Hello all,

      I was going through section 8.5 of the SCC Programmer's Guide. When I tried to run the stencil_synch example, it failed with error code 127.

      According to the error code PDF, this means "rccerun cannot find your program", but I have copied the executable to /shared and I am running it from there.

      Can someone please explain whether I am doing something wrong here? The output of the program is below.

      Separately, when I try to open "sccKonsole", the konsole on the MCPC crashes and nothing happens (I have booted Linux on all 48 cores, and "sccBoot -s" says it can reach all cores).

      Thank you.

       

      rccerun -nue 4 -f ~/RCCE_V1.0.13/hosts/rc.hosts stencil_synch 4

      cp: cannot stat `/home/nil/RCCE_V1.0.13/bin/SCC_LINUX/mpb': No such file or directory

      pssh -h PSSH_HOST_FILE.14177 -t -1 -p 4 /shared/nil/mpb.14177 < /dev/null

      [1] 01:46:00 [FAILURE] rck02 Exited with error code 127

      [2] 01:46:00 [FAILURE] rck01 Exited with error code 127

      [3] 01:46:00 [FAILURE] rck03 Exited with error code 127

      [4] 01:46:00 [FAILURE] rck00 Exited with error code 127

      rm: cannot remove `mpb.14177': No such file or directory

      pssh -h PSSH_HOST_FILE.14177 -t -1 -P -p 4 /shared/nil/stencil_synch 4 1.0 00 01 02 03 4 < /dev/null

      rck00: Core 0 Executing 4 iterations

        • 1. Re: Error code 127
          xl10

          The thing is that mpb is not in the /shared path. You can copy the whole directory of RCCE to your shared directory.
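
For context, 127 is the exit status a POSIX shell reports when it cannot find the command it was asked to run, which is consistent with pssh trying to start /shared/nil/mpb.14177 after the cp had failed. A minimal sketch that reproduces this locally; the function name is made up, and the path is a placeholder chosen only because it should not exist:

```shell
#!/bin/sh
# Run a command and print the exit status the shell gives back.
# run_and_report is a hypothetical helper, not part of RCCE or pssh.
run_and_report() {
    "$@" >/dev/null 2>&1
    echo $?
}

# A path that does not exist: the shell cannot find the program.
missing_status=$(run_and_report /nonexistent-dir-for-demo/mpb.14177)
echo "missing program  -> exit status $missing_status"

# A program that exists and succeeds, for comparison.
ok_status=$(run_and_report true)
echo "existing program -> exit status $ok_status"
```

The first line reports 127 and the second reports 0, so a 127 from every core points at a missing file rather than at a crash inside the program.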

          • 2. Re: Error code 127
            tedk

            That'll work. Another approach some users take is to add a step to the makefile that copies the executable to /shared/username.
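
One way to picture the makefile approach is a small copy step that the build runs after linking. The sketch below is a shell function of the kind a makefile rule could call from a sourced script; the function name and the SHARED_DIR variable are assumptions for illustration and are not part of RCCE.

```shell
#!/bin/sh
# Hypothetical helper: copy a freshly built executable into a per-user
# directory under the shared area so the cores can see it.
# SHARED_DIR defaults to /shared/<username>; override it for testing.
copy_to_shared() {
    exe=$1
    dest=${SHARED_DIR:-/shared/$(id -un)}
    if [ ! -f "$exe" ]; then
        echo "copy_to_shared: $exe not found" >&2
        return 1
    fi
    # Create the destination if needed, then copy and keep permissions.
    mkdir -p "$dest" && cp -p "$exe" "$dest/"
}
```

An equivalent plain `cp` line as the last command of the build rule would do the same job; the function form just adds a check that the executable exists.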

            • 3. Re: Error code 127
              Nil

              Thanks for the reply.

              I have tried copying RCCE under /shared, but the problem is still there.

              As far as I understand, rccerun is trying to copy the file "mpb" from rccexx/bin/scc_linux. I looked in that directory and there is no such file. Is it created during the RCCE build?

              • 4. Re: Error code 127
                tedk

                What machine are you on? Are you using a Data Center system?

                • 5. Re: Error code 127
                  Nil

                  Yes, I am using the Data Center system "marc022".

                  • 6. Re: Error code 127
                    tedk

                    I'm on marc022 and the example works for me, but I am using RCCE from the trunk, which is newer than the 1.0.13 release.

                    Can you try my executable and see if it works for you? You should be able to copy it to your area.

                     

                    tekubasx@marc022:/shared/tekubasx$ ls -l
                    total 504
                    -rw-r--r-- 1 tekubasx admin    528 2010-12-07 08:44 allhosts
                    -rwxr--r-- 1 tekubasx admin    172 2010-12-07 08:44 killcorePIDs
                    -rwxr--r-- 1 tekubasx admin    357 2010-12-07 08:44 killit
                    -rw-r--r-- 1 tekubasx admin    144 2010-12-07 08:39 rc.hosts
                    -rwxr-xr-x 1 tekubasx admin 496995 2010-12-07 08:45 stencil_synch
                    tekubasx@marc022:/shared/tekubasx$

                     

                    Here's the output (first few lines) I see when I run it.

                     

                    tekubasx@marc022:/shared/tekubasx$ rccerun -nue 4 -f rc.hosts stencil_synch 50
                    pssh -h PSSH_HOST_FILE.28107 -t -1 -p 4 /shared/tekubasx/mpb.28107 < /dev/null
                    [1] 08:46:12 [SUCCESS] rck00
                    [2] 08:46:12 [SUCCESS] rck01
                    [3] 08:46:12 [SUCCESS] rck02
                    [4] 08:46:12 [SUCCESS] rck03
                    pssh -h PSSH_HOST_FILE.28107 -t -1 -P -p 4 /shared/tekubasx/stencil_synch 4 0.533 00 01 02 03 50 < /dev/null
                    rck00: Core 0 Executing 50 iterations
                    rck00:
                    1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
                    0.9363 0.9128 0.9025 0.8971 0.8939 0.8918 0.8903 0.8891 0.8881 0.8870 0.8858 0.8844 0.8824 0.8793 0.8739 0.8627
                    0.8349 0.8153 0.8030 0.7952 0.7900 0.7864 0.7836 0.7814 0.7794 0.7774 0.7753 0.7727 0.7693 0.7645 0.7573 0.7462
                    0.7299 0.7152 0.7040 0.6957 0.6897 0.6851 0.6815 0.6786 0.6759 0.6732 0.6704 0.6671 0.6629 0.6575 0.6503 0.6408

                    • 7. Re: Error code 127
                      Nil

                      Here is my output using your executable. I have left out the middle part of the output because it contains only numbers.

                       

                       

                      nil@marc022:/shared/nil$ rccerun -nue 4 -f rc.hosts stencil_synch 50
                      cp: cannot stat `/shared/nil/RCCE_V1.0.13/bin/SCC_LINUX/mpb': No such file or directory
                      pssh -h PSSH_HOST_FILE.28396 -t -1 -p 4 /shared/nil/mpb.28396 < /dev/null
                      [1] 08:54:12 [FAILURE] rck01 Exited with error code 127
                      [2] 08:54:12 [FAILURE] rck02 Exited with error code 127
                      [3] 08:54:12 [FAILURE] rck00 Exited with error code 127
                      [4] 08:54:12 [FAILURE] rck03 Exited with error code 127
                      rm: cannot remove `mpb.28396': No such file or directory
                      pssh -h PSSH_HOST_FILE.28396 -t -1 -P -p 4 /shared/nil/stencil_synch 4 1.0 00 01 02 03 50 < /dev/null
                      rck00: Core 0 Executing 50 iterations
                      rck00:
                      1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000 1.0000
                      0.9363 0.9128 0.9025 0.8971 0.8939 0.8918 0.8903 0.8891 0.8881 0.8870 0.8858 0.8844 0.8824 0.8793 0.8739 0.8627

                       

                       

                       

                      1.4858 1.5083 1.5227 1.5323 1.5392 1.5445 1.5489 1.5531 1.5575 1.5624 1.5685 1.5765 1.5876 1.6040 1.6293 1.6692
                      1.7249 1.7472 1.7579 1.7640 1.7679 1.7708 1.7732 1.7755 1.7778 1.7804 1.7837 1.7883 1.7950 1.8061 1.8270 1.8741
                      2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000 2.0000
                      Total time: 0.001151
                      rck00: Checksum = 0.874852
                      [1] 08:54:13 [SUCCESS] rck03
                      [2] 08:54:13 [SUCCESS] rck00
                      [3] 08:54:13 [SUCCESS] rck01
                      [4] 08:54:13 [SUCCESS] rck02

                      • 8. Re: Error code 127
                        tedk

                        I also copied your executable into my area and it seems to work as well, although it does not seem to complete. I'm sorry if I stepped on something you were doing. marc022 may now need to have SCC Linux rebooted. In any case, I don't see the 127 error with either my executable or yours.

                        • 9. Re: Error code 127
                          Nil

                          I have tried it with the latest RCCE from the trunk and it still produces the same result.

                          One naive question: do you have a file called "mpb" under rcce/bin/SCC_LINUX or not?

                          • 10. Re: Error code 127
                            tedk

                            But I can run your executable, so I'm getting confused.

                            I can log on to marc022 as myself, build RCCE, build stencil, and run the stencil example: the very same steps that you are doing.

                            The mpb you are seeing is, I think, just a temporary program that clears the MPB. It is run before RCCE programs as a precaution against junk being left in the MPB.

                            I'll do this again and send you a script.
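
Read together with the earlier logs, the description of mpb above suggests a two-stage launch: a clearing helper runs first, then the program itself, and a failure in the first stage does not stop the second. The sketch below imitates that pattern locally in plain shell. It is not rccerun's actual code; the function name, the placeholder helper path, and the echoed message format are assumptions.

```shell
#!/bin/sh
# Hypothetical two-stage launcher: run a "clear" helper first, then the
# real program. A missing helper is reported but does not abort the launch,
# which mirrors the FAILURE-then-output pattern in the logs above.
launch_with_clear() {
    helper=$1
    shift
    "$helper" >/dev/null 2>&1
    helper_status=$?
    if [ "$helper_status" -ne 0 ]; then
        echo "[FAILURE] helper exited with error code $helper_status"
    else
        echo "[SUCCESS] helper"
    fi
    # Stage two runs regardless of how the helper fared.
    "$@"
}

# Example: the helper path does not exist, yet the program still runs.
launch_with_clear /nonexistent-dir-for-demo/mpb echo "Core 0 Executing 4 iterations"
```

This would explain why the stencil output still appears after four 127 failures: only the clearing step was missing, not the program.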

                            • 11. Re: Error code 127
                              tedk

                              I logged onto marc022, checked out RCCE from the trunk, built it, built the STENCIL example, ran it on cores 0 through 3. I attached a script that shows what I did. Please look at it and try out those commands in your environment.

                              • 12. Re: Error code 127
                                Nil

                                Hi,

                                I can confirm that it is working now.

                                I would like to share a few things here.

                                First, I was using make [options], not makeall as suggested in section 7.2 of the programmer's guide.

                                Second, when I used make [options], it did not create the mpb file under SCC_LINUX/, but when I used makeall, the mpb file was generated and the errors disappeared.

                                 

                                 

                                nil@marc022:/shared/nil/rcce$ ./configure SCC_LINUX
                                nil@marc022:/shared/nil/rcce$ make OMP_EMULATOR=0 API=nongory

                                (output of build omitted)

                                nil@marc022:/shared/nil/rcce$ ls bin/SCC_LINUX/
                                libRCCE_bigflags_nongory_nopwrmgmt.a

                                 

                                (output of make clean omitted)

                                 

                                nil@marc022:/shared/nil/rcce$ ./configure SCC_LINUX
                                nil@marc022:/shared/nil/rcce$ ./makeall

                                (output of build omitted)

                                nil@marc022:/shared/nil/rcce$ ls bin/SCC_LINUX/
                                mpb libRCCE_smallflags_nongory_nopwrmgmt.a  libRCCE_bigflags_nongory_nopwrmgmt.a  libRCCE_smallflags_gory_nopwrmgmt.a  libRCCE_bigflags_gory_nopwrmgmt.a

                                 

                                 

                                As you can see from the output above, make [options] does not produce an mpb file, while makeall does. Is that the way it should be?
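
A quick pre-flight check along these lines can catch the missing helper before rccerun is invoked. This is a sketch only: the function name is made up, and the bin/SCC_LINUX/mpb location is taken from the cp error messages earlier in the thread.

```shell
#!/bin/sh
# Hypothetical pre-flight check: confirm that the RCCE build tree contains
# an executable mpb helper in the directory named by the cp error messages.
check_mpb() {
    rcce_root=$1
    mpb="$rcce_root/bin/SCC_LINUX/mpb"
    if [ -x "$mpb" ]; then
        echo "found $mpb"
        return 0
    fi
    echo "missing $mpb (was the tree built with ./makeall?)" >&2
    return 1
}
```

For example, `check_mpb /shared/nil/rcce && rccerun -nue 4 -f rc.hosts stencil_synch 50` would only launch the run when the helper is present.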