1 Reply Latest reply on Aug 21, 2012 11:39 AM by devendra.rai

    RCKMPI job crashes with abstruse warning




      I am trying to run a multiprocessor job on the SCC (kernel 3.1.4, gcc-4.6, RCKMPI with sccmulti).


      The job runs well on our own Linux clusters, but on the SCC I get this:


      3: terminate called after throwing an instance of 'std::system_error'
      2: CommandID: 0
      3:   what():  Resource temporarily unavailable
      rank 3 in job 3  rck00_58139   caused collective abort of all ranks
        exit status of rank 3: killed by signal 6


      My program never throws std::system_error itself, and the only detail the trace gives is " what():  Resource temporarily unavailable".
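
      One way to narrow this down, as a rough sketch only: "Resource temporarily unavailable" is the standard message for EAGAIN, and the C++ standard library can throw std::system_error with that code on its own, for example from the std::thread constructor when a thread cannot be started. Whether that is what happens on the SCC is an assumption, not something the trace confirms. The helper names below (describe_system_error, simulate_thread_start_failure) are hypothetical and not part of RCKMPI; the idea is to wrap the suspect part of the program so the error is reported instead of ending in terminate().

```cpp
#include <string>
#include <system_error>

// Hypothetical helper (not part of RCKMPI): runs `work` and, if a
// std::system_error escapes, returns a short description of it instead of
// letting the exception reach std::terminate().
template <typename Work>
std::string describe_system_error(Work work) {
    try {
        work();
        return "no std::system_error";
    } catch (const std::system_error& e) {
        // EAGAIN maps to std::errc::resource_unavailable_try_again.
        const bool is_eagain =
            e.code() == std::errc::resource_unavailable_try_again;
        return std::string(is_eagain ? "EAGAIN: " : "other: ") + e.what();
    }
}

// Stand-in for the real failure (assumption): throws the same kind of
// exception the standard library raises when a resource such as a thread
// is temporarily unavailable.
inline void simulate_thread_start_failure() {
    throw std::system_error(
        std::make_error_code(std::errc::resource_unavailable_try_again));
}
```

      Calling describe_system_error around the code that starts threads or takes locks on rank 3, and printing the returned string, should show whether the exception really carries EAGAIN and roughly where it originates.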


      I am lost on what this could mean.


      I am running the job on cores 00-07.


      Any ideas would be helpful.




      Devendra Rai