1 Reply Latest reply on Aug 21, 2012 11:39 AM by devendra.rai

    RCKMPI job crashes with abstruse warning

    devendra.rai

      Hello:

       

      I am trying to run a multi-process job on the SCC (kernel 3.1.4, gcc-4.6, RCKMPI with sccmulti).

       

      The job runs fine on our own Linux clusters, but on the SCC I get this:

       

      3: terminate called after throwing an instance of 'std::system_error'

      2: CommandID: 0

      3:

      3:   what():  Resource temporarily unavailable

      rank 3 in job 3  rck00_58139   caused collective abort of all ranks

        exit status of rank 3: killed by signal 6

       

      My program never throws std::system_error itself, and the only detail in the trace is " what():  Resource temporarily unavailable".
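
      For context: on Linux, "Resource temporarily unavailable" is the standard message text for errno EAGAIN, and "killed by signal 6" is the SIGABRT raised when an uncaught exception reaches std::terminate. A std::system_error can be thrown by the C++ standard library itself; for example, std::thread's constructor throws one with this error when a thread cannot be created. Whether that is what happens here is only a guess. A minimal sketch of how the exception could be caught and inspected on the failing rank follows; describe_system_error and run_and_report are hypothetical helper names, not part of RCKMPI or the program in question.

```cpp
#include <cerrno>
#include <sstream>
#include <string>
#include <system_error>

// Hypothetical helper: format a std::system_error as a one-line diagnostic
// that shows the numeric error code and flags EAGAIN explicitly.
std::string describe_system_error(const std::system_error& e) {
    std::ostringstream out;
    out << "system_error: code=" << e.code().value()
        << " what=" << e.what();
    if (e.code().value() == EAGAIN) {
        out << " [EAGAIN: a resource was temporarily unavailable]";
    }
    return out.str();
}

// Hypothetical wrapper: run a callable and report any std::system_error
// instead of letting it propagate to std::terminate (abort, signal 6).
template <typename Fn>
std::string run_and_report(Fn fn) {
    try {
        fn();
        return "ok";
    } catch (const std::system_error& e) {
        return describe_system_error(e);
    }
}
```

      Wrapping the suspect parts of each rank's work in run_and_report (thread creation, locking, and so on) and printing the returned string would narrow down which call raises the error, assuming the exception originates in the program's own process rather than inside the MPI runtime.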

       

      I am at a loss as to what this could mean.

       

      I am running the job on cores 00-07.

       

      Any ideas would be helpful.

       

      Thanks.

       

      Devendra Rai