Looks like I'm getting closer. Having successfully compiled and loaded the module, I ran "ipm_daemon" and then tried mpiexec again. The output is now:
rck00:/root # mpiexec -n 48 -f mpi.hosts ./hello
Real-time signal 0
Anything else I should do? Thanks a million!
That means the signal could not be delivered. Can you attach your dmesg output, or just copy-paste the last 100 lines or so here?
I suspect the driver was not able to register the interrupt handler, which would mean that the rckmb driver is not sharing it. I would be sure if I could see the dmesg output.
- Isaías
Edit: Also, the new process manager does not have the same CLI options. You should just pass -n <processes> <full path to executable> <application parameters>. Proper parameter parsing is still marked as TODO in the code.
SIGMOD(init): Loaded.
SIGMOD(signal): unregistered PID/TAG: 1.
SIGMOD(signal): unregistered PID/TAG: 2.
SIGMOD(signal): unregistered PID/TAG: 2.
That's about it. Then I tried to unload and reload the module, which unsurprisingly gives:
SIGMOD(exit): Unloaded.
SIGMOD(init): Loaded.
Nothing new happens when I run mpiexec afterwards...
Maybe these are silly questions, but:
- Do you load the driver and start the daemon on all cores?
- Do you pass the full path of the MPI application? Such as /root/foo.exe
Edit: It would also help to know if the rckmb driver is playing nice or is interfering. I suspect I will need to find out the hard way by looking at the new code.
Edit 2: In the worst case I can post you a patched binary of the kernel image you can load.
Well, turns out that your questions are not silly at all. Actually, the driver and the daemon were not up and running on all cores. The reason: sccKonsole takes some time (about 3-4 minutes) to actually set up all the connections, and commands can be successfully broadcast only then. What I do is start sccKonsole in a remote X session, which works extremely slowly -- I don't know if there is a better alternative (VNC maybe, but I can't connect to the MCPC like that). Anyway, it now seems to work (at least Hello World).
Just a small detail: is it possible to collect the output (@core0 for example)? Right now every core printf-s in its own console.
And one technical detail:
Can you briefly explain what made you use inter-core interrupts for signaling? For point-to-point communication, they are much slower than polling. Interrupts do offer full asynchrony, but why is that necessary/useful for MPI on the SCC? We have some in-house algorithms (the paper is to appear soon) and I'd like to put them side by side with rckmpi, so knowing this would help me understand the results.
Thank you once again!
darence wrote:
Well, turns out that your questions are not silly at all. Actually, the driver and the daemon were not up and running on all cores. The reason: sccKonsole takes some time (about 3-4 minutes) to actually set up all the connections, and commands can be successfully broadcast only then. What I do is start sccKonsole in a remote X session, which works extremely slowly -- I don't know if there is a better alternative (VNC maybe, but I can't connect to the MCPC like that). Anyway, it now seems to work (at least Hello World).
I had the same problem when I used an X session, but now I am using VNC and it is way faster. There was actually a discussion about this some time ago in this community, and the conclusion was to use VNC.
darence wrote:
Well, turns out that your questions are not silly at all. Actually, the driver and the daemon were not up and running on all cores. The reason: sccKonsole takes some time (about 3-4 minutes) to actually set up all the connections, and commands can be successfully broadcast only then. What I do is start sccKonsole in a remote X session, which works extremely slowly -- I don't know if there is a better alternative (VNC maybe, but I can't connect to the MCPC like that). Anyway, it now seems to work (at least Hello World).
Just a small detail: is it possible to collect the output (@core0 for example)? Right now every core printf-s in its own console.
And one technical detail:
Can you briefly explain what made you use inter-core interrupts for signaling? For point-to-point communication, they are much slower than polling. Interrupts do offer full asynchrony, but why is that necessary/useful for MPI on the SCC? We have some in-house algorithms (the paper is to appear soon) and I'd like to put them side by side with rckmpi, so knowing this would help me understand the results.
Thank you once again!
For IO multiplexing you have 3 options:
- No multiplexing (default): IO output goes to the stdout/stderr of the daemon. This is what you have now.
- Full-buffered: IO is buffered during the runtime. At job termination, all output goes to the terminal where the mpiexec command was executed.
- Interactive: IO is multiplexed to the terminal of mpiexec immediately as it is flushed by each process in the MPI job.
Now if you recall, I consider the process manager to be in alpha state. The full-buffered and interactive modes of multiplexing are not stable, and to be honest I have not touched the code since September of last year. Full-buffered works, but I did not have the time to test it at length, so I will refrain from saying that it is stable. Interactive does not work for applications with lots of IO, because the inter-core signaling does not work reliably when all 47 other cores trigger one core at the same time. This is what happens, since mpiexec runs on one core while up to 48 processes run on the rest (including the core where mpiexec runs).
With regards to your last question:
- MPI traffic is handled through polling, point to point or otherwise. Refer to this MARC submission for the communication protocol used for MPI. Note that some features described there are likely disabled in the rckmpi2 tree (I need to check, it has been a while), since they were not stable or well tested.
- Process management and IO multiplexing are done as mixed polling and inter-core interrupts. You would probably agree that MPI job launches and MPI_Comm_spawns do not happen frequently, making asynchronous communication appropriate.
Hope this helps,
- Isaías
Couldn't ask for more. Thanks for all the detailed explanations. I'll write my experiences here when I have some concrete results.
Everything seems to work now. There's one thing that is still baffling me though: how do I change the channel used? I would like to try out the hybrid channel (MPB + DRAM), but I don't know how... And I assume that it uses (only) the MPB by default?
darence wrote:
Everything seems to work now. There's one thing that is still baffling me though: how do I change the channel used? I would like to try out the hybrid channel (MPB + DRAM), but I don't know how... And I assume that it uses (only) the MPB by default?
I suppose you're using the original RCKMPI version. In that case you have to choose the channel at configuration (compile) time. Instead of "--with-device=ch3:sccmpb", use ch3:sccmulti as the value for the --with-device option when running the configure script. You can only have one channel in the compiled RCKMPI/MPICH2, so if you're comparing channels, I suggest that you compile each one into a different directory.
The hybrid channel uses the MPB for messages smaller than 256 KB. For larger messages it switches to the off-die DDR3 memory.
Depends what you mean by "original RCKMPI version". I'm using the one from the rckmpi2 folder, so I had compiled it with --with-device=invasive --with-pm=invasive originally.
christgau wrote:
The hybrid channel uses the MPB for messages smaller than 256 KB. For larger messages it switches to the off-die DDR3 memory.
This actually varies depending on the size of the MPI job, according to this formula:
sccmulti_bigmsg_threshold = (1L << 18) / (pg_size - 1) + 32;
This should be moved to the channel specific files later on.
- Isaías
darence wrote:
Depends what you mean by "original RCKMPI version". I'm using the one from the rckmpi2 folder, so I had compiled it with --with-device=invasive --with-pm=invasive originally.
The new release has an improved MPB channel, as the only option.
The reason for this is that changes to the device and PMI were necessary to reduce the latency of spawns and to support a new programming model. These changes made the library incompatible with previous channels. The name invasive is a reference to the German project InvasIC. The idea behind this programming model is to allow for resource-aware applications. I can elaborate on the general idea and on what is already available in RCKMPI2 to support this model.
The new channel in RCKMPI2 does outperform all previous ones. I think there is still the possibility of having a new mixed MPB/POPSHM one that is faster than the MPB-only one. The reason for having an MPB-only one is that we wanted to leave the POPSHM available to applications for hybrid MPI/SHM algorithms.
- Isaías
Hello Isaías ,
You mention that you built Open MPI.
I have my application running with OpenMPI (1.6), but on our own cluster. I'd like to run it on SCC.
So, is there any "how-to" for building Open MPI to work on SCC? I assume that it would require cross-compilation since my PC environment will be different from SCC. Do you build Open MPI on SCC itself?
Will appreciate your help.
Thanks a lot.
Devendra
Devendra Rai wrote:
Hello Isaías ,
You mention that you built Open MPI.
I have my application running with OpenMPI (1.6), but on our own cluster. I'd like to run it on SCC.
So, is there any "how-to" for building Open MPI to work on SCC? I assume that it would require cross-compilation since my PC environment will be different from SCC. Do you build Open MPI on SCC itself?
Will appreciate your help.
Thanks a lot.
Devendra
Hello Devendra,
Once you have a cross-compiler, you can build OpenMPI normally (configure + make + make install).
Afterwards, you will need to load your libraries and binaries onto the SCC. The only thing you need to make sure of is that, when you launch the MPI job with OpenMPI on the SCC, the TCP/IP socket byte transfer layer implementation is selected. Communication will be a lot slower than SCC-specific implementations, such as MPICH2 with the sock channel.

