  • 30. Re: MPI on the SCC
    christgau Community Member

    Steffen Christgau wrote:

     

    I created a bug regarding the OSC/RMA issues, with some details I have discovered so far:

     

    http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=386

    The bug has been fixed. The RCKMPI OSC/RMA issues are resolved. Now, all test cases succeed with 'No Errors.' The modified source files are available in the bugtracker.

     

    Regards, Steffen

  • 31. Re: MPI on the SCC
    compres Community Member

    Very nice.

     

    christgau wrote:

     

    Steffen Christgau wrote:

     

    I created a bug regarding the OSC/RMA issues, with some details I have discovered so far:

     

    http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=386

    The bug has been fixed. The RCKMPI OSC/RMA issues are resolved. Now, all test cases succeed with 'No Errors.' The modified source files are available in the bugtracker.

     

    Regards, Steffen

     

    That is excellent work, Steffen, both the fixes and the more MPICH2-like code.

     

    I read your bug report. I remember I did a coverage test, never saw that code executed, and removed it (bad move, obviously).

     

    Similar fixes should also be added to:

     

    MPIDI_CH3_iSend

    MPIDI_CH3_iSendv

    MPIDI_CH3_iStartMsg

    MPIDI_CH3_iStartMsgv

     

    All found in the common_misc.c file.   I did not see that updated in the attached file, so perhaps you uploaded an older common_misc.c?

     

    I think it would be nice if you got commit access to the repository and committed your fixes. Let's see if someone in charge reads this. These fixes can also easily be applied to the new release.

     

    - Isaías

  • 32. Re: MPI on the SCC
    christgau Community Member

    Hi Isaias,

    Isaías Alberto Comprés Ureña wrote:

    Similar fixes should also be added to:

     

    MPIDI_CH3_iSend

    MPIDI_CH3_iSendv

    MPIDI_CH3_iStartMsg

    MPIDI_CH3_iStartMsgv

     

    All found in the common_misc.c file.   I did not see that updated in the attached file, so perhaps you uploaded an older common_misc.c?

    The functions should have been fixed as well. If the send functions complete a message and do not have to add it to the send queue, they call MPIDI_CH3U_Handle_send_req, which invokes the request's onDataAvail callback if it is set. That is what the implementations of the sock channel and the Nemesis device do as well; both served as a guideline for the OSC fixes.
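
    For anyone not familiar with the CH3 internals, here is a rough, self-contained C sketch of that completion-callback pattern. The struct and function names below are simplified stand-ins made up for illustration, not the actual MPICH2/RCKMPI declarations:

    #include <stdio.h>
    #include <stddef.h>

    /* Simplified stand-in for an MPICH2 request: the device layer can attach
     * an "on data available" callback that must run once the data is sent. */
    struct request;
    typedef int (*on_data_avail_fn)(struct request *req);

    struct request {
        on_data_avail_fn on_data_avail;   /* NULL if nothing is pending */
        int complete;
    };

    /* Stands in for MPIDI_CH3U_Handle_send_req: called by the channel when a
     * send request has finished transferring its data. */
    static int handle_send_req(struct request *req)
    {
        if (req->on_data_avail != NULL)
            return req->on_data_avail(req);  /* e.g. RMA bookkeeping */
        req->complete = 1;                   /* nothing pending: mark done */
        return 0;
    }

    /* Example callback, similar in spirit to what the one-sided code sets. */
    static int rma_send_complete(struct request *req)
    {
        puts("onDataAvail callback ran after send completion");
        req->complete = 1;
        return 0;
    }

    int main(void)
    {
        struct request req = { rma_send_complete, 0 };

        /* A channel send function (iSend/iStartMsg style): if the message
         * goes out immediately and is not queued, it must still hand the
         * request back so that the callback fires. */
        handle_send_req(&req);
        printf("request complete: %d\n", req.complete);
        return 0;
    }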

  • 33. Re: MPI on the SCC
    compres Community Member

    christgau wrote:

     

    Hi Isaias,

    Isaías Alberto Comprés Ureña wrote:

    Similar fixes should also be added to:

     

    MPIDI_CH3_iSend

    MPIDI_CH3_iSendv

    MPIDI_CH3_iStartMsg

    MPIDI_CH3_iStartMsgv

     

    All found in the common_misc.c file.   I did not see that updated in the attached file, so perhaps you uploaded an older common_misc.c?

    The functions should have been fixed as well. If the send functions complete a message and do not have to add it to the send queue, they call MPIDI_CH3U_Handle_send_req, which invokes the request's onDataAvail callback if it is set. That is what the implementations of the sock channel and the Nemesis device do as well; both served as a guideline for the OSC fixes.

     

    I had not seen that function before, and as of revision 9504 I only see it used under the sctp channel, not the nemesis or sock channels.

     

    After looking at:

    http://trac.mcs.anl.gov/projects/mpich2/browser/mpich2/trunk/src/mpid/ch3/src/ch3u_handle_send_req.c

     

    I can see that the nemesis and sock channels do the same operations without calling this function, which should generally be faster unless the compiler is clever enough to inline it (although the difference is likely negligible).

     

    Your solution is better in terms of avoiding code duplication.

     

    - Isaías

  • 34. Re: MPI on the SCC
    darence Community Member

    Could someone refer me to an up-to-date manual for setting up RCKMPI? Thank you!

  • 35. Re: MPI on the SCC
    darence Community Member

    I'll try to be more specific -- I've tried to follow the manual from the "Documents" section and here are the steps I fail to reproduce:


    - configure: I am not sure what options to use. "--with-pm=mpd" reports some problems -- missing files in src/pm and src/pmi. I can see them in the "rckmpi" folder in the repository, but they don't exist in "rckmpi2". Which branch should I use, rckmpi or rckmpi2, and what parameters should I compile with (so as to use the MPB channel)?

     

    - broadcasting input from the Konsole on rck00 to the other cores: I don't know how to do that...

     

    Thanks for your assistance!

  • 36. Re: MPI on the SCC
    Nil Community Member

    Hi,

     

    I haven't used rckmpi for long, so I cannot answer your first question.

     

    Regarding your second question:

     

    When you have sccKonsole open with each tab representing a core, just select "edit -> copy input to -> all tabs in current window".
    This will send any command you type in the current tab (rck00) to all other open tabs (rck01 to rck47, depending on the number of open tabs).

     

     

     

    Hope this helps.

  • 37. Re: MPI on the SCC
    darence Community Member

    Thanks, Nil. That helps.

     

    I've just noticed that a patched kernel is necessary to play with rckmpi2. It would be nice if someone could share his/her experience in doing so. By the way, can anyone tell us what the difference between the two versions is? Judging from some documents, it looks like there have been many improvements to RCKMPI lately, so it would be useful to know how to work with them.

  • 38. Re: MPI on the SCC
    compres Community Member

    darence wrote:

     

    Thanks, Nil. That helps.

     

    I've just noticed that a patched kernel is necessary to play with rckmpi2. It would be nice if someone could share his/her experience in doing so. By the way, can anyone tell us what the difference between the two versions is? Judging from some documents, it looks like there have been many improvements to RCKMPI lately, so it would be useful to know how to work with them.

     

    Hello Darence,

     

    There are basically 2 big changes in the new version:

    - A new low-level message-passing protocol:

         - scales better with process count.

         - has a shared buffer that can be locked and used for optimized collectives

         - the shared buffer is used for spawn and MPI job launch operations

    - A new process manager

         - with orders of magnitude lower latencies when starting MPI jobs as well as doing MPI_Spawn ops.

         - has an SCC-specific placement strategy based on the Morton space-filling curve (see the sketch after this list)

         - has hooks for power and topology awareness
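
    Just to illustrate the placement idea (this is not the actual process manager code, only the concept): a Morton, or Z-order, index interleaves the bits of a core's (x, y) tile coordinates, so ranks that are consecutive in curve order end up on tiles that are close together on the SCC's 6x4 mesh. A minimal C sketch:

    #include <stdio.h>

    /* Interleave the bits of x and y (x in the even positions, y in the odd
     * ones) to get a Z-order / Morton index; 8 bits per coordinate is more
     * than enough for the SCC's 6x4 tile grid. */
    static unsigned morton_index(unsigned x, unsigned y)
    {
        unsigned z = 0;
        for (unsigned bit = 0; bit < 8; bit++) {
            z |= ((x >> bit) & 1u) << (2 * bit);
            z |= ((y >> bit) & 1u) << (2 * bit + 1);
        }
        return z;
    }

    int main(void)
    {
        /* Print the Morton index of every tile of a 6x4 mesh; sorting tiles
         * by this value keeps consecutive ranks on nearby tiles. */
        for (unsigned y = 0; y < 4; y++) {
            for (unsigned x = 0; x < 6; x++)
                printf("(%u,%u)=%2u ", x, y, morton_index(x, y));
            putchar('\n');
        }
        return 0;
    }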

     

    These things were done as part of my master's thesis, and given the time constraints (listen to me making excuses), there were many open issues left. I would classify the process manager as usable, but in alpha/beta state and low on features; in spite of that, I would not go back to using the MPD on the SCC after trying the new one. The channel is pretty solid outside of an issue Steffen found (check the threads here) in one-sided communication and one that I found recently in the metadata updates.

     

    The documentation on how to install and run this new version was never really finished. The patched kernel is only necessary if the network driver (rckmb and a newer one) does not share the IRQ required by the inter-core signaling system included in the new version.

     

    I am very interested in helping people get going with this software, while participating myself (a good example of the benefit of this is Steffen's work).  Maybe we can arrange something with Intel since they own the software; I am curious about whether it could be handled similarly to the SCCLinux code in Github.

     

    The next MARC in France would be a nice place to discuss, but I am afraid I will not make it this time due to schedule conflicts...

     

    - Isaías

  • 39. Re: MPI on the SCC
    darence Community Member

    The patched kernel is only necessary if the network driver (rckmb and a newer one) does not share the IRQ required by the inter-core signaling system included in the new version.

     

    Thanks for the exhaustive explanation! Out of curiosity, what is the inter-core signaling system used for in your implementation? Are there any details on this in your MARC paper or elsewhere? I am asking because we are working on some primitives that rely on inter-core interrupts, and it could be useful to compare the work (or, ideally, not to do something that has already been done).

     

    I'll try to get rckmpi2 running so don't be surprised if I come up with new stupid questions!

  • 40. Re: MPI on the SCC
    compres Community Member

    darence wrote:

     

    The patched kernel is only necessary if the network driver (rckmb and a newer one) does not share the IRQ required by the inter-core signaling system included in the new version.

     

    Thanks for the exhaustive explanation! Out of curiosity, what is the inter-core signaling system used for in your implementation? Are there any details on this in your MARC paper or elsewhere? I am asking because we are working on some primitives that rely on inter-core interrupts, and it could be useful to compare the work (or, ideally, not to do something that has already been done).

     

    I'll try to get rckmpi2 running so don't be surprised if I come up with new stupid questions!

     

    The inter-core notification idea was first thought of (to the best of my knowledge) by Merijn Verstraaten. He introduced it in this post: http://communities.intel.com/thread/18234?tstart=0

     

    The one found in rckmpi2 is more elaborate and is based on the 2.6.38 kernel. You have a Linux driver compiled as a module that you must load first. After that, you can use a library to register/unregister and signal processes. You can register local and remote processes, although it makes sense to do the registration locally, since otherwise you would need to somehow communicate the process ID first. The processes need to be registered before they can be signaled (they register in the driver through the API offered by the library component), and optionally you can tag them with any integer (to avoid having to remember their process ID and, even better, to simplify your distributed software).

     

    What you do afterwards is signal local or remote processes by specifying the core ID and the unique tag or process ID. You specify the signal number and signal payload through the API calls. Both RT (> 32) and regular signals can be sent. The library takes care of setting the required information and triggering the remote core; the driver (typically on a remote core, but it can be the local one) takes care of notifying the process when handling the interrupt (in reality it updates a work queue, and the signal is delivered at some point in the future, depending on how Linux decides to schedule it). The only thing your software needs to make sure of is to reserve the last 128 bytes of each MPB for this driver (meaning, not touching those addresses). This driver can be improved in terms of reliability under load, but I don't have any ideas at the moment on how to do this without increasing its MPB memory requirements for buffering.
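
    To make the flow concrete, here is a tiny C sketch of how using such a library could look. Every name below (sig_register, sig_notify, the tag, the signal number) is made up for illustration; this is not the actual API shipped with rckmpi2, and the stubs only print what the real driver-backed calls would do:

    #include <stdio.h>

    #define MY_TAG 42   /* optional integer tag used instead of the PID */

    /* Hypothetical call: register the calling process with the local driver
     * so it can later be signaled by core ID + tag. */
    static int sig_register(int tag)
    {
        printf("registered local process with tag %d\n", tag);
        return 0;
    }

    /* Hypothetical call: the real version would write the signal number and
     * payload into the reserved last 128 bytes of the target core's MPB and
     * trigger that core's interrupt; the driver there delivers the signal. */
    static int sig_notify(int core_id, int tag, int signum, int payload)
    {
        printf("notify core %d, tag %d: signal %d, payload %d\n",
               core_id, tag, signum, payload);
        return 0;
    }

    int main(void)
    {
        sig_register(MY_TAG);            /* each process registers itself    */
        sig_notify(5, MY_TAG, 34, 123);  /* an RT signal (> 32) with payload */
        return 0;
    }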

     

    This is unfortunately not documented anywhere, so perhaps it is material for a MARC submission in the future. I did not have space to explain it in my thesis either.

     

    I never really measured the latency of these operations, since I identified early on that my bottlenecks were somewhere else (sensible at the time, given the time available). It would be interesting to measure its performance.

     

    - Isaías

     

    PS: Feel free to ask me about installing the newer library. I will answer; what I cannot do is promise to do it in a timely manner.

  • 41. Re: MPI on the SCC
    Nil Community Member

    Hi,

     

    I am trying to get this inter-core notification code working with the new kernel. So my question is: is this "inter-core notification" built into the new kernel or not (the one used in sccKit 1.4.2)? As far as I know, the code posted on that thread works for the older version of sccLinux, but since the Linux source tree has changed, I don't know whether we still need to patch the Linux source with the inter-core notification code or not.

     

    Thank you

  • 42. Re: MPI on the SCC
    compres Community Member

    Nil wrote:

     

    Hi,

     

    I am trying to get this inter-core notification code working with the new kernel. So my question is: is this "inter-core notification" built into the new kernel or not (the one used in sccKit 1.4.2)? As far as I know, the code posted on that thread works for the older version of sccLinux, but since the Linux source tree has changed, I don't know whether we still need to patch the Linux source with the inter-core notification code or not.

     

    Thank you

     

    Hi Nil,

     

    Short answer: it is not built into the kernel (never was).

     

    Long one:

     

    The code posted in that thread is unrelated to the stuff in rckmpi2. What Merijn had was for the 2.6.16 kernel and was the same general idea, implemented in a completely different way. I mentioned it since I want to credit him with coming up with the idea first.

     

    As I mentioned earlier, I implemented a library plus driver for inter-core notification as the foundation for the process manager in rckmpi2.

     

    The patch I had on my working kernel was due to these facts about the rckmb driver:

    - it is built into the kernel, instead of being a module

    - it did not share the necessary IRQ

    - it was using the MPB instead of shared memory, conflicting with rckmpi protocols

     

    I think a new rckmb driver was being developed to resolve these issues, but I am not in the know.  If you have this information please clarify and I can try to dig up my old patch for the 2.6.38 kernel and post it here.

     

    - Isaias

  • 43. Re: MPI on the SCC
    darence Community Member

    Interesting discussion. Actually, we used Merijn's idea from that post as a base for one of our libraries as well. There are two interrupt lines: rckmb uses one of them and, as Isaias said, doesn't necessarily share it. However, the other one is used by rckemac (I'm talking about the 2.6.38 kernel), which shares it properly, so the driver can use that IRQ line without any patches to the kernel. It works pretty well, but it is very slow if you want to use it for point-to-point communication (context switches are killing the performance). Our library works with 2.6.38 and RCCE, and I hope we'll be able to provide the code soon if we manage to get the paper ready in a week (the submission deadline for MARC in Toulouse). Nil, that might answer your question.
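
    In case it helps anyone following along: at the driver level, "sharing the IRQ" simply means that every driver on that line registers its handler with IRQF_SHARED and checks whether the interrupt was actually meant for it. A minimal kernel-module sketch (the IRQ number and names are placeholders, not the actual rckmb/rckemac or sigmod code):

    #include <linux/module.h>
    #include <linux/interrupt.h>

    #define DEMO_IRQ 5        /* placeholder line number, illustration only */
    static int demo_cookie;   /* a shared handler needs a unique, non-NULL dev_id */

    static irqreturn_t demo_isr(int irq, void *dev_id)
    {
        /* With IRQF_SHARED every handler on the line gets called; a real
         * driver returns IRQ_NONE when the interrupt is not for its device. */
        return IRQ_HANDLED;
    }

    static int __init demo_init(void)
    {
        /* Fails with -EBUSY if another driver already holds the line without
         * IRQF_SHARED -- exactly the rckmb situation that needed a patch. */
        return request_irq(DEMO_IRQ, demo_isr, IRQF_SHARED,
                           "irq-share-demo", &demo_cookie);
    }

    static void __exit demo_exit(void)
    {
        free_irq(DEMO_IRQ, &demo_cookie);
    }

    module_init(demo_init);
    module_exit(demo_exit);
    MODULE_LICENSE("GPL");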

     

    On an unrelated topic (but related to the thread), I am still struggling with rckmpi2. Namely, this is what happens when I run mpiexec:

     

    rck00:/root # mpiexec -n 48 ./hello
    ERROR: timeout waiting for daemon response (core 0)
    spawn failed, daemon at core 0 return code: 0
    failed to spawn 48 processes out of 48

     

    I guess I need to set up the daemon first, but I don't know how. I guess I'm looking for an equivalent of mpdboot from the older version of rckmpi that used mpd (or have I just said something utterly stupid?).

  • 44. Re: MPI on the SCC
    compres Community Member

    darence wrote:

     

    Interesting discussion. Actually, we used Merijn's idea from that post as a base for one of our libraries as well. There are two interrupt lines: rckmb uses one of them and, as Isaias said, doesn't necessarily share it. However, the other one is used by rckemac (I'm talking about the 2.6.38 kernel), which shares it properly, so the driver can use that IRQ line without any patches to the kernel. It works pretty well, but it is very slow if you want to use it for point-to-point communication (context switches are killing the performance). Our library works with 2.6.38 and RCCE, and I hope we'll be able to provide the code soon if we manage to get the paper ready in a week (the submission deadline for MARC in Toulouse). Nil, that might answer your question.

     

    On an unrelated topic (but related to the thread), I am still struggling with rckmpi2. Namely, this is what happens when I run mpiexec:

     

    rck00:/root # mpiexec -n 48 ./hello
    ERROR: timeout waiting for daemon response (core 0)
    spawn failed, daemon at core 0 return code: 0
    failed to spawn 48 processes out of 48

     

    I guess I need to set up the daemon first, but I don't know how. I guess I'm looking for an equivalent of mpdboot from the older version of rckmpi that used mpd (or have I just said something utterly stupid?).

     

    Hello Darence,

     

    I was aiming at absolutely minimal latencies, so for me it was important to use the interrupt that is on-chip. The aim of the process manager was to support highly dynamic MPI applications efficiently.

     

    The first question is whether you built the sigmod.ko module from the rckmpi2 tree. This module has to be built separately from the library, since the way modules are built is not trivially integrable with the autotools build system in mpich2. Here is where it is located:

    http://marcbug.scc-dc.com/svn/repository/trunk/rckmpi2/src/pm/invasive/src/sigmod/

     

    You need to modify the Makefile there to point to your SCCLinux directory (it is only compatible with 2.6.38; note that Merijn's patch is for 2.6.16). It would help if you know how to build kernel modules.

     

    After the module is ready, you can try loading it (on the SCC, so you need to put it somewhere in /shared) and running dmesg to check whether it succeeded. If rckmb is not patched to share the IRQ, you will see an error stating this in the dmesg output.

     

    If you succeed, then you are ready to load the daemons. The daemons will fail to start without the driver preloaded.

     

    Hope this helps.

     

    - Isaías
