15 Replies, latest reply: Apr 29, 2011 3:47 AM by clauss

iRCCE: A Non-blocking Communication Extension to RCCE

clauss Community Member

Hello MARC Community,

 

Please find attached the TAR file of the iRCCE library that has been recently developed at our institute.

 

iRCCE is a non-blocking communication extension to the well-known RCCE communication library that extends RCCE with asynchronous message-passing functions (iRCCE_isend/iRCCE_irecv). Furthermore, iRCCE improves the performance of some RCCE functions, for example the blocking send/receive operations, by applying an assembler-coded and SCC-customized memory copy routine.

 

Detailed documentation is available as a PDF file in the doc folder of the TAR package.

 

Please do not hesitate to contact us in case of any problems, questions or ideas for improvements.

 

Carsten Clauss, Stefan Lankes

http://www.lfbs.rwth-aachen.de

Chair for Operating Systems

RWTH Aachen University

 

UPDATES:

 

[2011-02-22] We have fixed a bug in the wait-list mechanism of iRCCE (thanks to Roy Bakker for his bug report; see below).

We have also made minor updates to the iRCCE manual (effective February 22, 2011).

 

[2011-04-14] We have added a configuration check that tests whether iRCCE is being built against the trunk revision of RCCE or against the older RCCE V1.0.13 release; we hope this resolves the recently reported version conflicts.

  • 1. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    roybakker Community Member

    Hi Carsten,

     

    I am experimenting with iRCCE and have run into the following:

     

    I am trying to receive a message that may come from any core, but I do not know in advance which one. So I issue one iRCCE_irecv() call for each possible sender (RCCE_num_ues() calls in total), collect the iRCCE_RECV_REQUESTs in an iRCCE_WAIT_LIST, and then call iRCCE_wait_any() on that wait list.

     

    The problem is that, for some reason, the outstanding iRCCE_RECV_REQUESTs can only complete in exactly the order in which they were issued. For example, if core 0 issues "1: recv(core 1), 2: recv(core 2)", then recv(core 2) completes only after recv(core 1) has completed. In my case recv(core 1) may never complete at all, so the program hangs.
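    The semantics Roy expected can be sketched as a minimal model in plain C; this is not the iRCCE implementation. On each pass, every pending request in the wait list is checked, and the first one whose message has arrived is completed and reported, regardless of the order in which the requests were issued. The type sk_recv_request, the function sk_test_any, and the arrived[] array (standing in for polling each sender's flag) are hypothetical names used only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a pending receive request; not the iRCCE type. */
typedef struct {
    int  source;   /* core this request expects a message from */
    bool done;     /* true once the message has been received */
} sk_recv_request;

/* One pass over the wait list: complete and report the first request whose
   message has arrived, independent of the order of issue.
   arrived[src] stands in for polling the flag of sender src (assumption).
   Returns the list index of the completed request, or -1 if none is ready. */
static int sk_test_any(sk_recv_request *list, size_t n,
                       const volatile bool *arrived)
{
    assert(list != NULL && arrived != NULL);
    for (size_t i = 0; i < n; i++) {
        if (!list[i].done && arrived[list[i].source]) {
            list[i].done = true;
            return (int)i;
        }
    }
    return -1;
}
```

    A wait_any built on this model would simply call sk_test_any() in a loop until it returns a non-negative index, so a message from core 2 is picked up even if core 1 never sends.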

     

    Is this a bug, a known (and perhaps unsolvable) issue, or am I just doing something very stupid?

     

    Kind regards,

     

    Roy Bakker

    University of Amsterdam

  • 2. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    clauss Community Member

    Hi Roy,

     

    I hope that our iRCCE bug fix (see the update to my initial post) solves the issues you reported.
    I have also added a small fix to your sample code (see attached) -- please notice that the completion of a pending communication request must be ensured by a call to the wait() or test() function.

  • 3. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    radudavid Community Member

    Hi,

     

    I was looking at iRCCE myself, and tried to install it on my machine.

    However, I am getting a weird error when making the library:

     

     

    ./src/iRCCE_synch.c(77): error: expression must have struct or union type
        flaga = flag.flag_addr;
                ^
    compilation aborted for ./src/iRCCE_synch.c (code 2)
    make: *** [iRCCE_synch.o] Error 2

     

    Note that iRCCE_synch.c contains two instances of the synchronization function, and the error only occurs in the second one.

    I looked at the guide in the PDF file, but I cannot figure out what is wrong here (environment variables, maybe?). Did anyone else run into a similar problem?

     

    Thanks,

     

    Radu

  • 4. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    clauss Community Member

    Hi,

     

    I think you just need to update your RCCE version: run "svn up" in the RCCE folder (the current revision is 166) and rebuild both RCCE and iRCCE.

     

    Please let me know if the problem persists.

     

    With best regards,
    Carsten

  • 5. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    radudavid Community Member

    Thanks a lot Carsten,

    It's all good now. I was using a workaround for the problem, but it's great that this fixes it.

     

    Radu

  • 6. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    ms705 Community Member

    Hi Carsten,

     

    The problem persists for me, even with the most recent version of iRCCE (downloaded from your original post) and the most recent trunk version of RCCE (rev 166).

     

    I get the following compilation error when trying to make iRCCE:

     

    ms705@mawddach:~/scratch/scc/iRCCE$ make
    g++ -c -O3 -fopenmp    -I/home/ms705/scratch/skywriting/scc/rcce/include   -I./include ./src/iRCCE_synch.c 
    ./src/iRCCE_synch.c: In function ‘int iRCCE_test_flag(volatile int*, RCCE_FLAG_STATUS, int*)’:
    ./src/iRCCE_synch.c:76: error: request for member ‘line_addr’ in ‘flag’, which is of non-class type ‘volatile int*’
    make: *** [iRCCE_synch.o] Error 1

     

    I had a look, and the error appears to be related to the fact that RCCE_FLAG is #defined as volatile int* when SINGLEBITFLAGS is not defined, but is a struct otherwise. Now, line 76 of iRCCE_synch.c attempts to access the member flag_addr of flag, even though flag is #defined as volatile int* in this case.

     

    Does anyone have a workaround or fix for this issue? I can obviously dive into the code myself and try to fix it, but I thought I'd ask before I do redundant work :-)

     

    Update: Actually, I was wrong: this error occurred when compiling iRCCE against RCCE V1.0.13. The problem with the trunk version is that I cannot compile RCCE itself for the "emulator" target:

     

    ms705@mawddach:~/scratch/scc/rcce-trunk/rcce(svn)$ make
    [...]
    g++ -c -O3 -fopenmp    -I/home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/include   /home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/src/RCCE_send.c 
    /home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/src/RCCE_memcpy.c: Assembler messages:
    /home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/src/RCCE_memcpy.c:87: Error: suffix or operands invalid for `mov'
    make: *** [RCCE_send.o] Error 1

     

    and subsequently, iRCCE also fails to compile (perhaps not surprisingly):

     

    ms705@mawddach:~/scratch/skywriting/scc/iRCCE$ make
    g++ -c -O3 -fopenmp    -I/home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/include   -I./include ./src/iRCCE_isend.c 
    [...]
    g++ -c -O3 -fopenmp    -I/home/ms705/scratch/skywriting/scc/rcce-trunk/rcce/include   -I./include ./src/iRCCE_put.c 
    ./include/scc_memcpy.h: Assembler messages:
    ./include/scc_memcpy.h:143: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:150: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:151: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:165: Error: suffix or operands invalid for `pop'
    ./include/scc_memcpy.h:166: Error: suffix or operands invalid for `pop'
    ./include/scc_memcpy.h:189: Error: suffix or operands invalid for `pop'
    ./include/scc_memcpy.h:143: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:150: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:151: Error: suffix or operands invalid for `push'
    ./include/scc_memcpy.h:165: Error: suffix or operands invalid for `pop'
    ./include/scc_memcpy.h:166: Error: suffix or operands invalid for `pop'
    ./include/scc_memcpy.h:189: Error: suffix or operands invalid for `pop'
    make: *** [iRCCE_put.o] Error 1

     

    Any ideas? (I realise this is probably not an iRCCE issue)

  • 7. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    clauss Community Member

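    This seems to be a compiler/assembler problem with the SCC-customized memcpy of RCCE/iRCCE on 64-bit platforms: the inline assembler is written for 32-bit x86 and uses instruction forms (such as push/pop of 32-bit registers) that the assembler rejects when generating 64-bit code.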

     

    Try adding -m32 to the platform flags in rcce/common/symbols.in (PLATFORMFLAGS=-fopenmp -m32) and run ./configure emulator; make again.

     

    Alternatively, you can replace the assembler code with a plain memcpy() in the following files: rcce/src/RCCE_memcpy.c and iRCCE/include/scc_memcpy.h

     

    For example:

     

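    #include <string.h>   /* for memcpy() */

    inline static void *memcpy_from_mpb(void *dest, const void *src, size_t count)
    {
    #if 1
        /* Workaround for 64-bit hosts: plain memcpy() instead of the 32-bit assembler.
           memcpy() returns dest, which keeps the function's void* contract. */
        return memcpy(dest, src, count);
    #else
        int h, i, j, k, l, m;

        asm volatile ("cld;\n\t"
                  "1: cmpl $0, %%eax ; je 2f\n\t"
                  "movl (%%edi), %%edx\n\t"
                  "movl 0(%%esi), %%ecx\n\t"
                  "movl 4(%%esi), %%edx\n\t"
                  "movl %%ecx, 0(%%edi)\n\t"
    ...
    #endif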
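    Instead of switching the branch by hand with #if 1, the choice could be keyed on the target architecture, so the 32-bit assembler path is only considered when compiling for i386 (for example with -m32). This is a sketch, not the RCCE/iRCCE code: SCC_HAVE_32BIT_ASM, SKETCH_USE_SCC_ASM, and sketch_memcpy_from_mpb are hypothetical names, and the SCC assembler itself is deliberately not reproduced.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Use the SCC assembler only for 32-bit x86 builds that opt in via the
   hypothetical SCC_HAVE_32BIT_ASM macro; everything else gets memcpy(). */
#if defined(__i386__) && defined(SCC_HAVE_32BIT_ASM)
#define SKETCH_USE_SCC_ASM 1
#else
#define SKETCH_USE_SCC_ASM 0
#endif

static inline void *sketch_memcpy_from_mpb(void *dest, const void *src, size_t count)
{
    assert(dest != NULL && src != NULL);
#if SKETCH_USE_SCC_ASM
    /* The SCC-customized inline assembler would go here; it is not part of this sketch. */
#error "assembler path intentionally omitted from this sketch"
#else
    /* 64-bit hosts (e.g. the emulator build without -m32): portable fallback. */
    return memcpy(dest, src, count);
#endif
}
```

    On a 64-bit emulator build this always takes the memcpy() branch, so the assembler errors reported above cannot occur, at the cost of the SCC-specific copy optimization.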

     

    I hope this will help.
    Carsten

  • 8. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    saibbot Community Member

    Hello,

     

    What exactly do you mean by "please notice that the completion of a pending communication request must be ensured by a call to the wait() or test() function"? Is a test or wait call mandatory for a message to be delivered?

     

    Thanks,

    Vasileios Trigonakis.

    EPFL University

  • 9. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    clauss Community Member

    Hello,

     

    A non-blocking communication function returns iRCCE_SUCCESS if the communication request could already be completed within the function call itself.
    Usually, however, the function returns either iRCCE_PENDING, indicating that the communication has been started but not yet finished, or iRCCE_RESERVED, indicating that prior requests are still pending in the send or receive queue and that the new request has been reserved behind them.
    Therefore, unless the call returned iRCCE_SUCCESS, a subsequent call to test() or wait() is mandatory.

     

    This is because, until the completion of a non-blocking operation has been ensured by a call to one of these functions, the receive buffer is not guaranteed to be valid (the message has likely not yet arrived in it), and the send buffer must not be modified (the message has likely not yet been copied out of it).
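    The rule above can be modeled in a few lines of plain C. In this sketch a send request moves its data in fixed-size chunks (an assumption standing in for limited message-passing buffer space), so the start call usually returns a pending status, and the sender may only reuse its buffer once test() or wait() reports success. The names sk_isend, sk_test, sk_wait and the SK_ status codes are hypothetical; they mirror iRCCE_SUCCESS and iRCCE_PENDING in spirit only, and the iRCCE_RESERVED queueing case is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes; the values are arbitrary, not iRCCE's. */
enum { SK_SUCCESS = 0, SK_PENDING = 1 };

/* Hypothetical send request: copies size bytes from src to dst,
   at most chunk bytes per progress step. */
typedef struct {
    const char *src;    /* user send buffer: must not change while pending */
    char       *dst;    /* stands in for the receiver's buffer */
    size_t      size;
    size_t      copied;
    size_t      chunk;
} sk_send_request;

/* One progress step; returns SK_SUCCESS once all bytes have been copied. */
static int sk_test(sk_send_request *r)
{
    size_t n = r->size - r->copied;
    if (n > r->chunk)
        n = r->chunk;
    if (n > 0)
        memcpy(r->dst + r->copied, r->src + r->copied, n);
    r->copied += n;
    return r->copied == r->size ? SK_SUCCESS : SK_PENDING;
}

/* Polls until the request has completed. */
static void sk_wait(sk_send_request *r)
{
    while (sk_test(r) == SK_PENDING)
        ;   /* a real implementation would poll hardware flags here */
}

/* Starts a send and makes one progress step: small messages may finish
   inside the call (SK_SUCCESS), larger ones return SK_PENDING. */
static int sk_isend(const char *buf, size_t size, char *dst, size_t chunk,
                    sk_send_request *r)
{
    assert(buf != NULL && dst != NULL && r != NULL && chunk > 0);
    r->src = buf;
    r->dst = dst;
    r->size = size;
    r->copied = 0;
    r->chunk = chunk;
    return sk_test(r);
}
```

    In this model the destination only holds the complete message after sk_wait() returns, which is why reading the receive buffer or reusing the send buffer before completion is unsafe.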

     

    [see iRCCE manual: http://communities.intel.com/docs/DOC-6003]

     

    I hope this explanation helps.
    Carsten

  • 10. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    saibbot Community Member

    I was confused because the test and wait calls can be implicit. What I meant is that you can avoid an explicit call to test() or wait() on each iRCCE_SEND_REQUEST if you add the requests to an iRCCE_WAIT_LIST and test or wait on the list instead.
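    This point can be sketched as follows: waiting on a list tests every member on each pass, so each request is completed implicitly and no separate test() or wait() per request is needed. This is a hypothetical model (sk_req simply completes after a fixed number of progress steps), not the iRCCE_WAIT_LIST implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical request that completes after steps_left progress calls. */
typedef struct {
    int steps_left;
} sk_req;

/* One progress step on one request; returns true once it has completed. */
static bool sk_req_test(sk_req *r)
{
    if (r->steps_left > 0)
        r->steps_left--;
    return r->steps_left == 0;
}

/* Waits on a whole list: every member is tested on each pass, so each
   request reaches completion without an individual wait() call.
   Returns the number of passes needed. */
static int sk_wait_all(sk_req **list, size_t n)
{
    assert(list != NULL);
    int passes = 0;
    bool all_done;
    do {
        all_done = true;
        for (size_t i = 0; i < n; i++)
            if (!sk_req_test(list[i]))
                all_done = false;
        passes++;
    } while (!all_done);
    return passes;
}
```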

     

    Thanks a lot for your answer.

    Vasileios.

  • 11. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    KARTHIK Community Member

    Hi Carsten,

     

    I get the same error as reported by ms705 while trying to build iRCCE.

     

    $ make
    g++ -c -O3 -fopenmp    -I/home/karthik/Research/SCC/RCCE/bkp/RCCE_V1.0.13/include   -I./include ./src/iRCCE_isend.c 
    g++ -c -O3 -fopenmp    -I/home/karthik/Research/SCC/RCCE/bkp/RCCE_V1.0.13/include   -I./include ./src/iRCCE_irecv.c 
    g++ -c -O3 -fopenmp    -I/home/karthik/Research/SCC/RCCE/bkp/RCCE_V1.0.13/include   -I./include ./src/iRCCE_admin.c 
    g++ -c -O3 -fopenmp    -I/home/karthik/Research/SCC/RCCE/bkp/RCCE_V1.0.13/include   -I./include ./src/iRCCE_synch.c 
    ./src/iRCCE_synch.c: In function ‘int iRCCE_test_flag(volatile int*, RCCE_FLAG_STATUS, int*)’:
    ./src/iRCCE_synch.c:76: error: request for member ‘flag_addr’ in ‘flag’, which is of non-class type ‘volatile int*’
    make: *** [iRCCE_synch.o] Error 1

     

    As ms705 already said, the reason is that RCCE_FLAG is #defined as volatile int *, but iRCCE_synch.c (line 76, code below) accesses it as a struct. RCCE_FLAG is a struct only when SINGLEBITFLAGS is set, and SINGLEBITFLAGS is not set in this build.

     

    iRCCE_synch.c:

    74 int iRCCE_test_flag(RCCE_FLAG flag, RCCE_FLAG_STATUS val, int *result) {
    75
    76      t_vcharp flaga = flag.flag_addr;

     

     

    Also, the code above suggests that "flag_addr" is a member of the RCCE_FLAG struct (when SINGLEBITFLAGS is set). But according to the definition of RCCE_FLAG in RCCE.h, "location" and "line_address" are its only members. Am I missing something here? Please let me know how to fix this.

     

    Thanks,

    karthik

  • 12. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    clauss Community Member

    Hello Karthik,

     

    The error you reported is most commonly caused by a version mismatch between RCCE and iRCCE.

    Therefore, please ensure that you are using the latest version of RCCE (current trunk revision is 188) as well as the latest version of iRCCE.

     

    RCCE: http://marcbug.scc-dc.com/svn/repository/trunk/rcce
    iRCCE: http://communities.intel.com/servlet/JiveServlet/download/110482-19045/iRCCE.tar.zip

     

    With best regards,
    Carsten

  • 13. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    rdm34 Community Member

    Here is a simple iRCCE-based task farm example in case it is useful to anyone getting started. It borrows from the example posted earlier in this thread (thanks!).

     

    - Robert.

  • 14. Re: iRCCE: A Non-blocking Communication Extension to RCCE
    KARTHIK Community Member

    Hi Carsten,

     

    I was trying to write a non-blocking iRCCE application (running in the emulator) in which two cores simultaneously send messages to each other, receive those messages, and print them. Each message has two parts: a header that holds the length of the payload, followed by the payload itself. Thus each core sends data twice. In the receive section, each core first requests the header. After receiving it, the core reads the payload size from the header and then requests a payload of that size.

     

    When one core only sends and the other only receives, it works perfectly (keep #define ONEWAY in the attached code to check this behaviour). But when both cores send and receive simultaneously, it hangs indefinitely (comment out #define ONEWAY to reproduce this). Please let me know whether I am missing something. I am using RCCE revision 188.

     

    Also, iRCCE applications sometimes hang with RCCE revision 188. Is this due to a bug in RCCE itself? I see similar issues with the current RCCE revision 206.

     

    Thanks for your time,

    karthik
