There's a memcpy() inside RCCE_get(), but there are two versions of it: one is optimized for the hardware, and the other just uses the standard Linux memcpy(). An #ifdef SCC chooses between them.
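To make the #ifdef selection concrete, here is a minimal sketch of the pattern. It is not the actual RCCE source: `copy_tuned` and `mpb_copy` are hypothetical names, and the "tuned" routine is only a byte-loop stand-in for whatever the hardware-specific version does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a hardware-tuned copy routine.
 * A real one would be written for the target's memory system;
 * this version is a plain byte loop so the sketch runs anywhere. */
void *copy_tuned(void *dst, const void *src, size_t n) {
    assert(dst != NULL && src != NULL);
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++) {
        d[i] = s[i];
    }
    return dst;
}

/* Compile-time choice: build with -DSCC to get the tuned routine,
 * otherwise fall back to the C library memcpy(). */
#ifdef SCC
#define mpb_copy(dst, src, n) copy_tuned((dst), (src), (n))
#else
#define mpb_copy(dst, src, n) memcpy((dst), (src), (n))
#endif
```

Callers use `mpb_copy(dst, src, n)` everywhere, so the same code builds for the chip or for a plain Linux host without any other changes.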
I think you are describing the differences between a push implementation and a pull implementation.
It used to be that RCCE_send would place the data in the remote (receiver's) MPB (the "push" model), and RCCE_get would fetch it from there. The current implementation is a "pull" model, in which RCCE_send places the data in the sender's local MPB and RCCE_get fetches it from there.
The major disadvantage of the push model is that it makes RCCE_recv_test impossible to implement. But I think that an RCCE example in the distribution (the gory RCCE_shift.c example) still uses push.
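A toy model of the two schemes, in case it helps. This is only a sketch: the MPBs are simulated as plain arrays in one process, there is no flag synchronization, and every name here (`core_t`, `push_send`, `pull_recv`, and so on) is made up for illustration rather than taken from RCCE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCORES 4
#define MPB_BYTES 64

/* Each simulated core owns one message-passing buffer (MPB). */
typedef struct {
    unsigned char mpb[MPB_BYTES];
} core_t;

static core_t cores[NCORES];

/* Push: the sender writes into the RECEIVER's MPB... */
void push_send(int dst, const void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(cores[dst].mpb, buf, n);
}

/* ...and the receiver then reads from its own MPB. */
void push_recv(int self, void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(buf, cores[self].mpb, n);
}

/* Pull: the sender writes into its OWN MPB... */
void pull_send(int self, const void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(cores[self].mpb, buf, n);
}

/* ...and the receiver reads from the SENDER's MPB. */
void pull_recv(int src, void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(buf, cores[src].mpb, n);
}
```

The only difference between the two pairs is whose buffer holds the message in flight: the receiver's in push, the sender's in pull.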
There's even an opportunity to improve RCCE in this case. I was talking to Rob last night about push vs pull and he said that the pull model should be more efficient for broadcast. In the push model the sender needs to communicate the data to all receivers. It can be done with a broadcast tree, but the data does need to be pushed out multiple times. In the pull model you can just put the data in your local MPB once and then notify all recipients to pick it up (it does not happen that way in RCCE at the moment, but it could). Hence, much more of the communication gets parallelized.
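Here is a rough sketch of the broadcast argument, counting how many copies the root core has to perform itself. It assumes a flat push (no broadcast tree), simulates the MPBs as a plain array, and leaves out the notification step; all function names are hypothetical, and as noted above RCCE does not currently broadcast this way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NCORES 8
#define MPB_BYTES 32

/* One simulated MPB per core. */
static unsigned char mpb[NCORES][MPB_BYTES];

/* Flat push broadcast: the root copies the data into every other
 * core's MPB. Returns the number of copies the root performed. */
int bcast_push(int root, const void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    int copies = 0;
    for (int c = 0; c < NCORES; c++) {
        if (c == root) {
            continue;
        }
        memcpy(mpb[c], buf, n);
        copies++;
    }
    return copies;
}

/* Pull broadcast, root side: one copy into the root's own MPB.
 * Returns the number of copies the root performed (always 1). */
int bcast_pull_post(int root, const void *buf, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(mpb[root], buf, n);
    return 1;
}

/* Pull broadcast, receiver side: each receiver fetches from the
 * root's MPB on its own, so these reads can proceed in parallel. */
void bcast_pull_fetch(int root, void *out, size_t n) {
    assert(n <= MPB_BYTES);
    memcpy(out, mpb[root], n);
}
```

With 8 cores the root does 7 copies in the flat push case and 1 in the pull case; the remaining work moves to the receivers, each of which does a single fetch.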
Thanks, these are important insights.
Though for communication between only two cores, there is no performance difference between the push and pull methods. Correct?
I don't think it makes much difference for communication between two cores. Push is not how RCCE implements send/receive, however, because then RCCE_recv_test wouldn't work. Actually, early versions of RCCE did use push. Rob (who wrote RCCE along with Tim) is a fan of the pull model whenever I talk with him.