After some experiments on unidirectional transfers (that is, core 0 repeatedly sends a buffer allocated in its private memory to another core) using non-gory, non-singlebitflags RCCE, I measured an average bandwidth of 8-10 megabytes per second (the value depends slightly on the recipient core).
The buffer size I used was 2 KB or 4 KB.
Is this result normal? Or is there something weird?
That's not completely weird. Data from Tim Mattson's paper "Light-weight Communications on Intel's Single-Chip Cloud Computer processor" shows that RCCE reaches a maximum of about 50 MB/s, and that's at 2K/4K messages. On our SCC platform (that is, the ETI baremetal framework), we are able to get about twice that by making the message send and receive asynchronous.
We observed a bandwidth of about 27 MByte/s at a 2048-byte message size with RCCE (bigflags, non-gory, no power management). Note that we based this on the latency, or at least on the following approximation of it: the round-trip time of a message divided by two. This division is not done in the RCCE pingpong application provided in the Subversion repo. So doubling our result leads to the bandwidth illustrated in the slides by Tim Mattson et al.
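For anyone comparing figures from different ping-pong benchmarks, here is a minimal sketch of how the timing convention changes the reported number. The function name and the 100 microsecond round-trip time are hypothetical, chosen only for illustration; they are not taken from RCCE or from any measurement in this thread.

```python
def bandwidth_mb_per_s(message_bytes, round_trip_us, halve_round_trip):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one ping-pong exchange.

    round_trip_us is the measured round-trip time in microseconds.
    If halve_round_trip is True, the one-way time is approximated
    as half of the round trip before dividing.
    """
    elapsed_us = round_trip_us / 2.0 if halve_round_trip else round_trip_us
    # bytes per microsecond is numerically equal to 10**6 bytes per second
    return message_bytes / elapsed_us


# Hypothetical numbers, for illustration only.
size = 2048          # message size in bytes
rtt_us = 100.0       # assumed round-trip time in microseconds

halved = bandwidth_mb_per_s(size, rtt_us, halve_round_trip=True)
full = bandwidth_mb_per_s(size, rtt_us, halve_round_trip=False)

print(f"round trip halved: {halved:.2f} MB/s")
print(f"full round trip:   {full:.2f} MB/s")
print(f"ratio:             {halved / full:.1f}")
```

The two conventions always differ by exactly a factor of two, so before comparing two results it is worth checking which elapsed time each benchmark divides by.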