I'm not aware of any known differences between results from the RCCE emulator and RCCE on the SCC. Where did you get the emulator from? We have had a couple of emulator updates, and I think the emulator in the trunk now works correctly. (Thanks to Andreas Prell for that.)
Sorry, but we're confused about what you are actually doing. Can you provide more code context? Are you interested in passing the arguments of put and get as well as or instead of the actual data? Why are you referring to passing to another tile rather than a core?
I'm logged in to Marc013. I pulled from trunk and reproduced the problem. The outputs from the emulator and from the hardware are below.
Source file: attached (debug prints on lines 72 and 89).
Note that on the emulator, the outer routine sees the correct values that were passed from the remote MPB to the local MPB. On the hardware, the value read is 0. When I passed the matrix argument on the hardware from the remote MPB to private memory and then to the local MPB, the problem went away (this hardware workaround is the commented-out code on lines 60 and 61 of the attached source file: a back-to-back RCCE_get and RCCE_put). The first log below is from the emulator; the second, starting with the rccerun command, is from the hardware.
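For readers unfamiliar with the workaround described above, here is a minimal sketch of the two routes in plain C. It is not RCCE code: the MPBs are modeled as ordinary arrays, and `sim_get`/`sim_put`, `copy_direct`, `copy_via_private` and `MPB_BYTES` are hypothetical names that only mimic the (target, source, num_bytes) argument order of RCCE_get()/RCCE_put(). The real calls also take a core ID and go through the SCC-optimized copy routines.

```c
#include <assert.h>
#include <string.h>

#define MPB_BYTES 64  /* hypothetical size of one simulated MPB */

/* Hypothetical stand-ins: each copies num_bytes from source to target.
   In real RCCE, get reads from a (possibly remote) MPB and put writes to one. */
static int sim_get(void *target, const void *source, int num_bytes) {
    assert(num_bytes >= 0);
    memcpy(target, source, (size_t)num_bytes);
    return 0;
}

static int sim_put(void *target, const void *source, int num_bytes) {
    assert(num_bytes >= 0);
    memcpy(target, source, (size_t)num_bytes);
    return 0;
}

/* Direct route: one get straight from the remote MPB into the local MPB.
   On the hardware, this is the path that produced zeros. */
static int copy_direct(char *local_mpb, const char *remote_mpb, int n) {
    return sim_get(local_mpb, remote_mpb, n);
}

/* Detour: remote MPB -> private buffer -> local MPB,
   i.e. a get followed by a put. */
static int copy_via_private(char *local_mpb, const char *remote_mpb, int n) {
    char priv[MPB_BYTES];   /* private staging buffer */
    int err;
    assert(n <= MPB_BYTES);
    err = sim_get(priv, remote_mpb, n);
    if (err != 0) return err;
    return sim_put(local_mpb, priv, n);
}
```

In this host-only model both routes produce identical bytes; the sketch shows the shape of the workaround, not a reproduction of the hardware bug.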
hosts mat_inv_async n 4 b 2 x 2 y 2
mat_inv_async 4 0.533 00 01 02 03 n 4 b 2 x 2 y 2
Parallel Matrix Inversion on 4 cores
com inner routine pointer check - target_ptr 134583840, target_val 36.000000
com outer routine pointer check - target_ptr 134583840, target_val 36.000000
com inner routine pointer check - target_ptr 134583872, target_val -0.916667
com outer routine pointer check - target_ptr 134583872, target_val -0.916667
Total time: 0.013866
eshifer@marc013:/shared/eshifer$ ./rccerun -nue 4 -f ./rc.hosts mat_inv_async n 4 b 2 x 2 y 2
pssh -h PSSH_HOST_FILE.16601 -t -1 -p 4 /shared/eshifer/mpb.16601 < /dev/null
 13:22:16 [SUCCESS] rck01
 13:22:16 [SUCCESS] rck03
 13:22:18 [SUCCESS] rck02
 13:22:33 [FAILURE] rck00 Exited with error code 255
pssh -h PSSH_HOST_FILE.16601 -t -1 -P -p 4 /shared/eshifer/mat_inv_async 4 0.533 00 01 02 03 n 4 b 2 x 2 y 2 < /dev/null
rck01: com inner routine pointer check - target_ptr -1216921536, target_val 0.000000
com outer routine pointer check - target_ptr -1216921536, target_val 0.000000
com inner routine pointer check - target_ptr -1216921504, target_val 0.000000
com outer routine pointer check - target_ptr -1216921504, target_val 0.000000
rck00: Parallel Matrix Inversion on 4 cores
Total time: 0.033951
Faliure - Matrix inversion compute error
 13:22:38 [SUCCESS] rck02
 13:22:38 [SUCCESS] rck03
 13:22:38 [SUCCESS] rck01
 13:22:38 [SUCCESS] rck00
I think I finally understand what it is you want to do. Yes, it should work on the hw without the detour.
In your code, can you be sure that ME never equals ME_com? The put is performed by ME and the get by ME_com. Putting into the MPB, as well as into a private buffer, should work as long as ME != ME_com.
But RCCE_get() and RCCE_put() call memcpy_get() and memcpy_put() internally. These routines are optimized for the hardware; when you use the emulator, you are actually using the standard memcpy(). We could also use the standard memcpy() on the hardware. The MPB-to-MPB path may not have been tested with the optimized memcpys. For example, in RCCE_get.c we could replace memcpy_get with memcpy as follows (this requires rebuilding RCCE):
memcpy((void *)target, (void *)source, num_bytes);          // standard Linux memcpy
// memcpy_get((void *)target, (void *)source, num_bytes);   // SCC-optimized copy, disabled
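If you want to switch between the two copy routines without hand-editing RCCE_get.c each time, one option is a compile-time switch. This sketch is hypothetical: `RCCE_USE_LINUX_MEMCPY`, `RCCE_COPY_GET` and `get_copy_step` are invented names, and `scc_memcpy_get` is a plain byte-loop placeholder standing in for RCCE's optimized memcpy_get(), whose real implementation is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder for the SCC-optimized copy (hypothetical; the real
   memcpy_get() lives in RCCE and is tuned for MPB access). */
static void *scc_memcpy_get(void *target, const void *source, size_t n) {
    unsigned char *t = target;
    const unsigned char *s = source;
    while (n--) *t++ = *s++;
    return target;
}

/* Compile with -DRCCE_USE_LINUX_MEMCPY to fall back to the
   standard library memcpy, as was done by hand in this thread. */
#ifdef RCCE_USE_LINUX_MEMCPY
#define RCCE_COPY_GET(t, s, n) memcpy((t), (s), (n))
#else
#define RCCE_COPY_GET(t, s, n) scc_memcpy_get((t), (s), (n))
#endif

/* Copy step of a get, shaped like the RCCE_get.c lines above. */
static void get_copy_step(void *target, const void *source, int num_bytes) {
    assert(num_bytes >= 0);
    RCCE_COPY_GET(target, source, (size_t)num_bytes);
}
```

Building once with and once without the macro, and comparing the results, isolates the copy routine as the only variable.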
Another thing to try is checking the return values of RCCE_get() and RCCE_put(); the error codes are defined in RCCE.h. My guess is that if RCCE hits an error, it does not transfer the bytes, and consequently you see zero.
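A minimal sketch of that return-value check, assuming that zero means success and any other value is an error code. To keep it self-contained, `sim_get`, `checked_get` and the `SIM_*` codes are hypothetical: `sim_get` fails on bad arguments and, like the behavior suspected above, leaves the target untouched on error. With real RCCE you would call RCCE_get()/RCCE_put() and look the code up in RCCE.h.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SIM_SUCCESS      0   /* hypothetical codes; the real ones are in RCCE.h */
#define SIM_ERROR_SIZE   1
#define SIM_ERROR_TARGET 2

/* Hypothetical stand-in for RCCE_get(): on error it copies nothing,
   so a zero-initialized target stays zero. */
static int sim_get(void *target, const void *source, int num_bytes) {
    assert(source != NULL);  /* source is always valid in this sketch */
    if (target == NULL) return SIM_ERROR_TARGET;
    if (num_bytes <= 0) return SIM_ERROR_SIZE;
    memcpy(target, source, (size_t)num_bytes);
    return SIM_SUCCESS;
}

/* Report a failed transfer instead of silently reading zeros later. */
static int checked_get(void *target, const void *source, int num_bytes) {
    int err = sim_get(target, source, num_bytes);
    if (err != SIM_SUCCESS)
        fprintf(stderr, "get failed with code %d (%d bytes)\n", err, num_bytes);
    return err;
}
```

The same wrapper pattern applies to the put side.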
I don't think the RCCE spec is wrong, but when I looked at the descriptions of put and get, I think they could be clearer. In both cases (put and get), target points to where the data are placed.
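To make the convention concrete: in both calls the first argument, target, is where the bytes land, and the second, source, is where they come from. In the usual case, what differs is which side lives in the MPB. The sketch below models one core's MPB as a plain array; `mpb`, `toy_put` and `toy_get` are hypothetical names, and the real RCCE calls also take a core ID.

```c
#include <assert.h>
#include <string.h>

static char mpb[64];  /* hypothetical model of one core's MPB */

/* put: target lies in the MPB, source is private memory. */
static void toy_put(char *target_in_mpb, const void *private_source, int num_bytes) {
    assert(target_in_mpb >= mpb && target_in_mpb + num_bytes <= mpb + sizeof mpb);
    memcpy(target_in_mpb, private_source, (size_t)num_bytes);
}

/* get: target is private memory, source lies in the MPB. */
static void toy_get(void *private_target, const char *source_in_mpb, int num_bytes) {
    assert(source_in_mpb >= mpb && source_in_mpb + num_bytes <= mpb + sizeof mpb);
    memcpy(private_target, source_in_mpb, (size_t)num_bytes);
}
```

For example, `toy_put(mpb, &x, sizeof x)` followed by `toy_get(&y, mpb, sizeof y)` moves x into the MPB and back out into y; in each call the destination comes first.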
Yep, when I switched to the regular memcpy, the problem disappeared.
I wasn't able to run with error reporting for RCCE_get: the hardware got stuck, and suddenly the sccBsc -i and sccBoot -l commands were not found. What setup do I need to run to get these initialization commands?
As for ME and ME_com - these are different cores (if not, it's a bug in the program).
Ah, so then there is a bug in memcpy_put/get. If you don't file a bug, I will soon. Isn't it somebody's rule that a bug occurs in every untested path?
If you cannot find the sccKit commands, I don't know why that would happen. The commands are in /opt/sccKit/current/bin. You can either add that directory to your PATH or use the perl script in /opt/sccKit/current, as in eval `/opt/sccKit/current/setup`, which I have in my .bashrc.