At the hardware level, each core's 'Time Stamp Counter' (TSC) is zeroed upon reset and starts incrementing when the given core leaves the reset state. Each core in the SCC has a reset bit (RESC0/RESC1) in the per-tile GCU register which, when cleared, releases that core from the reset state. Since each per-tile GCU register is independent and there is no 'global reset', the cores must be released from the reset state in a loop that iterates over all the tiles.
ETI's SCC Development Frame releases the reset on a core-by-core basis via the SIF in a tight loop, but there is still a finite skew between cores due to the overhead of the reset loop. See the attached file scc_time_stamp_counter-24cores.pdf for an example of the skews encountered when sampling the TSC at the beginning of a simple program.
Clock synchronization requires a barrier with a deterministic release latency to synchronize all the cores. Immediately after the barrier releases, each core should read its Time Stamp Counter via the RDTSC instruction to determine its particular clock offset.
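The offset step described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the RCCE implementation: the helper names read_tsc(), tsc_offsets() and tsc_to_reference() are hypothetical, the barrier itself is not shown, and the samples are assumed to have been taken by each core right after the barrier released and then collected in one array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read the local core's Time Stamp Counter with RDTSC.
 * Hypothetical helper; returns 0 on non-x86 builds. */
uint64_t read_tsc(void)
{
#if defined(__i386__) || defined(__x86_64__)
    uint32_t lo, hi;
    __asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi));
    return ((uint64_t)hi << 32) | lo;
#else
    return 0;
#endif
}

/* Hypothetical helper: samples[i] is the TSC value core i read
 * right after the barrier released (assumption: the release
 * latency is the same for every core). offsets[i] is how far
 * core i's counter is ahead of the reference core's counter. */
void tsc_offsets(const uint64_t *samples, size_t n, size_t ref,
                 int64_t *offsets)
{
    assert(n > 0 && ref < n);
    for (size_t i = 0; i < n; i++) {
        /* Unsigned subtraction, then reinterpret as signed so a
         * core that left reset later yields a negative offset. */
        offsets[i] = (int64_t)(samples[i] - samples[ref]);
    }
}

/* Convert a local TSC reading to the reference core's timebase. */
uint64_t tsc_to_reference(uint64_t local_tsc, int64_t offset)
{
    return local_tsc - (uint64_t)offset;
}
```

In use, each core would call read_tsc() as its first action after the barrier, the samples would be gathered on one core, and every later timestamp would be passed through tsc_to_reference() with that core's offset. Any variation in barrier release latency shows up directly as error in the computed offsets.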
An RCCE clock synchronization example can be found at the end of RCCE's RCP_init_RPC() function in the RCCE_power_management.c file, where a barrier is used to synchronize the cores.
Reference: Intel Architecture Software Developer’s Manual, Volume 3: System Programming, Section 2.6.6, 'Reading Performance-Monitoring and Time-Stamp Counters'