There is a mesh fabric, but a mesh interface unit converts between its packet interface and memory accesses. Briefly, we didn't produce a chip with an exposed messaging interface because we didn't know what it should be, and implementing one would have required a difficult and time-consuming modification of the core. Instead we provided a couple of key building blocks for experimentation with messaging that will provide insights into its value and relevance.
The building blocks are: the absence of hardware cache coherence (and of its overhead in area, complexity, and traffic), local memory on die, and the cache modifications (L1 tagging, an invalidate instruction, and L2 bypass). These allow communication between the cores without going off-die (though off-die is also permitted with the cache modifications).
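To make the idea concrete, here is a minimal sketch of how software might layer a one-slot message channel on top of on-die local memory plus a flag. It is a plain C simulation and not SCC or RCCE code: `mpb_slot`, `mpb_try_send`, `mpb_try_recv`, and `MPB_BYTES` are hypothetical names invented for illustration, and the comments only mark where the cache modifications would come into play on real hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MPB_BYTES 64  /* hypothetical slot size, not an SCC parameter */

/* Stand-in for one core's slot in on-die local memory. */
typedef struct {
    int    full;             /* flag: 1 while a message is waiting */
    size_t len;              /* payload length in bytes */
    char   data[MPB_BYTES];  /* payload */
} mpb_slot;

/* Sender side: copy the payload into the receiver's slot, then raise the flag.
   Returns 1 on success, 0 if the previous message has not been consumed yet. */
int mpb_try_send(mpb_slot *slot, const void *msg, size_t len)
{
    assert(len <= MPB_BYTES);
    if (slot->full)
        return 0;
    memcpy(slot->data, msg, len);
    slot->len = len;
    /* On real hardware the payload would have to be visible in the
       on-die memory before the flag is set (assumption of this sketch). */
    slot->full = 1;
    return 1;
}

/* Receiver side: copy the waiting message out and clear the flag.
   Returns the number of bytes copied, or -1 if no message is waiting.
   A message longer than cap is truncated to cap bytes. */
long mpb_try_recv(mpb_slot *slot, void *out, size_t cap)
{
    /* On real hardware, stale cached copies of the slot would be
       invalidated here before reading (assumption of this sketch). */
    if (!slot->full)
        return -1;
    size_t n = slot->len < cap ? slot->len : cap;
    memcpy(out, slot->data, n);
    slot->full = 0;
    return (long)n;
}
```

A caller would zero-initialize a slot (`mpb_slot slot = {0};`), have the sending side call `mpb_try_send(&slot, "ping", 5)`, and have the receiving side poll `mpb_try_recv` until it returns a non-negative length. Both calls are non-blocking, so a core is free to do other work between polls.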
These can be used with a variety of messaging models, and our experiments show a 15x performance advantage for small messages (depending on the frequency configuration). We have an example implementation with RCCE. If one uses the TCP/IP stack in Linux, the benefits of lower latency will be lost. If one uses asynchronous interaction and optimizes copies, as iRCCE from RWTH Aachen does, the benefits are increased.
Beyond performance, one reason to use messaging on-die would be to keep the on-die programming model consistent with the one used off-die for a scale-out application. We have single-core nodes, but one could imagine dual- or quad-core nodes that have local cache coherence.
Shared memory can also be used with selective, software-managed cache coherence. Indeed, a PGAS model such as UPC could also be explored on SCC. We're interested in what the community discovers through experimentation.
Thanks for an interesting reply.
I just don't understand the 15x performance figure: compared to what? Compared to large messages?
To my understanding, the user does not see a big performance boost compared with a shared-memory system (no messaging), because messages in SCC are indirect. It is the programming model you are after. Hardware acceleration of the messaging model is not fully implemented here; the SCC messaging mechanism exists just so we can get a feel for it.
Am I right?
Compared to messages passed through off-die memory (regular reads and writes). Please see http://communities.intel.com/docs/DOC-5032
You may want to review the other documents in that section of this website for background.
There are no message instructions, just enough HW to implement a first-level API to support the programming model. The open question is the programming model and the ability of SW to exploit the low latency of staying on die.