3 Replies Latest reply on Mar 10, 2011 8:25 AM by jheld

    Message Passing in SCC. Why is it better than the shared-memory model?




      I find it hard to understand how SCC implements Message Passing from the documents.


      As far as I understand -

      The message-passing mechanism is not "direct": there is no dedicated link between each possible pair of tiles, nor any ring or other special-purpose topology. The MP mechanism is part of the shared internal SRAM; cores write to and read from this buffer, their "mailboxes".

      If that is really so, then what is the advantage of this model over a simple shared-memory communication model? I mean, each access to the "mailbox" is as slow as an access to shared memory. Where is the catch?




        • 1. Re: Message Passing in SCC. Why is it better than the shared-memory model?

          There is a mesh fabric, but a mesh interface unit translates between its packet interface and memory accesses.  Briefly, we didn't produce a chip with an exposed messaging interface because we didn't know what it should be, and it would have required a difficult and time-consuming modification of the core to implement.  Instead we provided a couple of key building blocks for experimentation with messaging that will provide insight into its value and relevance.

          The building blocks are: the absence of hardware cache coherence and its overhead in area, complexity, and traffic; local memory on die; and the cache modifications (L1 tagging, an invalidate instruction, and an L2 bypass).   These allow communication between the cores without going off-die (though going off-die is also permitted with the cache modifications).

          These can be used with a variety of messaging models, and in our experiments they show a 15x performance advantage for small messages (depending on the frequency configuration). We have an example implementation with RCCE.  If one uses the TCP/IP stack in Linux, the benefits of the lower latency are lost.  If one uses asynchronous interaction and optimizes away copies, as iRCCE from RWTH Aachen does, the benefits increase.

          Beyond performance, one reason to use messaging on-die would be to preserve on die the same programming model a scale-out application uses off-die.  We have single-core nodes, but one could imagine dual- or quad-core nodes with local cache coherence.

          Shared memory can also be used with selective software-managed cache coherence.  Indeed, a PGAS model such as UPC could also be explored on SCC.  We're interested in what the community discovers through experimentation.


          • 2. Re: Message Passing in SCC. Why is it better than the shared-memory model?

            Thanks for an interesting reply.


            I just don't understand the 15x performance figure. Compared to what? Compared to large messages?


            To my understanding, the user does not see a big performance boost compared to a shared-memory system (no messaging), because in SCC the messages are indirect. It is the programming model you are after: hardware acceleration of the messaging model is not fully implemented here, and the SCC messaging mechanism exists mainly so we can get a feel for it.


            Am I right?

            • 3. Re: Message Passing in SCC. Why is it better than the shared-memory model?

              Compared to messages passed through off-die memory (regular reads and writes of memory).   Please see http://communities.intel.com/docs/DOC-5032

              You may want to review the other documents in that section of this website for background.


              There are no message instructions, just enough hardware to implement a first-level API that supports the programming model.  The open question is the programming model itself, and the ability of software to exploit the low latency of staying on die.