9 Replies Latest reply on Feb 22, 2011 12:35 PM by jheld

    Test and Set register


      I have some questions about the test&set registers, even though their semantics are clear to me:


      Can every core read and write the Test&Set bits of the other cores?

      Also, what is the atomic test&set machine operation, and how does such an instruction benefit from the semantics of the test&set register?


      I hope my questions are clear. Thanks in advance.

        • 1. Re: Test and Set register

          I can comment on your first question. Yes, the registers are memory-mapped and can be accessed by all cores. The functions RCCE_acquire_lock() and RCCE_release_lock(), defined in RCCE_admin.c, show how locking is implemented on top of these registers.

          • 2. Re: Test and Set register

            I'm not sure what you mean by


            what is the atomic test&set machine operation? How such a command benefits from semantics of test&set register?


            As Andreas said, we access the test&set registers through memory-mapped I/O, using mmap() through the device /dev/rckncm.  What more do you want to know beyond that? Are you running on baremetal or on SCC Linux? Are the test&set registers giving you the functionality and performance that you need?
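A rough sketch of that kind of mapping with plain POSIX calls follows. The helper name `map_phys_byte` is hypothetical, and the physical address of any particular test&set register is deliberately left as a parameter: take it from the SCC documentation or the RCCE sources, not from this example. Opening with `O_SYNC` follows common practice for `/dev/mem`-style devices; whether `/dev/rckncm` needs it is an assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Map the page containing physical address `phys` from device `dev`
   (e.g. "/dev/rckncm") and return a pointer to the byte at `phys`.
   Returns NULL on any failure. Hypothetical helper, not an RCCE API. */
volatile unsigned char *map_phys_byte(const char *dev, off_t phys)
{
    assert(dev != NULL && phys >= 0);

    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    /* mmap() offsets must be page-aligned, so map the enclosing page. */
    off_t base = phys & ~((off_t)page - 1);

    /* O_SYNC: assumed, as is common for /dev/mem-style devices. */
    int fd = open(dev, O_RDWR | O_SYNC);
    if (fd < 0)
        return NULL;

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_SHARED, fd, base);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    /* Step back into the page to reach the requested byte. */
    return (volatile unsigned char *)p + (phys - base);
}
```

The page is never unmapped here; a real program would keep the page base and call `munmap()` when it is done with the register.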

            • 3. Re: Test and Set register

              Thanks for your useful answers. Apart from the shared atomic test&set registers, are there any other atomic objects?

              As for atomic operations in the instruction set, is there an atomic swap (fetch-and-store) operation available? What about compare-and-swap (compare-and-exchange)? These could be useful for implementing concurrent data structures where test&set registers might not be enough. Also, I haven't started implementing on the platform yet.



              • 4. Re: Test and Set register



                You might find these two threads helpful:




                Compare and Exchange does not work as you would expect on SCC.

                • 5. Re: Test and Set register

                  Those other discussions are a good suggestion. Overall, think cluster programming, not shared-memory programming. Atomic instructions are atomic only with respect to execution on a single core: for example, an interrupt won't split a CMPXCHG, but the LOCK prefix is not intended to hold off another core. Use a message to synchronize, not a memory location.

                  • 6. Re: Test and Set register

                    Thanks for your useful comments. I understand your point about taking a message-passing view of the platform rather than a shared-memory one. However, I am considering the use of off-chip shared memory in case applications want it. With that in mind, I have the following questions:


                    1- Do you mean it is not possible to use the 48 TNS registers to provide synchronization (e.g. locks) for access to shared off-chip memory? They synchronize access to shared on-chip memory, so why not do the same for shared off-chip memory? In other words, why should TNS locks be used only to implement the message-passing layer?


                    2- If an application needs shared data structures in off-chip shared memory and we avoid using the TNS registers to implement locks, how should mutual exclusion and synchronization for those structures be handled? Perhaps with a distributed lock built on top of message passing?

                    • 7. Re: Test and Set register

                      The TNS bits can be used for whatever you like: protecting access to on-die memory, off-die memory, or anything else an atomic test-and-set is useful for. They are just a globally visible set of test-and-set bits. I mentioned messaging because there are only 48 of them, so a layered solution seems appropriate. SCC messaging support is agnostic to whether memory is on-die or off-die.

                      • 8. Re: Test and Set register

                        Thanks for your clear answer. Another question:


                        Does the current implementation of locks in the messaging library require cores to spin on the on-chip memory local to other cores? If so, wouldn't that limit the scalability of these locks, since spinning consumes interconnect traffic and could also create high contention on the memory-mapped TNS registers?

                        • 9. Re: Test and Set register

                          I can't speak to the specific implementation in RCCE (if that is what you mean).


                          Polling is limited by the throughput of the cores, which is much less than the mesh can sustain.

                          Polling generates traffic but has very low latency. I'd expect that if a system is doing useful work and is load-balanced, the periods of polling would be brief.

                          An asynchronous (interrupt-driven) approach is also possible, but interrupts also burn power and have high latency. The best point in this trade-off depends on the nature of your workload.
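On the polling-traffic concern raised above, a common mitigation for spin locks in general is bounded exponential backoff: after each failed attempt, wait roughly twice as long before polling again, up to a cap. The sketch below is illustrative only; `acquire_with_backoff` and `next_delay` are hypothetical names, `atomic_flag` again stands in for a test&set register, and the delay is measured in loop iterations rather than real time.

```c
#include <assert.h>
#include <stdatomic.h>

/* Double the wait after each failed attempt, capped at `max_delay`. */
unsigned next_delay(unsigned delay, unsigned max_delay)
{
    assert(max_delay > 0);
    if (delay == 0)
        return 1;
    return delay >= max_delay / 2 ? max_delay : delay * 2;
}

/* Crude delay loop; real code might use a pause instruction or a timer. */
static void spin_delay(unsigned iterations)
{
    for (volatile unsigned i = 0; i < iterations; i++) {
    }
}

/* Acquire `flag`, backing off between failed test&set attempts.
   Returns how many attempts were needed. */
unsigned acquire_with_backoff(atomic_flag *flag, unsigned max_delay)
{
    unsigned attempts = 1, delay = 0;
    while (atomic_flag_test_and_set_explicit(flag, memory_order_acquire)) {
        delay = next_delay(delay, max_delay);
        spin_delay(delay); /* fewer polls means less interconnect traffic */
        attempts++;
    }
    return attempts;
}
```

The cap keeps the added latency bounded, which matters if, as suggested above, waits are usually brief.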