8 Replies Latest reply on Jul 13, 2011 7:05 PM by jheld

    How can I get lots of locks for use in my RCCE+POP-SHM application

    ally

      Dear SCCers

       

      I've managed to get quite a complex shared memory application working nicely across the SCC cores, by using POP-SHM for shared memory accesses, and RCCE to get things going and for barriers and locks.  This is thanks to a lot of help from this forum.

       

      I'm now moving on to a more complex version of my application which uses a huge hashtable.  In the regular multicore PC version of the application, the hashtable is managed by having many, many locks (any number in principle, but 256 by default).  Each lock is a memory location, managed using atomic test-and-set operations.

       

      Each such lock is used to guard some subset of collision chains in the hashtable.  Having a large number of locks is important to keep these sets of collision chains relatively small, so that access to the hashtable does not become a bottleneck.

       

      The problem I have in porting this to SCC+RCCE is that it seems that RCCE only provides a single lock per core.

       

      Is there any way to get more locks?

       

      I've been thinking about a solution using flags, but becuase it does not seem possible to test and set a flag in one atomic operation, I cannot see how a RCCE flag can be used to simulate a lock.

       

      Thanks

       

      Ally

        • 1. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
          tgmattso

          This is a tough problem.  I look forward to what people come up with for a solution on the forum.

           

          Take a look at how the single bit flag in RCCE is implemented.  We use the single test-and-set register per core to protect a whole cache line of single bit flags.  You can use this same trick to implement a lock heirarchy ... a core interested in a single lock must first get exclusive access to the "cache line of locks" which can be protected by the test and set register.   At that point, you know only one core can access the bits and you can safely depend on atomic semantics from that one core.

           

          Does this make sense?

           

          --Tim

          • 2. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
            tedk

            Tim,

            Am I off-base in thinking that the new atomic counters might be useful for this?

            • 3. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
              jheld

              Seems like you are working hard to turn SCC into a traditional CMP architecture by applying a design I'd expect on a shared-memory cache-coherent machine.

              Is there no messaging counterpart to the purpose of the hashtable?

              • 4. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
                ally

                Hi Tim

                 

                Yes, this does make sense, thanks.

                 

                I think I could probably make this solution work, and it would be better than having a single global lock on the hashtable, but I guess less efficient than the "loads of locks" method that would be available on a traditional architecture.

                 

                Cheers

                 

                Ally

                • 5. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
                  ally

                  Hi Jim

                   

                  You're right that what I am basically trying to do is shoehorn a shared memory multicore program onto SCC.

                   

                  Perhaps I can give you some context, and then you might be able to advise me as to whether what I'm trying to do sounds reasonble.

                   

                  The application is a finite-state model checker (SPIN).  The background is that there have been many past attempts to parallelise SPIN on clusters, using message passing.  How to do this has been well studied, but invariably researchers have found it difficult to achieve scalability due to the large number and size of messages that need to be exchanged.

                   

                  Then in 2007, a shared memory algorithm for SPIN was developed, and quite good scalability for up to 4 or 5 cores shown.  However, scalability tailed off due to the lack of memory bandwidth.

                   

                  What I've been hoping is that SCC will offer some kind of compromise.  Cores can do local computation in their own memory spaces, not requiring coherency with other cores, and use shared memory (via POP-SHM) to store data that must be shared (e.g. the big hashtable I mentioned).  Unfortunately, they will need to access this shared memory very frequently, but at least there will not be inter-core contention for other data structures, which are local.

                   

                  Do you have any feeling for how scalable POP-SHM is in terms of having it accessed by many cores?

                   

                  Thanks

                   

                  Ally

                  • 6. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
                    tgmattso

                    I assumed he needed a large number of locks and hence I didn't consider the atmoic counters.  I think he'd still need to play some sort of hiearchical locks trick and if you're going to do that, you might as well work wtih the more efficient T&S registers on the cores.

                     

                    I think the long term solution is to think of a distributed hash table design that doesn't need the locks.  POPSHM gives us shared memory, but that doesn't make SCC a good platform for running algorithms designed for traditional cache coherent shared memory systems.

                     

                    --Tim

                    • 7. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
                      saibbot

                      Then in 2007, a shared memory algorithm for SPIN was developed, and quite good scalability for up to 4 or 5 cores shown.  However, scalability tailed off due to the lack of memory bandwidth.

                       

                      I am positively sure you will notice the same badnwitdth issues on the SCC also. I have similar issues (using SHMEM though, not POP-SHM)

                       

                      What Tim suggested was my first thought also; you could have a hashtable of locks (48 buckets) that are protected by the atomic TNS registers. You could either place them in the MPB or the shared memory.

                      • 8. Re: How can I get lots of locks for use in my RCCE+POP-SHM application
                        jheld

                        Thanks for the explanation.  I'd second Tim's suggestion about a distributed hash-table.

                        It would be interesting to see how the ealier messaging approaches worked on SCC and iRCCE.

                         

                        Sorry, I haven't seen any assessment of the scaling of POP-SHM.

                         

                        -Jim