0 Replies Latest reply on Nov 2, 2011 1:03 PM by chandras

    Shared memory - Parallel access times

    chandras

      Hello,

       

      I've been running some experiments to profile shared-memory performance under parallel, uncached accesses. In this experiment, I allocate space for a single integer in shared memory using RCCE_shmalloc and have all cores concurrently read that integer 1,000,000 times. Since shmalloc could place the allocation behind any one of the memory controllers, I expected the core closest to that memory controller to be the fastest [first to finish] and the core farthest away to be the slowest [last to finish]. But this is not what I see.
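      To make that expectation concrete, here is a rough sketch of the distance model I have in mind. Everything about the layout is an assumption for illustration and is not taken from the measurements: a 6x4 tile mesh, two cores per tile with core // 2 giving the tile, tiles numbered row by row, memory controllers at the tile positions in MC_TILES, and latency growing with the Manhattan hop count. The helper names are made up for this sketch.

```python
# Sketch: rank cores by mesh distance to one memory controller.
# ASSUMPTIONS (not from the measurements): 6x4 tile mesh, two cores per
# tile, core // 2 is the tile index, tiles are numbered row by row, and
# the memory controllers sit at the (x, y) tile positions in MC_TILES.

MESH_WIDTH = 6
NUM_CORES = 48
MC_TILES = [(0, 0), (5, 0), (0, 2), (5, 2)]  # assumed controller positions


def core_xy(core):
    """Return the assumed (x, y) mesh position of the tile holding `core`."""
    tile = core // 2
    return tile % MESH_WIDTH, tile // MESH_WIDTH


def hops(core, mc_xy):
    """Manhattan distance in tiles between a core's tile and a controller."""
    x, y = core_xy(core)
    return abs(x - mc_xy[0]) + abs(y - mc_xy[1])


def expected_order(mc_xy):
    """Cores sorted from nearest to farthest (ties broken by core ID)."""
    return sorted(range(NUM_CORES), key=lambda c: (hops(c, mc_xy), c))


if __name__ == "__main__":
    for mc in MC_TILES:
        order = expected_order(mc)
        print(f"MC at {mc}: nearest rck{order[0]:02d}, "
              f"farthest rck{order[-1]:02d}")
```

      Under this model, the finishing order should simply follow the hop count to whichever controller holds the allocation.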

       

      rck00: 1.762104
      rck01: 1.762337
      rck02: 1.790258
      rck03: 1.781338
      rck04: 1.781550
      rck05: 1.781546
      rck06: 2.670106
      rck07: 2.670031
      rck08: 1.778486
      rck09: 1.778523
      rck10: 2.669764
      rck11: 2.668152
      rck12: 2.290829
      rck13: 2.290630
      rck14: 3.430420
      rck15: 3.430751
      rck16: 3.432729
      rck17: 3.419364
      rck18: 2.271882
      rck19: 2.270674
      rck20: 3.407724
      rck21: 3.407947
      rck22: 2.270704
      rck23: 2.270740
      rck24: 2.573657
      rck25: 2.573673
      rck26: 3.859018
      rck27: 3.859059
      rck28: 2.572800
      rck29: 2.572803
      rck30: 2.572847
      rck31: 2.572848
      rck32: 2.572429
      rck33: 2.572383
      rck34: 2.572414
      rck35: 2.572399
      rck36: 2.577681
      rck37: 2.577675
      rck38: 2.577732
      rck39: 2.577727
      rck40: 2.577833
      rck41: 2.577852
      rck42: 2.577727
      rck43: 2.577830
      rck44: 2.577724
      rck45: 2.577730
      rck46: 2.577519
      rck47: 2.577542

       

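      The fastest core is core 0 (rck00, 1.762104), but the slowest is core 27 (rck27, 3.859059), and the times in between do not follow any obvious distance ordering. Is this behavior expected? What am I missing?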
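      For anyone who wants to look at the numbers, here is a small sketch that summarizes the list above. The timings are copied from the output as posted. Pairing consecutive core IDs (0 and 1, 2 and 3, ...) assumes that those cores share a tile, and the function names are only for illustration.

```python
# Timings copied from the list above, indexed by core number (rck00..rck47).
TIMES = [
    1.762104, 1.762337, 1.790258, 1.781338, 1.781550, 1.781546,
    2.670106, 2.670031, 1.778486, 1.778523, 2.669764, 2.668152,
    2.290829, 2.290630, 3.430420, 3.430751, 3.432729, 3.419364,
    2.271882, 2.270674, 3.407724, 3.407947, 2.270704, 2.270740,
    2.573657, 2.573673, 3.859018, 3.859059, 2.572800, 2.572803,
    2.572847, 2.572848, 2.572429, 2.572383, 2.572414, 2.572399,
    2.577681, 2.577675, 2.577732, 2.577727, 2.577833, 2.577852,
    2.577727, 2.577830, 2.577724, 2.577730, 2.577519, 2.577542,
]


def fastest_and_slowest(times):
    """Return (index of smallest time, index of largest time)."""
    cores = range(len(times))
    fastest = min(cores, key=lambda c: times[c])
    slowest = max(cores, key=lambda c: times[c])
    return fastest, slowest


def max_pair_spread(times):
    """Largest time difference within the pairs (0,1), (2,3), ...

    ASSUMPTION: consecutive core IDs share a tile.
    """
    return max(abs(times[i] - times[i + 1]) for i in range(0, len(times), 2))


if __name__ == "__main__":
    fast, slow = fastest_and_slowest(TIMES)
    print(f"fastest: rck{fast:02d} {TIMES[fast]:.6f}")
    print(f"slowest: rck{slow:02d} {TIMES[slow]:.6f}")
    print(f"slowest/fastest ratio: {TIMES[slow] / TIMES[fast]:.3f}")
    print(f"largest difference within a pair: {max_pair_spread(TIMES):.6f}")
```

      The two cores in each pair finish almost together, so whatever causes the spread seems to act per pair rather than per core.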

       

      I've attached the test. The experiment was performed using the latest RCCE version from the SVN trunk on the default SCCLinux.