3 Replies Latest reply on Jan 23, 2013 7:00 AM by JanArneSobania

    Problem in understanding caching on SCC



      I am having a problem understanding caching on the SCC. Currently I am running my own kernel on the SCC. I have marked the shared memory (0x80000000-0x83FFFFFF) as uncacheable in the page tables by setting the PCD bit. In addition, I am setting/clearing the CD bit of CR0 to disable/enable the cache. I expected the time to read/write shared memory to be the same whether the cache is enabled or not, since shared memory is marked as uncacheable. But a small experiment tells a different story. I wrote code that copies a value from one shared-memory location to another many times and measured the elapsed time using the global timestamp counter. The time measured with caching enabled through CR0 is very different from the time measured with caching disabled. Here is the code:


      print_GTSC();  // print the Global Timestamp Counter before the loop

      int t;
      for (t = 0; t < 0x1000000; t++)
          *((uint32*)0x80400004) = *((uint32*)0x80400000);

      print_GTSC();  // print it again after the loop

      This code takes around 6-7 seconds with caching enabled and around 34-36 seconds with caching disabled. I am unable to understand why the time varies so much when shared memory is never supposed to be cached. Is there a flaw in my understanding of the caching mechanism?


      Here is the function in assembly to enable caching:


      /* BareMichael SCC baremetal framework.
       * Copyright (C) 2012.  All rights reserved. */

      #define CR0_NW (1 << 29)
      #define CR0_CD (1 << 30)

      /* void enable_caching(void) */
      .globl enable_caching
      enable_caching:
              # CR0: clear NW, clear CD
              movl    %cr0, %eax
              andl    $~(CR0_CD | CR0_NW), %eax
              movl    %eax, %cr0
              ret




      Vaibhav Jain

        • 1. Re: Problem in understanding caching on SCC

          Hi Vaibhav,


          interesting experiment. Two things came to my mind while reading your post:


          1) You are toggling CR0.CD without performing a cache flush afterward. As I understand the CD bit, it does nothing more than disable cache-line fills (in case the target cacheline of a cacheable read operation is not already in the cache). So you would need to flush both the L1 and L2 caches completely after CR0.CD has been set, or you risk working on potentially-stale (cached) data. Of course, if you read a cacheline that was not previously cached, it won't be cached afterward; but if a cacheline was still in the cache when you set the bit, it will never be flushed, because there is no other cacheline to replace it (at least in L2; you can force it out of L1 via WBINVD).
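A sketch of what I mean, in the style of your enable_caching routine (the disable_caching name is made up, and note that WBINVD here only writes back and invalidates L1 — it does not solve the L2 problem):

```
#define CR0_NW (1 << 29)
#define CR0_CD (1 << 30)

/* void disable_caching(void) -- hypothetical counterpart to enable_caching */
.globl disable_caching
disable_caching:
        # CR0: set CD and NW to disable the cache
        movl    %cr0, %eax
        orl     $(CR0_CD | CR0_NW), %eax
        movl    %eax, %cr0
        wbinvd                  # write back + invalidate L1 so no stale lines remain
        ret
```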


          2) CR0.CD influences all memory operations, not just data accesses; therefore, setting this bit also disables caching of executable code, such as that of your test loop. I don't know what type of bus accesses the P54C uses to read code, though, so I'm not sure whether this can explain your measurements.


          If I wanted to verify this theory, I would probably try to count the number of assembly code bytes forming your test loop, then calculate how long they take to transfer over the mesh.




          • 2. Re: Problem in understanding caching on SCC

            Hi Jan-Arne,


            Thanks a lot for the insight! I ran the code again with WBINVD just after the cache enable/disable, but it did not make any difference. I don't think the shared memory is already in the cache before I execute the loop. I think the caching of code could make a difference; I will try to confirm it as you have suggested. However, I would like to ask two things:

            - Is there a way to invalidate the L2 cache as well?

            - Can I somehow verify that the uncacheable shared memory is actually not getting cached? I checked the PTEs and they have the PCD bit set, but I want to be sure.



            Vaibhav Jain

            • 3. Re: Problem in understanding caching on SCC

              Hi Vaibhav,


              the L2 cache cannot be flushed directly. The only known way to do it is to construct a certain series of memory reads, such that the memory operations seen on the processor's front side (i.e., after traversing L1) cause the target cachelines to be evicted from L2. This is extremely hard; if you are interested, you can find the discussion here: http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=195


              If you want to make sure that certain data never appears in L2, you could also use the MPBT caching type. Make sure support for it is enabled in CR4, then set the MPBT (PS) bit in the page table entry; data marked this way will be cached in L1 only (and writes accumulated in the external WCB), but will never cause L2 fills. Because of the WCB, it is recommended to use PWT=1 with such mappings.


              Sorry, but I think there is no way to query the contents of the L1 cache. If you absolutely wanted to do this, it could only be done indirectly, by observing which memory addresses the processor accesses. You could try using sccKit's SoftRAM feature for that (run your code from the SoftRAM range, so you see instruction-fetch operations), but I have never done that myself and don't even know whether it still works with the current version.