Interesting experiment. Two things came to mind while reading your post:
1) You are setting CR0.CD without performing a cache flush afterward. As I understand the CD bit, it does nothing more than disable cache-line fills (in case the target cacheline of a cacheable read operation is not already in the cache). So you would need to flush both the L1 and L2 caches completely after CR0.CD has been set, or you risk working on potentially stale (cached) data. Of course, if you read a cacheline that was not previously cached, it won't be cached afterward; but if a cacheline was still in the cache when you set the bit, it will never be flushed, because there is no other cacheline to replace it (at least in L2; you can force it out of L1 via WBINVD).
2) CR0.CD influences all memory operations, not just data accesses; setting this bit therefore also disables caching of executable code, such as that of your test loop. I don't know what type of bus accesses the P54C uses to fetch code, though, so I'm not sure whether this can explain your measurements.
If I wanted to verify this theory, I would probably count the number of assembly-code bytes forming your test loop and then calculate how much time it takes to transfer them over the mesh.
Thanks a lot for the insight! I ran the code again with WBINVD just after the cache enable/disable, but it did not make any difference. I don't think the shared memory is already in the cache before I execute the loop. I think the caching of code could make a difference; I will try to confirm it as you have suggested. However, I would like to ask two things:
- Is there a way to invalidate the L2 cache as well?
- Can I somehow verify that the uncacheable shared memory is actually not getting cached? I checked the PTEs and they have the CD bit set, but I want to be sure.
The L2 cache cannot be flushed directly. The only known means to do that is to construct a certain series of memory reads, such that the memory operations seen on the processor's front side (i.e., after traversing L1) cause the target cachelines to be evicted from L2. This is extremely hard; if you are interested, you can find the discussion here: http://marcbug.scc-dc.com/bugzilla3/show_bug.cgi?id=195
If you want to make sure that certain data never appears in L2, you could also use the MPBT caching type. Make sure support for it is enabled in CR4, then set the MPBT (PS) bit in the page table; data marked this way will be cached in L1 only (with writes accumulated in the external WCB) but will never cause L2 fills. Because of the WCB, it is recommended to use PWT=1 with such mappings.
Sorry, but I don't think there is any means to query the contents of the L1 cache. If you absolutely wanted to do this, it could only be done indirectly, by observing which memory addresses the processor accesses. You could try using sccKit's SoftRAM feature for that (run your code from the SoftRAM range so you can see the instruction-fetch operations), but I have never done that myself and don't even know whether it still works with the current version.