1 Reply Latest reply on Jun 1, 2016 10:24 AM by Intel Corporation

    Does the PDE_CACHE_MISS performance counter count misses in the PDPTE & PML4 caches as well?


      Hi, I was running some experiments on my 4770 Haswell processor, using the Linux perf stat tool to count TLB misses and page-walker cache misses.

      I got some confusing results and decided to ask this question.

      (If the question belongs someplace else, please let me know and I'll move it to wherever is appropriate.)


      Before I get into the details of my problem: Section 4.10.3 ("Paging-Structure Caches") of SDM Vol. 3 explains that the caching structure is arranged into three levels, the PML4 cache, the PDPTE cache and the PDE cache (one for each of the top 3 levels of the 4-level page walk).
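
      To make the three levels concrete, here is a minimal sketch of how I understand a 48-bit linear address to be split for the 4-level walk, and which address bits select an entry in each paging-structure cache according to SDM 4.10.3. The function names are my own and purely illustrative.

```python
def walk_indices(linear_address):
    """Split a 48-bit linear address into the four 9-bit table indices."""
    return {
        "PML4": (linear_address >> 39) & 0x1FF,  # bits 47:39
        "PDPT": (linear_address >> 30) & 0x1FF,  # bits 38:30
        "PD": (linear_address >> 21) & 0x1FF,    # bits 29:21
        "PT": (linear_address >> 12) & 0x1FF,    # bits 20:12
    }


def cache_tags(linear_address):
    """Linear-address bits used to look up each paging-structure cache."""
    return {
        "PML4 cache": (linear_address >> 39) & 0x1FF,      # bits 47:39
        "PDPTE cache": (linear_address >> 30) & 0x3FFFF,   # bits 47:30
        "PDE cache": (linear_address >> 21) & 0x7FFFFFF,   # bits 47:21
    }
```

      Two addresses inside the same 2 MB-aligned region share a PDE cache tag, so a walk for the second one can skip straight to the page table if the first one left a PDE cache entry behind.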


      When I counted the following two performance counter events, I got an interesting result.


      The first event, MISSES_CAUSES_A_WALK, counts

      Misses in all TLB levels that cause a page walk of any page size.

      Thus it counts all misses in the L2 TLB that cause a HW page walk, which will check the paging-structure caches (the PML4, PDPTE and PDE caches).

      The second event, PDE_CACHE_MISSES, counts

      DTLB demand load misses with low part of linear-to-physical address translation missed
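
      For anyone who wants to reproduce the measurement, here is a rough sketch of collecting both events with perf stat from Python. The raw encodings (event 0x08 with umask 0x01 and 0x80, which I believe are DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK and DTLB_LOAD_MISSES.PDE_CACHE_MISS on Haswell) are an assumption that should be checked against the event tables for the exact CPU model. The parser also assumes the `count,unit,event,...` CSV layout printed by `perf stat -x,`, which may differ between perf versions.

```python
import subprocess

# Assumed raw encodings; perf's raw syntax is r<umask><event> in hex.
EVENTS = {
    "r0108": "MISS_CAUSES_A_WALK",  # event 0x08, umask 0x01
    "r8008": "PDE_CACHE_MISS",      # event 0x08, umask 0x80
}


def perf_command(workload):
    """Build a perf stat command that prints CSV (-x ,) for both events."""
    return ["perf", "stat", "-x", ",", "-e", ",".join(EVENTS)] + list(workload)


def parse_perf_csv(text):
    """Map event label -> count, skipping lines that are not plain counts."""
    counts = {}
    for line in text.splitlines():
        fields = line.split(",")
        if len(fields) < 3 or fields[2] not in EVENTS:
            continue
        if not fields[0].isdigit():  # e.g. "<not counted>"
            continue
        counts[EVENTS[fields[2]]] = int(fields[0])
    return counts


def measure(workload):
    """Run the workload under perf stat; perf writes its counts to stderr."""
    result = subprocess.run(perf_command(workload), capture_output=True, text=True)
    return parse_perf_csv(result.stderr)
```

      Calling `measure(["./my_workload"])` (a placeholder binary name) would return a dictionary with one count per event, provided perf is installed and the counters are accessible.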

      However, to my surprise, for some workloads I get up to 1.7x as many PDE_CACHE_MISSES as MISSES_CAUSES_A_WALK.

      Now my question: Does the PDE_CACHE_MISSES count misses in the PDPTE & PML4 caches as well?

      If the answer to my question is yes, then that would explain why I'm seeing more PDE cache misses than the number of times the cache was supposedly accessed, i.e. once per MISSES_CAUSES_A_WALK.
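
      The arithmetic behind my reasoning, as a small sketch. It assumes a simple model of my own in which each page walk looks up each counted cache at most once, so the ratio of misses to walks cannot exceed the number of caches the event covers. The counts below are made-up numbers chosen only to give the 1.7x ratio.

```python
def misses_per_walk(cache_misses, walks):
    """Ratio of reported cache misses to page walks."""
    return cache_misses / walks


def consistent_with(ratio, caches_counted):
    """True if the ratio fits a model where each walk can miss at most
    once in each of the caches the event is assumed to count."""
    return 0 <= ratio <= caches_counted


# Illustrative counts only, not measured data.
ratio = misses_per_walk(1700, 1000)

pde_only = consistent_with(ratio, 1)      # event counts PDE cache misses only
all_three = consistent_with(ratio, 3)     # event counts PDE + PDPTE + PML4 misses
```

      Under this model a ratio of 1.7 rules out the PDE-only reading and fits the three-cache reading, but it says nothing about other possible causes, which is why I'm asking.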