Hi, I was running some experiments on my 4770 Haswell processor, using the Linux perf stat tool to count TLB misses and page-walker cache (paging-structure cache) misses.
I got some confusing results and decided to ask this question.
(If the question belongs someplace else, please notify me, I'll move it to wherever it is appropriate)
Some background before I get into the details of my problem: Section 4.10.3 ("Paging-Structure Caches") of SDM Vol. 3
describes three paging-structure caches: the PML4 cache, the PDPTE cache, and the PDE cache (one for each of the top 3 levels of the 4-level page walk).
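To make the three cache levels concrete, here is a minimal Python sketch of how a 48-bit linear address splits into the four table indices under 4-level paging with 4 KiB pages, and which upper address bits each paging-structure cache is looked up with (bits 47:39, 47:30, and 47:21, as I read Section 4.10.3). The function names are my own, purely for illustration, and this says nothing about what the performance counters actually count.

```python
# Illustrative sketch only: 4-level paging with 4 KiB pages, 48-bit linear addresses.
# Function names are hypothetical helpers, not part of any tool or library.

INDEX_MASK = 0x1FF  # each paging level uses a 9-bit index (512 entries per table)


def split_linear_address(addr: int) -> dict:
    """Return the per-level table indices and the page offset for a linear address."""
    return {
        "pml4": (addr >> 39) & INDEX_MASK,  # bits 47:39
        "pdpt": (addr >> 30) & INDEX_MASK,  # bits 38:30
        "pd": (addr >> 21) & INDEX_MASK,    # bits 29:21
        "pt": (addr >> 12) & INDEX_MASK,    # bits 20:12
        "offset": addr & 0xFFF,             # bits 11:0
    }


def paging_structure_cache_tags(addr: int) -> dict:
    """Return the linear-address bits used to look up each paging-structure cache."""
    addr &= (1 << 48) - 1  # keep only the 48 translated bits
    return {
        "pml4_cache": addr >> 39,   # bits 47:39
        "pdpte_cache": addr >> 30,  # bits 47:30
        "pde_cache": addr >> 21,    # bits 47:21
    }


if __name__ == "__main__":
    a = (1 << 39) | (2 << 30) | (3 << 21) | (4 << 12) | 5
    print(split_linear_address(a))
    print(paging_structure_cache_tags(a))
```

Two addresses inside the same 2 MiB-aligned region share a PDE-cache tag, so a walk for the second one could start from the cached PDE and only needs to fetch the final page-table entry.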
Now, when I compared the counts for the following two events, I got an interesting result.
The first event (MISSES_CAUSES_A_WALK) counts
Misses in all TLB levels that cause a page walk of any page size.
Thus it counts all misses in the L2 TLB that cause a hardware page walk, which will check the paging-structure caches (the PML4, PDPTE, and PDE caches).
The second event (PDE_CACHE_MISSES) counts
DTLB demand load misses with low part of linear-to-physical address translation missed.
However, to my surprise, for some workloads I see up to 1.7x as many PDE_CACHE_MISSES as MISSES_CAUSES_A_WALK.
Now my question: does PDE_CACHE_MISSES count misses in the PDPTE and PML4 caches as well?
If the answer is yes, that would explain why I see more PDE cache misses than the number of times the PDE cache was supposedly accessed, as counted by MISSES_CAUSES_A_WALK.
Intel provides limited support for Linux, so the best place to get more feedback on your investigation is the Linux.com forums. That is the right place to post this, and I'm pretty sure that other users will join your post.