The Core 2 Duo has HW performance counters that measure L2_Rqsts_Self_Prefetch_MES and L2_Rqsts_Self_Prefetch_I. My understanding is that L2_Rqsts_Self_Prefetch_I counts the following event: a HW prefetch request is issued for a line that is not in L2 and therefore has to be brought in from memory (an L2 miss). Is this correct? If so, I assume L2_Rqsts_Self_Prefetch_MES counts HW prefetch requests that hit in L2.

So my question is: is there a cost for HW prefetch requests that hit in L2, and if so, how much? It seems to me that any HW prefetch request that hits in L2 is redundant, because the subsequent access to the line would be a hit regardless. We could probably save a few cycles if we could detect this situation.

The savings would be really big if there were situations where the requested line is brought in from memory even though it is an L2 hit (i.e., if the L2 check is done after the prefetch request has already been sent to memory). Does that ever happen?
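To put a number on how often this happens, the two counter values can be combined as in the sketch below. It assumes my interpretation above is right (that the _I counter is prefetch misses and the _MES counter is prefetch hits); the function name and the sample counts are made up for illustration and are not measured data.

```python
def prefetch_hit_fraction(prefetch_mes: int, prefetch_i: int) -> float:
    """Fraction of HW prefetch requests that hit in L2.

    Assumes prefetch_mes counts prefetch requests that found the line in L2
    (M, E, or S state) and prefetch_i counts those that missed (I state).
    """
    total = prefetch_mes + prefetch_i
    if total == 0:
        return 0.0  # no prefetch requests were observed
    return prefetch_mes / total


# Hypothetical counter readings, for illustration only.
mes_count = 250_000  # L2_Rqsts_Self_Prefetch_MES
i_count = 750_000    # L2_Rqsts_Self_Prefetch_I

fraction = prefetch_hit_fraction(mes_count, i_count)
print(f"{fraction:.1%} of HW prefetch requests hit in L2")
```

If that fraction turns out to be small on a real workload, the cost of the redundant requests probably does not matter much either way.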