What do you mean by:
I found the duration of the first event occupies up to 90% of the total elapsed TSC, and meanwhile the second takes about 50% of the total TSC.
How did you measure how much time each event takes?
My interpretation of the events would be:
- Delays due to fetching from the memory (data not in the caches)
- Delays due to flushing the data from the write buffer to the memory
Both events are of type "duration". Does that mean the counter value is the number of cycles the CPU spent on the respective event?
If so, my question is how the CPU can spend 90% of its time stalled waiting for data reads while spending another 50% of its time stalled waiting for the write buffers to flush. That sounds weird to me.
Sorry if these are stupid questions; I have little prior experience with performance counters.
I'm not confident in my answer, but the manual says:
Number of clocks stalled due to full write buffers (011001):
This event counts the number of clocks that the internal pipeline is
stalled due to full write buffers. Full write buffers stall data memory
read misses, data memory write misses, and data memory write hits to
S state lines. Stalls on I/O accesses are not included.
Pipeline stalled waiting for data memory read (011010):
Data TLB Miss processing is also included. The pipeline stalls while a
data memory read is in progress including
which may mean that some of the time spent on the former event (the data memory read misses) is also counted in the latter event. Does this make sense?
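To make the overlap idea concrete, here is a toy sketch (not real hardware data): each clock carries two hypothetical flags, and each "counter" simply counts the clocks where its own flag is set. The Clock class, the flag names, and the 10-clock timeline are all made up for illustration; the numbers are chosen only to reproduce the 90% and 50% figures from the question.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clock:
    # Hypothetical per-clock stall flags; not taken from real hardware.
    read_stall: bool         # pipeline waiting on a data memory read
    write_buffer_full: bool  # pipeline stalled because write buffers are full


def count_events(timeline):
    """Count each duration event independently, like two separate counters."""
    read = sum(c.read_stall for c in timeline)
    wbuf = sum(c.write_buffer_full for c in timeline)
    either = sum(c.read_stall or c.write_buffer_full for c in timeline)
    return read, wbuf, either


# Made-up timeline of 10 clocks: 4 with both conditions,
# 5 with only the read stall, 1 with only the full write buffer.
timeline = (
    [Clock(True, True)] * 4
    + [Clock(True, False)] * 5
    + [Clock(False, True)] * 1
)

read, wbuf, either = count_events(timeline)
total = len(timeline)
print(f"read stall:        {read / total:.0%}")    # 90%
print(f"write buffer full: {wbuf / total:.0%}")    # 50%
print(f"stalled at all:    {either / total:.0%}")  # 100%
```

If the two events can be true in the same clock, the per-event percentages can add up to more than 100% even though the pipeline is never stalled for more than 100% of the elapsed clocks.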
Maybe, but I'm still confused. Suppose the CPU is already stalled on a memory read; is it possible for it to also be stalled on a full write buffer at the same time? I mean, if you are already stalled, how can you keep issuing writes that make you "double" stalled?
Again, without being 100% sure, the same manual mentions:
The Pentium processor utilizes write buffers for memory operands and for each pipeline.
Write buffers improve performance by allowing the processor to proceed with the next pair
of instructions even though one of the current instructions writes to memory when the bus is
busy. The write buffers can be filled in parallel when instructions in both pipes write to
memory during the same clock; however, they are always emptied in the same sequence in
which the write requests were generated by software.
In general, the existence of these buffers is transparent to programmers. The Pentium
processor ensures that memory read operations are never reordered ahead of prior pending
write operations; however, for compatibility with future processors, programmers should
follow the ordering guidelines presented in Chapter 19.
which possibly implies that a fetch from memory (e.g., in the other pipeline) will be "blocked" and ordered after a write buffer flush in order to preserve memory consistency (see https://en.wikipedia.org/wiki/Superscalar).
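As a rough illustration of that ordering rule, here is a minimal sketch assuming a very simplified model: buffered writes drain one at a time in program order, and a later read cannot start until the last of them has drained. The function name and the latencies are invented for the example and are not taken from the manual.

```python
def read_completion_clock(pending_writes, write_latency, read_latency):
    """Return (clocks spent waiting behind writes, clock at which the read completes).

    Toy model: writes drain one at a time in program order, each taking
    write_latency clocks on the bus; the read starts only after the last one.
    """
    drain_clocks = pending_writes * write_latency
    return drain_clocks, drain_clocks + read_latency


# Hypothetical numbers: 2 buffered writes of 4 clocks each, then a 6-clock read.
drain, done = read_completion_clock(pending_writes=2, write_latency=4, read_latency=6)
print(f"read waits {drain} clocks behind writes, completes at clock {done}")
```

In this model the read is stalled for all 14 clocks, and the first 8 of those are spent waiting for the write buffer to drain, so those clocks could plausibly show up under both events.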
Yeah, this does make sense! Many thanks!