6 Replies Latest reply on Nov 22, 2012 6:52 AM by jhshi

    Interpreting Performance Counters

    jhshi

      Hi,

       

      Recently, I've been working with performance counters on the SCC. The document I'm following is P54C_Architecture_And_Programming_Vol3, Section 26. I've managed to read some of the performance counter values listed in Table 26-4, but I'm not sure how to interpret certain performance events.

       

      More specifically, I'm interested in these two events:

      - 011010 "Pipeline stall waiting for data memory read"

      - 011001 "Number of clocks stalled due to full write buffers"

       

      Intuitively, I thought the first one measures how many cycles the core has to wait for data reads, and similarly that the second measures how many cycles the core has to wait for data writes. But in some experiments, I found the duration of the first event occupies up to 90% of the total elapsed TSC, and meanwhile the second takes about 50% of the total TSC.

       

      I'm confused. The CPU couldn't have spent over 140% (90%+50%) of its time waiting for memory R/W, right? So what on earth do the two event types mean?

       

      Thanks in advance.

       

      Message was edited by: Jinghao Shi

        • 1. Re: Interpreting Performance Counters
          saibbot

          Hi,

           

          What do you mean by:

          "I found the duration of the first event occupies up to 90% of the total elapsed TSC, and meanwhile the second takes about 50% of the total TSC."?

           

          How did you measure how much time each event takes?

           

          My interpretation of the events would be:

          1. Delays due to fetching from the memory (data not in the caches)
          2. Delays due to flushing the data from the write buffer to the memory

           

          Vasilis.

          • 2. Re: Interpreting Performance Counters
            jhshi

            The type of these two events is "duration". Does that mean the counter value is the number of cycles the CPU spent on the respective event?

            If yes, then my question is: how can the CPU spend 90% of its time stalled waiting for data reads while spending another 50% of its time stalled waiting for the write buffers to flush? That sounds weird to me.

            Sorry if these are stupid questions; I have little prior experience with performance counters.
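
            To make the "duration" reading concrete, here is a minimal sketch of how such percentages are typically derived: the change in a counter divided by the change in the TSC over the same window. The function name and the sample readings are hypothetical (picked only to reproduce the 90% and 50% figures), and the sketch assumes the counters and the TSC tick at the same clock rate.

```python
def stall_fraction(counter_start, counter_end, tsc_start, tsc_end):
    """Fraction of elapsed clocks attributed to one duration-type event."""
    elapsed = tsc_end - tsc_start
    if elapsed <= 0:
        raise ValueError("TSC must advance between the two samples")
    return (counter_end - counter_start) / elapsed


# Hypothetical readings, not measured values.
read_frac = stall_fraction(0, 900_000, 0, 1_000_000)   # event 011010
write_frac = stall_fraction(0, 500_000, 0, 1_000_000)  # event 011001

print(f"stalled on data memory read:  {read_frac:.0%}")
print(f"stalled on full write buffer: {write_frac:.0%}")
print(f"sum of the two fractions:     {read_frac + write_frac:.0%}")
```

            Each fraction is bounded by 100% on its own, but nothing in this calculation forces the two to add up to at most 100%; that only holds if the two events can never be counted in the same clock.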

            • 3. Re: Interpreting Performance Counters
              saibbot

              Without any confidence in my answer, the manual says:

              Number of clocks stalled due to full write buffers (011001):

              This event counts the number of clocks that the internal pipeline is stalled due to full write buffers. Full write buffers stall data memory read misses, data memory write misses, and data memory write hits to S state lines. Stalls on I/O accesses are not included.

              Pipeline stalled waiting for data memory read (011010):

              Data TLB Miss processing is also included. The pipeline stalls while a data memory read is in progress including

              which may mean that some of the time spent on the former event (the data memory read misses that are stalled by full write buffers) is also counted in the latter event. Does this make sense?
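
              If the two events can indeed be counted in the same clock, the reported numbers put bounds on how much they overlap. The sketch below applies inclusion-exclusion to the 90% and 50% figures from the question; the function names and the absolute clock counts are hypothetical, and it assumes each counter advances by at most one per clock over the same measurement window.

```python
def min_overlap(read_stall, wb_stall, total):
    """Fewest clocks that both counters must have counted."""
    return max(0, read_stall + wb_stall - total)


def max_overlap(read_stall, wb_stall):
    """Most clocks that both counters could have counted."""
    return min(read_stall, wb_stall)


total = 1_000_000      # elapsed TSC clocks (hypothetical)
read_stall = 900_000   # 90% of the window, event 011010
wb_stall = 500_000     # 50% of the window, event 011001

lo = min_overlap(read_stall, wb_stall, total)
hi = max_overlap(read_stall, wb_stall)
print(f"clocks counted by both events: between {lo} and {hi}")
print(f"as a share of the window: {lo / total:.0%} to {hi / total:.0%}")
```

              Under these assumptions, at least 40% of the clocks were counted by both events, so the combined stall time is somewhere between 90% and 100% of the window rather than 140%.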

              • 4. Re: Interpreting Performance Counters
                jhshi

                Maybe. But I'm still confused. Suppose the CPU is already stalled on a memory read; is it possible for it to be stalled on a full write buffer at the same time? I mean, if you are already stalled, how can you keep issuing writes that make you "double" stalled?

                • 5. Re: Interpreting Performance Counters
                  saibbot

                  Again, without being 100% sure, the same manual mentions:

                  The Pentium processor utilizes write buffers for memory operands and for each pipeline. Write buffers improve performance by allowing the processor to proceed with the next pair of instructions even though one of the current instructions writes to memory when the bus is busy. The write buffers can be filled in parallel when instructions in both pipes write to memory during the same clock; however, they are always emptied in the same sequence in which the write requests were generated by software.

                  In general, the existence of these buffers is transparent to programmers. The Pentium processor ensures that memory read operations are never reordered ahead of prior pending write operations; however, for compatibility with future processors, programmers should follow the ordering guidelines presented in Chapter 19.

                  which possibly implies that a fetch from memory (e.g., in the other pipeline) will be "blocked" and ordered after a write buffer flush to preserve memory consistency (see https://en.wikipedia.org/wiki/Superscalar).
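
                  To illustrate how one clock could end up in both counters, here is a toy clock-by-clock model. It is not a model of the real P54C pipeline: the buffer capacity, the latencies, the `simulate` function, and the rule for which counter a blocked read is charged to are all assumptions made for illustration. The only behaviors taken from the quoted text are that writes drain in order, that reads are not reordered ahead of pending writes, and that a full write buffer stalls reads.

```python
from collections import deque


def simulate(program, capacity=2, drain_clocks=4, read_clocks=3):
    """Run a list of ops ('W' write, 'R' read, anything else = compute).

    Returns (clocks, read_stall, wb_stall, both), where `both` counts the
    clocks charged to the two stall counters at the same time.
    """
    buffer = deque()   # remaining drain clocks for each pending write
    clocks = read_stall = wb_stall = both = 0
    read_left = 0      # remaining clocks of a read that is in progress
    pc = 0
    while pc < len(program):
        clocks += 1
        op = program[pc]
        full = len(buffer) >= capacity
        if op == "W":
            if full:
                wb_stall += 1                 # no free slot for the write
            else:
                buffer.append(drain_clocks)
                pc += 1
        elif op == "R":
            if buffer:
                # Assumption: a read waiting behind pending writes counts
                # as a read stall, and also as a write-buffer stall while
                # the buffer is full.
                read_stall += 1
                if full:
                    wb_stall += 1
                    both += 1
            else:
                if read_left == 0:
                    read_left = read_clocks   # start the memory read
                read_stall += 1
                read_left -= 1
                if read_left == 0:
                    pc += 1
        else:
            pc += 1
        # The bus drains the oldest pending write, one clock at a time.
        if buffer:
            buffer[0] -= 1
            if buffer[0] == 0:
                buffer.popleft()
    return clocks, read_stall, wb_stall, both


clocks, read_stall, wb_stall, both = simulate(["W", "W", "W", "R"])
print(f"clocks={clocks} read_stall={read_stall} "
      f"wb_stall={wb_stall} counted_by_both={both}")
```

                  In this toy model a read that arrives behind a full buffer advances both counters in the same clock, which is the kind of double counting that would let the two percentages sum to more than 100%.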

                  • 6. Re: Interpreting Performance Counters
                    jhshi

                    Yeah, this does make sense! Many thanks!