9 Replies Latest reply on Dec 12, 2012 3:43 AM by JanArneSobania

    combine L2 cache with Write-Through and/or the WCB?

    rrotta

      Hi, just a short question:

       

      Does the SCC hardware support memory access with caching in the L2 cache and write-through mode? Would that use the Write Combine Buffer to collect writes on the same lines?

       

      Randolf

        • 1. Re: combine L2 cache with Write-Through and/or the WCB?
          jhshi

          From the SCC EAS document (v1.1, p10),

          The L2 cache is 4-way set associative with a pseudo-LRU replacement policy. It is write-back only. It is not write-allocate.

          So I think the L2 cache cannot be configured as write-through.

          • 2. Re: combine L2 cache with Write-Through and/or the WCB?
            rrotta

            Well, EAS v1.1 describes the page table entries on page 31, and it is possible to select PCD=0 and PWT=1, which should be cached mode with write-through. I wonder how L1 and L2 interact, because on an L1 cache hit the write will update the data in the L1 cache without marking the line as dirty. Is the line updated in the L2 cache as well?
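            For reference, the bit combinations under discussion can be decoded with a short sketch. PWT (bit 3) and PCD (bit 4) are the standard x86 page-table-entry positions; the PMB position (bit 7) is an assumption about the SCC that should be checked against the EAS, and `caching_mode` is a hypothetical helper name.

```python
# Flag bits in a 32-bit x86 page-table entry.
PTE_PWT = 1 << 3  # page-level write-through (standard x86 position)
PTE_PCD = 1 << 4  # page-level cache disable (standard x86 position)
PTE_PMB = 1 << 7  # ASSUMPTION: SCC message-passing-buffer flag; verify in the EAS


def caching_mode(pte: int) -> str:
    """Name the caching mode requested by the PCD/PWT/PMB bits of a PTE."""
    if pte & PTE_PCD:
        mode = "uncached"
    elif pte & PTE_PWT:
        mode = "cached, write-through"
    else:
        mode = "cached, write-back"
    # PMB is reported as a suffix because it modifies, not replaces, the mode.
    return mode + (" (MPBT)" if pte & PTE_PMB else "")
```

            With PCD=0 and PWT=1 this reports "cached, write-through", which is the combination asked about here.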

            • 3. Re: combine L2 cache with Write-Through and/or the WCB?
              JanArneSobania

              As I understand it, the L2 cache cannot be used with either the write-combining buffer or a write-through policy. The PCD and PWT bits only affect the processor core directly; PWT is ignored by the external (L2) cache, exactly like the special bus cycles for (WB)INVD. In contrast, the WCB is activated by PMB, but the processor core never performs cycles with both PCD=0 and PMB=1. If PMB=1, then PCD=1 on the bus.

               

              Here is what I think happens with each of the three control signals. First, assume PMB=0:

               

              If PCD=0, cache-line fill is enabled. On read-misses, the cache will request the corresponding cacheline from memory. On L1 cache miss, the read is visible on the external bus; it will go to L2, trigger an L2 cacheline fill if L2 is enabled, or go to the MIU otherwise. When the read completes, the cacheline is present in both L1 and L2.

               

              Write misses never trigger a cacheline fill, even with PCD=0. If a write miss occurs, the access is forwarded to the next level.

               

              If PCD=1, cache-line fill is disabled. That is, read-misses will not lead the cache to request the cacheline from memory; the operation will go to external logic. Read-hits will be satisfied from cache, as will write-hits (notwithstanding processing for PWT=1). On SCC, I think this works for both the L1 and L2 cache. If the line is absent from L1, but included in L2, L2 will satisfy the read without a mesh packet being sent. If the line is absent from both L1 and L2, the read will be satisfied from memory and neither cache updated.

               

              If PWT=1, the situation is more complicated. First, if a cache miss occurs, there is no difference between this case and PWT=0: the write goes to the next level. Now suppose a write-hit occurs in L1 with PWT=1: L1 will be updated, but the write will still be visible on the bus. If the cacheline is contained in L2, it is updated in L2 without going to the MIU (L2 ignores PWT=1). If the cacheline is not contained in L2, the write goes to the MIU unchanged.
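              The PMB=0 rules above can be condensed into a small model. This is only a sketch of the behavior as I described it, not a verified description of the hardware: caches are plain sets of line numbers, the names `cached_read` and `cached_write` are made up for illustration, and a write hit with PWT=0 is assumed to stay in L1 (ordinary write-back).

```python
def cached_read(line, l1, l2, pcd):
    """Return the unit that satisfies a read; fills happen only if PCD=0."""
    if line in l1:
        return "L1"
    if line in l2:
        if not pcd:
            l1.add(line)      # L1 line fill from L2
        return "L2"
    if not pcd:
        l1.add(line)          # line ends up in both caches
        l2.add(line)
    return "MIU"              # with PCD=1 neither cache is updated


def cached_write(line, l1, l2, pwt):
    """Return the units that observe a write (no write-allocate anywhere)."""
    seen = []
    if line in l1:
        seen.append("L1")
        if not pwt:
            return seen       # ASSUMPTION: write-back hit stays in L1
    # Write miss, or write-through hit: the write is visible on the bus.
    if line in l2:
        seen.append("L2")     # L2 ignores PWT and absorbs the write
    else:
        seen.append("MIU")    # forwarded unchanged, no line fill
    return seen
```

              For example, after one cached read of a line, a PWT=1 write to it reports ["L1", "L2"], while a write to a line that is in neither cache reports ["MIU"] and leaves both caches untouched.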

               

              Now, for PMB=1, the situation changes as follows:

               

              If an L1 cacheline fill occurs for PMB=1 (i.e., a read-miss in L1 with PCD=0), the newly-filled cacheline is tagged MPBT. This allows the special CL1INVMB instruction to recognize it later.

               

              If a memory access occurs for PMB=1 and is visible on the bus, the core signals PCD=1 to external logic (irrespective of the state of that bit in the page table). Such mappings therefore never trigger an L2 cacheline fill; however, they can still be satisfied from L2 if the cacheline is present! It is therefore extremely important that you never map the same physical address via different page table entries with different caching attributes, although I think that holds on almost all IA processors. By the way: the Windows kernel would bugcheck (BSOD) if you tried this, but Linux is more lenient and would simply do as you tell it to, including all resulting malfunctions.

               

              Finally, if a memory access with PMB=1 is visible behind the L2 cache, it goes to the WCB. I know the WCB has support for partial writes (especially non-contiguous writes to the same cacheline), but I do not know what happens if a read appears for a cacheline that is partially contained in the WCB.
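              The aliasing hazard for PMB=1 can be illustrated with a few lines. This sketch assumes my model above, in which L2 only sees the PCD signal on the bus and knows nothing about PMB; that assumption is questioned later in this thread. `bus_pcd` and `l2_read` are hypothetical names.

```python
def bus_pcd(pte_pcd: bool, pte_pmb: bool) -> bool:
    """PCD as signaled on the bus: forced to 1 whenever PMB=1."""
    return pte_pcd or pte_pmb


def l2_read(line, l2, pcd_on_bus):
    """What L2 does with a read that missed in L1 (L2 modeled as a set)."""
    if line in l2:
        return "L2 hit"            # hit is served regardless of PCD
    if not pcd_on_bus:
        l2.add(line)
        return "L2 fill"
    return "bypass to MIU"         # PCD=1: no line fill


l2 = set()
first = l2_read(5, l2, bus_pcd(False, True))    # MPBT mapping, line absent
fill = l2_read(5, l2, bus_pcd(False, False))    # same line via a normal mapping
alias = l2_read(5, l2, bus_pcd(False, True))    # MPBT mapping again
```

              In this model `first` is "bypass to MIU", but once the normal mapping has filled the line, `alias` becomes "L2 hit": the MPBT access is silently served from L2, which is exactly the situation that consistent page tables avoid.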

               

              I wonder how L1 and L2 interact, because on an L1 cache hit the write will update the data in the L1 cache without marking the line as dirty. Is the line updated in the L2 cache as well?

              I think with PWT=1, the write is always visible on the bus and goes to L2 for non-MPBT mappings. For MPBT mappings with PWT=1, both L1 and the WCB are updated.

              • 4. Re: combine L2 cache with Write-Through and/or the WCB?
                rrotta

                Many thanks, Arne. That sounds reasonable, but disappointing.

                • 5. Re: combine L2 cache with Write-Through and/or the WCB?
                  jhshi

                  Quite thorough explanation! I have two concerns, though.

                  1. Suppose PMB=1 and a $L1 read miss occurs for a cache line that happens to reside in $L2. Will the $L1 miss be satisfied from that line in $L2, or does the request go to the MIU and the underlying hierarchy?
                    I have not tested this, but since one purpose of the MPBT tag is to bypass $L2, I would expect the latter. Can somebody clarify this?
                  2. The WCB is for writes only. It is not even consulted on a read miss. See this thread for more details.

                  • 6. Re: combine L2 cache with Write-Through and/or the WCB?
                    mwvantol

                    I agree with Jan Arne's analysis; this is also what I recall from our experiments with the memory system. The only thing I would like to add is where the WCB comes in: I think it aggregates the writes before they go to the L1 cache. Reads to these addresses will return stale values while the updated value is still sitting in the WCB. Only a write to a different cacheline address will flush the WCB, and that other write needs to be tagged MPBT as well, if I remember correctly. It wouldn't flush on other, non-MPBT writes!
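                    The flush behavior I remember can be written down as a toy model. It is a sketch from memory, not a hardware description: it assumes a WCB that holds a single 32-byte line, reads that never consult the WCB, and non-MPBT writes that go straight to memory. The class and method names are made up.

```python
LINE = 32  # bytes per cache line on the P54C


class WriteCombineBuffer:
    """Toy single-line WCB sitting in front of a dict-based memory."""

    def __init__(self, memory):
        self.memory = memory   # byte address -> value
        self.line = None       # line number currently buffered, if any
        self.pending = {}      # buffered bytes: address -> value

    def write(self, addr, value, mpbt=True):
        if not mpbt:
            self.memory[addr] = value   # bypasses the WCB, triggers no flush
            return
        line = addr // LINE
        if self.line is not None and line != self.line:
            self.flush()                # MPBT write to another line flushes
        self.line = line
        self.pending[addr] = value

    def flush(self):
        self.memory.update(self.pending)
        self.pending = {}
        self.line = None

    def read(self, addr):
        return self.memory.get(addr, 0)  # the WCB is not consulted
```

                    Writing to address 0 and reading it back returns the old value until an MPBT write to another line, for example address 32, pushes the buffered data out.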

                     

                    Hmm, I miss playing with the SCC

                    • 7. Re: combine L2 cache with Write-Through and/or the WCB?
                      JanArneSobania

                      1. Suppose PMB=1 and a $L1 read miss occurs for a cache line that happens to reside in $L2. Will the $L1 miss be satisfied from that line in $L2, or does the request go to the MIU and the underlying hierarchy?
                        I have not tested this, but since one purpose of the MPBT tag is to bypass $L2, I would expect the latter. Can somebody clarify this?

                      I do not know for sure. I would think that the L2$ controller is not connected to the PMB line at all, so it would not notice that the access is special, and just send the cached data back. As I understand it, for MPBT-tagged accesses to "bypass" L2$, it is sufficient that the core simply sets PCD=1; there is no immediate need for the L2$ to even look at PMB, because with properly set page tables, this situation can never happen.

                       

                      The WCB is for writes only. It is not even consulted on a read miss. See this thread for more details.

                      Thank you for that link, it really helps to clear up the situation. It seems I was misled by the WCB implementation in standard x86 processors.

                       

                      I agree with Jan Arne's analysis; this is also what I recall from our experiments with the memory system. The only thing I would like to add is where the WCB comes in: I think it aggregates the writes before they go to the L1 cache. Reads to these addresses will return stale values while the updated value is still sitting in the WCB. Only a write to a different cacheline address will flush the WCB, and that other write needs to be tagged MPBT as well, if I remember correctly. It wouldn't flush on other, non-MPBT writes!

                       

                      Hmm, I miss playing with the SCC

                      Hmm, are you sure that having the WCB before the L1$ would be a good idea? That way, when flushing the WCB, the changed cacheline may end up just being written to L1, with no means to deterministically make it to DRAM without a WBINVD.

                       

                      In my opinion, the WCB was introduced to work around the no-allocate-on-write strategy of the P54C's L1$. If the cacheline happens to be in L1, all writes will go to L1, as no additional WCB is needed. However, if it is missing in L1, each single byte written would be sent over the mesh without a WCB. Besides, I think I remember speaking to Werner Haas at the Ettlingen MARC Symposium, and he mentioned that the WCB was even behind the L2$.
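                      To put a number on that motivation, here is a back-of-the-envelope sketch. It assumes every write misses in L1, that the WCB holds a single 32-byte line, and that it emits one mesh packet per flush (including a final flush); `mesh_packets` is a made-up helper, not a measurement tool.

```python
def mesh_packets(addresses, use_wcb, line_size=32):
    """Count mesh packets caused by a sequence of byte writes that miss in L1."""
    if not use_wcb:
        return len(addresses)       # every single write travels over the mesh
    packets = 0
    current = None                  # line currently held in the WCB
    for addr in addresses:
        line = addr // line_size
        if line != current:
            if current is not None:
                packets += 1        # changing lines flushes the old one
            current = line
    if current is not None:
        packets += 1                # ASSUMPTION: the last line is flushed too
    return packets
```

                      Writing 64 consecutive bytes costs 64 packets without the WCB under these assumptions, and 2 with it.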

                      • 8. Re: combine L2 cache with Write-Through and/or the WCB?
                        jhshi

                        I found a detailed description of the SCC's cache write workflow in the manual of Intel's SMC (p. 20). One thing I noticed is that even on a $L1 write hit, if PWT and PMB are set, the write will still go into the WCB. So in this sense, the WCB is "beneath" $L1 in the hierarchy.

                        • 9. Re: combine L2 cache with Write-Through and/or the WCB?
                          JanArneSobania

                          I found a detailed description of the SCC's cache write workflow in the manual of Intel's SMC (p. 20). One thing I noticed is that even on a $L1 write hit, if PWT and PMB are set, the write will still go into the WCB. So in this sense, the WCB is "beneath" $L1 in the hierarchy.

                          Okay, thanks, that is good to know. So the L2$ really ignores all bus cycles with PMB=1. I think Ted posted a similar picture some time ago, but I was under the impression that parts of the behavior were still unclear back then. Glad that's no longer the case and we finally know what happens.

                           

                          The behavior for PWT=1 is standard for the P54C, though (and, in fact, for x86). In case of a write hit with PWT=1, the cacheline is updated and the write is still visible to the next-level unit. That's why it's called "write-through": the write goes through the cache to other units, independent of whether the cacheline is already in the cache (on the P54C, it is visible on the external bus interface, which connects to the L2$ controller on the SCC).

                           

                          However, I would still recommend not relying on this particular interaction between the PMB signal and L2. Mapping the same physical page with different caching attributes can lead to odd cache states, and debugging such issues will cost far more time than avoiding them from the start.