    What is the L2$ and L3$ throughput for Nehalem?


      Hi, I'm looking for some cache throughput information for the Nehalem CPU.


      I know what the cache latency values are: L1$ is 4 CPU clocks, L2$ is ~10 CPU clocks, and L3$ is 35-40 CPU clocks.


      But does anyone know more about the L2$ and L3$ throughput?


      For instance, the L1$ can feed the SSE registers 128 bits every CPU clock cycle.


      How about the L2$ feeding the L1$?  I know that the L2$ feeds the L1$ 64 bytes (one cache line) at a time.  But how many CPU clock cycles are there between successive cache line feeds?  Is it 2 CPU clocks?
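
      To put numbers on the question, here is a rough sketch of the arithmetic I have in mind. It only converts "clocks per transfer" into bytes per clock and GB/s. The 2-clock figure for L2$ is my guess from above, not a known value, and the 2.66 GHz core clock is an assumption picked purely for illustration.

```python
# Back-of-the-envelope conversion from "clocks per transfer" to bandwidth.
CACHE_LINE_BYTES = 64        # line size mentioned in the post
ASSUMED_CLOCK_HZ = 2.66e9    # assumption: example core clock, for illustration only


def bytes_per_clock(bytes_per_transfer, clocks_per_transfer):
    """Sustained bytes delivered per CPU clock."""
    return bytes_per_transfer / clocks_per_transfer


def gb_per_second(bytes_per_clk, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert bytes per clock to GB/s (1 GB = 1e9 bytes)."""
    return bytes_per_clk * clock_hz / 1e9


# L1$ -> SSE registers: 128 bits (16 bytes) every clock, as stated in the post.
l1_rate = bytes_per_clock(128 / 8, 1)

# L2$ -> L1$: one 64-byte line every 2 clocks is only a guess, not a measured value.
l2_guess_rate = bytes_per_clock(CACHE_LINE_BYTES, 2)

for name, rate in [("L1$ -> regs", l1_rate), ("L2$ -> L1$ (guess)", l2_guess_rate)]:
    print(f"{name}: {rate:.0f} bytes/clock, {gb_per_second(rate):.1f} GB/s")
```

      If the real gap between lines turns out to be something other than 2 clocks, only the second argument to bytes_per_clock needs to change.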


      Then what about the L3$ feeding the L2$?  About how many CPU clocks are there between successive cache line feeds?  (I have no idea here.)


      Thanks for any help or pointers!