Hi, I'm looking for some cache throughput information for the Nehalem CPU.
I know what the cache latency values are. For example, L1$ is 4 CPU clocks, L2$ is ~10 CPU clocks, and L3$ is 35-40 CPU clocks.
But does anyone know more about the L2$ and L3$ throughput?
For instance, L1$ can feed the SSE registers 128 bits every CPU clock cycle.
How about the L2$ feeding the L1$? I know that the L2$ feeds the L1$ 64 bytes (one cache line) at a time. But how many CPU clocks are there between
each successive cache-line transfer? Is it 2 CPU clocks?
Then what about the L3$ feeding the L2$? Roughly how many CPU clocks are there between successive cache-line transfers? (I have no idea here.)
Thanks for any help or pointers!