
3 Replies Last post: 2009/11/07 5:06 by Ytterbium  
milos 3 posts since
2009/11/03
 

2009/11/04 13:19

Why discrepancy in theoretical vs STREAM-measured Nehalem memory bandwidths?

Hello everyone,

 

My apologies for the cross-post. I think this is the relevant forum; I had accidentally asked a similar question in the "open port IT" forums.

 

I have two questions about theoretical vs actual memory bandwidth performance of Nehalem processors.  I would very much appreciate the guidance of someone who knows the architecture.

 

1) Why is the theoretical memory interface bandwidth on Nehalem processors so far from what the STREAM benchmark measures, given that STREAM is designed to exercise idealized streaming access?

 

With a memory interface that has 3 channels of DDR3-1333 (1333 megatransfers per second, i.e. a 667 MHz bus clock with two transfers per clock), I understand that the theoretical memory bandwidth should be:

     BW_theoretical = 1333 megatransfers per second (DDR doubling already included) * 3 channels * 64-bit bus / (8 bits/byte) = 31,992 MB/s ~ 32 GB/s per socket

The above also matches what I see online (http://ark.intel.com/Product.aspx?id=37106) and on some non-Intel sites, so it sounds right.
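For anyone who wants to redo the arithmetic above, here is a minimal Python sketch. The function name and parameters are made up for illustration, and it assumes 1 MB = 10^6 bytes and that every transfer on every channel carries useful data, which real hardware will not achieve.

```python
def peak_bandwidth_mb_s(mega_transfers_per_s, channels, bus_width_bits):
    """Theoretical peak memory bandwidth in MB/s (1 MB = 10**6 bytes).

    mega_transfers_per_s: data rate per channel, e.g. 1333 for DDR3-1333
                          (the DDR doubling is already part of this number).
    channels:             number of memory channels per socket.
    bus_width_bits:       data width of one channel in bits.
    """
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * channels * bytes_per_transfer


# Values taken from the post: 3 channels of DDR3-1333, 64-bit wide each.
peak = peak_bandwidth_mb_s(1333, channels=3, bus_width_bits=64)
print(f"{peak:.0f} MB/s per socket, about {peak / 1000:.0f} GB/s")
```

Note that the CPU clock does not appear anywhere in this formula, which is why the peak figure alone cannot explain any dependence on core frequency.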

 

On the other hand, it is commonly reported that Nehalem-based Xeon 5500 series processors (X5550, E5520, etc.) get about 15-17 GB/s per socket, or up to about 37 GB/s in a dual-socket configuration (e.g. http://www.advancedclustering.com/company-blog/stream-benchmarking.html).
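To put a number on the gap, the sketch below expresses the reported per-socket figures as a fraction of the 32 GB/s theoretical peak. The helper name is hypothetical, and the 15 and 17 GB/s inputs are simply the two ends of the range quoted in this post, not measurements of my own.

```python
def fraction_of_peak(measured_gb_s, peak_gb_s):
    """Return measured bandwidth as a fraction of the theoretical peak."""
    return measured_gb_s / peak_gb_s


PEAK_GB_S = 32.0  # rounded per-socket theoretical peak from the calculation in this post

# Low and high end of the commonly reported per-socket STREAM range.
for measured in (15.0, 17.0):
    share = fraction_of_peak(measured, PEAK_GB_S)
    print(f"{measured:.0f} GB/s is {share:.0%} of the theoretical peak")
```

So the reported STREAM results sit at roughly half of the theoretical per-socket number, which is the discrepancy the question is about.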

 

Why the difference?  What determines the memory bandwidth in an idealized situation where each CPU has its data in local memory and is streaming continuously from or to that memory?

 

2) Does the memory bandwidth depend on the CPU clock frequency (e.g. 2.66 vs 2.93 vs 3.33 GHz), and if so, why, given that the memory bus always runs at a fixed DDR3-1333 rate?  (It appears that it does, but I wasn't able to find a clean test that shows the dependence clearly.)  And if it does, is there a simple way to calculate memory bandwidth as a function of CPU speed, like the calculation above that leaves the CPU out?

 

Thank you for your help!

 

Milos Popovic

 





Ytterbium   92 posts since
2009/10/15
1. 2009/11/06 17:02 in response to: milos
Re: Why discrepancy in theoretical vs STREAM-measured Nehalem memory bandwidths?

1) I think 32 GB/s is a theoretical maximum. In reality there is other traffic on the bus that eats into the bandwidth available to your data.  If you think about running STREAM on Windows, there is a ton of other stuff going on at the same time that interferes with the system delivering the theoretical result.

 

2) The low-end chips have an 800 MHz RAM bus, the mid-range ones 1066 and the high-end ones 1333, so I assume that is what you are seeing in the benchmarks?
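As a rough illustration of how much the memory speed grade alone changes the ceiling, here is a small Python sketch. It assumes 3 channels with a 64-bit bus on every chip and treats the 800, 1066 and 1333 figures as megatransfers per second; the function and variable names are made up for this example.

```python
def peak_gb_s(mega_transfers_per_s, channels=3, bus_width_bits=64):
    """Theoretical peak in GB/s (1 GB = 10**9 bytes) for one socket."""
    bytes_per_transfer = bus_width_bits // 8
    return mega_transfers_per_s * 1e6 * channels * bytes_per_transfer / 1e9


# Speed grades mentioned in this reply, taken at their nominal values.
SPEED_GRADES = (800, 1066, 1333)
peaks = {grade: peak_gb_s(grade) for grade in SPEED_GRADES}

for grade, gb_s in peaks.items():
    print(f"{grade} MT/s -> {gb_s:.1f} GB/s theoretical peak per socket")
```

If two chips being compared support different memory speeds, the difference in theoretical peak could look like a dependence on CPU clock even though it comes from the memory side.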

Ytterbium   92 posts since
2009/10/15
3. 2009/11/07 5:06 in response to: milos
Re: Why discrepancy in theoretical vs STREAM-measured Nehalem memory bandwidths?

I think your best bet is to comment on a blog post by someone at Intel and see if you can get an answer there.
