1. Re: Benchmarking algorithms on Intel Xeon Gold (DevCloud)
Apr 1, 2018 11:59 PM (in response to akhauri.yash)
This message was posted on behalf of Intel Corporation.
Hi Yash,
Thanks for contacting us. We are looking into it and will get back shortly.
Rishabh
2. Re: Benchmarking algorithms on Intel Xeon Gold (DevCloud)
Apr 5, 2018 2:51 AM (in response to Intel Corporation)
This message was posted on behalf of Intel Corporation.
Hi Yash,
Which server are you running your code on, i.e. Xeon Phi (KNL), Xeon Gold (SKL), or some other box?
Thanks,
Rishabh Kumar Jain
3. Re: Benchmarking algorithms on Intel Xeon Gold (DevCloud)
akhauri.yash Apr 5, 2018 3:28 AM (in response to Intel Corporation)
Hello, I've run the code on both KNL and SKL. I am almost certain that this issue is simply due to the 'warm-up' time and the cache misses that occur the first time the data is loaded.
If you think there might be another source to this issue, do let me know!
Thanks and Regards,
4. Re: Benchmarking algorithms on Intel Xeon Gold (DevCloud)
Apr 16, 2018 5:22 AM (in response to akhauri.yash)
This message was posted on behalf of Intel Corporation.
Hi Yash,
Considering mmatest1.c:
1) I ran the exact code and found that the first two loop iterations have more overhead. After that, the time taken is almost identical across iterations. Hence we should run a warm-up kernel first.
MKL:
MKL - Completed 1 in: 0.2302730 seconds
MKL - Completed 2 in: 0.0001534 seconds
MKL - Completed 3 in: 0.0001267 seconds
MKL - Completed 4 in: 0.0001275 seconds
..................
MKL - Completed 15 in: 0.0001280 seconds
MKL - Completed 16 in: 0.0001347 seconds
CMMA:
CMMA - Completed 1 in: 0.0504993 seconds
CMMA - Completed 2 in: 0.0003169 seconds
CMMA - Completed 3 in: 0.0001666 seconds
CMMA - Completed 4 in: 0.0001687 seconds
................
CMMA - Completed 15 in: 0.0001638 seconds
CMMA - Completed 16 in: 0.0001636 seconds
2) Cache misses should only contribute noticeably when a large matrix size is used.
For further reference, please go through: https://software.intel.com/en-us/ipcc
Thanks,
Rishabh Kumar Jain
5. Re: Benchmarking algorithms on Intel Xeon Gold (DevCloud)
akhauri.yash Apr 16, 2018 10:32 PM (in response to Intel Corporation)
Hello,
Thank you!
However, in most cases the kernels will always have a new matrix to operate on, and I understand that most of the overhead in the first operation is simply due to cache misses. These cache misses will happen every time a new matrix is provided. So do you think the first result should be included?
Is the overhead primarily due to the cache misses or to the warm-up time?
If it is indeed cache misses, how can I work on that? I thought the matrix is always accessed in row-major order, and thus cache misses would be avoided if I accessed it in the same order.