5 Replies Latest reply on Apr 16, 2018 10:32 PM by akhauri.yash

    Benchmarking algorithms on Intel Xeon Gold (DevCloud)

    akhauri.yash

      This post is about benchmarking algorithms on Intel Xeon processors.

      Performance of Classic Matrix Multiplication Algorithm on Intel® Xeon Phi™ Processor System | Intel® Software

      I have been trying to reproduce the benchmarks using the code from the article above, specifically mmatest1.c from the zip file attached to the article. One observation is that there is a considerable warm-up time, which adds a large overhead to the first algorithm being benchmarked (in this case, the cblas_sgemm function).

      A loop count of 16 is often not enough to amortize the thread 'warm-up' time. I am not sure what the correct term for this is.

      Can anyone confirm this? When benchmarking, is it better to give a 'warm-up' kernel to the threads?
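
      To make the question concrete, here is a minimal sketch of what I mean by a 'warm-up' kernel: run the kernel a few times untimed, then time the remaining iterations. This is not the code from mmatest1.c. The names wall_seconds, naive_sgemm and bench_sgemm are made up for illustration, and the naive multiply is only a stand-in for cblas_sgemm.

```c
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds via C11 timespec_get.
   Assumption: good enough for a sketch; a monotonic clock would be
   preferable in a real benchmark. */
double wall_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical stand-in for the kernel under test (e.g. cblas_sgemm):
   naive C = A * B for n x n row-major matrices. */
void naive_sgemm(size_t n, const float *a, const float *b, float *c)
{
#ifdef _OPENMP
#pragma omp parallel for
#endif
    for (long i = 0; i < (long)n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < n; ++k)
                sum += a[(size_t)i * n + k] * b[k * n + j];
            c[(size_t)i * n + j] = sum;
        }
    }
}

/* Run the kernel warmup_runs times without timing, then timed_runs
   times with timing. Returns the mean wall-clock seconds per timed run. */
double bench_sgemm(size_t n, const float *a, const float *b, float *c,
                   int warmup_runs, int timed_runs)
{
    /* Untimed warm-up: any one-time start-up cost is paid here. */
    for (int r = 0; r < warmup_runs; ++r)
        naive_sgemm(n, a, b, c);

    double t0 = wall_seconds();
    for (int r = 0; r < timed_runs; ++r)
        naive_sgemm(n, a, b, c);
    double t1 = wall_seconds();

    return (t1 - t0) / (double)timed_runs;
}
```

      Comparing bench_sgemm(n, a, b, c, 0, 16) against bench_sgemm(n, a, b, c, 4, 16) should show whether the first few iterations are what inflates the average.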

      Where can I read up more on this?

      To review my code, kindly refer to: GitHub - akhauriyash/XNOR-Nets: An OpenMP parallelized implementation of XNOR kernels.