
Xeon Gold 5115: bad Linpack performance?

idata
Employee

Hello everyone,

We have a question about our new Dell PowerEdge R640 servers with 2x Intel Xeon Gold 5115 CPUs: the Linpack performance is poor compared to our older R630 servers with Xeon E5-2640 v4 CPUs.

We were using l_mklb_p_2018.1.009 for benchmarking: http://registrationcenter-download.intel.com/akdlm/irc_nas/9752/l_mklb_p_2018.1.009.tgz

2x Xeon E5-2640 v4 result: 705 GFLOPS

2x Xeon Gold 5115 result: 480 GFLOPS
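
For reference, this is roughly how we launch the runs (a sketch assuming the default layout of the extracted archive; the exact directory names may differ):

tar xzf l_mklb_p_2018.1.009.tgz
cd <extracted_dir>/linux/mkl/benchmarks/linpack   # shared-memory Linpack directory; the MPI build lives in ../mp_linpack
./runme_xeon64                                    # prints the GFLOPS result for each problem size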

Any ideas? We are very surprised that the older Xeon E5-2640 v4 performs so much better.

Test system

Hypervisor: vSphere 6.5

Guest system: Fedora 27 with Kernel 4.14

CPU information:

R640:

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

Byte Order: Little Endian

CPU(s): 20

On-line CPU(s) list: 0-19

Thread(s) per core: 1

Core(s) per socket: 10

Socket(s): 2

NUMA node(s): 2

Vendor ID: GenuineIntel

CPU family: 6

Model: 85

Model name: Intel(R) Xeon(R) Gold 5115 CPU @ 2.40GHz

Stepping: 4

CPU MHz: 2394.375

BogoMIPS: 4788.75

Hypervisor vendor: VMware

Virtualization type: full

L1d cache: 32K

L1i cache: 32K

L2 cache: 1024K

L3 cache: 14080K

NUMA node0 CPU(s): 0-9

NUMA node1 CPU(s): 10-19

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline rsb_ctxsw fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xsaves arat pku ospke

R630:

Architecture: x86_64

CPU op-mode(s): 32-bit, 64-bit

Byte Order: Little Endian

CPU(s): 20

On-line CPU(s) list: 0-19

Thread(s) per core: 1

Core(s) per socket: 10

Socket(s): 2

NUMA node(s): 2

Vendor ID: GenuineIntel

CPU family: 6

Model: 79

Model name: Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz

Stepping: 1

CPU MHz: 2399.998

BogoMIPS: 4799.99

Hypervisor vendor: VMware

Virtualization type: full

L1d cache: 32K

L1i cache: 32K

L2 cache: 256K

L3 cache: 25600K

NUMA node0 CPU(s): 0-9

NUMA node1 CPU(s): 10-19

Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts nopl xtopology tsc_reliable nonstop_tsc cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch cpuid_fault invpcid_single pti retpoline fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 invpcid rtm rdseed adx smap xsaveopt arat

idata
Employee

Hello Chris9300,

I understand that the unit is showing lower performance than an older product; I apologize for any inconvenience.

Regarding this issue, we cannot confirm whether the benchmark result is accurate, since the test was not done with Intel® software. Performance is also affected by components other than the processor, so in this case I would recommend contacting Dell so that they can test the performance on one of their systems if possible.

Normally the Intel® Xeon® Gold should be faster and perform better than the older model you have, which is why I recommend contacting Dell first so they can check where the performance bottleneck is located.

Regards,

David V
idata
Employee

Hello,

Thanks for your quick reply. But we have multiple Dell R640 servers with 2x Xeon Gold 5115 CPUs, and we see the performance issue on all of them. It seems that Dell support can't help us; they can only, for example, replace a broken processor, and it's unlikely that all the CPUs in multiple servers are broken.

Maybe Intel can help us, because you should be able to reproduce it? We are using the official Intel Linpack benchmark, see: https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite (Intel® Math Kernel Library Benchmarks).

@Intel

Are you able to provide us with benchmark results from the Intel MKL benchmark on dual Xeon Gold 5115 CPUs? Then we can compare our results, and maybe we can find the issue.

Thank you.

idata
Employee

Hello Chris9300,

Thank you for your response.

I understand your position; however, I need more information about the systems you are using. I would like to know what type of tasks the servers are running, and whether both of them are running the same kind of tasks. That way we can get a little more insight into the problem.

Regards,

David V
idata
Employee

Hello Chris9300,

I was checking on your case and would like to know if you need further help. If so, please do not hesitate to reply.

Regards,

Leonardo C.

idata
Employee

Hello,

We have multiple R640 servers with dual Xeon Gold 5115 CPUs, and the performance is the same on every server: really low compared to the Xeon E5-2640 v4.

The servers are not running any other tasks during benchmarking. Each is a clean installation, just a minimal RHEL install with nothing else on it, used only to check the Linpack performance. What should we do?

Thanks.
idata
Employee

Hello Chris9300,

Thank you for your response.

I ran a comparison between the two products in question, and I can see that the frequency and the cache of the E5-2640 v4 are higher than those of the Gold 5115, which up to this point could explain why it performs better. Here is a comparison of both products:

https://ark.intel.com/compare/120484,92984

Regards,

David V
idata
Employee

David,

This doesn't help us. The latest Xeon Gold processors have AVX-512, which should give us much better performance than the older E5-2640 v4.

We get only 470 GFLOPS with 2x Xeon Gold 5115. I know another company with Fujitsu servers and 2x Xeon Gold, and they get over 800 GFLOPS with 2x Xeon Gold 5115 CPUs. We bought these R640 servers because Dell and Intel said AVX-512 is really great, and now the performance is really bad compared to the two-year-old Xeon E5-2640 v4 generation.

David, maybe you can give us Intel's own benchmark results with https://software.intel.com/en-us/articles/intel-mkl-benchmarks-suite (Intel® Math Kernel Library Benchmarks). I think we should get over 800 GFLOPS, but we get only 470 GFLOPS, which is really low for 20 Xeon Gold 5115 AVX-512 cores.

Chris

idata
Employee

Hello Chris9300,

Thank you for your response.

I will escalate this case to see if there is any other route we can take, and I will get back to you with more information as soon as possible.

Regards,

David V
mxiao1
Beginner

Dear intel_corp and @Chris9300,

I have the same Linpack performance problem with Intel Xeon Gold CPUs in a Dell R740 server: I found that all 24 CPU cores run at 1.6 GHz while Linpack is running.

I use CentOS 7.2 with the Intel Parallel Studio 2018 Update 1 benchmark, and I found this with the turbostat Linux command.

My CPUs are 2x Gold 5118, base frequency 2.3 GHz, max turbo 3.2 GHz.

We also have a Dell R630 with Xeon v4 CPUs; with the same Parallel Studio 2018 Update 1 benchmark, the same Linux version, and the same turbostat command, all cores go into turbo mode and performance is good.
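
In case anyone wants to reproduce the check, this is roughly how I watched the core frequencies (a sketch; turbostat comes with the kernel-tools package and usually needs root):

# run in a second shell while Linpack is running;
# the Bzy_MHz column shows the effective frequency of each core
turbostat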

I later set the Dell BIOS to the Performance power profile and disabled C-states and C1E, but it made no difference: all 24 cores still run at 1.6 GHz.

So is there any news or update on this?

Thanks!

mxiao1
Beginner

Dear all,

I found the problem! My 4-node Linpack run (Xeon Gold 5118 CPUs, 96 GB of memory, 10Gb Ethernet) later got an 87.01% Linpack score.

The reason is AVX-512!

With the Xeon Gold 5118, Linpack uses MKL's AVX-512 code path, but while AVX-512 code is running the all-core frequency drops to 1.6 GHz; with AVX2 it stays at 2.3 GHz, and with SSE2 it goes up to 2.7 GHz.

Please see the wikichip page: https://en.wikichip.org/wiki/intel/xeon_gold/5118
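
Some rough numbers to illustrate it (my assumption here, based on the wikichip page, is that the Gold 51xx parts have a single 512-bit FMA unit per core, i.e. 16 double-precision FLOP per cycle, the same per-cycle rate as the two 256-bit FMA units of the Broadwell E5 v4):

peak GFLOPS ≈ sockets × cores/socket × FLOP/cycle × sustained GHz
2x E5-2640 v4 with AVX2: 2 × 10 × 16 × ~2.2 GHz ≈ 705 GFLOPS (Chris' measured 705 GFLOPS implies roughly a 2.2 GHz sustained clock)
2x Gold 5115 with AVX-512: 2 × 10 × 16 × ~1.5 GHz ≈ 480 GFLOPS (consistent with his ~480 GFLOPS if the AVX-512 all-core clock sits around 1.5 GHz)

So on these single-FMA SKUs, AVX-512 adds no FLOP per cycle over AVX2; it only lowers the clock, which is why forcing the AVX2 code path helps.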

So before running runme_intel64_dynamic, I add: export MKL_ENABLE_INSTRUCTIONS=AVX2
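
For anyone who wants to compare the two code paths, a minimal sketch (run from the directory of the benchmark package that contains runme_intel64_dynamic):

./runme_intel64_dynamic                # default dispatch, which picks the AVX-512 code path on Skylake-SP
export MKL_ENABLE_INSTRUCTIONS=AVX2    # restrict MKL to the AVX2 code path
./runme_intel64_dynamic                # re-run and compare the reported GFLOPS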

Details here: https://software.intel.com/en-us/mkl-macos-developer-guide-instruction-set-specific-dispatching-on-intel-architectures (Instruction Set Specific Dispatching on Intel® Architectures)

Thanks!
