Understanding numbers for differing performance values

AFala1
Beginner
1,804 Views

I'm not sure this is the right forum, so please redirect me if it's not. I'm trying to understand some differences I'm seeing in network throughput. My question isn't about networking or even the NICs in use; I think the difference is due to the architecture of the system bus etc. I'm using a Xeon system with two physical processors. There are 24 logical processors in the system (6 cores per physical processor, with HT enabled). The MoBo is arranged with 2 NUMA nodes and, as should be the case, each processor has its own node. There is 32GB of memory (non-ECC DDR3). The PCM tool set identifies the processors as "Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz 'Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown'."

I have two of these MoBos, which I'm linking directly to validate 40Gbps (40GbE); i.e., there is no switch between them yet. When I run iperf3 without CPU affinity on both client and server, I never see network transfer speeds above 22Gbps. However, when I set CPU affinity (it must be set on both sides), I get ~36Gbps, which is closer to what I expected.

Since I'm testing on Linux (CentOS 6 and 7), I figured this is most likely due to the memory/bus architecture and the fact that the NICs, on both systems, are connected to NUMA node 0. Using numactl, I've tried adjusting the policies for the program while it's running, but I'm not seeing the same results. If I bind memory allocations and CPUs to the same node as the NIC, throughput is typically faster, but not ~36Gbps; it's usually closer to 30 or 32. If I bind to the node remote from the NIC, performance usually goes down, but not to ~22Gbps; it's closer to 28 or 29.

I'd like to better understand this disparity. I'm using pcm to collect statistics, but I'm not sure I fully understand how to interpret them. Can the Linux perf toolset be of any use? Any pointers are much appreciated. For example, if perf is a good tool to use, which events should I monitor to help me understand what's going on? (I've experimented with perf but, much to my surprise, the tool says that events like "uncore_imc_0/cas_count_read/" and "uncore_qpi_0/clockticks/" are not supported. Is that expected?)
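For anyone reproducing the node-local binding described above, here is a minimal Python sketch, not the poster's actual setup. It assumes a Linux system that exposes the usual sysfs files (`/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/nodeN/cpulist`). The function names are made up for illustration.

```python
import os
from pathlib import Path


def parse_cpulist(text):
    """Expand a kernel cpulist string such as '0-5,12-17' into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def nic_numa_node(iface, sysfs="/sys"):
    """Return the NUMA node the NIC's PCI device reports, or None if unknown."""
    path = Path(sysfs) / "class" / "net" / iface / "device" / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    # The kernel reports -1 when it has no node information for the device.
    return node if node >= 0 else None


def node_cpus(node, sysfs="/sys"):
    """Return the set of logical CPUs that belong to the given NUMA node."""
    path = Path(sysfs) / "devices" / "system" / "node" / f"node{node}" / "cpulist"
    return parse_cpulist(path.read_text())


def pin_to_nic_node(iface, sysfs="/sys"):
    """Restrict the calling process to the CPUs local to the NIC's node."""
    node = nic_numa_node(iface, sysfs)
    if node is None:
        return None
    cpus = node_cpus(node, sysfs)
    # Linux only; the mask is inherited by child processes.
    os.sched_setaffinity(0, cpus)
    return cpus
```

Calling `pin_to_nic_node("eth2")` (the interface name is a placeholder) before launching the benchmark as a child process would keep it on the NIC's node. The sketch sets CPU affinity only; it does not set a memory policy the way `numactl --membind` does, so memory placement is left to the kernel's default policy.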

Thanks,

Andy
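On the perf question above, one thing worth ruling out is whether the running kernel registers the uncore PMUs at all. The Python sketch below lists what the kernel exposes under `/sys/bus/event_source/devices`. It assumes that directory layout is present, and the helper names are hypothetical. It only checks for the PMUs and their events; it does not explain every possible cause of a "not supported" message.

```python
from pathlib import Path

DEFAULT_BASE = "/sys/bus/event_source/devices"


def list_pmus(prefix="uncore_", base=DEFAULT_BASE):
    """Return the names of registered PMUs that start with the given prefix."""
    root = Path(base)
    if not root.is_dir():
        return []
    return sorted(p.name for p in root.iterdir() if p.name.startswith(prefix))


def pmu_events(pmu, base=DEFAULT_BASE):
    """Return the named events a PMU exports, or [] if it exports none."""
    events = Path(base) / pmu / "events"
    if not events.is_dir():
        return []
    # Skip companion files such as '<event>.scale' and '<event>.unit'.
    return sorted(p.name for p in events.iterdir() if "." not in p.name)


if __name__ == "__main__":
    pmus = list_pmus()
    if not pmus:
        print("no uncore PMUs registered by this kernel")
    for pmu in pmus:
        print(pmu, pmu_events(pmu))
```

If `uncore_imc_0` or `uncore_qpi_0` is missing from the output, the kernel in use probably does not provide that PMU, which could differ between CentOS 6 and CentOS 7. Uncore counters are per-socket rather than per-task, so they are normally counted system-wide.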

1 Solution
idata
Employee
621 Views

Hello AndrewFalanga,

I'm sorry that it took me some time to come back to this post with a proper answer.

Unfortunately, I was not able to find an answer to this issue. This is a tray processor, so you will need to contact the OEM or your FAE to get the requested information or to resolve the issue.

Best regards,

Ivan.

3 Replies
idata
Employee
621 Views

Hello AndrewFalanga,

Thank you for contacting the Intel community.

Please allow me to investigate this further for you. Once I get an answer, I will post it here.

Best regards,

Ivan.

idata
Employee
621 Views

I'm sorry for the delay in responding. I'm still investigating this for you and waiting for an answer; once I get it, I will post it here.

Best regards,

Ivan.