I'm not sure whether this is the right forum, so please redirect me if it isn't. I'm trying to understand some differences I'm seeing in network throughput. My question isn't about networking or even about the NICs in use; I think the difference is due to the architecture of the system bus etc. I'm using a Xeon system with two physical processors. There are 24 logical processors in the system (6 cores per physical processor, with HT enabled). The motherboard is arranged as 2 NUMA nodes and, as should be the case, each processor has its own node. There is 32GB of memory (non-ECC DDR3). The PCM tool set identifies the processors as "Intel(R) Xeon(R) CPU E5-2667 0 @ 2.90GHz 'Intel(r) microarchitecture codename Sandy Bridge-EP/Jaketown'."
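
For anyone who wants to confirm a layout like this on their own machine, here is a minimal sketch in Python that reads the node-to-CPU mapping from the standard Linux sysfs directory `/sys/devices/system/node`. The function names `parse_cpulist` and `node_cpus` are hypothetical helpers made up for this illustration, and the `0-5,12-17` string in the docstring only shows the file format; it is not output from these systems.

```python
import glob
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist string (format example: '0-5,12-17') into CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty file means the node has no CPUs
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus


def node_cpus(sysfs_root="/sys/devices/system/node"):
    """Return {node id: [logical CPU ids]} as reported by the kernel."""
    layout = {}
    for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*")):
        node_id = int(re.search(r"node(\d+)$", path).group(1))
        with open(os.path.join(path, "cpulist")) as f:
            layout[node_id] = parse_cpulist(f.read())
    return layout


if __name__ == "__main__":
    for node, cpus in sorted(node_cpus().items()):
        print("node", node, "->", cpus)
```

On a two-socket box with HT enabled, I would expect each node to list both hyperthreads of each of its cores, but the exact numbering depends on the BIOS and kernel.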
I have two of these motherboards, which I'm linking directly to validate 40Gbps (40GbE); i.e. there is no switch between them yet. When I run iperf3 without CPU affinity on both client and server, I never see network transfer speeds above 22Gbps. However, when I set CPU affinity (it must be set on both sides) I get ~36Gbps, which is closer to what I would have expected.
I'm testing on Linux (CentOS 6 and 7), and I figured this is most likely due to the memory/bus architecture and the fact that the NICs, on both systems, are connected to NUMA node 0. Using numactl, I've tried varying the policies the program runs with, but I'm not seeing the same results. If I bind memory allocations and CPUs to the same node as the NIC, throughput is typically faster, but not ~36Gbps; it is usually closer to 30 or 32Gbps. If I bind to the node remote from the NIC, performance usually goes down, but not to ~22Gbps; it is closer to 28 or 29Gbps.
I'd like to better understand this disparity. I'm using pcm to collect statistics, but I'm not sure I fully understand how to interpret them. Can the Linux perf toolset be of any use? Any pointers are much appreciated. For example, if perf is a good tool to use, which events should I monitor to help me understand what's going on? (I've experimented with perf but, much to my surprise, the tool says that events such as "uncore_imc_0/cas_count_read/" and "uncore_qpi_0/clockticks/" are not supported. Is that expected?)
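
To make the "bind to the NIC's node" experiment concrete, here is a minimal sketch in Python of one way to look up which NUMA node a NIC reports and pin the current process to that node's CPUs. It relies on the standard sysfs files `/sys/class/net/<iface>/device/numa_node` and `/sys/devices/system/node/node<N>/cpulist` and on `os.sched_setaffinity`. The function names are hypothetical, and this is not necessarily how the affinity was set in the tests described above.

```python
import os


def read_nic_node(iface, net_root="/sys/class/net"):
    """Return the NUMA node the NIC's device reports (-1 if the kernel does not know)."""
    with open(os.path.join(net_root, iface, "device", "numa_node")) as f:
        return int(f.read().strip())


def read_node_cpus(node, node_root="/sys/devices/system/node"):
    """Return the set of logical CPU ids that belong to the given NUMA node."""
    with open(os.path.join(node_root, "node%d" % node, "cpulist")) as f:
        text = f.read().strip()
    cpus = set()
    for part in text.split(","):
        if not part:
            continue
        lo, _, hi = part.partition("-")  # "3" -> ("3", "", ""), "0-5" -> ("0", "-", "5")
        cpus.update(range(int(lo), int(hi or lo) + 1))
    return cpus


def pin_to_nic_node(iface):
    """Restrict the calling process to the CPUs on the NIC's NUMA node."""
    node = read_nic_node(iface)
    if node < 0:
        raise RuntimeError("kernel reports no NUMA node for " + iface)
    cpus = read_node_cpus(node)
    os.sched_setaffinity(0, cpus)  # pid 0 means the calling process
    return node, cpus
```

A possible use would be calling `pin_to_nic_node("eth2")` (the interface name is an assumption) and then `os.execvp("iperf3", ["iperf3", "-s"])`, since CPU affinity is inherited across exec. Note that this sets CPU affinity only; memory placement is left to the kernel's default policy, which normally prefers the node of the CPU doing the allocation, so it is not identical to an explicit numactl memory binding.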