I am hoping to use X540-T2 10G NICs to tightly connect three Windows 7 desktops (Core i7-3930K CPUs with six cores each) using standard hardware/software found in most offices. This is for a simulation application: each core runs a separate executable that communicates with the other processes via TCP sockets. This works well over our existing gigabit LAN (through a switch), but latency limits the overall speed of execution. Each process sends, does some processing, then receives, and has to block until the data from the other processes arrives.
Our plan is to add an X540-T2 card to each desktop and connect the machines over a "private" LAN.
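As a rough, hypothetical model of the send/process/receive pattern described above: if every step is local compute followed by a blocking exchange, the step time is approximately compute time plus round-trip communication time, so lowering latency helps in proportion to how much of each step is spent waiting. The function names and numbers are illustrative assumptions, not measurements from this setup.

```cpp
#include <cassert>

// Hypothetical model: one simulation step = local compute + blocking exchange.
// All times are in microseconds.
double step_time_us(double compute_us, double round_trip_us) {
    assert(compute_us >= 0.0 && round_trip_us >= 0.0);
    return compute_us + round_trip_us;
}

// Steps per second a single process can sustain under this model.
double steps_per_second(double compute_us, double round_trip_us) {
    return 1e6 / step_time_us(compute_us, round_trip_us);
}

// Fraction of each step spent blocked, waiting for the other processes.
double wait_fraction(double compute_us, double round_trip_us) {
    return round_trip_us / step_time_us(compute_us, round_trip_us);
}
```

For example, with 50 µs of compute and a 150 µs round trip, three quarters of each step is spent waiting; halving the round trip to 75 µs would raise throughput from 5,000 to 8,000 steps per second.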
1) Can X540-T2 cards be used to directly connect 2 computers (using RJ45/Cat6 crossover cable)?
2) If three computers are to be connected, can three crossover cables form a ring between the dual X540 ports on each machine?
3) Any tips to configure hardware or software for minimal end-to-end latency with a Windows 7 OS? We are using standard TCP/IP socket calls from a C++ application.
If the above is possible, it saves the cost of a new 10G switch (though the latency added by a switch is probably minimal)...
All help/advice/direction appreciated!
This sounds interesting. I'll start with answering the easy question first. You can directly connect two X540-T2 adapters using an Ethernet cable. No crossover is required and no switch is required. Just make sure you use a good quality cable and do not run them parallel to a potential interference source like a power cord.
You could directly connect each of the three computers to the other two, making use of the dual ports on each adapter. I suppose you would configure a different static IP address for each port, and none of the machines would know about the IP addresses that are not directly connected to it. I don't have any experience setting up this type of network with Ethernet, so I am not sure what the potential issues might be with this setup.
There are likely some Windows TCP settings you can tweak, such as the TCP window size. The adapters also have low latency interrupt (LLI) settings that you would want to configure.
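One per-socket option that is often relevant for small request/response messages, though not mentioned in this thread, is disabling Nagle's algorithm with TCP_NODELAY so that small writes are sent immediately instead of being coalesced. The sketch below uses POSIX sockets so it compiles and runs anywhere; with Winsock on Windows 7 the same setsockopt(IPPROTO_TCP, TCP_NODELAY) call exists, but the option value is passed as a const char* and WSAStartup must be called first. The helper names are hypothetical.

```cpp
#include <cassert>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

// Create a TCP socket with Nagle's algorithm disabled (TCP_NODELAY = 1).
// Returns the descriptor, or -1 on failure. Hypothetical helper.
int make_low_latency_socket() {
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    int one = 1;
    if (setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

// Read the option back to confirm that it took effect.
bool nodelay_enabled(int fd) {
    assert(fd >= 0);
    int value = 0;
    socklen_t len = sizeof(value);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &value, &len) != 0) return false;
    return value != 0;
}
```

The option does not make the wire or the NIC any faster; it only keeps the sender from holding back small segments, so it would be set on every connected socket that carries these messages.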
By the way, low latency interrupts can be configured on many Intel Ethernet dual port gigabit adapters such as the Intel(R) Ethernet Server Adapter I350-T2 adapter. If you don't need the extra bandwidth, you could potentially use a gigabit adapter.
By default, Intel Ethernet adapters also use adaptive interrupt moderation. You would probably adjust the interrupt moderation rate or turn it off to get the lowest latency. Of course, that will generate more interrupts to the CPU, so I'm not sure of the overall effect on your application.
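The trade-off can be made concrete with a back-of-the-envelope bound: if moderation caps a port at R interrupts per second, a packet arriving just after an interrupt may wait up to about 1/R seconds before the driver sees it. The rates below are hypothetical examples, not the values behind Intel's named moderation settings.

```cpp
#include <cassert>

// Worst-case extra receive delay (microseconds) when interrupts are throttled
// to at most `interrupts_per_second`. A rough upper bound, not a measurement.
double max_moderation_delay_us(double interrupts_per_second) {
    assert(interrupts_per_second > 0.0);
    return 1e6 / interrupts_per_second;
}
```

At 8,000 interrupts per second the bound is 125 µs, far more than the wire time of a small message; turning moderation off removes this delay at the cost of up to one interrupt per packet, which is the CPU-load concern mentioned above.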
Tuning Intel® Ethernet Adapter Throughput Performance is a page you should look at for some tuning ideas. Also look at the links at the end of that page. Some of Microsoft's suggestions for tuning Windows Server 2008 R2 might also apply to Windows 7.
I am looking forward to hearing about how your implementation works out and what suggestions you might have for anyone else doing something similar.
Thanks for the reply Mark - most helpful.
We do not have any I350-T2 cards, but do have Intel 82579V NICs - we got the latest ProSet/drivers but they do not seem to have the LLI option - we have played with lots of settings (moderation, priorities etc...) but they did not seem to help much (we tested using high resolution ping timings as well as our own application).
Do you know of other 1G cards (besides the I350-T2 and I350-T4) that support LLI?
Would the overall latency of an I350-T2 card (using LLI) be similar to the X540-T2 10G card?
Thank you very much for your answers/advice - any other suggestions are appreciated (HPC, MPI library etc...).
I will follow up here with our findings...
For high performance computing the NetEffect™ Server Cluster Adapter Family are our best adapters for low latency. However, running under Windows 7, you won't get the normal low-latency benefits of using an HPC operating system with the corresponding drivers. The adapters listed below will be your best bet for the lowest latency in Windows 7.
Low latency interrupts (LLI) are available in the driver on most of our newer adapters, but the 82579 driver is not on that list. You will be able to use configurations for lower latency with some of the other adapters on this list. Here are some adapters that do have LLI configuration options. This list is not 100% complete, but the list is only missing a few of the oldest adapters that had LLI configuration, so I would pick something from this list. As you can see, we do have some other, albeit older, gigabit adapters that support LLI configuration.
As to the difference in latency between the 10-gigabit adapters versus the gigabit adapters, I do not know the answer. I am asking around to see what answer I might find. If you need to get the very quickest transfers, then the 10-gigabit adapters are going to be quicker even if latency turned out to be similar.
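One way to see why the faster link helps even when latency is similar: the time to deliver a message is roughly a fixed per-message latency plus its serialization time, bytes × 8 / bandwidth. The 20 µs latency and 64 KiB message size used below are hypothetical, and protocol header overhead is ignored.

```cpp
#include <cassert>

// Approximate one-way delivery time in microseconds: fixed latency plus the
// time to clock the bytes onto the wire. Ignores Ethernet/IP/TCP headers.
double transfer_time_us(double latency_us, double bytes, double gbits_per_sec) {
    assert(gbits_per_sec > 0.0);
    // 1 Gbit/s = 1e3 bits per microsecond.
    double serialization_us = bytes * 8.0 / (gbits_per_sec * 1e3);
    return latency_us + serialization_us;
}
```

For a 64 KiB message this gives about 544 µs at 1 Gb/s versus about 72 µs at 10 Gb/s; for a 64-byte message both come to roughly 20 µs, so small messages are dominated by latency rather than bandwidth.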
Latency and throughput speed are affected by other hardware components, the operating system, other processes running on the system, etc. So getting the best NIC might not get you the performance you need. That only means that the NIC will not be your bottleneck.
Message was edited by: Mark H @ Intel to remove the CT Desktop Adapter from the LLI supported list.
I heard back from one of our engineers that deals with performance testing. The i350 and x540 adapters will likely have similar latency figures. Of course, you get the extra transfer speed and bandwidth with the 10 gigabit adapters.
Message was edited by: Mark H @ Intel because I had it wrong before editing.
Thanks again for the fantastic support - truly appreciated.
I bought 5 new Gigabit CT cards (they were in-stock at a local store and not expensive)... I installed the latest ProSet/drivers, but the LLI option is not shown. I could not find documentation on use of LLI with the CT gigabit card (82574L) - anything I am missing?
Unfortunately the LLI configuration is not supported on the Gigabit CT Desktop Adapter. The driver is the same driver that supports some of the server adapters with that option, but for that particular card, the option is not in there. You will have to return those adapters if you need the LLI options. I am very sorry for giving you bad information on that option.
I directly connect 3 Windows 7/64 boxes using X520-T2 adapters. Each box's port 1 is connected to another box's port 2, so it takes just 3 Cat 6 cables. More work is involved in configuring the addresses. Usually there's some other network to the outside world that has a gateway, and only one gateway per system is allowed (a TCP/IP "requirement"). Using the Windows network manager, configure each port but don't specify a gateway for these adapters. You'll write a couple of routes specifying which port talks to each of the other two systems. You may already know that you can't use the same subnet for two different NICs on the same node.
Below, in 10.0.x.y, x is the port number on this node and y is the node number. Thus there are two subnets.
Node 1 port 1-- 10.0.1.1 -> Node 2 port 2 -- 10.0.2.2
Node 2 port 1-- 10.0.1.2 -> Node 3 port 2 -- 10.0.2.3
Node 3 port 1-- 10.0.1.3 -> Node 1 port 2 -- 10.0.2.1
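To check an address plan like this against the one-subnet-per-NIC rule, one can compare network prefixes. The sketch assumes a 255.255.255.0 (/24) mask on every port, which the post does not state. Under that assumption a node's two ports are on different subnets, as required, but the two ends of each cable are also on different subnets, which is why explicit routes are needed.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Parse dotted-quad IPv4 text into a 32-bit value (minimal validation).
uint32_t parse_ipv4(const std::string& text) {
    unsigned a = 0, b = 0, c = 0, d = 0;
    int n = std::sscanf(text.c_str(), "%u.%u.%u.%u", &a, &b, &c, &d);
    assert(n == 4 && a < 256 && b < 256 && c < 256 && d < 256);
    return (uint32_t(a) << 24) | (uint32_t(b) << 16) | (uint32_t(c) << 8) | uint32_t(d);
}

// True if both addresses share the same network under a /prefix_len mask.
bool same_subnet(const std::string& x, const std::string& y, int prefix_len) {
    assert(prefix_len >= 0 && prefix_len <= 32);
    uint32_t mask = prefix_len == 0 ? 0u : 0xFFFFFFFFu << (32 - prefix_len);
    return (parse_ipv4(x) & mask) == (parse_ipv4(y) & mask);
}
```

For example, same_subnet("10.0.1.1", "10.0.2.1", 24) is false for node 1's two ports, and so is same_subnet("10.0.1.1", "10.0.2.2", 24) for the two ends of the node 1/node 2 cable.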
You have to add these IP addresses to \Windows\System32\drivers\etc\hosts, and this file will be different on each node.
Using admin privileges, add to each node the permanent routes from both NICs to the NICs on the other two nodes.
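The per-node hosts entries and routes can be generated from the port-based scheme (10.0.port.node). In this sketch, node n's port 1 is cabled to the next node's port 2 and its port 2 to the previous node's port 1 (nodes numbered 1 to 3), so each peer is reached at exactly one address; this is also why the hosts file differs on every node. The hostnames nodeN are hypothetical. The route commands assume that Windows `route -p add` treats a gateway equal to the local port's own address as an on-link route; confirm the result with `route print` before relying on it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Address of `port` (1 or 2) on `node` (1..3) under the 10.0.<port>.<node> scheme.
std::string port_ip(int node, int port) {
    assert(node >= 1 && node <= 3 && (port == 1 || port == 2));
    return "10.0." + std::to_string(port) + "." + std::to_string(node);
}

int next_node(int node) { return node % 3 + 1; }        // 1->2, 2->3, 3->1
int prev_node(int node) { return (node + 1) % 3 + 1; }  // 1->3, 2->1, 3->2

// hosts-file lines for `node`: each peer at the address of the port cabled to us.
// Hostnames "nodeN" are hypothetical.
std::vector<std::string> hosts_lines(int node) {
    int nxt = next_node(node), prv = prev_node(node);
    return {port_ip(nxt, 2) + " node" + std::to_string(nxt),
            port_ip(prv, 1) + " node" + std::to_string(prv)};
}

// Persistent /32 host routes sending each peer address out of the correct local port.
// ASSUMPTION: a gateway equal to the local port address is treated as on-link.
std::vector<std::string> route_commands(int node) {
    int nxt = next_node(node), prv = prev_node(node);
    return {"route -p add " + port_ip(nxt, 2) + " mask 255.255.255.255 " + port_ip(node, 1),
            "route -p add " + port_ip(prv, 1) + " mask 255.255.255.255 " + port_ip(node, 2)};
}
```

The /32 host routes matter because, with /24 masks, a peer's port-2 address would otherwise match the local port 2 subnet and be sent out of the wrong cable.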
When you have done the above, it's just easier to reboot all the nodes, especially for the sake of the routes.
Once done, you can use conventional Windows network apps on these connections. You might start just by doing a ping of the other two nodes from each node.
My own application interest includes running MPI over the X520-T2 network, and both Intel MPI and MPICH2 work. Within one specific application I have witnessed both NICs running in parallel with inbound traffic exceeding 30 gbps and outbound over 10 gbps. I have increased several of the settings for each of the NICs to provide more send/receive buffers, and I use a jumbo frame size of 9014. I haven't yet been able to run with the largest frame size.
BTW my nodes are dual boot: the other platform is CentOS, and the performance of the same app is comparable using Intel MPI. So far, I'm unable to get OpenMPI to work on this point-to-point network.
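A rough sense of what jumbo frames save: Intel's 9014-byte Jumbo Packet setting corresponds to a 9000-byte IP MTU (the extra 14 bytes are the Ethernet header), versus 1514/1500 for standard frames. Assuming IPv4 and TCP headers without options (40 bytes), the sketch counts the full-size frames needed for a payload; the 1 MB payload is a hypothetical example.

```cpp
#include <cassert>

// Number of TCP segments (one per Ethernet frame) needed to carry `payload_bytes`
// when the adapter's frame size setting is `frame_bytes` (e.g. 1514 or 9014).
// Assumes a 14-byte Ethernet header and 40 bytes of IPv4+TCP headers, no options.
long frames_needed(long payload_bytes, long frame_bytes) {
    const long eth_header = 14, ip_tcp_headers = 40;
    long mss = frame_bytes - eth_header - ip_tcp_headers;
    assert(mss > 0 && payload_bytes >= 0);
    return (payload_bytes + mss - 1) / mss;  // ceiling division
}
```

For 1,000,000 bytes this gives 685 standard frames versus 112 jumbo frames, roughly six times fewer per-packet headers and interrupts for bulk transfers; a message smaller than one standard frame gains nothing.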
This past week I tried a different view of the subnets by assigning a subnet to each of the wires. Because of the OpenMPI problem, I wanted to try something else. The Linux version and Intel MPI work, but OpenMPI's behavior is unchanged. I'll be testing the Windows 7 side of the nodes this week.
For the subnet/wire approach, in 10.0.x.y, x is the wire number and y is the node number. The wires remain plugged into the same NIC ports as before.
Wire 1 Box 1-- 10.0.1.1 -> Box 2-- 10.0.1.2
Wire 2 Box 2-- 10.0.2.2 -> Box 3-- 10.0.2.3
Wire 3 Box 3-- 10.0.3.3 -> Box 1-- 10.0.3.1
The \Windows\System32\drivers\etc\hosts file and the routes have to be updated to support this configuration.
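Under the wire-based scheme each pair of boxes shares exactly one subnet (assuming a 255.255.255.0 mask), so each box can look up which address to use for each peer, for example when writing its hosts file. The sketch numbers wire w as joining box w to the next box, as in the table above; the function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// Wire w (1..3) joins box w and box w % 3 + 1; its subnet is 10.0.w.0/24.
int wire_between(int a, int b) {
    assert(a != b && a >= 1 && a <= 3 && b >= 1 && b <= 3);
    if (b == a % 3 + 1) return a;  // b follows a in the ring
    return b;                      // a follows b in the ring
}

// Address `self` should use to reach `peer`: the peer's address on the shared wire.
std::string peer_address(int self, int peer) {
    int w = wire_between(self, peer);
    return "10.0." + std::to_string(w) + "." + std::to_string(peer);
}
```

Box 1, for instance, reaches box 2 at 10.0.1.2 and box 3 at 10.0.3.3, matching the table.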
In the 3 node configuration, each node has a left and right node, and each node is only 1 "hop" away. While one can extend this ring further, the hop count would grow. If I were to add a 4th node, I'd look into doing a hypercube connection topology.
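The hop-count argument can be checked numerically. In a ring of N nodes the shortest path between nodes a and b is min(|a-b|, N-|a-b|) hops; in a d-dimensional hypercube of 2^d nodes, neighbors differ in one bit of their index, so the hop count is the number of differing bits and each node needs d ports. Node indices start at 0 here, and the function names are illustrative.

```cpp
#include <bitset>
#include <cassert>
#include <cstdlib>

// Shortest hop count between nodes a and b (0-based) in a ring of n nodes.
int ring_hops(int a, int b, int n) {
    assert(n > 0 && a >= 0 && a < n && b >= 0 && b < n);
    int d = std::abs(a - b);
    return d < n - d ? d : n - d;
}

// Hop count in a hypercube: number of bit positions where the indices differ.
int hypercube_hops(unsigned a, unsigned b) {
    return static_cast<int>(std::bitset<32>(a ^ b).count());
}

// Worst-case hop count (diameter): ring of n nodes vs hypercube of dimension d.
int ring_diameter(int n) { return n / 2; }
int hypercube_diameter(int d) { return d; }
```

With four nodes both layouts have a worst case of 2 hops and need 2 ports per node (a 2-dimensional hypercube is itself a 4-node ring); with eight nodes the hypercube needs 3 hops versus 4 for the ring, but each node then needs three ports, more than one dual-port adapter provides.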
Thanks for the reply Art - this is a good, detailed explanation of how to connect 3 machines with dual-port direct-connection cards. We decided to take a somewhat different direction: if we had to buy new, expensive 10G cards to get low latency, we would rather buy InfiniBand cards (1-3 µs latency) and try the same direct-connection trick with them (or use a switch, which is not as expensive for InfiniBand as for 10G).
We also switched to Intel MPI routines for the communication between computers/processors (instead of our custom TCP socket routines). They do direct memory transfers (minimal latency) between cores/processes on the same computer, use TCP over the LAN, and support InfiniBand...
Thanks much for all the help!