2 Replies Latest reply on Dec 25, 2014 5:00 AM by slystopad

    GUEST with bonding and VLAN support on CentOS 6.5 with Intel SR-IOV

    PieterLauwers

      Overview

      In our setup we have a physical server hosting several virtual machine guests.

      All guests are assigned two interfaces which are bonded together for redundancy support.

      GUESTS communicate over the native VLAN (bond0) and one or more extra vlan interfaces (bond0.x).

      The setup is built with:

      • HP Proliant DL380P Gen8 servers (16 core, 48 GB RAM)
      • HP 561FLR-T NIC containing Intel X540-AT2 chipset
      • Linux CentOS 6.5 (64 bit HOST and a mixture of 32 and 64 bit GUESTS)
      • Some VMs have SR-IOV enabled, others are bridged (Open vSwitch). Note that bridged GUESTS have only their eth0 connected; eth1 is never used because redundancy is provided on the HOST.

      It took some effort to get this setup up and running, especially the bonding part. I would like to share our experience for the benefit of the community. I'm not a specialist in this domain and have no experience with other setups (e.g. other hardware or OS).


      The diagram

      (Diagram: SR-IOV-bonding-VLAN.png)

      Configuration

      I will skip the process of enabling SR-IOV in the BIOS; it will most likely be different on your platform anyway. See the HP document referenced below for details on HP ProLiant servers.


      Kernel

      We use kernel-2.6.32-431.20.3.el6.x86_64.

      The kernel parameters intel_iommu=on pci=realloc intremap=no_x2apic_optout are required to enable SR-IOV support.
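On CentOS 6 these parameters go on the kernel line in the legacy GRUB configuration. A sketch of what that line looks like (the root= value and the exact vmlinuz path are placeholders for your system):

```shell
# /boot/grub/grub.conf (legacy GRUB on CentOS 6) -- append the three
# parameters to the existing kernel line, then reboot
kernel /vmlinuz-2.6.32-431.20.3.el6.x86_64 ro root=/dev/mapper/vg-root \
    intel_iommu=on pci=realloc intremap=no_x2apic_optout
```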


      Driver

      We upgraded to the latest driver officially supported by HP. Without this driver update communication between the VM and the HOST was only possible on the native network and not on a VLAN tagged network.

      kmod-hp-ixgbe-3.19.0.46-4.rhel6u5.x86_64

      kmod-hp-ixgbevf-2.12.0.38-4.rhel6u5.x86_64

       

      The kernel module parameters used are:

      options ixgbe max_vfs=63,63
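The options line lives in a modprobe configuration file. A sketch, assuming the file name /etc/modprobe.d/ixgbe.conf (any .conf name under /etc/modprobe.d/ works):

```shell
# /etc/modprobe.d/ixgbe.conf -- request 63 VFs on each of the two X540 ports
options ixgbe max_vfs=63,63

# After reloading the ixgbe driver, the VFs appear as PCI devices:
#   modprobe -r ixgbe && modprobe ixgbe
#   lspci | grep "Virtual Function"
```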


      Virsh qemu-kvm

      Virsh has support for SR-IOV and can assign a virtual function to the VM. On the HOST two networks are defined, one for eth0 and one for eth1. They use mode=hostdev, which enables SR-IOV support. Below is the definition for eth0; the one for eth1 is similar.


      cat /etc/libvirt/qemu/networks/autostart/passthrough_eth0.xml

      <network>

        <name>passthrough_eth0</name>

        <uuid>4bbbf5e2-7b80-7cf9-c667-50bb711f2e4c</uuid>

        <forward mode='hostdev' managed='yes'>

          <pf dev='eth0'/>

        </forward>

      </network>

       

      Assign two network interfaces to the GUEST, one from passthrough_eth0 and one from passthrough_eth1.

      <domain type="kvm">

        ...

        <devices>

           ...

          <interface type="network">

            <mac address="52:54:00:bb:f7:8f"/>

            <source network="passthrough_eth0"/>

          </interface>

          <interface type="network">

            <mac address="52:54:00:47:ce:4f"/>

            <source network="passthrough_eth1"/>

          </interface>

        </devices>

      </domain>


      Bonding on the HOST

      This was the trickiest part to get right.

      Starting with the HOST: we rely on the link state of the interface to trigger the failover. eth0 is the preferred interface, to make sure eth0 is active whenever possible. The updelay of 30 seconds is the time the switch port needs to enter the spanning-tree forwarding state, so we wait 30 seconds before making eth0 the active interface again when it becomes available. The resulting bonding configuration, stored in /etc/modprobe.d/bonding.conf, is:

       

      alias bond0 bonding

      options bonding mode=1 miimon=100 primary=eth0 updelay=30000
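For completeness, a sketch of the matching CentOS 6 interface configuration on the HOST (device names follow the text above; the VLAN id 123 and all addresses are placeholders):

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 -- native VLAN
DEVICE=bond0
ONBOOT=yes
BOOTPROTO=static
IPADDR=192.168.0.10
NETMASK=255.255.255.0

# /etc/sysconfig/network-scripts/ifcfg-eth0 -- slave (eth1 is analogous)
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0.123 -- tagged VLAN interface
DEVICE=bond0.123
ONBOOT=yes
VLAN=yes
BOOTPROTO=static
IPADDR=192.168.123.10
NETMASK=255.255.255.0
```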


      Bonding on the GUEST

      Take into account:

      • The GUEST can rely on the link state of the physical network card: Virtual Functions have the same link state as their physical counterpart. So we can use the same bonding configuration on the GUEST.
      • The HOST and GUEST must use the same active network card. A VM that uses eth1 can't communicate with the HOST when the HOST uses eth0, even when both interfaces are up and running. It is therefore crucial that HOST and GUEST fail over to the same interface at the same time.
      • When eth1 is enslaved to the bond (during ifup eth1), the bonding driver changes the MAC address of eth1 to the MAC of eth0, so once bonded, both NICs have the same MAC address. Unfortunately this MAC change is not applied at the level of the HOST: the HOST notices the attempt to change the MAC, but refuses to actually change it. The result is that the GUEST and HOST have a different MAC for the same interface, and packets sent by the GUEST over eth1 are dropped on the network card by the MAC spoofing feature.
        There are two possible solutions for this issue (or maybe there are more that I don't know about):
        • Configure the bonding option fail_over_mac=active. With this option the bond interface uses the MAC address of the active interface; during a failover the MAC address changes to the MAC of the newly active interface. All hosts on the subnet then need to update their ARP tables, so a (configurable) number of gratuitous ARPs is sent over the bond interface to force these updates.
          One problem with this solution is that VLAN interfaces on top of the bonding interface do not change their MAC address. So bond0.123 will still use the MAC of eth0 even when eth1 is the active interface, and as a result only communication over the native VLAN works.
        • Manually change the MAC address on the HOST. The command ip link set eth1 vf 3 mac aa:bb:cc:dd:ee:ff (with the actual MAC of eth0 on the GUEST) makes the bond work, even for VLAN interfaces.
        • It must be said that a test using kernel 3.10.48-1.el6.elrepo.x86_64 with ixgbe 3.13.10-k and ixgbevf 2.7.12-k showed that changing the MAC in the GUEST automatically changed the MAC in the HOST. But with that setup we had other issues: virsh enumerated the VFs wrongly and assigned the MAC address for the GUEST to the wrong Virtual Function, and we also saw a GUEST sending ARP requests from a MAC other than its own. We left this path without finding the real cause or a real solution.

      In the end we use the same bonding config on the GUEST:

       

      alias bond0 bonding

      options bonding mode=1 miimon=100 primary=eth0 updelay=30000


      After virsh has started the GUEST, we run a script on the HOST that updates the MAC address of the GUEST's eth1.
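A minimal sketch of such a script, under the assumption that the MAC of the guest's first interface (the one on passthrough_eth0) is the one the bond will use, and that VF 3 on the host's eth1 backs the guest's eth1 — the guest name and VF index are placeholders to adjust:

```shell
# Extract the MAC of the interface attached to network passthrough_eth0
# from `virsh domiflist <domain>` output (the MAC is the fifth column).
guest_eth0_mac() {
  awk '/passthrough_eth0/ {print $5}'
}

# On the HOST, after virsh has started the guest (guest name "myguest"
# and VF index 3 are placeholders for this sketch):
#   MAC=$(virsh domiflist myguest | guest_eth0_mac)
#   ip link set eth1 vf 3 mac "$MAC"
```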

       

      Result

      After all this configuration, what is now the end result?

      Communication between two VMs works over both the native and the tagged VLAN interfaces. Unplugging a network cable causes a bonding failover and all communication resumes. Restoring the network cable forces all bonds back to eth0 after 30 seconds.

      VM to HOST communication also works as expected (native, tagged VLAN, bonding failover).

      But keep in mind that all machines (HOST and all GUESTS) must use the same active interface.

       

      What still doesn't work is communication between a GUEST using SR-IOV and a GUEST connected to the bridge on the HOST. Each GUEST can communicate with the HOST or the outside world (native and tagged VLAN), but they can't communicate with each other. Ethernet broadcast packets (e.g. ARP requests) arrive in the bridged GUEST, but unicast Ethernet frames don't. Packets from the bridged GUEST to the SR-IOV GUEST appear to work as expected.

      Maybe someone has a solution to this problem...

       

      References

        • 1. Re: GUEST with bonding and VLAN support on CentOS 6.5 with Intel SR-IOV
          Patrick_Kutch

          I took this to our SR-IOV Guru, and he came back with this for you:

           

          The problem description leads me to believe the customer is unable to communicate between a VM on a bridged network and a VM with SR-IOV. I believe the packets from the SR-IOV VM are dropped by the bridged interface. A workaround for this case was added to Linux kernel 3.4.x. Below is an excerpt of the actual scenario.

          In SR-IOV enabled adapters, the Physical Function (PF) does not work in bridge mode. When a bridge is created on the PF device, an emulated device in the Virtual Machine (VM) connected to this bridge cannot receive any unicast packets.

          To avoid this, for each emulated device (virtio device in the VM) added to the bridge, the MAC address of the emulated device needs to be added manually to the forwarding database (FDB) filter table, using the bridge tool from the iproute2 package. This can be done by executing the following command:

            # bridge fdb add <MAC ADDR> dev <PF device interface>

          To display whether the MAC address has been added to the FDB table:
            # bridge fdb show dev <PF device interface>

          When the emulated device is no longer in use, or the guest to which it is assigned is moved to a different host, the FDB entry must be deleted using this command:

            # bridge fdb del <MAC ADDR> dev <PF device interface>

          Please instruct the customer to update the Linux kernel to version 3.4.x or newer and to use the commands above. I'd also recommend disabling VF anti-spoofing if the Linux channel bonding driver is being used. Below is the command to disable it:

            # ip link set <nic interface name> vf <num> spoofchk off

          Example: ip link set p5p2 vf 1 spoofchk off disables the spoofchk feature on VF #1 of Ethernet interface p5p2.
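As a sketch, disabling anti-spoofing on every VF of a PF could be scripted like this (the PF name and VF count are placeholders):

```shell
# Emit the `ip link` commands that disable VF anti-spoofing for
# VFs 0..N-1 of a given PF; pipe the output to sh to apply them.
spoofchk_off_cmds() {
  pf=$1
  nvfs=$2
  vf=0
  while [ "$vf" -lt "$nvfs" ]; do
    echo "ip link set $pf vf $vf spoofchk off"
    vf=$((vf + 1))
  done
}

# usage on the HOST (placeholder PF name and VF count):
#   spoofchk_off_cmds p5p2 63 | sh
```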

           

          Hope this helps

           

          - Patrick

          • 2. Re: GUEST with bonding and VLAN support on CentOS 6.5 with Intel SR-IOV
            slystopad

            Hi

             

            I'm also interested in bonding inside a virtual machine with SR-IOV VFs as NICs.

             

             

            • When eth1 is enslaved to the bond (during ifup eth1), the bonding driver will change the MAC address of eth1 to the MAC of eth0. So once bonded, both NICs will have the same MAC address. Unfortunately this change of MAC address is not applied at the level of the HOST. The HOST notices the attempt to change the MAC, but refuses to actually change it. The result is that the GUEST and HOST have a different MAC for the same interface. Packets sent by the GUEST over eth1 will be dropped on the network card by the mac spoofing feature.

            I see the same behavior. Is there any solution to propagate the MAC address change to the HOST? We have a requirement to be able to change the MAC address of the NIC inside the VM.

             

            I have spoofchk disabled on the VF. There aren't any log messages about spoofed packets, but traffic doesn't flow through the interface after a MAC change inside the VM. The MAC change works if the MAC is changed synchronously in the VM and on the VF on the HOST.