1.6.2. Two VMs, one core, Virtio monoqueue NICs, VXLAN termination, offloading enabled, with Open vSwitch

Two VMs on different hypervisors exchange traffic on a private subnet. The hypervisors encapsulate this traffic in a VXLAN tunnel. Each hypervisor uses Open vSwitch bridges to connect its physical interface and the VM virtual interface. Each VM runs on one core and has one Virtio interface using a single virtual ring. Virtual Accelerator runs on one physical core per hypervisor, that is two logical cores (10 and 26 in the configuration below).

[Figure 2-vms-1-core.svg: two VMs on two hypervisors, each hypervisor running Virtual Accelerator on one physical core, connected through Open vSwitch bridges and a VXLAN tunnel]

Virtual Accelerator configuration

host1

In this second use case, the fast path configuration will be customized using the interactive wizard:

  • Assign one physical core to the fast path

  • Poll one physical port only

  • Adapt the amount of memory reserved for the VM to our needs

  1. Start the fast path configuration wizard tool

    root@host1:~# fast-path.sh config -i
    
  2. Select which physical ports the fast path should use.

    Enter into the first sub-menu:

    Fast path configuration
    =======================
    
    1 - Select fast path ports and polling cores
    2 - Select a hardware crypto accelerator
    3 - Advanced configuration
    4 - Advanced plugin configuration
    5 - Display configuration
    
    S - Save configuration and exit
    Q - Quit
    
    Enter selection [S]: 1
    

    Select Manually configure FP_PORTS with e. We see that ens5f0 has the PCI bus identifier 0000:85:00.0:

    Fast path port selection
    ========================
    
    Core/port mapping mode is auto
    
    == ============ == ========== ============== ===================================
    #  PCI or Name  Id Interface  Selected cores NIC full name
    == ============ == ========== ============== ===================================
    1  0000:03:00.0 0  ens1f0     auto: 7,23     Intel Corporation 82599ES 10-Gigabi
    2  0000:03:00.1 0  ens1f1     auto: 7,23     Intel Corporation 82599ES 10-Gigabi
    3  0000:06:00.0 0  mgmt0      auto: 7,23     Intel Corporation I350 Gigabit Netw
    4  0000:06:00.1 0  enp6s0f1   auto: 7,23     Intel Corporation I350 Gigabit Netw
    5  0000:83:00.0 0  ens7f0     auto: 15,31    Intel Corporation Ethernet Controll
    6  0000:83:00.1 0  ens7f1     auto: 15,31    Intel Corporation Ethernet Controll
    7  0000:85:00.0 0  ens5f0     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
    8  0000:85:00.1 0  ens5f1     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
    == ============ == ========== ============== ===================================
    
    A - Add a virtual device
    C - Switch to manual core/port mapping mode
    M - Set the fast path core mask. Current value is "auto" (7,15,23,31)
    E - Manually configure FP_PORTS. Current value is "all"
    
    B - Back
    
    Enter selection [B]: e
    

    Enable the ens5f0 port using its PCI bus identifier:

    Set the fast path ports
    =======================
    
    ## FP_PORTS defines the list of ports enabled in the fast path.
    ##
    ## The value can be:
    ## - 'all' or commented out: all supported physical ports are enabled.
    ## - a space-separated list of keys, defining the list of ports. A key
    ##   adds one or several ports to the current list, or removes them if
    ##   it is prefixed by '-'. The valid keys are: 'all', a pci identifier,
    ##   a linux interface name.
    ##
    ##   Example: "" means no port
    ##   Example: "all -mgmt0" means all ports except the one associated to
    ##   the interface called mgmt0 in Linux.
    ##   Example: "0000:03:00.0 0000:03:00.1" means the ports whose pci bus
    ##   addresses match.
    ##
    ##   A PCI bus can be suffixed by driver-specific arguments. For instance:
    ##   "0000:03:00.0,role=right,verbose=1".
    ##
    ##   The list is evaluated from left to right, so that
    ##   "eth0 0000:03:00.0 -all" means no port are enabled.
    ##
    ##   Note: be careful when using Linux interface names in configuration,
    ##   as they are subject to changes, depending on system configuration.
    ##
    ## This parameter is evaluated at fast path start and is converted into
    ## a whitelist or blacklist of PCI devices, that is passed to the fast
    ## path command line. Therefore, it is not possible to enable only some
    ## ports of a PCI device.
    ##
    ## In expert mode (EXPERT_FPCONF=on), this parameter is mandatory and
    ## must contain a list of PCI devices only.
    ##
    
    Current value for FP_PORTS: all
    Use 'none' to remove all ports
    Enter new value [no change] > 0000:85:00.0
    
  3. Our PCI device is located on the second socket. To achieve the best performance, it has to be polled by cores from the same socket. We choose to use the first physical core of the second socket (logical cores 10 and 26).

    Note

    Use fast-path.sh config -m to display the machine topology.

    Select m to modify the fast path core mask:

    Fast path port selection
    ========================
    
    Core/port mapping mode is auto
    
    == ============ == ========== ============== ===================================
    #  PCI or Name  Id Interface  Selected cores NIC full name
    == ============ == ========== ============== ===================================
    1  0000:03:00.0 0  ens1f0                    Intel Corporation 82599ES 10-Gigabi
    2  0000:03:00.1 0  ens1f1                    Intel Corporation 82599ES 10-Gigabi
    3  0000:06:00.0 0  mgmt0                     Intel Corporation I350 Gigabit Netw
    4  0000:06:00.1 0  enp6s0f1                  Intel Corporation I350 Gigabit Netw
    5  0000:83:00.0 0  ens7f0                    Intel Corporation Ethernet Controll
    6  0000:83:00.1 0  ens7f1                    Intel Corporation Ethernet Controll
    7  0000:85:00.0 0  ens5f0     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
    8  0000:85:00.1 0  ens5f1                    Intel Corporation 82599ES 10-Gigabi
    == ============ == ========== ============== ===================================
    
    A - Add a virtual device
    C - Switch to manual core/port mapping mode
    M - Set the fast path core mask. Current value is "auto" (7,15,23,31)
    E - Manually configure FP_PORTS. Current value is "0000:03:00.0"
    
    B - Back
    
    Enter selection [B]: m
    

    Set its value to 10,26. The syntax is detailed below:

    Set the fast path core mask
    ===========================
    
    ## FP_MASK defines which logical cores run the fast path.
    ##
    ## The value can be:
    ## - 'auto' or commented out: on a Turbo Router, all cores are enabled, except
    ##   the first one. On a Virtual Accelerator, one physical core per node is
    ##   enabled.
    ## - a list of logical cores ranges.
    ##   Example: "1-4,6,8" means logical cores 1,2,3,4,6,8.
    ## - an hexadecimal mask starting with '0x': it represents the mask of logical
    ##   cores to be enabled.
    ##   Example: "0x1e" means logical cores 1,2,3,4.
    ##
    ## Note: the core 0 is usually reserved for Linux processes and DPVI, so
    ## it's not advised to use it in FP_MASK.
    ##
    ## In expert mode (EXPERT_FPCONF=on), this parameter is mandatory and must not
    ## be auto.
    ##
    
    Enter value [auto]: 10,26
    

    The core-to-port mapping is updated:

    Fast path port selection
    ========================
    
    Core/port mapping mode is auto
    
    == ============ == ========== ============== ===================================
    #  PCI or Name  Id Interface  Selected cores NIC full name
    == ============ == ========== ============== ===================================
    1  0000:03:00.0 0  ens1f0                    Intel Corporation 82599ES 10-Gigabi
    2  0000:03:00.1 0  ens1f1                    Intel Corporation 82599ES 10-Gigabi
    3  0000:06:00.0 0  mgmt0                     Intel Corporation I350 Gigabit Netw
    4  0000:06:00.1 0  enp6s0f1                  Intel Corporation I350 Gigabit Netw
    5  0000:83:00.0 0  ens7f0                    Intel Corporation Ethernet Controll
    6  0000:83:00.1 0  ens7f1                    Intel Corporation Ethernet Controll
    7  0000:85:00.0 0  ens5f0     auto: 10,26    Intel Corporation 82599ES 10-Gigabi
    8  0000:85:00.1 0  ens5f1                    Intel Corporation 82599ES 10-Gigabi
    == ============ == ========== ============== ===================================
    
    A - Add a virtual device
    C - Switch to manual core/port mapping mode
    M - Set the fast path core mask. Current value is "10,26"
    E - Manually configure FP_PORTS. Current value is "0000:83:00.0"
    
  4. Reduce the amount of memory reserved for the VM. By default, 4GB per socket is reserved, but 1GB on the second socket is enough for our use case.

    Enter Advanced configuration:

    Fast path configuration
    =======================
    
    1 - Select fast path ports and polling cores
    2 - Select a hardware crypto accelerator
    3 - Advanced configuration
    4 - Advanced plugin configuration
    5 - Display configuration
    
    S - Save configuration and exit
    Q - Quit
    
    Enter selection [S]: 3
    

    Ask to modify the amount of memory reserved for VMs:

    Advanced configuration
    ======================
    
    1 - Set the amount of fast path memory (current value is auto)
    2 - Enable/disable fast path memory allocation (current value is on)
    3 - Set the number of mbufs (current value is auto)
    4 - Enable/disable fast path offloads (current value is auto)
    5 - Set the number of memory channels (current value is auto)
    6 - Set the hugepage directory (current value is /mnt/huge)
    7 - Set the amount of VM memory (current value is auto)
    
    B - Back
    
    Enter selection [B]: 7
    

    Set its value to 0,1024. The syntax is detailed below:

    Set amount of memory reserved for Virtual Machines
    ==================================================
    
    ## VM_MEMORY defines how much memory from the hugepages to allocate for
    ## virtual machines.
    ##
    ## When running the fast path as a host managing VMs, the fast path
    ## startup script is able to reserve additional memory stored in huge
    ## pages. This memory can be used by Qemu or libvirt for the virtual
    ## machines.
    ##
    ## The value can be:
    ## - auto or commented out: on a Virtual Accelerator, 4GB per socket will
    ##   be reserved, on other products no VM memory will be reserved.
    ## - an integer: it represents the amount of memory in MB to reserve
    ##   for VMs. This amount will be spread equally on all NUMA nodes.
    ##   Example: "4096" asks to reserve 4GB for the virtual machines, distributed
    ##   on all the NUMA nodes of the machine (2GB per node if the machine has
    ##   2 nodes).
    ## - a list of integers, representing the amount of memory in MB
    ##   to reserve on each NUMA node.
    ##   Example: "2048,2048" asks to reserve 2GB on node0 and 2GB on node1
    ##   in huge pages for the virtual machines.
    ##
    ## In expert mode (EXPERT_FPCONF=on), the parameter is mandatory and its format
    ## must be a list of integer, one per socket.
    ##
    
    Enter value [auto]: 0,1024
    
  5. The configuration can now be saved. It generates the /etc/fast-path.env file (see the sketch of the resulting values after the last step of this procedure):

    Fast path configuration
    =======================
    
    1 - Select fast path ports and polling cores
    2 - Select a hardware crypto accelerator
    3 - Advanced configuration
    4 - Advanced plugin configuration
    5 - Display configuration
    
    S - Save configuration and exit
    Q - Quit
    
    Enter selection [S]: S
    
  6. libvirt does not support the cpuset isolation feature; it has to be disabled in /etc/cpuset.env.

    -#: ${CPUSET_ENABLE:=1}
    +: ${CPUSET_ENABLE:=0}
    
  7. Start Virtual Accelerator.

    root@host1:~# systemctl start virtual-accelerator.target
    
  8. Restart the Open vSwitch control plane.

    root@host1:~# systemctl restart openvswitch
    
  9. The hugepages are allocated by Virtual Accelerator at startup and libvirt cannot detect them dynamically. libvirt must be restarted to take the hugepages into account.

    root@host1:~# systemctl restart libvirtd.service
    

    Warning

    If you restart Virtual Accelerator, you must restart openvswitch and libvirt (and its VMs) as well.

  10. Create a virtio interface to communicate with the VM. The sockpath argument will be used in the libvirt XML file later.

    root@host1:~# fp-vdev add tap0 --sockpath=/tmp/pmd-vhost0
    devargs:
      profile: endpoint
      sockmode: client
      sockname: /tmp/pmd-vhost0
      txhash: l3l4
      verbose: 0
    driver: pmd-vhost
    ifname: tap0
    rx_cores: all
    

    Note

    Make sure that the fast path has been started before you create hotplug ports with fp-vdev commands.

    See also

    The Managing virtual devices section of the 6WINDGate Fast Path documentation, for more information about the fp-vdev command.
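
For reference, the choices made in this procedure end up in /etc/fast-path.env once the configuration is saved. A minimal sketch of the relevant entries is shown below, assuming the same shell-default syntax as /etc/cpuset.env; the file generated by the wizard also contains other settings and the commented help text shown above, and its exact layout may differ between versions:

    : ${FP_PORTS:=0000:85:00.0}   # poll only the ens5f0 port
    : ${FP_MASK:=10,26}           # one physical core of the second socket
    : ${VM_MEMORY:=0,1024}        # VM hugepages: none on node 0, 1GB on node 1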

host2

Follow the same configuration steps on host2 as on host1 (the corresponding commands are summarized in the sketch after this list):

  • Using the configuration wizard:

    • select a physical port and select the fast path cores

    • change the amount of VM memory

    • save the resulting configuration file

  • Disable the cpuset feature in libvirt

  • Start or restart services:

    • restart Open vSwitch

    • start Virtual Accelerator

    • restart libvirt

  • Create the virtio interface (tap0) with the fp-vdev command
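
For convenience, the corresponding commands on host2 are summarized below as a sketch. The wizard session itself is interactive and identical to the one shown for host1 (FP_PORTS, core mask and VM memory), and the cpuset change is the same edit of /etc/cpuset.env:

    root@host2:~# fast-path.sh config -i                       # same wizard choices as host1
    root@host2:~# systemctl start virtual-accelerator.target
    root@host2:~# systemctl restart openvswitch
    root@host2:~# systemctl restart libvirtd.service
    root@host2:~# fp-vdev add tap0 --sockpath=/tmp/pmd-vhost0  # create the virtio interface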

Hosts and VMs configuration

VM creation on both hosts (if needed)

If you don’t have a VM ready, you can use a cloud image. See VM Creation to create one VM on each host with the following libvirt configuration sections:

  • hostname vm1 on host1 and vm2 on host2

    <name>vm1</name>
    

    and

    <name>vm2</name>
    
  • one vhost-user interface:

    <interface type='vhostuser'>
       <source type='unix' path='/tmp/pmd-vhost0' mode='server'/>
       <model type='virtio'/>
    </interface>
    
  • 1048576 KiB (1GB) of memory (the libvirt default memory unit is KiB):

    <memory>1048576</memory>
    

    and

    <numa>
        <cell id="0" cpus="0" memory="1048576" memAccess="shared"/>
    </numa>
    
  • Set up the VM memory to be taken from host socket 1:

    <numatune>
    <!-- adapt to set the host node where hugepages are taken -->
      <memory mode='strict' nodeset='1'/>
    </numatune>
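
Putting these fragments together, the relevant part of a libvirt domain definition looks roughly like the sketch below. This is only an illustration: the disk, console and other devices from the VM Creation template are omitted, the vcpu count follows the single-core VM of this use case, and the <memoryBacking> element is an assumption reflecting that the guest RAM must be taken from the hugepages reserved by Virtual Accelerator:

    <domain type='kvm'>
      <name>vm1</name>                              <!-- vm2 on host2 -->
      <memory>1048576</memory>                      <!-- 1GB, expressed in KiB -->
      <memoryBacking>
        <hugepages/>                                <!-- assumption: back guest RAM with hugepages -->
      </memoryBacking>
      <vcpu>1</vcpu>
      <cpu>
        <numa>
          <cell id='0' cpus='0' memory='1048576' memAccess='shared'/>
        </numa>
      </cpu>
      <numatune>
        <!-- adapt to set the host node where hugepages are taken -->
        <memory mode='strict' nodeset='1'/>
      </numatune>
      <devices>
        <interface type='vhostuser'>
          <source type='unix' path='/tmp/pmd-vhost0' mode='server'/>
          <model type='virtio'/>
        </interface>
        <!-- disk, serial console, etc. omitted -->
      </devices>
    </domain>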
    

Configuration of host1 and vm1

Now that we have access to the VM, we can set up Linux with the configuration needed for iperf.

  1. Inside the VM, set the interface up and add an address on the 11.0.0.0/8 subnet.

    root@vm1:~# ip link set eth1 up
    root@vm1:~# ip addr add 11.5.0.105/8 dev eth1
    
  2. On host1, check the names of the fast path interfaces in the fp-shmem-ports -d output (the interface name appears at the beginning of each port line).

    root@host1:~# fp-shmem-ports -d
    core freq : 2693510304
    offload : enabled
    vxlan ports :
    port 4789 (set by user)
    port 8472 (set by user)
    port 0: ens5f0 mac 90:e2:ba:29:df:58 driver rte_ixgbe_pmd GRO timeout 10us
    RX queues: 2 (max: 128)
    TX queues: 2 (max: 64)
    RX vlan strip off
    RX IPv4 checksum on
    RX TCP checksum on
    RX UDP checksum on
    GRO on
    LRO off
    TX vlan insert on
    TX IPv4 checksum on
    TX TCP checksum on
    TX UDP checksum on
    TX SCTP checksum on
    TSO on
    port 1: tap0 mac 02:09:c0:ea:ef:37 driver pmd-vhost
       (args sockmode=client,sockname=/tmp/pmd-vhost0) GRO timeout 10us
    RX queues: 1 (max: 1)
    TX queues: 2 (max: 64)
    RX TCP checksum on
    RX UDP checksum on
    GRO on
    LRO on
    TX TCP checksum on
    TX UDP checksum on
    TSO on
    

    Note

    tap0 is the vhost port bound to the interface of vm1 and ens5f0 is the host external interface.

  3. On host1, set the interfaces up, add an IP address to the host interface, and increase its MTU.

    root@host1:~# ip link set ens5f0 up
    root@host1:~# ip link set tap0 up
    root@host1:~# ip addr add 10.200.0.5/24 dev ens5f0
    root@host1:~# ip link set mtu 1600 dev ens5f0
    

    Note

    The MTU must be increased so that VXLAN-encapsulated frames fit without additional fragmentation: the encapsulation adds roughly 50 bytes (14 bytes outer Ethernet + 20 bytes IPv4 + 8 bytes UDP + 8 bytes VXLAN).

  4. On host1, create two chained OVS bridges. The first one (br-int) connects the vhost interface; the second one (br-tun) encapsulates the traffic in a VXLAN tunnel whose local endpoint is the host interface.

    root@host1:~# ovs-vsctl add-br br-int
    root@host1:~# ovs-vsctl add-port br-int patch-tun -- set Interface \
      patch-tun type=patch options:peer=patch-int
    root@host1:~# ovs-vsctl add-br br-tun
    root@host1:~# ovs-vsctl add-port br-tun patch-int -- set Interface \
      patch-int type=patch options:peer=patch-tun
    root@host1:~# ip link set up dev tap0
    root@host1:~# ovs-vsctl add-port br-int tap0
    root@host1:~# ovs-vsctl add-port br-tun vxlan10 -- set Interface vxlan10 \
      type=vxlan options:remote_ip=10.200.0.6 options:local_ip=10.200.0.5 \
      options:out_key=flow
    root@host1:~# ip link set up dev br-tun
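
Optionally, the resulting layout can be checked with standard Open vSwitch commands; br-int should list tap0 and patch-tun, and br-tun should list patch-int and vxlan10 (output omitted here):

    root@host1:~# ovs-vsctl show
    root@host1:~# ovs-vsctl list-ports br-int
    root@host1:~# ovs-vsctl list-ports br-tun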
    

Configuration of host2 and vm2

Now that we have access to the VM, we can set up Linux with the configuration needed for iperf.

  1. Inside the VM, set the interface up and add an address on the 11.0.0.0/8 subnet.

    root@vm2:~# ip link set eth1 up
    root@vm2:~# ip addr add 11.6.0.106/8 dev eth1
    
  2. On host2, check the names of the fast path interfaces in the fp-shmem-ports -d output (the interface name appears at the beginning of each port line).

    root@host2:~# fp-shmem-ports -d
    core freq : 2693520583
    offload : enabled
    vxlan ports :
    port 4789 (set by user)
    port 8472 (set by user)
    port 0: ens5f0 mac 00:1b:21:74:5b:58 driver rte_ixgbe_pmd GRO timeout 10us
    RX queues: 2 (max: 128)
    TX queues: 2 (max: 64)
    RX vlan strip off
    RX IPv4 checksum on
    RX TCP checksum on
    RX UDP checksum on
    GRO on
    LRO off
    TX vlan insert on
    TX IPv4 checksum on
    TX TCP checksum on
    TX UDP checksum on
    TX SCTP checksum on
    TSO on
    port 1: tap0 mac 02:09:c0:88:f7:5d driver pmd-vhost
       (args sockmode=client,sockname=/tmp/pmd-vhost0) GRO timeout 10us
    RX queues: 1 (max: 1)
    TX queues: 2 (max: 64)
    RX TCP checksum on
    RX UDP checksum on
    GRO on
    LRO on
    TX TCP checksum on
    TX UDP checksum on
    TSO on
    

    Note

    tap0 is the vhost port bound to the interface of vm2 and ens5f0 is the host external interface.

  3. On host2, set the interfaces up, add an IP address to the host interface, and increase its MTU.

    root@host2:~# ip link set ens5f0 up
    root@host2:~# ip link set tap0 up
    root@host2:~# ip addr add 10.200.0.6/24 dev ens5f0
    root@host2:~# ip link set mtu 1600 dev ens5f0
    

    Note

    The MTU must be increased so that VXLAN-encapsulated frames fit without additional fragmentation: the encapsulation adds roughly 50 bytes (14 bytes outer Ethernet + 20 bytes IPv4 + 8 bytes UDP + 8 bytes VXLAN).

  4. On host2, create two chained OVS bridges. The first one (br-int) connects the vhost interface; the second one (br-tun) encapsulates the traffic in a VXLAN tunnel whose local endpoint is the host interface.

    root@host2:~# ovs-vsctl add-br br-int
    root@host2:~# ovs-vsctl add-port br-int patch-tun -- set Interface \
      patch-tun type=patch options:peer=patch-int
    root@host2:~# ovs-vsctl add-br br-tun
    root@host2:~# ovs-vsctl add-port br-tun patch-int -- set Interface \
      patch-int type=patch options:peer=patch-tun
    root@host2:~# ip link set up dev tap0
    root@host2:~# ovs-vsctl add-port br-int tap0
    root@host2:~# ovs-vsctl add-port br-tun vxlan10 -- set Interface vxlan10 \
      type=vxlan options:remote_ip=10.200.0.5 options:local_ip=10.200.0.6 \
      options:out_key=flow
    root@host2:~# ip link set up dev br-tun
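
Before testing VM-to-VM traffic, a quick sanity check (not part of the original procedure) is to verify that the two hypervisors reach each other over the 10.200.0.0/24 underlay:

    root@host1:~# ping -c 3 10.200.0.6
    root@host2:~# ping -c 3 10.200.0.5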
    

Testing

We can send traffic from vm1 to vm2 and check that each fast path switches it to and from the VM. First, let’s do a ping to check the setup.

  1. Reset the fast path statistics first on each host.

    root@host:~# fp-cli stats-reset
    
  2. Ping the vm1 address from vm2.

    root@vm2:~# ping 11.5.0.105
    PING 11.5.0.105 (11.5.0.105) 56(84) bytes of data.
    64 bytes from 11.5.0.105: icmp_seq=1 ttl=64 time=5.27 ms
    64 bytes from 11.5.0.105: icmp_seq=2 ttl=64 time=0.465 ms
    64 bytes from 11.5.0.105: icmp_seq=3 ttl=64 time=0.442 ms
    [...]
    
  3. During traffic, you can check that the flows are available in the kernel on each host.

    root@host1:~# ovs-dpctl dump-flows
    recirc_id(0),tunnel(),in_port(3),eth(src=52:54:00:ee:fb:f5,
       dst=52:54:00:e8:0b:95),eth_type(0x0800),ipv4(tos=0/0x3,frag=no),
       packets:23, bytes:2254, used:0.045s, actions:set(tunnel(
       tun_id=0x0,src=10.200.0.5,dst=10.200.0.6,ttl=64,flags(df|key))),4
    recirc_id(0),tunnel(tun_id=0x0,src=10.200.0.6,dst=10.200.0.5,ttl=64,
       flags(-df-csum+key)),in_port(4),skb_mark(0),eth(src=52:54:00:e8:0b:95,
       dst=52:54:00:ee:fb:f5),eth_type(0x0800), ipv4(frag=no),
       packets:23, bytes:3082, used:0.045s, actions:3
    
    root@host2:~# ovs-dpctl dump-flows
    recirc_id(0),tunnel(),in_port(3),eth(src=52:54:00:e8:0b:95,
       dst=52:54:00:ee:fb:f5),eth_type(0x0800),ipv4(tos=0/0x3,frag=no),
       packets:17, bytes:1666, used:0.000s, actions:set(tunnel(
       tun_id=0x0,src=10.200.0.6,dst=10.200.0.5,ttl=64,flags(df|key))),4
    recirc_id(0),tunnel(tun_id=0x0,src=10.200.0.5,dst=10.200.0.6,ttl=64,
       flags(-df-csum+key)),in_port(4),skb_mark(0),eth(src=52:54:00:ee:fb:f5,
       dst=52:54:00:e8:0b:95),eth_type(0x0800),ipv4(frag=no),
       packets:17, bytes:2278, used:0.000s, actions:3
    
  4. The fast path statistics increase, showing that the fast path processed the packets.

    root@host1:~# fp-cli fp-vswitch-stats
      flow_not_found:7
      output_ok:81
      set_tunnel_id:41
    
    root@host2:~# fp-cli fp-vswitch-stats
      flow_not_found:7
      output_ok:85
      set_tunnel_id:42
    

    Note

    The flow_not_found statistic is incremented for the first packets of each flow, which are sent to Linux because they do not match any known flow in the fast path. Linux receives the packets and passes them to the ovs-vswitchd daemon (this is the standard Linux processing). The daemon creates a flow in the OVS kernel data plane. The flow is automatically synchronized to the fast path, and the next packets of the flow are processed by the fast path.
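
To observe the same slow path / fast path split from the Open vSwitch side, the kernel datapath statistics report how many packets hit an installed flow and how many were sent up to ovs-vswitchd. This uses standard Open vSwitch tooling (ovs-dpctl was already used above to dump the flows); the exact output format depends on the Open vSwitch version:

    root@host1:~# ovs-dpctl show                # check the 'lookups: hit/missed/lost' counters
    root@host1:~# fp-cli fp-vswitch-stats       # fast path counters, as shown above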

Now that we have checked the setup, we can try iperf.

  1. Reset the fast path statistics first on each host.

    root@host1:~# fp-cli stats-reset
    
    root@host2:~# fp-cli stats-reset
    
  2. Start the iperf server on vm1.

    root@vm1:~# iperf -s
    
  3. Start iperf client on vm2.

    root@vm2:~# iperf -c 11.5.0.105 -i 10
    ------------------------------------------------------------
    Client connecting to 11.5.0.105, TCP port 5001
    TCP window size: 85.0 KByte (default)
    ------------------------------------------------------------
    [  3] local 11.6.0.106 port 57649 connected with 11.5.0.105 port 5001
    [ ID] Interval       Transfer     Bandwidth
    [  3]  0.0-10.0 sec  10.6 GBytes  9.07 Gbits/sec
    [  3]  0.0-10.0 sec  10.6 GBytes  9.07 Gbits/sec
    
  4. During traffic, you can check the fast path CPU usage on each host.

    root@host1:~# fp-cpu-usage
    Fast path CPU usage:
    cpu: %busy     cycles   cycles/packet
     10:  100%  538671904             3210
     26:   <1%    5098764               0
    average cycles/packets received from NIC: 3240 (543770668/167807)
    
  5. After traffic, you can check the fast path statistics on each host.

    root@host1:~# fp-cli fp-vswitch-stats
      flow_not_found:4
      output_ok:801241
      set_tunnel_id:347493
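
To load the data path further, iperf can also be run with several parallel TCP streams (a standard iperf option; the stream count below is arbitrary). With a single Virtio ring per VM and one fast path physical core per host, the aggregate throughput is not expected to scale much beyond the single-stream result:

    root@vm2:~# iperf -c 11.5.0.105 -P 4 -i 10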