1.6.2. Two VMs, one core, Virtio monoqueue NICs, VXLAN termination, offloading enabled, with Open vSwitch
Two VMs on different hypervisors exchange traffic on a private subnet. The hypervisors encapsulate the traffic in a VXLAN tunnel. Each hypervisor uses Open vSwitch bridges to connect its physical interface and the VM virtual interface. Each VM runs on one core and has one Virtio interface using one virtual ring. Virtual Accelerator runs on two logical cores (10 and 26 in this example, the two hyperthreads of one physical core on the second socket).
Virtual Accelerator configuration

host1
In this second use case, the fast path configuration will be customized using the interactive wizard:
Assign one physical core to the fast path
Poll one physical port only
Adapt the amount of memory reserved for the VM to our needs
Start the fast path configuration wizard tool
root@host1:~# fast-path.sh config -i
Select which physical ports the fast path should use.
Enter the first sub-menu:

Fast path configuration
=======================

  1 - Select fast path ports and polling cores
  2 - Select a hardware crypto accelerator
  3 - Advanced configuration
  4 - Advanced plugin configuration
  5 - Display configuration

  S - Save configuration and exit
  Q - Quit

Enter selection [S]: 1
Select Manually configure FP_PORTS with e. We see that ens5f0 has the PCI bus identifier 0000:85:00.0:

Fast path port selection
========================

Core/port mapping mode is auto

== ============ == ========== ============== ===================================
 #  PCI or Name Id Interface  Selected cores NIC full name
== ============ == ========== ============== ===================================
 1 0000:03:00.0  0 ens1f0     auto: 7,23     Intel Corporation 82599ES 10-Gigabi
 2 0000:03:00.1  0 ens1f1     auto: 7,23     Intel Corporation 82599ES 10-Gigabi
 3 0000:06:00.0  0 mgmt0      auto: 7,23     Intel Corporation I350 Gigabit Netw
 4 0000:06:00.1  0 enp6s0f1   auto: 7,23     Intel Corporation I350 Gigabit Netw
 5 0000:83:00.0  0 ens7f0     auto: 15,31    Intel Corporation Ethernet Controll
 6 0000:83:00.1  0 ens7f1     auto: 15,31    Intel Corporation Ethernet Controll
 7 0000:85:00.0  0 ens5f0     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
 8 0000:85:00.1  0 ens5f1     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
== ============ == ========== ============== ===================================

A - Add a virtual device
C - Switch to manual core/port mapping mode
M - Set the fast path core mask. Current value is "auto" (7,15,23,31)
E - Manually configure FP_PORTS. Current value is "all"

B - Back

Enter selection [B]: e
Enable the ens5f0 port using its PCI bus identifier:

Set the fast path ports
=======================

## FP_PORTS defines the list of ports enabled in the fast path.
##
## The value can be:
## - 'all' or commented out: all supported physical ports are enabled.
## - a space-separated list of keys, defining the list of ports. A key
##   adds one or several ports to the current list, or removes them if
##   it is prefixed by '-'. The valid keys are: 'all', a pci identifier,
##   a linux interface name.
##
## Example: "" means no port
## Example: "all -mgmt0" means all ports except the one associated to
##          the interface called mgmt0 in Linux.
## Example: "0000:03:00.0 0000:03:00.1" means the ports whose pci bus
##          addresses match.
##
## A PCI bus can be suffixed by driver-specific arguments. For instance:
## "0000:03:00.0,role=right,verbose=1".
##
## The list is evaluated from left to right, so that
## "eth0 0000:03:00.0 -all" means no port are enabled.
##
## Note: be careful when using Linux interface names in configuration,
## as they are subject to changes, depending on system configuration.
##
## This parameter is evaluated at fast path start and is converted into
## a whitelist or blacklist of PCI devices, that is passed to the fast
## path command line. Therefore, it is not possible to enable only some
## ports of a PCI device.
##
## In expert mode (EXPERT_FPCONF=on), this parameter is mandatory and
## must contain a list of PCI devices only.
##
Current value for FP_PORTS: all
Use 'none' to remove all ports
Enter new value [no change]
> 0000:85:00.0
Our PCI device is located on the second socket. To achieve the best performance, it has to be polled by cores from the same socket. We choose one physical core of the second socket (logical cores 10 and 26).
Note

Use fast-path.sh config -m to display the machine topology.
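Standard Linux tools give the same information; as a hypothetical illustration on a 2-socket, 32-thread machine matching this setup (the CPU lists below are examples, yours will differ):

root@host1:~# lscpu | grep NUMA
NUMA node(s):          2
NUMA node0 CPU(s):     0-7,16-23
NUMA node1 CPU(s):     8-15,24-31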
Select m to modify the fast path core mask:

Fast path port selection
========================

Core/port mapping mode is auto

== ============ == ========== ============== ===================================
 #  PCI or Name Id Interface  Selected cores NIC full name
== ============ == ========== ============== ===================================
 1 0000:03:00.0  0 ens1f0                    Intel Corporation 82599ES 10-Gigabi
 2 0000:03:00.1  0 ens1f1                    Intel Corporation 82599ES 10-Gigabi
 3 0000:06:00.0  0 mgmt0                     Intel Corporation I350 Gigabit Netw
 4 0000:06:00.1  0 enp6s0f1                  Intel Corporation I350 Gigabit Netw
 5 0000:83:00.0  0 ens7f0                    Intel Corporation Ethernet Controll
 6 0000:83:00.1  0 ens7f1                    Intel Corporation Ethernet Controll
 7 0000:85:00.0  0 ens5f0     auto: 15,31    Intel Corporation 82599ES 10-Gigabi
 8 0000:85:00.1  0 ens5f1                    Intel Corporation 82599ES 10-Gigabi
== ============ == ========== ============== ===================================

A - Add a virtual device
C - Switch to manual core/port mapping mode
M - Set the fast path core mask. Current value is "auto" (7,15,23,31)
E - Manually configure FP_PORTS. Current value is "0000:85:00.0"

B - Back

Enter selection [B]: m
Set its value to 10,26. The syntax is detailed below:

Set the fast path core mask
===========================

## FP_MASK defines which logical cores run the fast path.
##
## The value can be:
## - 'auto' or commented out: on a Turbo Router, all cores are enabled, except
##   the first one. On a Virtual Accelerator, one physical core per node is
##   enabled.
## - a list of logical cores ranges.
##   Example: "1-4,6,8" means logical cores 1,2,3,4,6,8.
## - an hexadecimal mask starting with '0x': it represents the mask of logical
##   cores to be enabled.
##   Example: "0x1e" means logical cores 1,2,3,4.
##
## Note: the core 0 is usually reserved for Linux processes and DPVI, so
## it's not advised to use it in FP_MASK.
##
## In expert mode (EXPERT_FPCONF=on), this parameter is mandatory and must not
## be auto.
##
Enter value [auto]: 10,26
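If you prefer the hexadecimal form described above, the mask corresponding to logical cores 10 and 26 can be computed from the shell (just an illustration; the wizard accepts the list form directly):

root@host1:~# printf "0x%x\n" $(( (1 << 10) | (1 << 26) ))
0x4000400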
The core-to-port mapping is updated:

Fast path port selection
========================

Core/port mapping mode is auto

== ============ == ========== ============== ===================================
 #  PCI or Name Id Interface  Selected cores NIC full name
== ============ == ========== ============== ===================================
 1 0000:03:00.0  0 ens1f0                    Intel Corporation 82599ES 10-Gigabi
 2 0000:03:00.1  0 ens1f1                    Intel Corporation 82599ES 10-Gigabi
 3 0000:06:00.0  0 mgmt0                     Intel Corporation I350 Gigabit Netw
 4 0000:06:00.1  0 enp6s0f1                  Intel Corporation I350 Gigabit Netw
 5 0000:83:00.0  0 ens7f0                    Intel Corporation Ethernet Controll
 6 0000:83:00.1  0 ens7f1                    Intel Corporation Ethernet Controll
 7 0000:85:00.0  0 ens5f0     auto: 10,26    Intel Corporation 82599ES 10-Gigabi
 8 0000:85:00.1  0 ens5f1                    Intel Corporation 82599ES 10-Gigabi
== ============ == ========== ============== ===================================

A - Add a virtual device
C - Switch to manual core/port mapping mode
M - Set the fast path core mask. Current value is "10,26"
E - Manually configure FP_PORTS. Current value is "0000:85:00.0"
Reduce the amount of memory reserved for the VM. By default, 4GB per socket is reserved but 1GB on the second socket is enough in our use case.
Enter Advanced configuration:

Fast path configuration
=======================

  1 - Select fast path ports and polling cores
  2 - Select a hardware crypto accelerator
  3 - Advanced configuration
  4 - Advanced plugin configuration
  5 - Display configuration

  S - Save configuration and exit
  Q - Quit

Enter selection [S]: 3
Ask to modify the amount of memory reserved for VMs:
Advanced configuration
======================

  1 - Set the amount of fast path memory (current value is auto)
  2 - Enable/disable fast path memory allocation (current value is on)
  3 - Set the number of mbufs (current value is auto)
  4 - Enable/disable fast path offloads (current value is auto)
  5 - Set the number of memory channels (current value is auto)
  6 - Set the hugepage directory (current value is /mnt/huge)
  7 - Set the amount of VM memory (current value is auto)

  B - Back

Enter selection [B]: 7
Set its value to 0,1024. The syntax is detailed below:

Set amount of memory reserved for Virtual Machines
==================================================

## VM_MEMORY defines how much memory from the hugepages to allocate for
## virtual machines.
##
## When running the fast path as a host managing VMs, the fast path
## startup script is able to reserve additional memory stored in huge
## pages. This memory can be used by Qemu or libvirt for the virtual
## machines.
##
## The value can be:
## - auto or commented out: on a Virtual Accelerator, 4GB per socket will
##   be reserved, on other products no VM memory will be reserved.
## - an integer: it represents the amount of memory in MB to reserve
##   for VMs. This amount will be spread equally on all NUMA nodes.
##   Example: "4096" asks to reserve 4GB for the virtual machines, distributed
##   on all the NUMA nodes of the machine (2GB per node if the machine has
##   2 nodes).
## - a list of integers, representing the amount of memory in MB
##   to reserve on each NUMA node.
##   Example: "2048,2048" asks to reserve 2GB on node0 and 2GB on node1
##   in huge pages for the virtual machines.
##
## In expert mode (EXPERT_FPCONF=on), the parameter is mandatory and its format
## must be a list of integer, one per socket.
##
Enter value [auto]: 0,1024
The configuration can be saved. It will generate the /etc/fast-path.env file:

Fast path configuration
=======================

  1 - Select fast path ports and polling cores
  2 - Select a hardware crypto accelerator
  3 - Advanced configuration
  4 - Advanced plugin configuration
  5 - Display configuration

  S - Save configuration and exit
  Q - Quit

Enter selection [S]: S
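You can quickly confirm that the values entered above were written out; the exact layout of /etc/fast-path.env may differ between releases, but FP_PORTS, FP_MASK and VM_MEMORY should contain 0000:85:00.0, 10,26 and 0,1024 respectively:

root@host1:~# grep -E 'FP_PORTS|FP_MASK|VM_MEMORY' /etc/fast-path.env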
libvirt does not support the cpuset isolation feature; it has to be disabled in /etc/cpuset.env:

-#: ${CPUSET_ENABLE:=1}
+: ${CPUSET_ENABLE:=0}
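If you prefer to apply this change from a script, one possibility (assuming GNU sed and that the file contains the commented default shown above) is:

root@host1:~# sed -i 's/^#\?: ${CPUSET_ENABLE:=1}/: ${CPUSET_ENABLE:=0}/' /etc/cpuset.env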
Start Virtual Accelerator.
root@host1:~# systemctl start virtual-accelerator.target
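At this point, the hugepages configured above should have been reserved. One way to check the per-node reservation, assuming 2MB hugepages, is through sysfs; the count on node 1 should cover both the fast path memory and the 1GB of VM memory:

root@host1:~# cat /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages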
Restart the Open vSwitch control plane.
root@host1:~# systemctl restart openvswitch
The hugepages are allocated by Virtual Accelerator at startup and libvirt cannot detect them dynamically. libvirt must be restarted to take the hugepages into account.
root@host1:~# systemctl restart libvirtd.service
Warning
If you restart Virtual Accelerator, you must restart openvswitch and libvirt (and its VMs) as well.
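For reference, a complete restart sequence on host1 could look like the following; the VM name is an example, adapt it to your own domains (virsh destroy performs a hard stop; use virsh shutdown if the guest should shut down cleanly first):

root@host1:~# systemctl restart virtual-accelerator.target
root@host1:~# systemctl restart openvswitch
root@host1:~# systemctl restart libvirtd.service
root@host1:~# virsh destroy vm1
root@host1:~# virsh start vm1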
Create a virtio interface to communicate with the VM. The sockpath argument will be used in the libvirt XML file later.

root@host:~# fp-vdev add tap0 --sockpath=/tmp/pmd-vhost0
devargs:
  profile: endpoint
  sockmode: client
  sockname: /tmp/pmd-vhost0
  txhash: l3l4
  verbose: 0
driver: pmd-vhost
ifname: tap0
rx_cores: all
Note

Make sure that the fast path has been started before you create hotplug ports with fp-vdev commands.

See also

The 6WINDGate Fast Path Managing virtual devices documentation for more information about the fp-vdev command.
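A quick way to check that the new port is visible to the fast path (the grep filter is only for readability; the full fp-shmem-ports -d output is shown later in this section):

root@host1:~# fp-shmem-ports -d | grep tap0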
host2
Follow the same configuration steps on host2 as on host1 (the corresponding commands are recapped after the list):
Using the configuration wizard:
select a physical port and select the fast path cores
change the amount of VM memory
save the resulting configuration file
Disable the cpuset feature in libvirt
Start or restart services:
start Virtual Accelerator
restart Open vSwitch
restart libvirt
Create the virtio interface (tap0) with fp-vdev
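For reference, the service and port-creation commands are the same as on host1:

root@host2:~# systemctl start virtual-accelerator.target
root@host2:~# systemctl restart openvswitch
root@host2:~# systemctl restart libvirtd.service
root@host2:~# fp-vdev add tap0 --sockpath=/tmp/pmd-vhost0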
Hosts and VMs configuration

VM creation on both hosts (if needed)
If you don't have a VM ready, you can use a cloud image. See VM Creation to create one VM on each host with the following libvirt configuration sections (they can be applied with virsh edit, as sketched after the list):
hostname vm1 on host1 and vm2 on host2:

<name>vm1</name>

and

<name>vm2</name>
one vhost-user interface:

<interface type='vhostuser'>
  <source type='unix' path='/tmp/pmd-vhost0' mode='server'/>
  <model type='virtio'/>
</interface>
1048576 KiB (1GB) of memory:

<memory>1048576</memory>

and

<numa>
  <cell id="0" cpus="0" memory="1048576" memAccess="shared"/>
</numa>
Set up memory to be on socket 1:

<numatune>
  <!-- adapt to set the host node where hugepages are taken -->
  <memory mode='strict' nodeset='1'/>
</numatune>
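These sections can be applied by editing each domain definition with libvirt's standard tool, for example (domain names as defined above):

root@host1:~# virsh edit vm1
root@host2:~# virsh edit vm2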
Configuration of host1 and vm1
Now that we have access to the VM, we can set up Linux with the configuration needed for iperf.
Inside the VM, set the interface up and add an address on the 11.0.0.0/8 subnet.

root@vm1:~# ip link set eth1 up
root@vm1:~# ip addr add 11.5.0.105/8 dev eth1
On host1, check the name of the fast path interfaces at the end of the port description in the fp-shmem-ports -d output.

root@host1:~# fp-shmem-ports -d
core freq : 2693510304
offload : enabled
vxlan ports :
  port 4789 (set by user)
  port 8472 (set by user)
port 0: ens5f0
  mac 90:e2:ba:29:df:58
  driver rte_ixgbe_pmd
  GRO timeout 10us
  RX queues: 2 (max: 128)
  TX queues: 2 (max: 64)
  RX vlan strip off
  RX IPv4 checksum on
  RX TCP checksum on
  RX UDP checksum on
  GRO on
  LRO off
  TX vlan insert on
  TX IPv4 checksum on
  TX TCP checksum on
  TX UDP checksum on
  TX SCTP checksum on
  TSO on
port 1: tap0
  mac 02:09:c0:ea:ef:37
  driver pmd-vhost (args sockmode=client,sockname=/tmp/pmd-vhost0)
  GRO timeout 10us
  RX queues: 1 (max: 1)
  TX queues: 2 (max: 64)
  RX TCP checksum on
  RX UDP checksum on
  GRO on
  LRO on
  TX TCP checksum on
  TX UDP checksum on
  TSO on
Note

tap0 is the vhost port bound to the interface of vm1 and ens5f0 is the host external interface.

On host1, set the interfaces up, add an IP address to the host interface and tune its MTU:

root@host1:~# ip link set ens5f0 up
root@host1:~# ip link set tap0 up
root@host1:~# ip addr add 10.200.0.5/24 dev ens5f0
root@host1:~# ip link set mtu 1600 dev ens5f0
Note

We need to increase the MTU so that encapsulated frames fit without causing additional fragmentation: VXLAN encapsulation adds 50 bytes of headers (inner Ethernet, VXLAN, UDP and outer IPv4), so a 1600-byte MTU comfortably carries full-size 1500-byte VM packets.
On host1, create two chained OVS bridges. The first one (br-int) connects the vhost interface; the second one (br-tun) encapsulates the traffic in a VXLAN tunnel starting at the host interface.

root@host1:~# ovs-vsctl add-br br-int
root@host1:~# ovs-vsctl add-port br-int patch-tun -- set Interface \
    patch-tun type=patch options:peer=patch-int
root@host1:~# ovs-vsctl add-br br-tun
root@host1:~# ovs-vsctl add-port br-tun patch-int -- set Interface \
    patch-int type=patch options:peer=patch-tun
root@host1:~# ip link set up dev tap0
root@host1:~# ovs-vsctl add-port br-int tap0
root@host1:~# ovs-vsctl add-port br-tun vxlan10 -- set Interface vxlan10 \
    type=vxlan options:remote_ip=10.200.0.6 options:local_ip=10.200.0.5 \
    options:out_key=flow
root@host1:~# ip link set up dev br-tun
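You can review the resulting bridge layout at any time (output not reproduced here):

root@host1:~# ovs-vsctl show

br-int should list tap0 and patch-tun, while br-tun should list vxlan10 and patch-int.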
Configuration of host2 and vm2
Now that we have access to the VM, we can set up Linux with the configuration needed for iperf.
Inside the VM, set the interface up and add an address on the 11.0.0.0/8 subnet.

root@vm2:~# ip link set eth1 up
root@vm2:~# ip addr add 11.6.0.106/8 dev eth1
On host2, check the name of the fast path interfaces at the end of the port description in the fp-shmem-ports -d output.

root@host2:~# fp-shmem-ports -d
core freq : 2693520583
offload : enabled
vxlan ports :
  port 4789 (set by user)
  port 8472 (set by user)
port 0: ens5f0
  mac 00:1b:21:74:5b:58
  driver rte_ixgbe_pmd
  GRO timeout 10us
  RX queues: 2 (max: 128)
  TX queues: 2 (max: 64)
  RX vlan strip off
  RX IPv4 checksum on
  RX TCP checksum on
  RX UDP checksum on
  GRO on
  LRO off
  TX vlan insert on
  TX IPv4 checksum on
  TX TCP checksum on
  TX UDP checksum on
  TX SCTP checksum on
  TSO on
port 1: tap0
  mac 02:09:c0:88:f7:5d
  driver pmd-vhost (args sockmode=client,sockname=/tmp/pmd-vhost0)
  GRO timeout 10us
  RX queues: 1 (max: 1)
  TX queues: 2 (max: 64)
  RX TCP checksum on
  RX UDP checksum on
  GRO on
  LRO on
  TX TCP checksum on
  TX UDP checksum on
  TSO on
Note

tap0 is the vhost port bound to the interface of vm2 and ens5f0 is the host external interface.

On host2, set the interfaces up, add an IP address to the host interface and tune its MTU:

root@host2:~# ip link set ens5f0 up
root@host2:~# ip link set tap0 up
root@host2:~# ip addr add 10.200.0.6/24 dev ens5f0
root@host2:~# ip link set mtu 1600 dev ens5f0
Note

We need to increase the MTU so that encapsulated frames fit without causing additional fragmentation, as on host1.
On host2, create two chained OVS bridges. The first one (br-int) connects the vhost interface; the second one (br-tun) encapsulates the traffic in a VXLAN tunnel starting at the host interface.

root@host2:~# ovs-vsctl add-br br-int
root@host2:~# ovs-vsctl add-port br-int patch-tun -- set Interface \
    patch-tun type=patch options:peer=patch-int
root@host2:~# ovs-vsctl add-br br-tun
root@host2:~# ovs-vsctl add-port br-tun patch-int -- set Interface \
    patch-int type=patch options:peer=patch-tun
root@host2:~# ip link set up dev tap0
root@host2:~# ovs-vsctl add-port br-int tap0
root@host2:~# ovs-vsctl add-port br-tun vxlan10 -- set Interface vxlan10 \
    type=vxlan options:remote_ip=10.200.0.5 options:local_ip=10.200.0.6 \
    options:out_key=flow
root@host2:~# ip link set up dev br-tun
Testing
We can send traffic from vm1 to vm2 and check that each fast path switches it to and from the VM. First, let's do a ping to check the setup.
Reset the fast path statistics first on each host.
root@host:~# fp-cli stats-reset
Ping the vm1 address from vm2.

root@vm2:~# ping 11.5.0.105
PING 11.5.0.105 (11.5.0.105) 56(84) bytes of data.
64 bytes from 11.5.0.105: icmp_seq=1 ttl=64 time=5.27 ms
64 bytes from 11.5.0.105: icmp_seq=2 ttl=64 time=0.465 ms
64 bytes from 11.5.0.105: icmp_seq=3 ttl=64 time=0.442 ms
[...]
During traffic, you can check that the flows are available in the kernel on each host.
root@host1:~# ovs-dpctl dump-flows
recirc_id(0),tunnel(),in_port(3),eth(src=52:54:00:ee:fb:f5,
  dst=52:54:00:e8:0b:95),eth_type(0x0800),ipv4(tos=0/0x3,frag=no),
  packets:23, bytes:2254, used:0.045s, actions:set(tunnel(
  tun_id=0x0,src=10.200.0.5,dst=10.200.0.6,ttl=64,flags(df|key))),4
recirc_id(0),tunnel(tun_id=0x0,src=10.200.0.6,dst=10.200.0.5,ttl=64,
  flags(-df-csum+key)),in_port(4),skb_mark(0),eth(src=52:54:00:e8:0b:95,
  dst=52:54:00:ee:fb:f5),eth_type(0x0800),ipv4(frag=no),
  packets:23, bytes:3082, used:0.045s, actions:3

root@host2:~# ovs-dpctl dump-flows
recirc_id(0),tunnel(),in_port(3),eth(src=52:54:00:e8:0b:95,
  dst=52:54:00:ee:fb:f5),eth_type(0x0800),ipv4(tos=0/0x3,frag=no),
  packets:17, bytes:1666, used:0.000s, actions:set(tunnel(
  tun_id=0x0,src=10.200.0.6,dst=10.200.0.5,ttl=64,flags(df|key))),4
recirc_id(0),tunnel(tun_id=0x0,src=10.200.0.5,dst=10.200.0.6,ttl=64,
  flags(-df-csum+key)),in_port(4),skb_mark(0),eth(src=52:54:00:ee:fb:f5,
  dst=52:54:00:e8:0b:95),eth_type(0x0800),ipv4(frag=no),
  packets:17, bytes:2278, used:0.000s, actions:3
The fast path statistics have increased, showing that the fast path processed the packets.

root@host1:~# fp-cli fp-vswitch-stats
  flow_not_found:7
  output_ok:81
  set_tunnel_id:41
root@host2:~# fp-cli fp-vswitch-stats
  flow_not_found:7
  output_ok:85
  set_tunnel_id:42
Note

The flow_not_found statistic increases for the first packets of each flow, which are sent to Linux because they don't match any known flow in the fast path. Linux receives the packets and hands them to the ovs-vswitchd daemon (the standard Linux processing). The daemon creates a flow in the OVS kernel data plane. The flow is automatically synchronized to the fast path, and the next packets of the flow are processed by the fast path.
Now that we have checked the setup, we can try iperf.
Reset the fast path statistics first on each host.
root@host1:~# fp-cli stats-reset
root@host2:~# fp-cli stats-reset
Start the iperf server on vm1.

root@vm1:~# iperf -s
Start the iperf client on vm2.

root@vm2:~# iperf -c 11.5.0.105 -i 10
------------------------------------------------------------
Client connecting to 11.5.0.105, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 11.6.0.106 port 57649 connected with 11.5.0.105 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  10.6 GBytes  9.07 Gbits/sec
[  3]  0.0-10.0 sec  10.6 GBytes  9.07 Gbits/sec
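If you want to check whether a single TCP stream is the limiting factor, iperf can also run several streams in parallel; this is an optional variation, not part of the reference test above:

root@vm2:~# iperf -c 11.5.0.105 -i 10 -P 4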
During traffic, you can check the fast path CPU usage on each host.
root@host1:~# fp-cpu-usage
Fast path CPU usage:
  cpu: %busy     cycles   cycles/packet
  10:   100%  538671904            3210
  26:    <1%    5098764               0
average cycles/packets received from NIC: 3240 (543770668/167807)
After traffic, you can check the fast path statistics on each host.
root@host1:~# fp-cli fp-vswitch-stats
  flow_not_found:4
  output_ok:801241
  set_tunnel_id:347493