2.2.4. Install as a VM using KVM¶
This chapter explains how to start a VM using KVM.
First, you should have a look at the hypervisor prerequisites section.
After the prerequisites are completed, you have two choices:
a simple configuration to try Turbo CG-NAT CLI using a VM with virtual NICs
a more complex configuration with good performance using a VM with physical NICs
Note
Most of this chapter was written for an Ubuntu 16.04 hypervisor. Other distributions should work as well, although some commands might differ.
Hypervisor prerequisites¶
We will not detail how to install a Linux distribution here. Once it is installed, some tasks must be completed to turn the distribution into a hypervisor.
The kvm and kvm_intel modules have to be inserted:
# lsmod | grep kvm
kvm_intel 172032 0
kvm 544768 1 kvm_intel
qemu-kvm, libvirt and virt-install have to be installed:
# apt-get install -y qemu-kvm
# apt-get install -y virtinst libvirt-bin
or
# yum install -y qemu-kvm
# yum install -y virt-install libvirt
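The module check above can also be scripted. The following is a minimal sketch, assuming an Intel host as in the rest of this chapter. The function name missing_kvm_modules is hypothetical and only used for illustration. It reads an lsmod listing and prints the required modules that are not loaded.

```shell
# Hypothetical helper: read `lsmod` output on stdin and print every required
# module (kvm, kvm_intel) that is not loaded. Prints nothing when both are loaded.
missing_kvm_modules() {
    awk '
        { loaded[$1] = 1 }                       # first column is the module name
        END {
            if (!("kvm" in loaded)) print "kvm"
            if (!("kvm_intel" in loaded)) print "kvm_intel"
        }'
}

# Example use on the hypervisor:
#   lsmod | missing_kvm_modules
```

An empty output means that both modules are present.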
VM with virtual NICs¶
In this example, the VM will have three interfaces:
one management interface on the libvirt default virtual network using NAT forwarding,
two data plane interfaces on top of the host’s interfaces using bridged networking to connect the VM to the LAN.
See also
the libvirt networking documentation for more information about networking with KVM.
On the host, set interfaces up.
# ip link set eth1 up
# ip link set eth2 up
On the host, create two Linux bridges, each containing one physical interface.
# brctl addbr br0
# brctl addif br0 eth1
# ip link set br0 up
# brctl addbr br1
# brctl addif br1 eth2
# ip link set br1 up
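On hosts where brctl is not installed, the same bridges can usually be created with the ip command from iproute2. The sketch below only prints the equivalent commands so that they can be reviewed first. The helper name bridge_cmds is hypothetical, and the interface names are the ones used in this example.

```shell
# Hypothetical helper: print (do not run) the iproute2 commands that create a
# bridge, attach one interface to it and bring the bridge up.
bridge_cmds() {
    bridge=$1
    iface=$2
    echo "ip link add name $bridge type bridge"
    echo "ip link set dev $iface master $bridge"
    echo "ip link set dev $bridge up"
}

bridge_cmds br0 eth1
bridge_cmds br1 eth2
```

Once the printed commands look right, they can be run as root, for instance by piping the output to sh.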
To boot Turbo CG-NAT in libvirt as a guest VM, use:
# cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
# virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
    --os-type linux --cpu host --network=default,model=e1000 \
    --ram 6144 --noautoconsole --import \
    --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
    --network bridge=br0,model=e1000 --network bridge=br1,model=e1000
Connect to the VM:
# virsh console vm1
(...)
Login:
The next step is to perform your first configuration.
VM with physical NICs¶
This section details how to start Turbo CG-NAT with dedicated physical NICs.
Using dedicated NICs requires some work which is detailed in Hypervisor mandatory prerequisites.
Once the hypervisor is configured properly, two technologies are available:
whole NICs are dedicated to Turbo CG-NAT (see Passthrough mode): the configuration is simpler, but only one VM can use each NIC
portions of NICs are dedicated to Turbo CG-NAT (see SR-IOV mode): more VMs can run on the hypervisor
For production setups, see Optimize performance in virtual environment to get the best performance.
Hypervisor mandatory prerequisites¶
enable Intel VT-d¶
Intel VT-d stands for “Intel Virtualization Technology for Directed I/O”. It is needed to give a physical NIC to a VM. To enable it:
it usually has to be enabled from the BIOS. The name of this feature can differ from one hardware platform to another, so check your hardware documentation to enable it.
it also has to be enabled in the kernel, by adding intel_iommu=on iommu=pt to the kernel command line.
To do so, run:
# echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on iommu=pt"' \
>> /etc/default/grub
# update-grub2
# reboot
You can check the boot logs at next boot to verify that Intel VT-d is properly enabled.
# dmesg |grep "Intel(R) Virtualization Technology for Directed I/O"
[ 1.391229] DMAR: Intel(R) Virtualization Technology for Directed I/O
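In addition to the boot log, the running kernel command line can be inspected to confirm that both options were taken into account. This is a minimal sketch. The helper name cmdline_has_all is hypothetical, and it only checks that each option appears as a whole word.

```shell
# Hypothetical helper: succeed only if every given option appears as a whole
# word in the kernel command line passed as first argument.
cmdline_has_all() {
    cmdline=$1
    shift
    for opt in "$@"; do
        case " $cmdline " in
            *" $opt "*) ;;                               # option found
            *) echo "missing: $opt"; return 1 ;;
        esac
    done
    return 0
}

# Example use after the reboot:
#   cmdline_has_all "$(cat /proc/cmdline)" intel_iommu=on iommu=pt
```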
hugepages¶
For performance reasons, the memory used by the VMs that will harbor Turbo CG-NAT must be reserved in hugepages.
Note
A hugepage is a page that addresses more memory than the usual 4KB. Accessing a hugepage is more efficient than accessing a regular memory page. Its default size is 2MB.
hugeadm can be used to manage hugepages. It is part of the hugepages deb package and of the libhugetlbfs-utils rpm package.
To see if your system already has hugepages available, and which sizes are supported, do:
# hugeadm --pool-list
Size Minimum Current Maximum Default
2097152 0 0 0 *
1073741824 0 0 0
On this system, 2MB and 1GB pages are supported.
If your hardware has several sockets, for performance reasons, the memory should be allocated on the same node as the interfaces that will be dedicated to the Turbo CG-NAT VM.
numactl can show which memory node should be chosen for a particular interface. Look for membind in the following command output. This NIC is on memory node 1.
# numactl -m netdev:ens4f0 --show
policy: bind
preferred node: 1
physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
cpubind: 0 1
nodebind: 0 1
membind: 1
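As a cross-check, the kernel also exposes the NUMA node of a PCI network device in sysfs. The sketch below reads that file. The helper name nic_numa_node and its optional sysfs root argument are hypothetical and only there to keep the example easy to try.

```shell
# Hypothetical helper: print the NUMA node of a network interface by reading
# sysfs. The optional second argument overrides the sysfs root (default /sys).
# A value of -1 means that the kernel has no NUMA information for the device.
nic_numa_node() {
    iface=$1
    sysroot=${2:-/sys}
    file="$sysroot/class/net/$iface/device/numa_node"
    if [ -r "$file" ]; then
        cat "$file"
    else
        echo "no NUMA information for $iface" >&2
        return 1
    fi
}

# Example use on the hypervisor:
#   nic_numa_node ens4f0
```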
Add six 1GB hugepages for one Turbo CG-NAT VM to NUMA node 1. You should add this command to a custom startup script to make it persistent.
# echo 6 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
Check that the pages were allocated:
# hugeadm --pool-list
Size Minimum Current Maximum Default
2097152 0 0 0 *
1073741824 6 6 6
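The number of pages depends on the memory given to the VM. As a rough sketch, assuming that the whole guest memory (the --ram value used with virt-install in this chapter) is backed by hugepages of a single size, the count can be computed as follows. The helper name hugepages_needed is hypothetical.

```shell
# Hypothetical helper: number of hugepages needed to back a VM.
#   $1: VM memory in MiB (the value given to virt-install --ram)
#   $2: hugepage size in kB (as in the hugepages-<size>kB sysfs directory name)
hugepages_needed() {
    ram_kb=$(( $1 * 1024 ))
    page_kb=$2
    # Round up so that the whole guest memory fits.
    echo $(( (ram_kb + page_kb - 1) / page_kb ))
}

hugepages_needed 6144 1048576   # with 1GB pages
hugepages_needed 6144 2048      # with 2MB pages
```

For a 6144 MiB VM this prints 6 and 3072, which matches the six 1GB pages reserved above.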
Passthrough mode¶
With this configuration, the Turbo CG-NAT VM will get dedicated interfaces.
The passthrough mode is only available if the hypervisor’s hardware supports Intel VT-d, and if it is enabled (see enable Intel VT-d).
You must first find the PCI address (bus:device.function) of the interfaces that will be dedicated to the Turbo CG-NAT VM.
# lspci |grep Ethernet
03:00.0 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
03:00.1 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
05:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
05:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
07:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
Then use virt-install to spawn the VM, specifying one host-device argument for each device that you want to dedicate. In this example, we dedicate 03:00.0 and 03:00.1.
# cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
# virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
    --os-type linux --cpu host --network=default,model=e1000 \
    --ram 6144 --noautoconsole \
    --import --memorybacking hugepages=yes \
    --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
    --host-device 03:00.0 --host-device 03:00.1
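When several ports of the same NIC model are dedicated, the host-device arguments can be derived from the lspci listing instead of being typed by hand. This is only a sketch. The helper name host_device_args is hypothetical, and it selects devices by a plain substring of the lspci description, which may be too coarse on some hosts.

```shell
# Hypothetical helper: read `lspci` output on stdin and print one
# "--host-device <address>" argument per Ethernet controller whose
# description contains the given model string.
host_device_args() {
    model=$1
    awk -v model="$model" '
        /Ethernet controller/ && index($0, model) {
            out = out (out == "" ? "" : " ") "--host-device " $1
        }
        END { print out }'
}

# Example use on the hypervisor:
#   lspci | host_device_args X552
```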
Connect to the VM:
# virsh console vm1
(...)
Login:
To get the best performance, the VM CPUs should be associated with physical CPUs. This is called pinning, and is described in CPU pinning.
The next step is to perform your first configuration.
SR-IOV mode¶
SR-IOV enables an Ethernet port to appear as multiple, separate, physical devices called Virtual Functions (VF). You will need compatible hardware, and Intel VT-d configured. The traffic coming from each VF cannot be seen by the other VFs. The performance is almost as good as in passthrough mode.
Being able to split an Ethernet port can increase the VM density on the hypervisor compared to passthrough mode.
In this configuration, the Turbo CG-NAT VM will get Virtual Functions.
First check whether the network interface that you want to use supports SR-IOV and how many VFs can be configured. Here we check the eno1 interface.
# lspci -vvv -s $(ethtool -i eno1 | grep bus-info | awk -F': ' '{print $2}') | grep SR-IOV
Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
# lspci -vvv -s $(ethtool -i eno1 | grep bus-info | awk -F': ' '{print $2}') | grep VFs
Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
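The Total VFs value gives the upper bound for the number of VFs that can be requested. The following sketch extracts it from the lspci output shown above and compares it with a requested count. The helper names total_vfs and vf_count_ok are hypothetical.

```shell
# Hypothetical helper: extract the "Total VFs" value from `lspci -vvv` output
# read on stdin. Prints nothing if no such line is present.
total_vfs() {
    sed -n 's/.*Total VFs: \([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: succeed if the requested number of VFs ($1) does not
# exceed the total supported by the device ($2).
vf_count_ok() {
    requested=$1
    total=$2
    [ -n "$total" ] && [ "$requested" -le "$total" ]
}

# Example use on the hypervisor, with the bus-info of the interface:
#   total=$(lspci -vvv -s <bus-info> | total_vfs)
#   vf_count_ok 2 "$total" || echo "too many VFs requested"
```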
Then add VFs, and check that those VFs were created. You should add this command to a custom startup script to make it persistent.
# echo 2 > /sys/class/net/eno1/device/sriov_numvfs
# lspci | grep Ethernet | grep Virtual
03:10.0 Ethernet controller: Intel Corporation Ethernet Connection X552 Virtual Function
03:10.2 Ethernet controller: Intel Corporation Ethernet Connection X552 Virtual Function
You need to set eno1 up so that VFs are properly detected in the guest VM.
# ip link set eno1 up
Then use virt-install to spawn the VM, specifying one host-device argument for each VF that you want to give. In this example, we give the VF 03:10.0 to Turbo CG-NAT.
# cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
# virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
    --os-type linux --cpu host --network=default,model=e1000 \
    --ram 6144 --noautoconsole --import \
    --memorybacking hugepages=yes \
    --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
    --host-device 03:10.0
Connect to the VM:
# virsh console vm1
(...)
Login:
To get the best performance, the VM CPUs should be associated with physical CPUs. This is called pinning, and is described in CPU pinning.
The next step is to perform your first configuration.
Optimize performance in virtual environment¶
To get good performance, Turbo CG-NAT needs dedicated resources. These include:
NICs
CPUs
The first thing to do is to identify the resources that will be dedicated. This is described in the Identifying hardware resources section.
Then, all the resources must be properly isolated, and configured, see Isolating and configuring hardware resources.
Identifying hardware resources¶
resource inventory¶
Before identifying the resources that will be dedicated to the Turbo CG-NAT VM, you need to know which NICs and CPUs are available.
It can be done using lstopo, which is part of the hwloc package.
# lstopo -p --merge
Machine (31GB total)
NUMANode P#0 (16GB)
Core P#0
PU P#0
PU P#20
Core P#1
PU P#1
PU P#21
(...)
Core P#12
PU P#9
PU P#29
HostBridge P#0
PCIBridge
PCI 1000:005b
PCIBridge
PCI 15b3:1013
PCI 15b3:1013
Net "ens1f1"
PCIBridge
PCI 8086:1d6b
PCIBridge
PCI 8086:1521
Net "mgmt0"
PCI 8086:1521
Net "enp5s0f1"
PCI 8086:1521
Net "enp5s0f2"
PCI 8086:1521
Net "enp5s0f3"
PCIBridge
PCI 102b:0522
PCI 8086:1d00
Block(Disk) "sda"
PCI 8086:1d08
NUMANode P#1 (16GB)
Core P#0
PU P#10
PU P#30
Core P#1
PU P#11
PU P#31
(...)
Core P#12
PU P#19
PU P#39
HostBridge P#2
PCIBridge
PCI 8086:1583
PCI 8086:1583
PCIBridge
PCI 8086:1583
Net "ens4f0"
PCI 8086:1583
On this machine:
logical CPUs 0 to 9 (with their hyperthread siblings 20 to 29), and the ens1f1, mgmt0, enp5s0f1, enp5s0f2 and enp5s0f3 interfaces use NUMA node 0
logical CPUs 10 to 19 (with their hyperthread siblings 30 to 39), and the ens4f0 interface use NUMA node 1
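If hwloc is not installed, the CPU to NUMA node mapping can also be read from lscpu, which is part of util-linux. The sketch below filters its parsable output. The helper name cpus_on_node is hypothetical.

```shell
# Hypothetical helper: read `lscpu -p=CPU,NODE` output on stdin and print the
# logical CPUs that belong to the given NUMA node, separated by spaces.
cpus_on_node() {
    node=$1
    awk -F, -v node="$node" '
        /^#/ { next }                                   # skip the comment header
        $2 == node { out = out (out == "" ? "" : " ") $1 }
        END { print out }'
}

# Example use on the hypervisor:
#   lscpu -p=CPU,NODE | cpus_on_node 1
```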
Note
NUMA (Non-uniform memory access) is a memory design, in which a hardware resource can access local memory faster than non-local memory. The memory is organized into several NUMA nodes.
resource dedication¶
Now that you have identified your hardware, you can select which NICs and CPUs will be dedicated.
There are some constraints:
the first CPU is left for Linux
CPUs must be taken on the same node as NICs
crossing NUMA nodes costs performance, so all NICs should be taken on the same node
We recommend starting with a few CPUs, and adding more if needed once the setup is functional. The examples in this chapter use 3 virtual CPUs.
Isolating and configuring hardware resources¶
CPU isolation¶
The CPUs that will be dedicated to the Turbo CG-NAT VM need to be properly isolated from other processes. The most reliable way to achieve this is to isolate the CPUs at boot time, on the kernel command line, using the isolcpus and rcu_nocbs directives. For instance, adding isolcpus=1-12,29-40 rcu_nocbs=1-12,29-40 will isolate CPUs 1 to 12 and 29 to 40. It can be added to the kernel command line by doing:
# echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX isolcpus=1-12,29-40 rcu_nocbs=1-12,29-40"' >> /etc/default/grub
# update-grub2
# reboot
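CPU lists such as 1-12,29-40 are compact but easy to get wrong. The following sketch expands such a list into individual CPU numbers, so that it can be reviewed before it is put on the kernel command line. The helper name expand_cpulist is hypothetical.

```shell
# Hypothetical helper: expand a kernel CPU list such as "1-3,7" into
# individual CPU numbers, separated by spaces. The body runs in a subshell
# so that the IFS change does not leak into the caller.
expand_cpulist() (
    IFS=,
    out=""
    for range in $1; do
        first=${range%-*}        # "1-3" -> 1, "7" -> 7
        last=${range#*-}         # "1-3" -> 3, "7" -> 7
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            out="$out${out:+ }$cpu"
            cpu=$((cpu + 1))
        done
    done
    echo "$out"
)

expand_cpulist 1-12,29-40
```

After the reboot, the list of isolated CPUs reported by the kernel in /sys/devices/system/cpu/isolated can be compared with the expected one.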
CPU pinning¶
After the VM is created, you can use virsh vcpupin vm1 vm-cpu cpu to do the one-to-one pinning, using the isolated CPUs. The CPUs should be taken from the list of dedicated CPUs obtained in Identifying hardware resources. The setup is persistent.
For instance, the next commands will pin:
virtual CPU 0 to CPU 2,
virtual CPU 1 to CPU 10,
virtual CPU 2 to CPU 4
# virsh vcpupin vm1 0 2
# virsh vcpupin vm1 1 10
# virsh vcpupin vm1 2 4
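With more virtual CPUs, the pinning commands can be generated from the list of physical CPUs. This sketch only prints the commands, so that the mapping can be reviewed before it is applied. The helper name vcpupin_cmds is hypothetical.

```shell
# Hypothetical helper: print one `virsh vcpupin` command per physical CPU
# given, pinning virtual CPU 0 to the first one, virtual CPU 1 to the second
# one, and so on.
vcpupin_cmds() {
    vm=$1
    shift
    vcpu=0
    for cpu in "$@"; do
        echo "virsh vcpupin $vm $vcpu $cpu"
        vcpu=$((vcpu + 1))
    done
}

vcpupin_cmds vm1 2 10 4
```

Called with these arguments, it prints the three commands shown above.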
CPU configuration¶
The hypervisor CPUs have to be configured for several reasons.
To get stable performance, it is better to disable intel_pstate from the kernel command line:
# echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_pstate=disable"' >> /etc/default/grub
# update-grub2
# reboot
To get better performance, the CPUs should use the performance governor:
# cpupower set -b 0
# cpupower frequency-set -g performance
For persistent configuration, the previous commands can be added to a custom startup script.
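To verify that the governor was applied, the cpufreq entries in sysfs can be read back. This is a minimal sketch, assuming that the cpufreq driver exposes scaling_governor files. The helper name non_performance_cpus and its optional sysfs root argument are hypothetical.

```shell
# Hypothetical helper: print the CPUs whose cpufreq governor is not
# "performance". The optional argument overrides the sysfs root (default /sys).
non_performance_cpus() {
    sysroot=${1:-/sys}
    for file in "$sysroot"/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
        [ -r "$file" ] || continue               # no cpufreq entry for this CPU
        if [ "$(cat "$file")" != "performance" ]; then
            cpu=${file%/cpufreq/scaling_governor}
            echo "${cpu##*/}"                    # keep only the cpuN part
        fi
    done
}

# Example use on the hypervisor:
#   non_performance_cpus
```

An empty output means that every CPU with a cpufreq entry uses the performance governor.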
IRQ affinities configuration¶
Having IRQs triggered on the CPUs that are dedicated to the Turbo CG-NAT VM can cause occasional packet loss. If you don’t notice this problem during testing, you can skip this step.
To avoid this, first ensure that the irqbalance package is removed.
# apt-get remove -y irqbalance
or
# yum remove -y irqbalance
Then run this script:
for file in $(ls /proc/irq)
do
    if [ -f /proc/irq/$file/smp_affinity_list ]; then
        echo "irq: $file"
        echo 0-4,7 > /proc/irq/$file/smp_affinity_list
        mask=$(cat /proc/irq/$file/smp_affinity)
    fi
done
echo $mask > /proc/irq/default_smp_affinity
0-4,7 should be changed to the list of CPUs that are not dedicated to the Turbo CG-NAT VM.
For persistent configuration, the previous commands can be added to a custom startup script.
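The script above reads the hexadecimal mask back from smp_affinity after writing the CPU list. The relation between the two formats can be sketched as follows, which may help when checking the resulting mask by hand. The helper name cpulist_to_mask is hypothetical, and the sketch is limited to CPU numbers below 32.

```shell
# Hypothetical helper: convert a CPU list such as "0-4,7" into a hexadecimal
# bitmask where bit N is set for CPU N. Limited to CPU numbers below 32.
# The body runs in a subshell so that the IFS change does not leak.
cpulist_to_mask() (
    IFS=,
    mask=0
    for range in $1; do
        first=${range%-*}
        last=${range#*-}
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))
            cpu=$((cpu + 1))
        done
    done
    printf '%x\n' "$mask"
)

cpulist_to_mask 0-4,7
```

For the list 0-4,7 this prints 9f: bits 0 to 4 give 1f, and bit 7 adds 80.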