2.2.4. Install as a VM using KVM

This chapter explains how to start a VM using KVM.

First, you should have a look at the hypervisor prerequisites section.

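After the prerequisites are completed, you have two choices: start a VM with virtual NICs, or start a VM with physical NICs. Each choice is described in its own section below.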

Note

Most of this chapter was written for an Ubuntu 16.04 hypervisor. Other distributions should work as well, although some commands might differ.

Hypervisor prerequisites

We will not detail how to install a Linux distribution here. Once it is installed, some tasks must be completed to turn the distribution into a hypervisor.

  1. The kvm and kvm_intel modules must be loaded:

    # lsmod | grep kvm
    kvm_intel             172032  0
    kvm                   544768  1 kvm_intel
    
  2. qemu-kvm, libvirt and virt-install have to be installed:

    # apt-get install -y qemu-kvm
    # apt-get install -y virtinst libvirt-bin
    

    or

    # yum install -y qemu-kvm
    # yum install -y virt-install libvirt
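
The module check in step 1 can also be scripted. The following is a minimal sketch, assuming an Intel host as in the example above; check_kvm_modules is a hypothetical helper name, not a command shipped with any package.

```shell
#!/bin/sh
# check_kvm_modules: hypothetical helper, not part of any package.
# Reads `lsmod`-style output on stdin and succeeds only if both the
# kvm and kvm_intel module names appear in the first column.
check_kvm_modules() {
    awk '$1 == "kvm"       { core = 1 }
         $1 == "kvm_intel" { intel = 1 }
         END { exit !(core && intel) }'
}
```

On a hypervisor, it could be used as `lsmod | check_kvm_modules || echo "KVM modules missing"`.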
    

VM with virtual NICs

In this example, the VM will have three interfaces:

  • one management interface on the libvirt default virtual network using NAT forwarding,

  • two data plane interfaces on top of the host’s interfaces using bridged networking to connect the VM to the LAN.

See also

the libvirt networking documentation for more information about networking with KVM.

  1. On the host, set interfaces up.

    # ip link set eth1 up
    # ip link set eth2 up
    
  2. On the host, create two Linux bridges, each containing one physical interface.

    # brctl addbr br0
    # brctl addif br0 eth1
    # ip link set br0 up
    # brctl addbr br1
    # brctl addif br1 eth2
    # ip link set br1 up
    
  3. To boot Turbo CG-NAT in libvirt as a guest VM, use:

    # cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
    # virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
                   --os-type linux --cpu host --network=default,model=e1000 \
                   --ram 6144 --noautoconsole --import \
                   --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
                   --network bridge=br0,model=e1000 --network bridge=br1,model=e1000
    
  4. Connect to the VM:

    # virsh console vm1
    (...)
    Login:
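
On hosts where brctl (from the bridge-utils package) is not installed, the bridges from steps 1 and 2 can probably be created with iproute2 alone. The sketch below only prints the commands so that they can be reviewed first; bridge_cmds is a hypothetical helper name, and the bridge and interface names are the ones used in this example.

```shell
#!/bin/sh
# bridge_cmds BRIDGE IFACE: hypothetical helper that prints iproute2
# commands equivalent to the brctl steps above. Nothing is executed.
bridge_cmds() {
    br=$1
    iface=$2
    printf 'ip link set %s up\n' "$iface"
    printf 'ip link add name %s type bridge\n' "$br"
    printf 'ip link set %s master %s\n' "$iface" "$br"
    printf 'ip link set %s up\n' "$br"
}
```

Running `bridge_cmds br0 eth1 | sh` as root would then apply the printed commands for the first bridge.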
    

The next step is to perform your first configuration.

VM with physical NICs

This section details how to start Turbo CG-NAT with dedicated physical NICs.

Using dedicated NICs requires some work which is detailed in Hypervisor mandatory prerequisites.

Once the hypervisor is configured properly, two technologies are available:

  • whole NICs are dedicated to Turbo CG-NAT (see Passthrough mode): the configuration is simpler, but only one VM can use each NIC

  • portions of NICs are dedicated to Turbo CG-NAT (see SR-IOV mode): more VMs can run on the hypervisor

For production setups, see Optimize performance in virtual environment to get the best performance.

Hypervisor mandatory prerequisites

enable Intel VT-d

Intel VT-d stands for “Intel Virtualization Technology for Directed I/O”. It is needed to give a physical NIC to a VM. To enable it:

  • it usually has to be enabled from the BIOS. The name of this feature differs from one hardware platform to another, so check your hardware documentation to enable it.

  • it also has to be enabled in the kernel, by adding intel_iommu=on iommu=pt to the kernel command line.

To do so, run:

# echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_iommu=on iommu=pt"' \
 >> /etc/default/grub
# update-grub2
# reboot

After the reboot, check the kernel log to verify that Intel VT-d is properly enabled:

# dmesg | grep "Intel(R) Virtualization Technology for Directed I/O"
[    1.391229] DMAR: Intel(R) Virtualization Technology for Directed I/O
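
In addition to the dmesg check, the running kernel's command line can be inspected to confirm that the parameters were taken into account. This is a minimal sketch; cmdline_has is a hypothetical helper name.

```shell
#!/bin/sh
# cmdline_has PARAM: hypothetical helper. Reads a kernel command line on
# stdin and succeeds if PARAM is exactly one of its space-separated words.
cmdline_has() {
    tr ' ' '\n' | grep -qxF -- "$1"
}
```

For instance, `cmdline_has intel_iommu=on < /proc/cmdline && cmdline_has iommu=pt < /proc/cmdline` should succeed after the reboot.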

hugepages

For performance reasons, the memory used by the VMs that will harbor Turbo CG-NAT must be reserved in hugepages.

Note

A hugepage is a page that addresses more memory than the usual 4KB. Accessing a hugepage is more efficient than accessing a regular memory page. Its default size is 2MB.

hugeadm can be used to manage hugepages. It is part of the hugepages deb package and the libhugetlbfs-utils rpm package.

To see if your system already has hugepages available, and which sizes are supported, do:

# hugeadm --pool-list
      Size  Minimum  Current  Maximum  Default
   2097152        0        0        0        *
1073741824        0        0        0

On this system, 2MB and 1GB pages are supported.

If your hardware has several sockets, for performance reasons, the memory should be allocated on the same node as the interfaces that will be dedicated to the Turbo CG-NAT VM.

  1. numactl can show which memory node should be chosen for a particular interface. Look for membind in the following command output. In this example, the NIC is on memory node 1.

    # numactl -m netdev:ens4f0 --show
    policy: bind
    preferred node: 1
    physcpubind: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
    cpubind: 0 1
    nodebind: 0 1
    membind: 1
    
  2. Add six 1GB hugepages for one Turbo CG-NAT VM to NUMA node 1. You should add this command to a custom startup script to make it persistent.

    # echo 6 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
    
  3. Check that the pages were allocated:

    # hugeadm --pool-list
          Size  Minimum  Current  Maximum  Default
       2097152        0        0        0        *
    1073741824        6        6        6
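
The values used in the steps above (node 1, six pages) can be derived rather than typed by hand. The following sketch assumes that the VM memory is given in MiB, as for the --ram option of virt-install, and that the hugepage size is given in kB, as in the sysfs directory name. All three function names are hypothetical.

```shell
#!/bin/sh
# membind_node: reads `numactl --show` output on stdin and prints the
# first node listed on the membind line.
membind_node() {
    awk '$1 == "membind:" { print $2 }'
}

# hugepages_needed RAM_MIB PAGE_KB: number of pages of PAGE_KB kB needed
# to back RAM_MIB MiB of memory, rounded up.
hugepages_needed() {
    ram_kib=$(( $1 * 1024 ))
    echo $(( (ram_kib + $2 - 1) / $2 ))
}

# hugepage_sysfs_path NODE PAGE_KB: per-node sysfs file to write to.
hugepage_sysfs_path() {
    echo "/sys/devices/system/node/node${1}/hugepages/hugepages-${2}kB/nr_hugepages"
}
```

With the 6144 MiB VM of this chapter, `hugepages_needed 6144 1048576` gives the number of 1GB pages to write to the file printed by `hugepage_sysfs_path 1 1048576`.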
    

Passthrough mode

With this configuration, the Turbo CG-NAT VM will get dedicated interfaces.

The passthrough mode is only available if the hypervisor’s hardware supports Intel VT-d, and if it is enabled (see enable Intel VT-d).

  1. You must first find the PCI addresses of the interfaces that will be dedicated to the Turbo CG-NAT VM.

    # lspci |grep Ethernet
    03:00.0 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
    03:00.1 Ethernet controller: Intel Corporation Ethernet Connection X552/X557-AT 10GBASE-T
    05:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
    05:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter (rev 01)
    07:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    07:00.1 Ethernet controller: Intel Corporation I350 Gigabit Network Connection (rev 01)
    
  2. Then use virt-install to spawn the VM, specifying one host-device argument for each device that you want to dedicate. In this example, we dedicate 03:00.0 and 03:00.1.

    # cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
    # virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
                   --os-type linux --cpu host --network=default,model=e1000 \
                   --ram 6144 --noautoconsole \
                   --import --memorybacking hugepages=yes \
                   --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
                   --host-device 03:00.0 --host-device 03:00.1
    
  3. Connect to the VM:

    # virsh console vm1
    (...)
    Login:
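
When several ports are dedicated, the --host-device arguments can be generated from the lspci output instead of being typed one by one. This is a minimal sketch with hypothetical helper names; it assumes the lspci output format shown in step 1.

```shell
#!/bin/sh
# ethernet_pci_addrs: reads `lspci` output on stdin and prints the PCI
# address (first column) of every Ethernet controller.
ethernet_pci_addrs() {
    awk '/Ethernet controller/ { print $1 }'
}

# host_device_args ADDR...: prints one --host-device option per address.
host_device_args() {
    args=
    for addr in "$@"; do
        args="$args --host-device $addr"
    done
    echo "${args# }"
}
```

For the example above, `host_device_args 03:00.0 03:00.1` prints the last line of the virt-install command.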
    

To get the best performance, the VM CPUs should be associated to physical CPUs. This is called pinning, and is described in CPU pinning.

The next step is to perform your first configuration.

SR-IOV mode

SR-IOV enables an Ethernet port to appear as multiple, separate, physical devices called Virtual Functions (VF). You will need compatible hardware, and Intel VT-d configured. The traffic coming from each VF cannot be seen by the other VFs. The performance is almost as good as the performance in passthrough mode.

Being able to split an Ethernet port can increase the VM density on the hypervisor compared to passthrough mode.

In this configuration, the Turbo CG-NAT VM will get Virtual Functions.

  1. First check whether the network interface that you want to use supports SR-IOV, and how many VFs can be configured. Here we check the eno1 interface.

    # lspci -vvv -s $(ethtool -i eno1 | grep bus-info | awk -F': ' '{print $2}') | grep SR-IOV
             Capabilities: [160 v1] Single Root I/O Virtualization (SR-IOV)
    # lspci -vvv -s $(ethtool -i eno1 | grep bus-info | awk -F': ' '{print $2}') | grep VFs
                 Initial VFs: 64, Total VFs: 64, Number of VFs: 0, Function Dependency Link: 00
    
  2. Then add VFs, and check that those VFs were created. You should add this command to a custom startup script to make it persistent.

    # echo 2 > /sys/class/net/eno1/device/sriov_numvfs
    # lspci | grep Ethernet | grep Virtual
    03:10.0 Ethernet controller: Intel Corporation Ethernet Connection X552 Virtual Function
    03:10.2 Ethernet controller: Intel Corporation Ethernet Connection X552 Virtual Function
    
  3. You need to set eno1 up so that VFs are properly detected in the guest VM.

    # ip link set eno1 up
    
  4. Then use virt-install to spawn the VM, specifying one host-device argument for each VF that you want to give. In this example, we give the VF 03:10.0 to Turbo CG-NAT.

    # cp turbo-cgnat-ee.qcow2 /var/lib/libvirt/images/vm1.qcow2
    # virt-install --name vm1 --vcpus=3,sockets=1,cores=3,threads=1 \
                   --os-type linux --cpu host --network=default,model=e1000 \
                   --ram 6144 --noautoconsole --import \
                   --memorybacking hugepages=yes \
                   --disk /var/lib/libvirt/images/vm1.qcow2,device=disk,bus=virtio \
                   --host-device 03:10.0
    
  5. Connect to the VM:

    # virsh console vm1
    (...)
    Login:
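
Before writing to sriov_numvfs, it can help to check that the requested number of VFs does not exceed what the port supports. The sketch below parses the "Total VFs" value from the lspci output shown in step 1; the function names are hypothetical. On most recent kernels the same value should also be readable from /sys/class/net/eno1/device/sriov_totalvfs, but this is an assumption about your kernel, so verify it on your system.

```shell
#!/bin/sh
# total_vfs: reads `lspci -vvv` output on stdin and prints the value of
# the "Total VFs" field.
total_vfs() {
    sed -n 's/.*Total VFs: \([0-9][0-9]*\).*/\1/p'
}

# vfs_fit WANTED TOTAL: succeeds if WANTED VFs can be created on a port
# that supports TOTAL VFs.
vfs_fit() {
    [ "$1" -le "$2" ]
}
```

For the eno1 port above, `vfs_fit 2 64` succeeds, so creating two VFs is possible.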
    

To get the best performance, the VM CPUs should be associated to physical CPUs. This is called pinning, and is described in CPU pinning.

The next step is to perform your first configuration.

Optimize performance in virtual environment

To get good performance, Turbo CG-NAT needs dedicated resources. This includes:

  • NICs

  • CPUs

The first thing to do is to identify the resources that will be dedicated. This is described in the Identifying hardware resources section.

Then all the resources must be properly isolated and configured; see Isolating and configuring hardware resources.

Identifying hardware resources

resource inventory

Before identifying the resources that will be dedicated to the Turbo CG-NAT VM, you need to know which NICs and CPUs are available.

It can be done using lstopo, which is part of the hwloc package.

# lstopo -p --merge
Machine (31GB total)
  NUMANode P#0 (16GB)
    Core P#0
      PU P#0
      PU P#20
    Core P#1
      PU P#1
      PU P#21
(...)
    Core P#12
      PU P#9
      PU P#29
    HostBridge P#0
      PCIBridge
        PCI 1000:005b
      PCIBridge
        PCI 15b3:1013
        PCI 15b3:1013
          Net "ens1f1"
      PCIBridge
        PCI 8086:1d6b
      PCIBridge
        PCI 8086:1521
          Net "mgmt0"
        PCI 8086:1521
          Net "enp5s0f1"
        PCI 8086:1521
          Net "enp5s0f2"
        PCI 8086:1521
          Net "enp5s0f3"
      PCIBridge
        PCI 102b:0522
      PCI 8086:1d00
        Block(Disk) "sda"
      PCI 8086:1d08
  NUMANode P#1 (16GB)
    Core P#0
      PU P#10
      PU P#30
    Core P#1
      PU P#11
      PU P#31
(...)
    Core P#12
      PU P#19
      PU P#39
    HostBridge P#2
      PCIBridge
        PCI 8086:1583
        PCI 8086:1583
      PCIBridge
        PCI 8086:1583
          Net "ens4f0"
        PCI 8086:1583

On this machine:

  • logical CPUs 0 to 9 and 20 to 29, and the ens1f1, mgmt0, enp5s0f1, enp5s0f2, and enp5s0f3 interfaces use NUMA node 0

  • logical CPUs 10 to 19 and 30 to 39, and the ens4f0 interface use NUMA node 1
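
The summary above can be produced automatically from the lstopo output. The following sketch assumes the exact text layout shown in this section (NUMANode lines followed by their PU and Net lines, with P# physical indexes); other hwloc versions may print a different layout, so treat it as an illustration only. numa_inventory is a hypothetical helper name.

```shell
#!/bin/sh
# numa_inventory: reads `lstopo -p --merge` text output on stdin and
# prints, for each NUMA node, the logical CPUs and network interfaces
# listed under it.
numa_inventory() {
    awk '
        $1 == "NUMANode" { node = substr($2, 3); order[++n] = node }
        $1 == "PU"       { cpus[node] = cpus[node] " " substr($2, 3) }
        $1 == "Net"      { gsub(/"/, "", $2); nics[node] = nics[node] " " $2 }
        END {
            for (i = 1; i <= n; i++)
                printf "node %s: cpus%s; nics%s\n",
                       order[i], cpus[order[i]], nics[order[i]]
        }'
}
```

It would be used as `lstopo -p --merge | numa_inventory`.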

Note

NUMA (Non-uniform memory access) is a memory design, in which a hardware resource can access local memory faster than non-local memory. The memory is organized into several NUMA nodes.

resource dedication

Now that you have identified your hardware, you can select which NICs and CPUs will be dedicated.

There are some constraints:

  • the first CPU is left for Linux

  • CPUs must be taken on the same node as NICs

  • crossing NUMA nodes costs performance, so all NICs should be taken on the same node

We recommend starting with a few CPUs and adding more, if needed, once the setup is functional. The examples in this chapter use 3 virtual CPUs.

Isolating and configuring hardware resources

CPU isolation

The CPUs that will be dedicated to the Turbo CG-NAT VM need to be properly isolated from other processes. The most reliable way to achieve this is to isolate the CPUs at boot time, on the kernel command line, using the isolcpus and rcu_nocbs directives. For instance, adding isolcpus=1-12,29-40 rcu_nocbs=1-12,29-40 will isolate CPUs 1 to 12 and 29 to 40. It can be added to the kernel command line by doing:

# echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX isolcpus=1-12,29-40 rcu_nocbs=1-12,29-40"' >> /etc/default/grub
# update-grub2
# reboot
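
The isolcpus and rcu_nocbs directives take a compact list of ranges. When individual CPU numbers are needed, for example to choose CPUs for pinning, the list can be expanded as in the sketch below. expand_cpu_list is a hypothetical helper name, and only the simple "a-b,c" syntax used in this chapter is handled.

```shell
#!/bin/sh
# expand_cpu_list LIST: prints the CPUs of a list such as 1-3,7 as
# space-separated numbers. Only plain numbers and a-b ranges are handled.
expand_cpu_list() {
    out=
    for range in $(printf '%s' "$1" | tr ',' ' '); do
        first=${range%-*}
        last=${range#*-}
        i=$first
        while [ "$i" -le "$last" ]; do
            out="$out $i"
            i=$((i + 1))
        done
    done
    echo "${out# }"
}
```

After the reboot, the isolated list reported by the kernel can be expanded with `expand_cpu_list "$(cat /sys/devices/system/cpu/isolated)"`, assuming your kernel exposes that file.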

CPU pinning

After the VM is created, you can use virsh vcpupin vm1 vm-cpu cpu to pin each virtual CPU to one of the isolated CPUs. The CPUs should be taken from the list of dedicated CPUs obtained in Identifying hardware resources. The setup is persistent.

For instance, the next commands will pin:

  • virtual CPU 0 to physical CPU 2,

  • virtual CPU 1 to physical CPU 10,

  • virtual CPU 2 to physical CPU 4

# virsh vcpupin vm1 0 2
# virsh vcpupin vm1 1 10
# virsh vcpupin vm1 2 4
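
With more virtual CPUs, the pinning commands can be generated from the list of dedicated CPUs. The sketch below only prints the commands; vcpupin_cmds is a hypothetical helper name, and virtual CPUs are numbered from 0 in the order the physical CPUs are given.

```shell
#!/bin/sh
# vcpupin_cmds VM CPU...: prints one `virsh vcpupin` command per physical
# CPU, pinning virtual CPU 0 to the first CPU, 1 to the second, and so on.
vcpupin_cmds() {
    vm=$1
    shift
    vcpu=0
    for cpu in "$@"; do
        echo "virsh vcpupin $vm $vcpu $cpu"
        vcpu=$((vcpu + 1))
    done
}
```

`vcpupin_cmds vm1 2 10 4` prints the three commands of the example above; piping the output to sh would run them.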

CPU configuration

The hypervisor CPUs have to be configured for several reasons.

  1. To get stable performance, it is better to disable intel_pstate from the kernel command line:

    # echo 'GRUB_CMDLINE_LINUX="$GRUB_CMDLINE_LINUX intel_pstate=disable"' >> /etc/default/grub
    # update-grub2
    # reboot
    
  2. To get better performance, the CPUs should use the performance governor:

    # cpupower set -b 0
    # cpupower frequency-set -g performance
    

For persistent configuration, the previous commands can be added to a custom startup script.
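
To verify that the governor setting was applied, the cpufreq sysfs files can be read back. This sketch assumes the usual /sys/devices/system/cpu/cpuN/cpufreq/scaling_governor layout, which may be absent when no cpufreq driver is loaded. cpus_not_in_performance is a hypothetical helper name.

```shell
#!/bin/sh
# cpus_not_in_performance [BASE]: prints the name of every CPU under BASE
# (default /sys/devices/system/cpu) whose governor is not "performance".
cpus_not_in_performance() {
    base=${1:-/sys/devices/system/cpu}
    for f in "$base"/cpu[0-9]*/cpufreq/scaling_governor; do
        # Skip the unexpanded pattern when no CPU exposes cpufreq.
        [ -f "$f" ] || continue
        if [ "$(cat "$f")" != "performance" ]; then
            cpu=${f#"$base"/}
            echo "${cpu%%/*}"
        fi
    done
}
```

An empty output means that every CPU exposing a governor uses the performance one.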

IRQ affinities configuration

Having IRQs triggered on the CPUs that are dedicated to the Turbo CG-NAT VM can cause occasional packet loss. If you don’t notice this problem during testing, you can skip this step.

  1. To keep IRQs away from the dedicated CPUs, first ensure that the irqbalance package is removed.

    # apt-get remove -y irqbalance
    

    or

    # yum remove -y irqbalance
    
  2. Then run this script:

    for dir in /proc/irq/*
    do
       if [ -f "$dir/smp_affinity_list" ]; then
          echo "irq: ${dir##*/}"
          echo 0-4,7 > "$dir/smp_affinity_list"
          # remember the resulting hexadecimal mask
          mask=$(cat "$dir/smp_affinity")
       fi
    done
    # apply the same mask to IRQs that are registered later
    echo "$mask" > /proc/irq/default_smp_affinity
    

0-4,7 should be changed to the list of CPUs that are not dedicated to the Turbo CG-NAT VM.

For persistent configuration, the previous commands can be added to a custom startup script.
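
The list written to smp_affinity_list is the complement of the dedicated CPUs. A minimal sketch to compute it is given below; non_dedicated_cpus is a hypothetical helper name, and the output is a plain comma-separated list rather than ranges.

```shell
#!/bin/sh
# non_dedicated_cpus TOTAL CPU...: prints, as a comma-separated list, the
# CPUs from 0 to TOTAL-1 that are not among the dedicated CPUs given.
non_dedicated_cpus() {
    total=$1
    shift
    out=
    cpu=0
    while [ "$cpu" -lt "$total" ]; do
        keep=1
        for d in "$@"; do
            if [ "$cpu" -eq "$d" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="$out,$cpu"
        fi
        cpu=$((cpu + 1))
    done
    echo "${out#,}"
}
```

On an 8-CPU host where CPUs 5 and 6 are dedicated, `non_dedicated_cpus 8 5 6` prints 0,1,2,3,4,7, which is equivalent to the 0-4,7 list used in the script above.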