4.2.1. Startup Issues

Turbo CG-NAT cannot start

Symptoms
  • systemctl status turbo reports an error or a failed state

Hints
  • On Intel and Arm, check that the configuration file is correct: review the output of fast-path.sh config to make sure the settings are relevant, and check the syntax of the configuration file with fast-path.sh config -c. Follow any advice about deprecated options, as they may cause problems in later versions, and take the WARNING lines in the output into account.

  • If you tried running the fast path and it crashed or failed along the way, some “runtime-only” files may be left behind. Run fast-path.sh stop before trying to start the fast path again.

  • Look for error messages either on the console or in the logs. See rsyslog and journalctl sections for details regarding what can be found in the logs.

  • Executable paths may change between Turbo CG-NAT versions. Some shells (bash, for example) cache executable paths, so if some commands are not found after upgrading Turbo CG-NAT, start a new shell.
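The configuration hints above can be combined into a quick, read-only pre-start check. This is a minimal sketch for a POSIX shell: the helper name print_warnings is hypothetical, and it assumes that warnings in the fast-path.sh output contain the literal string WARNING.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print only the lines of its
# output (stdout and stderr) that contain "WARNING".
print_warnings() {
    "$@" 2>&1 | grep 'WARNING' || true
}

if command -v fast-path.sh >/dev/null 2>&1; then
    # Check the configuration file syntax and keep only the warnings.
    print_warnings fast-path.sh config -c
else
    echo "fast-path.sh not found in PATH" >&2
fi
```

For the executable path cache, running hash -r in the affected interactive shell (a POSIX builtin, also available in bash) clears the cached locations without starting a new shell; running it inside a script only affects the script's own shell.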

Hugepages fragmentation

Symptoms
  • One of the following messages appears on the console or in the logs:

    No more huge pages left for fastpath initialization
    
    EAL: Not enough memory available! Requested: <X>MB, available: <Y smaller than X>MB
    PANIC in rte_eal_init(): Cannot init memory
    
    EAL: rte_eal_common_log_init(): cannot create log_history mempool
    PANIC in rte_eal_init():
    Cannot init logs
    
    Not enough physically contiguous memory to allocate the mbuf pool on this socket (0): max_seg_size=178257920, total_mem=459276288, nb_seg=35
    Increase the number of huge pages, use larger huge pages, or reboot the machine
    PANIC in fpn_socket_mbufpool_create():
    Cannot create mbuf pool for socket 0
    
Hints
  • The fast path could not obtain enough memory from the available hugepages.

  • Add more memory.

  • Check the output from /proc/meminfo, especially the MemFree and HugePages_Free fields. See the meminfo section for details.

    MemFree

    indicates how much memory is available for the fast path shared memory.

    HugePages_Free

    indicates how many huge pages are available for use by the fast path.

    Beware: DPDK requires physically contiguous memory, so if the hugepages are fragmented, allocate more of them or reboot the machine.
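The relevant /proc/meminfo fields can be summarized with a short awk script. This is a minimal sketch: the function name meminfo_summary is hypothetical, and Hugepagesize only reports the default hugepage size. The free hugepage total it prints says nothing about fragmentation; if it is at least the Requested value from the EAL message and startup still fails, fragmentation is a likely cause.

```shell
#!/bin/sh
# Summarize the /proc/meminfo fields relevant to fast path startup.
# Usage: meminfo_summary [FILE]   (defaults to /proc/meminfo)
meminfo_summary() {
    awk '
        /^MemFree:/         { memfree_kb = $2 }
        /^HugePages_Total:/ { hp_total = $2 }
        /^HugePages_Free:/  { hp_free = $2 }
        /^Hugepagesize:/    { hp_size_kb = $2 }
        END {
            printf "MemFree: %d MB\n", memfree_kb / 1024
            printf "HugePages_Free: %d of %d (page size %d kB)\n", hp_free, hp_total, hp_size_kb
            # Total free hugepage memory; contiguity is not visible here.
            printf "Free hugepage memory: %d MB\n", hp_free * hp_size_kb / 1024
        }
    ' "${1:-/proc/meminfo}"
}

if [ -r /proc/meminfo ]; then
    meminfo_summary
fi
```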

Not enough memory

Symptoms
  • The following message appears on the console or in the logs (and subsequent commands fail with similar messages):

    /usr/bin/fast-path.sh: 435: /usr/bin/fast-path.sh: Cannot fork
    /usr/bin/fast-path.sh: 668: /usr/bin/fast-path.sh: Cannot fork
    
  • The following message appears on the console or in the logs:

    ...
    EAL:   PCI memory mapped at 0x7ffae4a40000
    PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
    Using fpn_port 0x7ffae654c000 size=150576 (0M)
    Killed
    //usr/bin/fast-path.sh: error starting //usr/bin//fp-rte. Check logs for details.
    

    At this point, the machine may hang. After rebooting, check the logs, especially for messages similar to:

    ...
    fp-rte[5113]: Using fp_ebtables_vr_shared=0x7ffae63c2000 size=4352 (0M)
    fp-rte[5113]: Using fp-tc-shared=0x7ffad976f000 size=524608 (0M)
    kernel: [ 1022.485264] fp-rte invoked oom-killer: gfp_mask=0x2d2, order=0, oom_score_adj=0
    kernel: [ 1022.485271] fp-rte cpuset=/ mems_allowed=0
    

    Note

    Look for error messages either on the console or in the logs. See rsyslog and journalctl sections for details regarding what can be found in the logs.

Hints
  • The system ran out of memory, and the kernel OOM killer terminated the fast path process. Typically, the hugepages were allocated first, then the fast path tried to allocate additional memory and not enough was left free.

  • Add more memory.

  • Check the output from /proc/meminfo, especially the MemFree field. See the meminfo section for details.

    MemFree

    gives an estimate of how much memory is free before the fast path starts.
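To confirm that the OOM killer stopped the fast path, search the kernel messages of the failed boot. A minimal sketch: the function name find_fp_oom is hypothetical, the first pattern matches the oom-killer line quoted above, and the second assumes the usual kernel wording Killed process <pid> (<name>), which can vary between kernel versions.

```shell
#!/bin/sh
# Print kernel log lines showing that the OOM killer was invoked by,
# or killed, the fp-rte process.
# Usage: find_fp_oom [LOGFILE]   (reads standard input if no file is given)
find_fp_oom() {
    pattern='fp-rte invoked oom-killer|Killed process [0-9]+ \(fp-rte\)'
    if [ $# -gt 0 ]; then
        grep -E "$pattern" "$1"
    else
        grep -E "$pattern"
    fi
}
```

After a reboot, with a persistent journal, journalctl -k -b -1 | find_fp_oom searches the kernel messages of the previous boot; with rsyslog, pass the kernel log file as the argument instead.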

1G hugepages problems

Symptoms
  • The following message appears on the console or in the logs:

    sh: echo: I/O error
    WARNING: Can not allocate 1 hugepages for fast path
             0 pages of size 1024 MB were allocated
    
Hints
  • You probably enabled 1G hugepage support on the kernel boot command line (hugepagesz=1G default_hugepagesz=1G), but the fast path startup script failed to allocate the required number of hugepages.
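To see how many 1G pages the kernel actually reserved, read the standard sysfs counters under /sys/kernel/mm/hugepages/hugepages-1048576kB/. This is a minimal sketch: the function name show_1g_hugepages is hypothetical, and it takes the sysfs directory as an optional argument so it can be tried on a copy. Reserving 1G pages at boot time (for example with hugepages=N after hugepagesz=1G on the kernel command line) is generally more reliable than at runtime, since gigantic pages need large contiguous physical regions that become scarce once memory is fragmented.

```shell
#!/bin/sh
# Report the 1G hugepage counters exposed by the kernel in sysfs.
# Usage: show_1g_hugepages [DIR]
show_1g_hugepages() {
    dir=${1:-/sys/kernel/mm/hugepages/hugepages-1048576kB}
    if [ ! -d "$dir" ]; then
        echo "1G hugepages not supported or not enabled: $dir missing"
        return 1
    fi
    echo "nr_hugepages=$(cat "$dir/nr_hugepages") free_hugepages=$(cat "$dir/free_hugepages")"
}

show_1g_hugepages || true

if [ -r /proc/cmdline ]; then
    # Hugepage-related boot parameters, if any.
    tr ' ' '\n' < /proc/cmdline | grep -E '^(hugepages|hugepagesz|default_hugepagesz)=' || true
fi
```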

OVA startup fails

Symptoms
  • With VMware 6.0 and the vSphere desktop client, deploying the Turbo CG-NAT VM from the OVA file fails with the following message:

    The OVF package is invalid and cannot be deployed.
    
Hints
  • Use the vSphere HTML5 client (the desktop client is deprecated).

  • Repackage the OVA file with SHA1 hashing instead of the newer SHA256, using ovftool (available at https://www.vmware.com/support/developer/ovf/):

    #  ovftool --shaAlgorithm=SHA1 /path/to/original/file.ova /path/to/new/file-sha1.ova
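An OVA file is a tar archive whose manifest (.mf) lists one digest per file, in the form SHA256(name.ovf)= <hex digest>. The sketch below prints the algorithm used by the first manifest entry, so you can tell whether a file still needs repackaging. It assumes a tar that supports -t and -O (GNU tar and bsdtar both do); the function name ova_manifest_algo is hypothetical.

```shell
#!/bin/sh
# Print the digest algorithm (for example SHA1 or SHA256) used in an OVA manifest.
# Usage: ova_manifest_algo FILE.ova
ova_manifest_algo() {
    mf=$(tar -tf "$1" | grep '\.mf$' | head -n 1)
    if [ -z "$mf" ]; then
        echo "no manifest found in $1" >&2
        return 1
    fi
    # Manifest lines look like: SHA256(file.ovf)= <hex digest>
    # Keep only the text before the first "(".
    tar -xOf "$1" "$mf" | head -n 1 | sed 's/(.*//'
}
```

For example, ova_manifest_algo /path/to/new/file-sha1.ova should report SHA1 after a successful conversion.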
    

SR-IOV problems

Symptoms
  • Starting a VM with libvirt fails when its configuration includes PCI passthrough, with the following error:

    error: unsupported configuration: host doesn't support passthrough of host PCI devices
    

    The libvirt domain XML contains something like:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
      </source>
    </hostdev>
    
Hints
  • Both the NIC and the motherboard must support SR-IOV, and the Linux kernel must be booted with the appropriate options: enable the Directed I/O (VT-d) setting in the BIOS, and make sure “intel_iommu=on” is on the kernel command line.
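The kernel-side prerequisites can be checked on a running host. A minimal sketch, assuming an Intel platform (AMD systems use different IOMMU options, which this does not check); the function name check_iommu is hypothetical. A non-empty /sys/kernel/iommu_groups directory indicates an active IOMMU, which in turn suggests VT-d is enabled in the BIOS. SR-IOV support of the NIC and motherboard is not checked.

```shell
#!/bin/sh
# Check the IOMMU prerequisites for PCI passthrough.
# Usage: check_iommu [CMDLINE_FILE] [IOMMU_GROUPS_DIR]
check_iommu() {
    cmdline=${1:-/proc/cmdline}
    groups=${2:-/sys/kernel/iommu_groups}
    status=0
    if grep -qw 'intel_iommu=on' "$cmdline"; then
        echo "intel_iommu=on: present"
    else
        echo "intel_iommu=on: MISSING from kernel command line"
        status=1
    fi
    # With a working IOMMU, the kernel creates one directory per IOMMU group.
    if [ -n "$(ls -A "$groups" 2>/dev/null)" ]; then
        echo "IOMMU groups: found"
    else
        echo "IOMMU groups: none (IOMMU disabled, or VT-d off in the BIOS?)"
        status=1
    fi
    return "$status"
}

check_iommu || true
```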

Turbo CG-NAT hangs when starting with i40e devices

Symptoms
  • Starting Turbo CG-NAT with i40e devices in a VM hangs. The logs show:

    Jun 22 22:15:07 dut-vm fp-rte[14244]: /usr/bin/fp-rte  --huge-dir=/dev/hugepages -n 4 -l 4-39 --socket-mem 2292 -d librte_ext_crypto_multibuffer.so -w 0000:00:04.0 -w 0000:00:05.0 -w 0000:00:06.0 --  -t c4=0/c5=0/c6=0/c7=0/c8=0/c9=0/c10=0/c11=0/c12=0/c13=0/c14=0/c15=0/c16=0/c17=0/c18=0/c19=0/c20=0/c21=0/c22=1/c23=1/c24=1/c25=1/c26=1/c27=1/c28=1/c29=1/c30=1/c31=1/c32=1/c33=1/c34=1/c35=1/c36=1/c37=1/c38=1/c39=1 --nb-mbuf 262144 -- --max-vr=16
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Detected 40 lcore(s)
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Detected 1 NUMA nodes
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Some devices want iova as va but pa will be used because.. EAL: vfio-noiommu mode configured
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: No free hugepages reported in hugepages-1048576kB
    Jun 22 22:15:07 dut-vm fp-rte[14244]: Based on DPDK 18.05.0-6WIND.0
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Probing VFIO support...
    Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: VFIO support initialized
    Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
    Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: PCI device 0000:00:04.0 on NUMA socket -1
    Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL:   Invalid NUMA socket, default to 0
    Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL:   probe driver: 8086:1583 net_i40e
    Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL:   using IOMMU type 8 (No-IOMMU)
    
Hints
  • The i40e hardware has a known issue with INTx interrupts. A workaround in the vfio-pci kernel driver hides INTx support and forces a fallback to MSI-X. The workaround must be applied on both the VM side and the hypervisor side. Upstream patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=450744051d20

    To check for this issue, look at the kernel logs. Without the patch, an interrupt is left unhandled (orphaned):

    [  219.768519] i40e 0000:83:00.0: i40e_ptp_stop: removed PHC on ens260f0
    [  223.302710] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x19@0x1d0
    [  224.810517] vfio_bar_restore: 0000:83:00.0 reset recovery - restoring bars
    [  227.330187] irq 47: nobody cared (try booting with the "irqpoll" option)
    [  227.330195] CPU: 22 PID: 0 Comm: swapper/22 Not tainted 4.4.0-127-generic #153-Ubuntu
    [  227.330197] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
    [  227.330199]  0000000000000086 754d6f2e166f11f1 ffff88086de03e60 ffffffff814001c3
    [  227.330203]  ffff880864613e00 ffff880864613ed4 ffff88086de03e88 ffffffff810e0c33
    [  227.330205]  ffff880864613e00 0000000000000000 000000000000002f ffff88086de03ec0
    [  227.330208] Call Trace:
    [  227.330210]  <IRQ>  [<ffffffff814001c3>] dump_stack+0x63/0x90
    [  227.330225]  [<ffffffff810e0c33>] __report_bad_irq+0x33/0xc0
    [  227.330228]  [<ffffffff810e0fc7>] note_interrupt+0x247/0x290
    [  227.330232]  [<ffffffff810de0b2>] handle_irq_event_percpu+0x172/0x1e0
    [  227.330234]  [<ffffffff810de15e>] handle_irq_event+0x3e/0x60
    [  227.330237]  [<ffffffff810e154c>] handle_fasteoi_irq+0x9c/0x160
    [  227.330243]  [<ffffffff810311f3>] handle_irq+0x23/0x30
    [  227.330249]  [<ffffffff8185419b>] do_IRQ+0x4b/0xe0
    [  227.330252]  [<ffffffff8185187f>] common_interrupt+0xbf/0xbf
    [  227.330253]  <EOI>  [<ffffffff816e06b7>] ? cpuidle_enter_state+0x157/0x2d0
    [  227.330261]  [<ffffffff816e0867>] cpuidle_enter+0x17/0x20
    [  227.330265]  [<ffffffff810c72b2>] call_cpuidle+0x32/0x60
    [  227.330267]  [<ffffffff816e0849>] ? cpuidle_select+0x19/0x20
    [  227.330269]  [<ffffffff810c7576>] cpu_startup_entry+0x296/0x360
    [  227.330275]  [<ffffffff81052b02>] start_secondary+0x172/0x1b0
    [  227.330276] handlers:
    [  227.330282] [<ffffffffc01d0230>] vfio_intx_handler [vfio_pci]
    [  227.330284] Disabling IRQ #47
    

    With the patch, vfio-pci reports that it masks the broken INTx support:

    [  215.389554] i40e 0000:83:00.0: i40e_ptp_stop: removed PHC on ens260f0
    [  224.501452] vfio-pci 0000:83:00.0: Masking broken INTx support
    [  224.501522] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x19@0x1d0
    [  226.191488] vfio_bar_restore: 0000:83:00.0 reset recovery - restoring bars
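The two kernel log patterns above can be searched automatically. A minimal sketch, assuming the kernel messages are available through dmesg or a log file; the function name check_intx_workaround is hypothetical, and it matches only the exact messages quoted above. Run it on both the hypervisor and the VM, since both need the patch, and only after the device has been bound to vfio-pci at least once.

```shell
#!/bin/sh
# Classify kernel log output with respect to the vfio-pci INTx workaround.
# Usage: check_intx_workaround [LOGFILE]   (reads standard input by default)
check_intx_workaround() {
    log=$(if [ $# -gt 0 ]; then cat "$1"; else cat; fi)
    if printf '%s\n' "$log" | grep -q 'Masking broken INTx support'; then
        echo "patched: vfio-pci masks broken INTx support"
    elif printf '%s\n' "$log" | grep -q 'nobody cared'; then
        echo "unpatched: orphaned interrupt detected"
    else
        echo "inconclusive: no matching messages"
    fi
}
```

For example: dmesg | check_intx_workaround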