4.2.1. Startup Issues
Turbo CG-NAT cannot start
- Symptoms
The output of systemctl status turbo shows issues.
- Hints
On Intel and Arm, check that the configuration file is correct: review the output of fast-path.sh config for relevance, and check the syntax of the configuration file with fast-path.sh config -c. Follow the advice regarding deprecated options, as they may become problematic in later versions, and take the WARNINGs in the output into account.
If you tried running the fast path and it crashed or failed along the way, some “runtime-only” files may have been left behind. Make sure to call fast-path.sh stop before trying to start the fast path again.
Look for error messages either on the console or in the logs. See the rsyslog and journalctl sections for details regarding what can be found in the logs.
Executable paths may change between two Turbo CG-NAT versions. Some shells (bash for example) keep a cache of the executable paths. After upgrading Turbo CG-NAT, if some commands are not found, you may need to start a new shell.
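Starting a new shell is one way to drop the cached executable paths. As a minimal sketch, assuming bash or another POSIX shell that provides the hash builtin, the cache can also be cleared in the current shell:

```shell
# Forget all remembered command locations in the current shell,
# so that the next command lookup searches PATH again.
hash -r
```

This only affects the shell in which it is run; other open shells keep their own cache.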
Hugepages fragmentation
- Symptoms
One of the following messages appears on the console or in the logs:
No more huge pages left for fastpath initialization
EAL: Not enough memory available! Requested: <X>MB, available: <Y smaller than X>MB
PANIC in rte_eal_init(): Cannot init memory
EAL: rte_eal_common_log_init(): cannot create log_history mempool
PANIC in rte_eal_init(): Cannot init logs
Not enough physically contiguous memory to allocate the mbuf pool on this socket (0): max_seg_size=178257920, total_mem=459276288, nb_seg=35
Increase the number of huge pages, use larger huge pages, or reboot the machine
PANIC in fpn_socket_mbufpool_create(): Cannot create mbuf pool for socket 0
- Hints
There is a problem with the available memory.
Add more memory.
Check the output from /proc/meminfo, especially the MemFree and HugePages_Free fields. See the meminfo section for details.
MemFree gives an indication of how much memory you may use for the fast path shared memory.
HugePages_Free indicates how many huge pages are available for use by the fast path.
Beware: if hugepages are fragmented, you need to allocate more or simply reboot, as the DPDK requires physically contiguous memory.
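The check of /proc/meminfo described above can be sketched as two small shell helpers. The function names are hypothetical and not part of Turbo CG-NAT. The sketch assumes the standard Linux meminfo layout, in which the HugePages_* counters describe only the default hugepage size reported by Hugepagesize.

```shell
# Print the memory fields that matter for fast path startup.
# $1: meminfo-style file to read (defaults to /proc/meminfo)
meminfo_summary() {
    awk '/^(MemFree|HugePages_Total|HugePages_Free|Hugepagesize):/ { print }' \
        "${1:-/proc/meminfo}"
}

# Print the amount of free hugepage memory in MB:
# HugePages_Free (a page count) multiplied by Hugepagesize (in kB).
# $1: meminfo-style file to read (defaults to /proc/meminfo)
hugepages_free_mb() {
    awk '/^HugePages_Free:/ { pages = $2 }
         /^Hugepagesize:/   { size_kb = $2 }
         END { printf "%d\n", pages * size_kb / 1024 }' \
        "${1:-/proc/meminfo}"
}
```

Calling meminfo_summary and hugepages_free_mb without arguments reads the live /proc/meminfo. A figure lower than the amount requested in the EAL message suggests that more hugepages are needed. A sufficient figure together with the "physically contiguous" message points to fragmentation instead.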
Not enough memory
- Symptoms
The following message appears on the console or in the logs (and subsequent commands fail with similar messages):
/usr/bin/fast-path.sh: 435: /usr/bin/fast-path.sh: Cannot fork
/usr/bin/fast-path.sh: 668: /usr/bin/fast-path.sh: Cannot fork
The following message appears on the console or in the logs:
...
EAL: PCI memory mapped at 0x7ffae4a40000
PMD: eth_em_dev_init(): port_id 2 vendorID=0x8086 deviceID=0x100e
Using fpn_port 0x7ffae654c000 size=150576 (0M)
Killed
//usr/bin/fast-path.sh: error starting //usr/bin//fp-rte. Check logs for details.
At this point, the machine may have hung. Check the logs after reboot, especially if they contain something similar to:
...
fp-rte[5113]: Using fp_ebtables_vr_shared=0x7ffae63c2000 size=4352 (0M)
fp-rte[5113]: Using fp-tc-shared=0x7ffad976f000 size=524608 (0M)
kernel: [ 1022.485264] fp-rte invoked oom-killer: gfp_mask=0x2d2, order=0, oom_score_adj=0
kernel: [ 1022.485271] fp-rte cpuset=/ mems_allowed=0
Note
Look for error messages either on the console or in the logs. See rsyslog and journalctl sections for details regarding what can be found in the logs.
- Hints
There is a problem with the available memory: the fast path process was killed because the available memory became too low. Typically, after allocating hugepages, the fast path tried to allocate more memory and not enough was free.
Add more memory.
Check the output from /proc/meminfo, especially the MemFree field. See the meminfo section for details.
MemFree estimates how much memory is free before starting the fast path.
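To confirm that the kernel OOM killer was involved, the logs can be filtered for its messages. The helper below is a minimal sketch with a hypothetical name. It matches the "invoked oom-killer" line shown above, plus the "Out of memory: Kill..." line that Linux kernels commonly print alongside it (an assumption about the running kernel, not something taken from this guide).

```shell
# Read log lines on stdin and print those showing that the OOM killer ran.
# Returns non-zero when no such line is found.
find_oom_events() {
    grep -E 'invoked oom-killer|Out of memory: Kill'
}
```

For example, journalctl -k -b -1 | find_oom_events searches the kernel messages of the previous boot, provided the journal is persistent. dmesg | find_oom_events does the same for the current boot.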
1G hugepages problems
- Symptoms
The following message appears on the console or in the logs:
sh: echo: I/O error
WARNING: Can not allocate 1 hugepages for fast path
0 pages of size 1024 MB were allocated
- Hints
It seems you enabled support for 1G hugepages on the kernel boot command line (hugepagesz=1G default_hugepagesz=1G). The fast path startup script failed to allocate the required number of hugepages.
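The state of the 1G hugepage pool can be read from sysfs. The sketch below uses a hypothetical helper name and relies on the standard Linux layout under /sys/kernel/mm/hugepages. As general Linux behavior (not stated in this guide), allocating 1G pages at runtime often fails once memory is fragmented, which is why they are usually reserved at boot by adding hugepages=<count> next to hugepagesz=1G on the kernel command line.

```shell
# Print "<total> <free>" for the 1G hugepage pool.
# $1: hugepages sysfs directory (defaults to /sys/kernel/mm/hugepages)
hugepages_1g_status() {
    dir="${1:-/sys/kernel/mm/hugepages}/hugepages-1048576kB"
    if [ ! -d "$dir" ]; then
        echo "1G hugepages are not supported or not enabled" >&2
        return 1
    fi
    echo "$(cat "$dir/nr_hugepages") $(cat "$dir/free_hugepages")"
}
```

An output of 0 0 matches the "0 pages of size 1024 MB were allocated" message above.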
OVA startup fails
- Symptoms
With VMware 6.0 and the vSphere desktop client, starting a Turbo CG-NAT VM from an OVA file fails with the following message:
The OVF package is invalid and cannot be deployed.
- Hints
Use the vSphere HTML5 client (the desktop client is deprecated).
Repackage the OVA file with ovftool (available at https://www.vmware.com/support/developer/ovf/) so that it uses SHA1 hashing instead of the newer SHA256:
# ovftool --shaAlgorithm=SHA1 /path/to/original/file.ova /path/to/new/file-sha1.ova
SR-IOV problems
- Symptoms
Starting a VM with libvirt (with PCI passthrough in its configuration) fails with:
error: unsupported configuration: host doesn't support passthrough of host PCI devices
Your libvirt domain XML contains something like this:
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x0000' bus='0x83' slot='0x00' function='0x0'/>
  </source>
</hostdev>
- Hints
Your NIC and your motherboard must support SR-IOV, and the Linux kernel must have booted with appropriate options. Enable the Directed I/O parameter in the BIOS, and ensure “intel_iommu=on” is provided in the kernel command line.
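The kernel side of these requirements can be checked from a shell on the hypervisor. This is a minimal sketch with hypothetical helper names. It assumes the standard Linux locations /proc/cmdline and /sys/kernel/iommu_groups; the latter is only populated when the kernel has set up an IOMMU.

```shell
# Succeed if the kernel command line contains intel_iommu=on.
# $1: file holding the command line (defaults to /proc/cmdline)
cmdline_has_intel_iommu() {
    tr ' ' '\n' < "${1:-/proc/cmdline}" | grep -qx 'intel_iommu=on'
}

# Print the number of IOMMU groups; 0 means no IOMMU is active.
# $1: IOMMU groups directory (defaults to /sys/kernel/iommu_groups)
iommu_group_count() {
    ls "${1:-/sys/kernel/iommu_groups}" 2>/dev/null | wc -l | tr -d ' '
}
```

If cmdline_has_intel_iommu succeeds but iommu_group_count prints 0, the Directed I/O setting in the BIOS is the next thing to check.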
Turbo CG-NAT hangs when starting with i40e devices
- Symptoms
Starting Turbo CG-NAT with i40e devices in a VM hangs. Looking at the logs:
Jun 22 22:15:07 dut-vm fp-rte[14244]: /usr/bin/fp-rte --huge-dir=/dev/hugepages -n 4 -l 4-39 --socket-mem 2292 -d librte_ext_crypto_multibuffer.so -w 0000:00:04.0 -w 0000:00:05.0 -w 0000:00:06.0 -- -t c4=0/c5=0/c6=0/c7=0/c8=0/c9=0/c10=0/c11=0/c12=0/c13=0/c14=0/c15=0/c16=0/c17=0/c18=0/c19=0/c20=0/c21=0/c22=1/c23=1/c24=1/c25=1/c26=1/c27=1/c28=1/c29=1/c30=1/c31=1/c32=1/c33=1/c34=1/c35=1/c36=1/c37=1/c38=1/c39=1 --nb-mbuf 262144 -- --max-vr=16
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Detected 40 lcore(s)
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Detected 1 NUMA nodes
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Some devices want iova as va but pa will be used because.. EAL: vfio-noiommu mode configured
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: No free hugepages reported in hugepages-1048576kB
Jun 22 22:15:07 dut-vm fp-rte[14244]: Based on DPDK 18.05.0-6WIND.0
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: Probing VFIO support...
Jun 22 22:15:07 dut-vm fp-rte[14244]: EAL: VFIO support initialized
Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: WARNING: cpu flags constant_tsc=yes nonstop_tsc=no -> using unreliable clock cycles !
Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: PCI device 0000:00:04.0 on NUMA socket -1
Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: Invalid NUMA socket, default to 0
Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: probe driver: 8086:1583 net_i40e
Jun 22 22:15:08 dut-vm fp-rte[14244]: EAL: using IOMMU type 8 (No-IOMMU)
- Hints
The i40e hardware has a known issue with regard to INTx interrupts. A workaround has been implemented in the vfio-pci kernel driver to hide INTx support and force a fallback to MSI-X. The workaround must be applied on both the VM side and the hypervisor side. Upstream patch: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=450744051d20
This issue can be checked by looking at the kernel logs. Without the patch, an interrupt is left with no handler claiming it:
[ 219.768519] i40e 0000:83:00.0: i40e_ptp_stop: removed PHC on ens260f0
[ 223.302710] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x19@0x1d0
[ 224.810517] vfio_bar_restore: 0000:83:00.0 reset recovery - restoring bars
[ 227.330187] irq 47: nobody cared (try booting with the "irqpoll" option)
[ 227.330195] CPU: 22 PID: 0 Comm: swapper/22 Not tainted 4.4.0-127-generic #153-Ubuntu
[ 227.330197] Hardware name: Intel Corporation S2600CWR/S2600CWR, BIOS SE5C610.86B.01.01.0019.101220160604 10/12/2016
[ 227.330199] 0000000000000086 754d6f2e166f11f1 ffff88086de03e60 ffffffff814001c3
[ 227.330203] ffff880864613e00 ffff880864613ed4 ffff88086de03e88 ffffffff810e0c33
[ 227.330205] ffff880864613e00 0000000000000000 000000000000002f ffff88086de03ec0
[ 227.330208] Call Trace:
[ 227.330210] <IRQ> [<ffffffff814001c3>] dump_stack+0x63/0x90
[ 227.330225] [<ffffffff810e0c33>] __report_bad_irq+0x33/0xc0
[ 227.330228] [<ffffffff810e0fc7>] note_interrupt+0x247/0x290
[ 227.330232] [<ffffffff810de0b2>] handle_irq_event_percpu+0x172/0x1e0
[ 227.330234] [<ffffffff810de15e>] handle_irq_event+0x3e/0x60
[ 227.330237] [<ffffffff810e154c>] handle_fasteoi_irq+0x9c/0x160
[ 227.330243] [<ffffffff810311f3>] handle_irq+0x23/0x30
[ 227.330249] [<ffffffff8185419b>] do_IRQ+0x4b/0xe0
[ 227.330252] [<ffffffff8185187f>] common_interrupt+0xbf/0xbf
[ 227.330253] <EOI> [<ffffffff816e06b7>] ? cpuidle_enter_state+0x157/0x2d0
[ 227.330261] [<ffffffff816e0867>] cpuidle_enter+0x17/0x20
[ 227.330265] [<ffffffff810c72b2>] call_cpuidle+0x32/0x60
[ 227.330267] [<ffffffff816e0849>] ? cpuidle_select+0x19/0x20
[ 227.330269] [<ffffffff810c7576>] cpu_startup_entry+0x296/0x360
[ 227.330275] [<ffffffff81052b02>] start_secondary+0x172/0x1b0
[ 227.330276] handlers:
[ 227.330282] [<ffffffffc01d0230>] vfio_intx_handler [vfio_pci]
[ 227.330284] Disabling IRQ #47
With the patch, vfio-pci reports that it has hidden INTx support:
[ 215.389554] i40e 0000:83:00.0: i40e_ptp_stop: removed PHC on ens260f0
[ 224.501452] vfio-pci 0000:83:00.0: Masking broken INTx support
[ 224.501522] vfio_ecap_init: 0000:83:00.0 hiding ecap 0x19@0x1d0
[ 226.191488] vfio_bar_restore: 0000:83:00.0 reset recovery - restoring bars
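The two cases can be told apart by filtering the kernel log for the messages shown above. This is a minimal sketch with a hypothetical helper name. It matches only the strings quoted in this section, so a kernel that words these messages differently would be reported as unknown.

```shell
# Read kernel log lines on stdin and report which case they show.
check_intx_workaround() {
    awk '
        /Masking broken INTx support/ { masked = 1 }
        /irq [0-9]+: nobody cared/    { orphan = 1 }
        /vfio_intx_handler/           { vfio = 1 }
        END {
            if (orphan && vfio) print "unpatched: vfio-pci interrupt left unhandled"
            else if (masked)    print "patched: vfio-pci masked INTx"
            else                print "unknown: no matching message found"
        }'
}
```

Run dmesg | check_intx_workaround on both the hypervisor and the VM, since the workaround must be present on both sides.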