Usage

FPN-SDK Add-on for DPDK automatically starts when you start the fast path with the following script:

# fast-path.sh start

See also

For more information on how to start the fast path, see the Fast Path Baseline documentation.

Providing options

The FPN-SDK can take several arguments on the fast path command line. Most of the usual options are automatically added by the fast path start script while parsing the fast-path.env configuration file.

The configuration variables EAL_OPTIONS (for DPDK EAL options) and FPNSDK_OPTIONS (for FPN-SDK options) of the fast path configuration file may contain additional options that are described here.

Useful DPDK EAL options

Here are the most useful EAL options to set in the EAL_OPTIONS variable:

--log-level=<LOGLEVEL>

Available since dpdk-1.8.0. Optional. Set the DPDK log level. When set to n, all DPDK driver messages with a log level lower than or equal to n are printed. LOGLEVEL can have one of the following values:

  • 1: EMERG System is unusable.

  • 2: ALERT Action must be taken immediately.

  • 3: CRIT Critical conditions.

  • 4: ERR Error conditions.

  • 5: WARNING Warning conditions.

  • 6: NOTICE Normal but significant condition.

  • 7: INFO Informational.

  • 8: DEBUG Debug-level messages.

Any other EAL option can be passed by modifying the EAL_OPTIONS variable.
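For example, to raise the DPDK log level to INFO in fast-path.env (an illustrative value; keep any EAL options you already set):

: ${EAL_OPTIONS:=--log-level=7}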

See also

For the full list of options, see the DPDK documentation.

FPN-SDK options

Here are the most useful FPN-SDK options to set in the FPNSDK_OPTIONS variable:

Log options

Parameters

--logmode=[console|syslog]

Optional. Specify where fast path logs should be displayed: console or syslog. By default, logs are sent to syslog.

--debug-rpc

If specified, enable fpn-rpc debug logs.

--no-shm-hugetlb

If specified, do not attempt to map shared memory in hugepages.
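For example, to display logs on the console and enable fpn-rpc debug logs (an illustrative combination):

: ${FPNSDK_OPTIONS:=--logmode=console --debug-rpc}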

Vlan stripping option

Parameters

--vlan-strip

Optional. Enable the VLAN header stripping feature on incoming frames, if supported by the hardware. By default, the VLAN stripping feature is disabled.

Alternatively, you can specify : ${VLAN_STRIP:=on} in the configuration file.

Port options

Parameters

--nb-rxq=[all|phys|virt]:[number of rx queues]

Optional. Specify the number of queues per port for each device type. Useful when using automatic mapping of cores to ports. By default, the maximum number of queues is allocated to each port. Cannot be used together with -t.

The device types that can be configured are phys (i.e. any physical network device, like Intel or Mellanox NICs) and virt (i.e. any virtual network device, like vhost-user ports).
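For example, to limit each physical port to four RX queues (an illustrative value):

: ${FPNSDK_OPTIONS:=--nb-rxq=phys:4}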

Core / Port binding

-t [<core number>=<port number>/<core number>=<port number>:<port number>]

core number

Fast path logical core number preceded by c.

port number

Number of the port to poll. For each occurrence, one RX queue is created.

Specify which ports each logical core polls, and with how many RX queues. Unspecified logical cores and ports are idle.

It is preferred to set the core / port binding in the fast path configuration file using the option : ${CORE_PORT_MAPPING:=<value>}.

If this parameter is not set, each core polls all the ports on the same NUMA socket. If there are not enough RX queues, only a subset of these cores polls the port. If the maximum number of TX queues is smaller than the number of fast path cores, the port is configured with a single shared TX queue. The total number of queues per port must not exceed the NIC’s hardware limit, otherwise only a shared queue is used.

The hardware is responsible for distributing flows over queues (for instance, via RSS on Intel or Arm platforms).

This parameter overrides the --nb-rxq option described above.
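For example, assuming the syntax described above (illustrative values), the following mapping makes core 1 poll port 0, and core 2 poll ports 0 and 1:

: ${CORE_PORT_MAPPING:=c1=0/c2=0:1}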

Offloads

Description

To support offload features such as TSO or L4 checksum offloading to the NIC, or forwarding offload information from a guest to the NIC through a virtual interface, you must enable offloading in the fast path. You can then tune the offload features more precisely using ethtool -K <iface> <feature> on|off.

--offload

Enable the offload feature in the fast path.

Example

###########################
##### FPN-SDK OPTIONS #####
###########################

: ${FPNSDK_OPTIONS:=--offload}

TX scatter

Parameters

--tx-scatter

Optional. Enable the TX scatter feature, if supported by the hardware. By default, TX scatter is disabled. If the offload feature is enabled, the TX scatter feature is enabled as well.
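For example (illustrative), to enable TX scatter alone:

: ${FPNSDK_OPTIONS:=--tx-scatter}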

Cryptography options

Description

You can tune the maximum available cryptographic sessions and the number of buffers allocated for cryptography.

--crypto-max-sessions

Tune the maximum number of available cryptographic sessions. The cryptography code reserves space in pools to store sessions. When using IPsec, one session is needed per SA.

--crypto-buffers

Tune the number of buffers allocated for cryptography. The crypto library allocates buffers from pools to store per-buffer crypto parameters. This parameter is the size of the buffer pool.

Example

###########################
##### FPN-SDK OPTIONS #####
###########################

: ${FPNSDK_OPTIONS:=--crypto-max-sessions=8192 --crypto-buffers=32768}

Cryptographic offloading mask

Description

Specify which fast path cores may perform cryptographic operations on behalf of other cores. Cryptographic offloading is only done between cores on the same NUMA node. By default, crypto offloading is done only for decrypt operations, and only to cores that are not currently receiving traffic.

-i <crypto_offloading mask>

Specify the fast path cores available for crypto offloading. By default, all cores specified in fp_mask are used. The mask can be a list of CPUs, or a mask starting with 0x. Two specific strings are also available: none, to disable the crypto offloading feature, and all (the default value). It is preferred to set the cryptographic offloading mask in the fast path configuration file using the option : ${CRYPTO_OFFLOAD_MASK:=<value>}.

Example

: ${CRYPTO_OFFLOAD_MASK:=none}

  • Statistics of offloaded cryptographic operations can be retrieved with:

    crypto-offload-stats
    
  • It is possible to enable/disable cryptographic offload for packet encryption. Be aware that enabling encryption offload for an IPsec tunnel that aggregates many flows managed by several fast path cores can cause IPsec errors due to the anti-replay window.

    crypto-offload-encrypt-set on|off
    
  • It is possible to disable cryptographic offload for small packets by setting a minimum data size threshold:

    crypto-offload-threshold-set <min>
    
    <min>

    Minimum size of data to offload cryptographic operations. By default, any packet can be offloaded (size set to 0).

  • To avoid offloading cryptographic operations to cores that receive high traffic, it is possible to disable cryptographic offload to overloaded cores for a configurable time period.

    crypto-offload-timer-set <time>
    
    <time>

    Time, in microseconds, during which an overloaded core is ignored for cryptographic offloading. The default value is 10 milliseconds.

Intercore options

Description

You can tune the size of the intercore rings, used for pipeline implementation.

--intercore-ring-size

One ring is allocated for each core, to store messages sent from other cores. Use this option to change the size of the rings. Default value is CONFIG_MCORE_INTERCORE_RING_SIZE, as specified in /etc/6WINDGate/fpnsdk.config.

Alternatively, you can specify : ${INTERCORE_RING_SIZE:=<value>} in the configuration file.
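For example (illustrative value):

: ${INTERCORE_RING_SIZE:=1024}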

Intercore implementation

The following functions have been added to the DPDK mbuf API:

  • m_set_process_fct()

  • m_call_process_fct()

These functions use mbuf headroom to store a fpn_callback structure.

To avoid updating the mbuf internal pointers, the m_prepend/m_adj functions are not used; the checks on the length available in the headroom are the same, however.

Since m_prepend/m_adj are not called, once m_set_process_fct() has been called, no operation can be performed on the mbuf until m_call_process_fct() is called.

Software scheduling implementation

The software scheduling API is implemented on top of the DPDK librte_sched library.

Linux / fast path communication

Parameters

-l <exception mask>

Specify the fast path cores involved in polling packets coming from Linux over fpvi interfaces. By default, all cores specified in fp_mask are used. All packets locally sent by the Linux stack are forwarded to the fast path and processed by the selected cores.

--nb-fpvi-queue=[all|phys|virt|excp]:<number of queues>

Specify the number of queues per fpvi port. Each fast path port has a tuntap netdevice used to send/receive packets between the fast path and the Linux kernel. The number of queues of the tuntap device can be configured with a different value depending on the devtype of the associated fast path port. The different devtypes are: phys (i.e. any physical network device, like Intel or Mellanox NICs), virt (i.e. any virtual network device, like vhost-user ports) and excp (i.e. fpn0).
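For example (illustrative values, assuming -l accepts the same mask format as -i), to restrict exception-packet polling to cores 0 and 1 and create two tuntap queues per physical port:

: ${FPNSDK_OPTIONS:=-l 0x3 --nb-fpvi-queue=phys:2}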

RX/TX descriptors and thresholds

Parameters

--max-gro-mbufs=<max-phys-mbuf-number,max-virt-mbuf-number>

Specify the maximum number of mbufs that can be stored in the GRO reassembly queues for physical and virtual ports. This value is per port and per core: the total number of mbufs required for GRO reassembly is (total number of physical ports * first value + total number of virtual ports * second value) * total number of cores.

This parameter helps allocate more precisely the amount of mbufs required for the fast path to work properly.

The value must contain two integers separated by a comma. The first integer is for physical ports, the second one for virtual ports.

Each integer must be between 0 and 65535.
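For example (illustrative values), with --max-gro-mbufs=512,256, 2 physical ports, 1 virtual port and 4 fast path cores, GRO reassembly requires up to (2 * 512 + 1 * 256) * 4 = 5120 mbufs:

: ${FPNSDK_OPTIONS:=--max-gro-mbufs=512,256}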

--max-vports=<max number of dynamic vdev>

Specify the maximum number of vports that can be created using the fp-vdev command. Increasing this value increases the number of preallocated mbufs.

This value cannot exceed 300, and the total number of ports (physical and virtual) cannot exceed CONFIG_MCORE_FPN_MAX_PORTS.

--nb-mbuf=<total mbuf number>|<sock0-mbuf-number,sock1-mbufs-number,...>

Specify the number of mbufs to add in the pool (default is 16384).

It is preferred to set the number of mbufs in the fast path configuration file using the variable : ${NB_MBUF:=<value>}, as it allows setting the value to auto, which lets the wizard calculate how many mbufs are required. Indeed, the number of mbufs needed cannot easily be computed manually, as some features, like TCP, use additional mbufs.

Optimal performance is reached when there are as few mbufs as possible. However, mbuf allocation failure can lead to unexpected behavior.

See also

If vhost ports are created, additional mbufs should be pre-allocated. This is automatically done by the wizard when setting : ${MAX_VPORT:=<value>} in fast-path.env.

Each software queue can hold at most SOFT_TXQ_SIZE mbufs; this size is defined in fast-path.env with : ${SOFT_TXQ_SIZE:=<value>}. Therefore, NB_MBUF has to be increased by SOFT_TXQ_SIZE * (MAX_VPORT + PHYS_PORTS), where PHYS_PORTS is the number of physical ports.
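For example, letting the wizard compute the required pool size:

: ${NB_MBUF:=auto}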

--mbuf-rx-size=<size>

Specify the size of Rx data (excluding headroom) inside each mbuf. The default value is 2176. Changing this value may affect performance.

The equivalent option in the fast path configuration file is : ${MBUF_RX_SIZE:=<value>}.

--nb-rxd=[RX descriptor number]

Optional. Specify the number of RX descriptors allocated to the NIC. It is highly recommended to use a power of 2 to be compliant with any PMD. The minimal value is 64. Default is 128.

Important

For Intel Ethernet Controller XL710, specify 1024 RX descriptors.

--nb-txd=[TX descriptor number]

Optional. Specify the number of TX descriptors allocated to the NIC. It is highly recommended to use a power of 2 to be compliant with any PMD. The minimal value is 64. Default is 512.

--nb-rxdtxd-fpvi=[all|phys|virt|excp]:<FPVI RXTX descriptor number>

Optional. Specify the number of RX/TX descriptors allocated to the fpvi NIC. It is highly recommended to use a power of 2. The minimal value is 64. Default value is 512.

Specify the number of descriptors in each RX/TX fpvi queue. Each fast path port has a tuntap netdevice used to send/receive packets between the fast path and the Linux kernel. The number of descriptors of the tuntap device can be configured with a different value depending on the devtype of the associated fast path port. The different devtypes are: phys (i.e. any physical network device, like Intel or Mellanox NICs), virt (i.e. any virtual network device, like vhost-user ports) and excp (i.e. the internal exception port, fpn0).

--soft-queue=[<port number>=<additional TX desc number>/default=<additional TX desc number>]

Optional. Add a software queue at FPN-SDK level. This is particularly useful for NICs that do not allow configuring the number of TX descriptors. For performance reasons, this value must be a power of 2. The maximum value is 32768. Default is 0 (i.e. no software queue).

port number

Fast path port number preceded by p, or default to apply the configuration to all ports.

additional TX desc number

Number of additional TX descriptors allocated at FPN-SDK level.

Alternatively, you can specify : ${SOFT_TXQ_SIZE:=<value>} in the configuration file. Only the default value can be modified this way.
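For example (illustrative values), to add a 2048-descriptor software queue on port 0 and 1024-descriptor queues on all other ports:

: ${FPNSDK_OPTIONS:=--soft-queue=p0=2048/default=1024}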

Control Plane protection options

mode

Parameters

--rx-cp-filter-mode=[all|phys|virt|excp]:[none|software-filter|hardware-filter|dedicated-queue]
--tx-cp-filter-mode=[all|phys|virt|excp]:[none|software-filter]

Optional. Choose control plane protection mode on RX or TX for each device type.

The device types that can be configured are phys (i.e. any physical network device, like Intel or Mellanox NICs), virt (i.e. any virtual network device, like vhost-user ports) and excp (i.e. any device in charge of sending exception packets, like fpvi using vhost).

For each device type, the following values are possible:

  • phys: hardware-filter or dedicated-queue (only with Mellanox NICs; falls back to software-filter for other NICs), software-filter or none (default value is none)

  • virt: software-filter or none (default value is none)

  • excp: software-filter or none, in both RX and TX (default value is none in RX and software-filter in TX)

Example

###########################
##### FPN-SDK OPTIONS #####
###########################

: ${FPNSDK_OPTIONS:=--rx-cp-filter-mode=phys:hardware-filter,virt:software-filter --tx-cp-filter-mode=all:none}

threshold

Parameters

--rx-cp-filter-threshold=[all|phys|virt|excp]:[threshold number][%]
--tx-cp-filter-threshold=[all|phys|virt|excp]:[threshold number][%]

Optional. Configure the control plane protection threshold on RX or TX, for the software-filter or hardware-filter mode. The value can be provided as a fixed number, or as a percentage using the % suffix. By default, the control plane protection threshold is 50% in both RX and TX.

Example

###########################
##### FPN-SDK OPTIONS #####
###########################

: ${FPNSDK_OPTIONS:=--rx-cp-filter-threshold=phys:2048,virt:10% --tx-cp-filter-threshold=all:50%}

fast path port name

Parameters

--netdev-name "0:port1,1:port2[,...]"

Override the name of the FPVI kernel netdevice that is created for each fast path port. Alternatively, you can specify : ${NETDEV_NAME:='<port>:<value>[,...]'} in the configuration file.

Example

: ${NETDEV_NAME:=0:fp1,1:fp2}

Managing RETA entries

RETAs are per-port configurable tables used by the NIC controllers’ RSS filtering feature to select the RX queue into which to store an IP input packet. When receiving an IPv4 or an IPv6 packet, the controller computes a 32-bit hash based on:

  • the source address and the destination address in the packet’s IP header,

  • the source port and the destination port in the UDP/TCP header, if any.

The controller then uses the RSS hash value to compute an index into the RETA, which gives the number of the RX queue in which to store the packet.

The DPDK API includes a function that is exported by PMDs to configure a port’s RETA entries.

Note

  • The number of RETA entries depends on your NIC:

    NIC            RETA entries
    Intel 1GbE     128
    Intel 10GbE    128
    Intel 40GbE    512

  • For test purposes, the testpmd application includes the following command to configure RETA entries:

    port config X rss reta (hash_index,queue_number)[,(hash_index,queue_number)]

    X

    Port for which to configure RETA entries.

    hash_index

    Index of a RETA entry.

    queue_number

    RX queue number to be stored in the RETA entry.

  • Configure a port’s RETA entries:

    dpdk-rss-reta-set <Pi> index <i>[ <j>[ ...]] queue <Qj>
    
    <Pi>

    Port number

    index <i>[ <j>[ …]]

    Space-separated or tab-separated list of RETA entry indices.

    queue <Qj>

    RX queue index to write in RETA entries.
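    For example (illustrative values), to map RETA entries 0 to 3 of port 0 to RX queue 1:

    dpdk-rss-reta-set 0 index 0 1 2 3 queue 1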

  • Display a port’s RETA entries:

    dpdk-rss-reta <Pi> [index <i> [<j>]]
    
    <Pi>

    Port number.

    index <i>

    Optional. RETA entry index.

    index <i> [<j>]

    Optional. Start and end indices of a range of RETA entries. By default, all RETA entries are displayed.

  • Select the method used to compute the IP input packets RSS hash value:

    dpdk-rss-hash-func-set <Pi> [<HF>[ <HF>[ ...]]]
          HF := { ipv4|ipv4-frag|ipv4-tcp|ipv4-udp|ipv4-sctp|ipv4-other|
                  ipv6|ipv6-frag|ipv6-tcp|ipv6-udp|ipv6-sctp|ipv6-other|
                  l2-payload|ipv6-ex|ipv6-tcp-ex|ipv6-udp-ex|
                  port|vxlan|geneve|nvgre }
    
    <Pi>

    Port number.

    [<HF>[ <HF>[ …]]]

    Space-separated or tab-separated list of RSS hash functions among the following: ipv4, ipv4-frag, ipv4-tcp, ipv4-udp, ipv4-sctp, ipv4-other, ipv6, ipv6-frag, ipv6-tcp, ipv6-udp, ipv6-sctp, ipv6-other, l2-payload, ipv6-ex, ipv6-tcp-ex, ipv6-udp-ex, port, vxlan, geneve, nvgre. RSS filtering is disabled if no hash function is supplied.
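    For example (illustrative), to restrict the RSS hash computation of port 0 to TCP flows:

    dpdk-rss-hash-func-set 0 ipv4-tcp ipv6-tcp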

  • Display the set of methods currently used to compute the IP input packets RSS hash value:

    dpdk-rss-hash-func <Pi>
    
    <Pi>

    Port number.

  • Set the 40-byte RSS hash key used to compute the IP input packets RSS hash value:

    dpdk-rss-hash-key-set <Pi> <key>
    
    <Pi>

    Port number.

    <key>

    40-byte key, as a contiguous string of 80 hexadecimal digits (2 hexadecimal digits per byte).
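    For example, setting an arbitrary, illustrative 40-byte key on port 0:

    dpdk-rss-hash-key-set 0 6d5a56da255b0ec24167253d43a38fb0d0ca2bcbae7b30b477cb2da38030f20c6a42b73bbeac01fa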

  • Display the current 40-byte RSS hash key used to compute the IP input packets RSS hash value:

    dpdk-rss-hash-key <Pi>
    
    <Pi>

    Port number.