High Availability Fast-path conntrack

High Availability fast path conntrack enables synchronizing fast path conntracks used by CG-NAT and fast path Firewall between two HA nodes in active/backup mode.

The conntrack entries are synchronized between the HA nodes over dedicated connections. If the synchronization connections go down, a complete resynchronization occurs when they are reestablished.

Note

  • Synchronization messages are distributed evenly among several synchronization connections per peer. The number of connections is determined by the number of event threads (/ system fast-path limits cg-nat nb-event-threads).

  • It is recommended to establish the synchronization connection on a non fast path port, because HA connections between nodes may be blocked by the fast path firewall.
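As a rough illustration of the first point, the sketch below spreads synchronization messages over one connection per event thread. The round-robin policy and the function name `distribute` are assumptions made for illustration only; the documentation states that the distribution is even, not how it is achieved. The connection names follow the `npf_event0` and `npf_event1` client names shown in the peer state output.

```python
from itertools import cycle


def distribute(message_count, nb_event_threads):
    """Count how many messages each synchronization connection would carry.

    Assumption: one connection per event thread, filled in round-robin order.
    """
    connections = [f"npf_event{i}" for i in range(nb_event_threads)]
    load = {name: 0 for name in connections}
    # Pair each message index with the next connection in the rotation.
    for _, name in zip(range(message_count), cycle(connections)):
        load[name] += 1
    return load


print(distribute(10, 2))  # {'npf_event0': 5, 'npf_event1': 5}
```

With an odd number of messages, the per-connection counts differ by at most one.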

The HA state of a node can be configured statically using the CLI or NETCONF, or dynamically using the VRRP service.

../../../_images/ha-sample.svg

In this example, conntracks from the main VRF are continuously synchronized from ha1 to ha2, and are used in ha2’s dataplane when ha2 becomes master.

HA fast path conntrack parameters are configured per VRF in the ha-fp-conntrack context:

ha1 running config# vrf main ha-fp-conntrack
ha1 running ha-fp-conntrack#!

Configure mandatory options in ha1:

ha1 running ha-fp-conntrack#! peer peer1 source 10.150.0.1 address 10.150.0.2
ha1 running ha-fp-conntrack#! listen-ha-group ha-group1

  • peer corresponds to a remote HA node. The synchronization connection is established over IPv4 or IPv6, using the source and address parameters.

  • listen-ha-group is the high-availability group that controls the activity state of this HA node. See High-availability Groups for more information.

Display ha1 HA fast path conntrack state:

ha1 running ha-fp-conntrack# show state
ha-fp-conntrack
    enabled true
    listen-ha-group ha-group1
    peer peer1
        source 10.150.0.1
        address 10.150.0.2
        ..
    ..

Display ha1 HA synchronization connections state and statistics:

ha1> show ha-fp-conntrack peer vrf main
show-ha-fp-conntrack-peer
    peer peer1
        source 10.150.0.1
        address 10.150.0.2
        client npf_event0
            statistics
                transmit 0
                transmit-error 0
                build-error 0
                ..
            connection-status established
            ..
        client npf_event1
            statistics
                transmit 1
                transmit-error 0
                build-error 0
                ..
            connection-status established
            ..
        ..
    local-node 10.150.0.1
        statistics
            receive 0
            receive-error 0
            ..
        ..
    ..

On ha2, the peer source and address must be swapped:

ha2 running config# vrf main ha-fp-conntrack
ha2 running ha-fp-conntrack#! peer peer1 source 10.150.0.2 address 10.150.0.1
ha2 running ha-fp-conntrack#! listen-ha-group ha-group1
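The two peer definitions above mirror each other: the source of one node is the address of the other. A minimal sketch of that check follows. The dictionary layout and the function name `peers_mirror` are hypothetical; they are not part of the product and only restate the values from the two configurations.

```python
import ipaddress


def peers_mirror(local, remote):
    """Return True when each node's source is the other node's address."""
    local_src = ipaddress.ip_address(local["source"])
    local_dst = ipaddress.ip_address(local["address"])
    remote_src = ipaddress.ip_address(remote["source"])
    remote_dst = ipaddress.ip_address(remote["address"])
    return local_src == remote_dst and local_dst == remote_src


# Values copied from the ha1 and ha2 peer definitions.
ha1 = {"source": "10.150.0.1", "address": "10.150.0.2"}
ha2 = {"source": "10.150.0.2", "address": "10.150.0.1"}
print(peers_mirror(ha1, ha2))  # True
```

Parsing with `ipaddress` rather than comparing strings means equivalent spellings of the same IPv6 address still match.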

Conntrack timeout behavior

Conntracks are synchronized, together with their session timeout, when they enter the established state, but they cannot expire while the HA node state is backup. When a TCP conntrack leaves the established state, a synchronization update event is sent to set its current timeout to the TCP undefined state timeout, which defaults to 10 seconds.

The following configuration changes the timeout of TCP undefined state to 20 seconds for the main VRF:

vsr running config# / vrf main network-stack fast-path conntrack timeouts tcp undefined 20

The conntrack activity time (the time at which the last packet was seen for a conntrack) is not synchronized between HA nodes, so after a failover the new master node does not know the last activity time of each conntrack. By default, it waits for the entire current timeout period, starting from the switchover time, before releasing unused conntracks. This can lead to a temporary large increase in conntrack usage until the timeout expires and the unused conntracks are released.

Some delays, configurable for TCP, UDP and ICMP protocols, have been implemented to mitigate this issue:

  • synchronization sets the minimum number of seconds a conntrack must have been established before it is synchronized, which prevents short-lived connections from being synchronized. Because protocols other than TCP have no explicit connection close mechanism, their conntracks are synchronized only if traffic is observed after the specified delay. TCP conntracks, by contrast, are synchronized after this delay even if no traffic is seen, as long as they remain in the established state.

  • undefined-grace-period defines a delay during which conntracks are preserved on the new master after a failover. Once it elapses, conntracks may be randomly dropped, possibly earlier than their current timeout. If unset, the entire current timeout period is observed before unused conntracks are closed. It has no effect if set to a value greater than the conntrack established state timeout.
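The two delays can be read as simple decision rules. The sketch below is only a model of the description above, not the fast path implementation: the function names, the argument names and the use of `min()` to express "no effect if greater than the timeout" are assumptions.

```python
def should_synchronize(protocol, age, delay, established=True,
                       traffic_after_delay=False):
    """Decide whether a conntrack is eligible for synchronization.

    age and delay are in seconds; age is the time since establishment.
    """
    if age < delay:
        # Too young: short-lived connections are not synchronized.
        return False
    if protocol == "tcp":
        # TCP only needs to still be in the established state.
        return established
    # Other protocols need traffic observed after the delay.
    return traffic_after_delay


def earliest_release(switch_time, current_timeout, grace_period=None):
    """Earliest time at which an unused conntrack may be released
    on the new master after a failover at switch_time."""
    if grace_period is None:
        # Default behavior: wait for the entire current timeout.
        return switch_time + current_timeout
    # Assumption: a grace period longer than the timeout changes nothing.
    return switch_time + min(grace_period, current_timeout)


print(should_synchronize("udp", age=2, delay=1))         # False
print(earliest_release(100, 60, grace_period=10))        # 110
```

In this model, a UDP conntrack that is old enough but has seen no traffic since the delay stays unsynchronized, while an idle established TCP conntrack of the same age would be synchronized.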

The following configuration sets the TCP synchronization delay to 1 and the undefined-grace-period to 10 for the main VRF:

vsr running config# / vrf main network-stack fast-path conntrack ha-delays tcp synchronization 1
vsr running config# / vrf main network-stack fast-path conntrack ha-delays tcp undefined-grace-period 10

See also

The command reference for details.

Note

Some conntracks are not synchronized. This includes ALG-related conntracks and conntracks that were not fully established or did not have sufficient time to be synchronized due to the synchronization setting. After failover, CG-NAT destroy logs for these conntracks (or for blocks that contain only non-synchronized conntracks) will be lost.

Before a planned failover, it is possible to update the activity time and force the synchronization of the conntracks not yet synchronized with the following command on the HA master node:

vsr running config# cmd ha-fp-conntrack synchronization vrf main
Synchronization started, completion will appear in the log.

After this command, if a failover occurs before the conntrack's current timeout elapses, the synchronized activity time is used to determine whether the conntrack should time out after failover, instead of the time of the switch to master described above.
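The difference between the two reference times can be illustrated with a small calculation. This is a simplified model under stated assumptions: the function name `expiry_after_failover` is hypothetical, times are plain seconds on a shared clock, and the expiry is assumed to be the reference time plus the current timeout.

```python
def expiry_after_failover(switch_time, current_timeout,
                          synced_activity_time=None):
    """Time at which an idle conntrack expires on the new master.

    Without a synchronized activity time, the timeout is counted from
    the switch to master. With one, it is counted from the last
    activity recorded before the failover.
    """
    if synced_activity_time is None:
        return switch_time + current_timeout
    return synced_activity_time + current_timeout


# Hypothetical numbers: 120 s timeout, last packet at t=1000,
# failover at t=1050.
print(expiry_after_failover(1050, 120))                             # 1170
print(expiry_after_failover(1050, 120, synced_activity_time=1000))  # 1120
```

In this example the idle conntrack is released 50 seconds earlier when the activity time has been synchronized before the planned failover.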

Warning

There is currently no configuration synchronization between nodes, but all nodes must share the same CG-NAT and fast path Firewall configuration. Be careful to keep the configuration identical on all nodes. Failing to do so may lead to unexpected behavior.
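Since the nodes do not synchronize their configuration, an external comparison can help detect drift. The sketch below is one possible approach and not a product feature: it assumes the CG-NAT and fast path Firewall configuration of each node has already been exported as text lines by some external means, and the sample lines are placeholders rather than real configuration syntax. Node-specific settings, such as the peer source and address, legitimately differ and should be left out of the comparison.

```python
def config_drift(node_a_lines, node_b_lines):
    """Return (only_on_a, only_on_b) for two exported configurations.

    Blank lines and surrounding whitespace are ignored.
    """
    a = {line.strip() for line in node_a_lines if line.strip()}
    b = {line.strip() for line in node_b_lines if line.strip()}
    return sorted(a - b), sorted(b - a)


# Placeholder lines, not real configuration syntax.
ha1_config = ["placeholder-rule-1", "placeholder-rule-2", ""]
ha2_config = ["placeholder-rule-1", "  placeholder-rule-3"]

only_ha1, only_ha2 = config_drift(ha1_config, ha2_config)
print(only_ha1)  # ['placeholder-rule-2']
print(only_ha2)  # ['placeholder-rule-3']
```

Two empty lists indicate that the compared sections are identical, up to line order and whitespace.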

Warning

When HA is enabled in a VRF, all conntracks of this VRF are synchronized to the remote peer. All of them will be managed by the other node in case of failover, and traffic matching these conntracks will be blocked on the backup node. The user must ensure that all conntracks of the VRF are meaningful on all peers. If tracking is needed for connections local to a peer, it must be done in another VRF.