1.3.1. ISSU procedure

To minimize network outage during a package upgrade, an ISSU (In Service Software Upgrade) procedure is available. During this procedure, the running (old) Virtual Accelerator instance is kept alive to process packets while the new Virtual Accelerator instance is being installed and configured.

Here is a typical running Virtual Accelerator block diagram:

../../_images/issu-initial.svg

Virtual Accelerator

Virtual Accelerator switches traffic between VM taps and physical NICs. It takes its configuration from two sources: Linux - Fast Path Synchronization, which updates the network configuration shared memories, and the ovs-vswitchd daemon, which feeds a dedicated Open vSwitch shared memory.

The ISSU procedure consists of installing the new version of the product packages while Virtual Accelerator continues to forward packets. It is composed of the following steps, which must be performed in order:

  • ISSU procedure startup

  • New packages installation

  • Virtual Accelerator service restart, which leads to two Virtual Accelerator instances running at the same time; the old one keeps handling data traffic

  • Open vSwitch service stop

  • Virtual Accelerator internal state copy from old to new product instance

  • Open vSwitch service restart

  • Old Virtual Accelerator instance kill (traffic interruption); the new Virtual Accelerator instance takes over the VM taps and physical NICs and starts forwarding traffic

  • ISSU procedure finalization

Several commands are provided by the fast-path.sh script to handle the ISSU procedure:

  1. ISSU procedure initiation

    fast-path.sh upgrade start
    

    This command marks the beginning of the ISSU procedure. It instructs the Virtual Accelerator to keep the dataplane running during the next call to systemctl restart, instead of killing the old instance.

  2. Virtual Accelerator configuration restoration

    fast-path.sh upgrade restore-conf
    

    This command copies internal states and configuration, such as Open vSwitch flows or port configuration (Fast Path QoS - Exception Rate Limitation, control plane protection, …), from the old Virtual Accelerator to the new Virtual Accelerator.

    This command can return a non-zero exit code if an error occurred during configuration retrieval. If this happens, it is the user's responsibility to restart the Virtual Accelerator service, so that the Virtual Accelerator restarts properly without the ISSU procedure.

  3. Virtual Accelerator switching

    fast-path.sh upgrade takeover
    

    This command kills the old instance of the Virtual Accelerator and instructs the new Virtual Accelerator to take over all VM taps and physical NICs released by the old instance. Traffic is interrupted from the moment the old Virtual Accelerator is killed until the new Virtual Accelerator has taken over all VM taps and physical NICs.

    The Open vSwitch flow max idle time must be greater than the run time of this command, so that no flow is dropped during the takeover because its max idle time was reached. This idle time (15 s in this example) can be configured with the following command:

    ovs-vsctl set Open_vSwitch . other_config:max-idle=15000
    

    This command can return a non-zero exit code if an error occurred during the takeover process. If this happens, it is the user's responsibility to restart the Virtual Accelerator service, so that the Virtual Accelerator restarts properly without the ISSU procedure.
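
As a sketch of how the max-idle value could be sized, the helper below converts an expected takeover duration plus a safety margin into milliseconds, the unit used by max-idle in the command above. The function name max_idle_ms and the default 5 s margin are assumptions made for illustration; they are not part of the product.

```shell
# Hypothetical helper (not part of fast-path.sh): compute a max-idle value.
# $1: expected takeover run time in seconds (to be measured on the platform)
# $2: safety margin in seconds (assumed default: 5)
max_idle_ms() {
    takeover_s=$1
    margin_s=${2:-5}
    # max-idle is expressed in milliseconds
    echo $(( (takeover_s + margin_s) * 1000 ))
}

# Possible usage, shown as a comment so that the sketch has no side effect:
#   ovs-vsctl set Open_vSwitch . other_config:max-idle="$(max_idle_ms 10)"
```

The margin is a judgment call: a larger value keeps idle flows installed longer, while a value that is too small risks dropping flows if the takeover runs longer than expected.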

  4. ISSU procedure finalization

    fast-path.sh upgrade done
    

    This command finalizes the ISSU procedure. Virtual Accelerator daemon processes are monitored during the whole ISSU process. If a daemon restart occurred during the ISSU process, Virtual Accelerator may be in an unstable state; in that case, this command returns a non-zero exit code.

    This command can return a non-zero exit code if the process monitoring scripts were triggered during the ISSU procedure. If this happens, it is the user's responsibility to restart the Virtual Accelerator service, so that the Virtual Accelerator restarts properly without the ISSU procedure.
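
The restore-conf, takeover and done commands share the same recovery path: on a non-zero exit code, the Virtual Accelerator service is restarted without ISSU. A minimal sketch of a wrapper for this pattern follows. issu_step and slow_restart are hypothetical names introduced here, and slow_restart would need to be adapted to the deployment (for example, to also restart network services).

```shell
# Hypothetical fallback: regular (non-ISSU) restart of the service.
slow_restart() {
    systemctl restart virtual-accelerator.target
}

# Hypothetical wrapper: run one ISSU step and fall back on failure.
# $1: step name used in the log message; remaining arguments: the command.
issu_step() {
    step_name=$1
    shift
    rc=0
    "$@" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "ISSU $step_name failure (exit code $rc), force slow restart" >&2
        slow_restart
        return "$rc"
    fi
}

# Possible usage (commented out: requires an ISSU procedure in progress):
#   issu_step sync     fast-path.sh upgrade restore-conf || exit 1
#   issu_step takeover fast-path.sh upgrade takeover     || exit 1
#   issu_step finalize fast-path.sh upgrade done         || exit 1
```

The wrapper returns the exit code of the failed command, so the caller can still distinguish failure causes after the fallback restart has been triggered.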

  5. ISSU script example

    Here is a typical ISSU script:

    # Start ISSU procedure
    fast-path.sh upgrade start
    
    # Upgrade packages
    dnf update -y
    
    # Restart service
    systemctl restart virtual-accelerator.target
    if ! fast-path.sh upgrade status; then
        if systemctl is-active virtual-accelerator.target; then
            # Failure during stop stage, already restarted in slow mode
            echo "ISSU failure, automatic slow restart done"
        else
            # Failure during restart stage, restart in slow mode
            echo "ISSU failure, force slow restart"
            systemctl restart virtual-accelerator.target
            systemctl restart network.service
        fi
        exit 1
    fi
    
    # Save ovs flows
    ovs_bridges=$(ovs-vsctl -- --real list-br)
    ovs_flows=$(/usr/share/openvswitch/scripts/ovs-save save-flows $ovs_bridges)
    
    # Restart the database first, since a large database may take a
    # while to load, and we want to minimize forwarding disruption.
    systemctl --job-mode=ignore-dependencies restart ovsdb-server
    
    # Stop ovs-vswitchd.
    systemctl --job-mode=ignore-dependencies stop ovs-vswitchd
    
    # Start vswitchd by asking it to wait till flow restore is finished.
    ovs-vsctl --no-wait set open_vswitch . other_config:flow-restore-wait="true"
    systemctl --job-mode=ignore-dependencies start ovs-vswitchd
    
    # Restore configuration
    eval "$ovs_flows"
    
    # Sync fastpath internal state from old to new va
    if ! fast-path.sh upgrade restore-conf; then
        # Failure during sync stage, restart in slow mode
        echo "ISSU sync failure, force slow restart"
        systemctl restart virtual-accelerator.target
        systemctl restart network.service
        exit 1
    fi
    
    # Restore OVS normal operation
    ovs-vsctl --if-exists remove open_vswitch . other_config flow-restore-wait="true"
    
    # Kill old instance, and takeover needed resources
    if ! fast-path.sh upgrade takeover; then
        # Failure during takeover stage, force restart in slow mode
        echo "ISSU takeover failure, force slow restart"
        systemctl restart virtual-accelerator.target
        systemctl restart network.service
        exit 1
    fi
    
    # Finalize procedure and check process monitoring
    if ! fast-path.sh upgrade done; then
        # Monitoring triggered a process restart during ISSU, force restart in slow mode
        echo "Daemon failure during ISSU, force slow restart"
        systemctl restart virtual-accelerator.target
        systemctl restart network.service
        exit 1
    fi
    
    echo "ISSU upgrade done"
    

    The following paragraphs briefly describe the commands used in the script to carry out the ISSU procedure.

    1. ISSU upgrade procedure startup

      # Start ISSU procedure
      fast-path.sh upgrade start
      

      This command is used to mark the beginning of the ISSU procedure.

    2. Packages update

      # Upgrade packages
      dnf update -y
      

      The distribution packaging tool is used to install the new packages. This assumes that the repository configuration is correct and that the update command actually updates the Virtual Accelerator packages.
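
Since the rest of the procedure assumes that the packages were actually updated, a script may want to verify this before going further. The sketch below compares package snapshots taken before and after the update. pkg_snapshot and update_was_effective are hypothetical helpers, and the package name pattern in the usage comment is a placeholder, not the actual package name.

```shell
# Hypothetical helper: print a sorted package list using the query command
# given as arguments (for example: rpm -qa).
pkg_snapshot() {
    "$@" | sort
}

# Hypothetical helper: succeed only if the two snapshots differ.
# $1: snapshot taken before the update, $2: snapshot taken after.
update_was_effective() {
    [ "$1" != "$2" ]
}

# Possible usage (commented out; '<package-pattern>' is a placeholder):
#   before=$(pkg_snapshot rpm -qa '<package-pattern>')
#   dnf update -y
#   after=$(pkg_snapshot rpm -qa '<package-pattern>')
#   update_was_effective "$before" "$after" || echo "no package was updated"
```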

    3. Virtual Accelerator restart

      # Restart service
      systemctl restart virtual-accelerator.target
      

      The service is restarted with the regular service restart command, but the new Virtual Accelerator instance runs alongside the old instance:

      ../../_images/issu-restart.svg

      Virtual Accelerator restart

      ovs-vswitchd is still connected to the old Virtual Accelerator instance, which continues to forward traffic between VM taps and physical NICs, while the new Virtual Accelerator instance sets itself up and synchronizes its configuration through Linux - Fast Path Synchronization. The new Virtual Accelerator instance runs on the same set of cores as the old instance, but does not consume CPU cycles, since it does not forward any traffic at this time.

    4. ovs-vswitchd stop

      # Save ovs flows
      ovs_bridges=$(ovs-vsctl -- --real list-br)
      ovs_flows=$(/usr/share/openvswitch/scripts/ovs-save save-flows $ovs_bridges)
      
      # Restart the database first, since a large database may take a
      # while to load, and we want to minimize forwarding disruption.
      systemctl --job-mode=ignore-dependencies restart ovsdb-server
      
      # Stop ovs-vswitchd.
      systemctl --job-mode=ignore-dependencies stop ovs-vswitchd
      

      These commands save the Open vSwitch flows of all bridges, restart ovsdb-server, and stop ovs-vswitchd.

    5. Fastpath Open vSwitch flows recovery

      fast-path.sh upgrade restore-conf
      

      Now that ovs-vswitchd is not running, the restore-conf command can be issued to recover the current Open vSwitch flows and internal Open vSwitch structures from the old Virtual Accelerator to the new Virtual Accelerator.

      ../../_images/issu-no-ovs.svg

      Virtual Accelerator recover Open vSwitch flows

    6. ovs-vswitchd restart

      # Start vswitchd by asking it to wait till flow restore is finished.
      ovs-vsctl --no-wait set open_vswitch . other_config:flow-restore-wait="true"
      systemctl --job-mode=ignore-dependencies start ovs-vswitchd
      
      # Restore configuration
      eval "$ovs_flows"
      

      These commands restart the ovs-vswitchd daemon in flow-restore-wait mode and restore the saved flows. The ovs-vswitchd daemon is now connected to the new Virtual Accelerator instance.

      ../../_images/issu-ovs-restarted.svg

      Virtual Accelerator restart

    7. Virtual Accelerator takeover

      fast-path.sh upgrade takeover [--log] [--wait-link [t]]
      

      This command switches from the old Virtual Accelerator to the new one. The old Virtual Accelerator is killed, and the new one starts processing the incoming traffic.

      Options:

      • --wait-link can be used to specify how long to wait for all links to recover their previous state before the command returns. The time is specified in seconds. If t is omitted, a default timeout of 10s is used.

      • --log can be used to log additional timing information that helps debug timing issues during the takeover process.


      Returned values:

      • 1 if an internal error occurs during takeover process.

      • 2 if --wait-link is specified and the timeout occurs before all physical links are properly up.

      • 3 if --wait-link is specified and the timeout occurs before all virtual ports are properly up, all physical ports being properly up.


      ../../_images/issu-final.svg

      Virtual Accelerator new instance running
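
When --wait-link is used, a script can tell the failure cases listed above apart from the exit code of the takeover command. The following sketch maps each code to a message. describe_takeover_rc is a hypothetical helper, and treating 0 as success is an assumption based on the usual shell convention.

```shell
# Hypothetical helper: translate the exit code of
# "fast-path.sh upgrade takeover --wait-link" into a message.
describe_takeover_rc() {
    case "$1" in
        0) echo "takeover succeeded" ;;  # assumed: 0 means success
        1) echo "internal error during takeover" ;;
        2) echo "timeout: some physical links are not up" ;;
        3) echo "timeout: physical ports up, some virtual ports are not up" ;;
        *) echo "unexpected exit code $1" ;;
    esac
}

# Possible usage (commented out: requires an ISSU procedure in progress):
#   rc=0
#   fast-path.sh upgrade takeover --wait-link 20 || rc=$?
#   describe_takeover_rc "$rc"
```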

    8. ISSU finalization

      # Restore normal operation
      ovs-vsctl --if-exists remove open_vswitch . other_config flow-restore-wait="true"
      
      # Finalize procedure and check process monitoring
      if ! fast-path.sh upgrade done; then
          # Monitoring triggered a process restart during ISSU, force restart in slow mode
          echo "Daemon failure during ISSU, force slow restart"
          systemctl restart virtual-accelerator.target
          systemctl restart network.service
          exit 1
      fi
      

      Finally, the normal mode of operation of ovs-vswitchd can be restored, and the ISSU procedure can be finalized properly.