Configuration

Leaf routers configuration

In this example, the leaf routers of the network fabric use OSPF to exchange underlay route information, and BGP to exchange EVPN L2 and L3 information.

Note

This deployment guide focuses on how to configure the HNA, so the spine routers are omitted.

Warning

In production, it is advised to set ebgp-requires-policy to true, and to configure relevant policies.

Customize and apply the following configuration (download link) on leaf1:

/ system license online serial HIDDEN
/ system hostname leaf1
/ system fast-path port pci-b0s4
/ system fast-path port pci-b0s5
/ vrf main interface physical eth1 port pci-b0s4
/ vrf main interface physical eth1 mtu 1600
/ vrf main interface physical eth1 ipv4 address 10.0.11.1/24
/ vrf main interface physical eth2 port pci-b0s5
/ vrf main interface physical eth2 mtu 1600
/ vrf main interface physical eth2 ipv4 address 10.0.21.1/24
/ vrf main interface loopback loop0 ipv4 address 10.0.0.201/32
/ vrf main routing bgp as 65500
/ vrf main routing bgp listen neighbor-range 10.0.0.0/16 neighbor-group HNAs
/ vrf main routing bgp router-id 10.0.0.201
/ vrf main routing bgp ebgp-requires-policy false
/ vrf main routing bgp neighbor-group HNAs remote-as 65500
/ vrf main routing bgp neighbor-group HNAs neighbor-description HNA
/ vrf main routing bgp neighbor-group HNAs update-source loop0
/ vrf main routing bgp neighbor-group HNAs address-family ipv4-unicast enabled false
/ vrf main routing bgp neighbor-group HNAs address-family ipv6-unicast enabled false
/ vrf main routing bgp neighbor-group HNAs address-family l2vpn-evpn route-reflector-client true
/ vrf main routing bgp neighbor-group HNAs track bfd
/ vrf main routing ospf
/ vrf main routing ospf router-id 10.0.0.201
/ vrf main routing ospf network 10.0.0.0/16 area 0
/ vrf main routing interface eth1 ip ospf track bfd
/ vrf main routing interface eth2 ip ospf track bfd

Note

At a minimum, update the license serial and the PCI bus addresses.

Do the same for the configuration of leaf2 (download link):

/ system license online serial HIDDEN
/ system hostname leaf2
/ system fast-path port pci-b0s4
/ system fast-path port pci-b0s5
/ vrf main interface physical eth1 port pci-b0s4
/ vrf main interface physical eth1 mtu 1600
/ vrf main interface physical eth1 ipv4 address 10.0.12.1/24
/ vrf main interface physical eth2 port pci-b0s5
/ vrf main interface physical eth2 mtu 1600
/ vrf main interface physical eth2 ipv4 address 10.0.22.1/24
/ vrf main interface loopback loop0 ipv4 address 10.0.0.202/32
/ vrf main routing bgp as 65500
/ vrf main routing bgp listen neighbor-range 10.0.0.0/16 neighbor-group HNAs
/ vrf main routing bgp router-id 10.0.0.202
/ vrf main routing bgp ebgp-requires-policy false
/ vrf main routing bgp neighbor-group HNAs remote-as 65500
/ vrf main routing bgp neighbor-group HNAs neighbor-description HNA
/ vrf main routing bgp neighbor-group HNAs update-source loop0
/ vrf main routing bgp neighbor-group HNAs address-family ipv4-unicast enabled false
/ vrf main routing bgp neighbor-group HNAs address-family ipv6-unicast enabled false
/ vrf main routing bgp neighbor-group HNAs address-family l2vpn-evpn route-reflector-client true
/ vrf main routing bgp neighbor-group HNAs track bfd
/ vrf main routing ospf
/ vrf main routing ospf router-id 10.0.0.202
/ vrf main routing ospf network 10.0.0.0/16 area 0
/ vrf main routing interface eth1 ip ospf track bfd
/ vrf main routing interface eth2 ip ospf track bfd
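
As a quick sanity check of the prefix shared by the BGP listen neighbor-range and the OSPF network statements, the following Python sketch verifies that every leaf address listed above falls inside 10.0.0.0/16. The helper name and the dictionary layout are illustrative assumptions; only the addresses are taken from the two configurations.

```python
import ipaddress

# Prefix used by both "bgp listen neighbor-range" and "ospf network" on the leaves.
FABRIC_RANGE = ipaddress.ip_network("10.0.0.0/16")

# Interface addresses copied from the leaf1 and leaf2 configurations above.
LEAF_ADDRESSES = {
    "leaf1": ["10.0.11.1/24", "10.0.21.1/24", "10.0.0.201/32"],
    "leaf2": ["10.0.12.1/24", "10.0.22.1/24", "10.0.0.202/32"],
}


def outside_range(addresses, fabric=FABRIC_RANGE):
    """Return the interface addresses whose network is not inside `fabric`."""
    return [
        addr
        for addr in addresses
        if not ipaddress.ip_interface(addr).network.subnet_of(fabric)
    ]


for leaf, addresses in LEAF_ADDRESSES.items():
    print(leaf, "addresses outside the fabric range:", outside_range(addresses))
```

If you renumber the fabric, an address reported by this check would be ignored by the OSPF network statement, or rejected as a dynamic BGP peer, until the prefix is widened.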

HNA configuration

Namespace

Create the hna namespace where the HNA pods will be spawned.

root@node1:~# kubectl create namespace hna
namespace/hna created

Bootstrap configuration

The bootstrap configuration on an HNA is typically used to enable the license. This initial configuration is saved as the startup configuration in the pod filesystem, so it can be loaded as a starting point by the configuration template.

Note

Take care to set a valid license serial in the initial configuration.

Customize the hna-init-config.yaml file (download link) and apply it:

root@node1:~# kubectl create -n hna -f hna-init-config.yaml
secret/vsr-init-config created

Startup probe

Add a script to a ConfigMap for use as the container's startup probe. This script, startup-probe.sh, is executed by the container runtime inside the container to check whether it is ready (download link):

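# Succeed once systemd reports that boot has finished, even if some units failed.
ret=$(systemctl is-system-running)
if [ "$ret" = "running" ] || [ "$ret" = "degraded" ]; then
	exit 0
fi
# Any other state means the system is still starting: report a failure.
exit 1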

Create the ConfigMap as follows (the deployment file references it later):

root@node1:~# kubectl create configmap -n hna startup-probe --from-file=startup-probe.sh=startup-probe.sh

HNA deployment

The HNA pod takes the role of the HBR (Host Based Router). It provides the network connectivity to the CNF Pods, through a virtio or a veth interface. It runs on each Kubernetes node, which means the deployment type is a DaemonSet.

As described in the nc-k8s-plugin documentation, the multus-hna-hbr network must be present in the metadata annotations.

In the example below, the multus SR-IOV networks that correspond to the connections to leaf1 and leaf2 are called multus-sriov-1 and multus-sriov-2, respectively. You can get the names used in your Kubernetes cluster with the following command:

root@node1:~# kubectl get --show-kind network-attachment-definitions

Similarly, the SR-IOV resources are called sriov/sriov1 and sriov/sriov2. You can get the names used in your Kubernetes cluster with the following command:

root@node1:~# kubectl get -o yaml -n kube-system configMap sriovdp-config
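
The resource names can be read from the JSON configuration stored in that ConfigMap. The Python sketch below assumes the usual SR-IOV network device plugin layout (a resourceList array whose entries carry resourceName and an optional resourcePrefix, with intel.com as the default prefix). The sample document and its selectors are hypothetical placeholders, not the output of a real cluster.

```python
import json

# Hypothetical content of the JSON configuration held by the sriovdp-config ConfigMap.
# The selectors are placeholders; only the resource names match this guide.
SAMPLE_CONFIG = """
{
  "resourceList": [
    {"resourceName": "sriov1", "resourcePrefix": "sriov",
     "selectors": {"pfNames": ["ens1f0"]}},
    {"resourceName": "sriov2", "resourcePrefix": "sriov",
     "selectors": {"pfNames": ["ens1f1"]}}
  ]
}
"""


def resource_names(config_json, default_prefix="intel.com"):
    """Return the 'prefix/name' strings to use under resources.limits and requests."""
    config = json.loads(config_json)
    return [
        f"{resource.get('resourcePrefix', default_prefix)}/{resource['resourceName']}"
        for resource in config.get("resourceList", [])
    ]


for name in resource_names(SAMPLE_CONFIG):
    print(name)
```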

The content of the deployment file deploy-hna.yaml is shown below (download link):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: hna
  namespace: hna
spec:
  selector:
    matchLabels:
      role: hna
  template:
    metadata:
      labels:
        role: hna
      annotations:
         k8s.v1.cni.cncf.io/networks: multus-sriov-1,multus-sriov-2,multus-hna-hbr
    spec:
      restartPolicy: Always
      securityContext:
        appArmorProfile:
          type: Unconfined
        sysctls:
        - name: net.ipv4.conf.default.disable_policy
          value: "1"
        - name: net.ipv4.ip_local_port_range
          value: "30000 40000"
        - name: net.ipv4.ip_forward
          value: "1"
        - name: net.ipv6.conf.all.forwarding
          value: "1"
        - name: net.netfilter.nf_conntrack_events
          value: "1"
      containers:
      - image: download.6wind.com/vsr/x86_64-ce-vhost/3.12:3.12.0.ga
        imagePullPolicy: IfNotPresent
        name: hna
        startupProbe:
          exec:
            command: ["bash", "-c", "/bin/startup-probe"]
          initialDelaySeconds: 10
          failureThreshold: 20
          periodSeconds: 10
          timeoutSeconds: 9
        resources:
          limits:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            sriov/sriov1: 1
            sriov/sriov2: 1
            smarter-devices/ppp: 1
            nc-k8s-plugin.6wind.com/vhost-user-all: 1
          requests:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            sriov/sriov1: 1
            sriov/sriov2: 1
            smarter-devices/ppp: 1
            nc-k8s-plugin.6wind.com/vhost-user-all: 1
        env:
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW", "SYS_ADMIN", "SYS_NICE", "IPC_LOCK",
                  "NET_BROADCAST", "SYSLOG", "SYS_TIME", "SYS_RAWIO", "SYS_CHROOT"]
        volumeMounts:
        - mountPath: /dev/hugepages
          name: hugepage
        - mountPath: /dev/shm
          name: shm
        - mountPath: /tmp
          name: tmp
        - mountPath: /run
          name: run
        - mountPath: /run/lock
          name: run-lock
        - mountPath: /bin/startup-probe
          subPath: startup-probe.sh
          name: startup-probe
        - mountPath: /etc/init-config/config.cli
          subPath: vsr_init_config
          name: init-config
        stdin: true
        tty: true
      imagePullSecrets:
      - name: regcred
      volumes:
      - emptyDir:
          medium: HugePages
          sizeLimit: 2Gi
        name: hugepage
      - name: shm
        emptyDir:
          sizeLimit: "2Gi"
          medium: "Memory"
      - emptyDir:
          sizeLimit: "500Mi"
          medium: "Memory"
        name: tmp
      - emptyDir:
          sizeLimit: "200Mi"
          medium: "Memory"
        name: run
      - emptyDir:
          sizeLimit: "200Mi"
          medium: "Memory"
        name: run-lock
      - name: startup-probe
        configMap:
          name: startup-probe
          defaultMode: 0500
      - name: init-config
        secret:
          secretName: vsr-init-config
          defaultMode: 0400

Note

In addition to the multus SR-IOV networks and the SR-IOV resources, which must be customized, other parameters of the deployment file can be adapted to your use case (e.g. CPUs, memory).
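
When adapting the startupProbe values, it helps to know how much boot time they allow. The Python sketch below gives a rough estimate, assuming the usual Kubernetes behaviour in which the container is restarted once failureThreshold consecutive probes have failed after the initial delay. The function name is illustrative.

```python
def startup_budget_seconds(initial_delay, period, failure_threshold):
    """Approximate time a container has to pass its startup probe."""
    return initial_delay + period * failure_threshold


# Values taken from the startupProbe section of deploy-hna.yaml.
budget = startup_budget_seconds(initial_delay=10, period=10, failure_threshold=20)
print(f"The HNA container has about {budget} seconds to finish booting")
```

If the HNA needs longer to start on your hardware, raising failureThreshold extends this budget without delaying the first probe.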

Apply the DaemonSet with the following command:

root@node1:~# kubectl apply -f deploy-hna.yaml

Once applied, the pods should be running:

root@node1:~# kubectl get pod -n hna -o wide
NAME        READY   STATUS    RESTARTS   AGE   IP             NODE                 NOMINATED NODE   READINESS GATES
hna-hzj7l   1/1     Running   0          31s   10.229.0.119   node1                <none>           <none>
hna-ltlzw   1/1     Running   0          31s   10.229.1.123   node2                <none>           <none>

HNA configuration template

The HNA Pods are configured automatically by the hna-operator. The configuration is generated from a Jinja2 template.

See also

For detailed instructions, please refer to the HNA Configuration Template section of the nc-k8s-plugin documentation.

This template relies on the Kubernetes database (list of Pods, list of nodes, custom resource definitions, …) to generate a valid CLI configuration that depends on the properties of the CNFs that are running on the node.

Here are some details about the template used in this document:

  • The list of PCI ports to configure on the Host Network Accelerator pod is retrieved from the Pod annotations (k8s.v1.cni.cncf.io/network-status).

  • For each hna_net (i.e. a network connection registered by a running CNF pod), a specific fast path and interface configuration is added, which depends on the interface kind (veth, vhost-user, or virtio-user).

  • The tenants and their subnets are retrieved from CRDs. They are described in the next section.

  • Depending on the tenant of the CNF associated with the hna_net, the interface is added to a bridge associated with an L2 network, inside an l3vrf corresponding to the tenant. A vxlan interface is connected to this bridge to provide L2 connectivity for this subnet.

  • For each tenant, a bridge and a vxlan interface are also created in the same l3vrf, providing L3 connectivity (inter-subnet and inter-tenant).

  • An HNA identifier hna_id is used to build a unique IP for the HNA Pod. It has to be set as a label on the HNA pod.

  • A BGP configuration is used to peer with the leaf routers, to exchange EVPN information.

  • An OSPF configuration is used to peer with the leaf routers, to exchange underlay IP routes.

  • A KPI configuration is used to export metrics to an influxdb Pod on the Kubernetes cluster. This part is optional and can be removed.
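
To make the role of hna_id concrete, the Python sketch below mirrors the address arithmetic used by the template: the loopback is 10.0.0.<hna_id>, the n-th physical port gets 10.0.<n * 10 + hna_id>.2/24, and the n-th BGP neighbor is 10.0.0.<200 + n>. The function name and the returned dictionary are illustrative assumptions, and the port count of 2 simply matches the two SR-IOV networks of this guide.

```python
import ipaddress


def hna_addresses(hna_id, port_count=2):
    """Mirror the address arithmetic of the template for one HNA pod."""
    plan = {
        "loopback": ipaddress.ip_interface(f"10.0.0.{hna_id}/32"),
        "ports": [],
        "bgp_neighbors": [],
    }
    # Jinja2's loop.index starts at 1, so port indexes do as well.
    for index in range(1, port_count + 1):
        plan["ports"].append(
            ipaddress.ip_interface(f"10.0.{index * 10 + hna_id}.2/24")
        )
        plan["bgp_neighbors"].append(ipaddress.ip_address(f"10.0.0.{200 + index}"))
    return plan


plan = hna_addresses(hna_id=1)
print("loopback:", plan["loopback"])
for port, neighbor in zip(plan["ports"], plan["bgp_neighbors"]):
    print("port address:", port, "| BGP neighbor:", neighbor)
```

With this arithmetic, an hna_id of 10 or more can yield the same link subnet as a smaller hna_id on another port (for example 1 * 10 + 11 = 2 * 10 + 1), so small identifiers are the safer choice. The value must also stay clear of the leaf loopbacks 10.0.0.201 and 10.0.0.202.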

The content of the configuration template hna-config-template.nc-cli is shown below (download link):

{#
This configuration bridges the virtual ports of the same tenant, and
does BGP with the leaf routers.
#}

{# Enable license #}
load startup

{# Enable fast path #}
# Fast path
/ system fast-path enabled true
/ system fast-path core-mask fast-path max
/ system fast-path advanced power-mode eco
/ system fast-path advanced machine-memory 2048
/ system fast-path max-virtual-ports 16

    {% if "hna_id" not in hna_pod.metadata.labels %}
# No hna_id label on the hna pod
    {% else %}
        {% set hna_id = hna_pod.metadata.labels['hna_id'] | int %}

{# Parse network status annotation in hna pod to retrieve the PCI bus address of each iface, and configure it #}
# Physical ports
        {% set pci_ifaces = [] %}
        {% for sriov in hna_pod.metadata.annotations["k8s.v1.cni.cncf.io/network-status"] |
           parse_json |
           selectattr('device-info', 'defined') |
           selectattr('device-info.type', 'eq', 'pci') %}
            {% set pci_iface = sriov["device-info"]["pci"]["pci-address"] | pci2name %}
            {% set _ = pci_ifaces.append(pci_iface) %}
/ system fast-path port {{pci_iface}}
/ vrf main interface physical {{pci_iface}} description "physical index {{loop.index}}"
/ vrf main interface physical {{pci_iface}} port {{pci_iface}}
/ vrf main interface physical {{pci_iface}} mtu 1600
/ vrf main interface physical {{pci_iface}} ipv4 address 10.0.{{loop.index * 10 + hna_id}}.2/24
        {% endfor %}

{# For each virtual network connected to the HNA, configure the fpvhost interface, and store the tenant and subnet for it #}
        {% set local_tenants = {} %}
# Virtual ports
        {% for hna_net in hna_nets.values() | selectattr('kind', 'ne', 'hbr') %}
            {% if hna_net.kind == "veth" %}
/ system fast-path virtual-port infrastructure infra-{{hna_net.name}}
                {% set hna_net_iface = "veth-" + hna_net.name %}
/ vrf main interface infrastructure {{hna_net_iface}} port infra-{{hna_net.name}}
/ vrf main interface infrastructure {{hna_net_iface}} description "{{hna_net.pod_name}}"
            {% elif hna_net.kind == "vhost-user" %}
/ system fast-path virtual-port fpvirtio fpvirtio-{{hna_net.name}}
                {% if "queues" in hna_net.userdata %}
/ system fast-path virtual-port fpvirtio fpvirtio-{{hna_net.name}} queues {{hna_net.userdata.queues}}
                {% endif %}
                {% set hna_net_iface = "vir-" + hna_net.name %}
/ vrf main interface fpvirtio {{hna_net_iface}} port fpvirtio-{{hna_net.name}}
/ vrf main interface fpvirtio {{hna_net_iface}} description "{{hna_net.pod_name}}"
            {% elif hna_net.kind == "virtio-user" %}
/ system fast-path virtual-port fpvhost fpvhost-{{hna_net.name}}
                {% if "profile" in hna_net.userdata %}
/ system fast-path virtual-port fpvhost fpvhost-{{hna_net.name}} profile {{hna_net.userdata.profile}}
                {% endif %}
                {% if hna_net.userdata.socket_mode == "server" %}
/ system fast-path virtual-port fpvhost fpvhost-{{hna_net.name}} socket-mode client
                {% else %}
/ system fast-path virtual-port fpvhost fpvhost-{{hna_net.name}} socket-mode server
                {% endif %}
                {% set hna_net_iface = "vho-" + hna_net.name %}
/ vrf main interface fpvhost {{hna_net_iface}} port fpvhost-{{hna_net.name}}
/ vrf main interface fpvhost {{hna_net_iface}} description "{{hna_net.pod_name}}"
            {% endif %}
            {% set pod = pods[hna_net.pod_name] %}
            {% if 'tenant' in pod.metadata.labels and 'subnet' in pod.metadata.labels and 'ip' in pod.metadata.labels %}
                {% set tenant = pod.metadata.labels['tenant'] %}
                {% set subnet = pod.metadata.labels['subnet'] %}
                {% set ip = pod.metadata.labels['ip'] %}
                {% if tenant not in local_tenants %}
                    {% set _ = local_tenants.update({tenant: {}}) %}
                {% endif %}
                {% if subnet not in local_tenants[tenant] %}
                    {% set _ = local_tenants[tenant].update({subnet: {}}) %}
                {% endif %}
                {% set _ = local_tenants[tenant][subnet].update({hna_net_iface: {"name": hna_net_iface, "ip": ip, "port": hna_net.name}}) %}
            {% endif %}
        {% endfor %}

{# List tenants referenced by intertenant CRDs #}
        {% set local_intertenants = {} %}
        {% for intertenant in intertenants.values() %}
            {% set _ = local_intertenants.update({intertenant.spec.tenant1: True, intertenant.spec.tenant2: True}) %}
        {% endfor %}

{# For each tenant present locally and its subnet, configure the network #}
        {% for tenant in tenants.values()
           if tenant.metadata.name in local_tenants or tenant.metadata.name in local_intertenants %}
# -- Tenant {{tenant.metadata.name}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} description "tenant {{tenant.metadata.name}}"
/ vrf main l3vrf vrf{{tenant.spec.identifier}} table-id {{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp l3vni {{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family ipv4-unicast l3vpn export route-target 65500:{{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family ipv4-unicast l3vpn export route-distinguisher 65500:{{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family ipv4-unicast l3vpn import route-target 65500:{{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family ipv4-unicast redistribute connected
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family l2vpn-evpn advertisement ipv4-unicast
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family l2vpn-evpn auto-route-target rfc8365
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family l2vpn-evpn export route-distinguisher 65500:{{tenant.spec.identifier}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} routing bgp address-family l2vpn-evpn import
/ vrf main interface vxlan vx{{tenant.spec.identifier}} description "tenant {{tenant.metadata.name}}"
/ vrf main interface vxlan vx{{tenant.spec.identifier}} mtu 1500
/ vrf main interface vxlan vx{{tenant.spec.identifier}} vni {{tenant.spec.identifier}}
/ vrf main interface vxlan vx{{tenant.spec.identifier}} local 10.0.0.{{hna_id}}
/ vrf main interface vxlan vx{{tenant.spec.identifier}} link-interface loop0
/ vrf main interface vxlan vx{{tenant.spec.identifier}} learning false
            {% for subnet in subnets.values()
               if subnet.metadata.namespace == tenant.metadata.name %}
# Subnet {{subnet.metadata.name}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} description "subnet {{subnet.metadata.name}}"
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} ipv4 address {{subnet.spec.gateway}}/{{subnet.spec.prefixlen}}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} ethernet mac-address 00:00:00:00:01:01
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} network-stack ipv4 arp-accept-gratuitous always
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} network-stack neighbor ipv4-base-reachable-time 30000
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} mtu 1550
                {% if tenant.metadata.name in local_tenants and subnet.metadata.name in local_tenants[tenant.metadata.name] %}
                    {% for link_iface in local_tenants[tenant.metadata.name][subnet.metadata.name].values() %}
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} link-interface {{link_iface.name}}
                    {% endfor %}
                {% endif %}
/ vrf main interface vxlan vx{{subnet.spec.identifier}} description "subnet {{subnet.metadata.name}}"
/ vrf main interface vxlan vx{{subnet.spec.identifier}} mtu 1500
/ vrf main interface vxlan vx{{subnet.spec.identifier}} vni {{subnet.spec.identifier}}
/ vrf main interface vxlan vx{{subnet.spec.identifier}} local 10.0.0.{{hna_id}}
/ vrf main interface vxlan vx{{subnet.spec.identifier}} link-interface loop0
/ vrf main interface vxlan vx{{subnet.spec.identifier}} learning false
/ vrf main l3vrf vrf{{tenant.spec.identifier}} interface bridge br{{subnet.spec.identifier}} link-interface vx{{subnet.spec.identifier}} learning false
            {% endfor %}
        {% endfor %}

{# For each tenant and subnet, configure the ACLs #}
/ system network-stack bridge call-ipv4-filtering true
/ vrf main firewall ipv4 filter forward policy drop
        {% for tenant in tenants.values()
           if tenant.metadata.name in local_tenants or tenant.metadata.name in local_intertenants %}
# -- Tenant {{tenant.metadata.name}}
            {% for subnet in subnets.values()
               if subnet.metadata.namespace == tenant.metadata.name %}
# - Subnet {{subnet.metadata.name}}
# filter by interface address
                {% if tenant.metadata.name in local_tenants and
                   subnet.metadata.name in local_tenants[tenant.metadata.name] %}
                    {% for link_iface in local_tenants[tenant.metadata.name][subnet.metadata.name].values() %}
/ vrf main firewall ipv4 mangle prerouting rule {{subnet.spec.identifier * 1000 + loop.index-}}
  {{' '}}inbound-bridge-port {{link_iface.name-}}
  {{' '}}source address not {{link_iface.ip-}}
  {{' '}}action drop
                    {% endfor %}
                {% endif %}
# intra-node traffic
                {% for acl in tenant.spec.get("acls", []) %}
/ vrf main firewall ipv4 filter forward rule {{subnet.spec.identifier * 1000 + 500 + loop.index-}}
  {{' '}}inbound-interface br{{subnet.spec.identifier-}}
  {% if acl.source or acl.sport %} source{% endif -%}
  {% if acl.source %} address {{acl.source}}{% endif -%}
  {% if acl.sport %} port {{acl.sport}}{% endif -%}
  {% if acl.destination or acl.dport %} destination{% endif -%}
  {% if acl.destination %} address {{acl.destination}}{% endif -%}
  {% if acl.dport %} port {{acl.dport}}{% endif -%}
  {% if acl.protocol %} protocol {{acl.protocol}}{% endif -%}
  {% if acl.conntrack %} conntrack state {{acl.conntrack}}{% endif -%}
  {{' '}}action {{acl.action}}
                {% endfor %}
/ vrf main firewall ipv4 filter forward rule {{subnet.spec.identifier * 1000 + 999-}}
  {{' '}}inbound-interface br{{subnet.spec.identifier-}}
  {{' '}}action {{tenant.spec.acl_policy}}
            {% endfor %}
# inter-node traffic
            {% for acl in tenant.spec.get("acls", []) %}
/ vrf main firewall ipv4 filter forward rule {{tenant.spec.identifier * 100000 + 99000 + loop.index-}}
  {{' '}}inbound-interface vrf{{tenant.spec.identifier-}}
  {% if acl.source or acl.sport %} source{% endif -%}
  {% if acl.source %} address {{acl.source}}{% endif -%}
  {% if acl.sport %} port {{acl.sport}}{% endif -%}
  {% if acl.destination or acl.dport %} destination{% endif -%}
  {% if acl.destination %} address {{acl.destination}}{% endif -%}
  {% if acl.dport %} port {{acl.dport}}{% endif -%}
  {% if acl.protocol %} protocol {{acl.protocol}}{% endif -%}
  {% if acl.conntrack %} conntrack state {{acl.conntrack}}{% endif -%}
  {{' '}}action {{acl.action}}
            {% endfor %}
/ vrf main firewall ipv4 filter forward rule {{tenant.spec.identifier * 100000 + 99999-}}
  {{' '}}inbound-interface vrf{{tenant.spec.identifier-}}
  {{' '}}action {{tenant.spec.acl_policy}}
        {% endfor %}

{# Configure route leaks from intertenant CRD #}
# Inter-tenants
        {% for intertenant in intertenants.values()
           if (intertenant.spec.tenant1 in local_tenants or intertenant.spec.tenant2 in local_tenants)
           and intertenant.spec.tenant1 in tenants
           and intertenant.spec.tenant2 in tenants
           and intertenant.spec.tenant1 + "/" + intertenant.spec.subnet1 in subnets
           and intertenant.spec.tenant2 + "/" + intertenant.spec.subnet2 in subnets %}
            {% set tenant1 = tenants[intertenant.spec.tenant1] %}
            {% set tenant2 = tenants[intertenant.spec.tenant2] %}
            {% set subnet1 = subnets[intertenant.spec.tenant1 + "/" + intertenant.spec.subnet1] %}
            {% set subnet2 = subnets[intertenant.spec.tenant2 + "/" + intertenant.spec.subnet2] %}
# {{intertenant.spec.tenant1}}/{{intertenant.spec.subnet1}} <-> {{intertenant.spec.tenant2}}/{{intertenant.spec.subnet2}}
/ routing ipv4-prefix-list pl{{tenant1.spec.identifier}}-export seq {{subnet1.spec.identifier}} address {{subnet1.spec.network}}/{{subnet1.spec.prefixlen}} policy permit
/ routing ipv4-prefix-list pl{{tenant1.spec.identifier}}-import seq {{subnet1.spec.identifier}} address {{subnet2.spec.network}}/{{subnet2.spec.prefixlen}} policy permit
/ routing ipv4-prefix-list pl{{tenant2.spec.identifier}}-export seq {{subnet2.spec.identifier}} address {{subnet2.spec.network}}/{{subnet2.spec.prefixlen}} policy permit
/ routing ipv4-prefix-list pl{{tenant2.spec.identifier}}-import seq {{subnet2.spec.identifier}} address {{subnet1.spec.network}}/{{subnet1.spec.prefixlen}} policy permit
/ routing route-map rm{{tenant1.spec.identifier}}-export seq {{subnet1.spec.identifier}} policy permit
/ routing route-map rm{{tenant1.spec.identifier}}-export seq {{subnet1.spec.identifier}} match ip address prefix-list pl{{tenant1.spec.identifier}}-export
/ routing route-map rm{{tenant1.spec.identifier}}-import seq {{subnet1.spec.identifier}} policy permit
/ routing route-map rm{{tenant1.spec.identifier}}-import seq {{subnet1.spec.identifier}} match ip address prefix-list pl{{tenant1.spec.identifier}}-import
/ routing route-map rm{{tenant1.spec.identifier}}-import seq {{subnet1.spec.identifier}} match source-l3vrf vrf{{tenant2.spec.identifier}}
/ routing route-map rm{{tenant2.spec.identifier}}-export seq {{subnet2.spec.identifier}} policy permit
/ routing route-map rm{{tenant2.spec.identifier}}-export seq {{subnet2.spec.identifier}} match ip address prefix-list pl{{tenant2.spec.identifier}}-export
/ routing route-map rm{{tenant2.spec.identifier}}-import seq {{subnet2.spec.identifier}} policy permit
/ routing route-map rm{{tenant2.spec.identifier}}-import seq {{subnet2.spec.identifier}} match ip address prefix-list pl{{tenant2.spec.identifier}}-import
/ routing route-map rm{{tenant2.spec.identifier}}-import seq {{subnet2.spec.identifier}} match source-l3vrf vrf{{tenant1.spec.identifier}}
/ vrf main l3vrf vrf{{tenant1.spec.identifier}} routing bgp address-family l2vpn-evpn advertisement ipv4-unicast route-map rm{{tenant1.spec.identifier}}-export
/ vrf main l3vrf vrf{{tenant1.spec.identifier}} routing bgp address-family ipv4-unicast l3vrf import l3vrf vrf{{tenant2.spec.identifier}}
/ vrf main l3vrf vrf{{tenant1.spec.identifier}} routing bgp address-family ipv4-unicast l3vrf import route-map rm{{tenant1.spec.identifier}}-import
/ vrf main l3vrf vrf{{tenant2.spec.identifier}} routing bgp address-family l2vpn-evpn advertisement ipv4-unicast route-map rm{{tenant2.spec.identifier}}-export
/ vrf main l3vrf vrf{{tenant2.spec.identifier}} routing bgp address-family ipv4-unicast l3vrf import l3vrf vrf{{tenant1.spec.identifier}}
/ vrf main l3vrf vrf{{tenant2.spec.identifier}} routing bgp address-family ipv4-unicast l3vrf import route-map rm{{tenant2.spec.identifier}}-import
        {% endfor %}

# Loopback
/ vrf main interface loopback loop0 ipv4 address 10.0.0.{{hna_id}}/32
/ vrf main interface loopback loop0 mtu 1600

# BGP
/ vrf main routing bgp as 65500
/ vrf main routing bgp router-id 10.0.0.{{hna_id}}
/ vrf main routing bgp ebgp-requires-policy false
/ vrf main routing bgp address-family l2vpn-evpn advertise-all-vni true
        {% for pci_iface in pci_ifaces %}
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} remote-as 65500
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} neighbor-description leaf-{{loop.index}}
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} update-source loop0
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} address-family ipv4-unicast enabled false
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} address-family ipv6-unicast enabled false
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} address-family l2vpn-evpn
/ vrf main routing bgp neighbor 10.0.0.{{200 + loop.index}} track bfd
        {% endfor %}

# OSPF
/ vrf main routing ospf router-id 10.0.0.{{hna_id}}
/ vrf main routing ospf network 10.0.0.0/16 area 0
        {% for pci_iface in pci_ifaces %}
/ vrf main routing interface {{pci_iface}} ip ospf track bfd
        {% endfor %}
    {% endif %}

# KPIs
    {% for pci_iface in pci_ifaces %}
/ vrf main kpi telegraf metrics monitored-interface vrf main name {{pci_iface}}
    {% endfor %}
    {% for hna_net_iface in hna_net_ifaces %}
/ vrf main kpi telegraf metrics monitored-interface vrf main name {{hna_net_iface}}
    {% endfor %}
/ vrf main kpi telegraf metrics metric network-nic-traffic-stats enabled true period 3
/ vrf main kpi telegraf interval 5
/ vrf main kpi telegraf influxdb-output url http://influxdb.monitoring:8086 database telegraf
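
The "Physical ports" loop of the template selects the entries of the network-status annotation that describe a PCI device. The Python sketch below reproduces that selection outside Jinja2. The annotation value is a hypothetical sample (network names and PCI addresses are placeholders), and the pci2name filter provided to the template is deliberately not reimplemented here.

```python
import json

# Hypothetical value of the k8s.v1.cni.cncf.io/network-status annotation.
NETWORK_STATUS = json.dumps([
    {"name": "cluster-default", "interface": "eth0", "default": True},
    {"name": "hna/multus-sriov-1", "interface": "net1",
     "device-info": {"type": "pci", "pci": {"pci-address": "0000:00:04.0"}}},
    {"name": "hna/multus-sriov-2", "interface": "net2",
     "device-info": {"type": "pci", "pci": {"pci-address": "0000:00:05.0"}}},
    {"name": "hna/multus-hna-hbr", "interface": "net3"},
])


def pci_addresses(annotation):
    """Return the PCI addresses of the entries whose device-info type is 'pci'."""
    return [
        entry["device-info"]["pci"]["pci-address"]
        for entry in json.loads(annotation)
        if entry.get("device-info", {}).get("type") == "pci"
    ]


# Like loop.index in the template, the port index starts at 1.
for index, address in enumerate(pci_addresses(NETWORK_STATUS), start=1):
    print(f"physical index {index}: {address}")
```

Entries without a device-info field, such as the default cluster network, are skipped, which corresponds to the selectattr('device-info', 'defined') step of the template.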

To apply the template, run the following command:

root@node1:~# kubectl create configmap -n hna-operator hna-template --from-file=config.nc-cli=/root/hna-config-template.nc-cli

Network segmentation using CRDs

The tenants, subnets, and inter-tenant connections are configured through standard Kubernetes CRDs. This document gives an example set of CRDs, but users are free to define their own, containing the information required for their use case. These CRDs are used in the Jinja template described in the previous section.
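
To illustrate how a CRD field ends up in the generated configuration, the Python sketch below mirrors the ACL part of the template: it computes the rule number and assembles one "filter forward" rule from an ACL entry shaped like those of the Tenant example in this section. The function names are illustrative assumptions, and the code follows the template logic only approximately, without reproducing Jinja2's whitespace handling.

```python
def subnet_rule_id(subnet_id, acl_index):
    """Rule number of the acl_index-th ACL on a subnet bridge (intra-node traffic)."""
    return subnet_id * 1000 + 500 + acl_index


def tenant_rule_id(tenant_id, acl_index):
    """Rule number of the acl_index-th ACL on a tenant l3vrf (inter-node traffic)."""
    return tenant_id * 100000 + 99000 + acl_index


def render_acl(rule_id, interface, acl):
    """Build one 'filter forward' rule from an ACL entry, as the template does."""
    parts = [
        f"/ vrf main firewall ipv4 filter forward rule {rule_id}",
        f"inbound-interface {interface}",
    ]
    # "source" and "destination" are emitted once, before address and/or port.
    for side, addr_key, port_key in (
        ("source", "source", "sport"),
        ("destination", "destination", "dport"),
    ):
        if acl.get(addr_key) or acl.get(port_key):
            parts.append(side)
        if acl.get(addr_key):
            parts.append(f"address {acl[addr_key]}")
        if acl.get(port_key):
            parts.append(f"port {acl[port_key]}")
    if acl.get("protocol"):
        parts.append(f"protocol {acl['protocol']}")
    if acl.get("conntrack"):
        parts.append(f"conntrack state {acl['conntrack']}")
    parts.append(f"action {acl['action']}")
    return " ".join(parts)


# Fourth ACL of the "green" tenant example (identifier 100).
acl = {
    "source": "10.60.0.0/16",
    "destination": "10.61.0.0/16",
    "protocol": "tcp",
    "dport": 5000,
    "conntrack": "new",
    "action": "accept",
}
print(render_acl(tenant_rule_id(100, 4), "vrf100", acl))
```

Because the rule number embeds the tenant or subnet identifier, the rules of different tenants and subnets do not overlap as long as a tenant defines few enough ACLs to stay within its numbering block.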

These CRDs are designed to configure the connection of the CNF pods as below:

Figure: Logical view of the network.

Tenant CRD

The tenant CRD describes a tenant and the ACLs that apply to it (download link):

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: tenants.hna.6wind.com
spec:
  group: hna.6wind.com
  scope: Namespaced
  names:
    plural: tenants
    singular: tenant
    kind: Tenant
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                identifier:
                  type: integer
                  description: A unique identifier for the tenant. It is used in the network configuration for the l3vni and l3vrf table-id.
                  minimum: 100
                  maximum: 999
                description:
                  type: string
                  description: A text description giving details about this tenant
                acl_policy:
                  type: string
                  enum: [accept, drop]
                  description: The default action if no ACL match
                acls:
                  type: array
                  description: A list of ACLs applied to this tenant
                  items:
                    type: object
                    properties:
                      source:
                        type: string
                        description: The optional IPv4 source address or network
                      destination:
                        type: string
                        description: The optional IPv4 destination address or network
                      sport:
                        type: integer
                        minimum: 1
                        maximum: 65534
                        description: The optional source port
                      dport:
                        type: integer
                        minimum: 1
                        maximum: 65534
                        description: The optional destination port
                      action:
                        type: string
                        enum: [accept, drop]
                        description: The action to execute for this ACL
                      conntrack:
                        type: string
                        enum: [new, established, related, invalid]
                        description: The optional conntrack state
                      protocol:
                        type: string
                        enum: [ah, esp, gre, icmp, ipip, l2tp, sctp, tcp, udp, vrrp]
                        description: The optional protocol
                      description:
                        type: string
                        description: A text description giving details about this ACL
                    required:
                      - action
              required:
                - acl_policy
      selectableFields:
        - jsonPath: .spec.identifier
      additionalPrinterColumns:
        - jsonPath: .spec.identifier
          name: Identifier
          type: integer
---
apiVersion: "hna.6wind.com/v1"
kind: Tenant
metadata:
  name: green
spec:
  description: "For the green tenant, everything is allowed inside 10.60.
    A VM on 10.60 can connect to the port 5000 of a machine on 10.61.
    A VM on 10.60 can also connect to the 10.62 of the red tenant, on port 6000."
  identifier: 100
  acl_policy: drop
  acls:
    - source: "10.60.0.0/16"
      destination: "10.60.0.0/16"
      action: "accept"
      description: "Accept all traffic on the 10.60 subnet"
    - source: "10.61.0.0/16"
      destination: "10.60.0.0/16"
      protocol: "icmp"
      action: "accept"
      description: "Accept icmp from 10.61 to 10.60"
    - source: "10.60.0.0/16"
      destination: "10.61.0.0/16"
      protocol: "icmp"
      action: "accept"
      description: "Accept icmp from 10.60 to 10.61"
    - source: "10.60.0.0/16"
      destination: "10.61.0.0/16"
      protocol: "tcp"
      dport: 5000
      conntrack: "new"
      action: "accept"
      description: "Accept TCP port 5000 new connections from 10.60 to 10.61"
    - source: "10.62.0.0/16"
      destination: "10.60.0.0/16"
      protocol: "icmp"
      action: "accept"
      description: "Accept icmp from the 10.62 subnet of the red tenant to the 10.60"
    - source: "10.60.0.0/16"
      destination: "10.62.0.0/16"
      protocol: "icmp"
      action: "accept"
      description: "Accept icmp from the 10.60 to the 10.62 subnet of the red tenant"
    - source: "10.60.0.0/16"
      destination: "10.62.0.0/16"
      protocol: "tcp"
      dport: 6000
      conntrack: "new"
      action: "accept"
      description: "Accept TCP port 6000 new connections from 10.60 to 10.62 of the red tenant"
    - conntrack: "established"
      action: "accept"
      description: "Accept established connections"
---
apiVersion: "hna.6wind.com/v1"
kind: Tenant
metadata:
  name: red
spec:
  description: "For the red tenant, there is no ACL: everything is accepted, including inter-tenant."
  identifier: 101
  acl_policy: accept

Apply it like this:

admin@k8s:~$ kubectl create -f tenant-crd.yaml

The Tenant objects must be created in the default namespace.
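
The constraints expressed by the Tenant schema can be checked before calling kubectl. The following is a minimal sketch, not part of the HNA operator: validate_tenant_spec is a hypothetical helper that only mirrors the ranges, enumerations and required fields visible in the schema above.

```python
# Sketch only: mirrors the constraints of the Tenant schema shown above.
# validate_tenant_spec is a hypothetical helper, not an HNA API.

ACTIONS = {"accept", "drop"}
CONNTRACK = {"new", "established", "related", "invalid"}
PROTOCOLS = {"ah", "esp", "gre", "icmp", "ipip", "l2tp", "sctp", "tcp", "udp", "vrrp"}


def validate_tenant_spec(spec):
    """Return a list of problems found in a Tenant spec (empty list if none)."""
    errors = []
    # acl_policy is the only required field of the spec.
    if spec.get("acl_policy") not in ACTIONS:
        errors.append("acl_policy is required and must be accept or drop")
    ident = spec.get("identifier")
    if ident is not None and not (isinstance(ident, int) and 100 <= ident <= 999):
        errors.append("identifier must be an integer between 100 and 999")
    for index, acl in enumerate(spec.get("acls", [])):
        # action is the only required field of an ACL entry.
        if acl.get("action") not in ACTIONS:
            errors.append(f"acls[{index}]: action is required (accept or drop)")
        for key in ("sport", "dport"):
            port = acl.get(key)
            if port is not None and not (isinstance(port, int) and 1 <= port <= 65534):
                errors.append(f"acls[{index}]: {key} must be between 1 and 65534")
        if "conntrack" in acl and acl["conntrack"] not in CONNTRACK:
            errors.append(f"acls[{index}]: unknown conntrack state")
        if "protocol" in acl and acl["protocol"] not in PROTOCOLS:
            errors.append(f"acls[{index}]: unknown protocol")
    return errors


red = {"identifier": 101, "acl_policy": "accept"}
bad = {"identifier": 42, "acls": [{"dport": 70000}]}
print(validate_tenant_spec(red))
for problem in validate_tenant_spec(bad):
    print(problem)
```

The Kubernetes API server enforces the same schema when the objects are created, so such a check is only a convenience for catching mistakes earlier.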

Subnet CRD

The subnet CRD describes the subnets that are attached to a tenant (download link):

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: subnets.hna.6wind.com
spec:
  group: hna.6wind.com
  scope: Namespaced
  names:
    plural: subnets
    singular: subnet
    kind: Subnet
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              network:
                type: string
                description: The IPv4 network subnet.
              prefixlen:
                type: integer
                description: The IPv4 prefix length.
                minimum: 16
                maximum: 32
              gateway:
                type: string
                description: The IPv4 address of the gateway on the subnet.
              identifier:
                type: integer
                description: A unique identifier for the subnet. It is used in the network configuration for the l2vni.
                minimum: 10000
                maximum: 99999
              description:
                type: string
                description: A text description giving details about this subnet
    selectableFields:
    - jsonPath: .spec.network
    - jsonPath: .spec.prefixlen
    - jsonPath: .spec.gateway
    - jsonPath: .spec.identifier
    additionalPrinterColumns:
    - jsonPath: .spec.network
      name: Network
      type: string
    - jsonPath: .spec.prefixlen
      name: Prefixlen
      type: integer
    - jsonPath: .spec.gateway
      name: Gateway
      type: string
    - jsonPath: .spec.identifier
      name: Identifier
      type: integer
---
apiVersion: "hna.6wind.com/v1"
kind: Subnet
metadata:
  name: green-10-60
  namespace: green
spec:
  network: "10.60.0.0"
  prefixlen: 16
  gateway: "10.60.0.254"
  identifier: 10000
  description: "Frontend network."
---
apiVersion: "hna.6wind.com/v1"
kind: Subnet
metadata:
  name: green-10-61
  namespace: green
spec:
  network: "10.61.0.0"
  prefixlen: 16
  gateway: "10.61.0.254"
  identifier: 10001
  description: "Backend network."
---
apiVersion: "hna.6wind.com/v1"
kind: Subnet
metadata:
  name: red-10-61
  namespace: red
spec:
  network: "10.61.0.0"
  prefixlen: 16
  gateway: "10.61.0.254"
  identifier: 10101
  description: "Infra network."
---
apiVersion: "hna.6wind.com/v1"
kind: Subnet
metadata:
  name: red-10-62
  namespace: red
spec:
  network: "10.62.0.0"
  prefixlen: 16
  gateway: "10.62.0.254"
  identifier: 10100
  description: "Another infra network."

Apply it like this:

admin@k8s:~$ kubectl create -f subnet-crd.yaml

Each Subnet object must be created in the namespace of the tenant it belongs to.
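
The schema does not check that the gateway belongs to the subnet. A minimal sketch of such a consistency check, using only the Python standard library, is given here; check_subnet is a hypothetical helper and the rule "the gateway must lie inside the subnet" is an assumption about what a sane configuration looks like.

```python
import ipaddress


def check_subnet(spec):
    """Return a list of problems found in a Subnet spec (empty list if none)."""
    errors = []
    # Ranges taken from the Subnet schema above.
    if not 16 <= spec["prefixlen"] <= 32:
        errors.append("prefixlen must be between 16 and 32")
    if not 10000 <= spec["identifier"] <= 99999:
        errors.append("identifier must be between 10000 and 99999")
    try:
        # Strict parsing rejects a network address with host bits set.
        network = ipaddress.ip_network(f"{spec['network']}/{spec['prefixlen']}")
    except ValueError as exc:
        errors.append(f"invalid network: {exc}")
        return errors
    # Assumption: the gateway is expected to be an address of the subnet.
    if ipaddress.ip_address(spec["gateway"]) not in network:
        errors.append("gateway is outside the subnet")
    return errors


green_10_60 = {"network": "10.60.0.0", "prefixlen": 16,
               "gateway": "10.60.0.254", "identifier": 10000}
print(check_subnet(green_10_60))
```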

Inter-tenant CRD

The inter-tenant CRD describes how two tenants are connected together (download link):

---
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: intertenants.hna.6wind.com
spec:
  group: hna.6wind.com
  scope: Namespaced
  names:
    plural: intertenants
    singular: intertenant
    kind: Intertenant
  versions:
  - name: v1
    served: true
    storage: true
    schema:
      openAPIV3Schema:
        type: object
        properties:
          spec:
            type: object
            properties:
              tenant1:
                type: string
                description: The name of the first tenant
              tenant2:
                type: string
                description: The name of the second tenant
              subnet1:
                type: string
                description: The name of the first tenant subnet
              subnet2:
                type: string
                description: The name of the second tenant subnet
              description:
                type: string
                description: A text description giving details about this intertenant connection
            required:
              - tenant1
              - tenant2
              - subnet1
              - subnet2
    selectableFields:
    - jsonPath: .spec.tenant1
    - jsonPath: .spec.tenant2
    - jsonPath: .spec.subnet1
    - jsonPath: .spec.subnet2
    additionalPrinterColumns:
    - jsonPath: .spec.tenant1
      name: Tenant1
      type: string
    - jsonPath: .spec.tenant2
      name: Tenant2
      type: string
    - jsonPath: .spec.subnet1
      name: Subnet1
      type: string
    - jsonPath: .spec.subnet2
      name: Subnet2
      type: string
---
apiVersion: "hna.6wind.com/v1"
kind: Intertenant
metadata:
  name: green-10-60-to-red-10-62
spec:
  tenant1: "green"
  tenant2: "red"
  subnet1: "green-10-60"
  subnet2: "red-10-62"
  description: "Green to red inter-tenant."

Apply it like this:

admin@k8s:~$ kubectl create -f intertenant-crd.yaml

The Intertenant objects must be created in the default namespace.
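
An Intertenant object refers to tenants and subnets by name only, so a typo is not caught by the schema. The sketch below shows one way to cross-check the references. It assumes, as stated in the Subnet section, that each subnet lives in the namespace named after its tenant; check_intertenant and its arguments are hypothetical names.

```python
def check_intertenant(spec, tenants, subnets_by_namespace):
    """Check that both sides of an Intertenant spec refer to known objects.

    tenants: set of tenant names.
    subnets_by_namespace: dict mapping a namespace to its set of subnet names.
    """
    errors = []
    for side in ("1", "2"):
        tenant = spec[f"tenant{side}"]
        subnet = spec[f"subnet{side}"]
        if tenant not in tenants:
            errors.append(f"tenant{side}: unknown tenant {tenant}")
        elif subnet not in subnets_by_namespace.get(tenant, set()):
            errors.append(f"subnet{side}: {subnet} not found in namespace {tenant}")
    return errors


# Names taken from the examples of this guide.
tenants = {"green", "red"}
subnets = {
    "green": {"green-10-60", "green-10-61"},
    "red": {"red-10-61", "red-10-62"},
}
link = {"tenant1": "green", "tenant2": "red",
        "subnet1": "green-10-60", "subnet2": "red-10-62"}
print(check_intertenant(link, tenants, subnets))
```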

HNA operator configuration

Using CRDs requires a few modifications to the default HNA operator configuration.

Update cluster role

A default ClusterRole called hna-operator-cluster-role grants the HNA operator the authorizations required to monitor Kubernetes objects.

To let the operator monitor the new CRDs, update this ClusterRole with the following JSON patch (download link):

[
  {
    "op": "add",
    "path": "/rules/-",
    "value": {
      "apiGroups": ["hna.6wind.com"],
      "resources": ["tenants", "subnets", "intertenants"],
      "verbs": ["get", "watch", "list"]
    }
  }
]

Apply it like this:

admin@k8s:~$ kubectl patch clusterrole hna-operator-cluster-role --type=json --patch-file=clusterrole-patch.json
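
In a JSON patch, the "add" operation with a path ending in "/-" appends the value to the end of the target array, so the existing rules of the ClusterRole are left untouched. The sketch below reproduces this single behavior in plain Python for illustration. It is not a full JSON Patch implementation, and the pre-existing rule in the example is hypothetical.

```python
import copy


def apply_append_ops(document, patch):
    """Apply JSON Patch "add" operations whose path ends with "/-" (array append)."""
    result = copy.deepcopy(document)
    for op in patch:
        if op["op"] != "add" or not op["path"].endswith("/-"):
            raise NotImplementedError("only 'add' with an array-append path is handled")
        target = result
        # Walk down to the array: "/rules/-" gives the single key "rules".
        for key in op["path"].split("/")[1:-1]:
            target = target[key]
        target.append(op["value"])
    return result


# Hypothetical initial content of the ClusterRole rules.
cluster_role = {"rules": [
    {"apiGroups": [""], "resources": ["pods"], "verbs": ["get", "watch", "list"]},
]}
patch = [{
    "op": "add",
    "path": "/rules/-",
    "value": {
        "apiGroups": ["hna.6wind.com"],
        "resources": ["tenants", "subnets", "intertenants"],
        "verbs": ["get", "watch", "list"],
    },
}]
patched = apply_append_ops(cluster_role, patch)
print(len(patched["rules"]))
```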

Patch HNA operator daemonset

To instruct the operator to monitor these new CRDs, the hna-operator daemonset must be patched (download link):

spec:
  template:
    spec:
      containers:
      - name: hna-operator
        command:
        - /hna-operator
        - --log-level
        - INFO
        - --hna-pod-selector
        - role=hna
        - --watch-crd
        - api=hna.6wind.com/v1,kind=Subnet,namespace=*,alias=subnets
        - --watch-crd
        - api=hna.6wind.com/v1,kind=Tenant,namespace=default,alias=tenants
        - --watch-crd
        - api=hna.6wind.com/v1,kind=Intertenant,namespace=default,alias=intertenants

Apply it like this:

admin@k8s:~$ kubectl patch -n hna-operator daemonset hna-operator --patch-file=/root/hna-operator-patch.yaml
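
Each --watch-crd argument is a comma-separated list of key=value pairs. The sketch below shows how such a string can be split into its fields. It is only an illustration of the format used above: the actual parsing done by the operator is not documented here, and treating the four keys as mandatory is an assumption.

```python
REQUIRED_KEYS = ("api", "kind", "namespace", "alias")


def parse_watch_crd(value):
    """Split a --watch-crd value into a dict, adding the API group and version."""
    fields = dict(item.split("=", 1) for item in value.split(","))
    missing = [key for key in REQUIRED_KEYS if key not in fields]
    if missing:
        # Assumption: all four keys are expected, as in the examples above.
        raise ValueError(f"missing keys: {', '.join(missing)}")
    group, _, version = fields["api"].partition("/")
    fields["group"] = group
    fields["version"] = version
    return fields


watch = parse_watch_crd("api=hna.6wind.com/v1,kind=Subnet,namespace=*,alias=subnets")
print(watch["kind"], watch["namespace"], watch["alias"])
```

In the patch above, Subnet objects are watched in every namespace (namespace=*), while Tenant and Intertenant objects are watched in the default namespace only.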

CNFs configuration

Namespaces

Create the green and red namespaces where the CNF pods will be spawned.

root@node1:~# kubectl create namespace green
namespace/green created
root@node1:~# kubectl create namespace red
namespace/red created

Bootstrap configuration

The way CNFs are configured in a production environment is outside the scope of this document.

In this deployment example, the CNFs run a Virtual Service Router. An initContainer runs the script below, which generates a startup configuration in /etc/init-config/config.cli from the environment variables passed by Kubernetes. This CLI file is applied automatically when the Virtual Service Router container starts.

To demonstrate the two interface kinds, the red pods use veth-based interfaces, while the green ones use virtio-based interfaces. The dataplane IP addresses are read from a static table indexed by the pod role, and the MAC addresses are derived from the pod identifier.

The cnf-bootstrap.py Python script (download link):

#!/usr/bin/env python3
# Copyright 2025 6WIND S.A.

"""
This script is exported by Kubernetes in the VSR filesystem for the greenX and redX pods. It
is used to generate the startup configuration.
"""

import json
import os
import re
import subprocess
import sys

BUS_ADDR_RE = re.compile(r'''
    ^
    (?P<domain>([\da-f]+)):
    (?P<bus>([\da-f]+)):
    (?P<slot>([\da-f]+))\.
    (?P<func>(\d+))
    $
    ''', re.VERBOSE | re.IGNORECASE)

ADDRS = {
    "green1": "10.60.0.1",
    "green2": "10.60.0.2",
    "green3": "10.61.0.3",
    "red1": "10.62.0.1",
    "red2": "10.62.0.2",
    "red3": "10.61.0.3",
}

def bus_addr_to_name(bus_addr):
    """
    Convert a PCI bus address into a port name as used in nc-cli.
    """
    match = BUS_ADDR_RE.match(bus_addr)
    if not match:
        raise ValueError('pci bus address %s does not match regexp' % bus_addr)

    d = match.groupdict()
    domain = int(d['domain'], 16)
    bus = int(d['bus'], 16)
    slot = int(d['slot'], 16)
    func = int(d['func'], 10)

    name = 'pci-'
    if domain != 0:
        name += 'd%d' % domain
    name += 'b%ds%d' % (bus, slot)
    if func != 0:
        name += 'f%d' % func

    return name

def get_env_vm():
    with open('/run/init-env.json', encoding='utf-8') as f:
        env = json.load(f)
    env['HNA_IFNAME'] = subprocess.run(
        "ip -json -details link | jq --raw-output "
        "'.[] | select(has(\"linkinfo\") | not) | "
        "select(.address | match(\"00:09:c0\")) | .ifname'",
        shell=True, check=True, capture_output=True, text=True).stdout.strip()
    pci_addr = subprocess.run(
        rf"ethtool -i {env['HNA_IFNAME']} | sed -n 's,^bus-info: \(.*\)$,\1,p'",
        shell=True, check=True, capture_output=True, text=True).stdout.strip()
    env['HNA_PCIADDR'] = bus_addr_to_name(pci_addr)
    return env

def get_env_container():
    if os.getpid() == 1:
        env = dict(os.environ)
    else:
        with open('/proc/1/environ', encoding='utf-8') as f:
            data = f.read()
        # Split on the first '=' only: a value may itself contain '='.
        env = dict(var.split('=', 1) for var in data.split('\x00') if var)

    env['K8S_POD_ID'] = int(re.sub('[^0-9]', '', env['K8S_POD_ROLE']))
    env['DEFAULT_ROUTE'] = subprocess.run(
        'ip -j route get 8.8.8.8 | jq -r .[0].gateway', shell=True,
        check=True, capture_output=True, text=True).stdout.strip()
    env['VETH_INFRA_ID'] = subprocess.run(
        "ip -j link | jq --raw-output "
        "'(.[] | select(.ifname | match(\"veth-[0-9a-f]{10}\"))) | .ifalias'",
        shell=True, check=True, capture_output=True, text=True).stdout.strip()
    return env

def get_env():
    if os.path.exists('/run/init-env.json'):
        env = get_env_vm()
    else:
        env = get_env_container()
    env['K8S_POD_ID'] = int(re.sub('[^0-9]', '', env['K8S_POD_ROLE']))
    return env

def gen_green_config(env):
    mac = f"de:ad:de:80:00:{env['K8S_POD_ID']:02x}"
    addr = ADDRS.get(env['K8S_POD_ROLE'])
    gw = re.sub("[0-9]+$", "254", addr)
    conf = ""
    if 'HNA_PCIADDR' in env:
        conf += """\
/ vrf main interface physical eth1 ethernet mac-address {mac}
/ vrf main interface physical eth1 port {HNA_PCIADDR}
/ vrf main interface physical eth1 ipv4 address {addr}/16
/ system fast-path port {HNA_PCIADDR}
"""
    else:
        conf += """\
/ vrf main interface fpvirtio eth1 ethernet mac-address {mac}
/ vrf main interface fpvirtio eth1 port fpvirtio-0
/ vrf main interface fpvirtio eth1 ipv4 address {addr}/16
/ system fast-path virtual-port fpvirtio fpvirtio-0
/ system fast-path max-virtual-ports 1
"""
    conf += """\
/ system fast-path advanced machine-memory 2048
/ system fast-path advanced power-mode eco
/ system license online serial HIDDEN
/ vrf main routing static ipv4-route 10.0.0.0/8 next-hop {gw}
"""

    return conf.format(**env, mac=mac, addr=addr, gw=gw)

def gen_red_config(env):
    mac = f"de:ad:de:80:01:{env['K8S_POD_ID']:02x}"
    addr = ADDRS.get(env['K8S_POD_ROLE'])
    gw = re.sub("[0-9]+$", "254", addr)
    return """\
cmd license file import content {license_data} serial {license_serial} | ignore-error
/ vrf main interface infrastructure eth1 ethernet mac-address {mac}
/ vrf main interface infrastructure eth1 port {VETH_INFRA_ID}
/ vrf main interface infrastructure eth1 ipv4 address {addr}/16
/ system fast-path virtual-port infrastructure {VETH_INFRA_ID}
/ system fast-path advanced machine-memory 2048
/ system fast-path advanced power-mode eco
/ system license online serial HIDDEN
/ vrf main routing static ipv4-route 10.0.0.0/8 next-hop {gw}
""".format(**env, mac=mac, addr=addr, gw=gw)

def gen_config():
    env = get_env()
    if 'green' in env['K8S_POD_ROLE']:
        return gen_green_config(env)
    return gen_red_config(env)

def main():
    config = gen_config()
    if config[-1] != '\n':
        config += '\n'

    os.makedirs('/etc/init-config', exist_ok=True)
    with open('/etc/init-config/config.cli', 'w', encoding='utf-8') as f:
        f.write(config)
    if os.getpid() != 1:
        sys.stdout.write(config)

    return 0

if __name__ == '__main__':
    sys.exit(main())

Note

Take care to at least update the license serial.
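
To make the naming logic of the script easier to follow, the sketch below isolates two of its computations: the conversion of a PCI bus address into a port name, and the derivation of the MAC address and gateway for a green pod. It is a condensed rewrite of the corresponding parts of cnf-bootstrap.py for illustration, not a replacement for it; green_params is a hypothetical name.

```python
import re

BUS_ADDR_RE = re.compile(r"^([\da-f]+):([\da-f]+):([\da-f]+)\.(\d+)$", re.IGNORECASE)


def bus_addr_to_name(bus_addr):
    """Convert a PCI bus address (domain:bus:slot.func) into a port name."""
    match = BUS_ADDR_RE.match(bus_addr)
    if not match:
        raise ValueError(f"pci bus address {bus_addr} does not match regexp")
    # Domain, bus and slot are hexadecimal; the function number is decimal.
    domain, bus, slot = (int(group, 16) for group in match.groups()[:3])
    func = int(match.group(4), 10)
    name = "pci-"
    if domain:
        name += f"d{domain}"
    name += f"b{bus}s{slot}"
    if func:
        name += f"f{func}"
    return name


def green_params(role, addr):
    """Return the (mac, gateway) pair computed for a green pod."""
    pod_id = int(re.sub("[^0-9]", "", role))    # "green1" gives 1
    mac = f"de:ad:de:80:00:{pod_id:02x}"
    gateway = re.sub("[0-9]+$", "254", addr)    # last byte replaced by 254
    return mac, gateway


print(bus_addr_to_name("0000:00:04.0"))     # pci-b0s4
print(bus_addr_to_name("0000:3b:00.1"))     # pci-b59s0f1
print(green_params("green1", "10.60.0.1"))  # ('de:ad:de:80:00:01', '10.60.0.254')
```

Note that the bus and slot numbers appear in decimal in the port name, which is why bus 3b becomes b59.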

To store this script in a ConfigMap, run the following commands on the Kubernetes control plane:

root@node1:~# kubectl create configmap -n green cnf-bootstrap-config --from-file=cnf-bootstrap.py=/root/cnf-bootstrap.py
root@node1:~# kubectl create configmap -n red cnf-bootstrap-config --from-file=cnf-bootstrap.py=/root/cnf-bootstrap.py

CNF Deployment

Now you can deploy the CNF Pods: green1, green2, green3, red1, red2 and red3. The green* Pods use a virtio connection, while the red* Pods use a veth connection.

In this document, we use a deployment file for each CNF to ease the placement of Pods on the different nodes: green1, green2, and red1 will have an affinity to node1, while the other ones will have an affinity to node2.

Some labels are set in the deployment file and will be added to the CNF pod, as they are used by the Jinja configuration template:

  • “tenant”: the name of the tenant for this pod; it must be the same as the pod namespace.

  • “subnet”: the name of the subnet for this pod.

  • “ip”: the IPv4 address assigned to this pod.

The template also requires that the pod resides in a namespace that corresponds to its tenant (i.e. the namespace name is the same as the tenant name).
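
The label requirements above can be verified on a pod description before deploying it. The following sketch assumes the known subnets are available as a dictionary; check_pod_labels is a hypothetical helper, and the check that the “ip” label lies inside the subnet is an extra sanity rule added here, not something the guide states the template enforces.

```python
import ipaddress


def check_pod_labels(namespace, labels, subnets):
    """Check the tenant, subnet and ip labels of a CNF pod.

    subnets maps (namespace, subnet name) to a CIDR string.
    """
    errors = [f"missing label: {key}"
              for key in ("tenant", "subnet", "ip") if key not in labels]
    if errors:
        return errors
    if labels["tenant"] != namespace:
        errors.append("tenant label differs from the pod namespace")
    cidr = subnets.get((namespace, labels["subnet"]))
    if cidr is None:
        errors.append("unknown subnet in this namespace")
    elif ipaddress.ip_address(labels["ip"]) not in ipaddress.ip_network(cidr):
        # Assumption: the pod address is expected to belong to its subnet.
        errors.append("ip label is outside the subnet")
    return errors


known_subnets = {
    ("green", "green-10-60"): "10.60.0.0/16",
    ("green", "green-10-61"): "10.61.0.0/16",
}
green1_labels = {"role": "green1", "tenant": "green",
                 "subnet": "green-10-60", "ip": "10.60.0.1"}
print(check_pod_labels("green", green1_labels, known_subnets))
```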

The content of the deployment file deploy-green1.yaml is shown below (download link):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: green1
  namespace: green
spec:
  replicas: 1
  selector:
    matchLabels:
      role: green1
  template:
    metadata:
      labels:
        role: green1
        tenant: green
        subnet: green-10-60
        ip: 10.60.0.1
      annotations:
        k8s.v1.cni.cncf.io/networks: default/multus-hna-virtio-user
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/hostname
                operator: In
                values:
                - node1
      restartPolicy: Always
      securityContext:
        appArmorProfile:
          type: Unconfined
        sysctls:
        - name: net.ipv4.conf.default.disable_policy
          value: "1"
        - name: net.ipv4.ip_local_port_range
          value: "30000 40000"
        - name: net.ipv4.ip_forward
          value: "1"
        - name: net.ipv6.conf.all.forwarding
          value: "1"
        - name: net.netfilter.nf_conntrack_events
          value: "1"
      initContainers:
      - name: bootstrap
        image: download.6wind.com/vsr/x86_64-ce/3.12:3.12.0.ga
        command: ["/sbin/bootstrap"]
        resources:
          limits:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            nc-k8s-plugin.6wind.com/virtio-user: 1
          requests:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            nc-k8s-plugin.6wind.com/virtio-user: 1
        env:
        - name: K8S_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: K8S_POD_ROLE
          value: green1
        - name: K8S_POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: K8S_POD_CPU_REQUEST
          valueFrom:
            resourceFieldRef:
              resource: requests.cpu
        - name: K8S_POD_MEM_REQUEST
          valueFrom:
            resourceFieldRef:
              resource: requests.memory
        volumeMounts:
        - mountPath: /sbin/bootstrap
          subPath: cnf-bootstrap.py
          name: bootstrap
        - mountPath: /etc/init-config
          name: init-config
      containers:
      - image: download.6wind.com/vsr/x86_64-ce/3.12:3.12.0.ga
        imagePullPolicy: IfNotPresent
        name: green1
        startupProbe:
          exec:
            command: ["bash", "-c", "/bin/startup-probe"]
          initialDelaySeconds: 10
          failureThreshold: 20
          periodSeconds: 10
          timeoutSeconds: 9
        resources:
          limits:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            smarter-devices/ppp: 1
            smarter-devices/vhost-net: 1
            smarter-devices/net_tun: 1
            nc-k8s-plugin.6wind.com/virtio-user: 1
          requests:
            cpu: "2"
            memory: 2048Mi
            hugepages-2Mi: 1024Mi
            smarter-devices/ppp: 1
            smarter-devices/vhost-net: 1
            smarter-devices/net_tun: 1
            nc-k8s-plugin.6wind.com/virtio-user: 1
        env:
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        securityContext:
          capabilities:
            add: ["NET_ADMIN", "NET_RAW", "SYS_ADMIN", "SYS_NICE", "IPC_LOCK",
                  "NET_BROADCAST", "SYSLOG", "SYS_TIME", "SYS_RAWIO", "SYS_CHROOT"]
        volumeMounts:
        - mountPath: /dev/hugepages
          name: hugepage
        - mountPath: /dev/shm
          name: shm
        - mountPath: /tmp
          name: tmp
        - mountPath: /run
          name: run
        - mountPath: /run/lock
          name: run-lock
        - mountPath: /bin/startup-probe
          subPath: startup-probe.sh
          name: startup-probe
        - mountPath: /etc/init-config
          name: init-config
        stdin: true
        tty: true
      imagePullSecrets:
      - name: regcred
      volumes:
      - emptyDir:
          medium: HugePages
          sizeLimit: 2Gi
        name: hugepage
      - name: shm
        emptyDir:
          sizeLimit: "2Gi"
          medium: "Memory"
      - emptyDir:
          sizeLimit: "500Mi"
          medium: "Memory"
        name: tmp
      - emptyDir:
          sizeLimit: "200Mi"
          medium: "Memory"
        name: run
      - emptyDir:
          sizeLimit: "200Mi"
          medium: "Memory"
        name: run-lock
      - name: bootstrap
        configMap:
          name: cnf-bootstrap-config
          defaultMode: 0500
      - name: startup-probe
        configMap:
          name: startup-probe
          defaultMode: 0500
      - name: init-config
        emptyDir:
          sizeLimit: "10Mi"
          medium: "Memory"

To apply the deployment file, run the following command:

root@node1:~# kubectl apply -f deploy-green1.yaml

After some time, the pod should be visible as “Running”:

root@node1:~# kubectl get pod -n green
NAME                      READY   STATUS    RESTARTS   AGE
green1-75667cbd6f-8kn74   1/1     Running   0          39s

Log in to the pod with kubectl exec -n green -it POD_NAME -- login (admin/admin is the default login/password), and list interfaces:

green1-75667cbd6f-8kn74> show interface
Name   State L3vrf   IPv4 Addresses  IPv6 Addresses               Description
====   ===== =====   ==============  ==============               ===========
lo     UP    default 127.0.0.1/8     ::1/128                      loopback_main
eth0   UP    default 10.229.0.126/24 fe80::c437:61ff:feb5:165c/64 infra-eth0
eth1   UP    default 10.60.0.1/16    fe80::dcad:deff:fe80:1/64
fptun0 UP    default                 fe80::6470:74ff:fe75:6e30/64

  • eth0 is the interface provided by the primary CNI

  • eth1 is the virtio interface connected to the HNA

Note

eth1 may take some time to appear, since the fast path must be started first.
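
The IPv6 link-local address shown on eth1 is consistent with the MAC address set by the bootstrap script (de:ad:de:80:00:01 for green1), assuming the interface uses the modified EUI-64 scheme. This gives a quick way to confirm that the generated configuration was applied. The sketch below computes the expected address; link_local_from_mac is a hypothetical helper.

```python
import ipaddress


def link_local_from_mac(mac):
    """Build the modified EUI-64 IPv6 link-local address for a MAC address."""
    octets = bytearray(int(part, 16) for part in mac.split(":"))
    octets[0] ^= 0x02  # flip the universal/local bit
    # Insert ff:fe between the two halves of the MAC address.
    interface_id = bytes(octets[:3]) + b"\xff\xfe" + bytes(octets[3:])
    # fe80::/64 prefix (8 bytes) followed by the 8-byte interface identifier.
    return ipaddress.IPv6Address(b"\xfe\x80" + bytes(6) + interface_id)


print(link_local_from_mac("de:ad:de:80:00:01"))  # fe80::dcad:deff:fe80:1
```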

The other deployment files are very similar: only the pod name, the node affinity, the tenant, and the hna_net kind change (download links).

VNF Deployment with KubeVirt

KubeVirt is an open-source project that lets you run VMs alongside containers in a Kubernetes cluster. You can use KubeVirt to deploy your network function as a VM, and connect it to the HNA using Virtio interfaces:

  • on the VNF side, a Virtio PCI interface is used,

  • on the HNA side, a Vhost-user interface is used.

Only Virtio is supported by the HNA CNI when using a VM; Veth interfaces cannot be used. In this example, only the green pods can therefore be instantiated as VMs.

This section explains how to deploy your Virtual Service Router as a VNF and connect it to the HNA. It requires the installation of a hook sidecar script, whose role is to add the VNF Virtio PCI ports connected to the HNA into the VM configuration, by modifying the libvirt XML domain description.

See also

  • Refer to the KubeVirt Installation section of the 6WIND HNA documentation for details about KubeVirt installation and configuration for HNA.

  • Refer to the kubevirt section of the nc-k8s-plugin documentation to deploy the hook sidecar script.

Load the hook sidecar script retrieved from the nc-k8s-plugin documentation into a ConfigMap:

# kubectl create configmap -n green kubevirt-sidecar --from-file=kubevirt_sidecar.py=/path/to/kubevirt_sidecar.py
# kubectl create configmap -n red kubevirt-sidecar --from-file=kubevirt_sidecar.py=/path/to/kubevirt_sidecar.py

Then, create a new NetworkAttachmentDefinition with the following content (download link):

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: multus-hna-virtio-user-kubevirt
  annotations:
    k8s.v1.cni.cncf.io/resourceName: nc-k8s-plugin.6wind.com/virtio-user
spec:
  config: '{
  "cniVersion": "1.0.0",
  "name": "multus-hna-virtio-user-kubevirt",
  "type": "hna-cni",
  "kind": "virtio-user",
  "capabilities": {"CNIDeviceInfoFile": true, "deviceID": true},
  "log-level": "INFO",
  "log-file": "stderr",
  "userdata": {
    "socket_mode" : "server"
  }
}'

This NetworkAttachmentDefinition is similar to the default one provided in nc-k8s-plugin, except that it includes a userdata section that specifies the socket mode. This user data is used by the HNA configuration template.
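
Since spec.config is a JSON document embedded in a YAML string, a quoting mistake is easy to make. The sketch below parses the same configuration text with the Python standard library and reads the socket mode back. The socket_mode helper is hypothetical and does not reflect how the HNA template accesses this value.

```python
import json

# Same JSON text as in the NetworkAttachmentDefinition above.
NAD_CONFIG = """{
  "cniVersion": "1.0.0",
  "name": "multus-hna-virtio-user-kubevirt",
  "type": "hna-cni",
  "kind": "virtio-user",
  "capabilities": {"CNIDeviceInfoFile": true, "deviceID": true},
  "log-level": "INFO",
  "log-file": "stderr",
  "userdata": {
    "socket_mode" : "server"
  }
}"""


def socket_mode(config_text, default=None):
    """Return userdata.socket_mode from a CNI configuration, or default if absent."""
    config = json.loads(config_text)  # raises json.JSONDecodeError if malformed
    return config.get("userdata", {}).get("socket_mode", default)


print(socket_mode(NAD_CONFIG))  # server
```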

To run a VM, KubeVirt expects a VirtualMachine object. The content of the deploy-kubevirt-green1.yaml file is shown below (download link):

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: green1
  namespace: green
spec:
  runStrategy: Always
  template:
    metadata:
      annotations:
        hooks.kubevirt.io/hookSidecars:  >
          [
            {
              "args": ["--version", "v1alpha3"],
              "image": "quay.io/kubevirt/sidecar-shim:v1.5.2",
              "configMap": {"name": "kubevirt-sidecar", "key": "kubevirt_sidecar.py", "hookPath": "/usr/bin/onDefineDomain"}
            }
          ]
        k8s.v1.cni.cncf.io/networks: default/multus-hna-virtio-user-kubevirt
      labels:
        kubevirt.io/domain: green1
        role: green1
        tenant: green
        subnet: green-10-60
        ip: 10.60.0.1
    spec:
      nodeSelector:
        kubernetes.io/hostname: vm-k8s-hypervisor
      domain:
        cpu:
          sockets: 1
          cores: 1
          threads: 2
          dedicatedCpuPlacement: true
        devices:
          disks:
            - name: ctdisk
              disk: {}
          filesystems:
            - name: bootstrap
              virtiofs: {}
          interfaces:
            - name: default
              macAddress: de:ad:de:01:02:03
              masquerade: {}
        resources:
          requests:
            memory: 2048Mi
            nc-k8s-plugin.6wind.com/virtio-user: '1'
          limits:
            memory: 2048Mi
            nc-k8s-plugin.6wind.com/virtio-user: '1'
        memory:
          hugepages:
            pageSize: "2Mi"
      networks:
        - name: default
          pod: {}
      volumes:
      - name: ctdisk
        containerDisk:
          image: download.6wind.com/vsr/x86_64/3.12:3.12.0.ga
      - name: bootstrap
        configMap:
          name: cnf-bootstrap-config
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            bootcmd:
              - "echo '{ \"K8S_POD_ROLE\": \"green1\" }' > /run/init-env.json"
              - "mkdir /run/bootstrap_script"
              - "mount -t virtiofs bootstrap /run/bootstrap_script"
              - "mkdir /etc/init-config"
              - "python3 /run/bootstrap_script/cnf-bootstrap.py"

To apply the deployment file, run the following command:

root@node1:~# kubectl apply -f deploy-kubevirt-green1.yaml

After some time, the pod should be visible as “Running”. Note that KubeVirt creates several containers in the Pod (here, 6):

root@node1:~# kubectl get pod -n green
NAME                         READY   STATUS    RESTARTS   AGE
virt-launcher-green1-7nwdj   6/6     Running   0          20m

Log in to the VM with virtctl console green1 (admin/admin is the default login/password), and list interfaces:

green1-vm-kubevirt> show interface
Name   State L3vrf   IPv4 Addresses IPv6 Addresses               Description
====   ===== =====   ============== ==============               ===========
lo     UP    default 127.0.0.1/8    ::1/128                      loopback_main
eth0   UP    default 10.0.2.2/24    fe80::dcad:deff:fe01:203/64
eth1   UP    default 10.60.0.1/16   fe80::dcad:deff:fe80:1/64
fptun0 UP    default                fe80::6470:74ff:fe75:6e30/64

  • eth0 is the interface provided by the primary CNI

  • eth1 is the virtio interface connected to the HNA

Note

eth1 may take some time to appear, since the fast path must be started first.