2.3.3. Run container using Kubernetes¶
This section describes how to install a minimal Kubernetes cluster (a single pod, running on the same node as the controller) while providing SR-IOV interfaces to the container, and illustrates the deployment of Virtual Service Router within this cluster. It has been tested on Ubuntu 20.04 and Ubuntu 22.04.
If you are already familiar with Kubernetes or if you already have a Kubernetes cluster deployed, you may want to skip the Kubernetes installation procedure and focus on Install smarter-device-manager plugin.
Note
To simplify the documentation, we assume that all commands are run by the root user.
Kubernetes installation¶
Load kernel modules¶
Load the required kernel modules on the host node, as listed in Kernel modules.
Memory configuration¶
To run Kubernetes, you need to disable swap:
# swapoff -a
Note
The kubelet agent running on each node fails to start if swap is enabled: it currently cannot guarantee that a pod requesting a given amount of memory will never swap during its lifecycle, and therefore cannot enforce memory limits, since swap is not accounted for. For now, the kubelet agent avoids this issue by requiring that swap be disabled.
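Note that swapoff -a only disables swap until the next reboot. To keep it disabled persistently, you can also comment out the swap entries in /etc/fstab, for example (assuming swap is configured through /etc/fstab and not through a systemd swap unit):
# sed -i '/\sswap\s/ s/^/#/' /etc/fstab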
The Virtual Service Router pod requires hugepages to run (the recommended amount is 8GB per pod). If you need to spawn multiple Virtual Service Router pods on the same node, allocate more accordingly, for example 16GB per node:
# echo 16 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 16 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# umount -f /dev/hugepages
# mkdir -p /dev/hugepages
# mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
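You can check that the hugepages were actually allocated on each NUMA node; with the example above, each file should report 16:
# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages
16
16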
Install containerd¶
At the time of writing, the latest Kubernetes stable version is 1.29.2.
Kubernetes >= 1.26 requires containerd >= 1.6. If such a version is not available in your distribution repository, you may install it from the Docker repository using the following procedure:
# curl -s https://download.docker.com/linux/ubuntu/gpg | apt-key add -
# echo "deb [arch=$(dpkg --print-architecture)] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
| sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# apt -qy update
# apt -qy install containerd.io apparmor apparmor-utils
Install Kubernetes packages¶
First install the apt-transport-https package (along with curl), which allows apt to access repositories over HTTPS:
# apt install -y apt-transport-https curl
Next, add the Kubernetes signing key:
# K8S_VERSION=1.29
# mkdir -p /etc/apt/keyrings
# curl -fsSL "https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/Release.key" | \
gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
Next, add the Kubernetes package repository:
# echo "deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v${K8S_VERSION}/deb/ /" | \
tee /etc/apt/sources.list.d/kubernetes.list
# apt update
Now install Kubernetes packages:
# apt install -y kubeadm kubectl kubelet
Once the packages are installed, put them on hold, as upgrading a Kubernetes cluster is more involved than a simple update of the packages provided by the distribution:
# apt-mark hold kubeadm kubectl kubelet
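You can list the held packages to confirm:
# apt-mark showhold
kubeadm
kubectl
kubelet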
Configure containerd as Kubernetes Container Runtime¶
Enable required sysctls:
# cat <<EOF | tee /etc/sysctl.d/99-kubernetes-cri.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
fs.inotify.max_user_instances = 2048
fs.inotify.max_user_watches = 1048576
EOF
# sysctl -p /etc/sysctl.d/99-kubernetes-cri.conf
Extend the locked memory and open files limits for the pods:
# mkdir -p /etc/systemd/system/containerd.service.d
# cat <<EOF | tee /etc/systemd/system/containerd.service.d/override.conf
[Service]
LimitMEMLOCK=4194304
LimitNOFILE=1048576
EOF
Generate the default containerd configuration, enable the systemd cgroup driver, and restart the service:
# mkdir -p /etc/containerd
# containerd config default > /etc/containerd/config.toml
# sed -i 's/SystemdCgroup \= false/SystemdCgroup \= true/g' /etc/containerd/config.toml
# systemctl daemon-reload
# systemctl restart containerd
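If you want to verify that the override is effective, you can query the limits now applied to the containerd service:
# systemctl show containerd -p LimitMEMLOCK -p LimitNOFILE
LimitMEMLOCK=4194304
LimitNOFILE=1048576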
Cluster initialization¶
Create a kubelet-config.yaml file with the following content:
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
criSocket: "unix:///run/containerd/containerd.sock"
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
networking:
podSubnet: 10.229.0.0/16
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: systemd
reservedSystemCPUs: 0-3
allowedUnsafeSysctls:
- net.*
cpuManagerPolicy: static
topologyManagerPolicy: best-effort
The static CPU Manager policy enables dedicating cores to a pod. The best-effort Topology Manager policy tries to allocate the network devices and the dedicated cores granted to a pod from the same NUMA node. We also reserve a few cores for Kubernetes housekeeping daemons and processes with the reservedSystemCPUs configuration.
See also
the CPU management policies and the Topology manager pages from Kubernetes documentation.
The desired subnet for internal pod communication is also provided in this file and can be modified. Then pull the required images and initialize the cluster:
# kubeadm config images pull
# kubeadm init --config kubelet-config.yaml
# mkdir /root/.kube
# cp /etc/kubernetes/admin.conf /root/.kube/config
# chown $(id -u):$(id -g) /root/.kube/config
By default, your cluster will not schedule Pods on the controller node for security reasons. In our example of a single-machine Kubernetes cluster, we allow scheduling pods on the controller node:
# kubectl taint nodes --all node-role.kubernetes.io/control-plane-
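At this point you can check that the node is registered in the cluster; it will likely remain in the NotReady state until the flannel network plugin is deployed in the next section:
# kubectl get nodes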
Network configuration¶
In this subsection we will install several network plugins for Kubernetes. These are plugins that provide networking to the pods deployed in the cluster.
We will use:
flannel: a basic networking plugin leveraging veth interfaces to provide the default pod connectivity;
sriov-cni: to actually plumb host VFs into pods;
sriov-network-device-plugin: to allocate host resources to the pods;
multus: a CNI meta-plugin enabling the multiplexing of several other plugins required to make the previous three plugins work together.
Install golang¶
Golang is required to build multus and sriov-cni.
# cd /root
# wget https://golang.org/dl/go1.18.5.linux-amd64.tar.gz
# tar -C /usr/local -xf go1.18.5.linux-amd64.tar.gz
# export PATH=$PATH:/usr/local/go/bin
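You can check that the Go toolchain is available in the current shell (the PATH export above only applies to this shell session; add it to your shell profile if you open new sessions):
# go version
go version go1.18.5 linux/amd64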
Install flannel plugin¶
Flannel is responsible for allocating a subnet lease to each host out of a larger, preconfigured address space. In Kubernetes, each pod has a unique IP address, routable inside the cluster. The transport between pods located on different nodes is handled by a VXLAN overlay.
Create a kube-flannel.yaml file with the following content:
---
kind: Namespace
apiVersion: v1
metadata:
name: kube-flannel
labels:
pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
rules:
- apiGroups:
- ""
resources:
- pods
verbs:
- get
- apiGroups:
- ""
resources:
- nodes
verbs:
- list
- watch
- apiGroups:
- ""
resources:
- nodes/status
verbs:
- patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
name: flannel
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: flannel
subjects:
- kind: ServiceAccount
name: flannel
namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: flannel
namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
name: kube-flannel-cfg
namespace: kube-flannel
labels:
tier: node
app: flannel
data:
cni-conf.json: |
{
"name": "cbr0",
"cniVersion": "0.3.1",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
net-conf.json: |
{
"Network": "10.229.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-flannel-ds
namespace: kube-flannel
labels:
tier: node
app: flannel
spec:
selector:
matchLabels:
app: flannel
template:
metadata:
labels:
tier: node
app: flannel
spec:
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
nodeSelectorTerms:
- matchExpressions:
- key: kubernetes.io/os
operator: In
values:
- linux
hostNetwork: true
priorityClassName: system-node-critical
tolerations:
- operator: Exists
effect: NoSchedule
serviceAccountName: flannel
initContainers:
- name: install-cni-plugin
#image: flannelcni/flannel-cni-plugin:v1.1.0 for ppc64le and mips64le (dockerhub limitations may apply)
image: docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.1.0
command:
- cp
args:
- -f
- /flannel
- /opt/cni/bin/flannel
volumeMounts:
- name: cni-plugin
mountPath: /opt/cni/bin
- name: install-cni
#image: flannelcni/flannel:v0.19.2 for ppc64le and mips64le (dockerhub limitations may apply)
image: docker.io/rancher/mirrored-flannelcni-flannel:v0.19.2
command:
- cp
args:
- -f
- /etc/kube-flannel/cni-conf.json
- /etc/cni/net.d/10-flannel.conflist
volumeMounts:
- name: cni
mountPath: /etc/cni/net.d
- name: flannel-cfg
mountPath: /etc/kube-flannel/
containers:
- name: kube-flannel
#image: flannelcni/flannel:v0.19.2 for ppc64le and mips64le (dockerhub limitations may apply)
image: docker.io/rancher/mirrored-flannelcni-flannel:v0.19.2
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
resources:
requests:
cpu: "100m"
memory: "50Mi"
limits:
cpu: "100m"
memory: "50Mi"
securityContext:
privileged: false
capabilities:
add: ["NET_ADMIN", "NET_RAW"]
env:
- name: POD_NAME
valueFrom:
fieldRef:
fieldPath: metadata.name
- name: POD_NAMESPACE
valueFrom:
fieldRef:
fieldPath: metadata.namespace
- name: EVENT_QUEUE_DEPTH
value: "5000"
volumeMounts:
- name: run
mountPath: /run/flannel
- name: flannel-cfg
mountPath: /etc/kube-flannel/
- name: xtables-lock
mountPath: /run/xtables.lock
volumes:
- name: run
hostPath:
path: /run/flannel
- name: cni-plugin
hostPath:
path: /opt/cni/bin
- name: cni
hostPath:
path: /etc/cni/net.d
- name: flannel-cfg
configMap:
name: kube-flannel-cfg
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
Install the YAML file describing this network provider:
# kubectl apply -f kube-flannel.yaml
This deploys the configuration and the daemonset that runs the flanneld binary on each node.
Note
This documentation has been tested with flannel v0.19.2.
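Before going further, you can check that the flannel daemonset pod reaches the Running state and that the node now reports Ready:
# kubectl get pods -n kube-flannel
# kubectl get nodes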
Install multus plugin¶
Multus is a meta-plugin that can leverage several other CNI plugins simultaneously. We use it here to provide an sriov-type configuration to an SR-IOV device allocated by the sriov-network-device-plugin. It can also provide additional network connectivity through other CNI plugins. When installed, it automatically includes the existing flannel plugin in its configuration, so that this default connectivity is provided to all pods in addition to the explicitly defined network interfaces. This default connectivity is required by the cluster so that a pod can reach the external world (other pods on the same or on another node, Internet resources, etc.).
Build the CNI:
# TAG=v3.9.2
# cd /root
# git clone https://github.com/intel/multus-cni.git
# cd multus-cni/
# git checkout $TAG
# ./hack/build-go.sh
Install the plugin binary:
# cp bin/multus /opt/cni/bin
Install the daemonset:
# sed -i 's,\(image: ghcr\.io/k8snetworkplumbingwg/multus-cni\):.*,\1:'$TAG',' deployments/multus-daemonset.yml
# kubectl create -f deployments/multus-daemonset.yml
Note
This documentation has been tested with multus v3.9.2.
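You can check that the multus daemonset pods are running and that multus generated its CNI configuration on the node (typically a 00-multus.conf file next to the flannel configuration; the exact name may vary with the multus version):
# kubectl get pods -n kube-system | grep multus
# ls /etc/cni/net.d/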
Install sriov plugin¶
The purpose of the sriov-cni plugin is to configure the VFs allocated to the containers.
# cd /root
# git clone https://github.com/intel/sriov-cni.git
# cd sriov-cni
# git checkout v2.6.3
# make build
# cp build/sriov /opt/cni/bin
Note
This documentation has been tested with sriov-cni v2.6.3.
Configure the NICs¶
In this example, we want to pass to the pod the two Intel NICs ens787f0 and ens804f0, which have the PCI addresses 0000:81:00.0 and 0000:83:00.0 respectively. We will create a Virtual Function for each interface (PCI addresses 0000:81:10.0 and 0000:83:10.0) and bind them to the vfio-pci driver.
Set the PF devices up and create the desired number of VFs for each NIC:
Note
This subsection applies to Intel network devices (e.g. Niantic, Fortville). Other devices like Nvidia Mellanox NICs require different operations, which are not detailed in this document.
# ip link set ens787f0 up
# echo 1 > /sys/class/net/ens787f0/device/sriov_numvfs
# ip link set ens804f0 up
# echo 1 > /sys/class/net/ens804f0/device/sriov_numvfs
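You can verify that one VF was created for each PF; with the NICs used in this example, the VFs show up as Virtual Function PCI devices at addresses 0000:81:10.0 and 0000:83:10.0:
# cat /sys/class/net/ens787f0/device/sriov_numvfs
1
# lspci | grep -i 'virtual function'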
Source the following helper shell functions:
# Bind a device to a driver
# $1: pci bus address (ex: 0000:04:00.0)
# $2: driver
bind_device () {
echo "Binding $1 to $2"
sysfs_dev=/sys/bus/pci/devices/$1
if [ -e ${sysfs_dev}/driver ]; then
sudo sh -c "echo $1 > ${sysfs_dev}/driver/unbind"
fi
sudo sh -c "echo $2 > ${sysfs_dev}/driver_override"
sudo sh -c "echo $1 > /sys/bus/pci/drivers/$2/bind"
if [ ! -e ${sysfs_dev}/driver ]; then
echo "Failed to bind device $1 to driver $2" >&2
return 1
fi
}
# Bind a device and devices in the same iommu group to a driver
# $1: pci bus address (ex: 0000:04:00.0)
# $2: driver
bind_device_and_siblings () {
bind_device $1 $2
# take devices in the same iommu group
for dir in $sysfs_dev/iommu_group/devices/*; do
[ -e "$dir" ] || continue
sibling=$(basename $(readlink -e "$dir"))
# we can skip ourself
[ "$sibling" = "$1" ] && continue
bind_device $sibling $2
done
}
# get the iommu group of a device
# $1: pci bus address (ex: 0000:04:00.0)
get_iommu_group () {
iommu_is_enabled || echo -n "noiommu-"
echo $(basename $(readlink -f /sys/bus/pci/devices/$1/iommu_group))
}
# return 0 (success) if there is at least one file in /sys/class/iommu
iommu_is_enabled() {
for f in /sys/class/iommu/*; do
if [ -e "$f" ]; then
return 0
fi
done
return 1
}
# get arguments to be passed to docker cli
# $*: list of pci devices
get_vfio_device_args () {
iommu_is_enabled || echo -n "--cap-add=SYS_RAWIO "
echo "--device /dev/vfio/vfio "
for d in $*; do
echo -n "--device /dev/vfio/$(get_iommu_group $d) "
done
echo
}
These helpers can be downloaded from there.
The following command sets the unsafe mode in case the IOMMU is not available.
$ if ! iommu_is_enabled; then \
sudo sh -c "echo Y > /sys/module/vfio/parameters/enable_unsafe_noiommu_mode"; \
fi
Bind the VF devices to the vfio-pci driver:
# bind_device_and_siblings 0000:81:10.0 vfio-pci
# bind_device_and_siblings 0000:83:10.0 vfio-pci
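You can check that the VFs are now bound to the vfio-pci driver and that the corresponding VFIO group devices exist (group numbers depend on your platform, and are prefixed with noiommu- when the IOMMU is disabled):
# lspci -nnk -s 81:10.0 | grep 'Kernel driver in use'
Kernel driver in use: vfio-pci
# ls /dev/vfio/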
See also
Depending on your system, additional configuration may be required. See Providing physical devices or virtual functions to the container paragraph.
Create Kubernetes networking resources¶
To pass the VF interfaces to the pod, we need to declare networking resources via the sriov-network-device-plugin.
To do so, create the following config-map-dpdk.yaml file with this content:
apiVersion: v1
kind: ConfigMap
metadata:
name: sriovdp-config
namespace: kube-system
data:
config.json: |
{
"resourceList": [
{
"resourceName": "intel_sriov_nic_vsr1",
"resourcePrefix": "intel.com",
"selectors": {
"vendors": ["8086"],
"devices": ["10ed"],
"drivers": ["vfio-pci"],
"pfNames": ["ens787f0"],
"needVhostNet": true
}
},
{
"resourceName": "intel_sriov_nic_vsr2",
"resourcePrefix": "intel.com",
"selectors": {
"vendors": ["8086"],
"devices": ["10ed"],
"drivers": ["vfio-pci"],
"pfNames": ["ens804f0"],
"needVhostNet": true
}
}
]
}
This configuration file declares two different resource types, one for VF 0 of the Intel NIC ens787f0 and one for VF 0 of the Intel NIC ens804f0.
The selector keywords vendors and devices match the hexadecimal IDs that can be found in /sys/bus/pci/devices/<PCI_ID>/ of the VF devices.
The "needVhostNet": true directive instructs the sriov-network-device-plugin to mount the /dev/vhost-net device alongside the VF devices into the pods. This is required for all interfaces that will be handled by the fast path.
See also
A detailed explanation of the syntax can be found on the plugin website.
Now deploy this ConfigMap into the cluster:
# kubectl apply -f config-map-dpdk.yaml
Finally, we deploy the YAML file describing the daemonset enabling the SR-IOV network device plugin on all worker nodes. It leverages the ConfigMap that we just installed and the SR-IOV device plugin image reachable by all nodes. This file, named sriovdp-daemonset.yaml, has the following content:
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: sriov-device-plugin
namespace: kube-system
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-sriov-device-plugin-amd64
namespace: kube-system
labels:
tier: node
app: sriovdp
spec:
selector:
matchLabels:
name: sriov-device-plugin
template:
metadata:
labels:
name: sriov-device-plugin
tier: node
app: sriovdp
spec:
hostNetwork: true
hostPID: true
nodeSelector:
kubernetes.io/arch: amd64
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
serviceAccountName: sriov-device-plugin
containers:
- name: kube-sriovdp
image: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:v3.5.1
imagePullPolicy: Always
args:
- --log-dir=sriovdp
- --log-level=10
securityContext:
privileged: true
resources:
requests:
cpu: "250m"
memory: "40Mi"
limits:
cpu: 1
memory: "200Mi"
volumeMounts:
- name: devicesock
mountPath: /var/lib/kubelet/
readOnly: false
- name: log
mountPath: /var/log
- name: config-volume
mountPath: /etc/pcidp
- name: device-info
mountPath: /var/run/k8s.cni.cncf.io/devinfo/dp
volumes:
- name: devicesock
hostPath:
path: /var/lib/kubelet/
- name: log
hostPath:
path: /var/log
- name: device-info
hostPath:
path: /var/run/k8s.cni.cncf.io/devinfo/dp
type: DirectoryOrCreate
- name: config-volume
configMap:
name: sriovdp-config
items:
- key: config.json
path: config.json
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: kube-sriov-device-plugin-arm64
namespace: kube-system
labels:
tier: node
app: sriovdp
spec:
selector:
matchLabels:
name: sriov-device-plugin
template:
metadata:
labels:
name: sriov-device-plugin
tier: node
app: sriovdp
spec:
hostNetwork: true
nodeSelector:
kubernetes.io/arch: arm64
tolerations:
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
serviceAccountName: sriov-device-plugin
containers:
- name: kube-sriovdp
image: ghcr.io/k8snetworkplumbingwg/sriov-network-device-plugin:latest-arm64
imagePullPolicy: Always
args:
- --log-dir=sriovdp
- --log-level=10
securityContext:
privileged: true
resources:
requests:
cpu: "250m"
memory: "40Mi"
limits:
cpu: 1
memory: "200Mi"
volumeMounts:
- name: devicesock
mountPath: /var/lib/kubelet/
readOnly: false
- name: log
mountPath: /var/log
- name: config-volume
mountPath: /etc/pcidp
- name: device-info
mountPath: /var/run/k8s.cni.cncf.io/devinfo/dp
volumes:
- name: devicesock
hostPath:
path: /var/lib/kubelet/
- name: log
hostPath:
path: /var/log
- name: device-info
hostPath:
path: /var/run/k8s.cni.cncf.io/devinfo/dp
type: DirectoryOrCreate
- name: config-volume
configMap:
name: sriovdp-config
items:
- key: config.json
path: config.json
# kubectl apply -f sriovdp-daemonset.yaml
Note
We use the v3.5.1 stable version of sriov-network-device-plugin.
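Once the daemonset is running and the VFs are bound, the node should advertise the new resource types. Assuming the node name matches the hostname (the kubeadm default), you can check that intel.com/intel_sriov_nic_vsr1 and intel.com/intel_sriov_nic_vsr2 are listed with a count of 1 in the node allocatable resources:
# kubectl get node $(hostname) -o jsonpath='{.status.allocatable}'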
Create a multus-sriov-dpdk.yaml file, with the following content, to integrate the SR-IOV plugins into the multus environment, and then deploy it:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: multus-intel-sriov-nic-vsr1
annotations:
k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_nic_vsr1
spec:
config: '{
"type": "sriov",
"cniVersion": "0.3.1",
"name": "sriov-intel-nic-vsr1",
"trust": "on",
"spoofchk": "off"
}'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: multus-intel-sriov-nic-vsr2
annotations:
k8s.v1.cni.cncf.io/resourceName: intel.com/intel_sriov_nic_vsr2
spec:
config: '{
"type": "sriov",
"cniVersion": "0.3.1",
"name": "sriov-intel-nic-vsr2",
"trust": "on",
"spoofchk": "off"
}'
# kubectl apply -f multus-sriov-dpdk.yaml
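You can list the NetworkAttachmentDefinition objects to make sure both attachments were created:
# kubectl get network-attachment-definitions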
Install smarter-device-manager plugin¶
If you plan to use PPP inside the container, it is required to pass the /dev/ppp device to the container. This can be done with the smarter-device-manager plugin.
First, deploy the YAML file describing the daemonset enabling the smarter-device-manager plugin on all worker nodes. This file, named smarter-device-manager.yaml, has the following content:
# derived from https://gitlab.com/arm-research/smarter/smarter-device-manager/-/blob/master/smarter-device-manager-ds.yaml
apiVersion: v1
kind: Namespace
metadata:
name: smarter-device-manager
labels:
name: smarter-device-manager
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: smarter-device-manager
namespace: smarter-device-manager
labels:
name: smarter-device-manager
role: agent
spec:
selector:
matchLabels:
name: smarter-device-manager
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
name: smarter-device-manager
annotations:
node.kubernetes.io/bootstrap-checkpoint: "true"
spec:
priorityClassName: "system-node-critical"
hostname: smarter-device-management
hostNetwork: true
dnsPolicy: ClusterFirstWithHostNet
containers:
- name: smarter-device-manager
image: registry.gitlab.com/arm-research/smarter/smarter-device-manager:v1.20.11
imagePullPolicy: IfNotPresent
securityContext:
allowPrivilegeEscalation: false
capabilities:
drop: ["ALL"]
resources:
limits:
cpu: 100m
memory: 15Mi
requests:
cpu: 10m
memory: 15Mi
volumeMounts:
- name: device-plugin
mountPath: /var/lib/kubelet/device-plugins
- name: dev-dir
mountPath: /dev
- name: sys-dir
mountPath: /sys
- name: config
mountPath: /root/config
volumes:
- name: device-plugin
hostPath:
path: /var/lib/kubelet/device-plugins
- name: dev-dir
hostPath:
path: /dev
- name: sys-dir
hostPath:
path: /sys
- name: config
configMap:
name: smarter-device-manager
terminationGracePeriodSeconds: 30
# kubectl apply -f smarter-device-manager.yaml
Note
We use the v1.20.11 stable version of the smarter-device-manager plugin.
Apply the following configuration:
apiVersion: v1
kind: ConfigMap
metadata:
name: smarter-device-manager
namespace: smarter-device-manager
data:
conf.yaml: |
- devicematch: ^ppp$
nummaxdevices: 100
# kubectl apply -f smarter-device-manager-config.yaml
The associated resource will be requested in the Virtual Service Router deployment file, as described below.
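You can check that the ppp device is now advertised on the node (assuming the node name matches the hostname); it should appear with a count of 100, matching nummaxdevices above:
# kubectl describe node $(hostname) | grep smarter-devices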
Deploy Virtual Service Router into the cluster¶
Authenticate to 6WIND container registry¶
The Docker image is available at:
download.6wind.com/vsr/x86_64-ce/3.9:3.9.1
First, create a Kubernetes secret to authenticate to the 6WIND registry, using the credentials provided by 6WIND support:
# kubectl create secret docker-registry regcred \
--docker-server=download.6wind.com \
--docker-username=$LOGIN --docker-password=$PASSWORD
Warning
Replace $LOGIN and $PASSWORD with the credentials provided by 6WIND support.
Create the Virtual Service Router pod template¶
The pod is declared in a YAML file describing its properties and the way it is deployed.
For this example, we use a file named vsr.yaml (available for download here):
apiVersion: apps/v1
kind: Deployment
metadata:
name: vsr
spec:
replicas: 1
selector:
matchLabels:
role: vsr
template:
metadata:
labels:
role: vsr
annotations:
k8s.v1.cni.cncf.io/networks: multus-intel-sriov-nic-vsr1,multus-intel-sriov-nic-vsr2
container.apparmor.security.beta.kubernetes.io/vsr: unconfined
spec:
restartPolicy: Always
securityContext:
sysctls:
- name: net.ipv4.conf.default.disable_policy
value: "1"
- name: net.ipv4.ip_local_port_range
value: "30000 40000"
- name: net.ipv4.ip_forward
value: "1"
- name: net.ipv6.conf.all.forwarding
value: "1"
containers:
- image: download.6wind.com/vsr/x86_64-ce/3.9:3.9.1
imagePullPolicy: IfNotPresent
name: vsr
resources:
limits:
cpu: "4"
memory: "2Gi"
hugepages-1Gi: 8Gi
intel.com/intel_sriov_nic_vsr1: '1'
intel.com/intel_sriov_nic_vsr2: '1'
smarter-devices/ppp: 1
requests:
cpu: "4"
memory: "2Gi"
hugepages-1Gi: 8Gi
intel.com/intel_sriov_nic_vsr1: '1'
intel.com/intel_sriov_nic_vsr2: '1'
smarter-devices/ppp: 1
env:
securityContext:
capabilities:
add: ["NET_ADMIN", "NET_RAW", "SYS_ADMIN", "SYS_NICE", "IPC_LOCK", "NET_BROADCAST", "SYSLOG"]
volumeMounts:
- mountPath: /dev/hugepages
name: hugepage
- mountPath: /dev/shm
name: shm
- mountPath: /dev/net/tun
name: net
- mountPath: /tmp
name: tmp
- mountPath: /run
name: run
- mountPath: /run/lock
name: run-lock
stdin: true
tty: true
imagePullSecrets:
- name: regcred
volumes:
- emptyDir:
medium: HugePages
sizeLimit: 8Gi
name: hugepage
- name: shm
emptyDir:
sizeLimit: "512Mi"
medium: "Memory"
- hostPath:
path: /dev/net/tun
type: ""
name: net
- emptyDir:
sizeLimit: "200Mi"
medium: "Memory"
name: tmp
- emptyDir:
sizeLimit: "200Mi"
medium: "Memory"
name: run
- emptyDir:
sizeLimit: "200Mi"
medium: "Memory"
name: run-lock
Note
If the IOMMU of your server is disabled (which is not advised), the CAP_SYS_RAWIO capability must also be enabled when using a VF or PF interface in the container. See the Required capabilities paragraph.
This file contains one pod, named vsr, that runs the Virtual Service Router container. The pod is started with reduced capabilities and has both SR-IOV and flannel network interfaces attached.
The pod template declares as an annotation the network resource attachments from multus that it requests. You can add more than one attachment if you need several interfaces.
Next, the container itself is described. The pod has only one container, which runs the Virtual Service Router image fetched from the 6WIND Docker repository.
Then the container declares the resources that it requests. The Kubernetes cluster will then find a node satisfying these requirements to schedule the pod. By setting the limits equal to the requests, we ask for this exact amount of resources. For example, for the vsr pod:
cpu: the number of vCPU resources to be allocated. This reservation does not necessarily imply CPU pinning: the cluster provides an amount of CPU cycles from all the host's CPUs that is equivalent to 4 logical CPUs.
memory: the pod requests 2GB of RAM.
hugepages-1Gi: the pod requests 8GB of memory backed by 1GB hugepages. You can use 2MB hugepages instead, provided that you allocate them at boot time and use the keyword hugepages-2Mi instead.
Note
Setting the CPU and memory limits respectively equal to the CPU and memory requests makes the pod qualify for the Guaranteed QoS class. See the configuration of the QoS for pods page of the Kubernetes documentation.
Warning
Make sure to request enough CPU resources to cover the CPU cores that you plan to configure in your fast-path.env as well as the control plane processes. Otherwise, your fast path cores will be throttled, resulting in severe performance degradation.
The following capabilities are granted to the container:
| Capability | Role |
|---|---|
| CAP_SYS_ADMIN | eBPF exception path, VRF, tcpdump |
| CAP_NET_ADMIN | General Linux networking |
| CAP_NET_RAW | Support of filtering, tcpdump, … |
| CAP_IPC_LOCK | Memory allocation for DPDK |
| CAP_SYS_NICE | Get NUMA information from memory |
| CAP_NET_BROADCAST | VRF support (notifications) |
| CAP_SYSLOG | Use syslog from pod |
Finally, we make sure to mount the following paths into the container:
| Path | Role |
|---|---|
| /dev/hugepages, /dev/shm | The fast path requires access to hugepages for its shared memory |
| /dev/net/tun | Required for FPVI interfaces (exception path) |
| /dev/ppp | Required for PPP configuration |
| /tmp | Mounted as tmpfs; may be required by some applications using the O_TMPFILE open flag |
| /run | Required by systemd |
| /run/lock | Required by systemd |
Deploy the pod¶
# kubectl apply -f vsr.yaml
You can then see your pod running:
# kubectl get pods
NAME READY STATUS RESTARTS AGE
vsr-55d6f69dcc-2tl76 1/1 Running 0 6m17s
You can get information about the running pod with:
$ kubectl describe pod vsr-55d6f69dcc-2tl76
Note
The pod can be deleted with kubectl delete -f vsr.yaml.
Connect to the pod¶
You can connect to the pod command line interface with the command kubectl exec -it <pod name> -- login. For example:
$ kubectl exec vsr-55d6f69dcc-2tl76 -it -- login