Quickstart¶
A monitoring.sh script is available to control the monitoring module.
It supports the start, status and stop commands:
# monitoring.sh start
Starting Monitoring...
Monitoring successfully started
# monitoring.sh status
* kpid.service - KPI Daemon
Loaded: loaded (/lib/systemd/system/kpid.service; disabled; vendor preset: enabled)
Active: active (running) since Tue 2020-06-16 09:31:53 UTC; 3s ago
Main PID: 3526 (kpid)
Tasks: 1 (limit: 1088)
CGroup: /system.slice/kpid.service
`-3526 /usr/bin/python3 /usr/bin/kpid
* netopeer2-server.service - NETCONF Server
Loaded: loaded (/lib/systemd/system/netopeer2-server.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2020-06-16 09:31:52 UTC; 4s ago
Main PID: 1067 (netopeer2-serve)
Tasks: 7 (limit: 1088)
CGroup: /system.slice/netopeer2-server.service
`-1067 /usr/bin/netopeer2-server -U -g netconf -m 660
# monitoring.sh stop
Stopping Monitoring...
Monitoring successfully stopped
Netopeer options can be configured in /etc/default/netopeer, and kpid options can be configured in /etc/default/kpid.
On the monitored device, fetch the fp-cpu-usage KPI:
# kpi-get fp-cpu-usage -o json
{
  "sixwind-router:monitoring": {
    "fp:cpu-usage": [
      {
        "cpu": "cpu1",
        "busy": 0
      },
      {
        "cpu": "cpu2",
        "busy": 0
      },
      {
        "cpu": "cpu3",
        "busy": 0
      }
    ]
  }
}
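The JSON document printed by kpi-get can be consumed by any JSON parser. As a minimal Python sketch, the snippet below turns the fp-cpu-usage output shown above into a mapping from CPU name to its busy value. The function name cpu_busy_map is hypothetical, and the sample is copied from the output above rather than read from a live device.

```python
import json

# Sample copied from the kpi-get output shown in this section.
SAMPLE = """
{
  "sixwind-router:monitoring": {
    "fp:cpu-usage": [
      {"cpu": "cpu1", "busy": 0},
      {"cpu": "cpu2", "busy": 0},
      {"cpu": "cpu3", "busy": 0}
    ]
  }
}
"""


def cpu_busy_map(text):
    """Return {cpu name: busy value} from a kpi-get fp-cpu-usage JSON document."""
    doc = json.loads(text)
    entries = doc["sixwind-router:monitoring"]["fp:cpu-usage"]
    return {entry["cpu"]: entry["busy"] for entry in entries}


print(cpu_busy_map(SAMPLE))
```

On a device, the same function could be fed the standard output of `kpi-get fp-cpu-usage -o json` instead of the embedded sample.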
KPI definition¶
YANG¶
6WIND monitoring uses YANG to describe its KPIs. YANG is an IETF data modeling language (RFC 6020). This section briefly introduces the format.
Here is an example YANG model covering the 6WIND monitoring sixwind-router module and the product group.
module sixwind-router {
  namespace "urn:6wind:router";
  prefix router;
  organization "6WIND";
  description
    "6WIND router data model";
  contact
    "support@6wind.com";
  revision "2017-12-04" {
    description
      "Initial revision.";
  }
  container monitoring {
    config false;
    description
      "6WIND monitoring data model.";
  }
}
module product {
  namespace "urn:6wind:router:monitoring:product";
  prefix product;
  import sixwind-router {
    prefix sixwind-router;
  }
  description
    "This module provides support for showing product information.";
  revision "2017-11-14" {
    description
      "Initial revision.";
  }
  augment /sixwind-router:monitoring {
    leaf version {
      type string;
      description
        "The product version.";
    }
    container license {
      description
        "The license detailed info.";
      leaf enabled {
        type boolean;
        description
          "True if the license daemon is enabled on the system.";
      }
      leaf valid {
        type boolean;
        description
          "True if the license is valid.";
      }
      leaf short-license-type {
        type enumeration {
          enum evaluation {
            description
              "The license is an evaluation.";
          }
          enum perpetual {
            description
              "The license is perpetual.";
          }
          enum subscription {
            description
              "The license is a subscription.";
          }
        }
        description
          "A shorter version of the license type.";
      }
      leaf support-end-date {
        type string;
        description
          "The support end date.";
      }
      leaf remaining-days {
        type union {
          type string;
          type enumeration {
            enum unset {
              description
                "The end date is not set.";
            }
            enum expired {
              description
                "The date has expired.";
            }
          }
        }
        description
          "The number of days remaining.";
      }
      leaf throughput-allowed {
        description
          "The allowed throughput.";
        type decimal64 {
          fraction-digits 2;
        }
        units "Gb";
      }
      leaf throughput-used {
        description
          "The throughput currently in use.";
        type decimal64 {
          fraction-digits 2;
        }
        units "Gb";
      }
      leaf cgnat-conntracks-allowed {
        type uint32;
        description
          "The number of CG-NAT conntracks allowed.";
      }
      leaf cgnat-conntracks-used {
        type uint32;
        description
          "The number of CG-NAT conntracks currently in use.";
      }
      leaf ipsec-tunnels-allowed {
        type uint32;
        description
          "The number of IPsec tunnels allowed.";
      }
      leaf ipsec-tunnels-used {
        type uint32;
        description
          "The number of IPsec tunnels currently in use.";
      }
      leaf connected {
        type boolean;
        description
          "The daemon is connected to the server.";
      }
      leaf lease-end-date {
        type string;
        description
          "The date at which the lease expires.";
      }
      leaf remaining-lease-days {
        type union {
          type string;
          type enumeration {
            enum unset {
              description
                "The end date is not set.";
            }
            enum expired {
              description
                "The date has expired.";
            }
          }
        }
        description
          "The number of days before the lease ends.";
      }
    }
  }
}
The main YANG keywords are:
module is a self-contained, top-level hierarchy of nodes.
leaf is where values are put. Each leaf has a type (e.g. string, uint16, uint64, enumeration) and a description.
augment <path> means that the content of the augment node is inserted inside <path>.
list defines a list; the key keyword defines its key.
container is a way of grouping elements together.
description is the description of the current element.
revision is the version of the YANG model.
prefix tells how the module elements should be prefixed.
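The union and enumeration types matter to anyone consuming the data: a leaf such as remaining-days can carry either a free-form string or one of the enum names unset and expired. The Python sketch below shows one way a client might interpret such values. The helper names are hypothetical, and treating the string form as a decimal number of days is an assumption based only on the leaf description, not something the model guarantees.

```python
# Enum names taken from the product YANG model shown earlier.
LICENSE_TYPES = ("evaluation", "perpetual", "subscription")
DAY_STATES = ("unset", "expired")


def parse_remaining_days(value):
    """Interpret a remaining-days / remaining-lease-days union value.

    Returns the enum name unchanged, or an int when the string holds a
    number of days (assumption: the string form is a decimal integer).
    """
    if value in DAY_STATES:
        return value
    return int(value)  # raises ValueError for anything else


def check_license_type(value):
    """Reject values that are not part of the short-license-type enumeration."""
    if value not in LICENSE_TYPES:
        raise ValueError(f"unexpected license type: {value!r}")
    return value
```

For example, `parse_remaining_days("42")` yields the integer 42, while `parse_remaining_days("expired")` returns the string unchanged so the caller can branch on it.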
KPI structure¶
The KPIs are divided into four groups:
product, including version and license state
system, including load and uptime
network, including interface statistics
fp, for statistics about the fast path
In each group, several services are available. A service is a group of statistics of the same kind.
For instance, the product group contains:
product-version
product-license
The fp group contains:
fp-ip-stats
fp-ip6-stats
fp-cpu-usage
…
The list of services and their parameters is available using the kpi-list-services command on the device.
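In the examples given here, every service name starts with the name of its group. Assuming that naming pattern holds for all services (it is inferred from the examples, not stated as a rule), a short Python sketch can sort service names by group. The function names service_group and by_group are hypothetical.

```python
# The four groups listed in this section.
GROUPS = ("product", "system", "network", "fp")


def service_group(service):
    """Return the group of a service, taken from the prefix before the first '-'."""
    group, sep, rest = service.partition("-")
    if not sep or not rest or group not in GROUPS:
        raise ValueError(f"unrecognized service name: {service!r}")
    return group


def by_group(services):
    """Sort service names into a {group: [services]} mapping."""
    result = {group: [] for group in GROUPS}
    for service in services:
        result[service_group(service)].append(service)
    return result
```

Calling `by_group(["product-version", "fp-ip-stats", "fp-cpu-usage"])` places the first name under product and the other two under fp.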
KPI identification¶
There are two ways to identify a KPI in the 6WIND monitoring module.
The first one is to use a service name, as defined in the previous section:
| service | data accessed |
|---|---|
| fp-ip-stats | the fast path IPv4 statistics |
| fp-cpu-usage | the fast path CPU usage |
| product-version | the version of the product |
The second is to use a path, similar to XPath (XML Path Language). Paths can be deduced from the YANG model. Some examples follow:
| xpath | data accessed |
|---|---|
| /sixwind-router:monitoring | everything |
| /sixwind-router:monitoring/fp:statistics | all the fast path statistics |
| /sixwind-router:monitoring/fp:statistics/fp:ip | the fast path IPv4 statistics |
| /sixwind-router:monitoring/system:cpu-usage[cpu='cpu3'] | the system CPU load for cpu3 |
Depending on the product and the module, some statistics might not be available.
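Paths of this form are plain strings, so they are easy to assemble programmatically. The Python sketch below builds a path from prefixed node names and optional list-key predicates. The helper name kpi_path is hypothetical, and the predicate is written with plain ASCII single quotes.

```python
ROOT = "/sixwind-router:monitoring"


def kpi_path(*nodes, **keys):
    """Build a monitoring path.

    nodes: 'prefix:name' segments appended under the monitoring root.
    keys:  list-key predicates applied to the last segment, e.g. cpu="cpu3".
    """
    path = ROOT + "".join("/" + node for node in nodes)
    for name, value in keys.items():
        if "'" in value:
            raise ValueError("quote characters are not supported in key values")
        path += f"[{name}='{value}']"
    return path


print(kpi_path("system:cpu-usage", cpu="cpu3"))
```

With no arguments, `kpi_path()` returns the root path that selects everything.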
Fetching KPIs¶
This section explains how the data can be fetched from the device.
kpi-get¶
The kpi-get embedded tool can be used to query the monitoring data, using the local API or the NETCONF API, and output it in json or influx format. It can also send HTTP POST requests to a remote server.
The tool's -h option describes its usage.
# kpi-get -h
usage: kpi-get [-h] [-i FMT[:opts]] -o FMT[:opts] DATA
Tool to dump KPIs
positional arguments:
DATA either an xpath or a service. To get everything, use
/sixwind-router:monitoring
optional arguments:
-h, --help show this help message and exit
-i FMT[:opts], --input-format FMT[:opts]
use "FMT:help" to get details about a format (default:
sysrepo)
-o FMT[:opts], --output-format FMT[:opts]
use "FMT:help" to get details about a format
Available services:
fp-context-switch-stats, fp-cpu-usage, fp-drop-stats, fp-exception-queue-stats,
fp-exceptions-stats, fp-ip-stats, fp-ip6-stats, fp-ipsec-stats, fp-
ipsec6-stats, fp-l2-stats, network-nic-eth-stats, network-nic-traffic-stats,
product-license, product-version, system-cpu-usage, system-numa-stats,
system-soft-interrupts-stats, system-uptime
Available input formats:
netconf, sysrepo
Available output formats:
http-influx, http-json, influx, json, raw
A few examples follow to explain how the tool works.
Get all the available data in json format:
# kpi-get /sixwind-router:monitoring -o json
{
  "sixwind-router:monitoring": {
    "system:numa-stats": [
      {
        "node": "node0",
        "other-node": 0,
        "numa-miss": 0,
        "numa-foreign": 0,
        "interleave-hit": 13969,
        "local-node": 3030760,
        "numa-hit": 3030760
      }
    ],
    "product:uptime": "5:12:15",
    (...)
}
Get all the fast path IP statistics in influx format:
# kpi-get fp-ip-stats -o influx
fp-ip-stats,path=/sixwind-router:monitoring/fp:statistics/fp:ip,host=dut-vm IpDroppedBlackhole=0,IpDroppedForwarding=0,IpDroppedIPsec=0,IpDroppedInvalidInterface=0,IpDroppedNetfilter=0,IpDroppedNoArp=0,IpDroppedNoMemory=0,IpDroppedRouteException=0,IpForwDatagrams=0,IpFragCreates=0,IpFragFails=0,IpFragOKs=0,IpInAddrErrors=0,IpInDelivers=0,IpInHdrErrors=0,IpInReceives=0,IpReasmExceptions=0,IpReasmFails=0,IpReasmOKs=0,IpReasmReqds=0,IpReasmTimeout=0
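The influx output is a single line of InfluxDB line protocol: a measurement name, comma-separated tags, a space, then comma-separated fields. A minimal Python sketch for splitting such a line follows. It assumes no escaped spaces or commas and numeric field values only, which holds for the sample above but is not guaranteed for every service. The function name parse_influx_line is hypothetical, and the sample is a shortened copy of the line above.

```python
# Shortened copy of the kpi-get output line shown above.
SAMPLE_LINE = (
    "fp-ip-stats,path=/sixwind-router:monitoring/fp:statistics/fp:ip,host=dut-vm "
    "IpDroppedBlackhole=0,IpForwDatagrams=0,IpInReceives=0"
)


def parse_influx_line(line):
    """Split one line-protocol line into (measurement, tags, fields).

    Simplified: escaping, string fields and the optional trailing
    timestamp are not interpreted.
    """
    parts = line.strip().split(" ")
    head, field_part = parts[0], parts[1]
    measurement, *tag_items = head.split(",")
    tags = dict(item.split("=", 1) for item in tag_items)
    fields = {}
    for item in field_part.split(","):
        key, value = item.split("=", 1)
        # Line protocol marks integers with a trailing "i"; strip it if present.
        fields[key] = float(value.rstrip("i"))
    return measurement, tags, fields


measurement, tags, fields = parse_influx_line(SAMPLE_LINE)
print(measurement, tags["host"], fields["IpInReceives"])
```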
Post the fast path CPU usage in json format to http://a.b.c.d:8000/write:
# kpi-get fp-cpu-usage -o http-json:host=a.b.c.d:port=8000:path=write
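When kpi-get is driven from a script, it can help to build the command line in one place. The Python sketch below assembles the FMT[:opts] argument and the argument list, and wraps a JSON query with subprocess. The helper names are hypothetical, and the opt=value syntax separated by colons is inferred from the single http-json example above; values that themselves contain a colon are not handled.

```python
import json
import subprocess


def format_spec(fmt, **opts):
    """Build a FMT[:opts] string such as http-json:host=a.b.c.d:port=8000:path=write."""
    return ":".join([fmt] + [f"{key}={value}" for key, value in opts.items()])


def kpi_get_command(data, output, input_format=None):
    """Return the kpi-get argument list for a service name or xpath."""
    command = ["kpi-get", data, "-o", output]
    if input_format is not None:
        command += ["-i", input_format]
    return command


def fetch_json(data):
    """Run kpi-get on the device and return the parsed JSON document."""
    result = subprocess.run(
        kpi_get_command(data, "json"),
        check=True,           # raise if kpi-get exits with an error
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

For instance, `kpi_get_command("fp-cpu-usage", format_spec("http-json", host="a.b.c.d", port=8000, path="write"))` reproduces the argument list of the command above.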
Telegraf¶
Telegraf is an agent that collects and reports metrics. Coupled with 6WIND's kpi-get tool, it can report all the available statistics to a remote location.
Telegraf can report the monitoring data to multiple tools. The list of supported output plugins is maintained in the Telegraf documentation.
To run telegraf, at least one input plugin and one output plugin must be defined. The configuration is located in /etc/telegraf/telegraf.conf and /etc/telegraf/telegraf.d/.
In /etc/telegraf/telegraf.conf, you can configure the agent behavior. Here is an example:
[agent]
debug = false
flush_buffer_when_full = true
flush_interval = "15s"
flush_jitter = "0s"
hostname = "myhostname"
interval = "15s"
round_interval = true
In the /etc/telegraf/telegraf.d/ directory, you can add the plugins. Here is an example that uses the exec input plugin to query 6WIND monitoring through its local API and output the data in influx format:
[[inputs.exec]]
commands = [ "kpi-get /sixwind-router:monitoring -o influx"]
Here is an example of the influxdb output plugin:
[[outputs.influxdb]]
database = "telegraf"
urls = [ "http://a.b.c.d:8086" ]
username = "telegraf"
password = "mypassword"
Note
We encourage the use of /etc/telegraf/telegraf.d/ for plugins, but the configuration for those plugins can also be put in /etc/telegraf/telegraf.conf. The plugins are self-documented in this file.
Configuration samples for input and output can be found in the /usr/share/6WIND/telegraf directory.
Once configured, telegraf is controlled through its service file.
It is disabled by default, because it cannot work until an output plugin is configured. To enable telegraf on boot and start it:
# systemctl enable telegraf
# systemctl start telegraf
To disable it on boot and stop it:
# systemctl disable telegraf
# systemctl stop telegraf
NETCONF¶
NETCONF is an IETF standard used for configuration. It can also monitor the state of a system.
Any tool supporting NETCONF can get data from the device by connecting to port 830. Any user configured on the machine can access the monitoring data. YANG models are available at the end of this document, and in the /usr/share/6WIND/yang directory on the device.
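As an illustration of such a client, the Python sketch below fetches the monitoring container over NETCONF. It assumes the third-party ncclient library is installed on the management host. The helper names are hypothetical, and the namespace is the one declared by the sixwind-router module shown earlier. Host key verification is disabled only to keep the sketch short; a real deployment would normally verify it.

```python
# Namespace declared by the sixwind-router YANG module.
MONITORING_NS = "urn:6wind:router"


def monitoring_filter():
    """Return a NETCONF subtree filter selecting the whole monitoring container."""
    return f'<monitoring xmlns="{MONITORING_NS}"/>'


def fetch_monitoring(host, username, password, port=830):
    """Return the monitoring state data as an XML string.

    ncclient is a third-party library, imported here so that the rest of
    the module can be used without it.
    """
    from ncclient import manager

    with manager.connect(
        host=host,
        port=port,
        username=username,
        password=password,
        hostkey_verify=False,  # assumption: lab setup, no known_hosts check
    ) as session:
        reply = session.get(filter=("subtree", monitoring_filter()))
        return reply.data_xml
```

A call such as `fetch_monitoring("a.b.c.d", "myuser", "mypassword")` would return the state data as XML, using placeholder credentials that must be replaced by a user configured on the device.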