KPIs
6WIND KPI monitoring provides the ability to monitor and export Virtual Service Router KPIs to an InfluxDB time-series database, which can then be integrated with an analytics frontend such as Grafana. An example InfluxDB/Grafana setup is described on 6WIND’s GitHub.
Configuration
Configuring KPIs requires you to:
enable and configure the KPIs daemon, specifying which KPIs to collect (for example, licensing or network interface KPIs);
enable and configure the Telegraf agent to export the collected KPIs to a remote InfluxDB database.
To configure the KPIs daemon to collect every KPI it supports, and the Telegraf agent to send data to the InfluxDB server located at http://1.1.1.1:8086, in the test database, do:
vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test
vsr running config# / vrf main kpi telegraf metrics template all
vsr running config# commit
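To check that metrics actually reach the server after the commit, the measurements of the test database can be listed through the InfluxDB HTTP API. The Python sketch below assumes an InfluxDB 1.x server reachable without authentication at the example address; the helper names are hypothetical and only the standard library is used.

```python
import json
import urllib.parse
import urllib.request

# Assumptions: InfluxDB 1.x HTTP API, no authentication, values from the example above.
INFLUXDB_URL = "http://1.1.1.1:8086"
DATABASE = "test"


def show_measurements_url(base_url: str, database: str) -> str:
    """Build a GET /query URL that runs SHOW MEASUREMENTS on `database`."""
    params = urllib.parse.urlencode({"db": database, "q": "SHOW MEASUREMENTS"})
    return f"{base_url}/query?{params}"


def parse_measurements(body: str) -> list[str]:
    """Extract measurement names from an InfluxDB 1.x /query JSON response."""
    result = json.loads(body)["results"][0]
    names = []
    # A database without data returns a result that has no "series" key.
    for series in result.get("series", []):
        names.extend(row[0] for row in series.get("values", []))
    return names


def list_measurements(base_url: str = INFLUXDB_URL, database: str = DATABASE) -> list[str]:
    """Query the server and return the measurement names it reports."""
    with urllib.request.urlopen(show_measurements_url(base_url, database), timeout=10) as resp:
        return parse_measurements(resp.read().decode("utf-8"))
```

Calling list_measurements() from a machine that can reach the server returns the measurement names written so far; an empty list suggests that Telegraf has not written anything to that database yet.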
Note
To connect Telegraf to a secured InfluxDB instance (https URL) that uses a self-signed certificate, you must enable insecure-skip-verify.
By default, the network interfaces exported to the InfluxDB server are the fast path ports. This can be changed by specifying the list of interfaces to monitor:
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan0 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name mgmt0 vrf mgmt
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan1 vrf main
vsr running config# commit
Note
If a list of interfaces is specified, the default no longer applies: to keep exporting the fast path ports, add them manually to the monitored interface list.
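Because an explicit list replaces the default, it can help to generate the commands from a single list that also names the fast path ports. A minimal Python sketch; the command prefix is copied from the commands above, while "ntfp1" and its VRF are hypothetical placeholders for your actual fast path ports.

```python
from typing import Iterable, NamedTuple


class MonitoredInterface(NamedTuple):
    name: str
    vrf: str


# Prefix copied from the CLI commands above (Telegraf is configured in vrf main).
PREFIX = "/ vrf main kpi telegraf metrics monitored-interface"


def monitored_interface_commands(interfaces: Iterable[MonitoredInterface]) -> list[str]:
    """Return one CLI command per interface, followed by a commit."""
    commands = [f"{PREFIX} name {itf.name} vrf {itf.vrf}" for itf in interfaces]
    commands.append("commit")
    return commands


# Hypothetical example: "ntfp1" stands for a fast path port; use your real port names and VRFs.
interfaces = [
    MonitoredInterface("vlan0", "main"),
    MonitoredInterface("mgmt0", "mgmt"),
    MonitoredInterface("ntfp1", "main"),
]
for command in monitored_interface_commands(interfaces):
    print(command)
```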
To display the state:
vsr running config# show state vrf main kpi
kpi
    telegraf
        enabled true
        metrics
            enabled true
            monitored-interface vrf mgmt name mgmt0
            monitored-interface vrf main name vlan0
            monitored-interface vrf main name vlan1
            template all
            metric fp-bridge-stats enabled true period 60
            metric fp-cg-nat-stats enabled true period 60
            metric fp-conntrack-stats enabled true period 60
            metric fp-cpu-usage enabled true period 10
            metric fp-exceptions-stats enabled true period 60
            metric fp-filling enabled true period 10
            metric fp-firewall-stats enabled true period 60
            metric fp-gre-stats enabled true period 60
            metric fp-global-stats enabled true period 60
            metric fp-ip-stats enabled true period 60
            metric fp-ip6-stats enabled true period 60
            metric fp-ipsec-stats enabled true period 60
            metric fp-status enabled true period 60
            metric fp-vlan-stats enabled true period 60
            metric fp-vxlan-stats enabled true period 60
            metric fp-status enabled true period 10
            metric network-nic-hw-info enabled true period 60
            metric network-nic-traffic-stats enabled true period 10
            metric network-twamp-stats enabled true period 60
            metric product-license enabled true period 60
            metric product-version enabled true period 120
            metric system-cpu-times enabled true period 10
            metric system-cpu-usage enabled true period 10
            metric system-disk-usage enabled true period 60
            metric system-memory enabled true period 60
            metric system-numa-stats enabled true period 60
            metric system-processes enabled true period 10
            metric system-soft-interrupts-stats enabled true period 60
            metric system-uptime enabled true period 10
            metric system-user-count enabled true period 10
            metric system-users enabled true period 10
..
..
..
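To check which metrics are collected and how often, the metric lines of this output can be extracted with a small parser. A minimal Python sketch, assuming the output has been captured as text (for example, copied from the CLI session) and that every metric line follows the "metric <name> enabled <true|false> period <n>" layout shown above; the function names are hypothetical.

```python
import re
from typing import NamedTuple


class Metric(NamedTuple):
    name: str
    enabled: bool
    period: int  # value as displayed by the CLI


# Matches lines such as "metric fp-cpu-usage enabled true period 10",
# with or without leading indentation.
METRIC_RE = re.compile(r"^\s*metric\s+(\S+)\s+enabled\s+(true|false)\s+period\s+(\d+)\s*$")


def parse_metrics(state_text: str) -> list[Metric]:
    """Return the metric entries found in the state output, in display order."""
    metrics = []
    for line in state_text.splitlines():
        match = METRIC_RE.match(line)
        if match:
            name, enabled, period = match.groups()
            metrics.append(Metric(name, enabled == "true", int(period)))
    return metrics


def metrics_by_period(metrics: list[Metric]) -> dict[int, list[str]]:
    """Group the names of enabled metrics by their period."""
    groups: dict[int, list[str]] = {}
    for metric in metrics:
        if metric.enabled:
            groups.setdefault(metric.period, []).append(metric.name)
    return groups
```

Keeping the entries as a list, rather than a dictionary keyed by name, preserves every line exactly as displayed.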
The same configuration can be expressed with this NETCONF XML configuration:
vsr running config# show config xml absolute
<config xmlns="urn:6wind:vrouter">
  <vrf>
    <name>main</name>
    <kpi xmlns="urn:6wind:vrouter/kpi">
      <telegraf xmlns="urn:6wind:vrouter/kpi/telegraf">
        <enabled>true</enabled>
        <metrics>
          <enabled>true</enabled>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>mgmt</vrf>
            <name>mgmt0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan1</name>
          </monitored-interface>
          <template>all</template>
        </metrics>
        <interval>10</interval>
        <influxdb-output>
          <url>http://1.1.1.1:8086</url>
          <database>test</database>
        </influxdb-output>
      </telegraf>
    </kpi>
  </vrf>
</config>
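The XML payload can also be generated programmatically, for instance before handing it to a NETCONF client. A minimal Python sketch using the standard library's xml.etree.ElementTree (ElementTree.indent needs Python 3.9 or later); it reproduces the element layout and namespaces shown above, and the function name is hypothetical. How the payload is sent (NETCONF session, target datastore) is outside the scope of this sketch.

```python
import xml.etree.ElementTree as ET

# Namespaces copied from the XML configuration above.
NS_VROUTER = "urn:6wind:vrouter"
NS_KPI = "urn:6wind:vrouter/kpi"
NS_TELEGRAF = "urn:6wind:vrouter/kpi/telegraf"


def build_kpi_config(interfaces, url, database, template="all", interval=10):
    """Build the <config> document for vrf main.

    `interfaces` is an iterable of (name, vrf) pairs. The xmlns attributes are
    set as plain attributes so the output keeps the same default-namespace
    layout as above, instead of ElementTree's generated ns0:/ns1: prefixes.
    """
    config = ET.Element("config", {"xmlns": NS_VROUTER})
    vrf = ET.SubElement(config, "vrf")
    ET.SubElement(vrf, "name").text = "main"
    kpi = ET.SubElement(vrf, "kpi", {"xmlns": NS_KPI})
    telegraf = ET.SubElement(kpi, "telegraf", {"xmlns": NS_TELEGRAF})
    ET.SubElement(telegraf, "enabled").text = "true"

    metrics = ET.SubElement(telegraf, "metrics")
    ET.SubElement(metrics, "enabled").text = "true"
    for name, vrf_name in interfaces:
        itf = ET.SubElement(metrics, "monitored-interface")
        ET.SubElement(itf, "vrf").text = vrf_name
        ET.SubElement(itf, "name").text = name
    ET.SubElement(metrics, "template").text = template

    # Mirrors <interval>10</interval> in the example above.
    ET.SubElement(telegraf, "interval").text = str(interval)
    output = ET.SubElement(telegraf, "influxdb-output")
    ET.SubElement(output, "url").text = url
    ET.SubElement(output, "database").text = database

    ET.indent(config, space="  ")
    return ET.tostring(config, encoding="unicode")


print(build_kpi_config(
    [("vlan0", "main"), ("mgmt0", "mgmt"), ("vlan1", "main")],
    url="http://1.1.1.1:8086",
    database="test",
))
```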
Migration from the legacy KPI system
Here is an example of migration from the legacy KPI system. Given this configuration, which enables the default KPIs and monitors vlan0 and vlan1 in vrf main, and mgmt0 in vrf mgmt:
vsr running config# / system kpi
vsr running config# / vrf main kpi interface vlan0
vsr running config# / vrf main kpi interface vlan1
vsr running config# / vrf mgmt kpi interface mgmt0
vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test
To migrate it, issue the following commands:
vsr running config# del / system kpi
vsr running config# del / vrf main kpi interface
vsr running config# del / vrf mgmt kpi interface
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan0 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan1 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name mgmt0 vrf mgmt
vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test
vsr running config# commit
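The translation from legacy to new commands is mechanical, so it can be scripted when many interfaces are involved. A minimal Python sketch that handles only the command forms shown in this section (entered without the CLI prompt); the function name is hypothetical, and any unrecognized line, such as the influxdb-output setting, is simply re-issued unchanged.

```python
import re

# Legacy per-interface form, as used in the example above (prompt stripped):
#   / vrf <vrf> kpi interface <name>
LEGACY_INTERFACE_RE = re.compile(r"^/\s*vrf\s+(\S+)\s+kpi\s+interface\s+(\S+)\s*$")
NEW_PREFIX = "/ vrf main kpi telegraf metrics monitored-interface"


def migrate(legacy_commands):
    """Translate legacy KPI commands into deletions plus new-style commands."""
    deletions, additions, cleaned_vrfs = [], [], set()
    for command in (line.strip() for line in legacy_commands):
        if not command:
            continue
        if command == "/ system kpi":
            deletions.append("del / system kpi")
            continue
        match = LEGACY_INTERFACE_RE.match(command)
        if match:
            vrf, name = match.groups()
            if vrf not in cleaned_vrfs:
                # One deletion per VRF removes its whole legacy interface list.
                cleaned_vrfs.add(vrf)
                deletions.append(f"del / vrf {vrf} kpi interface")
            additions.append(f"{NEW_PREFIX} name {name} vrf {vrf}")
        else:
            # Not a legacy interface command: keep it as-is.
            additions.append(command)
    return deletions + additions + ["commit"]
```

Feeding it the legacy configuration of this example yields the deletions first, then the monitored-interface commands, the influxdb-output setting and a final commit.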
After the migration, field type conflicts may occur in the InfluxDB database. A warning similar to the following is then displayed in the Virtual Service Router logs:
(...) telegraf[2440]: (...) [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request):
partial write: field type conflict: input field "XXX" on measurement "YYY" is type float, already exists
as type string dropped=1
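When many such warnings are logged, the measurements involved can be collected from the log text. A minimal Python sketch; the regular expression is modeled on the warning above and may need adjusting if other Telegraf or InfluxDB versions word it differently. How the log text is obtained is left out, and the function names are hypothetical.

```python
import re

# Pattern based on the warning shown above (assumption: same wording in your version).
# \s+ tolerates the warning being wrapped over several lines.
CONFLICT_RE = re.compile(
    r'field type conflict: input field "(?P<field>[^"]+)" on measurement "(?P<measurement>[^"]+)"'
    r" is type (?P<new_type>\w+), already exists\s+as type (?P<old_type>\w+)"
)


def find_conflicts(log_text):
    """Return one dict per conflict with keys field, measurement, new_type, old_type."""
    return [match.groupdict() for match in CONFLICT_RE.finditer(log_text)]


def conflicting_measurements(log_text):
    """Return the sorted, de-duplicated names of the measurements involved."""
    return sorted({conflict["measurement"] for conflict in find_conflicts(log_text)})
```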
To fix these conflicts, drop the measurement that triggers the problem on the InfluxDB server; this deletes all data stored in that measurement. One way to do this is to run the following command on a machine that has access to the InfluxDB server:
# curl -XPOST "http://<influxdb>:8086/query?db=<database>" --data-urlencode "q=drop measurement \"YYY\""
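When this cleanup has to be scripted (for example, for several conflicting measurements), the curl call can be reproduced with Python's standard library. A minimal sketch assuming an InfluxDB 1.x server reachable without authentication; the function names are hypothetical, and measurement names containing double quotes or newlines are rejected rather than escaped.

```python
import urllib.parse
import urllib.request


def drop_measurement_request(influxdb_url, database, measurement):
    """Build the same request as the curl command: POST /query?db=<database>
    with a form-encoded body q=drop measurement "<measurement>"."""
    if '"' in measurement or "\n" in measurement:
        # Escaping is deliberately not attempted in this sketch.
        raise ValueError(f"unsupported measurement name: {measurement!r}")
    url = f"{influxdb_url}/query?" + urllib.parse.urlencode({"db": database})
    body = urllib.parse.urlencode({"q": f'drop measurement "{measurement}"'})
    return urllib.request.Request(url, data=body.encode("ascii"), method="POST")


def drop_measurement(influxdb_url, database, measurement, timeout=10.0):
    """Send the request and return the JSON text returned by the server."""
    request = drop_measurement_request(influxdb_url, database, measurement)
    # For POST data, urllib sends Content-Type: application/x-www-form-urlencoded,
    # as curl --data-urlencode does.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

For example, drop_measurement("http://<influxdb>:8086", "<database>", "YYY"), with the placeholders replaced, issues the same query as the curl command.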