KPIs

6WIND KPI monitoring lets you monitor Virtual Service Router KPIs and export them to an InfluxDB time-series database, which can then be integrated with an analytics frontend such as Grafana. An example InfluxDB/Grafana setup is described on 6WIND’s GitHub.

Configuration

Configuring KPIs requires you to:

  • enable and configure the KPIs daemon to specify which KPIs to collect, such as licensing or network interface KPIs

  • enable and configure the Telegraf agent to export the specified KPIs to a remote InfluxDB database

To configure the KPIs daemon to collect everything it can, and the Telegraf agent to send data to the test database on the InfluxDB server located at http://1.1.1.1:8086, enter:

vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test
vsr running config# / vrf main kpi telegraf metrics template all
vsr running config# commit

Note

To connect Telegraf to a secured InfluxDB instance (https URL) that is using a self-signed certificate, you must enable insecure-skip-verify.

By default, the network interfaces exported to the InfluxDB server are the fast path ports. This can be changed by specifying the list of interfaces:

vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan0 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name mgmt0 vrf mgmt
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan1 vrf main
vsr running config# commit

Note

If a list of interfaces is specified, the default no longer applies: any fast path port that should still be exported must be added manually to the KPI interface list.

To display the state:

vsr running config# show state vrf main kpi
kpi
    telegraf
        enabled true
        metrics
            enabled true
            monitored-interface vrf mgmt name mgmt0
            monitored-interface vrf main name vlan0
            monitored-interface vrf main name vlan1
            template all
            metric fp-bridge-stats enabled true period 60
            metric fp-cg-nat-stats enabled true period 60
            metric fp-conntrack-stats enabled true period 60
            metric fp-cpu-usage enabled true period 10
            metric fp-exceptions-stats enabled true period 60
            metric fp-filling enabled true period 10
            metric fp-firewall-stats enabled true period 60
            metric fp-gre-stats enabled true period 60
            metric fp-global-stats enabled true period 60
            metric fp-ip-stats enabled true period 60
            metric fp-ip6-stats enabled true period 60
            metric fp-ipsec-stats enabled true period 60
            metric fp-status enabled true period 60
            metric fp-vlan-stats enabled true period 60
            metric fp-vxlan-stats enabled true period 60
            metric fp-status enabled true period 10
            metric network-nic-hw-info enabled true period 60
            metric network-nic-traffic-stats enabled true period 10
            metric network-twamp-stats enabled true period 60
            metric product-license enabled true period 60
            metric product-version enabled true period 120
            metric system-cpu-times enabled true period 10
            metric system-cpu-usage enabled true period 10
            metric system-disk-usage enabled true period 60
            metric system-memory enabled true period 60
            metric system-numa-stats enabled true period 60
            metric system-processes enabled true period 10
            metric system-soft-interrupts-stats enabled true period 60
            metric system-uptime enabled true period 10
            metric system-user-count enabled true period 10
            metric system-users enabled true period 10
            ..
        ..
    ..

The same settings can be applied using the following NETCONF XML configuration:

vsr running config# show config xml absolute
<config xmlns="urn:6wind:vrouter">
  <vrf>
    <name>main</name>
    <kpi xmlns="urn:6wind:vrouter/kpi">
      <telegraf xmlns="urn:6wind:vrouter/kpi/telegraf">
        <enabled>true</enabled>
        <metrics>
          <enabled>true</enabled>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>mgmt</vrf>
            <name>mgmt0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan1</name>
          </monitored-interface>
          <template>all</template>
        </metrics>
        <interval>10</interval>
        <influxdb-output>
          <url>http://1.1.1.1:8086</url>
          <database>test</database>
        </influxdb-output>
      </telegraf>
    </kpi>
  </vrf>
</config>
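
Before pushing such a document over NETCONF, it can be handy to check which interfaces and which InfluxDB output it declares. The following is a minimal sketch using only the Python standard library. The function name summarize_kpi_config is hypothetical, and the namespace URI is simply copied from the telegraf element above.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the <telegraf> element of the configuration above.
TELEGRAF_NS = {"t": "urn:6wind:vrouter/kpi/telegraf"}

# Trimmed copy of the configuration above, used as sample input.
SAMPLE = """\
<config xmlns="urn:6wind:vrouter">
  <vrf>
    <name>main</name>
    <kpi xmlns="urn:6wind:vrouter/kpi">
      <telegraf xmlns="urn:6wind:vrouter/kpi/telegraf">
        <enabled>true</enabled>
        <metrics>
          <enabled>true</enabled>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>mgmt</vrf>
            <name>mgmt0</name>
          </monitored-interface>
          <monitored-interface>
            <vrf>main</vrf>
            <name>vlan1</name>
          </monitored-interface>
          <template>all</template>
        </metrics>
        <influxdb-output>
          <url>http://1.1.1.1:8086</url>
          <database>test</database>
        </influxdb-output>
      </telegraf>
    </kpi>
  </vrf>
</config>
"""


def summarize_kpi_config(xml_text):
    """Return the monitored interfaces and InfluxDB outputs found in xml_text."""
    root = ET.fromstring(xml_text)
    # Each monitored-interface carries its own <vrf> and <name> children.
    interfaces = [
        (
            node.findtext("t:vrf", namespaces=TELEGRAF_NS),
            node.findtext("t:name", namespaces=TELEGRAF_NS),
        )
        for node in root.iterfind(".//t:monitored-interface", TELEGRAF_NS)
    ]
    outputs = [
        (
            node.findtext("t:url", namespaces=TELEGRAF_NS),
            node.findtext("t:database", namespaces=TELEGRAF_NS),
        )
        for node in root.iterfind(".//t:influxdb-output", TELEGRAF_NS)
    ]
    return {"interfaces": interfaces, "outputs": outputs}


if __name__ == "__main__":
    print(summarize_kpi_config(SAMPLE))
```

The sketch only reads the document; it does not validate it against the Virtual Service Router data model.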

Migration from the legacy KPI system

Here is an example of migration from the legacy KPI system. Consider this legacy configuration, which enables the default KPIs and monitors vlan0 and vlan1 in vrf main, and mgmt0 in vrf mgmt:

vsr running config# / system kpi
vsr running config# / vrf main kpi interface vlan0
vsr running config# / vrf main kpi interface vlan1
vsr running config# / vrf mgmt kpi interface mgmt0
vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test

To migrate it, issue the following commands:

vsr running config# del / system kpi
vsr running config# del / vrf main kpi interface
vsr running config# del / vrf mgmt kpi interface
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan0 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name vlan1 vrf main
vsr running config# / vrf main kpi telegraf metrics monitored-interface name mgmt0 vrf mgmt
vsr running config# / vrf main kpi telegraf influxdb-output url http://1.1.1.1:8086 database test
vsr running config# commit
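
When many interfaces are involved, the translation above can be generated mechanically. The sketch below is illustrative only. It assumes the legacy lines follow the "/ vrf <vrf> kpi interface <name>" form matched by the del commands above, and that Telegraf is configured in vrf main as in this example. The function name migrate_commands is hypothetical.

```python
import re

# Legacy syntax assumed from the example: "/ vrf <vrf> kpi interface <name>".
LEGACY_INTERFACE = re.compile(r"^/ vrf (?P<vrf>\S+) kpi interface (?P<name>\S+)$")


def migrate_commands(legacy_lines, telegraf_vrf="main"):
    """Translate legacy KPI configuration lines into migration commands.

    telegraf_vrf is the VRF that holds the Telegraf configuration; "main"
    matches the example and is otherwise an assumption.
    """
    deletions = []
    additions = []
    for raw in legacy_lines:
        line = raw.strip()
        if line == "/ system kpi":
            deletions.append("del / system kpi")
            continue
        match = LEGACY_INTERFACE.match(line)
        if match is None:
            # Not a legacy KPI line (for example the influxdb-output line).
            continue
        vrf, name = match["vrf"], match["name"]
        delete = f"del / vrf {vrf} kpi interface"
        if delete not in deletions:
            # One deletion per VRF removes its whole legacy interface list.
            deletions.append(delete)
        additions.append(
            f"/ vrf {telegraf_vrf} kpi telegraf metrics "
            f"monitored-interface name {name} vrf {vrf}"
        )
    return deletions + additions


if __name__ == "__main__":
    legacy = [
        "/ system kpi",
        "/ vrf main kpi interface vlan0",
        "/ vrf mgmt kpi interface mgmt0",
    ]
    for command in migrate_commands(legacy):
        print(command)
```

The generated commands still have to be entered in the CLI and committed. The influxdb-output line is left untouched because it already uses the new syntax.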

After the migration, field type conflicts may occur in the InfluxDB database. When they do, a warning similar to the following appears in the Virtual Service Router logs:

(...) telegraf[2440]: (...) [outputs.influxdb] Failed to write metric (will be dropped: 400 Bad Request):
partial write: field type conflict: input field "XXX" on measurement "YYY" is type float, already exists
as type string dropped=1

To fix these conflicts, drop the measurement that triggers the problem on the InfluxDB server. One way to do this is to run the following command on a machine that has access to the InfluxDB server:

# curl -XPOST "http://<influxdb>:8086/query?db=<database>" --data-urlencode "q=drop measurement \"YYY\""
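
If several measurements are affected, the names can be extracted from the log instead of being copied by hand. The following sketch parses warnings in the format shown above and builds the URL and form body for the same drop request as the curl command. It sends nothing itself. The function names are hypothetical, and the regular expression assumes the log wording is exactly as in the example.

```python
import re
from urllib.parse import urlencode

# Pattern assumed from the warning shown above.
CONFLICT = re.compile(
    r'input field "(?P<field>[^"]+)" on measurement "(?P<measurement>[^"]+)" '
    r"is type (?P<new_type>\w+), already exists as type (?P<existing_type>\w+)"
)


def find_conflicts(log_text):
    """Return one dict per field type conflict reported in log_text."""
    # The warning may be wrapped over several lines: collapse all whitespace.
    flat = " ".join(log_text.split())
    return [match.groupdict() for match in CONFLICT.finditer(flat)]


def build_drop_request(influxdb_url, database, measurement):
    """Return (url, body) for a POST that drops measurement from database."""
    # Escape backslashes and double quotes inside the quoted identifier.
    quoted = measurement.replace("\\", "\\\\").replace('"', '\\"')
    url = f"{influxdb_url.rstrip('/')}/query?{urlencode({'db': database})}"
    body = urlencode({"q": f'DROP MEASUREMENT "{quoted}"'})
    return url, body


if __name__ == "__main__":
    sample = (
        'partial write: field type conflict: input field "XXX" on measurement '
        '"YYY" is type float, already exists\nas type string dropped=1'
    )
    for conflict in find_conflicts(sample):
        print(build_drop_request("http://1.1.1.1:8086", "test", conflict["measurement"]))
```

Dropping a measurement deletes all of its stored points, so review the list of conflicts before sending any request.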