Monitoring Linux Systems with Telegraf, InfluxDB, and Grafana

Monitoring Linux systems is not an easy task for administrators. Luckily, open-source software now makes it much more manageable. In this post we look at which Linux metrics we can monitor with the Telegraf/InfluxDB/Grafana bundle.

With the TIG (Telegraf, InfluxDB, Grafana) stack, we can monitor the following kinds of metrics:

  • Monitor Basic OS Metrics for Linux Systems
  • Monitor Application Metrics
  • Monitor Custom Metrics

Monitor Basic OS Metrics for Linux Systems

The Telegraf agent collects basic system metrics, including memory usage, CPU utilization, disk I/O statistics, and more. Most of these are read directly from the kernel's /proc filesystem every 15 seconds; the collection interval can be changed with the interval setting in the [agent] section of telegraf.conf.
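
To give a feel for what "pulled from /proc" means, here is a minimal Python sketch that parses /proc/meminfo-style text and derives a memory usage percentage. The function names parse_meminfo and memory_used_percent are hypothetical, and the "total minus available" definition is just one common way to express usage; it is not necessarily the formula Telegraf itself uses.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {name: value} dict.

    Values reported with a "kB" unit are converted to bytes; unit-less
    counters (for example HugePages_Total) are kept as plain integers.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue  # skip blank or malformed lines
        amount = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


def memory_used_percent(meminfo):
    """Assumed definition: memory that is not 'available', as a share of total."""
    total = meminfo["MemTotal"]
    available = meminfo["MemAvailable"]
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    path = Path("/proc/meminfo")
    if path.exists():  # only present on Linux
        info = parse_meminfo(path.read_text())
        print(f"used_percent={memory_used_percent(info):.1f}")
```

Running a script like this on a schedule is, in spirit, what the agent does for each of its system inputs on every collection interval.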

[Figures: Grafana charts of CPU/memory metrics, disk metrics, and network metrics]

These charts are rendered by Grafana from data that Telegraf collects and InfluxDB stores. They let us see at a glance what is happening on our Linux systems, which is the core of system monitoring.

Each category exposes many more fields than the charts show. For example, the CPU measurement includes fields such as usage_guest, usage_idle, and usage_iowait.
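
Percentages such as usage_idle and usage_iowait are typically derived by sampling the cumulative counters on the "cpu" line of /proc/stat twice and comparing the two samples. The following Python sketch illustrates that idea; the helper names are hypothetical, and the arithmetic is a simplified approximation rather than Telegraf's exact implementation (for instance, guest time is left inside usage_user here).

```python
# Column order of the aggregate "cpu" line in /proc/stat.
CPU_FIELDS = (
    "user", "nice", "system", "idle", "iowait",
    "irq", "softirq", "steal", "guest", "guest_nice",
)


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a {field: ticks} dict."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    counters = [int(p) for p in parts[1:1 + len(CPU_FIELDS)]]
    # Older kernels report fewer columns; pad the missing ones with zero.
    counters += [0] * (len(CPU_FIELDS) - len(counters))
    return dict(zip(CPU_FIELDS, counters))


def usage_percentages(before, after):
    """Share of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in CPU_FIELDS}
    # The kernel already counts guest time inside user (and guest_nice
    # inside nice), so both are excluded from the total to avoid double counting.
    total = sum(
        delta for name, delta in deltas.items()
        if name not in ("guest", "guest_nice")
    )
    if total <= 0:
        return {f"usage_{name}": 0.0 for name in CPU_FIELDS}
    return {f"usage_{name}": 100.0 * delta / total for name, delta in deltas.items()}


if __name__ == "__main__":
    first = parse_cpu_line("cpu  100 0 50 800 50 0 0 0 0 0")
    second = parse_cpu_line("cpu  150 0 70 1500 80 0 0 0 0 0")
    for field, value in usage_percentages(first, second).items():
        print(f"{field}={value:.2f}")
```

On a real system, the two samples would be read from /proc/stat one collection interval apart.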

The full list of input plugins, along with the metrics each one reports, is in the Telegraf repository:

https://github.com/influxdata/telegraf/tree/master/plugins/inputs

Monitor Application Metrics

Telegraf can collect metrics from databases and other applications, such as MongoDB, MySQL, and Redis, and send them to InfluxDB, where Grafana can chart them.

Each of these is supported through a dedicated Telegraf input plugin.

Monitor Custom Metrics

We can write our own scripts, for example shell or Python scripts, to generate metrics, and then insert that data into InfluxDB.
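
As a rough illustration, a custom script usually only has to emit its numbers in InfluxDB line protocol (measurement, optional tags, fields, optional timestamp). The Python sketch below builds such a line; the function names, the backup_job measurement, and the tag and field names are all hypothetical, and the escaping covers only the common cases.

```python
import time


def escape(text, chars):
    """Backslash-escape each character of `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_field_value(value):
    """Render a Python value as a line-protocol field value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"  # integers carry an "i" suffix
    if isinstance(value, float):
        return repr(value)
    # Strings are double-quoted; escape backslashes first, then quotes.
    return '"' + escape(str(value), '\\"') + '"'


def format_line(measurement, tags, fields, timestamp_ns=None):
    """Build one line: measurement,tag=value field=value [timestamp]."""
    if not fields:
        raise ValueError("line protocol requires at least one field")
    head = escape(measurement, ", ")
    for key in sorted(tags):  # sorted tag keys keep the output deterministic
        head += "," + escape(key, ",= ") + "=" + escape(str(tags[key]), ",= ")
    body = ",".join(
        escape(key, ",= ") + "=" + format_field_value(value)
        for key, value in fields.items()
    )
    line = head + " " + body
    if timestamp_ns is not None:
        line += f" {timestamp_ns}"
    return line


if __name__ == "__main__":
    # Hypothetical custom metric: outcome of a nightly backup job.
    print(format_line(
        "backup_job",
        {"host": "web01"},
        {"duration_s": 12.5, "files": 340, "ok": True},
        time.time_ns(),
    ))
```

One way to get these lines into InfluxDB, assuming Telegraf is already running, is to have Telegraf's exec input plugin run the script and parse its standard output with the influx data format; posting the lines to InfluxDB's HTTP write endpoint is another option.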

More info about this monitoring bundle

  • Grafana: Grafana is “The open platform for beautiful analytics and monitoring.” It makes it easy to create dashboards for displaying data from many sources, particularly time-series data. It works with several data sources, such as Graphite, Elasticsearch, InfluxDB, and OpenTSDB. We use it as the front end for visualizing our Linux metrics.
  • InfluxDB: InfluxDB is “…a data store for any use case involving large amounts of timestamped data.” This is where we store the collected metrics. It is designed for exactly this use case, where metrics are collected over time.
  • Telegraf: Telegraf is “…a plugin-driven server agent for collecting and reporting metrics.” It can collect data from a wide variety of sources, e.g. system statistics, API calls, DB queries, and SNMP, and then send those metrics to a variety of datastores, e.g. Graphite, OpenTSDB, Datadog, Librato. Telegraf is maintained by InfluxData, the company behind InfluxDB, so it has very good support for writing data to InfluxDB.