Monitor Apache Cassandra cluster with free open source software

Many people choose free, open source software to monitor a Cassandra cluster and save cost. In our environment, we use Telegraf, InfluxDB, and Grafana (the TIG stack) to monitor Cassandra performance. All three tools are open source, so we can download and use them for free.

Table of Contents

  • Use TIG to monitor Cassandra performance
  • Configuration file for Cassandra Monitoring
  • Cassandra performance metrics

Use TIG to monitor Cassandra performance

This stack has been very stable in our environment, which is the main reason we recommend it. Configuration is also easy: the Telegraf agent ships with an official jolokia2 input plugin that can collect metrics from a Cassandra cluster.

We can use the plugin directly. Below is the Telegraf agent configuration file from our environment; you can copy it to /etc/telegraf/telegraf.conf in your own environment.

A cluster usually has many nodes, so we use Ansible to deploy the file to all of them, which saves a lot of time.

How to Configure Telegraf to Monitor Cassandra

We can use the following configuration file to monitor Cassandra’s performance.

[global_tags]
cluster_name = "{{ clustername }}"
dc = "{{ dc }}"
[agent]
interval = "30s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
logfile = "/var/log/telegraf/telegraf.log"
logfile_rotation_max_size = "10MB"
logfile_rotation_max_archives = 10
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
# Replace xxxx with your InfluxDB IP address.
urls = ["http://xxxx:8086"]
database = "telegraf"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.jolokia]]
context = "/jolokia/"
[[inputs.jolokia.servers]]
name = "as-server-01"
host = "127.0.0.1"
port = "8080"
[[inputs.jolokia.metrics]]
name = "heap_memory_usage"
mbean = "java.lang:type=Memory"
attribute = "HeapMemoryUsage"
[[inputs.jolokia.metrics]]
name = "thread_count"
mbean = "java.lang:type=Threading"
attribute = "TotalStartedThreadCount,ThreadCount,DaemonThreadCount,PeakThreadCount"
[[inputs.jolokia.metrics]]
name = "class_count"
mbean = "java.lang:type=ClassLoading"
attribute = "LoadedClassCount,UnloadedClassCount,TotalLoadedClassCount"
[[inputs.jolokia2_agent]]
urls = ["http://localhost:8778/jolokia"]
name_prefix = "java_"
[[inputs.jolokia2_agent.metric]]
name = "Memory"
mbean = "java.lang:type=Memory"
[[inputs.jolokia2_agent.metric]]
name = "GarbageCollector"
mbean = "java.lang:name=*,type=GarbageCollector"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent]]
urls = ["http://localhost:8778/jolokia"]
name_prefix = "cassandra_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,type=Keyspace"
tag_keys = ["keyspace", "name"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,type=Keyspace"
tag_keys = ["keyspace", "name"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,type=Keyspace"
tag_keys = ["keyspace", "name", "host"]
field_prefix = "ReadTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Cache"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=Cache"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Client"
mbean = "org.apache.cassandra.metrics:name=*,type=Client"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ReadRepair"
mbean = "org.apache.cassandra.metrics:name=*,type=ReadRepair"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Storage"
mbean = "org.apache.cassandra.metrics:name=*,type=Storage"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "CQL"
mbean = "org.apache.cassandra.metrics:name=*,type=CQL"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ClientRequest"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=ClientRequest"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "CommitLog"
mbean = "org.apache.cassandra.metrics:name=*,type=CommitLog"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Compaction"
mbean = "org.apache.cassandra.metrics:name=*,type=Compaction"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "DroppedMessage"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=DroppedMessage"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ThreadPools"
mbean = "org.apache.cassandra.metrics:name=*,path=*,scope=*,type=ThreadPools"
tag_keys = ["name", "path", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "ReadTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteTotalLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "WriteTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=SnapshotsSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "SnapshotsSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=SSTablesPerReadHistogram,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "SSTablesPerReadHistogram_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=TombstoneScannedHistogram,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "TombstoneScannedHistogram_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=TotalDiskSpaceUsed,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "TotalDiskSpaceUsed_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MaxRowSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MaxRowSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MinRowSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MinRowSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MemtableSwitchCount,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MemtableSwitchCount_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=ReadLatency,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=WriteLatency,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=TotalDiskSpaceUsed,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "TotalDiskSpaceUsed_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=PendingFlushes,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "PendingFlushes_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=PendingCompactions,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "PendingCompactions_"
[[inputs.net]]
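
Before starting Telegraf, it can help to confirm that the Jolokia agent on the Cassandra node answers at the URL used in the configuration above. The following is a minimal sketch in Python, not part of the original setup: the helper names (build_read_request, extract_value, fetch_attribute) are hypothetical, and it assumes the agent accepts Jolokia "read" requests sent as JSON over HTTP POST.

```python
import json
import urllib.request


def build_read_request(mbean, attribute=None):
    """Build the JSON body for a Jolokia 'read' operation."""
    body = {"type": "read", "mbean": mbean}
    if attribute is not None:
        body["attribute"] = attribute
    return body


def extract_value(response):
    """Return the 'value' of a decoded Jolokia response, or raise on error."""
    status = response.get("status")
    if status != 200:
        raise RuntimeError(
            f"Jolokia returned status {status}: {response.get('error')}"
        )
    return response["value"]


def fetch_attribute(url, mbean, attribute=None, timeout=5):
    """POST a read request to the Jolokia agent and return the value."""
    data = json.dumps(build_read_request(mbean, attribute)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return extract_value(json.load(reply))
```

On a Cassandra node, calling fetch_attribute("http://localhost:8778/jolokia", "java.lang:type=Memory", "HeapMemoryUsage") should return the heap usage values if the agent is reachable. If the call fails, Telegraf will most likely fail to collect from the same URL as well.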

Customize Cassandra performance metrics

Cassandra exposes many performance metrics, and we can choose which ones to monitor. This link shows how to add your own metrics.
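
Adding a metric usually means appending one more [[inputs.jolokia2_agent.metric]] table that follows the same pattern as the entries in the configuration file. The sketch below, in Python, generates such a table so that it can be pasted into telegraf.conf. The helper metric_stanza is hypothetical, and LiveSSTableCount is only an illustrative metric name; check that the MBean exists in your Cassandra version before relying on it.

```python
def metric_stanza(name, mbean, tag_keys, field_prefix):
    """Render one jolokia2_agent metric table in the style used above."""
    keys = ", ".join(f'"{key}"' for key in tag_keys)
    return "\n".join(
        [
            "[[inputs.jolokia2_agent.metric]]",
            f'name = "{name}"',
            f'mbean = "{mbean}"',
            f"tag_keys = [{keys}]",
            f'field_prefix = "{field_prefix}"',
        ]
    )


# Illustrative per-table metric; the MBean name is an assumption.
example = metric_stanza(
    name="ColumnFamily",
    mbean=(
        "org.apache.cassandra.metrics:"
        "keyspace=*,name=LiveSSTableCount,scope=*,type=ColumnFamily"
    ),
    tag_keys=["keyspace", "name", "scope"],
    field_prefix="LiveSSTableCount_",
)
print(example)
```

The generated table belongs under the [[inputs.jolokia2_agent]] section whose name_prefix is "cassandra_", next to the other ColumnFamily entries.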

After this, the metrics are available in Grafana, where we can build our own dashboards from them. Here are some common metrics for a Cassandra cluster. We can add more metrics as our requirements grow.

Here is a full list of metrics for a Cassandra cluster.

These metrics are collected per node, so we can easily see which node has a problem during an incident. Monitoring is critical for Cassandra.
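
To compare nodes in a Grafana panel, one option is an InfluxQL query grouped by the host tag that Telegraf adds to every point. The Python sketch below only builds the query string. The helper per_node_query is hypothetical, and the measurement and field names are assumptions derived from the name_prefix, name, and field_prefix values in the configuration above; verify them against what actually appears in your InfluxDB database.

```python
def per_node_query(measurement, field, window="1h", bucket="1m"):
    """Build an InfluxQL query that averages a field per host over time."""
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {window} "
        f'GROUP BY time({bucket}), "host"'
    )


# Assumed names: "cassandra_" prefix + "ColumnFamily_Global" measurement,
# "PendingCompactions_" prefix + "Value" attribute.
query = per_node_query("cassandra_ColumnFamily_Global", "PendingCompactions_Value")
print(query)
```

Grouping by host produces one series per node, so a node with a growing compaction backlog stands out from the rest of the cluster.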

David Cao

Hey there! I am David, a Cloud & DevOps enthusiast with 18 years of experience as a Linux engineer. I work with AWS, Git & GitHub, Linux, Python, Ansible, and Bash. I am a technical blogger and a software engineer who enjoys sharing what I learn and contributing to open source.