
Monitor Apache Cassandra with Open-Source Software

An Apache Cassandra cluster is a distributed system, and monitoring it is not an easy task. There are a few monitoring tools for Cassandra on the market, but most of them are commercial products. In this post we use open-source software to monitor a Cassandra cluster.

Monitor Cassandra with TIG System

In our environment, we use Telegraf, InfluxDB, and Grafana (TIG) to monitor Cassandra performance. All three are open-source software that we can download and use for free.

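The Telegraf agent ships with an official input plugin, jolokia2, that reads JMX metrics over HTTP through Jolokia, and we can use it directly for a Cassandra cluster. It requires a Jolokia agent running alongside each Cassandra node; the configuration below expects it at http://localhost:8778/jolokia. The Telegraf agent configuration file that follows covers the basic OS metrics and the Cassandra metrics.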

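A Cassandra cluster usually has many nodes, so deploying this configuration with an automation tool such as Ansible saves a lot of time. The {{ clustername }} and {{ dc }} placeholders in the configuration are template variables that are filled in for each node at deploy time.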

[global_tags]
cluster_name = "{{ clustername }}"
dc = "{{ dc }}"
[agent]
interval = "30s"
round_interval = true
metric_batch_size = 1000
metric_buffer_limit = 10000
collection_jitter = "0s"
flush_interval = "10s"
flush_jitter = "0s"
precision = ""
debug = true
logfile = "/var/log/telegraf/telegraf.log"
logfile_rotation_max_size = "10MB"
logfile_rotation_max_archives = 10
hostname = ""
omit_hostname = false
[[outputs.influxdb]]
# Replace xxxx with your InfluxDB host or IP.
urls = ["http://xxxx:8086"]
database = "telegraf"
[[inputs.cpu]]
percpu = true
totalcpu = true
collect_cpu_time = false
report_active = false
[[inputs.disk]]
ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
[[inputs.jolokia]]
context = "/jolokia/"
[[inputs.jolokia.servers]]
name = "as-server-01"
host = "127.0.0.1"
port = "8080"
[[inputs.jolokia.metrics]]
name = "heap_memory_usage"
mbean = "java.lang:type=Memory"
attribute = "HeapMemoryUsage"
[[inputs.jolokia.metrics]]
name = "thread_count"
mbean = "java.lang:type=Threading"
attribute = "TotalStartedThreadCount,ThreadCount,DaemonThreadCount,PeakThreadCount"
[[inputs.jolokia.metrics]]
name = "class_count"
mbean = "java.lang:type=ClassLoading"
attribute = "LoadedClassCount,UnloadedClassCount,TotalLoadedClassCount"
[[inputs.jolokia2_agent]]
urls = ["http://localhost:8778/jolokia"]
name_prefix = "java_"
[[inputs.jolokia2_agent.metric]]
name = "Memory"
mbean = "java.lang:type=Memory"
[[inputs.jolokia2_agent.metric]]
name = "GarbageCollector"
mbean = "java.lang:name=*,type=GarbageCollector"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent]]
urls = ["http://localhost:8778/jolokia"]
name_prefix = "cassandra_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,type=Keyspace"
tag_keys = ["keyspace", "name"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,type=Keyspace"
tag_keys = ["keyspace", "name"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Keyspace"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,type=Keyspace"
tag_keys = ["keyspace", "name", "host"]
field_prefix = "ReadTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "Cache"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=Cache"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Client"
mbean = "org.apache.cassandra.metrics:name=*,type=Client"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ReadRepair"
mbean = "org.apache.cassandra.metrics:name=*,type=ReadRepair"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Storage"
mbean = "org.apache.cassandra.metrics:name=*,type=Storage"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "CQL"
mbean = "org.apache.cassandra.metrics:name=*,type=CQL"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ClientRequest"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=ClientRequest"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "CommitLog"
mbean = "org.apache.cassandra.metrics:name=*,type=CommitLog"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "Compaction"
mbean = "org.apache.cassandra.metrics:name=*,type=Compaction"
tag_keys = ["name"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "DroppedMessage"
mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=DroppedMessage"
tag_keys = ["name", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ThreadPools"
mbean = "org.apache.cassandra.metrics:name=*,path=*,scope=*,type=ThreadPools"
tag_keys = ["name", "path", "scope"]
field_prefix = "$1_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "ReadTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteTotalLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "WriteTotalLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=SnapshotsSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "SnapshotsSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=SSTablesPerReadHistogram,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "SSTablesPerReadHistogram_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=TombstoneScannedHistogram,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "TombstoneScannedHistogram_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=TotalDiskSpaceUsed,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "TotalDiskSpaceUsed_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MaxRowSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MaxRowSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MinRowSize,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MinRowSize_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily"
mbean = "org.apache.cassandra.metrics:keyspace=*,name=MemtableSwitchCount,scope=*,type=ColumnFamily"
tag_keys = ["keyspace", "name", "scope"]
field_prefix = "MemtableSwitchCount_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=ReadLatency,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "ReadLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=WriteLatency,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "WriteLatency_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=TotalDiskSpaceUsed,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "TotalDiskSpaceUsed_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=PendingFlushes,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "PendingFlushes_"
[[inputs.jolokia2_agent.metric]]
name = "ColumnFamily_Global"
mbean = "org.apache.cassandra.metrics:name=PendingCompactions,type=ColumnFamily"
tag_keys = ["name"]
field_prefix = "PendingCompactions_"
[[inputs.net]]
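
The cluster_name and dc values in [global_tags] are template placeholders, so each node needs its own rendered copy of the file. In practice Ansible's template step would do this; the Python sketch below only imitates that substitution to show what each node ends up with. The node names and the inventory dictionary are hypothetical, and only the templated part of the configuration is included.

```python
import re

# Matches Jinja-style placeholders such as "{{ clustername }}".
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, variables: dict) -> str:
    """Replace every {{ name }} placeholder with its value from `variables`."""
    def substitute(match):
        key = match.group(1)
        if key not in variables:
            raise KeyError(f"no value supplied for placeholder '{key}'")
        return str(variables[key])

    return PLACEHOLDER.sub(substitute, template)


# Only the templated section of the Telegraf configuration.
GLOBAL_TAGS = '[global_tags]\ncluster_name = "{{ clustername }}"\ndc = "{{ dc }}"\n'

# Hypothetical inventory: node name -> template variables for that node.
INVENTORY = {
    "cass-node-1": {"clustername": "prod", "dc": "dc1"},
    "cass-node-2": {"clustername": "prod", "dc": "dc2"},
}

# One rendered configuration fragment per node.
rendered = {node: render(GLOBAL_TAGS, values) for node, values in INVENTORY.items()}

if __name__ == "__main__":
    for node, text in rendered.items():
        print(f"# {node}\n{text}")
```

Raising on a missing variable is a deliberate choice here: a node that silently reports an empty cluster_name or dc tag is harder to spot later in Grafana than a failed deployment.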

Visualize Apache Cassandra Metrics

The documentation for the jolokia2 input plugin is at the link below. The plugin can collect many more Cassandra performance metrics than the ones configured above. https://github.com/influxdata/telegraf/tree/master/plugins/inputs/jolokia2
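
Before pointing Telegraf at a node, it can help to confirm that the Jolokia agent answers for a given MBean. The following is a minimal sketch that sends a Jolokia "read" request with Python's standard library. It assumes the agent listens at http://localhost:8778/jolokia as in the configuration above; the function names are illustrative and not part of Telegraf or Jolokia.

```python
import json
import urllib.request


def build_read_request(mbean: str, attribute: str = None) -> dict:
    """Build the JSON body of a Jolokia 'read' request for one MBean."""
    request = {"type": "read", "mbean": mbean}
    if attribute is not None:
        request["attribute"] = attribute
    return request


def read_mbean(url: str, mbean: str, attribute: str = None, timeout: float = 5.0):
    """POST a read request to a Jolokia agent and return the 'value' it reports."""
    body = json.dumps(build_read_request(mbean, attribute)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as response:
        payload = json.load(response)
    # Jolokia reports success or failure in the "status" field of the reply.
    if payload.get("status") != 200:
        raise RuntimeError(f"Jolokia returned an error: {payload.get('error')}")
    return payload["value"]


if __name__ == "__main__":
    # Assumed agent URL, taken from the Telegraf configuration.
    heap = read_mbean(
        "http://localhost:8778/jolokia", "java.lang:type=Memory", "HeapMemoryUsage"
    )
    print(heap)
```

If this call fails on a node, Telegraf's jolokia2_agent input will not be able to collect from that node either, so it is a quick way to separate agent problems from Telegraf configuration problems.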

In Grafana we can query these metrics from InfluxDB and chart the common ones for a Cassandra cluster, such as read and write latency, pending compactions, and dropped messages. We can also build our own dashboards on these metrics and add more as our requirements change.
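
As a rough illustration of how a dashboard panel can read these metrics, the sketch below builds an InfluxQL query string of the kind a Grafana panel sends to InfluxDB. It assumes an InfluxDB 1.x data source, that the measurement is named cassandra_Keyspace (name_prefix plus name from the configuration), and that the latency MBean exposes a 99thPercentile attribute, giving the field ReadLatency_99thPercentile. Check the actual measurement and field names in your database before relying on them. The helper function is hypothetical.

```python
def influxql_panel_query(measurement: str, field: str, cluster: str,
                         group_tag: str = "host") -> str:
    """Build an InfluxQL query for a Grafana panel, one series per `group_tag`.

    $timeFilter and $__interval are Grafana macros that Grafana expands
    before sending the query to InfluxDB. Inputs are not escaped, so this
    is only suitable for trusted, hard-coded names.
    """
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE \"cluster_name\" = '{cluster}' AND $timeFilter "
        f'GROUP BY time($__interval), "{group_tag}" fill(null)'
    )


# Assumed names: see the note above about verifying them in your database.
query = influxql_panel_query("cassandra_Keyspace", "ReadLatency_99thPercentile", "prod")

if __name__ == "__main__":
    print(query)
```

Grouping by the host tag, which Telegraf adds to every point unless omit_hostname is set, gives one line per node on the panel, and cluster_name comes from the [global_tags] section of the agent configuration.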

Here is the full list of metrics for a Cassandra cluster:

https://cassandra.apache.org/doc/latest/operating/metrics.html

These metrics are collected per node, so during an incident we can easily see which node has a problem. Monitoring is critical for Cassandra.
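
Because every point carries the node it came from, spotting the odd node out can also be automated. The sketch below is one simple way to do it, not the only one: it flags nodes whose latency is more than a chosen multiple of the cluster median. The node names, sample values, and the factor of 2.0 are all hypothetical.

```python
from statistics import median


def slow_nodes(latency_by_node: dict, factor: float = 2.0) -> list:
    """Return the nodes whose latency exceeds `factor` times the cluster median."""
    if not latency_by_node:
        return []
    cluster_median = median(latency_by_node.values())
    return sorted(
        node
        for node, value in latency_by_node.items()
        if value > factor * cluster_median
    )


# Hypothetical per-node read latency values, e.g. one value per node from a query.
sample = {
    "cass-node-1": 1.8,
    "cass-node-2": 2.1,
    "cass-node-3": 9.5,
    "cass-node-4": 2.0,
}

flagged = slow_nodes(sample)

if __name__ == "__main__":
    print(flagged)
```

Comparing against the median rather than the mean keeps a single very slow node from dragging the baseline up and hiding itself.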