Monitor an Apache Cassandra cluster with free open source software

Updated: Feb 8

Many teams choose free, open source software to monitor a Cassandra cluster and keep costs down. In our environment, we use Telegraf, InfluxDB, and Grafana to monitor Cassandra performance. All three tools are open source, so we can download and use them for free.



Use TIG (Telegraf, InfluxDB, Grafana) to monitor Cassandra performance

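This stack has been very stable in our environment, which is the main reason we recommend it. Configuration is also straightforward: the Telegraf agent ships with an official jolokia2 input plugin that works with a Cassandra cluster. The plugin reads JMX metrics over HTTP from a Jolokia agent attached to the Cassandra JVM, and the configuration below expects that agent at http://localhost:8778/jolokia.

We can use the plugin directly. The configuration file below is the one we use in our environment. Copy it to /etc/telegraf/telegraf.conf on each node and adjust it for your environment.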


A cluster usually has many nodes, so we use Ansible to deploy the configuration file to all of them. That saves a lot of time.
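
The configuration file in the next section contains two Jinja2-style placeholders, {{ clustername }} and {{ dc }}, which a templating step such as Ansible's template module can fill in per cluster. As an illustration only, with the hypothetical values prod-cluster and dc1, the rendered section would look like this:

```toml
# Rendered result of the [global_tags] section.
# "prod-cluster" and "dc1" are made-up example values, not from our setup.
[global_tags]
  cluster_name = "prod-cluster"
  dc = "dc1"
```

Telegraf attaches these global tags to every metric it sends, so dashboards can filter by cluster and data center.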



How to Configure Telegraf to Monitor Cassandra

We can use the following configuration file to monitor Cassandra's performance.

[global_tags]
  cluster_name = "{{ clustername }}"
  dc = "{{ dc }}"
[agent]
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_max_size = "10MB"
  logfile_rotation_max_archives = 10
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  # Replace xxxx with the IP address or hostname of your InfluxDB server.
  urls = ["http://xxxx:8086"]
  database = "telegraf"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
# Legacy Jolokia input, deprecated in favor of jolokia2_agent (used below).
[[inputs.jolokia]]
  context = "/jolokia/"
  [[inputs.jolokia.servers]]
    name = "as-server-01"
    host = "127.0.0.1"
    port = "8080"
  [[inputs.jolokia.metrics]]
    name = "heap_memory_usage"
    mbean = "java.lang:type=Memory"
    attribute = "HeapMemoryUsage"
  [[inputs.jolokia.metrics]]
    name = "thread_count"
    mbean = "java.lang:type=Threading"
    attribute = "TotalStartedThreadCount,ThreadCount,DaemonThreadCount,PeakThreadCount"
  [[inputs.jolokia.metrics]]
    name = "class_count"
    mbean = "java.lang:type=ClassLoading"
    attribute = "LoadedClassCount,UnloadedClassCount,TotalLoadedClassCount"
[[inputs.jolokia2_agent]]
  urls = ["http://localhost:8778/jolokia"]
  name_prefix = "java_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Memory"
    mbean = "java.lang:type=Memory"
  [[inputs.jolokia2_agent.metric]]
    name  = "GarbageCollector"
    mbean = "java.lang:name=*,type=GarbageCollector"
    tag_keys = ["name"]
    field_prefix = "$1_"
[[inputs.jolokia2_agent]]
  urls = ["http://localhost:8778/jolokia"]
  name_prefix = "cassandra_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "ReadTotalLatency_"

  [[inputs.jolokia2_agent.metric]]
    name  = "Cache"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=Cache"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Client"
    mbean = "org.apache.cassandra.metrics:name=*,type=Client"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ReadRepair"
    mbean = "org.apache.cassandra.metrics:name=*,type=ReadRepair"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Storage"
    mbean = "org.apache.cassandra.metrics:name=*,type=Storage"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "CQL"
    mbean = "org.apache.cassandra.metrics:name=*,type=CQL"
    tag_keys = ["name"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "ClientRequest"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=ClientRequest"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "CommitLog"
    mbean = "org.apache.cassandra.metrics:name=*,type=CommitLog"
    tag_keys = ["name"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "Compaction"
    mbean = "org.apache.cassandra.metrics:name=*,type=Compaction"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "DroppedMessage"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=DroppedMessage"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "ThreadPools"
    mbean = "org.apache.cassandra.metrics:name=*,path=*,scope=*,type=ThreadPools"
    tag_keys = ["name", "path", "scope"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "ReadTotalLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteTotalLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "WriteTotalLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=SnapshotsSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "SnapshotsSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=SSTablesPerReadHistogram,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "SSTablesPerReadHistogram_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=TombstoneScannedHistogram,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "TombstoneScannedHistogram_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=TotalDiskSpaceUsed,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "TotalDiskSpaceUsed_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MaxRowSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MaxRowSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MinRowSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MinRowSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MemtableSwitchCount,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MemtableSwitchCount_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=ReadLatency,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=WriteLatency,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=TotalDiskSpaceUsed,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "TotalDiskSpaceUsed_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=PendingFlushes,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "PendingFlushes_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=PendingCompactions,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "PendingCompactions_"
[[inputs.net]]

Customize Cassandra performance metrics

Cassandra exposes many performance metrics, and we can choose which ones to monitor. This link shows how to add your own metrics.
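
As a minimal sketch of what such a customization can look like, the stanza below adds one more per-table metric in the same pattern as the configuration above. The choice of LiveSSTableCount is only an illustration and assumes your Cassandra version exposes that metric under type=ColumnFamily. Check the MBean names on your own nodes before relying on it.

```toml
# Add this under the existing [[inputs.jolokia2_agent]] block whose
# name_prefix is "cassandra_", next to the other ColumnFamily metrics.
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    # Assumed metric name; verify it exists on your Cassandra version.
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=LiveSSTableCount,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "LiveSSTableCount_"
```

After Telegraf is restarted, the new fields should appear in the cassandra_ColumnFamily measurement, tagged by keyspace and scope, like the other per-table metrics.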

After this, we can see these metrics in Grafana and build our own dashboards from them. Here are some common metrics for a Cassandra cluster. We can add more based on our requirements.

Cassandra active write request count
Cassandra read request count
Cassandra write latency
Cassandra read latency

Here is a full list of metrics for a Cassandra cluster.

These metrics are collected per node, so we can easily see which node has a problem during an incident. Monitoring is critical for Cassandra.


