Monitor Apache Cassandra with Open-Source Software

Updated: Jun 9

An Apache Cassandra cluster is a distributed system, and monitoring it is not an easy task. There are a few monitoring tools for Cassandra on the market, but most of them are commercial products. Today we are going to use open-source software to monitor a Cassandra cluster.



Monitor Cassandra with the TIG Stack

In our environment, we use Telegraf, InfluxDB, and Grafana (the TIG stack) to monitor Cassandra performance. All three are open-source software that we can download and use for free.


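The Telegraf agent has an official input plugin, jolokia2, that can collect Cassandra metrics, and we can use it directly. It reads JMX metrics through a Jolokia agent on each node (http://localhost:8778/jolokia in this configuration). The Telegraf agent configuration file below covers the basic OS metrics and Cassandra metrics.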


A Cassandra cluster usually has many nodes, so deploying this file with an automation tool such as Ansible can save us a lot of time.
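The configuration below is written as a template: clustername and dc in double curly braces are placeholders that a tool like Ansible would fill in for each host. As a rough illustration of that substitution, here is a minimal Python sketch. The regular expression is only a stand-in for real Jinja2 rendering, and the host names and values are made up for the example.

```python
import re

# Matches placeholders such as {{ clustername }} or {{dc}}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, variables):
    """Replace each {{ name }} placeholder with variables[name]."""
    return PLACEHOLDER.sub(lambda m: str(variables[m.group(1)]), template)


# First lines of the Telegraf template used in this post.
GLOBAL_TAGS = '''[global_tags]
  cluster_name = "{{ clustername }}"
  dc = "{{ dc }}"
'''

# Hypothetical per-node variables, as they might appear in an inventory.
nodes = {
    "cass-01": {"clustername": "prod", "dc": "dc1"},
    "cass-02": {"clustername": "prod", "dc": "dc2"},
}

rendered = {host: render(GLOBAL_TAGS, node_vars) for host, node_vars in nodes.items()}
print(rendered["cass-02"])
```

Each node ends up with the same file except for its own tag values, which is what lets us filter by cluster and data center later in Grafana.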




[global_tags]
  cluster_name = "{{ clustername }}"
  dc = "{{ dc }}"
[agent]
  interval = "30s"
  round_interval = true
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  collection_jitter = "0s"
  flush_interval = "10s"
  flush_jitter = "0s"
  precision = ""
  debug = true
  logfile = "/var/log/telegraf/telegraf.log"
  logfile_rotation_max_size = "10MB"
  logfile_rotation_max_archives = 10
  hostname = ""
  omit_hostname = false
[[outputs.influxdb]]
  urls = ["http://xxxx:8086"]  # replace xxxx with your InfluxDB host or IP
  database = "telegraf"
[[inputs.cpu]]
  percpu = true
  totalcpu = true
  collect_cpu_time = false
  report_active = false
[[inputs.disk]]
  ignore_fs = ["tmpfs", "devtmpfs", "devfs", "iso9660", "overlay", "aufs", "squashfs"]
[[inputs.diskio]]
[[inputs.kernel]]
[[inputs.mem]]
[[inputs.processes]]
[[inputs.swap]]
[[inputs.system]]
# Legacy Jolokia input, deprecated in Telegraf in favor of jolokia2_agent (used below)
[[inputs.jolokia]]
  context = "/jolokia/"
  [[inputs.jolokia.servers]]
    name = "as-server-01"
    host = "127.0.0.1"
    port = "8080"
  [[inputs.jolokia.metrics]]
    name = "heap_memory_usage"
    mbean = "java.lang:type=Memory"
    attribute = "HeapMemoryUsage"
  [[inputs.jolokia.metrics]]
    name = "thread_count"
    mbean = "java.lang:type=Threading"
    attribute = "TotalStartedThreadCount,ThreadCount,DaemonThreadCount,PeakThreadCount"
  [[inputs.jolokia.metrics]]
    name = "class_count"
    mbean = "java.lang:type=ClassLoading"
    attribute = "LoadedClassCount,UnloadedClassCount,TotalLoadedClassCount"
[[inputs.jolokia2_agent]]
  urls = ["http://localhost:8778/jolokia"]
  name_prefix = "java_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Memory"
    mbean = "java.lang:type=Memory"
  [[inputs.jolokia2_agent.metric]]
    name  = "GarbageCollector"
    mbean = "java.lang:name=*,type=GarbageCollector"
    tag_keys = ["name"]
    field_prefix = "$1_"
[[inputs.jolokia2_agent]]
  urls = ["http://localhost:8778/jolokia"]
  name_prefix = "cassandra_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Keyspace"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,type=Keyspace"
    tag_keys = ["keyspace", "name"]
    field_prefix = "ReadTotalLatency_"

  [[inputs.jolokia2_agent.metric]]
    name  = "Cache"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=Cache"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Client"
    mbean = "org.apache.cassandra.metrics:name=*,type=Client"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ReadRepair"
    mbean = "org.apache.cassandra.metrics:name=*,type=ReadRepair"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "Storage"
    mbean = "org.apache.cassandra.metrics:name=*,type=Storage"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "CQL"
    mbean = "org.apache.cassandra.metrics:name=*,type=CQL"
    tag_keys = ["name"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "ClientRequest"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=ClientRequest"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "CommitLog"
    mbean = "org.apache.cassandra.metrics:name=*,type=CommitLog"
    tag_keys = ["name"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "Compaction"
    mbean = "org.apache.cassandra.metrics:name=*,type=Compaction"
    tag_keys = ["name"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "DroppedMessage"
    mbean = "org.apache.cassandra.metrics:name=*,scope=*,type=DroppedMessage"
    tag_keys = ["name", "scope"]
    field_prefix = "$1_"

  [[inputs.jolokia2_agent.metric]]
    name  = "ThreadPools"
    mbean = "org.apache.cassandra.metrics:name=*,path=*,scope=*,type=ThreadPools"
    tag_keys = ["name", "path", "scope"]
    field_prefix = "$1_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadTotalLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "ReadTotalLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteTotalLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "WriteTotalLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=ReadLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=WriteLatency,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=SnapshotsSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "SnapshotsSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=SSTablesPerReadHistogram,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "SSTablesPerReadHistogram_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=TombstoneScannedHistogram,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "TombstoneScannedHistogram_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=TotalDiskSpaceUsed,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "TotalDiskSpaceUsed_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MaxRowSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MaxRowSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MinRowSize,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MinRowSize_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily"
    mbean = "org.apache.cassandra.metrics:keyspace=*,name=MemtableSwitchCount,scope=*,type=ColumnFamily"
    tag_keys = ["keyspace", "name", "scope"]
    field_prefix = "MemtableSwitchCount_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=ReadLatency,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "ReadLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=WriteLatency,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "WriteLatency_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=TotalDiskSpaceUsed,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "TotalDiskSpaceUsed_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=PendingFlushes,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "PendingFlushes_"
  [[inputs.jolokia2_agent.metric]]
    name  = "ColumnFamily_Global"
    mbean = "org.apache.cassandra.metrics:name=PendingCompactions,type=ColumnFamily"
    tag_keys = ["name"]
    field_prefix = "PendingCompactions_"
[[inputs.net]]
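
Before starting Telegraf, it can help to confirm that the Jolokia agent answers at the URL used in the jolokia2_agent sections above. The sketch below sends a Jolokia read request for one MBean using only the Python standard library. It assumes the agent listens on localhost:8778 as in the configuration; the function names are hypothetical helpers, not part of Telegraf or Jolokia.

```python
import json
import urllib.request

# Same URL as in the jolokia2_agent sections of the configuration.
JOLOKIA_URL = "http://localhost:8778/jolokia"


def build_read_request(mbean, attribute=None):
    """Build the JSON body of a Jolokia 'read' request for one MBean."""
    request = {"type": "read", "mbean": mbean}
    if attribute is not None:
        request["attribute"] = attribute
    return request


def extract_value(response):
    """Return the 'value' of a Jolokia response, or raise if the status is not 200."""
    if response.get("status") != 200:
        raise RuntimeError("Jolokia error: {}".format(response.get("error", "unknown")))
    return response["value"]


def read_mbean(mbean, attribute=None, url=JOLOKIA_URL, timeout=5):
    """POST a read request to the Jolokia agent and return the decoded value."""
    body = json.dumps(build_read_request(mbean, attribute)).encode("utf-8")
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_value(json.load(resp))
```

On a node where the agent is running, calling read_mbean("java.lang:type=Memory", "HeapMemoryUsage") should return the heap usage values. A connection error here usually means Telegraf will not be able to collect anything either.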


Visualize Apache Cassandra Metrics

For details on the jolokia2 input, see https://github.com/influxdata/telegraf/tree/master/plugins/inputs/jolokia2. It can collect many more Cassandra performance metrics than the ones configured here.


Grafana reads these metrics from InfluxDB. We can start with panels for the common Cassandra cluster metrics, create our own dashboards based on them, and add more metrics as our requirements grow.
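
As a starting point for a panel, the sketch below builds an InfluxQL query that could be pasted into the Grafana query editor. It assumes Telegraf's naming from the configuration in this post: the measurement is name_prefix plus name (cassandra_Keyspace), and the field is field_prefix plus the JMX attribute (for example ReadLatency_99thPercentile). The exact field names depend on the Cassandra version, so it is worth checking them with SHOW FIELD KEYS first. $timeFilter and $__interval are Grafana macros; build_influxql is a hypothetical helper.

```python
def build_influxql(measurement, field, group_tags, cluster=None):
    """Build an InfluxQL query string for a Grafana time-series panel."""
    conditions = []
    if cluster is not None:
        # cluster_name is the global tag set in [global_tags].
        conditions.append("\"cluster_name\" = '" + cluster + "'")
    conditions.append("$timeFilter")  # Grafana fills in the dashboard time range
    where_clause = " AND ".join(conditions)

    # One series per combination of the given tags, e.g. per host and keyspace.
    group_items = ["time($__interval)"] + ['"' + tag + '"' for tag in group_tags]
    group_clause = ", ".join(group_items)

    return (
        'SELECT mean("' + field + '") FROM "' + measurement + '" '
        "WHERE " + where_clause + " "
        "GROUP BY " + group_clause + " fill(null)"
    )


query = build_influxql(
    "cassandra_Keyspace",
    "ReadLatency_99thPercentile",
    ["host", "keyspace"],
    cluster="prod",  # made-up cluster name
)
print(query)
```

Grouping by host keeps one line per node on the graph, which makes a slow node stand out from the rest.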







The full list of metrics for a Cassandra cluster is in the official documentation:

https://cassandra.apache.org/doc/latest/operating/metrics.html


These metrics are collected per node, so during an incident we can quickly see which node has a problem. Monitoring is critical for running Cassandra.
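
To make the per-node comparison concrete, here is a small sketch that flags nodes whose latency is far above the rest of the cluster. It is only one possible rule (more than a chosen factor times the cluster median), not something Grafana or Cassandra does for us; the host names, values, and the factor of 2 are made up for the example.

```python
from statistics import median


def find_outlier_nodes(latency_by_host, factor=2.0):
    """Return hosts whose value exceeds factor times the cluster median, sorted by name."""
    if not latency_by_host:
        return []
    baseline = median(latency_by_host.values())
    return sorted(
        host for host, value in latency_by_host.items() if value > factor * baseline
    )


# Hypothetical 99th percentile read latencies per node, in microseconds.
sample = {
    "cass-01": 1200.0,
    "cass-02": 1100.0,
    "cass-03": 5400.0,
    "cass-04": 1300.0,
}
print(find_outlier_nodes(sample))
```

Comparing against the median rather than the mean keeps one very slow node from hiding itself by dragging the baseline up.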


