Aug 31

2017 Q3 Apache Cassandra User Survey

To get a better picture of how Apache Cassandra is used, we surveyed 70 expert users who run it for production workloads. Together, their workloads represent more than 200 clusters holding a combined 600TB of data.

Our summary of the findings includes these major highlights:

  • Performance, Tuning, and Administration accounted for 70% of total identified core Apache Cassandra challenges.
  • More than 30% of respondents don’t run repair on their Cassandra clusters.
  • Nearly 25% of respondents run more than five Cassandra clusters.
  • More than 12% of respondents run more than 100 nodes in their largest cluster.
  • Amazon Web Services is the most popular infrastructure with more than 60% of respondents using it.
  • Among monitoring solutions and tools, the community has not converged on a clear leader. However, DataStax OpsCenter is the most popular monitoring solution for Cassandra with more than 20% of respondents using it.
  • Cassandra Version 2.x (including DSE 4) is the most popular with more than 50% of deployments utilizing this version.

More than 50% of deployments are still on version 2.x (including DSE 4)**.

More than 70% of respondents develop their application with Java. Python, Node.js, and Scala follow behind**.

Linux packages and Docker are the most popular deployment methods for Apache Cassandra, with 80% of respondents using one or both in production**.

Amazon Web Services is the most popular infrastructure with more than 60% of respondents using it. The on-premises install base is more than 3x the size of Google Cloud Platform and Microsoft Azure combined**.

Open source Apache Cassandra deployments are 30% more popular than DataStax deployments, with 60% of respondents running open source Apache Cassandra**.

Among monitoring solutions and tools, the community has not converged on a clear leader.

DataStax OpsCenter is the most popular monitoring solution for Cassandra, with more than 20% of respondents using it, followed by Splunk, AWS CloudWatch, and Nagios**.

Multi-tenant deployments are used by nearly half of respondents.

While nearly half of respondents have some form of multi-tenancy, most run an average of 5 different clusters, which suggests that users have different understandings of what multi-tenancy means.

58% of respondents run Apache Cassandra across 2 data centers or more.

Nearly 25% of respondents run more than 5 Apache Cassandra clusters.

More than 50% of respondents run between 1 and 10 nodes in their largest Apache Cassandra cluster, and more than 12% run more than 100 nodes.

More than 25% of respondents operate 10TB or more of data in Apache Cassandra.

SSD is the most popular storage type for Apache Cassandra with more than 50% of respondents using it.

Performance, Tuning, and Administration accounted for 70% of total identified core Apache Cassandra challenges**.

Nearly 50% of users reported using one of the time-series-friendly compaction strategies: TimeWindowCompactionStrategy (TWCS) or DateTieredCompactionStrategy (DTCS).

More than 60% of respondents use either Key Cache or Row Cache.

G1GC is the most popular JVM garbage collector, used by 65% of respondents.

Nodetool and JMX are the most popular tools for tracing latency problems, used by more than 50% of respondents.

More than 11% use Materialized Views.

More than 30% of respondents don’t run repair on Apache Cassandra.

The goal of this survey is to provide the Cassandra community with a benchmark of statistically significant data to help make optimal database decisions. Questions or comments? Feel free to discuss on Twitter via @scylladb.

Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.

** Question allowed multiple answers.

Scylla Team

About Scylla Team

Scylla is the world’s fastest column-store database: the functionality of Apache Cassandra with the speed of a light key/value store.
