To get a better picture of how Apache Cassandra is used, we surveyed 70 expert users who run Apache Cassandra for production workloads. Their workloads represent more than 200 clusters holding a combined 600TB of data.
Major highlights from our findings include:
- Performance, Tuning, and Administration accounted for 70% of total identified core Apache Cassandra challenges.
- More than 30% of respondents don’t run repair on their Cassandra clusters.
- Nearly 25% of respondents run more than five Cassandra clusters.
- More than 12% of respondents run more than 100 nodes in their largest cluster.
- Amazon Web Services is the most popular infrastructure with more than 60% of respondents using it.
- Among monitoring solutions and tools, the community has not converged on a clear leader. However, DataStax OpsCenter is the most popular monitoring solution for Cassandra with more than 20% of respondents using it.
- Cassandra version 2.x (including DSE 4) is the most popular version, used by more than 50% of deployments.
More than 50% of deployments are still on version 2.x (including DSE 4)**.
More than 70% of respondents develop their applications in Java, followed by Python, Node.js, and Scala**.
Linux packages and Docker are the most popular deployment methods for Apache Cassandra, with 80% of respondents using either one or both in production**.
Amazon Web Services is the most popular infrastructure, with more than 60% of respondents using it. The on-premises install base is more than 3x the size of the Google Cloud Platform and Microsoft Azure install bases combined**.
Open-source deployments are 30% more popular than DataStax deployments, with 60% of respondents running open-source Apache Cassandra**.
Among monitoring solutions and tools, the community has not converged on a clear leader.
DataStax OpsCenter is the most popular monitoring solution for Cassandra, used by more than 20% of respondents, followed by Splunk, AWS CloudWatch, and Nagios**.
Multi-tenant deployments are used by nearly half of respondents.
While nearly half of users have some form of multi-tenancy, most also run an average of 5 separate clusters, which suggests that users have differing ideas of what multi-tenancy means.
58% of respondents run Apache Cassandra across two or more data centers.
Nearly 25% of respondents run more than 5 Apache Cassandra clusters.
More than 50% of respondents run between 1 and 10 nodes in their largest Apache Cassandra cluster, and more than 12% run more than 100 nodes.
More than 25% of respondents operate 10TB or more of data in Apache Cassandra.
SSD is the most popular storage type for Apache Cassandra with more than 50% of respondents using it.
Performance, Tuning, and Administration accounted for 70% of total identified core Apache Cassandra challenges**.
Nearly 50% of users reported using a time-series-friendly compaction strategy (TWCS and DTCS combined).
More than 60% of respondents use either Key Cache or Row Cache.
G1GC is the most popular JVM garbage collection algorithm, used by 65% of respondents.
Nodetool and JMX are the most popular tools for tracing latency problems, with more than 50% of respondents using them.
More than 11% of respondents use Materialized Views.
More than 30% of respondents don’t run repair on Apache Cassandra.
The goal of this survey is to provide the Cassandra community with a benchmark of statistically significant data for making optimal database decisions. Questions or comments? Join the discussion on Twitter at @scylladb.
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.
**Respondents could select multiple answers for this question.