ScyllaDB vs DataStax™:
6 Important Differences

ScyllaDB Advantages and a Comparison of Features

ScyllaDB vs DataStax™ Overview

Many companies in media, IoT, financial services, retail and other industries have migrated from DataStax Enterprise™ (DSE) to ScyllaDB over the last few years. In doing so, these companies have realized faster, more consistent NoSQL performance for their mission-critical applications, saved millions in infrastructure and licensing costs, and freed up countless hours previously spent tuning their systems in attempts to get desired levels of performance. In this ScyllaDB vs DataStax breakdown, we explore ScyllaDB’s advantages over DSE, followed by a comprehensive DataStax feature comparison.

For starters, ScyllaDB is much less expensive than DSE. ScyllaDB was designed from the ground up to deliver the best possible price-performance. Its low-level design squeezes every cycle from your CPU — from analyzing the compiled C++ assembler output to using the best async kernel interfaces for system calls. ScyllaDB even caches paged query pointers. It has its own memory allocator and its own schedulers for CPU and I/O. And ScyllaDB is designed to run at 100% CPU utilization, with every operation assigned to a priority class. There is no need to over-provision.
Hear about how FireEye found ScyllaDB to be the best option as a back-end to their massive graph database.
ScyllaDB delivers consistent, reliable real-time performance. That’s because its built-in schedulers prioritize reads and writes over maintenance tasks, such as repairs and compactions — eliminating latency spikes. With ScyllaDB, you never experience the garbage collection stalls that impact DSE’s performance. ScyllaDB also adopts a comprehensive ‘shared-nothing’ design: no locks are taken, so requests never stall on contention.
Comcast was able to reduce P99 response times by 95% after migrating to ScyllaDB.
Speed doesn’t have to come at the price of complexity. ScyllaDB simplifies everything for its users. It automatically configures the RAID device for you with the right striping, and automatically assigns the NIC’s network queues to shards. ScyllaDB installs daemons in an isolated Linux control group to cap their memory/CPU usage. ScyllaDB’s setup tool runs a disk benchmark to maximize throughput while keeping latency low. With ScyllaDB, DevOps can leave tuning to the database itself.
GE Predix was able to greatly reduce the administrative burden to meet their SLAs after switching to ScyllaDB.
ScyllaDB has a notable maintainability advantage as well. It scales up to any number of cores and can stream data to a 60TB meganode just as fast as it does to smaller nodes. These capabilities enable you to shrink your DSE cluster by a factor of 10. Rolling restarts, for example, become 10x faster. ScyllaDB add-node and decommission operations are restartable; you can pause them and resume from where they left off. Compaction, a maintenance headache with DSE, is a solved problem in ScyllaDB.
Fanatics was able to replace 43 nodes of Cassandra with just 3 nodes of ScyllaDB.
You can simply do more with ScyllaDB than you can with DSE. ScyllaDB supports global and local secondary indexes simultaneously. You can use real, scalable indexes with your data model. Our Workload Prioritization enables you to assign relative priority to different user workloads in a simple role-based fashion. That way you can safely run operational workloads alongside analytics workloads. You can give SLA priority to your production queries and run your dev queries at the lowest priority, all while consolidating your data infrastructure and making it easier to manage. ScyllaDB supports change data capture (CDC) as a CQL table, so you can easily track your database changes consistently with the same query language you already know.
Grab found it very easy to use ScyllaDB for their real-time threat detection system.
Teams can easily migrate applications that use Cassandra or DSE to ScyllaDB. ScyllaDB and DSE are identical where it counts: the CQL protocol and queries, nodetool, SSTables and compaction strategies — even JMX support. To enable even more consolidation, ScyllaDB also supports a DynamoDB-compatible API, so you can migrate even more use cases. ScyllaDB supports many of the same open source projects as DSE, including JanusGraph, Spark, Kafka (using our optimized ScyllaDB connector), Presto, KairosDB, Kong and many others. ScyllaDB chooses the best open source projects – Prometheus and Grafana for metrics, Wireshark for packet analysis, systemd for Linux daemons and a Kubernetes operator for provisioning.
The team at SAS was shocked that they didn’t need to change any code for their application.

ScyllaDB vs DataStax Enterprise: Feature Comparison


Performance

Shard-per-Core

ScyllaDB’s innovative shard-per-core design divides a server’s resources into shared-nothing units of CPU-core, RAM, persistent storage and network I/O. ScyllaDB runs at near maximum utilization on all available cores of multi-CPU hardware architectures. ScyllaDB’s end-to-end sharded architecture, along with shard-aware drivers, means that each client writing or requesting data can send queries directly to the CPU core responsible for that shard of data. This eliminates hot shards and removes extra hops.

Three years after we first introduced this feature, the DSE 6.0 closed-source release followed our sharding design. However, DSE sharding is basic, operating only at the thread level (“thread-per-core”) and lacking the sophisticated schedulers and back pressure mechanisms found in ScyllaDB.

 

Everything Asynchronous

At the heart of ScyllaDB lies its core engine, the Seastar framework, a standalone library developed by ScyllaDB. Seastar provides a specialized scheduler and a fully asynchronous programming paradigm of futures and promises, along with advanced constructs like coroutines, and can run a million lambda functions per core per second. Seastar is responsible for scheduling, networking, direct memory access, shard-per-core memory allocators and so forth. Seastar is written in C++20 and takes advantage of every innovative trick and paradigm. Find out more about ScyllaDB’s “everything-async” architecture here.

Close-to-the-Metal Design

ScyllaDB uses C++20 and the best compiler techniques, as well as deep knowledge of the Linux kernel and per-core hyperthreading, to maximize CPU utilization. ScyllaDB also automatically configures your network card interrupts to balance interrupt request (IRQ) processing across CPU cores. ScyllaDB explicitly chooses to read ahead data from the drive when it expects a follow-on disk access, rather than blindly relying on the disk as Cassandra and DSE do. ScyllaDB controls all aspects of CPU execution and uses idle CPU time to optimize its memory layout.

Shard-aware Drivers

Like DSE, ScyllaDB’s client driver is topology aware and will prefer a node that owns the key range under query. ScyllaDB takes this design one step further and enables the client to reach the specific CPU core within the replica that owns the data. This design improves load balancing among the servers and provides superior performance by avoiding concurrency overhead. Shard-aware ScyllaDB drivers are available for Java, Go, and Python. However, ScyllaDB also fully supports standard Cassandra/DSE drivers for drop-in replacement compatibility.
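
To make this concrete, here is a minimal connection sketch using the shard-aware Python driver (the scylla-driver package, imported as cassandra). The host names, keyspace and table below are placeholders for illustration only, not from this article.

```python
from cassandra.cluster import Cluster
from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

# Token awareness lets the driver pick the replica that owns a key;
# the ScyllaDB fork of the driver additionally routes each request to
# the CPU core (shard) that owns it, with no application changes.
cluster = Cluster(
    contact_points=["scylla-node1", "scylla-node2"],
    load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy()),
)
session = cluster.connect("my_keyspace")

# Prepared statements carry routing information, so writes and reads
# can go straight to the right node -- and, with scylla-driver, shard.
insert = session.prepare("INSERT INTO users (user_id, name) VALUES (?, ?)")
session.execute(insert, (42, "Ada"))
```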

Lightweight Transactions

ScyllaDB’s lightweight transactions (LWT) use a Paxos algorithm similar to Cassandra and DSE, but they involve one fewer round trip, making them more efficient and faster, with lower latency. While DSE issues a separate read query to fetch the old record, ScyllaDB piggybacks the read result on the response to the prepare round. ScyllaDB’s LWT has a special commitlog mode that automatically balances between the transaction durability flush requirement and fast, non-transactional operations.
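
For example, a conditional insert with the Python driver looks roughly like the sketch below; the accounts table and connection details are hypothetical.

```python
from cassandra.cluster import Cluster

session = Cluster(["scylla-node1"]).connect("my_keyspace")

# IF NOT EXISTS turns the insert into a Paxos-backed compare-and-set.
result = session.execute(
    "INSERT INTO accounts (account_id, owner) VALUES (%s, %s) IF NOT EXISTS",
    (1001, "alice"),
)

# was_applied reflects the [applied] column returned by the LWT.
if result.was_applied:
    print("account created")
else:
    print("account already exists")
```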

Bypass Cache

The per-query BYPASS CACHE clause on SELECT statements lets range scans that typically process large amounts of data skip reading the cache and avoid populating it, rather than flooding it with data that won’t be reused. Bypassing the cache for such scans keeps your real working set in cache, enabling you to squeeze more performance from your cluster and minimizing RAM overhead, so real-time queries receive the best latency.
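
As a sketch (the events table and host are placeholders), a large analytical scan can opt out of the cache like this:

```python
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

session = Cluster(["scylla-node1"]).connect("my_keyspace")

# BYPASS CACHE tells ScyllaDB not to read through or populate the row
# cache for this statement; paging keeps client memory use bounded.
scan = SimpleStatement("SELECT * FROM events BYPASS CACHE", fetch_size=1000)
for row in session.execute(scan):
    pass  # hand each row to the analytics/batch job
```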

NUMA Optimized

ScyllaDB is designed to be Non-Uniform Memory Access (NUMA)-friendly, with highly optimized memory management down to the application binaries. Each shard of data is assigned its own chunk of RAM and CPU core, bound within the same socket. This is referred to as “NUMA-local” processing. In a non-NUMA-friendly deployment, memory access can be twice as expensive.

Row-Level Repair

ScyllaDB implements a different repair checksum algorithm that resembles rsync and computes checksums at row granularity instead of partition granularity. The algorithm reconciles differences faster and more efficiently, sends less data over the wire (reducing network traffic), and is less sensitive to large partitions.

DSE instead uses a Merkle tree mechanism for anti-entropy repairs. It has no row-level repair mechanism, and thus repairs take longer and are far less efficient, generating many times more network traffic and degrading overall performance.

Compaction is a Solved Problem

ScyllaDB’s I/O scheduler prioritizes the read/write operation class over compactions. When reads and writes spike, ScyllaDB automatically queues compaction activity so that it does not impact your throughput or latency. ScyllaDB runs compaction at full speed only when there is idle CPU/disk time, with all cores running compactions in parallel.

With DSE, which uses system resources much more heavily, you have to worry about and track compactions. Unlike DSE, ScyllaDB requires no brittle tuning.

Workload Prioritization

ScyllaDB allows for operational and analytics workloads to run against a shared cluster — a unique feature we call Workload Prioritization. ScyllaDB’s built-in scheduler prioritizes transactions and tasks based on ‘shares’ of system resources assigned per-user, balancing requests to maintain desired service level agreements (SLAs) for each service. This enables you to run a single cluster that is scaled to support both types of operations, simplifying your architecture and saving on hardware provisioning.
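
A rough sketch of what this looks like in practice is below. Service levels are a ScyllaDB Enterprise feature; the share values, role names and exact CQL syntax shown here are illustrative, so check the documentation for your version.

```python
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster

session = Cluster(
    ["scylla-node1"],
    auth_provider=PlainTextAuthProvider("scylla_admin", "password"),
).connect()

# Give the latency-sensitive OLTP role most of the shares and the
# analytics role the remainder; the scheduler balances accordingly.
session.execute("CREATE SERVICE_LEVEL IF NOT EXISTS oltp WITH shares = 800")
session.execute("CREATE SERVICE_LEVEL IF NOT EXISTS analytics WITH shares = 200")
session.execute("ATTACH SERVICE_LEVEL oltp TO oltp_role")
session.execute("ATTACH SERVICE_LEVEL analytics TO spark_role")
```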

With DSE, you have no such control over system utilization; a full scan query, for example, could cause timeouts that block ongoing write transactions.

No JVM

ScyllaDB is written in natively compiled C++, and thus does not use a Java Virtual Machine (JVM). There is no need to optimize JVM flags, and you can forget about Garbage Collection (GC) stalls, tuning and surprises. With ScyllaDB, there is no need to compute heap sizes, no need to divide the RAM between the JVM, the off-heap structures and the page cache. DSE suffers the worst of all possible worlds: it has to manage memory (pools, off-heap), it demands ongoing tuning, and it still suffers slowdowns due to the JVM.

Self-Tuning

Forget about tuning your database! Upon installation, ScyllaDB runs a benchmark to measure your disks and makes all of the Linux configurations on your behalf — from RAID setup to clock drift correction and fstrim scheduling. Using control theory, ScyllaDB continuously makes the database less fragile by dynamically tuning the way resources are used, instead of requiring an operator to adjust an overwhelming number of configurations on the fly.

With DSE, you have to tune many aspects manually, including the JVM, bloom filters, row and key caches, and even memtable thresholds. Not to mention all the operating system and hardware settings.

Forget about Caching

ScyllaDB has a unified row-based cache that automatically tunes itself, allowing it to adapt to different data access patterns and workloads. Thanks to ScyllaDB’s inherently low-latency design, there’s no need for external caches, further simplifying the infrastructure.

DSE uses several separate competing caches (key cache, row cache, on/off heap, and Linux OS page cache) that require an operator to analyze and tune — a manual process that will never be able to keep up with users’ dynamic workloads.

Open Standards

ScyllaDB adopts open standards and allows you to use the best open source tools. ScyllaDB’s metrics are based on Prometheus for collection, Grafana dashboards for presentation and Grafana Loki for log aggregation. Prometheus is a Cloud Native Computing Foundation (CNCF) graduated project. ScyllaDB contributed to Wireshark to add support for CQL as well as for its internal RPC protocol, for better traceability. ScyllaDB strives to maintain Apache Cassandra compatibility.

DSE is a closed-source product, and even deviates from Apache Cassandra standards, such as using proprietary on-disk SSTable formats.

Local and Global Secondary Indexes 

ScyllaDB provides flexibility, supporting both local and global secondary indexes. With a global secondary index, the index itself is distributed across all nodes in the cluster, not kept locally on a single node. This is an efficient way to look up rows by a non-partition-key column: the node hosting the relevant index entry is found by hashing the indexed value, rather than resorting to an inefficient full table scan. A local secondary index, by contrast, is stored alongside its base partition, which makes it the efficient choice when a query also specifies the partition key. That’s why ScyllaDB supports both.
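
For illustration, here’s a minimal sketch of both index types on a hypothetical menus table (all names below are placeholders):

```python
from cassandra.cluster import Cluster

session = Cluster(["scylla-node1"]).connect("my_keyspace")

session.execute("""
    CREATE TABLE IF NOT EXISTS menus (
        location text,
        name text,
        price float,
        dish_type text,
        PRIMARY KEY (location, name))
""")

# Global index: distributed across the cluster by dish_type, so lookups
# by dish_type alone avoid a full table scan.
session.execute("CREATE INDEX IF NOT EXISTS ON menus (dish_type)")

# Local index: co-located with each `location` partition, the efficient
# choice when the partition key is part of the query.
session.execute("CREATE INDEX IF NOT EXISTS ON menus ((location), dish_type)")

rows = session.execute(
    "SELECT * FROM menus WHERE location = %s AND dish_type = %s",
    ("Warsaw", "starter"),
)
```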

DSE only supports local indexes, which aren’t scalable and aren’t designed for performing full table scans.

Agile, Consistent Open Source Community

ScyllaDB employs the best open source experts and has a legacy of consistent open source contributions. We are committed to open source and, unlike DataStax, our commitment hasn’t wavered. Seastar, ScyllaDB’s core engine, is used at the core of other open source projects, such as the Redpanda data streaming engine and the Ceph storage engine. The shard-aware GoCQLX, Python and Rust drivers were developed by the ScyllaDB team, with contributions from the open source community. ScyllaDB has also contributed back to the broader open source community: enhancements that make Linux XFS more asynchronous, kernel code for system call efficiency, and numerous other contributions.

Repair-based Node Operations

Starting with ScyllaDB 4.0, node operations such as streaming and decommission are based on repair algorithms under the hood. This enables you to pause a node operation and restart it from the same checkpoint position, saving a lot of time on administrative operations and keeping the impact on your cluster as low as possible.

DSE has no such feature, but offers NodeSync for continuous background repairs. However, DataStax admits, “It is not a race-free lock; there is a possibility of duplicated work.”

Change Data Capture (CDC)

ScyllaDB enables you to easily and consistently track and stream table changes. Change data is stored in ScyllaDB as a standard CDC log table that developers can query using standard CQL. The data is consistent across the replica set and can include the previous version of the changed data. ScyllaDB’s CDC surpasses DSE’s CDC in terms of ease of use and functionality: it supports capturing the pre-image and post-image as well as the delta (the specific changes) for a record, and query results are automatically de-duplicated.
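
As a hedged sketch (the orders table is hypothetical, and the exact CDC log schema varies by version), enabling CDC and reading the change log looks roughly like this:

```python
from cassandra.cluster import Cluster

session = Cluster(["scylla-node1"]).connect("my_keyspace")

session.execute("""
    CREATE TABLE IF NOT EXISTS orders (
        order_id uuid PRIMARY KEY,
        status text
    ) WITH cdc = {'enabled': 'true', 'preimage': 'true'}
""")

session.execute(
    "INSERT INTO orders (order_id, status) VALUES (uuid(), %s)", ("created",))

# Changes land in an ordinary CQL table next to the base table,
# named with the _scylla_cdc_log suffix.
for change in session.execute("SELECT * FROM orders_scylla_cdc_log LIMIT 10"):
    print(change)
```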

DSE, on the other hand, stores CDC in a special commitlog-like structure on each node. It cannot be read using CQL queries, and requires special applications to be written to aggregate and de-duplicate the data.

Better Elasticity

ScyllaDB enables you to scale up and out efficiently. Its ability to fully utilize system resources allows ScyllaDB to run on smaller clusters of larger nodes. ScyllaDB is adding the Raft consensus protocol to permit scaling out by multiple nodes at a time.

When the time comes to scale your DSE deployment, you will find that Cassandra/DSE can add only one node at a time. Plus, since DSE nodes will generally be smaller instances, each addition brings only a modest amount of storage and compute. As a result, it will take at least 10x longer to expand your cluster. Since slow expansion hinders your ability to react quickly to changes, you’re forced to overprovision up-front.

DynamoDB API

ScyllaDB supports Amazon DynamoDB-compatible operations, including a version of DynamoDB Streams, through what we call Project Alternator. This provides users more flexibility in data models and query APIs. ScyllaDB allows you to avoid vendor lock-in: you are free to choose among multiple database APIs and, at any point, change your physical deployment from on-premises to the cloud vendor of your choice.
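
A minimal sketch, using the standard AWS SDK for Python (boto3) pointed at a ScyllaDB node; the address, port and table definition are deployment-specific placeholders (Alternator commonly listens on port 8000, but check your configuration):

```python
import boto3

dynamodb = boto3.resource(
    "dynamodb",
    endpoint_url="http://scylla-node1:8000",  # point the SDK at ScyllaDB
    region_name="none",                       # required by the SDK, unused here
    aws_access_key_id="none",
    aws_secret_access_key="none",
)

table = dynamodb.create_table(
    TableName="user_sessions",
    KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",
)
table.wait_until_exists()

table.put_item(Item={"session_id": "abc123", "user": "alice"})
print(table.get_item(Key={"session_id": "abc123"})["Item"])
```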

DSE only provides a CQL-compatible API. It offers no DynamoDB compatibility.

Heat-Weighted Load Balancing

Heat-weighted load balancing makes rolling node upgrades and reboots efficient by allowing cold nodes to slowly ramp up to full request load as their caches are repopulated. This helps after restarts or any other time a cluster loses its cache.

DSE does not support heat-weighted load balancing.

Real Meganodes

ScyllaDB can linearly scale-up performance, even on the largest machines available today, such as the AWS i3en.24xlarge, which provides 60TB of local NVMe SSD storage. Custom-built on-premises equipment can scale even larger. On ScyllaDB, compaction and streaming on such a large instance takes exactly the same amount of time as on a small i3en.xlarge.

DSE, written in Java and limited by JVM performance, has issues with nodes larger than 2TB. Exceeding this size per node may result in delays in bootstrapping, repairs, compactions, and recovery.

Incremental Compaction Strategy

ScyllaDB’s Incremental Compaction Strategy (ICS) enhances Cassandra’s existing Size-Tiered Compaction Strategy (STCS) by dividing SSTables into increments. ICS greatly reduces the temporary space amplification that is typical of STCS, leaving more disk space available for storing user data and eliminating the typical requirement to keep 50% of your drive free.
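
Switching a table over is a one-line schema change. A hedged sketch follows (the events table is hypothetical, and ICS requires ScyllaDB Enterprise):

```python
from cassandra.cluster import Cluster

session = Cluster(["scylla-node1"]).connect("my_keyspace")

# Incremental Compaction Strategy splits SSTables into increments,
# avoiding STCS's large temporary space overhead during compaction.
session.execute("""
    ALTER TABLE events
    WITH compaction = {'class': 'IncrementalCompactionStrategy'}
""")
```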

DataStax has no comparable feature.

See How ScyllaDB Compares

See ScyllaDB vs Apache Cassandra, Amazon DynamoDB and Google BigTable