Scylla vs DataStax: Six Reasons to Switch

Six reasons to make the move to Scylla (and the features that make us the better choice over DataStax)

Optimal Price-Performance

Scylla balances cost and high performance. It is designed from the ground up with performance in mind, squeezing every possible cycle from the available hardware, from analyzing the assembly generated by the C++ compiler to using the best kernel async interfaces for system calls. Scylla even caches paged query pointers. It has its own memory allocator and its own schedulers for CPU and I/O. With every operation classified by priority, Scylla can run at 100% CPU utilization, which eliminates overprovisioning.

FireEye found Scylla to be the best option among DataStax competitors as a back-end to their massive graph database.

More Consistent Performance

Scylla delivers low-latency performance without spikes or surprises. With built-in schedulers, Scylla guarantees that customer-facing workloads are prioritized over internal maintenance tasks, such as repairs and compactions. Unlike the DataStax architecture, Scylla implements a comprehensive shared-nothing design that eliminates locks and garbage collection stalls.

Comcast reduced P99 response times by 95% by migrating to Scylla.

Reduced Complexity

High performance and global availability don’t have to come at the price of simplicity. Unlike DataStax, Scylla simplifies every aspect of database configuration and setup. Scylla automates setup and frees DevOps from agonizing over tuning parameters. It automatically configures RAID devices and independently assigns the NIC’s network queues to shards. Scylla installs daemons in an isolated Linux control group to cap their memory/CPU usage. Setup runs a disk benchmark to pinpoint settings that maximize throughput while minimizing latency. Scylla delivers a hands-free operational experience.

GE Predix reduced operational overhead to meet their SLAs after switching to Scylla.

Better Maintainability

Stability and ease of maintenance are often more important than performance/cost. Scylla has a notable maintainability advantage as a distributed database. Since Scylla scales up to any number of cores and can stream data to a 60TB(!) meganode (at the same speed it streams to smaller nodes), you can shrink your cluster’s node count by 10x. So, for example, rolling restarts become 10x faster. Scylla add-node and decommission operations are *restartable*: you can pause them and resume them from the previous point. Compaction is a solved problem in Scylla.

Fanatics reduced their infrastructure footprint from 43 nodes of Cassandra to 3 nodes of Scylla.

Richer Functionality

Do more with Scylla than you can with DataStax and Cassandra. Scylla supports global and local indexes, even at the same time. Finally, real, scalable indexes can be used with your model. Scylla supports workload prioritization, enabling you to assign different priorities to different user workloads in a simple role-based fashion. You can provide a superior SLA to your production queries and run your dev queries with the lowest priority. Scylla supports change data capture as a CQL table, so you can easily track your DB changes in a consistent way with the same query language you already know.

Grab found it very easy to use Scylla for their real-time threat detection system.

Frictionless Migration

Many Scylla users have made the switch from DataStax rapidly and with no downtime. Scylla eliminates vendor lock-in via API-level compatibility with Apache Cassandra. Teams can easily migrate applications and instantly enjoy the benefits of a fundamentally better technology. Scylla provides full support for the CQL protocol and queries, nodetool, SSTables and compaction strategies; even JMX is supported. Scylla also exposes a DynamoDB-compatible API, enabling consolidation across even more topologies and use cases.

Scylla plays well in the big data ecosystem, supporting open source projects such as JanusGraph, Spark, Kafka, Presto, KairosDB, Kong and many others. Scylla targets the most widely adopted open source projects, selecting Prometheus and Grafana for metrics, Wireshark for packet analysis, systemd for Linux daemons and a Kubernetes operator for provisioning.

The team at SAS was shocked that they were able to migrate to Scylla quickly, with no application downtime.

Scylla versus DataStax: Feature Comparison

Scylla’s innovative shard-per-core design divides a server’s resources into shared-nothing units of CPU core, RAM, direct storage and network queue. Scylla runs on the highest core counts across multiple CPU architectures, from x86 to Arm, IBM Power and even mainframe. Scylla has an end-to-end sharded architecture, so each server core sends RPCs to the matching CPU core on the remote replica machine. Additionally, Scylla’s shard-aware drivers make the client topology aware, so it reaches the CPU core (shard) that owns the data, which eliminates hot shards and removes extra hops.

DataStax’s closed-source 6.0 release followed our sharding design (3 years after we first introduced it). However, DSE sharding is basic: it operates only at the thread level and lacks the sophisticated schedulers and back-pressure mechanisms developed by Scylla.

At the heart of Scylla lies its core engine, Seastar. Seastar is a standalone, Apache-licensed library developed by ScyllaDB. Several storage companies and the Ceph open source storage engine make use of Seastar technology. Seastar has a specialized scheduler, is built on a fully async programming paradigm of futures and promises, and can run a million lambda functions per core per second. Seastar is responsible for the schedulers, the networking (it has a TCP stack in userspace, but usually the Linux kernel is good enough), DMA to the disks, sharded memory allocators and so forth. Seastar is written in C++20 and uses every innovative trick and paradigm.

Beyond Seastar, Scylla uses C++20 and the best compiler techniques to get the most out of the CPU. Scylla automatically configures your network card interrupts to balance IRQ processing across your cores. Scylla explicitly chooses to read ahead from the drive when it expects a follow-on disk access, instead of blindly relying on the disk as Cassandra does. Scylla controls all aspects of CPU execution and uses CPU idle time to run procedures that optimize memory layout.

Like Cassandra’s, the Scylla client driver is topology aware and will prefer a node that owns the key range under query. Scylla takes the design one step further and allows the client to reach the specific CPU core within the replica that owns the data. This improves load balancing among the servers and reduces latency. Shard-aware Scylla drivers are available for Java, Go and Python.
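
To make the idea of routing a request to the owning core concrete, here is a minimal Python sketch. It assumes a signed 64-bit partition token and splits the token range into equal slices, one per shard. The function name and the formula are illustrative assumptions and are not presented as the exact mapping Scylla's drivers use.

```python
TOKEN_BITS = 64  # assumed width of a partition token


def shard_for_token(token: int, shard_count: int) -> int:
    """Map a signed 64-bit token to a shard index in [0, shard_count).

    Hypothetical helper: a simplified mapping for illustration only.
    """
    if shard_count <= 0:
        raise ValueError("shard_count must be positive")
    if not -(1 << 63) <= token < (1 << 63):
        raise ValueError("token must fit in a signed 64-bit integer")
    # Shift the signed token into the unsigned range [0, 2**64).
    biased = token + (1 << 63)
    # Scale onto the shard range so each shard owns an equal-width slice.
    return (biased * shard_count) >> TOKEN_BITS


if __name__ == "__main__":
    for token in (-(1 << 63), 0, (1 << 63) - 1):
        print(token, "->", shard_for_token(token, 8))
```

A client that knows the shard count of a node can compute the target shard locally, before sending the request, so that no extra hop between cores is needed on the server.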

Scylla’s lightweight transactions are compatible with Cassandra’s but take one fewer round trip, and are therefore more efficient with better latency. Scylla’s LWT has a special commitlog mode that automatically balances between the transaction durability flush requirement and fast, non-transactional operations.

Per-query cache bypass allows range scan queries to skip the cache, so their results are not stored in it. Bypass cache hints let you squeeze more performance from your cluster and keep your working set in cache, so real-time queries receive the best latency.
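
As a rough illustration, the hint is written as a `BYPASS CACHE` clause at the end of a CQL `SELECT`. The small Python helper below only builds the statement text. The helper name and the `shop.orders` table are hypothetical, and the statement would still have to be executed through whichever driver you use.

```python
def with_bypass_cache(select_statement: str) -> str:
    """Return the SELECT statement with a trailing BYPASS CACHE clause."""
    stmt = select_statement.strip().rstrip(";").rstrip()
    if not stmt.upper().startswith("SELECT"):
        raise ValueError("BYPASS CACHE only applies to SELECT statements")
    # Leave the statement alone if the hint is already present.
    if stmt.upper().endswith("BYPASS CACHE"):
        return stmt + ";"
    return stmt + " BYPASS CACHE;"


# Hypothetical analytics scan over a made-up table.
scan = with_bypass_cache("SELECT id, total FROM shop.orders")
print(scan)
```

Keeping the hint on scan-style queries only means that latency-sensitive point reads continue to be served from the cache.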

Scylla is designed with highly optimized memory management down to the application binaries. Since each shard owns a chunk of RAM and a CPU core, Scylla binds the CPU to the RAM within the same socket and makes sure that all accesses are done within the same socket. A deployment that is not NUMA friendly makes memory access twice as expensive.

Scylla implements a different repair checksum algorithm that resembles rsync and runs the checksum at row granularity instead of partition granularity. The new algorithm reconciles replicas faster, sends less data over the wire and is less sensitive to large partitions.
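
The benefit of row granularity can be sketched in a few lines of Python. This is a simplified illustration and not Scylla's repair implementation: the replicas are plain dictionaries keyed by (partition, row), the digest is SHA-256 over the row's `repr`, and timestamp-based reconciliation is left out. All names are hypothetical.

```python
import hashlib


def row_digest(row: tuple) -> str:
    """Checksum of a single row (illustrative encoding via repr)."""
    return hashlib.sha256(repr(row).encode("utf-8")).hexdigest()


def rows_to_send(local: dict, remote_digests: dict) -> dict:
    """Return only the local rows the peer is missing or holds differently."""
    return {
        key: row
        for key, row in local.items()
        if remote_digests.get(key) != row_digest(row)
    }


# Two replicas of the same partition "p1"; one row differs, one is missing.
replica_a = {("p1", 1): ("alice", 10), ("p1", 2): ("bob", 20), ("p1", 3): ("carol", 30)}
replica_b = {("p1", 1): ("alice", 10), ("p1", 2): ("bob", 99)}

digests_b = {key: row_digest(row) for key, row in replica_b.items()}
print(rows_to_send(replica_a, digests_b))
```

With a partition-level checksum the whole of partition `p1` would have to be streamed. With row-level digests only the two rows that actually differ are sent.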

Stop worrying about and tracking compaction. Scylla’s I/O scheduler prioritizes compaction below the read/write operation class. When there’s a spike in queries, Scylla automatically queues compaction activity. When there is CPU/disk idle time, Scylla will run compaction at full speed. All cores run compaction in parallel. No tuning is required. Maximize your disk speed and improve your query latency.

Scylla allows OLTP and OLAP workloads to share a cluster. A built-in scheduler prioritizes transactions and tasks based on shares of system resources assigned per user, balancing requests to maintain the desired service level agreements (SLAs) for each service. This allows you to run a single cluster scaled to support both types of operations, simplifying your architecture and saving you on hardware provisioning.
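
Share-based scheduling can be illustrated with a small, self-contained Python sketch. It is a generic proportional-share scheduler, not Scylla's scheduler: it assumes both workloads always have requests waiting and that every request has the same cost. The workload names and share values are made up for the example.

```python
from fractions import Fraction


def dispatch_order(shares: dict, total: int) -> list:
    """Return the order in which `total` unit-cost requests are dispatched."""
    # Virtual time consumed so far by each workload class.
    vtime = {name: Fraction(0) for name in shares}
    order = []
    for _ in range(total):
        # Serve the class that has used the least virtual time (ties by name).
        name = min(vtime, key=lambda n: (vtime[n], n))
        order.append(name)
        # A class with more shares is charged less virtual time per request.
        vtime[name] += Fraction(1, shares[name])
    return order


order = dispatch_order({"oltp": 800, "olap": 200}, 1000)
print(order.count("oltp"), order.count("olap"))
```

Under contention the dispatch counts follow the share ratio, while a workload that is alone on the cluster would still be free to use all of the capacity.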

Stop optimizing flags: no more garbage collection (GC) tuning and surprises. JVMs are good for management applications but not for high-speed infrastructure. No need to compute the heap size, no need to divide the RAM between the JVM, the off-heap and the page cache. Cassandra suffers from the worst of all worlds: having to manage memory (pools, off-heap), ongoing tuning and slowdowns due to the JVM.

Using control theory, Scylla makes the database less fragile by dynamically tuning the way resources are used instead of requiring an operator to adjust an overwhelming number of configurations on the fly. Forget about tuning your database! Scylla runs a benchmark to measure your disk and will make all of the Linux configurations on your behalf, from RAID setup to clock drift and fstrim disk scheduling.
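
One of the simplest control-theory building blocks is a proportional controller, sketched below in Python. It is only an illustration of the general technique: the function name, the choice of compaction backlog as the input signal, and the share limits are all assumptions made for this example rather than Scylla's actual controller.

```python
def compaction_shares(backlog_bytes: int, disk_bytes: int,
                      min_shares: int = 50, max_shares: int = 1000) -> int:
    """Proportional controller: a larger backlog earns compaction more shares.

    All parameters and limits are hypothetical.
    """
    if disk_bytes <= 0:
        raise ValueError("disk_bytes must be positive")
    # Normalize the backlog to [0, 1] relative to the disk size.
    ratio = min(max(backlog_bytes / disk_bytes, 0.0), 1.0)
    # Output grows linearly with the error signal, clamped to the limits.
    return round(min_shares + ratio * (max_shares - min_shares))


if __name__ == "__main__":
    for backlog in (0, 50, 100):
        print(backlog, "->", compaction_shares(backlog, 100))
```

The point of such a feedback loop is that the setting follows the measured state of the system continuously, instead of being a fixed value that an operator has to revisit whenever the workload changes.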

Cassandra uses several separate caches (key cache, row cache, on/off heap, and Linux OS cache) that require an operator to analyze and correctly size, a manual process that will never be able to keep up with users’ dynamic workloads. Scylla eliminates competing caches with a unified cache system that automatically tunes itself. There is no need for external caches, either.

Scylla adopts open standards and allows you to use the best open source tools. Scylla’s metrics are based on Prometheus for collection and Grafana dashboards for presentation. Scylla contributed to Wireshark to add support for CQL and also for its internal RPC for better traceability. Scylla uses systemd and automatically configures Linux on your behalf.

Scylla allows tables to have global secondary indexes, not just indexes local to a node. In Cassandra, only local indexes are supported, and those aren’t scalable. With Scylla you can query your cluster in more ways and have a richer data model.

Scylla employs the best open source experts and has a legacy of consistent open source contributions. We are committed to open source and, unlike DataStax, our commitment doesn’t change. Seastar, Scylla’s core engine, is used by the Ceph storage engine and many others. The GoCQLX driver was developed by the Scylla Manager team. Scylla has enhanced the Linux XFS file system to make it more asynchronous. We contributed kernel code for system call efficiency and have made numerous other contributions.

Starting with Scylla 4.0, node operations such as streaming and decommission are based on repair algorithms under the hood. This allows you to pause or restart them and resume from the same position. It saves a lot of time just when you need it the most.

Scylla allows you to track and stream table changes in a consistent and easy manner. Change data is stored in Scylla as a table that developers can query like any other table. The data is consistent across the replica set and can provide the previous version of the data changed. It surpasses Cassandra’s CDC in terms of ease of use and functionality.

Scylla allows you to run on fewer, larger nodes. With DataStax, when the time comes to scale your deployment, it will take at least 10x longer to expand your cluster since Cassandra/DataStax can add only one node at a time. That’s too long to react to changes, so you’re forced to overprovision.

Scylla allows you more choices with more compatible APIs. You are free to choose among multiple DB APIs and at any point change your physical deployment or even your database vendor and protocol. Our DynamoDB API is now GA.

Heat-weighted load balancing smooths rolling node upgrades and reboots by allowing a cold node to ramp up slowly into serving requests as its cache is populated.
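
A minimal way to picture this is to split reads across replicas in proportion to how warm each cache is. The Python sketch below does exactly that and nothing more: the function name, the use of the cache hit rate as the heat signal, and the small floor for cold nodes are assumptions for illustration, not Scylla's actual algorithm.

```python
def read_weights(hit_rates: dict, floor: float = 0.05) -> dict:
    """Split read traffic across replicas in proportion to cache hit rate."""
    # The floor keeps a cold node receiving some reads so its cache can warm up.
    adjusted = {node: max(rate, floor) for node, rate in hit_rates.items()}
    total = sum(adjusted.values())
    return {node: value / total for node, value in adjusted.items()}


# node-c has just restarted, so its cache is empty.
weights = read_weights({"node-a": 0.90, "node-b": 0.90, "node-c": 0.0})
for node, weight in weights.items():
    print(node, round(weight, 3))
```

As the restarted node's hit rate climbs, recomputing the weights hands it a growing fraction of the reads until it carries the same load as its peers.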

Scylla can linearly scale up performance, even on the largest machines you throw at it, such as the AWS i3en.24xlarge with 60TB of storage. It takes the same amount of time to compact or stream as a small i3en.xlarge. Cassandra has issues with nodes larger than 2TB and for DataStax a “meganode” has up to 5TB. The JVM cannot scale!

Scylla’s incremental compaction strategy (ICS) enhances the existing STCS strategy by dividing SSTables into increments, thus eliminating the typical requirement of 50% free space on your drive. ICS reduces your total storage by 37%.

Scylla University

Get started on the path to Scylla expertise.

Live Test

Spin up a 3-node Scylla cluster to see our light-speed performance.

Virtual Workshop

Interactive sessions with our NoSQL solution architects.