Jepsen and ScyllaDB: Putting Consistency to the Test

By Kostja Osipov

December 23, 2020

At ScyllaDB, we take data consistency seriously. To ensure the correctness of our lightweight transaction implementation, we developed a series of tests using ScyllaDB’s existing and new tools, such as ScyllaDB Cluster Test, Gemini and lightest. These tools evaluate data correctness in the face of acute failure scenarios: I/O errors, running out of memory, node crashes and network partitions. We published some of this work on our blog (which has a great explanation of the difference between our normal eventually consistent operations and lightweight transactions, if you need a refresher).

We have also run Jepsen tests in the past. Back then we focused on regular operations, since LWT wasn’t supported at the time. Jepsen is an open source project and a service run by Kyle “Aphyr” Kingsbury. Established in 2013, it has by now become the industry standard for distributed systems testing, with many SQL and NoSQL vendors using it as a checkmark that their consistency claims are production grade.

Having run our own tests, we had confidence we were going to pass the Jepsen test with flying colors. The result turned out to be surprising.

Finding consistency violations in databases is an NP-complete problem: with general-purpose checkers, the cost can grow exponentially with the number of transactions. Sometimes it is outright impossible for an external observer. Multiple violations can interact and produce the right results while masking the underlying problems. ScyllaDB tests create scenarios with data constraints and check these constraints after running a workload with fault injections. Jepsen takes a more holistic approach. It observes not only whether or not there was a consistency fault, but also, with a high degree of certainty, the steps that brought it about. One tool that proved particularly invaluable was Elle, which finds anomalies in linear time.
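To make the constraint-checking approach concrete, here is a minimal sketch in Python. It is not taken from ScyllaDB Cluster Test, Gemini or lightest. The account model and the names `run_transfers` and `check_invariants` are hypothetical, and the "fault" is simulated by simply dropping an operation. The constraints checked after the workload are that the total balance is unchanged and that no balance is negative.

```python
import random

def run_transfers(balances, n_ops, fail_rate, rng):
    """Apply random transfers between accounts (hypothetical workload).

    An injected fault drops the whole transfer, so a correct system
    must never apply only the debit or only the credit.
    """
    for _ in range(n_ops):
        src, dst = rng.sample(sorted(balances), 2)
        amount = rng.randint(1, 10)
        if rng.random() < fail_rate:
            continue  # simulated fault: the transfer is not applied at all
        if balances[src] >= amount:
            balances[src] -= amount
            balances[dst] += amount
    return balances

def check_invariants(balances, expected_total):
    """Return a list of constraint violations found in the final state."""
    violations = []
    if sum(balances.values()) != expected_total:
        violations.append("total balance changed")
    for name, value in balances.items():
        if value < 0:
            violations.append(f"negative balance: {name}")
    return violations

if __name__ == "__main__":
    rng = random.Random(42)
    accounts = {"a": 100, "b": 100, "c": 100}
    run_transfers(accounts, n_ops=1000, fail_rate=0.2, rng=rng)
    print(check_invariants(accounts, expected_total=300))
```

A check like this inspects only the final state. A lost debit paired with a lost credit of the same amount would leave the total intact and go unnoticed, which is the masking effect described above and one reason a history-based checker such as Elle is valuable.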

But Jepsen testing, as a process, goes beyond just its powerful toolset: Kyle’s personal contribution to the testing is invaluable. Where else can you find such an opinionated, independent expert who is also the author of one of the most cited Cassandra evaluations? With Kyle, we had an objective set of eyes.

Kyle’s testing was so thorough that it ended up going beyond lightweight transactions. ScyllaDB documentation for the eventual consistency model had grey areas in the semantics of INSERTs (#7082), conflict resolution in the case of duplicate timestamps (#7170), safe practices for topology changes, and the correct use of nodetool. All of these topics now have better documentation.

The vast majority of ScyllaDB (and Cassandra) use cases are well served by eventual consistency. Regular operations enjoy better availability and better performance, so conflict-free replicated data types, last-write-wins and even lack of row isolation are the right compromises for these workloads. QUORUM writes ensure durability, while QUORUM reads guarantee the application never reads dirty data. Background repair processes are responsible for eventually weeding out conflicts. Lightweight transactions are still invaluable when it’s necessary to guarantee not just durability but linearizability: a specific (strict) order of changes, with each change conditioned on the state of the database.
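The difference between last-write-wins and a change conditioned on the current state can be shown with a toy, single-process model in Python. This is only an illustration of the two semantics, not ScyllaDB code. The `Cell` class and both write functions are hypothetical, timestamp ties are not modeled, and the atomicity that Paxos provides for the conditional write is simply assumed.

```python
class Cell:
    """A stored value together with the timestamp of its last accepted write."""
    def __init__(self, value, ts):
        self.value = value
        self.ts = ts

def lww_write(cell, value, ts):
    # Last-write-wins: the write with the higher timestamp replaces the value.
    if ts > cell.ts:
        cell.value, cell.ts = value, ts

def conditional_write(cell, expected, value, ts):
    # Conditional update: apply only if the current value matches `expected`.
    # In a real cluster the compare and the set must happen atomically.
    if cell.value != expected:
        return False
    cell.value, cell.ts = value, ts
    return True

def demo():
    # Two clients both read 0 and both try to store "previous value + 1".
    lww = Cell(0, ts=0)
    lww_write(lww, 1, ts=1)
    lww_write(lww, 1, ts=2)          # silently overwrites: one increment is lost

    cas = Cell(0, ts=0)
    conditional_write(cas, expected=0, value=1, ts=1)
    applied = conditional_write(cas, expected=0, value=1, ts=2)
    if not applied:                  # the second client sees the conflict and retries
        conditional_write(cas, expected=1, value=2, ts=3)
    return lww.value, cas.value

if __name__ == "__main__":
    print(demo())  # (1, 2)
```

Under last-write-wins both writes succeed and one increment disappears. With the conditional write, the second client learns that its read was stale and can retry against the current value.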

Some nasty LWT consistency bugs were found, too. In one instance (#7116), a typo in the data checksum loop introduced in ScyllaDB 2.2 led to slight differences in data staying invisible to repair. LWT, as well as standard repair and streaming, depend greatly on the correctness of this code. Changing even one letter in the column name would make the bug disappear, meaning Kyle’s special talent for breaking systems had to play a role. Indeed, our own tests used null values in just the same way, but did not observe any errors just because the column holding nulls had a different name! This issue is fixed in ScyllaDB 4.2 and the fix is now backported to all ScyllaDB Enterprise versions.

Another ancient bug (#7611) was revealed in the list append and prepend code, which historically generated its own timestamps for appended and prepended values, ignoring Paxos timestamps. Using the Paxos timestamps is essential to preserve linearizability of list operations. And while the code was and still is working this way in Cassandra, that’s not good enough for us. The ScyllaDB fix will appear in an upcoming release.
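A toy model in Python shows why the choice of timestamp matters here. It assumes, for illustration only, that each list element is stored under a timestamp-derived key and that reads return elements sorted by that key. The function names and the numeric timestamps are hypothetical and do not reflect ScyllaDB internals.

```python
def append(cells, element, key_ts):
    """Store a list element under a timestamp-derived key."""
    cells[key_ts] = element

def read_list(cells):
    """Read the list back: elements come out ordered by their keys."""
    return [cells[key] for key in sorted(cells)]

def replay(appends, use_paxos_ts):
    """Replay appends in the order Paxos decided them.

    Each append is (element, paxos_ts, local_ts), where local_ts stands
    for a timestamp the coordinator generated from its own clock.
    """
    cells = {}
    for element, paxos_ts, local_ts in appends:
        append(cells, element, paxos_ts if use_paxos_ts else local_ts)
    return read_list(cells)

if __name__ == "__main__":
    # Paxos decided "a" first, then "b", but the coordinator that handled
    # "a" has a clock running ahead of the one that handled "b".
    appends = [("a", 1, 200), ("b", 2, 100)]
    print(replay(appends, use_paxos_ts=True))   # ['a', 'b']
    print(replay(appends, use_paxos_ts=False))  # ['b', 'a']
```

With Paxos timestamps the stored order matches the order in which the appends were decided. With independently generated timestamps, a skewed clock is enough to make the list read back in an order that no linearizable execution could have produced.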

True to the eventual consistency philosophy, ScyllaDB does not run strong coordination during topology changes. If you know what you’re doing, you can repair a cluster split across multiple availability zones, adding and removing nodes at will. This also allows you to shoot yourself in the foot.

The problem that manifested itself in the test was that Jepsen could start a new topology change after the previous one had failed (and by failure I also mean an operation timeout). Network partitioning and/or node failures, accompanied by a still ongoing node removal and a fresh node addition, confused Paxos’s idea of which replicas constitute a quorum.
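The underlying hazard can be illustrated with a small Python sketch. It is a general property of majority quorums, not a model of how ScyllaDB tracks replicas: two majorities drawn from the same replica set always share a node, but majorities drawn from two different views of the replica set may not. The node names and the `always_intersect` helper are hypothetical.

```python
from itertools import combinations

def majorities(replicas):
    """All minimal majorities (size n // 2 + 1) of a replica set."""
    need = len(replicas) // 2 + 1
    return [set(combo) for combo in combinations(sorted(replicas), need)]

def always_intersect(view_a, view_b):
    """True if every majority of view_a shares a node with every majority of view_b."""
    return all(qa & qb for qa in majorities(view_a) for qb in majorities(view_b))

if __name__ == "__main__":
    before = {"n1", "n2", "n3"}
    # A hypothetical second view: n1 is being removed while n4 is being added.
    after = {"n2", "n3", "n4"}
    print(always_intersect(before, before))  # True
    print(always_intersect(before, after))   # False
```

In the second case, `{n1, n2}` is a majority of the old view and `{n3, n4}` is a majority of the new one, yet they have no node in common. Two coordinators working from those views could each believe they reached a quorum without ever seeing each other’s writes.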

Despite our objections that this is the wrong way to operate ScyllaDB, Kyle was unmoved. There are circumstances in which even a human operator could not know whether the previous operation had ended. Besides, the rules were never explicit in the docs.

ScyllaDB guarantees that LWT operations will be strictly serializable during a single membership change. We refreshed the standard operational procedures with a new documentation page. If multiple changes are concurrent, the cluster is vulnerable to operator errors – a problem that is, fortunately, less important if one uses ScyllaDB Cloud (NoSQL DBaaS).

Jepsen testing spurred us to redouble our efforts to switch topology changes from eventual to strong consistency (#1247, #1207). With topology changes based on Raft, ScyllaDB will be able to add and remove multiple nodes at a time, as well as detect and reject unsafe topology transitions requested by a DBA. I’ll talk more about it at our upcoming ScyllaDB Summit.

Our confidence in LWT quality had some basis: the testing found no issues in the trickiest part of the lightweight transaction implementation, the Paxos algorithm. With Jepsen we learned yet again that database quality has to be all-round, and that maturity is achieved through testing across many product features and their interactions, rather than focusing on a single piece.

To conclude, let us restate ScyllaDB’s commitment to ultimate product quality – specifically, to fixing all of the consistency issues found by Jepsen testing, and further improving the product and documentation to avoid usability traps. As of today, all the found issues have been resolved, and ScyllaDB documentation clarified. We run Jepsen on a regular basis to ensure no regressions in upcoming releases, and plan to revisit and extend the coverage once strongly consistent topology changes are in.

Thanks, Kyle!

READ THE TEST RESULTS AT JEPSEN.IO

Learn More at ScyllaDB Summit

Both Kyle Kingsbury and Konstantin Osipov will be presenting at ScyllaDB Summit 2021, coming this January 12th – 14th, 2021. The event is free and online. Kyle will present his Jepsen test findings in depth, and Konstantin will explain our future plans for Raft. These are just two of many great sessions we have planned at our Summit. Make sure you register today!

About Kostja Osipov

Software Team Leader. Kostja is a well-known expert in the DBMS world, spending most of his career developing open-source DBMS including Tarantool and MySQL. At ScyllaDB his focus is transaction support and synchronous replication.
