
ScyllaDB Enterprise Release 2018.1.3


Today we released ScyllaDB Enterprise 2018.1.3, a production-ready ScyllaDB Enterprise minor release. ScyllaDB Enterprise 2018.1.3 is a bug fix release for the 2018.1 branch, the latest stable branch of our real-time enterprise NoSQL solution.

More about ScyllaDB Enterprise here.

Critical Patch

ScyllaDB Enterprise 2018.1.3 fixes a possible data loss issue when using the Leveled Compaction Strategy (LCS). The issue causes ScyllaDB to miss a small fraction of data during a full table scan. It was originally observed during decommission (which performs a full table scan internally), where some data (less than 1% in a test) was not streamed.

In addition to full-scan queries, scans are used internally by compaction and streaming, including decommissioning a node, adding a node, and repair. Our investigation concluded that data loss can occur while running any of these operations. The issue is limited to tables using LCS and does not affect tables using other compaction strategies.

If you are using LCS, you should upgrade to ScyllaDB Enterprise 2018.1.3 as soon as possible.
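To check whether any of your tables are affected, you can list the compaction class configured for each table. The sketch below assumes you have already fetched rows from the query `SELECT keyspace_name, table_name, compaction FROM system_schema.tables;` (for example with a CQL driver) and only filters them. The helper names are hypothetical and not part of any ScyllaDB tool.

```python
# Hypothetical helper: rows are assumed to come from the CQL query
#   SELECT keyspace_name, table_name, compaction FROM system_schema.tables;
# where `compaction` is a map of compaction options that includes a 'class' key.
from typing import Iterable, List, Mapping, Tuple

Row = Tuple[str, str, Mapping[str, str]]


def is_lcs(compaction: Mapping[str, str]) -> bool:
    """Return True if the compaction options name LeveledCompactionStrategy."""
    cls = compaction.get("class", "")
    # The class may be stored as a short name or fully qualified,
    # so compare only the last dotted component.
    return cls.rsplit(".", 1)[-1] == "LeveledCompactionStrategy"


def tables_using_lcs(rows: Iterable[Row]) -> List[str]:
    """Return 'keyspace.table' names for every table configured with LCS."""
    return [f"{ks}.{tbl}" for ks, tbl, compaction in rows if is_lcs(compaction)]


if __name__ == "__main__":
    sample = [
        ("app", "events", {"class": "org.apache.cassandra.db.compaction.LeveledCompactionStrategy"}),
        ("app", "users", {"class": "SizeTieredCompactionStrategy"}),
    ]
    print(tables_using_lcs(sample))  # ['app.events']
```

Any table this reports is a candidate for the upgrade and, if needed, the restore procedure described below.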

Action to Take

The problem may be mitigated by restoring backups of the relevant table. If you are using LCS and have relevant backups, please contact our support team for additional information on how to run the restore procedure.

How This Happened

We take data integrity very seriously and are investigating why this issue was not identified earlier. Our initial findings are that a low-level optimization around disjoint SSTable merging introduced the bug in the 2.1 release. It surfaced only in our 2.2 testing, since it happened very rarely with 2.1-based code. The ScyllaDB cluster test suite did detect the issue; however, two factors masked it. First, quorum persistence papered over the missing data. Second, one of the roles of the test suite is to run disruptors (corruption emulation, node and data center failures) against the cluster and to trigger corruptions and repairs, so the suite incorrectly concluded that the errors were part of its own disruptor activity. We are now working to improve the cluster test suite's ability to detect errors.

Please contact us with any questions or concerns. We will publish a full root cause analysis report as soon as possible and disclose enhancements to prevent such a case in the future.

Related Links

ScyllaDB Enterprise customers are encouraged to upgrade to ScyllaDB Enterprise 2018.1.3 in coordination with the ScyllaDB support team.

Additional Issues Solved in This Release (with Open Source Issue Reference When Applicable)

  • Ec2MultiRegionSnitch does not always honor the preference for the local DC, resulting in redundant requests to the remote DC #3454
  • When using TLS for interconnect connections, shutting down a node generates system_error errors (error system:32, Broken pipe) on other nodes #3461

Next Steps

  • Learn more about ScyllaDB from our product page.
  • See what our users are saying about ScyllaDB.
  • Download ScyllaDB. Check out our download page to run ScyllaDB on AWS, install it locally in a Virtual Machine, or run it in Docker.

About Tzach Livyatan

Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15-year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier-grade systems, signalling, policy and charging applications.