ScyllaDB Open Source Release 2.1.4

ScyllaDB Release

Today we released ScyllaDB Open Source 2.1.4, a bugfix release of the ScyllaDB 2.1 stable branch. Release 2.1.4, like all past and future 2.x.y releases, is backward compatible and supports rolling upgrades.

Critical Patch

ScyllaDB 2.1.4 fixes possible data loss when using the Leveled Compaction Strategy (LCS) #3513. The issue causes ScyllaDB to miss a small fraction of data in a full table scan. It was originally observed during decommission (which performs a full table scan internally), where some data (<1% in a test) was not streamed.

In addition to a full scan query, scans are used internally as part of compaction and streaming, including decommission, adding a node, and repairs. Our investigation into the matter concluded that ScyllaDB can cause data loss while running any of these actions.

The issue is limited to tables using LCS and does not affect tables using other compaction strategies. If you are using LCS, you should upgrade to ScyllaDB 2.1.4 ASAP. 
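To find out which tables are affected, one option is to list the tables whose compaction class is LCS. The following is a minimal sketch, not an official tool: it assumes the schema rows have already been fetched with your driver of choice, for example with SELECT keyspace_name, table_name, compaction FROM system_schema.tables, and that the compaction column is a map with a "class" key. The function name tables_using_lcs and the row shape are hypothetical, chosen for illustration only.

```python
from typing import Iterable, List, Mapping, Optional, Tuple

# Assumed row shape: (keyspace_name, table_name, compaction_options).
# This mirrors what a query against system_schema.tables is assumed to return.
Row = Tuple[str, str, Optional[Mapping[str, str]]]


def tables_using_lcs(rows: Iterable[Row]) -> List[str]:
    """Return sorted 'keyspace.table' names whose compaction class is LCS."""
    affected = []
    for keyspace, table, compaction in rows:
        # The class may be stored as a short name or fully qualified,
        # so compare only the last dot-separated segment.
        cls = (compaction or {}).get("class", "")
        if cls.rsplit(".", 1)[-1] == "LeveledCompactionStrategy":
            affected.append(f"{keyspace}.{table}")
    return sorted(affected)


if __name__ == "__main__":
    # Hypothetical sample rows standing in for real query results.
    sample = [
        ("shop", "orders", {"class": "LeveledCompactionStrategy"}),
        ("shop", "events", {"class": "SizeTieredCompactionStrategy"}),
    ]
    for name in tables_using_lcs(sample):
        print(name)
```

Any table this reports is a candidate for the upgrade and for the backup check described under Action to Take.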

Action to Take

The problem may be mitigated by restoring backups of the relevant table. If you are using LCS and have relevant backups, please contact our support team for additional information on how to run the restore procedure.

How This Happened

We take data integrity very seriously and are investigating why this issue was not identified earlier. Our initial findings are that a low-level optimization around disjoint SSTable merging introduced the bug in the 2.1 release. It surfaced only in our 2.2 testing because it happened very rarely with 2.1-based code. The ScyllaDB cluster test suite did detect the issue, but two things masked it: quorum was still being met, and the test suite itself is designed to cause disruption. One of the roles of this suite is to run disruptors (corruption emulation, node and data center failures) against the cluster and to trigger corruptions and repairs. The bug was not identified because the test suite incorrectly attributed it to its own disruptor activity. We are now working to improve the cluster test suite's ability to detect errors.

Please contact us with any questions or concerns. We will publish a full root cause analysis report as soon as possible and disclose enhancements to prevent such a case in the future.

Related Links

  • Get ScyllaDB 2.1.4 – Docker, binary packages. AMI will be published soon.
  • Get started with ScyllaDB 2.1
  • Upgrade from 2.1.x to 2.1.y
Please let us know if you encounter any problems.

Additional bugs fixed in this release

  • ScyllaDB AMI error: “systemd: Unknown lvalue ‘Ambient / Unknown lvalue ‘AmbientCapabilities’”. The issue is solved by moving to a new CentOS 7.4.1708 base image #3184
  • Upgrading to latest version of RHEL kernel causes ScyllaDB to lose access to the RAID 0 data directory #3437 (detailed notice has been sent to all relevant customers)
  • Incorrect commit log error handling may cause a core dump #3440
  • Closing a secure connection (TLS) may cause a core dump #3459
  • When using TLS for interconnect connections, shutting down a node generates errors on other nodes: “system_error (error system:32, Broken pipe)” #3461
  • Ec2MultiRegionSnitch does not (always) honor or prefer the local DC, which results in redundant requests to the remote DC #3454

About Tzach Livyatan

Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15-year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier-grade systems, signalling, policy and charging applications.