This is the first post in a series of four about the different compaction strategies available in Scylla. The series will look at the good and the bad properties of each compaction strategy, and how to choose the best compaction strategy for your workload. This first post will focus on Scylla’s default compaction strategy, size-tiered compaction strategy.
Before organizations go into production with Scylla, they must ensure that they are getting the best possible performance so their applications and services will run optimally. One of the many ways to optimize your Scylla deployment is to choose the right compaction strategy. One of the more popular talks at Scylla Summit 2017 was on this subject. Based on that talk, I will explain what compaction is and then explore the different strategies available in Scylla.
What’s the deal with prepared statements? A query itself is just a string of text. For example: INSERT INTO tb (key, val) VALUES ('key', 'value'). In this simple example, we insert two strings into a two-column table. Before that can happen, the CQL statement string (INSERT INTO…) needs to be sent to Scylla, parsed, and, assuming there are no errors in the query, executed. It’s the parsing part that we are concerned with here. Parsing a CQL query is a compute-intensive operation that consumes resources just like anything else you would have a computer do. What if we could do the parsing part […]
At ScyllaDB, our development team is all about performance with improved latency and throughput. Our speakers at our recent Scylla Summit provided many tips and tricks to make Scylla’s superior latency and performance even better. ScyllaDB’s VP of R&D, Shlomi Livne, added to the growing repertoire of these tips with his talk “Planning your queries for maximum performance.” In it, he outlined some of the how and why of Scylla performance, and concluded with seven rules to optimize your queries.
When data is written to Scylla, one or more replicas may become unresponsive or unreachable. The causes range from heavy load on a particular replica node to network congestion or hardware issues. As a result, the write to that replica will fail, usually with a timeout error. To restore the consistency of the data across all replicas, a user will have to run a repair, which is a very expensive and usually long procedure.
The Scylla Summit includes many technical sessions that aren’t about Scylla at all. Alex Gallego, a principal engineer in Akamai’s Platform Group, gave one such talk, SMF: The Fastest RPC in the West. First, a bit of background on Alex. He was the founder and CTO behind the Concord.io distributed stream processing engine. Much in the same way that Scylla addressed the Java-based performance issues in Cassandra, Concord.io chose to build in C++ to deliver a stream processor with better predictability, performance, isolation, multi-tenancy, supervision, and failure recovery. As Alex explains it, “During my time at Concord.io, we saw that […]
When most server application developers think of I/O, they consider network I/O since most resources these days are accessed over the network: databases, object storage, and other microservices. The developer of a database, however, also has to consider file I/O. This article describes the available choices and their tradeoffs and why Scylla chose asynchronous direct I/O (AIO/DIO) as its access method.
Scylla 2.0’s New Feature in-depth: Heat Weighted Load Balancing
With time, a Scylla cluster adapts to an application’s behavior. Given a steady read-mostly workload, after an initial warm-up period, all nodes will have their caches populated with a working set, and the workload will see a certain cache hit rate and enjoy a certain performance level (throughput and latency).
Originally published on The New Stack on July 28th, 2017. Recently AWS unleashed a managed cache solution, Amazon DynamoDB Accelerator (DAX), in front of its database. This blog post will discuss the pros and cons of external database caches.
A database like Scylla can be limited by the network, disk I/O, or the processor. Which of these is the bottleneck is often dynamic and depends on both the hardware configuration and the workload. The only way of dealing with that is to attempt to achieve good throughput and low latency regardless of what the bottleneck is. There are many things that can be done in each of these cases, ranging from high-level changes in the algorithms to very low-level tweaks. In this post, I am going to take a closer look at fairly recent changes to Scylla which improved the performance […]
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.