By default, Scylla SSTables will be compressed when they are written to disk. As mandated by the file format, data is compressed in chunks of a certain size – 4kB if not explicitly set. The size of the chunk is one of the parameters for the compression property to be set at table creation. Chunk-based compression presents trade-offs that users may not be aware of. In this post, I will try to explore what those trade-offs are and how to set them correctly for maximum benefit. As trade-offs imply different results for different loads, we will focus on single-partition read […]
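The trade-off behind chunk-based compression can be illustrated with a minimal sketch. It assumes `zlib` from the Python standard library as a stand-in compressor, and the helper names `compress_in_chunks` and `read_range` are hypothetical; this is not Scylla's SSTable code. It only shows that serving a small read means decompressing every chunk the read touches.

```python
import zlib

CHUNK_SIZE = 4 * 1024  # 4 kB, the default chunk size mentioned in the text


def compress_in_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Compress each fixed-size chunk of `data` independently (zlib is an assumption)."""
    return [
        zlib.compress(data[offset:offset + chunk_size])
        for offset in range(0, len(data), chunk_size)
    ]


def read_range(chunks: list, offset: int, length: int,
               chunk_size: int = CHUNK_SIZE) -> tuple:
    """Return (requested bytes, number of bytes that had to be decompressed)."""
    if length <= 0:
        return b"", 0
    first = offset // chunk_size
    last = (offset + length - 1) // chunk_size
    # Every chunk overlapping the requested range is decompressed in full.
    decompressed = b"".join(zlib.decompress(c) for c in chunks[first:last + 1])
    start = offset - first * chunk_size
    return decompressed[start:start + length], len(decompressed)


# Hypothetical 16 kB of data, read 100 bytes at a time.
data = bytes(range(256)) * 64
small_chunks = compress_in_chunks(data, 4 * 1024)
large_chunks = compress_in_chunks(data, 16 * 1024)

payload_small, work_small = read_range(small_chunks, 5000, 100, 4 * 1024)
payload_large, work_large = read_range(large_chunks, 5000, 100, 16 * 1024)
```

Both reads return the same 100 bytes, but the larger chunk size forces four times as much data through the decompressor. Larger chunks typically compress better, which is the other side of the trade-off.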
Amazon recently unveiled a new class of machines: the AWS i3 family. Targeted at I/O-intensive applications and featuring up to 15 TB of fast storage, these machines offer unprecedented power with a great balance between I/O and CPU. At a lower price than the previous i2 family, we expect the i3 family to become the default class for NoSQL workloads. This article will cover i3 instances and provide information about the status of Scylla support for the hardware. Although we don’t yet officially provide i3 AMIs, customers are already running them in production with positive results. Scylla’s native architecture takes advantage […]
What to expect from Scylla’s performance on low-end hardware
Scylla is a reimplementation of Apache Cassandra that has been demonstrated by us and third parties to perform up to 10x better than Apache Cassandra. These performance advantages stem from Scylla’s modern hardware-friendly and ultra-scalable architecture. As a result, Scylla’s performance grows as the hardware size grows. Scaling both up and out offers many advantages: from simplified cluster management to access to generally better hardware and economies of scale. We will address those choices in detail in an upcoming blog post. However, many users have compelling reasons to stay on low-end hardware, […]
What is Workload Conditioning? What is the best request rate I should throw at my cluster? What disk bandwidth should I make available for compactions? How many reader or writer threads should I have? What is the best size for my memtables?
ScyllaDB strives to offer its users predictable low latencies. However, in real life, things do not always go according to plan, and sometimes predictably low latencies become unpredictably high ones. When that happens, it’s time to go into detective mode and figure out what’s going on.
Last month we gave a talk at Scylla Summit that described the caveats and best practices for monitoring a live Scylla cluster. Once the cluster is ready to serve your requests, you will need to monitor it to understand its performance characteristics and its overall health and, should anything go wrong, to understand what it was that upset the cluster’s behavior.
This is the second and last part of this article. If you haven’t read the first part, you can do it here. In this part, we will look at the design of the Seastar I/O Scheduler that Scylla uses to manage its disk I/O and discuss how it can be used not only to provide predictable latencies, as we saw in our previous installment, but also to guarantee fairness and proper balancing among different actors.
In a datastore like Scylla, there are many actors competing for disk I/O. Examples of such actors are data writers (in Scylla’s parlance they can be either memtable or commitlog writers) and a disk reader fetching data to serve a cache miss. To illustrate the role that competition plays: if we simply issue disk I/O without any fairness or balancing considerations, a reader, for instance, could find itself stuck behind a storm of writes. By the time it has the opportunity to run, all that waiting will have translated into increased latency for the read.
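The "reader behind a storm of writes" scenario can be made concrete with a small sketch. It compares plain arrival-order dispatch with a simple round-robin over per-actor queues. Round-robin is used here only as an illustrative stand-in for a fairness policy; it is an assumption, not the algorithm the Seastar I/O Scheduler actually implements, and the function and actor names are hypothetical.

```python
from collections import deque


def fifo_order(requests):
    """Dispatch requests strictly in arrival order (no fairness)."""
    return list(requests)


def round_robin_order(requests):
    """Dispatch by cycling over per-actor queues, one request per actor per turn."""
    queues = {}
    for actor, payload in requests:
        queues.setdefault(actor, deque()).append((actor, payload))
    order = []
    while queues:
        # Iterate over a snapshot so drained queues can be removed safely.
        for actor in list(queues):
            order.append(queues[actor].popleft())
            if not queues[actor]:
                del queues[actor]
    return order


# A storm of 100 memtable writes arrives just before a single read.
requests = [("memtable_writer", i) for i in range(100)] + [("reader", 0)]

fifo_pos = fifo_order(requests).index(("reader", 0))
fair_pos = round_robin_order(requests).index(("reader", 0))
```

In arrival order the read is dispatched only after all 100 writes, while with per-actor queues it is dispatched right after the first write. All requests are still served in both cases; only the ordering, and therefore the read latency, changes.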
Amazon EC2 is a virtual computer store with all sizes and types of servers on display. We researched the top choices to find the best-balanced, best-performing server for NoSQL.