A look at 5 “hidden gem” ScyllaDB capabilities that help ScyllaDB power users.
Technology typically evolves at a faster pace than what a normal person (like me) is able to track. And just as the specification of programming languages evolves, so does your database.
Skimming through release notes to keep track of new features and changes – plus understanding all the reasoning around them and how they might benefit your specific workloads – becomes a very challenging and time-consuming task, especially when your team is working with a wide range of distinct products.
Like most open source projects, ScyllaDB has a very fast development cycle. New features are introduced on a monthly basis, and they can easily get overlooked. Quite often, users are impressed when we introduce them to this cool “newer” feature that was introduced 2 years ago, and they realize how they can readily benefit from it! 🙂
In this post, let’s discuss 5 features that you might have overlooked and may be relevant to your workloads.
Workload Prioritization
Workload Prioritization is a ScyllaDB Enterprise feature for workload consolidation. Traditionally, running several workloads concurrently is often a challenging task. Each workload competes for resources, resulting in overall (and often unacceptable) elevated latencies.
Before Workload Prioritization, one of the approaches to run distinct workload types would typically involve logically or physically isolating each workload entirely – for example, having one datacenter for real-time traffic, and another for batch processing. This approach is sub-optimal, as it directly impacts costs. Other approaches would be running workloads that might impact latency during low-traffic peak periods (if any exist) or dropping the use case altogether.
We knew there could be a better way, and that’s why we introduced workload prioritization 4 years ago. The feature, which is built on top of ScyllaDB isolation mechanisms, allows you to define separate Service levels with a different percentage of resources to be allocated for each. That helps the database determine what to prioritize in the event of system contention. It also allows you to define different priorities for a single workload. Several customers define different levels for their reads as opposed to their writes, according to each one’s SLA requirements.
More recently, we introduced Workload Attributes into our Open Source version. This allows you to define different workload types (interactive or batch) and different timeouts for each type. Once these types are defined, the database has a better idea of how to handle the concurrency of each – and it will use this knowledge to operate accordingly. Note, however, that you cannot explicitly specify how you want the database to prioritize workloads, as you can with the enterprise-only workload prioritization feature.
Ever since its introduction, several teams have managed to reduce costs and consolidate their workloads on top of ScyllaDB. If you would like to learn more about it, see Eliran Sinvani’s talk on Workload Specific Optimizations.
Heat-Weighted Load Balancing
Heat-weighted load balancing (HWLB) is a feature that allows ScyllaDB to load balance traffic among replicas when nodes have a cold cache, and it is a key functionality to ensure that latencies are always in check. It’s not something you need to enable or interact with; it just works quietly in the background across both Open Source and Enterprise releases.
HWLB was introduced in ScyllaDB 2.0 and – as it has “always been there” – most users don’t know about its internals! Consider the situation in which a node loses its cache, such as during rolling upgrades, or any configuration or underlying infrastructure change followed by a service restart. After the node comes back up serving requests, it will start with a cold cache. If the database naively tried to load balance queries to all existing replicas, the “colder” replica would naturally take longer than other replicas to answer, thus compromising the workload latencies.
However, avoiding sending requests altogether to cold nodes would introduce another problem: it could take a considerable amount of time for the node’s cache to become hot. That’s exactly the problem that HWLB solves: it prevents coordinators from naively routing requests to the colder replica, thus avoiding hurting latencies and, instead, it balances a fraction of these requests to the replica in order to speed up its cache warmup.
Under the hood, Gossip propagates its cache hit rate information when exchanging data with other nodes. When other nodes are serving requests as coordinators, they use this data to determine the number of requests to propagate to the cold node.
If you have ever wondered why rebooting individual nodes doesn’t affect the cluster’s read latency, then attribute that peace of mind to Heat-weighted load balancing!
Per Shard Concurrency Limit
Traffic spikes are a common situation seen in real-time applications. A surge of requests is all it takes for an application to overwhelm the database beyond what it is able to sustain.
Whenever the database receives a burst of requests, it is natural that this burst will queue up, leading to increased latencies. In extreme circumstances, it could even cause upstream service interruption.
ScyllaDB users’ workloads can be very dynamic, and the definition of what’s an acceptable p99 latency for one may not be the same for another. Therefore, by default we don’t enforce limits in any way unless, of course, we need to preserve the internal database’s stability.
Per-shard concurrency limiting was introduced in ScyllaDB 4.3 via the max_concurrent_requests_per_shard option. As a live updatable setting, it allows you to dynamically play with different values during the database execution, thus helping you to find the sweet spot according to your SLAs. Once a value is set, a coordinator node will automatically start to shed outstanding requests past the configured threshold. This effectively notifies the application to throttle down and, most importantly, ensures that your latencies remain under control.
This option can be very useful for protecting your database latencies against sudden and often unpredictable spikes. Although the recommended approach is typically to throttle requests on the userspace side of things, making use of this function fully guarantees that your database latencies will always be in check!
Per Partition Rate Limit
But what if you have a specific partition, in specific tables, that’s subject to spikes (such as spam attacks, or when an item goes viral and is subject to increased access rates)? Enter Per partition rate limit!
While the per-shard-concurrency limiting overload protection mechanism has a global scope per shard, per partition limiting applies to the table level. Another difference is that the former is a concurrency limiter (outstanding requests), whereas the latter is a rate limiter (requests per second).
The per partition rate limit setting is very flexible. It supports different settings for different tables and allows you to specify the number of reads and writes allowed per item per second with a simple ALTER TABLE command. Then, when clients exceed the configured limits, ScyllaDB will reject requests with a “Rate limit exceeded” error.
The feature also allows you to enable billing and metering use cases fairly easily when running on top of ScyllaDB, and ensure that users are not allowed to run past their specific quotas! We introduced the partition level rate limit feature in ScyllaDB 5.1 this year. Goodbye spammers!
If you would like to learn more about this capability, see Piotr Dulikowski’s blog on Retaining Database Goodput Under Stress with Per-Partition Query Rate Limiting.
Bypass Cache
Did you ever find yourself in a situation where you needed to run a full table scan on your data? For example, to pull long reports, count the total number of records, or perhaps simply query for data outside of your defined PRIMARY KEY restrictions?
These examples are fairly common use cases that one would expect to run from within a database. However, there’s a problem: As the database reads data from disk and starts to promote entries to its internal cache, the existing cache contents (which are likely needed by the application) would get evicted! The end result? An aggressive cache eviction cycle ready to undermine latencies.
BYPASS CACHE is a ScyllaDB extension to the CQL protocol. It was introduced in ScyllaDB 3.1 to inform the database that the data being read is unlikely to be read again in the future. Therefore, ScyllaDB won’t make any effort to populate the cache with its contents, thus preserving the existing cache entries and keeping your latencies under control.
It is generally a good idea to remember to append the clause to your queries whenever you are scanning bulk amounts of data. Another good strategy is to use it whenever you rely on a potentially expensive ALLOW FILTERING statement or range scans.
BYPASS CACHE is also extremely useful when used along with tracing and during benchmarking. It allows you to determine the read latency without considering ScyllaDB’s cache, which may be extremely useful in order for you to predict the “worst case scenario” for your workload.