The data model in Scylla and Apache Cassandra partitions data between cluster nodes using a partition key, which is defined by the database schema. Looking up a row by its partition key is efficient because you can find the node that owns the row by hashing the key. Unfortunately, this also means that finding a row by a non-partition key requires a full table scan, which is inefficient. Secondary indexes are a mechanism in Apache Cassandra that allows efficient searches on non-partition keys by creating an index.
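To make the contrast concrete, here is a minimal Python sketch of the idea, not Scylla's or Cassandra's actual implementation. The three-node ring, its token values, the sample rows, and every function name are hypothetical, and `md5` stands in for the real partitioner's hash purely for illustration.

```python
import hashlib
from bisect import bisect_left

# Hypothetical ring: each node owns the tokens up to and including its own.
RING = [(0x55555555, "node-a"), (0xAAAAAAAA, "node-b"), (0xFFFFFFFF, "node-c")]
TOKENS = [token for token, _ in RING]

# Hypothetical table contents, keyed by partition key.
ROWS = {
    "alice": {"city": "Oslo"},
    "bob": {"city": "Lima"},
    "carol": {"city": "Oslo"},
}


def token_for(partition_key: str) -> int:
    """Toy 32-bit token. md5 is an illustrative stand-in, not the real hash."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner_of(partition_key: str) -> str:
    """Find the owning node by hashing the key; no rows are inspected."""
    idx = bisect_left(TOKENS, token_for(partition_key))
    return RING[idx % len(RING)][1]


def find_by_city_scan(city: str) -> list:
    """Lookup on a non-partition column: every row has to be checked."""
    return sorted(key for key, row in ROWS.items() if row["city"] == city)


def build_city_index(rows: dict) -> dict:
    """A toy index mapping a column value to the partition keys holding it."""
    index = {}
    for key, row in rows.items():
        index.setdefault(row["city"], []).append(key)
    return index
```

With the index built once, a lookup by `city` becomes a dictionary access followed by ordinary partition-key reads, rather than a pass over every row.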
When most server application developers think of I/O, they consider network I/O since most resources these days are accessed over the network: databases, object storage, and other microservices. The developer of a database, however, also has to consider file I/O. This article describes the available choices and their tradeoffs and why Scylla chose asynchronous direct I/O (AIO/DIO) as its access method.
A fast in-memory database provides benefits that we can all appreciate, such as optimal latency and throughput for workloads. What if you could use extremely fast NVMe drives to get similar latency and throughput results? The scope of this blog post is to examine the outcomes of using an in-memory-like database combined with NVMe drives to resolve cold-cache and data persistence challenges. In this experiment, various testing scenarios were run with Scylla and Intel® Optane™ SSD DC P4800X drives with a goal of providing a solution with the performance of an in-memory-like database without compromises on throughput, latency, […]
Scylla 2.0’s New Feature In-Depth: Heat Weighted Load Balancing
With time, a Scylla cluster adapts to an application’s behavior. Given a steady read-mostly workload, after an initial warm-up period, all nodes will have their caches populated with a working set, and the workload will see a certain cache hit rate and enjoy a certain performance level (throughput and latency).
For a long time, permanent storage has been the bottleneck in most computer systems. Scylla operates under that assumption and includes a fully-featured userspace disk I/O Scheduler that is used to guarantee that different tasks in the database get their fair share of the disk. The I/O Scheduler is the central component at the heart of Scylla’s workload conditioning promise: to automatically adjust the database’s distribution of requests to adapt to the incoming workload. It is capable of providing Quality-of-Service (QoS) among the various tasks in the database and isolating them from each other. Since database systems tend to be […]
Raphael S. Carvalho is a computer programmer here at ScyllaDB who loves open source software and kernel programming. He worked on the Syslinux project to bring new file system support and also worked on MultiFS to allow multiple file systems to co-exist. At Scylla, he has mostly worked on SSTable compaction handling and recently developed support for the Time Window Compaction Strategy in Scylla. This strategy is a considerably better alternative to the DateTieredCompactionStrategy. Raphael has a passion for making products and solutions better with his programming experience. You can learn more about Raphael in his […]
In mid-2015, Intel and Micron jointly unveiled a new kind of non-volatile memory storage device named 3D XPoint (pronounced “cross-point”) that is 1000x faster than NAND. Now that 3D XPoint is generally available and has hit the broad market, we can start testing it. 3D XPoint uses electrical resistance and is considered to be bit addressable. It’s also worth mentioning that the endurance is much better with 3D XPoint because the stated wear leveling is 30 full drive writes per day for 5 years. 3D XPoint developers indicate that it is based on changes in resistance of the bulk material. […]
A database like Scylla can be limited by the network, disk I/O, or the processor. Which one applies is often dynamic and depends on both the hardware configuration and the workload. The only way of dealing with that is to attempt to achieve good throughput and low latency regardless of what the bottleneck is. There are many things that can be done in each of these cases, ranging from high-level changes in the algorithms to very low-level tweaks. In this post, I am going to take a closer look at fairly recent changes to Scylla which improved the performance […]
Counters are a special type of column that allows its value to only be incremented, decremented, read or deleted. Updates to counters are atomic, which makes them a perfect solution for counting—something that is otherwise difficult to do efficiently.
Introduction
The most common operations with ScyllaDB are inserting, updating, and retrieving rows within a single partition: each operation specifies a single partition key, and the operation applies to that partition. While less commonly used, reads of all partitions, also known as full table scans, are also useful, often in the context of data analytics. This post describes how to efficiently perform full table scans with ScyllaDB 1.6 and above.
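The excerpt does not say which technique the post uses. One common approach, sketched here under that assumption, is to split the partitioner's token range into contiguous subranges and issue one range query per subrange, so the pieces can be read in parallel. The sketch assumes the default Murmur3 partitioner with its signed 64-bit token range. The table name `ks.events`, the key column `id`, and both function names are hypothetical.

```python
# Signed 64-bit token range of the Murmur3 partitioner (assumed here).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def token_subranges(parts: int) -> list:
    """Split the full token range into `parts` contiguous (start, end) pairs."""
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + span * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_queries(table: str, key: str, parts: int) -> list:
    """Build one CQL range query per subrange.

    Each range is start-exclusive and end-inclusive, except the first,
    which also includes its start so that no token is skipped.
    """
    queries = []
    for i, (start, end) in enumerate(token_subranges(parts)):
        lower = ">=" if i == 0 else ">"
        queries.append(
            f"SELECT * FROM {table} "
            f"WHERE token({key}) {lower} {start} AND token({key}) <= {end}"
        )
    return queries


if __name__ == "__main__":
    # Hypothetical table and key column, for illustration only.
    for query in scan_queries("ks.events", "id", 4):
        print(query)
```

Each generated statement could then be handed to a separate worker. How many subranges to use and how many to run concurrently depend on cluster size and are left open here.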
Apache®, Apache Cassandra®, are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.