There are many ways to monitor a distributed platform, and many tools that handle different monitoring tasks, from operating system and disk statistics to database performance metrics. We found that users need a single document that covers these topics, so they can refer back to it later without digging through multiple documents.
Gocqlx is an extension to Gocql, the Go driver for Scylla and Apache Cassandra. It aims to boost developer productivity without sacrificing query performance. It’s inspired by Sqlx, a tool for working with SQL databases, but it goes beyond what Sqlx provides. For this blog post, we will pretend we’re a microblogging service and use the following schema: Gocql is a very popular Cassandra driver for the Go programming language. Usually, working with it looks more or less like this (source: Gocql README): At first glance, it looks OK, but there are some problems: Gocql does not provide you with […]
Combining a database with full-text search analytics has become unavoidable. In this blog post, I will demonstrate a simple way to analyze database data with analytics software, using Scylla and Elasticsearch together in a simple data mining exercise that gathers data from Twitter. The demonstration uses a series of Docker containers that run a Scylla and Elasticsearch cluster, plus a Node.js app that feeds data from Twitter into both platforms. The demo can be run on a laptop or on a production Docker server. To get started, let’s go over the […]
By default, Scylla SSTables are compressed when they are written to disk. As mandated by the file format, data is compressed in chunks of a certain size – 4 KB if not explicitly set. The chunk size is one of the parameters of the compression property, which is set at table creation. Chunk-based compression presents trade-offs that users may not be aware of. In this post, I will try to explore what those trade-offs are and how to set them correctly for maximum benefit. As trade-offs imply different results for different loads, we will focus on single-partition read […]
The developers of Scylla are working hard so that Scylla will not only have unparalleled performance (see our benchmarks) and reliability, but also have the features that our users want or expect for compatibility with the latest version of Apache Cassandra. The latest of these new features is Materialized Views, which will be an experimental feature in the upcoming Scylla release 2.0. Because this feature is experimental, users are invited to try it in non-production environments. The initial implementation has limitations which are discussed at the end of this blog and will be addressed in later versions of Scylla. The […]
Let’s talk about a financial use case where streaming and near-real-time messaging are used through Kafka and Scylla. We will model a system that allows subscribers to follow stock prices for companies of interest to them, similar to a simplified trading terminal. Our system follows an architectural pattern in which stock price updates are pushed to a Kafka queue, and subscribers consume messages that contain company stock information. These consumed messages are then stored in Scylla instances, where they can be used later for more sophisticated analysis (for example, using an engine like Spark).
A highly available time-series solution requires an efficient, tailored front-end framework and a backend database with a fast ingestion rate. KairosDB provides a simple and reliable way to ingest and retrieve sensor information or metrics, while Scylla provides a highly reliable and performant backend database that scales indefinitely and can store large quantities of time-series data.
In this article, we will demonstrate how to use the Spark Scala API with Scylla to get instant results. We will extract flight delay and cancellation statistics for one year from the public RITA dataset; namely, the average arrival delay, the average departure delay, the average departure/arrival delay, and flight cancellations for each air carrier.
A parallel full table scan is faster! Running a traditional serial full table scan on 475 million partitions (screenshot 2) from one client with a single connection per node, Scylla achieves only 42,110 rows per second. With an efficient, parallel full table scan (screenshot 1), however, a single client scans the same 475 million partitions at a rate of 510,752 rows per second, which is 12x faster!
The most common operations with ScyllaDB are inserting, updating, and retrieving rows within a single partition: each operation specifies a single partition key, and the operation applies to that partition. While less commonly used, reads of all partitions, also known as full table scans, are also useful, often in the context of data analytics. This post describes how to efficiently perform full table scans with ScyllaDB 1.6 and above.
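One common way to parallelize a full table scan is to split the partitioner's token ring into subranges and query each subrange concurrently. The sketch below is not taken from the post; it is a minimal Python illustration under stated assumptions: the default Murmur3 partitioner (tokens from -2^63 to 2^63-1), a hypothetical table `ks.tweets` with partition key `id`, and a caller-supplied `fetch` function that would run the query through whatever driver is in use. The helper names `split_token_ranges`, `build_scan_queries`, and `scan_parallel` are invented for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple, TypeVar

T = TypeVar("T")

# Token bounds of the Murmur3 partitioner (assumed; it is the usual default).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ranges(num_ranges: int) -> List[Tuple[int, int]]:
    """Split the token ring into num_ranges inclusive, non-overlapping ranges."""
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN
    # Evenly spaced boundaries; the first is MIN_TOKEN, the last is MAX_TOKEN.
    bounds = [MIN_TOKEN + (total * i) // num_ranges for i in range(num_ranges + 1)]
    ranges = []
    for i in range(num_ranges):
        # Every range after the first starts one token past the previous end.
        start = bounds[i] if i == 0 else bounds[i] + 1
        ranges.append((start, bounds[i + 1]))
    return ranges


def build_scan_queries(table: str, partition_key: str, num_ranges: int) -> List[str]:
    """Build one CQL range query per token subrange (names are hypothetical)."""
    return [
        f"SELECT * FROM {table} "
        f"WHERE token({partition_key}) >= {start} AND token({partition_key}) <= {end}"
        for start, end in split_token_ranges(num_ranges)
    ]


def scan_parallel(
    ranges: List[Tuple[int, int]],
    fetch: Callable[[int, int], T],
    max_workers: int = 8,
) -> List[T]:
    """Call fetch(start, end) for every range on a thread pool, keeping order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda r: fetch(r[0], r[1]), ranges))


if __name__ == "__main__":
    # Print the range queries a client could issue concurrently.
    for query in build_scan_queries("ks.tweets", "id", 4):
        print(query)
```

In practice the number of subranges is usually chosen much larger than the number of worker threads, so that all nodes and cores stay busy even when some ranges hold more data than others; the right values depend on the cluster and are not specified here.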
Apache®, Apache Cassandra®, are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.