Scylla Blog

Stay up to date with recent news and updates on our Users Blog, and get under the hood on our Developers Blog.

Apr 2

Spark, File Transfer, and More: Strategies for Migrating Data to and from a Cassandra or Scylla Cluster

Scylla’s March 2019 webinar on database migration drew broad interest, and migration will likely remain a popular topic for years to come. So, you’ve decided to adopt Scylla (or Cassandra). What’s the best way to get your Big Data uploaded into your new cluster? What strategies, tools and techniques can you use to get your terabytes or petabytes from point A to point B? Those were the questions of the day for Dan Yasny, Field Engineer at ScyllaDB.

Read full article

Mar 12

Deep Dive into the Scylla Spark Migrator

Another week, another Spark and Scylla post! This time, we’re back with the Scylla Spark Migrator, and we’ll take a short tour through its innards to see how it is implemented. Read why we implemented the Scylla Spark Migrator in this blog post.

Overview

When developing the Migrator, we had several design goals in mind. First, the Migrator should be highly efficient in terms of resource usage. In Spark applications, resource efficiency usually translates to avoiding data shuffles between nodes. Data shuffles hurt Spark’s performance because they incur additional network and disk I/O. Moreover, shuffles usually get slower […]

Read full article

Mar 7

Scylla and Elasticsearch, Part Two: Practical Examples to Support Full-Text Search Workloads

We covered the basics of Elasticsearch, and how Scylla is a perfect complement to it, in part one of this series. Today we want to give you specific how-tos on connecting Scylla and Elasticsearch, including use cases and sample code.

Use Case #1

If combining a persistent, highly available datastore with a full-text search engine is a market requirement, then implementing a single, integrated solution is the ultimate goal, but one that requires time and resources. To address this challenge, we describe below a way for users to combine best-of-breed solutions that support full-text search workloads. We chose Elasticsearch open source together with […]

Read full article

Feb 14

The Complex Path for a Simple Portable Python Interpreter, or Snakes on a Data Plane

We needed a Python interpreter that can be shipped everywhere. You won’t believe what happened next! “When I said I wanted portable Python, this is NOT what I meant!” In theory, Python is a portable language. You can write your script locally and distribute it to other machines that have a Python interpreter. In practice, things can go wrong for a variety of reasons. The first, and simpler, problem is the module system: for a script to run, all of the modules it uses must be installed. For Python-savvy users, installing them is not a problem. But for a software vendor […]

Read full article

Feb 7

Moving from Cassandra to Scylla via Apache Spark: The Scylla Migrator

Welcome to a whole new chapter in our Spark and Scylla series! This post introduces the Scylla Migrator project, a Spark-based application that easily and efficiently migrates existing Cassandra tables into Scylla. Over the last few years, ScyllaDB has helped many customers migrate from existing Cassandra installations to a Scylla deployment. The migration approach is detailed in this document. Briefly, the process consists of several phases: Create an identical schema in Scylla to hold the data; Configure the application to perform dual writes; Snapshot the historical data from Cassandra and load it into Scylla; Configure the […]
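The dual-write and snapshot-load phases listed above can only run side by side if the historical load never overwrites newer dual writes. Below is a minimal Python sketch of that reasoning, assuming the loader preserves each row's original write timestamp so that CQL's last-write-wins rule reconciles the two paths. `TimestampedStore`, `DualWriter` and `load_snapshot` are hypothetical in-memory stand-ins for illustration only; they are not part of the Scylla Migrator or any CQL driver, and this is not a description of how the Migrator is implemented.

```python
class TimestampedStore:
    """Hypothetical in-memory stand-in for a cluster: key -> (value, write_ts).

    Applies a simplified last-write-wins rule: a write only takes effect if
    its timestamp is newer than the stored one (real CQL tie-breaking is
    not modeled here).
    """

    def __init__(self):
        self.cells = {}

    def write(self, key, value, ts):
        current = self.cells.get(key)
        if current is None or ts > current[1]:
            self.cells[key] = (value, ts)

    def read(self, key):
        cell = self.cells.get(key)
        return None if cell is None else cell[0]


class DualWriter:
    """Hypothetical application wrapper used during the dual-write phase."""

    def __init__(self, source, target):
        self.source = source
        self.target = target

    def write(self, key, value, ts):
        # Send the same client-side timestamp to both clusters so they converge.
        self.source.write(key, value, ts)
        self.target.write(key, value, ts)

    def read(self, key):
        # Reads stay on the old cluster until the cut-over.
        return self.source.read(key)


def load_snapshot(snapshot, target):
    """Copy historical cells into the target, keeping their ORIGINAL timestamps."""
    for key, (value, ts) in snapshot.items():
        target.write(key, value, ts)


# Historical data that exists only in the old cluster.
cassandra, scylla = TimestampedStore(), TimestampedStore()
cassandra.write("user:1", "alice", ts=100)
cassandra.write("user:2", "bob", ts=100)

app = DualWriter(cassandra, scylla)
app.write("user:3", "carol", ts=150)   # dual write lands in both clusters
snapshot = dict(cassandra.cells)       # point-in-time copy of historical data
app.write("user:2", "bobby", ts=200)   # newer dual write, made after the snapshot
load_snapshot(snapshot, scylla)        # stale "bob" (ts=100) loses to "bobby"

print(scylla.read("user:2"))  # prints: bobby
```

Had `load_snapshot` written each cell with a fresh timestamp instead of the original one, the stale `bob` value would have overwritten the newer `bobby` in the target cluster.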

Read full article

Dec 19

Scylla and Confluent Integration for IoT Deployments

The Internet is not just connecting people around the world. Through the Internet of Things (IoT), it is also connecting humans to the machines all around us and directly connecting machines to other machines. In this blog post we’ll share an emerging machine-to-machine (M2M) architecture pattern in which MQTT, Apache Kafka and Scylla all work together to provide an end-to-end IoT solution. We’ll also provide demo code so you can try it out for yourself.

IoT Scale

IoT is a fast-growing market, already valued at over $1.2 trillion in 2017 and anticipated to grow to over $6.5 trillion […]

Read full article

Dec 4

Worry-Free Ingestion: Flow Control of Writes in Scylla

This blog post is based on a talk I gave last month at the third annual Scylla Summit in San Francisco. It explains how Scylla ensures that data ingestion proceeds as quickly as possible, but not quicker. It looks into the existing flow-control mechanism for tables without materialized views, and into the new mechanism for tables with materialized views, introduced in Scylla Open Source 3.0.

Introduction

In this post we look into ingestion of data into a Scylla cluster. What happens when we make a large volume of update (write) requests? We would like the ingestion to […]
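The post covers Scylla’s own server-side flow control, whose internals this excerpt doesn’t describe. As a generic illustration of the underlying principle (ingest as fast as possible, but never let outstanding requests pile up without bound), here is a minimal client-side sketch that caps the number of in-flight writes with an `asyncio.Semaphore`. `BoundedWriter` and `fake_replica_write` are hypothetical names, the sleep stands in for a network round trip, and this is not Scylla’s mechanism.

```python
import asyncio
import random


async def fake_replica_write(key, value):
    # Hypothetical stand-in for a network round trip to a replica.
    await asyncio.sleep(random.uniform(0.001, 0.005))


class BoundedWriter:
    """Caps in-flight writes; extra writes wait for a slot instead of piling up."""

    def __init__(self, max_in_flight):
        self._slots = asyncio.Semaphore(max_in_flight)
        self.in_flight = 0
        self.peak_in_flight = 0
        self.completed = 0

    async def write(self, key, value):
        async with self._slots:  # suspends here once the cap is reached
            self.in_flight += 1
            self.peak_in_flight = max(self.peak_in_flight, self.in_flight)
            try:
                await fake_replica_write(key, value)
            finally:
                self.in_flight -= 1
            self.completed += 1


async def ingest(n_writes, max_in_flight):
    writer = BoundedWriter(max_in_flight)
    # Submit every write at once; the semaphore throttles how many run concurrently.
    await asyncio.gather(*(writer.write(i, f"v{i}") for i in range(n_writes)))
    return writer


writer = asyncio.run(ingest(n_writes=200, max_in_flight=16))
print(writer.completed, writer.peak_in_flight)
```

Raising `max_in_flight` typically increases throughput only until the backend saturates; beyond that point, extra concurrency mostly adds queueing delay rather than speed.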

Read full article

Nov 13

Hooking up Spark and Scylla: Part 4

Spark Structured Streaming with Scylla

Hello again! Following up on our previous post on saving data to Scylla, this time we’ll discuss using Spark Structured Streaming with Scylla and see how streaming workloads can be written into ScyllaDB. This is the fourth part of our four-part series, so make sure you check out the earlier posts! Our code samples repository for this post contains an example project along with a docker-compose.yaml file with the necessary infrastructure for running it. We’re going to use the infrastructure to run the code samples throughout the post and run the project itself, […]

Read full article

Nov 1

More Efficient Range Scan Paging with Scylla 3.0

In a previous blog post we examined how Scylla’s paging works, explained its problems, and introduced stateful paging in Scylla 2.2, which solves these problems for single-partition queries. In this second blog post we look into how stateful paging was extended to support range scans as well in Scylla Open Source 3.0. We were able to increase the throughput of range scans by 30%, and to significantly reduce the amount of data read from disk (by 39%) and the number of disk operations (by 73%). A […]

Read full article

Oct 8

Hooking up Spark and Scylla: Part 3

Spark and Scylla: Spark DataFrames in Scylla

Welcome back! Last time, we discussed how Spark executes our queries and how Spark’s DataFrame and SQL APIs can be used to read data from Scylla. That concluded the data-querying segment of the series; in this post, we will see how data from DataFrames can be written back to Scylla. As always, we have a code sample repository with a docker-compose.yaml file with all the necessary services we’ll need. After you’ve cloned it, start up the services with docker-compose. After that is done, launch the Spark shell as in the previous posts […]

Read full article
