InfoWorld editors conducted independent research into various information technology domains and recognized 12 leading technologies in their 2020 Technology of the Year awards. This recognition is the result of the hard work we have put into our database over the years.
This article presents the extensions made to the Scylla Migrator to also support data movement between an existing DynamoDB installation and Scylla.
As we prepare for Scylla Summit 2019, this is the first in a series of blogs highlighting this year’s featured presenters. Alexys Jacob, known to the developer community across social media, GitHub and Slack as @ultrabug, is the CTO of Numberly. A self-avowed Pythonista and staunch open source proponent, Alexys has long explored and expanded the frontiers of Big Data architecture and production systems.
Holden Karau is an open source developer advocate at Google. In her talk, Holden provided an overview of Spark and the ways it can fail, then outlined a number of strategies for recovering pipelines from those different failures.
With continued and growing interest in Apache Spark, we had two speakers present at Scylla Summit 2018 on the topic. This is the first of a two-part article, covering the talk by ScyllaDB’s Eyal Gutkind. The second part covers the talk by Google’s Holden Karau. With business demanding more actionable insight and ROI out of their big data, it is no surprise that analytics are a common workload on Scylla. Nor is it a surprise that Spark is a perennial favorite on the Scylla Summit agenda, and our annual gathering last year proved to be no exception. The focus was […]
“Scylla is the ideal database for IoT in the industry right now. Especially without the garbage collection that Cassandra has. Small footprint. It just does what you need it to do.” — Doug Stuns, GPS Insight Doug Stuns began his presentation at Scylla Summit 2018 by laying out his company’s goals. Founded in 2004, GPS Insight now tracks more than 140,000 vehicles and assets, collecting a wide variety of data for every one of those vehicles: battery levels, odometer readings, hard stops, acceleration, vehicle performance, emissions, and GPS data to determine route efficiency. By the time Doug was […]
Another week, another Spark and Scylla post! This time, we’re back with the Scylla Spark Migrator; we’ll take a short tour through its innards to see how it is implemented. Read why we implemented the Scylla Spark Migrator in this blog. When developing the Migrator, we had several design goals in mind. First, the Migrator should be highly efficient in terms of resource usage. In the land of Spark applications, resource efficiency usually translates to avoiding data shuffles between nodes. Shuffles are destructive to Spark’s performance, as they incur heavy disk and network I/O. Moreover, shuffles usually get slower […]
Welcome to a whole new chapter in our Spark and Scylla series! This post will introduce the Scylla Migrator project – a Spark-based application that will easily and efficiently migrate existing Cassandra tables into Scylla. Over the last few years, ScyllaDB has helped many customers migrate from existing Cassandra installations to a Scylla deployment. The migration approach is detailed in this document. Briefly, the process is comprised of several phases: Create an identical schema in Scylla to hold the data; Configure the application to perform dual writes; Snapshot the historical data from Cassandra and load it into Scylla; Configure the […]
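At its very simplest, the snapshot-loading phase of the process above can be sketched as a Spark job that reads the source Cassandra table and writes it to Scylla through the Spark Cassandra Connector. This is only a hedged illustration of the idea, not the Migrator itself, which handles concerns a naive copy does not; the host, keyspace, and table names here are assumptions:

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical sketch: copy a table from a source Cassandra cluster into
// Scylla using the Spark Cassandra Connector. All names are placeholders.
object MigrateSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("migrate-sketch")
      .getOrCreate()

    // Phase 3: read the historical snapshot from the source cluster
    val source = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map(
        "spark.cassandra.connection.host" -> "cassandra-host",
        "keyspace" -> "ks", "table" -> "events"))
      .load()

    // Write it to the target Scylla cluster, into the identical
    // schema created ahead of time (phase 1)
    source.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map(
        "spark.cassandra.connection.host" -> "scylla-host",
        "keyspace" -> "ks", "table" -> "events"))
      .mode("append")
      .save()
  }
}
```

In practice the Migrator is the right tool for this step, since a plain DataFrame copy like the one above does not preserve per-cell write timestamps or TTLs.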
Spark Structured Streaming with Scylla Hello again! Following up on our previous post on saving data to Scylla, this time we’ll discuss using Spark Structured Streaming with Scylla and see how streaming workloads can be written into ScyllaDB. This is the fourth part of our four-part series, so make sure you check out all the prior blogs! Our code samples repository for this post contains an example project along with a docker-compose.yaml file providing the necessary infrastructure. We’re going to use that infrastructure to run the code samples throughout the post and to run the project itself, […]
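To give a flavor of what a streaming write to Scylla can look like, here is a hedged sketch using Spark 2.4’s foreachBatch sink together with the Spark Cassandra Connector. This is one possible pattern, not necessarily the approach the post develops, and the source, keyspace, and table names are assumptions:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Sketch: read lines from a socket source and append each micro-batch
// to a Scylla table. Requires a running Spark cluster and a table
// (ks.lines) created ahead of time -- both are assumptions here.
val spark = SparkSession.builder.appName("streaming-sketch").getOrCreate()

val lines = spark.readStream
  .format("socket")
  .option("host", "localhost")
  .option("port", 9999)
  .load()

val query = lines.writeStream
  .foreachBatch { (batch: DataFrame, batchId: Long) =>
    // Each micro-batch is a plain DataFrame, so we can reuse the
    // connector's batch write path
    batch.write
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "ks", "table" -> "lines"))
      .mode("append")
      .save()
  }
  .start()

query.awaitTermination()
```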
Spark and Scylla: Spark DataFrames in Scylla Welcome back! Last time, we discussed how Spark executes our queries and how Spark’s DataFrame and SQL APIs can be used to read data from Scylla. That concluded the querying segment of the series; in this post, we will see how data from DataFrames can be written back to Scylla. As always, we have a code samples repository with a docker-compose.yaml file containing all the services we’ll need. After you’ve cloned it, start up the services with docker-compose. Once that is done, launch the Spark shell as in the previous posts […]
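As a taste of what the post covers, a minimal, hedged sketch of writing a DataFrame back to Scylla through the connector’s DataFrame API might look like the following; the keyspace and table names are placeholders, and the target table is assumed to already exist:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: build a small DataFrame and append it to an existing
// Scylla table (ks.kv) via the Spark Cassandra Connector.
val spark = SparkSession.builder.appName("write-sketch").getOrCreate()
import spark.implicits._

val df = Seq((1, "a"), (2, "b")).toDF("id", "value")

df.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "kv"))
  .mode("append")
  .save()
```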