Scylla’s March 2019 webinar on database migration drew broad interest and will likely remain a popular topic for years to come. So, you’ve decided to adopt Scylla (or Cassandra). What’s the best way to get your Big Data uploaded into your new cluster? What strategies, tools and techniques can you use to get your terabytes or petabytes from point A to point B? Those were the questions of the day for Dan Yasny, Field Engineer of ScyllaDB.
Another week, another Spark and Scylla post! This time, we’re back again with the Scylla Spark Migrator; we’ll take a short tour through its innards to see how it is implemented. Read why we implemented the Scylla Spark Migrator in this blog. Overview When developing the Migrator, we had several design goals in mind. First, the Migrator should be highly efficient in terms of resource usage. Resource efficiency in the land of Spark applications usually translates to avoiding data shuffles between nodes. Data shuffles are destructive to Spark’s performance, as they incur more I/O costs. Moreover, shuffles usually get slower […]
We covered the basics of Elasticsearch and how Scylla is a perfect complement for it in part one of this blog. Today we want to give you specific how-tos on connecting Scylla and Elasticsearch, including use cases and sample code. Use Case #1 If combining a persistent, highly available datastore with full text search engine is a market requirement, then implementing a single, integrated solution is an ultimate goal that requires time and resources. To answer this challenge we describe below a way for users to use best-of-breed solutions that support full text search workloads. We chose Elasticsearch open source together with […]
Derek Ramsey, Software Engineering Manager at Sensaphone, gave an overview of ValuStor at Scylla Summit 2018. Sensaphone is a maker of remote monitoring solutions for the Industrial Internet of Things (IIoT). Their products are designed to watch over your physical plant and equipment — such as HVAC systems, oil and gas infrastructure, livestock facilities, greenhouses, food, beverage and medical cold storage. Yet there is a lot of software behind the hardware of IIoT. ValuStor is an example of ostensible “hardware guys” teaching the software guys a thing or two. Overview and Origins of ValuStor Derek began his Scylla Summit talk […]
Welcome to a whole new chapter in our Spark and Scylla series! This post will introduce the Scylla Migrator project – a Spark-based application that will easily and efficiently migrate existing Cassandra tables into Scylla. Over the last few years, ScyllaDB has helped many customers migrate from existing Cassandra installations to a Scylla deployment. The migration approach is detailed in this document. Briefly, the process is comprised of several phases: Create an identical schema in Scylla to hold the data; Configure the application to perform dual writes; Snapshot the historical data from Cassandra and load it into Scylla; Configure the […]
The Internet is not just connecting people around the world. Through the Internet of Things (IoT), it is also connecting humans to the machines all around us and directly connecting machines to other machines. In this blog post we’ll share an emerging machine-to-machine (M2M) architecture pattern in which MQTT, Apache Kafka and Scylla all work together to provide an end-to-end IoT solution. We’ll also provide demo code so you can try it out for yourself. IoT Scale IoT is a fast-growing market, already known to be over $1.2 trillion in 2017 and anticipated to grow to over $6.5 trillion […]
Full text search is required in many human-facing applications, such as where users need to interact with a datastore to retrieve and insert data based on partial, wildcard information, spell correction and autocompletion. Additional benefits of full text search is the ability to retrieve multiple results sorted by their relevance. Lucene, the common parent to Solr and Elasticsearch The most popular textual search engine in the market is Lucene. It is used by Solr, Elasticsearch, Lucidworks and other text search tools. Lucene is a great search engine. It is extremely fast, stable, and you probably can’t get much better than […]
In part 2 of our Scylla and Spark series, we will delve more deeply into the way data transformations are executed by Spark, and then move on to the higher-level SQL and DataFrame interfaces.
This article will shed light on the performance penalties of running Scylla on Docker and discuss the tuning steps to solve them.
Spark and Scylla Welcome to part 1 of an in-depth series of posts revolving around the integration of Spark and Scylla. In this series, we will delve into many aspects of a Spark and Scylla solution: from the architectures and data models of the two products, through strategies to transfer data between them and up to optimization techniques and operational best practices. The series will include many code samples which you are encouraged to run locally, modify and tinker with. The Github repo contains the docker-compose.yaml file which you can use to easily run everything locally. In this post, we […]