In our latest Summer Tech Talks series webinar, ScyllaDB Field Engineer Juliana Oliveira guided virtual attendees through a series of best practices on data modeling for Scylla.
Repair is one of several anti-entropy mechanisms in Scylla. It is used to synchronize data across replicas. In this post, we introduce a new repair algorithm coming with Scylla Open Source 3.1 that improves performance by operating at the row level, rather than across entire partitions.
Holden Karau is an open source developer advocate at Google. In her talk, Holden provided an overview of Spark and how it can fail and, based on those different failures, outlined a number of strategies for recovering pipelines.
With continued and growing interest in Apache Spark, we had two speakers present at Scylla Summit 2018 on the topic. This is the first of a two-part article, covering the talk by ScyllaDB’s Eyal Gutkind. The second part covers the talk by Google’s Holden Karau. With businesses demanding more actionable insight and ROI out of their big data, it is no surprise that analytics are a common workload on Scylla. Nor is it a surprise that Spark is a perennial favorite on the Scylla Summit agenda, and our annual gathering last year proved to be no exception. The focus was […]
In this post we introduce the new Scylla workload prioritization mechanism, explain the vision behind the feature and how it is implemented and, most importantly, show you test results of how it performs in a real-world setting.
Anyone who’s tried to build such a solution knows that one of the chief difficulties is encompassing the sheer number and complexity of existing data sources. In order to deliver a true solution, we need to be able to bring this disparate data together. A graph data system, built with JanusGraph and backed by the power of Scylla, is a great fit for solving this problem.
Scylla’s March 2019 webinar on database migration drew broad interest, and the subject will likely remain popular for years to come. So, you’ve decided to adopt Scylla (or Cassandra). What’s the best way to get your Big Data uploaded into your new cluster? What strategies, tools and techniques can you use to get your terabytes or petabytes from point A to point B? Those were the questions of the day for Dan Yasny, Field Engineer at ScyllaDB.
Another week, another Spark and Scylla post! This time, we’re back again with the Scylla Spark Migrator; we’ll take a short tour through its innards to see how it is implemented. Read why we implemented the Scylla Spark Migrator in this blog. Overview: When developing the Migrator, we had several design goals in mind. First, the Migrator should be highly efficient in terms of resource usage. Resource efficiency in the land of Spark applications usually translates to avoiding data shuffles between nodes. Data shuffles are destructive to Spark’s performance, as they incur more I/O costs. Moreover, shuffles usually get slower […]
We covered the basics of Elasticsearch and how Scylla is a perfect complement for it in part one of this blog. Today we want to give you specific how-tos on connecting Scylla and Elasticsearch, including use cases and sample code. Use Case #1: If combining a persistent, highly available datastore with a full-text search engine is a market requirement, then implementing a single, integrated solution is an ultimate goal that requires time and resources. To answer this challenge we describe below a way for users to use best-of-breed solutions that support full text […]
We needed a Python interpreter that can be shipped everywhere. You won’t believe what happened next! “When I said I wanted portable Python, this is NOT what I meant!” In theory, Python is a portable language. You can write your script locally and distribute it to other machines with the Python interpreter. In practice, things can go wrong for a variety of reasons. The first and simpler problem is the module system: for a script to run, all of the modules it uses must be installed. For Python-savvy users, installing them is not a problem. But for a software vendor […]