Wireshark is a well known utility that offers snooping on all kinds of network protocols. You can use it to dissect Scylla’s internal protocol, including reading and writing rows, exchanging schema information, gossiping and repairs.
After implementing both APIs — CQL and DynamoDB — we, the Scylla developers, are in a unique position to be able to provide an unbiased technical comparison between the two APIs. The goal of this post is to explain some of the more interesting differences between the two APIs, and how these differences affect users and implementers of these APIs.
Today we are releasing a new data integrity testing suite for the open source community. Those who will have the most direct utility for this software will be those testing Scylla and Cassandra databases, or, more broadly, other CQL-compliant databases.
Another week, another Spark and Scylla post! This time, we’re back again with the Scylla Spark Migrator; we’ll take a short tour through its innards to see how it is implemented. Read why we implemented the Scylla Spark Migrator in this blog. Overview When developing the Migrator, we had several design goals in mind. First, the Migrator should be highly efficient in terms of resource usage. Resource efficiency in the land of Spark applications usually translates to avoiding data shuffles between nodes. Data shuffles are destructive to Spark’s performance, as they incur more I/O costs. Moreover, shuffles usually get slower […]
CHECK OUT PART ONE OF THIS BLOG We covered the basics of Elasticsearch and how Scylla is a perfect complement for it in part one of this blog. Today we want to give you specific how-tos on connecting Scylla and Elasticsearch, including use cases and sample code. Use Case #1 If combining a persistent, highly available datastore with full text search engine is a market requirement, then implementing a single, integrated solution is an ultimate goal that requires time and resources. To answer this challenge we describe below a way for users to use best-of-breed solutions that support full text […]
In the run-up to Scylla Summit 2018, we’ll be featuring our speakers and providing sneak peeks at their presentations. This interview in our ongoing series is with Hojjat Jafarpour of Confluent. His presentation at Scylla Summit is entitled Scalable Stream Processing with KSQL, Kafka and Scylla. Hojjat, before we get into your talk, tell us a little about yourself. What do you like to do for fun? I’m a Software Engineer at Confluent and the creator of KSQL, the Streaming SQL engine for Apache Kafka. In addition to challenging problems in scalable data management, I like traveling and outdoors. We are so […]
The Scylla team is pleased to announce the release of Scylla 2.3, a production-ready Scylla Open Source minor release. The Scylla 2.3 release includes CQL enhancements, new troubleshooting tools, performance improvements and more. Experimental features include Materialized Views, Secondary Indexes, and Hinted Handoff (details below). Starting from Scylla 2.3, packages are also available for Ubuntu 18.04 and Debian 9. Scylla is an open source, Apache Cassandra-compatible NoSQL database, with superior performance and consistently low latency. Find the Scylla 2.3 repository for your Linux distribution here. Our open source policy is to support only the current active release and its predecessor. […]
Large partitions, although supported by Scylla, are also well known for causing performance issues. Fortunately, release 2.3 comes with a helping hand for discovering and investigating large partitions present in a cluster — system.large_partitions table. Large partitions CQL, as a data modeling language, aims towards very good readability and hiding unneeded implementation details from users. As a result, sometimes it’s not clear why a very simple data model suffers from unexpected performance problems. One of the potential suspects might be large partitions. Our blog entry on large partitions contains a detailed explanation on why coping with large partitions is important. […]
What’s the deal with prepared statements? A query itself is just a string of text. For example: INSERT INTO tb (key,val) VALUES (“key”, “value”) In this simple example, we inserted two strings in a two-column table. Before that can happen, the CQL statement string (INSERT INTO…) needs to be sent to Scylla, parsed, and assuming no errors in the query, executed. It’s the parsing part that we are concerned with here. Parsing a CQL query is a compute-intensive operation that consumes resources just like anything else you would have a computer do. What if we could do the parsing part […]
At ScyllaDB, our development team is all about performance with improved latency and throughput. Our speakers at our recent Scylla Summit provided many tips and tricks to make Scylla’s superior latency and performance even better. ScyllaDB’s VP of R&D, Schlomi Livne, added to the growing repertoire of these tips with his talk Planning your queries for maximum performance. In it, he outlined some of the how and why of Scylla performance, and concluded with seven rules to optimize your queries.