Scylla Summit 2019 Training Day, a Recap
Our training day at last year’s Scylla Summit had our biggest classes to date. With all the material available online at Scylla University, along with hands-on exercises and quiz questions, trainees were able to sharpen their Scylla and NoSQL skills.
Let’s go through a recap of what was covered at the Training Day in both the Novice and Advanced tracks. And don’t worry if you missed it! We’ve embedded links to related content in our documentation, blogs and tech talks, and you can even see many of the videos from the Training Day sessions directly in Scylla University.
- In the Architecture session, I covered basic topics such as the CAP theorem. In Scylla, high availability is given preference over consistency. The design goals for Scylla were: High Availability, High Scalability, High Performance, Low Maintenance and being API-level compliant with Cassandra (and now DynamoDB). I explained concepts such as Node, Keyspace, Consistency Level, Replication Factor, Gossip, Token Ranges, Cluster, Shards, vNodes and more.
- Tomer Sandler, Scylla’s Technical Customer Success Manager, talked about Data Modeling and how to do it correctly in Scylla. In Scylla, as opposed to relational databases, the data model is based around the queries and not just around the domain entities. When creating the data model, we take into account both the conceptual data model and the application workflow: which queries will be performed by which users and how often. Tomer talked about Keyspaces, Tables, Partitions, the Clustering and Partition key, Datatypes and using drivers.
- The next session, also by Tomer Sandler, dealt with Migrating to Scylla. Tomer discussed offline and online migrations and actual steps with an example of how to perform each one. He also talked about how to use Kafka for the migration, the Scylla Spark Migrator, different options for migrating existing data and some best practices for migration. Some of Tomer’s recommendations were: creating a backup of the current database, cleaning up the current database from stale data, taking into account the fact that migration can take days with very large data sets, performing a test drive by migrating a small part of the data and validating it before executing a full migration and testing the rollback procedure.
- ScyllaDB solution architect Moreno Garcia covered Basic Admin Procedures and Monitoring. He discussed the tools Scylla uses to work, test and monitor the nodes and cluster performance. Among them: Nodetool, logging, CQLsh, the Scylla Monitoring Stack, Cassandra-stress, and tracing. Moreno also covered some basic procedures such as removing a node and checking the cluster status. He showed an example of how to check slow queries and how to solve some common issues.
- Glauber Costa, ScyllaDB Distinguished Engineer, talked about Repair, Tombstones and Scylla Manager. He introduced the topic of repairs, the different kinds of repairs and why and when they are needed. Tombstones are written when data is deleted. They are generally removed by compaction once gc_grace_seconds has elapsed. Data resurrection can be an issue, but there are steps to prevent it, which Glauber covered. The next topic was Scylla Manager, a Cluster Operations Automation Tool (COAT). It enables centralized cluster administration and automation of recurrent tasks such as repairs and backups.
- ScyllaDB Solution Architect Maheedhar Gunturu covered some Advanced Topics, including Advanced Data Modeling, Materialized Views and Secondary Indexes (MV + 2i), and Compaction.
- Tzach Livyatan, our VP of Products, kicked off the advanced track with a talk on Advanced Data Modeling. He started with a recap of some basic data modeling concepts. Tzach gave an example of a large partition, how it’s created, how it can be tracked and how better data modeling can mitigate the problem. The following topic was Counters. They are a Conflict-free Replicated Data Type (CRDT). Concurrent updates converge to a stable value. Counters support increment and decrement and are implemented as a set of triplets (node ID, vector clock, value). Next, Tzach talked about Sets, Lists, and Maps, giving an example of each type of collection. Then he talked about User-Defined Types (UDTs) and presented some code showing how they can be used. Finally, Tzach discussed Time To Live (TTL) and how and when to use it.
- The next talk was by Moreno Garcia, covering Materialized Views and Secondary Indexes (MV + 2i). A View is a table containing a copy of the results of some query performed on a base table. Some common use cases are indexing with denormalization, different sort order, and filtering (pre-computed queries). Moreno went on to give an example of an MV and to discuss what actually happens when an MV is created and how Materialized Views are implemented in Scylla, before moving on to the next topic, Global Secondary Indexes. A Global Secondary Index is a table containing a copy of the key of a base table, with a single partition key corresponding to the indexed column. Moreno gave an example of how to use them and explained how they are implemented under the hood. Finally, Moreno covered a new feature, Local Secondary Indexes, and gave guidance on when to use each of the above.
- Glauber Costa gave a talk about Compaction Strategies, including the all-new Incremental Compaction Strategy (ICS). The talk started with the Scylla storage write path and some general concepts required to understand the different compaction strategies. Glauber then went on to explain Size-Tiered Compaction Strategy (STCS), Leveled Compaction Strategy (LCS), Time-Window Compaction Strategy (TWCS), and the new Incremental Compaction Strategy (ICS), unique to Scylla. Finally, he explained use cases and which strategy to use.
- Next was a talk by Dan Yasny about Cluster Management, Repair, and Scylla Manager. Dan started by talking about Scylla Monitoring Stack, how it can be deployed (Docker / Native), the alerts it generates, and how to identify common Scylla pitfalls using the monitoring stack. Next, he talked about Scylla Manager, starting from an intro and how it works, then discussing different repair types, why they are needed and different approaches to repair.
- ScyllaDB VP of R&D Shlomi Livne’s talk was titled Advanced Monitoring + Maximize Performance. He easily won the most extensive slide deck competition, with 122 slides! Shlomi shared his experience in discovering and solving performance and other issues and pitfalls. Some of the topics he covered were: how to monitor Scylla, the Monitoring and Manager dashboards, how to debug an issue, stalls, memory management, scheduling, disk scheduling, CPU scheduling, Workload Prioritization, Controllers and Backpressure vs Overload.
- Finally, Yannis Zarkadas, winner of our Scylla Community Member of the Year award, presented the Kubernetes Scylla Operator. The talk used an example to show how it’s possible to leverage Kubernetes to write a great management layer for Scylla. Yannis explained some core Kubernetes principles and the design and features of the Scylla Operator. He then led a hands-on exercise in a playground environment. Finally, Yannis talked about how to achieve high performance in production.
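To make the Counters description from Tzach's talk a bit more concrete, here is a minimal Python sketch of a counter kept as a set of (node ID, clock, value) triplets. It is only an illustration of the convergence idea, not Scylla's actual implementation: the function names are hypothetical, and a single per-node logical clock stands in for the vector clock mentioned in the talk.

```python
from typing import Dict, Tuple

# Hypothetical model: one shard per node, stored as (logical_clock, value).
Shard = Tuple[int, int]
CounterState = Dict[str, Shard]


def increment(state: CounterState, node_id: str, delta: int) -> CounterState:
    """Return a new state in which node_id applied delta to its own shard."""
    clock, value = state.get(node_id, (0, 0))
    updated = dict(state)
    # Each node only ever advances its own shard, bumping its clock.
    updated[node_id] = (clock + 1, value + delta)
    return updated


def merge(a: CounterState, b: CounterState) -> CounterState:
    """Merge two replica states: per node, keep the shard with the higher clock."""
    merged = dict(a)
    for node_id, shard in b.items():
        if node_id not in merged or shard[0] > merged[node_id][0]:
            merged[node_id] = shard
    return merged


def total(state: CounterState) -> int:
    """The counter's value is the sum of all per-node shard values."""
    return sum(value for _, value in state.values())
```

Because the merge keeps the newest shard per node, it gives the same result regardless of the order in which replicas exchange state, which is why concurrent increments and decrements converge to a single value.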
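The relationship between tombstones, gc_grace_seconds and repair from Glauber's session can also be sketched in a few lines of Python. This is a simplified model under stated assumptions: the class and function names are hypothetical, timestamps are plain seconds, and the real purge decision during compaction involves more checks than this single comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tombstone:
    deletion_time: int  # seconds since epoch when the delete was written


def is_purgeable(tombstone: Tombstone, gc_grace_seconds: int, now: int) -> bool:
    """A compaction may drop the tombstone only after gc_grace_seconds has passed."""
    return now > tombstone.deletion_time + gc_grace_seconds


def repair_is_overdue(last_repair_time: int, gc_grace_seconds: int, now: int) -> bool:
    """If no repair completed within gc_grace_seconds, a replica that missed
    the delete may still hold the old data after the tombstone is purged,
    which is how deleted data can be resurrected."""
    return now - last_repair_time > gc_grace_seconds
```

In this simplified view, running repairs more often than gc_grace_seconds is what keeps purged tombstones from leading to data resurrection, and scheduling those repairs is one of the recurrent tasks Scylla Manager automates.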
Thanks to all the participants! You can view some of the videos from the training day along with more training material in our new course, Scylla Operations.
Participants who completed the training day received a certificate. You can also receive certificates for courses you complete at Scylla University. These can be shared on your LinkedIn profile. If you haven’t done so yet, create a user account and start learning. It’s free!
Tags: cassandra-stress, compaction, counters, Dan Yasny, data modeling, Glauber Costa, Guy Shtub, Incremental Compaction Strategy, materialized views, Moreno Garcia, nodetool, Scylla Manager, Scylla Monitoring Stack, Scylla Summit, Scylla Summit 2019, Scylla University, secondary inde, secondary indexing, Shlomi Livne, tombstones, Tomer Sandler, UDTs, User Defined Types, Workload Prioritization, Yannis Zarkadas