ScyllaDB is all about performance, so it’s no surprise that this was the theme of day two at our Scylla Summit.
Joseph Fullop of Los Alamos National Labs kicked things off with a discussion on the evolution of high-performance computing (HPC). Miguel Martinez Pedreira of CERN then turned a discussion of ALICE (A Large Ion Collider Experiment) into one about migrating to Apache Cassandra/Scylla, covering the performance and scalability improvements to their main database. Did you know that the Large Hadron Collider’s 600 million collisions per second translate into 100-150k concurrent jobs on ALICE?
Keeping with the theme, we then heard from Shlomi Livne, ScyllaDB’s VP of R&D, about planning your queries for maximum performance. Want to know the rules? 1) Use prepared statements; 2) Use paging; 3) Use the correct page size; 4) Beware of multi-partition CQL IN queries; 5) Beware of single-partition CQL IN queries; 6) Know the faster way to do full scans (and use it!); 7) Use the tools!
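Rules 1 and 6 can be illustrated together with a small sketch. The recap does not spell out which full-scan technique the talk recommended, so this is an assumption: a commonly described approach is to split the token ring into contiguous subranges and scan them in parallel with a single prepared statement. The code assumes the default Murmur3 partitioner (signed 64-bit tokens), and the keyspace `demo`, table `events`, and partition key `id` are hypothetical names. It only computes the subranges and does not talk to a cluster.

```python
# Sketch only: assumes the default Murmur3 partitioner, whose tokens are
# signed 64-bit integers. Keyspace, table, and column names are hypothetical.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

# One statement with bind markers, so it can be prepared once (rule 1)
# and then executed once per subrange (rule 6).
SCAN_CQL = (
    "SELECT * FROM demo.events "
    "WHERE token(id) >= ? AND token(id) <= ?"
)


def token_subranges(n):
    """Return n contiguous, inclusive (start, end) token ranges covering the ring."""
    if n < 1:
        raise ValueError("n must be at least 1")
    total = MAX_TOKEN - MIN_TOKEN + 1  # number of tokens in the ring (2**64)
    ranges = []
    start = MIN_TOKEN
    for i in range(n):
        # Integer arithmetic keeps the boundaries exact; the final range
        # always ends at MAX_TOKEN.
        end = MIN_TOKEN + (total * (i + 1)) // n - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


if __name__ == "__main__":
    # Each pair would be bound to the prepared statement as (lo, hi).
    for lo, hi in token_subranges(4):
        print(SCAN_CQL, (lo, hi))
```

With a driver, each `(lo, hi)` pair would be bound to the prepared statement and the executions run concurrently. How many subranges to use is a tuning choice, not something stated in this recap.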
Speaking of tools, the next presentation was “Scylla on Kubernetes” by ScyllaDB developer Jesse Haber-Kucharsky. He gave a great high-level view of Kubernetes before walking through an excellent and accessible demo of a small Scylla cluster running in Google Compute Engine via Kubernetes. You can find the Docker images we used here.
Arash Rezaei, Senior Performance Architect at Samsung, talked about Scylla performance and tuning on Samsung’s latest NVMe SSDs with a focus on throughput vs. latency. Meanwhile, Confluent’s Hojjat Jafarpour shared a concise rundown on streaming Extract, Transform, and Load (ETL) on Apache Kafka. He covered the basics of KSQL and explained ETL using an applied web analytics demo.
Key to achieving benchmark performance is building a state-of-the-art QA process that focuses not just on functionality, but also on reliability, performance, and scalability. Scylla’s QA manager Roy Dahan shared what Scylla is doing to meet and exceed these benchmarks. Next, Databricks’ Burak Yavuz taught us about “Stateful Streaming Applications with Apache Spark,” with a focus on structured and stateful processing. Andrej Chu of Rocket Fuel talked about how they currently use Scylla for Page Context Categorization, and how Scylla’s focus on high availability informed Rocket Fuel’s decision to go with it.
After live demos and lunch, we heard from Twitter’s Boaz Avital about how they have been managing 10K-node storage clusters. Twitter both created and currently uses Manhattan, a database whose architecture is very similar to Scylla’s. The lessons were many, but the upshot is that building a service required them to make custom tooling that thinks so that your operators don’t have to! We then heard from ScyllaDB’s Raphael Carvalho and Nadav Har’El about the best way to ruin your performance: choosing the wrong compaction strategy! Compaction is scary, and learning about anti-patterns (the wrong way to do a thing) is very informative. The duo wrapped up their talk with a hybrid compaction strategy that combines the best aspects of the approaches previously discussed.
Zen.ly, a social map app, recently migrated from Elasticsearch to Scylla. Their head of infrastructure, Jean-Baptiste Dalido, explained how the team has been running Scylla for the past eight months, what their migration from Elasticsearch looked like, and their reasons for choosing Scylla. After the break, we heard from Brian Hawkins of Proofpoint about how to build a time-series database, including recommended schemas (useful to anyone modeling their data). He ended with a discussion on whether or not Scylla is the right stuff for time series. Conclusion? It truly is.
Rounding out the afternoon, Avi Kivity discussed tools for understanding Scylla in the field, including debugging slow query performance, troubleshooting data models on nodes, and how to isolate bottlenecks. Following up, Scylla’s solution architect Alexander Sicular walked the group through a migration to Scylla from Cassandra with no downtime. On a related note, Eyal Gutkind, another Scylla solution architect, taught us how to optimize, save money, and reduce inter-data center traffic. We also heard from Scylla’s principal architect, Glauber Costa, about repair, backup and restore – a key piece of the Scylla management puzzle.
We wrapped up the event with Shlomi Livne discussing the performance advantages of user-defined types (UDTs) and finally with Scylla’s software engineer Pekka Enberg giving his lightning talk on scalable secondary indexes.
All in all, the content packed a punch. The slide presentations are now available on SlideShare. The rest of the content from our 2017 Summit, including emerging feature documentation, videos, user interviews, and demos, will be available online, so check back soon!