ScyllaDB is all about performance, so it’s no surprise that this was the theme of day two at our ScyllaDB Summit.
Joseph Fullop of Los Alamos National Laboratory kicked things off with a discussion of the evolution of high-performance computing (HPC). Then Miguel Martinez Pedreira of CERN turned the story of ALICE (A Large Ion Collider Experiment) into one about migrating to Apache Cassandra/ScyllaDB, and walked us through the performance and scalability improvements to the experiment's main database. Did you know that the Large Hadron Collider's 600 million collisions per second translate into 100,000 to 150,000 concurrent jobs on ALICE?
Keeping with the theme, we then heard from Shlomi Livne, ScyllaDB's VP of R&D, about planning your queries for maximum performance. Want to know the rules? 1) Use prepared statements; 2) Use paging; 3) Use the correct page size; 4) Beware of multi-partition CQL IN queries; 5) Beware of single-partition CQL IN queries; 6) Know the faster way to do full scans (and use it!); 7) Use the tools!
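The recap doesn't spell out which full-scan technique the talk recommended. One widely used approach in Cassandra-compatible databases is to split the token ring into many contiguous sub-ranges and scan each one with its own `token()`-bounded query, which lets the sub-ranges run in parallel. The minimal sketch below assumes the Murmur3 partitioner (tokens span the signed 64-bit range); the keyspace, table, and column names are hypothetical. It only builds one prepared-statement CQL string and the bounds to bind to it, following rule 1. Running the queries concurrently through a driver is left out.

```python
# Assumption: Murmur3 partitioner, whose tokens span the signed 64-bit range.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(n_ranges):
    """Split the full token ring into n_ranges contiguous, non-overlapping
    inclusive (start, end) ranges that together cover every token."""
    if n_ranges < 1:
        raise ValueError("n_ranges must be >= 1")
    step = (MAX_TOKEN - MIN_TOKEN + 1) // n_ranges  # 2**64 tokens in total
    ranges = []
    start = MIN_TOKEN
    for i in range(n_ranges):
        # The last range absorbs any remainder so nothing is skipped.
        end = MAX_TOKEN if i == n_ranges - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def full_scan_statements(keyspace, table, partition_key, n_ranges):
    """Return one CQL string with bind markers plus the (start, end) bounds
    to bind to it, one pair per sub-range. Names are hypothetical."""
    cql = (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) >= ? AND token({partition_key}) <= ?"
    )
    return cql, split_token_ring(n_ranges)


if __name__ == "__main__":
    cql, bounds = full_scan_statements("demo_ks", "events", "event_id", 4)
    print(cql)
    for start, end in bounds:
        print(start, end)
```

Using many more sub-ranges than there are nodes, and executing them with bounded concurrency, is the usual way to keep every shard busy without overloading any single one.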
Speaking of tools, the next presentation was "ScyllaDB on Kubernetes" by ScyllaDB developer Jesse Haber-Kucharsky. He gave a great high-level view of Kubernetes before walking through an excellent and accessible demo of a small ScyllaDB cluster running on Google Compute Engine via Kubernetes. You can find the Docker images we used here.
Arash Rezaei, Senior Performance Architect at Samsung, talked about ScyllaDB performance and tuning on Samsung's latest NVMe SSDs, with a focus on throughput vs. latency. Meanwhile, Confluent's Hojjat Jafarpour gave a concise rundown of streaming Extract, Transform, and Load (ETL) on Apache Kafka. He covered the basics of KSQL and illustrated streaming ETL with an applied web analytics demo.
Key to achieving benchmark performance is a state-of-the-art QA process that tests not just functionality but also reliability, performance, and scalability. ScyllaDB's QA manager Roy Dahan shared what ScyllaDB is doing to meet and exceed these goals. Next, Databricks' Burak Yavuz taught us about "Stateful Streaming Applications with Apache Spark," with a focus on structured and stateful processing. Andrej Chu of Rocket Fuel talked about how they currently use ScyllaDB for Page Context Categorization, and explained how ScyllaDB's focus on high availability drove Rocket Fuel's decision to adopt it.
After live demos and lunch, we heard from Twitter's Boaz Avital about how they manage 10K-node storage clusters. Twitter built, and still runs, Manhattan, a database whose architecture is very similar to ScyllaDB's. The lessons were many, but the upshot is that running a storage service at that scale required custom tooling that does the thinking so your operators don't have to! We then heard from ScyllaDB's Raphael Carvalho and Nadav Har'El about the best way to ruin your performance: choosing the wrong compaction strategy! Compaction is scary, and learning about anti-patterns (the wrong way to do a thing) is very informative. The duo wrapped up their talk with a hybrid compaction strategy that combines the best aspects of the approaches discussed earlier.
Zen.ly, a social map app, recently migrated from Elasticsearch to ScyllaDB. Their head of infrastructure, Jean-Baptiste Dalido, explained how the team has been running ScyllaDB for the past eight months, what their migration from Elasticsearch looked like, and their reasons for choosing ScyllaDB. After the break, we heard from Brian Hawkins of Proofpoint about how to build a time-series database, including recommended schemas (useful to anyone modeling their data). He ended with a discussion on whether or not ScyllaDB is the right stuff for time series. Conclusion? It truly is.
Rounding out the afternoon, Avi Kivity discussed tools for understanding ScyllaDB in the field, including debugging slow queries, troubleshooting data models on nodes, and isolating bottlenecks. Following up, ScyllaDB solution architect Alexander Sicular walked the group through a zero-downtime migration from Cassandra to ScyllaDB. On a related note, Eyal Gutkind, another ScyllaDB solution architect, showed us how to optimize deployments, save money, and reduce traffic between data centers. We also heard from ScyllaDB's principal architect, Glauber Costa, about repair, backup, and restore, a key piece of the ScyllaDB management puzzle.
We wrapped up the event with Shlomi Livne discussing the performance advantages of user-defined types (UDTs), and finally with a lightning talk on scalable secondary indexes from ScyllaDB software engineer Pekka Enberg.
All in all, the content packed a punch. The slide presentations are now available on SlideShare. The rest of the content from our 2017 Summit, including emerging feature documentation, videos, user interviews, and demos, will be available online, so check back soon!