Kiwi.com Takes Flight on Scylla
Kiwi.com is an online flight booking platform that builds customized travel itineraries by assembling flight combinations from multiple airlines. Using this approach, Kiwi.com saves travellers money on airline tickets by generating itineraries that mix and match global airlines with local carriers, finding the best price for the trip as a whole.
At Scylla Summit 2018, we were joined by two speakers from Kiwi.com who covered both the technical and business aspects of their migration from Cassandra to Scylla. Their topics included the Cassandra to Scylla migration, benchmarking on bare metal instances from two popular cloud providers, and an analysis of performance results that focused on full table scans.
In his presentation, Jan Plhak, Head of C++ Development, discussed how the nature of Kiwi.com’s data creates scaling challenges. Kiwi.com stores data on 100,000 flights a day, and 35 million flights a year. That’s not much data. In fact, as Jan pointed out, your phone can store that. What makes it challenging is that Kiwi.com stores flight combinations. This results in 7 billion flight entries, and a replicated dataset of 20 terabytes. A phone can’t store that.
With this background, Jan related Kiwi.com’s journey to Scylla. The team initially used PostgreSQL, but in order to scale, PostgreSQL required custom sharding, with 60 database instances and 60 Redis caches. Jan ironically described this topology as ‘pure joy’.
The team reasoned that a NoSQL database was more appropriate to their use case, so they turned to Apache Cassandra. Ultimately, Cassandra proved unable to scale, even as the team added more and more nodes. Even worse, the team was required to write custom code to read Cassandra SSTables, creating problems with maintenance and upgrades.
“If you’re considering moving from Cassandra to Scylla, I don’t know what’s holding you back!”
Martin Strycek, Engineering Manager, Kiwi.com
Massive Full Table Scans
After discussing the journey to Scylla, Jan went into some detail about the requirements for full table scans, and why Cassandra was not up to the task. Cassandra’s limitations forced the team to implement a custom scanning service to read newly created SSTables and stream updates to the cache. Scylla, by contrast, made it easy and safe to run performant full table scans.
Kiwi.com’s precomputation engine requires all of the data, updated every hour. That load, combined with secondary production and testing, put a strain on the production databases. With Cassandra, the team saw CPU overload and massive latency spikes. Jan ascribed this to Cassandra’s underlying Java implementation, as well as the inability to write a query that would read only the most recently updated data.
The Kiwi.com team attempted a Cassandra workaround. Since Cassandra stores data in immutable SSTables, they could create a service to parse new SSTables, and then stream that data to the cache. The data from the cache could in turn be used to feed the precomputation engine while sidestepping Cassandra. Jan described this workaround as ‘opening a Pandora’s box’.
Luckily, Scylla made it possible to close this Pandora’s box for good. Scylla enables continuous full table scans that filter on a last-update timestamp. Scylla can also handle scans over token ranges without overloading. This solved many of Kiwi.com’s problems, in particular the need to build workarounds on Cassandra’s internal, undocumented, unsupported format.
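To make the idea of a token-range scan concrete, here is a minimal sketch of how a client might split the token ring into subranges and build one scan query per subrange, each filtered on a last-update timestamp. This is an illustration, not Kiwi.com's actual implementation. It assumes the Murmur3 partitioner's signed 64-bit token space, and the keyspace `flights`, table `combinations`, partition key `id`, and column `updated_at` are all hypothetical names. The `%s` placeholders follow the style the Python driver uses for non-prepared statements, and filtering on a regular column is assumed to require `ALLOW FILTERING`.

```python
"""Sketch: split the token ring into subranges and build a CQL scan query."""

# Token bounds of the Murmur3 partitioner (signed 64-bit integers).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_ring(num_ranges):
    """Return num_ranges contiguous (start, end] pairs covering the whole ring."""
    if num_ranges < 1:
        raise ValueError("num_ranges must be at least 1")
    span = MAX_TOKEN - MIN_TOKEN
    # Integer arithmetic keeps the boundaries exact for 64-bit tokens.
    bounds = [MIN_TOKEN + (span * i) // num_ranges for i in range(num_ranges + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def build_scan_query(keyspace, table, partition_key, timestamp_column):
    """Build a CQL query for one token subrange, filtered by update time.

    All identifiers passed in are hypothetical; adapt them to the real schema.
    """
    return (
        f"SELECT * FROM {keyspace}.{table} "
        f"WHERE token({partition_key}) > %s AND token({partition_key}) <= %s "
        f"AND {timestamp_column} >= %s ALLOW FILTERING"
    )


if __name__ == "__main__":
    query = build_scan_query("flights", "combinations", "id", "updated_at")
    for start, end in split_token_ring(4):
        # Each (start, end] pair can be scanned independently and in parallel.
        print(query, (start, end))
```

Because each subrange is independent, a scanner can issue many small queries in parallel and cap its concurrency, rather than sending one huge query that a single coordinator has to serve.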
The Migration from Cassandra
Martin Strycek, Engineering Manager at Kiwi.com, spoke about the migration process from Cassandra to Scylla, and provided some context involving TCO. Martin said that Kiwi.com first migrated to Cassandra from a big PostgreSQL cluster to get better performance and scalability, but their demands never stopped growing.
Martin covered the way his team approached testing of Scylla, the migration plan, its impact on the business, and Kiwi.com’s high-level application and infrastructure architecture. In Martin’s view, Scylla has had a significant impact on disaster recovery and availability of the overall system.
According to Martin, Kiwi.com quickly settled on Scylla as a drop-in replacement for Cassandra, but they wanted to prove it out under real-life conditions before making the leap. With a healthy scepticism for vendor benchmarks, Kiwi.com set out to independently evaluate Cassandra versus Scylla. To do so, the team defined equivalent configurations, traffic volumes, and workloads based on the Cassandra benchmark.
The goal was to test Scylla’s raw speed and performance, along with Scylla’s support for Kiwi.com’s specific workloads. They also wanted some insight into running on bare metal or on a cloud platform, testing GCP versus OVH, a popular cloud provider in Europe. The final goal of the POC was to evaluate Scylla’s cost relative to the Cassandra cluster they were running.
Overall, Martin used three approaches to testing:
- synthetic benchmarks
- shadowing production traffic
- internal benchmarking tool for reads
Kiwi.com worked closely with the Scylla team to establish success criteria for the POC. Once the test bed of five nodes each was set up, Kiwi.com ran a set of synthetic benchmarks, shadowed production traffic, and used internal monitoring tools for reads.
Their tests demonstrated a stark difference between the two databases. With a replication factor of 4, Cassandra required 100 nodes to achieve 40K reads per second. With only 21 nodes, Scylla was able to achieve 900K reads per second.
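To put those figures on a per-node basis, the short calculation below divides each cluster's read throughput by its node count. It uses only the numbers quoted above. The variable and function names are illustrative, and the result is a rough comparison that ignores differences in hardware between the two clusters.

```python
"""Sketch: per-node read throughput from the figures quoted in the talk."""

CASSANDRA_NODES, CASSANDRA_READS_PER_SEC = 100, 40_000
SCYLLA_NODES, SCYLLA_READS_PER_SEC = 21, 900_000


def reads_per_node(reads_per_sec, nodes):
    """Average reads per second served by each node in the cluster."""
    return reads_per_sec / nodes


cassandra_per_node = reads_per_node(CASSANDRA_READS_PER_SEC, CASSANDRA_NODES)
scylla_per_node = reads_per_node(SCYLLA_READS_PER_SEC, SCYLLA_NODES)

# Ratio of whole-cluster throughput and of per-node throughput.
cluster_ratio = SCYLLA_READS_PER_SEC / CASSANDRA_READS_PER_SEC
per_node_ratio = scylla_per_node / cassandra_per_node

print(f"Cassandra: {cassandra_per_node:.0f} reads/s per node")
print(f"Scylla:    {scylla_per_node:.0f} reads/s per node")
print(f"Cluster throughput ratio:  {cluster_ratio:.1f}x")
print(f"Per-node throughput ratio: {per_node_ratio:.1f}x")
```

On these numbers, the Scylla cluster delivered 22.5 times the total read throughput with roughly a fifth of the nodes, which works out to about 107 times the throughput per node.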
Best of all, Kiwi.com discovered that the running cost of Scylla would be about 25% of the cost of Cassandra. Martin provided a detailed breakdown of the hardware costs of running Cassandra versus Scylla, on bare metal and Google Cloud Platform:
A comparison of Kiwi.com’s hardware costs between Cassandra and Scylla on cloud platforms
Having made the decision to go with Scylla, the team undertook the migration to GCP and OVH instances running in multiple cities and geographical regions. In fact, Martin’s team installed the final server in the Scylla cluster just before the presentation, displaying shadow traffic from the live system.
Martin pointed out that Kiwi.com is also excited about Scylla’s roadmap. The ability to prioritize production traffic over analytics will be a huge advantage, since the many algorithms that Kiwi.com runs against the Scylla clusters will have no discernible impact on the customer experience.
Martin wrapped up his Scylla Summit talk by encouraging the audience to “never stop innovating”, stating, “This is the bottom line. If you are considering going from Cassandra to Scylla, I don’t know why you didn’t do that last week!”
You can watch Jan’s full presentation (with slides), Kiwi.com Takes Flight with Scylla, and Martin’s Kiwi.com’s Migration to Scylla: The Why, the How, the Fails and the Status, from Scylla Summit 2018 in our Tech Talks section. And if you enjoy these in-depth technical insights from Kiwi.com as well as other NoSQL industry leaders, this is also a good reminder that registration is now open for Scylla Summit 2019.
Register Now for Scylla Summit 2019!
If you enjoyed reading about Kiwi.com’s use case, and want to learn more about how to get the most out of Scylla and your big data infrastructure, sign up for Scylla Summit 2019, coming up this November 5-6, 2019 in San Francisco, California.