Outbrain is the world’s largest content discovery platform, with over 557 million unique visitors and 250 billion personalized content recommendations every month. One of the key components of Outbrain’s content delivery path is the document database, which holds 2 billion records.
Why Outbrain initially chose Apache Cassandra
- Outbrain delivers content 24×7, literally around the globe
- Apache Cassandra’s strong high availability, in particular its cross-data-center high availability, helps minimize the risk of downtime
- Apache Cassandra’s rich query language (CQL) allows it to be used for a variety of use cases, from Key/Value to time series data
Our challenges with the Apache Cassandra decision and implementation
- Apache Cassandra latency issues forced us to put an in-memory cache (Memcache) in front of Apache Cassandra, which made the overall solution more complex and made consistency harder to achieve while maintaining high availability. Even with caching, latency was not satisfactory
- Reaching the required throughput with Apache Cassandra takes a large number of nodes, resulting in large clusters that are complex to manage and costly to run
- Replacing the cluster without further testing was not an option, so we looked for a way to run both DBs next to each other for testing
The Scylla challenge
To evaluate Scylla as a replacement for Apache Cassandra in the document use case, Outbrain ran a Scylla cluster next to the existing Apache Cassandra cluster, with clients writing to and reading from both DBs. The Scylla cluster was deployed in the same data center (DC) as Apache Cassandra and served the same traffic, but without an in-memory cache in front of it. Both DBs were monitored for latency and throughput.
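The side-by-side evaluation described above can be sketched as a thin client wrapper that sends every write to both databases, serves reads from the existing (primary) path, and shadow-reads the candidate to compare latency and results. This is a minimal illustration under stated assumptions, not Outbrain's actual test harness: the `InMemoryStore` and `DualClient` classes are hypothetical stand-ins, and the artificial delays replace real Memcache+Apache Cassandra and Scylla clients.

```python
import statistics
import time
from typing import Callable, Dict, List, Optional


class InMemoryStore:
    """Hypothetical stand-in for a DB client (e.g. Memcache+Cassandra or Scylla)."""

    def __init__(self, name: str, delay_s: float = 0.0) -> None:
        self.name = name
        self.delay_s = delay_s  # artificial per-request latency, for the demo only
        self._data: Dict[str, str] = {}

    def put(self, key: str, value: str) -> None:
        time.sleep(self.delay_s)
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        time.sleep(self.delay_s)
        return self._data.get(key)


class DualClient:
    """Hypothetical wrapper: writes go to both stores, reads are timed on both.

    Only the primary store's result is returned, so the shadow store is
    measured and compared without affecting what callers see.
    """

    def __init__(self, primary: InMemoryStore, shadow: InMemoryStore) -> None:
        self.primary = primary
        self.shadow = shadow
        # Read latencies (seconds) per store name.
        self.latencies: Dict[str, List[float]] = {primary.name: [], shadow.name: []}
        self.mismatches = 0  # reads where the shadow disagreed with the primary

    def _timed_read(self, store: InMemoryStore, fn: Callable[[], Optional[str]]) -> Optional[str]:
        start = time.perf_counter()
        result = fn()
        self.latencies[store.name].append(time.perf_counter() - start)
        return result

    def put(self, key: str, value: str) -> None:
        # Dual write: keep both stores populated with the same data.
        self.primary.put(key, value)
        self.shadow.put(key, value)

    def get(self, key: str) -> Optional[str]:
        result = self._timed_read(self.primary, lambda: self.primary.get(key))
        shadow_result = self._timed_read(self.shadow, lambda: self.shadow.get(key))
        if shadow_result != result:
            self.mismatches += 1
        return result

    def p99(self, name: str) -> float:
        # With n=100, quantiles() returns 99 cut points; the last one is the 99th percentile.
        return statistics.quantiles(self.latencies[name], n=100, method="inclusive")[-1]


if __name__ == "__main__":
    legacy = InMemoryStore("cassandra+memcache", delay_s=0.001)
    candidate = InMemoryStore("scylla", delay_s=0.0002)
    client = DualClient(primary=legacy, shadow=candidate)
    for i in range(100):
        client.put(f"doc:{i}", f"payload-{i}")
    for i in range(100):
        client.get(f"doc:{i}")
    for name in client.latencies:
        print(name, f"p99={client.p99(name) * 1000:.2f} ms")
    print("mismatches:", client.mismatches)
```

Here the shadow read runs inline for simplicity, which adds its latency to every request; a production harness would more likely issue it asynchronously or sample a fraction of traffic so the evaluation does not slow down the live path.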
With Scylla handling the same amount of traffic as the Memcache+Apache Cassandra combination, we achieved:
- 4x latency improvement
- Reduction of the cluster size one-third
Watch this Tech Talk to learn more about Outbrain’s testing and Scylla use case.