Outbrain is the world’s largest content discovery platform, with over 557 million unique visitors and 250 billion personalized content recommendations every month. One of the key components of Outbrain’s content delivery path is the document database, which holds 2 billion records.
Why Outbrain initially chose Apache Cassandra
- Outbrain delivers content 24×7, literally around the globe
- Apache Cassandra’s strong high availability, and in particular its high availability across data centers, helps minimize the risk of downtime
- Apache Cassandra’s rich query language (CQL) allows it to be used for a variety of use cases, from Key/Value to time series data
Our Challenges with the Apache Cassandra decision and implementation
- Apache Cassandra latency issues forced us to put an in-memory cache (Memcache) in front of Apache Cassandra, which made the entire solution more complex and made consistency and high availability harder to achieve. Even with caching, latency was not satisfactory
- Achieving the required throughput with Apache Cassandra takes a large number of nodes, resulting in large clusters that are more complex to manage and costly to run
- Replacing the cluster without further testing was not an option, so we looked for a way to run both DBs next to each other for testing
The ScyllaDB challenge
To evaluate ScyllaDB as a replacement for Apache Cassandra in the document database use case, Outbrain has been running a ScyllaDB cluster next to an existing Apache Cassandra cluster, with clients writing to and reading from both databases. The ScyllaDB cluster was deployed in the same data center as Apache Cassandra and served the same traffic, but without an in-memory cache in front of it. Both databases were monitored for latency and throughput.
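The side-by-side setup described above can be illustrated with a minimal sketch. This is not Outbrain's implementation: the `InMemoryStore` and `DualStoreClient` classes are hypothetical stand-ins that use plain dictionaries instead of real database drivers. The sketch only shows the general idea of sending every write and read to both stores while recording per-store latency.

```python
import time
from statistics import median


class InMemoryStore:
    """Hypothetical stand-in for a database client (no real driver involved)."""

    def __init__(self, name):
        self.name = name
        self._data = {}

    def write(self, key, value):
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class DualStoreClient:
    """Sends each operation to both stores and records how long each call took."""

    def __init__(self, primary, candidate):
        self.primary = primary
        self.candidate = candidate
        self.latencies = {primary.name: [], candidate.name: []}
        self.mismatches = 0

    def _timed(self, store, operation, *args):
        # Time a single call and file the duration under the store's name.
        start = time.perf_counter()
        result = operation(*args)
        self.latencies[store.name].append(time.perf_counter() - start)
        return result

    def write(self, key, value):
        for store in (self.primary, self.candidate):
            self._timed(store, store.write, key, value)

    def read(self, key):
        primary_value = self._timed(self.primary, self.primary.read, key)
        candidate_value = self._timed(self.candidate, self.candidate.read, key)
        if primary_value != candidate_value:
            self.mismatches += 1
        # The existing store stays authoritative while the candidate is evaluated.
        return primary_value

    def median_latency(self, store_name):
        return median(self.latencies[store_name])


client = DualStoreClient(InMemoryStore("cassandra"), InMemoryStore("scylladb"))
client.write("doc:1", {"title": "example"})
result = client.read("doc:1")
```

In a real evaluation the two stores would be actual driver sessions and the timings would feed a monitoring system. The sketch keeps the existing store as the source of truth for responses, so the candidate can be measured without affecting what clients see.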
With ScyllaDB handling the same amount of traffic as the Memcache+Apache Cassandra combination, we achieved:
- 4x latency improvement
- Reduction of the cluster size one-third
Watch this Tech Talk to learn more about Outbrain’s testing and ScyllaDB use case.