Yahoo! Japan provides a range of Internet services, reaching 90% of domestic smartphone users and serving over 50 billion unique visitors, who generate well over 100 billion page views per day. Operating in two business segments, Yahoo! Japan is responsible for search advertising as well as commerce-related services such as auctions, shopping, and paid digital content. On top of these, Yahoo! Japan runs membership services such as Yahoo Premium, Yahoo BB, and Yahoo Real Estate.
To support so many services at such scale, Yahoo! Japan runs its own datacenters, and the company's infrastructure footprint is correspondingly large. That footprint is an ongoing problem in Japan, where land is at a premium, so Yahoo! Japan is always looking for opportunities to reduce its physical presence. From an infrastructure perspective, this means fewer servers, fewer nodes, and ultimately fewer datacenters.
As usage grew at a consistent rate, however, the company's Cassandra database topology began to sprawl as well. The only way for Cassandra to keep up was by adding more nodes. By 2018, Yahoo! Japan found itself running 200 Cassandra clusters comprising 4,100 nodes. At that scale, the team encountered serious problems stemming from Java garbage collection, as well as general maintenance issues from running so many machines and nodes.
The team knew it needed an alternative and began evaluating other NoSQL options. It eventually landed on Scylla, with its promise of smaller clusters combined with better performance and reduced maintenance overhead.
The team took several different approaches when evaluating Scylla. According to Murukesh Mohanan, DevOps Engineer at Yahoo! Japan, “Scylla runs so much faster than Cassandra that it can be difficult to compare what’s going on between them.”
In throughput tests, Cassandra and Scylla delivered roughly equal operations per second up to about 16 threads. Beyond that point, Cassandra topped out at about 11K operations per second, while Scylla continued to scale up to 30K operations per second.
In latency tests, the team found a stark difference at the 99th percentile: Scylla handled 100 threads with the same latency that Cassandra showed at just 8 threads. “From what we saw in our tests, there is just no comparison,” said Mohanan.
The team also saw low resource utilization with Cassandra. “It’s not able to use the resources available,” said Mohanan. “CPU utilization was barely 50% at any level, whereas Scylla always utilizes available resources at the highest percentage possible, even approaching 100%.”
Most importantly, Scylla advances Yahoo! Japan's goal of shrinking its datacenter footprint by running on fewer, more powerful nodes. “Scylla is far better equipped to make use of concentrated computing power,” added Mohanan. “Since our goal is to minimize our consumption while also devoting fewer precious square feet to our datacenter footprint, Scylla is the easy choice for us.”