Yahoo! Japan provides a range of Internet search services, reaching 90% of domestic smartphone users, and serving over 50 billion unique visitors generating well over 100 billion page views per day. Operating in two business segments, Yahoo! Japan is responsible for search advertising as well as commerce-related services, such as auctions, shopping, and paid digital content. On top of these, Yahoo! Japan runs membership services, such as Yahoo Premium, Yahoo BB, and Yahoo Real Estate.
To support so many services at such scale, Yahoo! Japan runs its own data centers, and the resulting infrastructure footprint is large. This is an ongoing problem in Japan, where land is at a premium. For this reason, Yahoo! Japan is always looking for opportunities to reduce its physical presence. From the perspective of infrastructure, this means fewer servers, fewer nodes, and ultimately fewer data centers.
With usage growing at a consistent rate, however, the database topology began to sprawl as well. The only way for Apache Cassandra to keep up was by adding more nodes. By 2018, Yahoo! Japan found itself running 200 Cassandra clusters spanning 4,100 nodes. At that scale, the team encountered serious problems stemming from Java garbage collection, as well as general maintenance issues from running so many machines and nodes.
The team knew that they needed an alternative and began evaluating other NoSQL options. They eventually landed on Scylla, with its promise of smaller clusters combined with better performance and reduced maintenance overhead.
The team took several different approaches when evaluating Scylla. According to Murukesh Mohanan, DevOps Engineer at Yahoo! Japan, “Scylla runs so much faster than Cassandra that it can be difficult to compare what’s going on between them.”
In throughput tests, Apache Cassandra and Scylla delivered roughly equal operations per second up to about 16 threads. Beyond that, Cassandra topped out at about 11K operations per second, while Scylla continued up to 30K operations per second.
In latency tests, the team found a stark difference at the 99th percentile. According to the team’s tests, Scylla handled 100 threads with the same latency that Apache Cassandra showed at 8 threads. “From what we saw in our tests, there is just no comparison,” said Mohanan.
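For readers less familiar with the metric, the 99th-percentile latency is the value that 99% of requests complete at or below, so it reflects the slow tail rather than the typical request. The following is a minimal sketch of one common way to compute it (the nearest-rank method). The function name and the sample latencies are hypothetical illustrations, not data or tooling from the Yahoo! Japan tests.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct percent of the samples
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds (illustrative only):
# 98 fast requests, one slower request, and one very slow outlier.
latencies_ms = [1.0] * 98 + [5.0, 40.0]

p50 = percentile_nearest_rank(latencies_ms, 50)
p99 = percentile_nearest_rank(latencies_ms, 99)
print(f"p50 = {p50} ms, p99 = {p99} ms")
```

In this made-up sample the median stays at the fast value while the 99th percentile picks up the slower request, which is why tail percentiles are often used to compare databases under load.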
The team also saw low levels of utilization with Cassandra. “It’s not able to use the resources available,” said Mohanan. “CPU utilization was barely 50% at any level, whereas Scylla always utilizes available resources at the highest percentage possible, even approaching 100%.”
Most importantly, Scylla helps Yahoo! Japan toward its goal of shrinking the data center footprint by running on fewer, more powerful nodes. “Scylla is far better equipped to make use of concentrated computing power,” added Mohanan. “Since our goal is to minimize our consumption while also devoting fewer precious square feet to our data center footprint, Scylla is the easy choice for us.” These factors are what prompted this case study on NoSQL database migration.