NoSQL Case Study: Yahoo! Japan

Yahoo! Japan Migrates to ScyllaDB to Save Time and Space

About Yahoo! Japan

Yahoo! Japan provides a range of Internet search services, reaching 90% of domestic smartphone users, and serving over 50 billion unique visitors generating well over 100 billion page views per day. Operating in two business segments, Yahoo! Japan is responsible for search advertising as well as commerce-related services, such as auctions, shopping, and paid digital content. On top of these, Yahoo! Japan runs membership services, such as Yahoo Premium, Yahoo BB, and Yahoo Real Estate.

The Challenge

To support so many services, Yahoo! Japan runs its own data centers, and at this scale the company has a large infrastructure footprint. This is an ongoing problem in Japan, where land is at a premium. For this reason, Yahoo! Japan is always looking for opportunities to reduce its physical presence. From the perspective of infrastructure, this means fewer servers, fewer nodes, and ultimately fewer data centers.

With usage growing at a consistent rate, however, the database topology began to sprawl as well. The only way for Apache Cassandra to keep up was by adding more nodes. By 2018, Yahoo! Japan found themselves running 200 Cassandra clusters spanning 4,100 nodes. At that scale, the team encountered serious problems stemming from Java garbage collection, as well as general maintenance issues from running so many machines and nodes.

The Solution

The team knew that they needed an alternative and began evaluating other NoSQL options. They eventually landed on ScyllaDB, with its promise of smaller clusters combined with better performance and reduced maintenance overhead.

The team took several different approaches when evaluating ScyllaDB. According to Murukesh Mohanan, DevOps Engineer at Yahoo! Japan, “ScyllaDB runs so much faster than Cassandra that it can be difficult to compare what’s going on between them.”

In tests of operations per second, Apache Cassandra and ScyllaDB performed roughly equally up to about 16 threads. Beyond that, Cassandra topped out at about 11K operations per second, while ScyllaDB continued up to 30K operations per second.

In latency tests, the team found a stark difference at the 99th percentile. According to the team’s tests, ScyllaDB handled 100 threads with the same latency at which Apache Cassandra handled 8 threads. “From what we saw in our tests, there is just no comparison,” said Mohanan.
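To put the reported figures in proportion, the arithmetic can be sketched as follows. This is only an illustration using the approximate numbers quoted above; the variable names are hypothetical and the script is not part of the team's test setup.

```python
# Approximate figures as quoted in this case study.
cassandra_peak_ops = 11_000   # peak operations per second, Apache Cassandra
scylladb_peak_ops = 30_000    # peak operations per second, ScyllaDB
cassandra_threads = 8         # threads Cassandra handled at a given p99 latency
scylladb_threads = 100        # threads ScyllaDB handled at the same p99 latency

# Ratio of ScyllaDB to Cassandra for each measurement.
throughput_ratio = scylladb_peak_ops / cassandra_peak_ops
concurrency_ratio = scylladb_threads / cassandra_threads

print(f"Peak throughput: about {throughput_ratio:.1f}x")
print(f"Threads at equal p99 latency: {concurrency_ratio:.1f}x")
```

On these numbers, ScyllaDB reached roughly 2.7 times Cassandra's peak throughput and sustained 12.5 times as many threads at the same 99th-percentile latency.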

The team also saw low levels of utilization with Cassandra. “It’s not able to use the resources available,” said Mohanan. “CPU utilization was barely 50% at any level, whereas ScyllaDB always utilizes available resources at the highest percentage possible, even approaching 100%.”

Most importantly, ScyllaDB helps Yahoo! Japan toward their goal of shrinking the data center footprint by running on fewer, more powerful nodes. “ScyllaDB is far better equipped to make use of concentrated computing power,” added Mohanan. “Since our goal is to minimize our consumption while also devoting fewer precious square feet to our data center footprint, ScyllaDB is the easy choice for us.” These factors led to developing this case study on NoSQL database migration.