Grab is one of the most frequently used mobile platforms in Southeast Asia, providing the everyday services that matter most to consumers. Its customers use it to commute, order food, and arrange shopping deliveries, and they pay for all of it with a single e-wallet. Grab believes that every Southeast Asian should benefit from the digital economy, and the company provides access to safe and affordable transport, food and package delivery, mobile payments and financial services. Grab currently offers services in Singapore, Indonesia, the Philippines, Malaysia, Thailand, Vietnam, Myanmar and Cambodia.
When a platform handles more than 6 million on-demand rides per day, a great deal must happen in near real time, and latency problems could result in millions of dollars in losses.
Like many on-demand transportation companies, Grab relies on Apache Kafka, the data streaming technology that underlies all of its systems. Engineering teams across Grab aggregate multiple Kafka streams, or subsets of them, to serve various business use cases. This requires reading the streams, performing aggregations against a powerful, low-latency metadata store, and then writing the aggregated data to another Kafka stream.
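The read-aggregate-write loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Grab's implementation: plain Python lists stand in for the Kafka topics, a dictionary stands in for the metadata store, and the `RideEvent` record, its fields, and the per-city count and fare aggregation are all hypothetical. A production pipeline would use a real Kafka consumer and producer and a database driver instead.

```python
from collections import defaultdict
from dataclasses import dataclass


# Hypothetical event shape; the source does not describe Grab's schemas.
@dataclass(frozen=True)
class RideEvent:
    city: str
    fare: float


class InMemoryAggregationStore:
    """Hypothetical stand-in for the low-latency metadata store."""

    def __init__(self):
        # Each key maps to a running aggregate row.
        self._rows = defaultdict(lambda: {"count": 0, "total_fare": 0.0})

    def upsert(self, key, fare):
        """Fold one event into the aggregate for `key`; return a copy of the row."""
        row = self._rows[key]
        row["count"] += 1
        row["total_fare"] += fare
        return dict(row)  # copy so callers cannot mutate the stored row


def aggregate_streams(input_streams, store):
    """Read every input stream, update the store, and build the output stream."""
    output_stream = []  # stands in for the downstream Kafka topic
    for stream in input_streams:
        for event in stream:
            snapshot = store.upsert(event.city, event.fare)
            output_stream.append({"city": event.city, **snapshot})
    return output_stream


if __name__ == "__main__":
    rides_a = [RideEvent("Singapore", 12.5), RideEvent("Jakarta", 4.0)]
    rides_b = [RideEvent("Singapore", 7.5)]
    out = aggregate_streams([rides_a, rides_b], InMemoryAggregationStore())
    print(out[-1])  # {'city': 'Singapore', 'count': 2, 'total_fare': 20.0}
```

Every incoming event triggers a read-modify-write against the store, which is why the store's read and write latency sits directly on the pipeline's critical path.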
The Grab development team initially used Redis as its aggregation store, only to find that it couldn’t handle the load. “We started to notice lots of CPU spikes,” explained Aravind Srinivasan, Software Engineer at Grab. “So we kept scaling it vertically, kept adding more processing power, but eventually we said it’s time to look at another technology and that’s when we started looking at Scylla.”
Easier-to-Use and Less Expensive than Apache Cassandra and Other Solutions
When choosing a NoSQL database, Grab evaluated Scylla, Apache Cassandra, and other solutions. The team ran extensive tests focused on read and write performance and on fault tolerance. The test environment was a 3-node cluster running on basic AWS EC2 instances.
“Most of our use cases are write heavy,” said Srinivasan. “So we launched different writer groups to write to the Scylla cluster with 1,000,000 records and looked at the overall TPS [transactions per second] and how many errors occurred. Scylla performed extremely well. Read performance was one of the major bottlenecks we had when using Redis, so we wanted to test this thoroughly. We launched multiple readers from the Scylla cluster and evaluated the overall throughput and how long it took to scan the entire table. We’d populate the table with 1,000,000 rows and then figure out how long the entire table scan took.”
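The write and scan tests described above follow a common benchmarking pattern: split a fixed number of records across concurrent writers, measure throughput and error counts, then time a full scan of the populated table. The sketch below is a hypothetical harness, not Grab's test code. It assumes a lock-protected in-memory table (`InMemoryTable`) in place of a real Scylla cluster and driver, and the function names are illustrative. Because of Python's global interpreter lock, the numbers it prints reflect this toy setup only and say nothing about Scylla's performance.

```python
import threading
import time


class InMemoryTable:
    """Hypothetical thread-safe stand-in for a database table."""

    def __init__(self):
        self._rows = {}
        self._lock = threading.Lock()

    def insert(self, key, value):
        with self._lock:
            self._rows[key] = value

    def scan(self):
        with self._lock:
            return list(self._rows.items())


def run_write_test(table, total_records, writers):
    """Split total_records across writer threads; return (attempted writes/s, errors)."""
    errors = 0
    errors_lock = threading.Lock()
    per_writer = total_records // writers

    def writer(worker_id):
        nonlocal errors
        start = worker_id * per_writer
        # The last writer also picks up any remainder of the division.
        end = total_records if worker_id == writers - 1 else start + per_writer
        for i in range(start, end):
            try:
                table.insert(i, f"record-{i}")
            except Exception:
                with errors_lock:
                    errors += 1

    threads = [threading.Thread(target=writer, args=(w,)) for w in range(writers)]
    t0 = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    elapsed = max(time.perf_counter() - t0, 1e-9)  # guard against division by zero
    return total_records / elapsed, errors


def run_scan_test(table):
    """Time a full-table scan; return (row_count, seconds)."""
    t0 = time.perf_counter()
    rows = table.scan()
    return len(rows), time.perf_counter() - t0


if __name__ == "__main__":
    table = InMemoryTable()
    tps, errs = run_write_test(table, total_records=1_000_000, writers=8)
    rows, secs = run_scan_test(table)
    print(f"{tps:,.0f} writes/s, {errs} errors; scanned {rows} rows in {secs:.3f}s")
```

Counting errors separately from throughput matters for a write-heavy workload: a store that reports high TPS while silently dropping writes would otherwise look better than it is.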
“For fault tolerance, we had a 5-node cluster and we’d bring down a node at the same time we were adding another node and doing other things to the cluster to see how it behaves. Scylla was able to handle everything we threw at it. On the operational side, we tested adding a new node to an existing cluster and that was good as well.”
Growing Use of Scylla at Grab
Scylla came out on top in extensive performance tests and is now in production at Grab. “Scylla is working really well as our aggregation metadata store,” says an enthusiastic Srinivasan. “It’s handling our peak load of 40K operations per second. It’s write-heavy right now but the latency numbers on both reads and writes are very, very impressive.”
The Grab team points to several things it especially likes about Scylla:
- Performance: “Scylla is on par with Redis, which is in-memory. We are seeing write performances that are extremely good.”
- Cost: “We are running one of our heaviest streams on Scylla and we’re doing it with just a 5-node cluster using AWS i3.4xlarge instances. And that is very, very good for us in terms of resource efficiency. Running the same workload on other solutions would have cost more than three times as much.”
- Easier than Cassandra: “The administrative burden with Cassandra was too great. There were too many knobs I needed to tweak to get it performing properly. Adding and removing nodes was very heavy in Cassandra. With Scylla, everything has been easy – it works just like it’s supposed to.”
- No Hot Partitions: “This was one of the major issues with other solutions. We used to get hot partition/shard issues with other approaches which would take a long time to sort out. With Scylla, there are no hot partitions at all. It’s almost unbelievable when you look at the metrics because all the nodes are getting exactly the same amount of traffic.”
- Support: “Scylla’s support team has truly impressive response times. It shows their commitment to their users and to making ScyllaDB successful.”
Grab is now looking to extend its use of Scylla. Other teams at Grab have heard about its success as an aggregation store and are looking to migrate additional use cases to Scylla, such as statistics tracking, time series storage, and more.