Update: For a more recent and detailed look at how Disney+ Hotstar is using ScyllaDB, see Migration to ScyllaDB Cloud: Why Disney+ Hotstar Replaced Redis and Elasticsearch.
Disney+ Hotstar was originally launched in 2015 as Hotstar, the streaming service for Star India, which was later acquired by The Walt Disney Company. In March 2020 the digital service was rebranded as Disney+ Hotstar and subscriptions soared. By November 2020 paid subscriptions had reached 18.5 million, making it the fastest growing segment of Disney+ global subscribers. The service also expanded into Indonesia in September 2020.
Disney+ Hotstar also provides an ad-supported content tier, which is growing even faster than paid subscriptions. Paid and unpaid subscribers together total more than 300 million users. Disney+ Hotstar’s coverage of last year’s Indian Premier League cricket competition (IPL20) set a record for concurrent streaming viewership of any sporting event, with an audience of more than 25 million subscribers.
Here’s a short preview of their story migrating to ScyllaDB Cloud:
NoSQL Use Case: Continue Watching
We were thrilled to have the engineering team of Disney+ Hotstar join us to present at our online ScyllaDB Summit 2021 this January. The speakers included Vamsi Subash Achanta, the Architect behind their ScyllaDB deployment, and Balakrishnan Kaliyamoorthy, their Senior Data Engineer.
Their “Continue Watching” feature uses ScyllaDB to track every show for every user, remembering the timestamp at which the user stopped watching each show. If the user has finished one episode in a series, the feature prompts them to watch the next episode. It can even alert users to continue watching when a new episode of a favorite show becomes available. The feature works cross-platform, so a user who began watching on a computer can continue watching on a mobile or tablet device.
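The resume-or-advance behavior described above can be sketched as follows. This is a minimal illustration, not Disney+ Hotstar's implementation: the `WatchState` fields, the `next_action` function, and the 95% "finished" threshold are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class WatchState:
    """Hypothetical per-user, per-title record (field names are assumptions)."""
    content_id: str
    position_s: int                        # where the viewer stopped, in seconds
    duration_s: int                        # total length of the episode, in seconds
    next_content_id: Optional[str] = None  # next episode, if one exists


# Assumed threshold: treat an episode as finished once 95% has been watched.
FINISHED_RATIO = 0.95


def next_action(state: WatchState) -> Tuple[str, str, int]:
    """Return (action, content_id, start_position_s) for the viewer."""
    finished = state.position_s >= FINISHED_RATIO * state.duration_s
    if not finished:
        # Pick up the same episode where the viewer left off.
        return ("resume", state.content_id, state.position_s)
    if state.next_content_id is not None:
        # Episode is done, so offer the next one from the beginning.
        return ("play_next", state.next_content_id, 0)
    # Finished and nothing queued: there is nothing to continue.
    return ("none", state.content_id, 0)
```

Because the decision depends only on the stored state, any device that can read the record can offer the same prompt, which is what makes the cross-platform behavior possible.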
Migrating to ScyllaDB Cloud
Prior to their adoption of ScyllaDB, their infrastructure was built on a combination of Redis and Elasticsearch, connected to an event processor for Kafka streaming data. Their Redis cluster held 500 GB of data, and the Elasticsearch cluster held 20 TB. Their key-value data ranged from 5 KB to 10 KB per event.
This presented the team with a few problems. First, running multiple data stores meant maintaining multiple data models. Second, the data was scaling rapidly, which meant that costs were also rising dramatically.
The first redesign decision was to adopt a new data model. For the user content table, the user ID acted as the partition key and the content ID as the clustering key (together forming the primary key), followed by a timestamp and additional fields.
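To make that key layout concrete, here is a minimal sketch. The CQL statement shows one way such a table could be declared, and the small in-memory class mimics how rows are grouped by user and ordered by content ID. The table name, column names, and the `UserContentTable` class are hypothetical; the team's actual schema was not published in this post.

```python
from collections import defaultdict

# Hypothetical CQL for the table described above; the table and column names
# are assumptions, not the team's actual schema.
USER_CONTENT_DDL = """
CREATE TABLE IF NOT EXISTS user_content (
    user_id    text,
    content_id text,
    updated_at timestamp,
    position_s int,
    PRIMARY KEY ((user_id), content_id)
);
"""


class UserContentTable:
    """In-memory stand-in that mimics partition-key + clustering-key access."""

    def __init__(self):
        # partition key -> {clustering key -> row}
        self._partitions = defaultdict(dict)

    def upsert(self, user_id, content_id, updated_at, position_s):
        # Like a CQL INSERT, writing the same full key overwrites the row.
        self._partitions[user_id][content_id] = {
            "updated_at": updated_at,
            "position_s": position_s,
        }

    def rows_for_user(self, user_id):
        # A single-partition read; rows come back ordered by clustering key.
        partition = self._partitions.get(user_id, {})
        return [(cid, partition[cid]) for cid in sorted(partition)]
```

With this layout, everything a user is watching lives in one partition, so a "Continue Watching" row can be fetched with a single-partition query, and each new progress update for the same title simply overwrites the previous one.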
The team considered a number of alternatives, from Apache Cassandra and Apache HBase to Amazon DynamoDB to ScyllaDB. Why did they choose ScyllaDB? Two important reasons: first and foremost, consistently low latencies for both reads and writes, which would ensure a snappy user experience. Secondly, ScyllaDB Cloud, our fully managed database as a service (NoSQL DBaaS), offered a much lower cost than the other options they considered.
Performance monitoring results of ScyllaDB showing sub-millisecond p99 latencies, and average read and write latencies in the range of 150 to 200 microseconds (µs).
Balakrishnan (“Bala”) gave an overview of their migration process. They began by saving a Redis snapshot as an RDB file, which was then converted into comma-separated values (CSV) format for uploading into ScyllaDB using cqlsh. Bala cautioned that you should find the maximum useful concurrency for your writes and stay within it, so that the load does not end up causing write timeouts.
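Bala's point about write concurrency can be illustrated with a generic bounded-concurrency loader. This is a sketch of the idea only, not cqlsh's own loader and not the team's tooling: `FakeTarget` is a stand-in for the destination database, and `MAX_IN_FLIGHT = 4` is an arbitrary assumed limit that would need tuning against a real cluster.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_IN_FLIGHT = 4  # assumed cap; tune against observed write timeouts


class FakeTarget:
    """Stand-in for the destination database; records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self.in_flight = 0
        self.peak = 0
        self.rows = []

    def write(self, row):
        with self._lock:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
        time.sleep(0.001)  # simulate network and storage latency
        with self._lock:
            self.rows.append(row)
            self.in_flight -= 1


def load_rows(rows, target, max_in_flight=MAX_IN_FLIGHT):
    """Write every row, never allowing more than max_in_flight at once."""
    # The pool size is the concurrency limit: at most max_in_flight
    # worker threads can be inside target.write() at any moment.
    with ThreadPoolExecutor(max_workers=max_in_flight) as pool:
        # list() forces iteration so any write exception is raised here.
        list(pool.map(target.write, rows))
    return len(target.rows)
```

Raising the limit increases throughput only until the target saturates. Past that point extra in-flight writes just queue up and start timing out, which is why the cap is worth measuring rather than guessing.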
A similar process was applied to the Elasticsearch migration.
Once ScyllaDB Cloud had been loaded with the historical data from both Redis and Elasticsearch, it was kept in sync through two changes: the processor application was modified so that all new writes were also made to ScyllaDB, and the API server was upgraded so that all reads could be served from ScyllaDB as well.
At that point, writes and reads to the legacy Redis and Elasticsearch systems could be cut off, leaving ScyllaDB to handle ongoing traffic. This completely avoided any downtime.
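The sequence above (backfill, dual writes, switch reads, retire the legacy stores) can be sketched with plain dictionaries standing in for the databases. The `DualWriteStore` class and its two flags are hypothetical names for illustration, not the team's processor or API server code.

```python
class DualWriteStore:
    """Hypothetical wrapper: writes go to both stores, reads to one of them."""

    def __init__(self, legacy, new):
        self.legacy = legacy
        self.new = new
        self.read_from_new = False   # flipped once the new store is trusted
        self.write_to_legacy = True  # dropped as the final step

    def put(self, key, value):
        # Every write lands in the new store from the moment dual writes begin.
        self.new[key] = value
        if self.write_to_legacy:
            self.legacy[key] = value

    def get(self, key):
        # Reads follow the flag, so the switch needs no redeploy of callers.
        source = self.new if self.read_from_new else self.legacy
        return source.get(key)
```

In use, the new store is first backfilled from the legacy data, then `read_from_new` is set to `True` once both stores agree, and finally `write_to_legacy` is set to `False`. Because each step is a flag flip while both stores hold the same data, traffic never has to stop.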
The Disney+ Hotstar team had also done some work with ScyllaDB Open Source NoSQL Database, and needed to move that data into their managed ScyllaDB Cloud environment as well. There were two different processes they could use: SSTableloader or the ScyllaDB Spark Migrator.
SSTableloader uses a nodetool snapshot of each server in a cluster, and then uploads the snapshots to the new database in ScyllaDB Cloud. This can be run in batches, or all at once. Bala noted that this migration process slowed down noticeably when they had a secondary (composite) key.
To avoid this slowdown the team implemented the ScyllaDB Spark Migrator instead.
In this process, the data was first backed up to S3 storage, then restored onto a single-node ScyllaDB Open Source instance, a process known as unirestore. From there it was pumped into ScyllaDB Cloud using the ScyllaDB Spark Migrator. Bala found our blog, Deep Dive into the ScyllaDB Spark Migrator, particularly helpful in setting up his migration.
Advantages of ScyllaDB Cloud
Disney+ Hotstar is now running on ScyllaDB Cloud. So beyond the improved performance, predictable low latencies, and better TCO, they are also relieved of the burden of administrative tasks like backups, upgrades and repairs. Now they can focus on scaling their business.
If you’d like to learn more about the advantages of ScyllaDB Cloud for your own organization, please feel free to contact us.
Discover More in the On-Demand Webinar!
The Disney+ Hotstar team has been hard at work since the ScyllaDB Summit session discussed in this blog. They later joined ScyllaDB for an in-depth webinar discussion on their additional ScyllaDB NoSQL use cases. You can watch it on demand: