ScyllaDB at Strava

Phani Teja Nallamothu · 7 minutes · February 15, 2023

Strava processes data streaming from millions of devices around the world to provide better fitness and health outcomes. To continue to scale, the team at Strava looked to ScyllaDB for its high performance and low latencies, avoiding the bottlenecks found with other technology solutions.

Discover how ScyllaDB is used at Strava, and how they came to embrace it, shrinking their clusters and easing their administrative burden. Plus a peek into their data-intensive applications.

 

Video Transcript

Good morning, good afternoon, and good evening to all the folks who joined this presentation from different time zones. Thank you all for joining this talk where I’ll share some information about how ScyllaDB is being used at Strava.

I’m a senior cloud engineer on the infrastructure team at Strava, based in St. Louis, Missouri. I’ve been working in IT for just over a decade, with interests in SRE, DevOps, AI/MLOps, big data, and cloud engineering. In my free time I love playing cricket and poker, and I’m a huge lover of dogs.

In this presentation I’ll talk a little bit about what Strava is, give a brief architecture overview of the Strava platform, and describe how ScyllaDB is being used at Strava. I’ll wrap up with the benefits we’ve observed by using ScyllaDB.

What is Strava? Strava is a subscription platform at the center of connected fitness, which enables athletes to connect with their community and compete. We are compatible with over 400 devices through which athletes can share data with their athletic community. Strava is part of every phase of an athlete’s life, ranging from indoor and outdoor training to recovery and sleep. We have a couple of plans: a free plan and a subscription plan. The subscription plan has some really cool features like segments, matched runs, heatmaps, Local Legends, and route discovery.

Our mantra is pretty simple: we believe people keep people active. We are the largest sports community in the world, with more than 100 million athletes in 195 countries. We track more than 30 different activity types on the Strava platform, and this number is constantly growing. We connect athletes to each other and help them find their personal best through group challenges, clubs, and leaderboards.

Let’s have a quick look at the architecture of the Strava platform.

Requests come in from the internet and hit the load balancer or pass through the CloudFront distribution, and then they are routed to Linkerd. This is where all the magic happens: Linkerd handles all the authentication, authorization, and request routing. Strava is fully hosted on AWS. We use Apache Mesos on EC2 instances to manage all the microservice containers.

Strava uses a wide variety of data stores, like Amazon RDS MySQL, Redis, Memcached, S3, a few Cassandra clusters, and a lot of ScyllaDB clusters.

So how is ScyllaDB being used at Strava? I’d like to discuss a few use cases. The first one is Horton. It’s a funky name; all it does is store the scalar values for activity data. For any activity an athlete uploads, anything that is a scalar value, like activity distance, max and average speed, heart rate, bike ID, or shoe ID, gets stored in Horton. You can look at the sample schema of this Horton database.

We use athlete ID as a clustering column. This enables us to efficiently access all of an athlete’s activities from a single node.

Frequently we look up all the values for a given activity, so this is a very good use case for ScyllaDB. The other query we frequently run is a range lookup, where we aggregate metrics like distance, time, and elevation across multiple activities or even multiple time periods.
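The two access patterns described above, a point lookup of one activity's values and a range aggregation across several activities, can be sketched with a small in-memory model. This is only an illustration under stated assumptions: the class `ActivityScalars`, the field names `distance_m` and `moving_time_s`, and the idea of ordering rows by activity ID within each athlete are all hypothetical and are not taken from the Horton schema shown in the talk.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict


class ActivityScalars:
    """Hypothetical in-memory stand-in for a table of per-activity scalar values.

    Rows are grouped by athlete_id and kept sorted by activity_id. That layout
    is an assumption made for illustration, not Strava's actual schema.
    """

    def __init__(self):
        # athlete_id -> list of (activity_id, values), sorted by activity_id
        self._rows = defaultdict(list)

    def _keys(self, athlete_id):
        return [activity_id for activity_id, _ in self._rows.get(athlete_id, [])]

    def insert(self, athlete_id, activity_id, **values):
        keys = self._keys(athlete_id)
        rows = self._rows[athlete_id]
        idx = bisect_left(keys, activity_id)
        if idx < len(rows) and keys[idx] == activity_id:
            rows[idx] = (activity_id, values)  # overwrite an existing row
        else:
            rows.insert(idx, (activity_id, values))

    def get(self, athlete_id, activity_id):
        """Point lookup: all scalar values for one activity, or None."""
        keys = self._keys(athlete_id)
        idx = bisect_left(keys, activity_id)
        if idx < len(keys) and keys[idx] == activity_id:
            return self._rows[athlete_id][idx][1]
        return None

    def aggregate(self, athlete_id, first_id, last_id, field):
        """Range lookup: sum one field over activity IDs in [first_id, last_id]."""
        keys = self._keys(athlete_id)
        lo = bisect_left(keys, first_id)
        hi = bisect_right(keys, last_id)
        rows = self._rows.get(athlete_id, [])
        return sum(values.get(field, 0) for _, values in rows[lo:hi])


table = ActivityScalars()
table.insert(1, 100, distance_m=5000.0, moving_time_s=1500)
table.insert(1, 102, distance_m=10000.0, moving_time_s=3100)
table.insert(1, 101, distance_m=2500.0, moving_time_s=900)
table.insert(2, 100, distance_m=42195.0, moving_time_s=12000)

print(table.get(1, 101))
print(table.aggregate(1, 100, 101, "distance_m"))
```

Because each athlete's rows sit together in sorted order, the range aggregation touches one contiguous slice instead of scanning every row, which is the property that makes this kind of query cheap.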

... personal records are stored, and this is a very high write volume operation. I think ScyllaDB is a good fit for storing all of this segments data. The other one is Neo Geo, where we store encoded map styles for all the static images. This is a high read volume use case because these need to be loaded on the feed, and the write volume is also pretty decent: it’s once per activity.

What are the benefits we’ve observed by moving over to ScyllaDB? The first is that it’s much easier to migrate from Cassandra, as the ScyllaDB API is compatible with Cassandra. We’ve also observed some increase in throughput after migrating to ScyllaDB. We have experienced consistently low latencies in ScyllaDB clusters, which were not possible with Cassandra. Whenever there is a GC pause in Cassandra, all reads and writes have to wait until the garbage collection is done, and this has caused significant performance issues. The good thing is there are no GC pauses in ScyllaDB. We serve over 100 million athletes, so we want our platform to be highly available all the time, and ScyllaDB achieves that. We can also achieve the same performance with fewer nodes, thereby reducing the admin costs and the operational complexity.

We deploy most of our clusters in the us-east-1 region, in the 1a, 1b, and 1c availability zones. We have quite a few microservices that use ScyllaDB; they serve roughly 30,000 read requests per second. We use the 2022.1.5 version of ScyllaDB. Our instance sizes vary from i4i.large all the way through i4i.8xlarge, and we vary these instances based on the load prediction.
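The point above about garbage collection pauses can be made concrete with a toy model of how a single stop-the-world pause inflates tail latency. This is a sketch under simplifying assumptions, not a measurement of Cassandra or ScyllaDB: the function names, the arrival pattern, the 1 ms service time, and the 200 ms pause are all invented for illustration, and the model ignores the queueing backlog that would build up after a real pause.

```python
def latencies_ms(arrivals_ms, service_ms, pauses):
    """Per-request latency when requests stall during stop-the-world pauses.

    pauses is a list of (start_ms, duration_ms) windows. A request arriving
    inside a window waits until the window ends and is then served.
    """
    result = []
    for arrival in arrivals_ms:
        wait = 0
        for start, duration in pauses:
            if start <= arrival < start + duration:
                wait = start + duration - arrival
                break
        result.append(wait + service_ms)
    return result


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceiling of pct * n / 100
    return ordered[max(rank, 1) - 1]


# Hypothetical workload: one request per millisecond for one second.
arrivals = list(range(1000))

no_pause = latencies_ms(arrivals, service_ms=1, pauses=[])
with_pause = latencies_ms(arrivals, service_ms=1, pauses=[(500, 200)])

for name, data in (("no pause", no_pause), ("200 ms pause", with_pause)):
    print(name, "p50:", percentile(data, 50), "p99:", percentile(data, 99))
```

In this toy setup the median stays at 1 ms in both runs, while the 99th percentile rises from 1 ms to 191 ms once the pause is added, which is why pauses tend to show up in tail latency rather than in averages.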

ScyllaDB also comes with an Enterprise version, which has ScyllaDB Manager. It helps with automating backups, managing multiple clusters, and automating repetitive tasks. They also provide enterprise support and professional services.

Thank you all for attending this presentation. Please get in touch with me if you need more information. [Applause]
