ShareChat’s Journey Migrating 100TB of Data to ScyllaDB with NO Downtime

Anuraj Jain Chinmoy Mahapatra27 minutesFebruary 14, 2023

We at ShareChat want to describe our journey of migrating our NoSQL use-cases from different databases to cloud agnostic | performant | cost effective ScyllaDB. We had this mammoth task of migrating all the data from existing databases to ScyllaDB with NO downtime. We built a framework which had minimal impact on the application in terms of latency and downtime during the migration process and that has the ability to fallback to the source table in case of any inconsistency during migration. Our framework solved for multiple use-cases including both counter and non-counter use-cases across 3 languages used in ShareChat (NodeJS, GoLang, Java) through shared library. We built recovery and auditing mechanisms to maintain consistency across source and destination tables. There are 3 main components. For the existing data, we have a analytical job based on apache beam which moves the data between source and destination tables. For the live traffic, we have a generic driver which applications use to perform dual writes in both source and destination tables. For any failure scenarios be it in dual writes or any consistency issues, we have a job to re-sync the inconsistent data. During this entire process, we faced many challenges and ended up pushing ScyllaDB to its limit. We tried different consistency levels, compaction strategies, optimised data models and were finally able to move 100 TB of raw data from multiple databases to ScyllaDB with minimal inconsistency.


Share this

Video Slides

Video Transcript

Hi everyone. I am a software engineer at ShareChat. Today me and my colleague will be going over ShareChat’s Journey Migrating 100TB of Data to ScyllaDB with NO Downtime.

Worked at a pretty big scale we have over 40 million monthly active users and we wanted a database that is performant cost effective at the same time provides very good monitoring and resilience moving on to the agenda now so we’ll firstly be going over the live migration framework overview what is the problem that we are trying to solve what is the overall architecture now once we are thorough with it we’ll go over each of the components like we’ll be back in each of the components we’ll go over all the edge cases that we took care of and at the end we’ll be sharing our learnings and results

Moving on to the live migration architecture so firstly our problem is twofolds when we talk about live migration we need to make sure that the service that is serving the traffic is not Disturbed so we need to continue to serve our clients which means that the primary DB cannot be turned off uh at the same time we also need to migrate all the data that is in primary leaving to our new DB which is Cellar DB so if you see from the diagram we have a service this service could be an application a web server or any other job within this service we have something called a DB driver I’ll go more to much deeper later on now this service talks to the primary DB which is its main database which is responsible for reading and writing all the credit operations basically now we also want to move to seller so what we do as one of the phases is start doing dual rights so we start writing both to primary degree as well as writing to celebrate now there could be multiple scenarios where this fails like okay our right to primary DB succeeds but our right to secondary DB which is still rdb sales in that case we have something called a failure recovery system which is responsible for migrating those failed keys from primary DB to selecting we also have an audit DB for auditing all our failures so that we can later on take a better look at why those scale another key component of this part is also config store which basically helps DB driver access any of the tables in any of the databases that we use be it seller DB or otherwise another part which is important is is the exporting part which is moving all the data from primary DB to ciladi so we have a job for this I’ll move on to a deep dive into each of the components so firstly I’d like to start with DBW so DB driver is a key library that takes part in the live migration framework so this is a One-Stop shop for all the services to talk to any of the databases this has wrappers to talk to all the databases that we generally use for nosql use cases uh one of the important points of this is config store since DB driver is all about providing a a uniform interface which is easy for services to access tables we wanted to have all those configs stored in a store and these configs actually contain information like the quest information of the table that the table resides in and so on so conflict stores becomes an important part of this now in chat we use multiple languages across all and like the three primary of them are golang Java and node change now we found that both SQL and gossiple acts are pretty mature in terms of seller drivers so you wanted to go with the go rock so we have built our primary driver or library in golang now we also wanted to have support for node.js and Java drivers there are other approaches for example you could have used a sidecar pattern but we chose to go with the shared Library approach so we have a simple C plus plus staffer on top of the go driver and then we have individual clients for node.js and Java which basically uses the C plus wrapper to interact with the go driver this helps us Because We Do Not Duplicate the code we just have to write a wrapper for each of the other languages also if some other language comes along we generally have they generally have a good uh interface with C plus plus and hence we do not have to replicate the logic uh for all the languages

moving on uh so how were we able to achieve the Dual rights part so I wanted to take a deeper look into DB driver so if we look at DB driver there are mainly three parts one are the DB wrappers now the these DB wrappers are basically responsible for doing thread operations in each of the individual databases so we have a silver Apple which does current operation on Sila we have other wrappers which does spread operations from the other databases now for migration purpose we wrote another wrapper which uses all of these wrappers so we have a migration wrapper so this can use all the individual wrappers which can be used for doing thread operations in the individual databases as I explained before we definitely have a config store which is also used now for having migration conflicts which is basically let’s say table a switches to table B what does the field a maps to in table B and so on so all those migration conflicts are stored in Conflict Studio we also have API Handler which is basically the interface with which services and jobs use for interact with people now another essential part of this whole thing is also the failure recovery mechanism because we we have a possibility that the right to the our primary DB succeeds but the right to the secondary DB which is still rdb fails so in this case we have a topic per table and for each of these topics we have two subscriptions one is the migrator subscription the other is audit subscription now once let’s say uh key fails so let’s say I’m writing to Row 1 and it fails migration wrapper is responsible for pushing it to the failure topic and uh and and we have a failure recovery system which has two jobs pertaining to each of the tables one is the migrator another is the audit Channel a migrator is responsible for consuming from the migrated sub and reading that particular data from primary reading and writing it back to Synergy so we have consistency overall audit channel is basically to take whatever Keys fail from the failure topic via the audit sub and pushing it to an audit TV

moving on so all seems good but what about conflicts since we have two things going on Parallel one is dual rights other is bulk import export so in this case there is a possibility that we are updating the same row key so in this case as you can see we are updating the ID one the service is doing a dual right on id1 where it is changing the L name last name to Kumar at the same time there is the bulk import export job happening which is updating the F name and Company now we what we used is Sila’s last right wings for each of the cells that we have which is for each of the columns there is a time at which the data was written and whatever was written later uh is finally stored so in this case since L name was updated later by this dual rights by the service so that’s based on the final data the bulk import export job had L name and Company written so that stays in the final data now what is important is we need to write we need to do the bulk input with a timestamp Which is less than the timestamp that we start dual rights and Raj will be taking a deeper look into this

moving on so now I’ll go over the different migration stages that we have so the first phase which we start off is dual rights enabled so in this case we are reading from Source DB but we are writing to both Source DB and CID remember we haven’t started the export job yet then we start off the export job still we are writing to both the databases but reading from The Source database then we have an audit phase where during a read phase we read from Source TV but we also read from ScyllaDB and compare the data and if there are any failures We rise to the failure topic as mentioned before but we keep on writing to both Source TV and celebrity then we also have validation phase so this happens for a party for mostly the live data so people generally write a job which is which takes last three months or six months of data uh for all the rows and they basically compare and say if there are any inconsistencies and once all of this is done and we are happy with the consistency we do a read switch so we move the reads to ScyllaDB but still the ripes happen to both Source DB and ScyllaDB in case something bad happens and we want to switch over to Source TV then finally the right switch which is basically we are boot entirely to ScyllaDB and the ReUse and rights everything is happening from select so it seems like we are done with all the use cases but what about counters so let’s take a similar simple scenario since the atomic increments are non-identified in calls so we cannot use the same approach as before let’s say we take a simple scenario at time P1 we do a dual right into both the source DB and seller DB we implement the count of count to be one count by one for id1 so both in the source dbm cellular DB the count becomes one now let’s see the export job picks up the same ID it takes id1 it reads from Source DB and tries to write to cellative so the slightly count becomes 2 although in the source DB it remains as one so this is a problem how do you go about solving it so we basically uh split it into two phases initially we do not start Google rights at the start what we keep on doing is we start we start um an initial phase which is right to Keystone in this phase what DB driver does is it writes to Source TV and instead of writing to seller DB it writes to a dirty key topic now this topic again has two subscriptions one of that is a deduplication subscription which is consumed by a collective job and which pieces pushes all the keys to key store the other subscription is not used in this case so this is the first piece and in this phase the export job has started so the export is happening from Source DB to ScyllaDB no dual rights are happening right is happening to Source DB and the dirty key and all those keys are being moved to Keystone and the second phase once the export job is complete what we do is we start with a partial dual element so we do dual rights on Source dbm Cellar DB for Keys which are not in Keystone but migrator is responsible for migrating the keys that are in Keystone so it uses the migrator subscription it reads the data from Source DB and moves to select now once both of these phases are complete if everything goes well we have the data consistency because we did not do a double write on ScyllaDB for the keys that were used in the migration moving on so the the phases change a bit uh for the counters use case so we have our rights to dirty Keystone says where we are writing the source DB we are not writing the cell rdb but we are writing to a dirty key topic this is important because we want to keep the ownership of those keys that happens during dual rights with uh the source TV for the time once export job is complete we start the Dual rights but this is again a partial roll rights phase as we are my as still the ownership of the non-migrated keys remain with the source TV now once migrate dirty Keys is over then we have the complete dual rights on and we can go on with the General phases of audit validation read switch and write so now I’ll be moving on to the details onto the export job hey hi everybody uh this is anurag I’m a software engineer at sharechat I’ll be uh taking this uh presentation for explaining the export scenario so it depressed a previous part uh chenma explained uh how do we migrate the live data the live traffic that is coming uh on the application to to both the source and uh on the seller DB and now I’ll be taking you to uh explaining like how do we migrate uh the existing data work which is there in the source DB already uh into the seller cluster the cell idb so so actually we we had a lot of options of migrating the existing data to to sell rdb either we we would have written a a custom golang or node.js based job which kind of reads uh the entire website from The Source DB and then uh keeps on writing those uh to seller DB however there are two three issues with this approach first the data set that we are talking about here is really large so so what happens is that uh once uh that particle if there is a scenario that uh the source DB I mean the jobs that we are writing if any pods or any machine fails on that particular it’s hard to track uh like where till where we have read from the super STV until where we we have returned to this uh Cena DB then then there is already analytical jobs that are doing that particular purpose right so we should be going through the analytical job path but uh then then there is a scenario like how uh what approach we should take like either we should take a dump of that particular Source DP into some uh distributed storage do we write a analytical job to do that purpose and then from that particular storage we write another analytical job to to read that particular data back into the cellar DB or either we directly connect both seller and uh in the source DB with our analytical job which would be kind of reading and then parsing that particular data transforming into the data uh which seller understands and then writing it there again back to the slide DB so we chose a path where we will do this in a single job so to so that it becomes more cost effective and more uh you know reliable uh towards uh towards our data transform transfer so so what happens now currently here uh we we actually chose to go with a Apache beam programming model based analytical job which can run on uh gcp data flow ba we we are using gcp ecosystem so we used uh data flow to be our uh our analytical job runtime so here in this particular job I’ll explain like what what is the uh structure of this particular job so it basically contains three phases uh majorly the first is the read Parts kind of reads the uh entire raw data uh entire rows from The Source DB in back education and then it the second phase is the transforming those rules one by one into into the row or the schema that uh that are seller DB uh have so whatever table structure that we we have in seller uh we’ll parse uh the source uh rows into that particular structure and then in the third phase we will kind of migrate all those data into uh seller DB in a badged manner so that it it comes up with a lot more parallelism and a lot more speed uh that we that we can you know transfer our data so because we are we are talking about here uh about a really huge data set okay let’s let’s go to the next slide so here uh

okay so now let’s let’s uh since we we spoke about three phases of our jobs so let’s see some code base like how how we actually what we actually do to read and transform and write that data right see these are the three phases so since I I spoke about in the previous slide that we are using Apache BM based programming model here so let me give some context about that like so so when we read a bulk data from Apache uh from any particular Source uh connector here we uh we read it in the form of batch data that a group of data is actually called a p collection object in in Apache beam that particular P correction contains a a set of data that we want to apply some transforms or some uh functions on top of it that function uh application of some operation on top of that that is data is called P transform right so here what we are doing is we are using a source uh database connector so in Apache beam programming model we have connectors available to connect it with any any Source or saying so here we are using some Source DB connector to read data into our P collection as the first step of this particular job that’s a really part of the job and then uh we are taking that particular Source rows uh P collection and we are applying a migrator function and transform P transform on top of it to transform that particular red Source rows into into the destination uh rows I mean the uh rows which which are in structure or in schema of thistle rdb right and then we we are taking that processed rows and then we are finally applying a Cassandra i o so the connector which is uh which is applicable here for uh you know sending the data back to Salah is is called a Cassandra IO connector the good thing about Salah is that it is blessed with the Cassandra ecosystem so so uh things which are applicable to a Cassandra database is also applicable to Salah so here we we can use the Cassandra IO Apache beam connector to write data to to sell rdb okay let’s let’s see uh there is one interesting problem in the next slide let’s see that thing also so

so now uh till now we in the in the last section uh on the live data Cinemas uh explained like the export jobs uh have to I have to you know write data to solar DB with a timestamp which is kind of lesser uh than the uh time lesser than the live traffic right so how do we make sure that whatever queries that are uh that are going through the center IO which are writing data to sell rdb from export job is always making sure a timestamp which is lesser than the timestamp which is uh coming up in the live traffic queries so actually we can provide this thing with a customization over the Cassandra i o so Cassandra connector can can accept a custom mapper Factory which can provide us a facility to to provide a custom query and then in a custom query we can provide actually a timestamp which which we want to write the data uh with so so here we have actually put provided a example of custom matter Factory which kind of uses a custom object mappers in the next slide let’s see uh a custom object mapper which essentially is uh so custom object mapper is financially is is a mapper implementation from the Cassandra driver actually which kind of can provide us a AC basic method which can use a table accessor from Cassandra driver which can kind of use a custom query we can provide a a query which Cassandra i o should be writing to uh D cell idb right so in this particular query if you see here we have provided an insert query which is using a timestamp which we can hard code here whatever timestamp we want so far for example the live traffic comes uh when we start the Dual right that timestamp was T1 let’s say then then here in the using timestamp we can provide a timestamp which is a bit lesser than the time T1 right so so this will make sure that whatever queries whatever insert queries that we write into the seller DB uh using a central i o driver uh that always goes through that particular hard-coded timestamp which is lesser always lesser than the uh when we start the Dual right so so that that way we make sure that uh the the conflict resolution is always uh handy so okay cool let’s uh let’s move on to the next slide cool so let’s discuss uh how what are the challenges or the learnings through the through this particular entire live migration Journey that we had so we first as the first challenge we had the uh we have the learning of right consistency so whenever we are writing data to sell rdb through the export jobs uh we know that we are writing a very huge amount of data Maybe TVs of data millions and billions of rows so here the latency becomes really uh a important factor so that our export jobs our analytical jobs are very fast and writing uh data reading and writing data uh so so here if if we have the right consistency level is one it is going to reply uh the seller rights are going to be a little bit faster and they are going to reply faster and in case of uh it’s in comparison of Quorum however the downside is that we might suffer a bit on the uh reliability of the data right and in the second learning is uh and the partial data export so we have some Source DB uh connectors available in Apache beam which if the data size is really high sometimes those connectors you know uh the in the beam jobs uh the compute is not correct or the nodes are not scaling up properly they might miss some data they might uh you know uh kind of uh lose some data which is definitely not acceptable so in those cases we might need to you know uh export that data Maybe to a Intermediate A Storage step and then maybe read it from that storage step to uh to sell rdb to ensure that particular reliability and and not lose any data is the third step we can say uh as a third learning we can say we uh if if there is if the data is really uh huge and uh and and there are materialized views which also needs to be created oh and uh on the seller table it is better to create those materialized views after the export process it has already happened because export is going to you know increase the operations a lot more on the base table and uh updating those materialized views in real time uh for those increased operations is not going to uh is always going to you know slow up the process so it is better to create material issues after the increased operations of the export job is finished uh as a fourth learning we can say uh it is important to use the right compaction strategy if we are exporting huge amount of data and if we choose incremental stat incremental compaction strategy it is going to you know consume CPU and then do those compaction on on a time to time basis which is it’s going to take up a lot of Cellar compute uh and then it’s going to provide a lesser compute to the uh to the right operations that are happening uh so so it sometimes it is better to use in null complexion strategy let the data sink in into the seller cluster like the operations uh increase operations for because of export jobs finished and then we apply the right compaction strategy whatever incremental whatever you want for our database right and it’s the first fifth learning over the uh challenge we can say that uh validations of the particular database I mean data which is there in the in Source database and the destination database after the migration I mean the export has finished it also becomes a challenge because here the analytical job that we will write has to you know read through each of the row key uh from the source and then it just compare uh for the same thing in the in the distinction cellular database we have done a lot of kind of validations for example we might have taken uh three month data somewhere uh I mean keys of three month data in somewhere in in some storage and then compare those keys uh in source and destination database so to to make an understanding that uh we we have the data completely synced in both source and destination database across the uh throughout the live migration process uh the counters part is something I think uh China has already spoken about so let’s move on to the next uh slide

Cool so now now I want to discuss a bit on Sharechat’s usage of seller DB how uh how we are using seller debate what scale we are using Sarah DB so till now we have already migrated uh 65 DB of data from our existing databases to standard DB uh and then we are in plan to onboard 50 more DBS of data to sell ADB and around 35 to 40 Services uh at scale are using cell rdb in shares that ecosystem uh currently our biggest cluster size is around 28 DBS and our Max throughput for one of the cluster is around 1.5 million Ops per second so this is the scale that we are currently operating solar db8 yeah so with this I think uh we are done uh with the with explaining how we have migrated uh this huge 100 degrees or more data from to seller DB right uh thank you so much for attending this talk uh if you want to stay in touch here are our contact details which you can follow.

Read More