Tzach has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15 year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier grade systems, signaling, policy and charging applications.
Hi everyone! Welcome to my session at ScyllaDB Summit 2023: use ScyllaDB to replace Amazon DynamoDB, everywhere, better, more affordable, all at once. I'm Tzach, VP of Product at ScyllaDB. Before that I spent many years in the telecom domain. As you might see, the picture here is from a few years ago, and I'm getting older from Summit to Summit.
Why is testing problematic, and why have we found that, in some cases, customers who try to benchmark fail, or rather not fail but get the wrong results? First, there are many, many combinations. You can choose the number of loader threads and connections, and you can choose a distribution: Zipfian, uniform, or whatever you want. You need to build your test so that it is reproducible and gives similar results every time. You need to understand the loader, the benchmark tool, with its problems and advantages: does it suffer from coordinated omission or not, and so on. And you need to understand the database that you are testing. DynamoDB is a black box for us, and we don't know exactly how it's implemented, but we did find out, for example, that it doesn't behave nicely with a Zipfian distribution. We need to understand why, and maybe switch to another distribution.
So, without further ado, let's jump to some results. This is a result that we got from the ScyllaDB cluster that I mentioned earlier, of three nodes. The cost of such a cluster for a year, by the way, is around thirty thousand dollars, which is not much (spoiler) compared with what we are going to see for DynamoDB at a similar throughput. We tested here with a hotspot distribution, to mimic a realistic use case in which you have a lot of partitions but only a portion of them is busy at any time. All of them are busy, but the hot spot, as it's called, is busier than the others. And we tested with different ratios of reads versus writes. For example, the first row here represents 10 percent updates and 90 percent reads, and so on through to the update-heavy rows. You see here the throughput that we can push to the ScyllaDB cluster in each case, and the P99 latency. In the entire presentation I'm only going to show P99 latency, because in most cases it's more relevant and more interesting than the mean latency.
As you can see, even this small cluster can handle around 100,000 requests per second with a single-digit-millisecond P99 latency, which is really impressive. The mean latency, by the way, will be sub-millisecond in most of these cases. You'll note that ScyllaDB is actually better in write-intensive workloads than in read-intensive workloads. That comes from the design inherited from Cassandra and the Dynamo paper, by the way. But even for read-intensive workloads the throughput is still very high and the latency is still very low, so quite an impressive result.
One thing I want to mention is that when you're doing a benchmark, you usually don't want to test maximum throughput and latency at the same time. So what we did here is first measure the maximum throughput we can push to ScyllaDB using the loaders. Then, for the latency results that you see here, we used only 70 percent of this throughput, which is sustainable and has reasonable latency. If you push any system, not just a database, to its maximum throughput, you get bad latency. That's a fact.
This is the same data, visualized in a way that I hope makes it clearer and not more complex. This graph shows the P99 latency in colors, and in blue it shows the throughput that we pushed for every use case. But the data is the same, so let's not spend too much time on it, and compare it to DynamoDB, as promised.
With DynamoDB, again, we tested with the same hotspot distribution, the same loader, and the different ratios of updates and reads. Let's ignore the throughput for a minute and focus on the latency. As you can see here, the P99 latency is much higher. If with ScyllaDB it was below 10 milliseconds for all use cases, here it's above 20 for this distribution. With other distributions it's even worse. And by the way, the X here means that you have zero reads, so there is no read latency.
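The measurement approach described above (find the maximum throughput first, then measure latency at 70 percent of it and report P99 rather than the mean) can be sketched in a few lines. This is only an illustration: the function names, the nearest-rank percentile method, and all the numbers are assumptions made for the example, not the benchmark tooling or measurements from the talk.

```python
import math


def sustainable_rate(max_throughput_ops: float, fraction: float = 0.7) -> float:
    """Rate for the latency run: a fixed fraction of the measured maximum."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return max_throughput_ops * fraction


def percentile(samples_ms: list, pct: int) -> float:
    """Nearest-rank percentile: the smallest sample that has at least
    pct percent of all samples less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical numbers, for illustration only.
measured_max_ops = 150_000
latencies_ms = [0.8] * 985 + [4.0] * 10 + [9.0] * 5

target_ops = sustainable_rate(measured_max_ops)
p99_ms = percentile(latencies_ms, 99)
mean_ms = sum(latencies_ms) / len(latencies_ms)

print(f"run the latency test at about {target_ops:.0f} ops/s")
print(f"P99 = {p99_ms} ms, mean = {mean_ms:.3f} ms")
```

In this made-up sample the mean stays below one millisecond while the P99 is several times higher, which is why the tail figure is the more informative one to report.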
Here is the same visualization for DynamoDB. By the way, you might notice that DynamoDB actually prefers reads: the more read-intensive the workload, the better DynamoDB behaves, which is the opposite of ScyllaDB. So each of these databases has a different sweet spot.
Just touching on the cost: if you remember, earlier this throughput of around 100,000 or more cost around thirty thousand dollars. Here, roughly the same scale of throughput costs much, much more, up to half a million dollars, which is a lot, of course. I said I would compare them head to head on cost later, and this is that "later".
In this case I tried to compare the same throughput for ScyllaDB and DynamoDB for one use case: a hotspot distribution with 50 percent reads and 50 percent writes. The latency, as before, is better with ScyllaDB. Here I actually do show mean latency, which I promised not to show, but you can see that the P99 is much better in ScyllaDB and much worse in DynamoDB. Even more significant, the yearly cost of ScyllaDB Cloud is about 10 percent of the cost of DynamoDB. So you are spending 10 times less with ScyllaDB and getting lower latency at the same time. By the way, all the costs that I'm showing here are yearly, with reserved capacity, both in DynamoDB and in ScyllaDB. Of course, if you commit for one year you get better prices, and that's true for most systems, including ScyllaDB Cloud and DynamoDB.
I want to quickly touch on other distributions. I mentioned earlier that we use hotspot because we think it's more realistic, but we also tested with uniform and Zipfian. Zipfian proved to be very bad for DynamoDB. The reason is that Zipfian tends to send a lot of traffic to one or two partitions. These partitions quickly become overloaded, DynamoDB throttles them, and the latency is bad. So if your data follows a Zipfian model and you're using DynamoDB, I would strongly recommend improving your data modeling, because it simply won't work. It's not great for ScyllaDB either, by the way, but on DynamoDB it's really bad. This is the same information, comparing the different distributions, but just on DynamoDB. In the interest of time I want to skip the full results of what we did on DynamoDB. I'm only showing them so that if you download the slides later, you have the information.
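To make the skew concrete, the sketch below compares how much of the total traffic lands on the single hottest partition under a uniform and a Zipfian key distribution. Everything here is an assumption chosen for illustration: the partition count, the Zipf exponent of 0.99, the total request rate, and in particular `PER_PARTITION_LIMIT`, which is a made-up threshold and not a quoted DynamoDB limit.

```python
def uniform_shares(n_partitions: int) -> list:
    """Every partition receives the same fraction of the traffic."""
    return [1 / n_partitions] * n_partitions


def zipf_shares(n_partitions: int, s: float = 0.99) -> list:
    """Fraction of traffic per partition when popularity falls off as 1 / rank**s."""
    weights = [1 / (rank ** s) for rank in range(1, n_partitions + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def hottest_partition_rate(shares: list, total_ops: float) -> float:
    """Requests per second that hit the most popular partition."""
    return max(shares) * total_ops


# Hypothetical workload and limit, for illustration only.
N_PARTITIONS = 10_000
TOTAL_OPS = 100_000
PER_PARTITION_LIMIT = 1_000

uniform_hot = hottest_partition_rate(uniform_shares(N_PARTITIONS), TOTAL_OPS)
zipf_hot = hottest_partition_rate(zipf_shares(N_PARTITIONS), TOTAL_OPS)

for name, rate in (("uniform", uniform_hot), ("zipfian", zipf_hot)):
    verdict = "over" if rate > PER_PARTITION_LIMIT else "under"
    print(f"{name}: hottest partition gets {rate:.0f} ops/s ({verdict} the limit)")
```

With the same total load, the uniform case spreads requests thinly across partitions, while the Zipfian case concentrates a large share on one key, which is the situation in which a per-partition limit starts to throttle.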
So why might some of your workloads fail, as we saw on DynamoDB specifically with the Zipfian distribution? DynamoDB has a hard limit on throughput per partition, which ScyllaDB does not have. If you cross this throughput for one partition, not only will DynamoDB throttle and even start rejecting requests; worse, it will introduce latency, because these timeouts take a long time, and basically all the requests that come after have added latency as well. You are in very bad shape. Alex DeBrie, by the way, and I hope I pronounce the name right, wrote a very good blog post about it. I recommend reading it, and everything he writes about DynamoDB is very interesting. So just keep in mind, if you're a DynamoDB user, that if you hit this limit you have a problem, and you probably need either to switch to ScyllaDB or maybe to improve your data modeling.
DynamoDB actually has two kinds of pricing model that I quickly want to talk about: one is on-demand and one is provisioned. As with any system, and I mentioned it before, on-demand costs more. So in most use cases, if you know your workload in advance, and I think if you have a big system you should, reserved or provisioned capacity will be much cheaper, and you mostly want to use that. But then you have another problem: what provisioned capacity should you set for each table?
Let me give you a concrete example. It's kind of synthetic, but I think it's also typical. Let's imagine that you have nine tables, and the sustained capacity that you want for each is 20K. So this line here at the top represents the capacity you might want to provision. But unfortunately, in the real-life use case, each of these tables can peak to 100,000 requests per second, and it can be for an hour or two hours, per month or per day. It's hard to predict. And now you have a problem.
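The size of the problem in this example is plain arithmetic on the numbers given above (nine tables, 20K sustained, peaks of 100,000 requests per second). The helper name below is made up for the illustration.

```python
def total_provisioned(tables: int, per_table_capacity: int) -> int:
    """Total capacity to pay for when every table is provisioned separately."""
    return tables * per_table_capacity


TABLES = 9
SUSTAINED_PER_TABLE = 20_000   # requests per second each table needs most of the time
PEAK_PER_TABLE = 100_000       # requests per second each table can peak to

at_sustained = total_provisioned(TABLES, SUSTAINED_PER_TABLE)
at_peak = total_provisioned(TABLES, PEAK_PER_TABLE)

print(at_sustained, at_peak, at_peak / at_sustained)  # prints: 180000 900000 5.0
```

Provisioning every table for its own peak means paying for five times the capacity the tables need most of the time, while provisioning for the sustained rate leaves any table that peaks short of capacity.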
In DynamoDB, at least, you have two main options. One option is to buy reserved capacity of 100K for all of these tables. This makes sense and it will fix your provisioning problem, but of course it will cost a lot of money. And if you move from provisioned to on-demand, that also costs more. So you don't have a good solution here. One solution that people sometimes use is to actually use one huge table in place of all of these tables. It can work, but it has other data modeling problems that I'm not going to touch on. I would recommend using this method of a single-table design only if you have the same partition key.
On ScyllaDB you don't have this limitation per table. If you allocate additional capacity for the whole cluster, which doesn't have to be huge, and one of these tables peaks while the others do not peak at the same time, you are still inside your cluster capacity, as in this case. So with ScyllaDB, for example, if I size my cluster for 1250 000 requests per second, it can sustain one or two tables peaking. And the cost on ScyllaDB Cloud will be way, way less than DynamoDB, in this case more than 20 times less expensive than DynamoDB, because you don't have to understand or provision the capacity per table in advance. You can do it for the entire database, so that's much easier.
I hope by now you are convinced that ScyllaDB costs less than DynamoDB and has better latency, so I want to switch to another topic: ScyllaDB Alternator. What is ScyllaDB Alternator? Some of you who have been familiar with ScyllaDB for many years know that we started with Apache Cassandra compatibility, but around four years ago we introduced a DynamoDB-compatible API as well. Internally we call it Alternator. It's based on REST and HTTP, not on a binary protocol.
Why use Alternator? Let's say that you're using DynamoDB and you want to switch to ScyllaDB. You can, in theory, update your application and start using a CQL driver to work with ScyllaDB, but it's easier just to change the IP in your application and keep working with the DynamoDB-compatible API. Another case is that you might want to migrate from DynamoDB to on-prem, GCP, AWS Outposts, or any other platform that does not have DynamoDB, or maybe you don't want to pay the cloud cost of working from on-prem to AWS. ScyllaDB with Alternator is a good alternative for you.
Alternator, as I mentioned, has been with us for about four years now. It's production ready, it's available on ScyllaDB Cloud, as I will demonstrate in a minute, and we have had production customers on it for quite some time, with more coming every day. This is a use case, maybe not a typical one, from one of our customers that runs the DynamoDB API, Alternator, in production. I won't disclose the name, but they have 80 nodes and 400,000 requests per second. The estimated cost here is much cheaper for ScyllaDB, as expected. But to be completely transparent, I don't really have the number for how much they were paying to AWS, so this is an estimate using the AWS calculator, based on the requests per second and storage that they have. As you can see again here, even with the Alternator API, ScyllaDB still makes much more sense from a price perspective.
OK, so let's do a quick demo, and now I have to switch tabs, I hope. We talked about Alternator in the cloud, so you're probably asking yourself, what is ScyllaDB Cloud? This is ScyllaDB Cloud. It's a SaaS application that runs ScyllaDB Enterprise as a managed service. Let me demonstrate creating a new cluster. Here we have dedicated, and serverless in beta. We have a full session about serverless at the Summit, so I'm going to ignore it, at least for now. If I go to dedicated VMs, I can choose AWS or GCP, and more importantly, I can choose whether I want to stick to the CQL API or use the DynamoDB API. I already have a cluster running. It's actually running from my own account, on my own credit card, so I'm not going to start another cluster. Note that I chose t3.micro, which is a very small instance used only for evaluation, so the cost will be low.
If I go to the cluster information, I see the connect instructions here, so I can copy and paste them into my terminal. I'm not going to demo the whole connection here; you can do it yourself. I just want to mention that this example was copied directly from a DynamoDB example, and if you are familiar with DynamoDB you will see that everything is exactly the same. The only different parameter here is the endpoint, which differs between the real DynamoDB and the DynamoDB-compatible ScyllaDB cluster.
One thing I do want to show: if I click the monitoring page, I switch to this page, which is a Grafana dashboard, the standard Grafana dashboard for people familiar with ScyllaDB. In this case it's the Alternator dashboard; we have multiple dashboards. Here you can see the traffic going to the cluster. Of course this is a dummy instance. I don't have any real traffic, or it's very, very low, just from health checks and such, so there's not much to see here. I just wanted to quickly demonstrate the monitoring dashboard.
Let's go back to ScyllaDB Cloud. We have instructions for connecting with Python, other drivers, and so on, and I can take standard actions on the cluster, like asking to scale it, delete it, and so on. Do keep in mind, and this is something that has failed me in the past, that you need your IP on the allowed IP list if you're doing such a demo. Most production clusters, by the way, do not use a public IP at all. They use a VPC and do VPC peering between the application and the cluster.
With that, I hope you are now convinced that Alternator can run on ScyllaDB Cloud and that it's compatible with DynamoDB. These are the instructions; you can browse them offline and use them. By the way, it's very easy to evaluate Alternator, the DynamoDB API, with Docker. I know that AWS also provides a Docker image for DynamoDB, but to be honest, I think the ScyllaDB version is more compatible with the actual DynamoDB than the Docker version that AWS provides. So if you want to run DynamoDB locally, or on GCP, or on another platform, ScyllaDB is a good option.
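As a rough sketch of the point that only the endpoint changes, the snippet below uses the standard boto3 DynamoDB resource and overrides `endpoint_url`. The endpoint `http://localhost:8000`, the table and attribute names, and the placeholder credentials are all assumptions for illustration; the real values come from the connect instructions of your own cluster, and whether credentials are checked depends on how that cluster is configured.

```python
def alternator_kwargs(endpoint_url: str) -> dict:
    """boto3 arguments for a DynamoDB-compatible endpoint.

    Compared with a stock DynamoDB example, endpoint_url is the only addition.
    The region and credentials are placeholders (an assumption of this sketch).
    """
    return {
        "endpoint_url": endpoint_url,
        "region_name": "None",
        "aws_access_key_id": "None",
        "aws_secret_access_key": "None",
    }


def write_and_read_item(endpoint_url: str, table_name: str = "demo_table") -> dict:
    """Create a table, write one item, and read it back."""
    # Imported here so that alternator_kwargs() can be used without boto3 installed.
    import boto3

    dynamodb = boto3.resource("dynamodb", **alternator_kwargs(endpoint_url))
    table = dynamodb.create_table(
        TableName=table_name,
        KeySchema=[{"AttributeName": "key", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "key", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    table.put_item(Item={"key": "test", "greeting": "hello"})
    return table.get_item(Key={"key": "test"})["Item"]
```

Calling `write_and_read_item("http://localhost:8000")` would exercise the round trip against a local instance, assuming one is listening on that hypothetical address; pointing the same code at AWS only requires dropping `endpoint_url` and supplying real credentials.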
So what is the state of the API compatibility? For many years now we have supported all the main operations: table operations, batch operations, item operations, query, scan, and indexes. Time to live is a pretty new feature that we added this year. We also support all the standard data types. But some DynamoDB features are still missing, and we plan to implement them. By the way, I will be happy to get input from people in this session, maybe later in the chat, to let me know what is most important to you in our DynamoDB implementation. We are planning to add point-in-time backup, and export to and import from S3. In the future, probably not this year but maybe in 2024, we plan to add PartiQL operations and multi-item transactions.
OK, the last thing I want to touch on is migrating from DynamoDB to ScyllaDB. We are already convinced that ScyllaDB is less expensive, has better latency, and is compatible with DynamoDB. But you already have a lot of information in DynamoDB, and now you want to move to ScyllaDB. How do you do that without downtime?
First of all, I assume in this scheme that you have a service that you cannot take down, so you cannot sustain downtime, and that you also have historical data that you want to migrate from DynamoDB to ScyllaDB. If that's not the case, if you don't have historical data and you can sustain a few seconds or a few minutes of downtime, you can simply change the IP in your application from DynamoDB to ScyllaDB and it will just work, as I demonstrated earlier. But let's say that's not the case.
What you want to do is leverage a DynamoDB feature called Streams, which records every update that you make. You enable it on the table you want to migrate. Once you do that, you listen to these updates, using a tool I will mention later, and feed them into ScyllaDB. Once you do that, you can start writing new updates to ScyllaDB as well. Then you scan all the historical data on DynamoDB, using the same tool, and feed it into ScyllaDB. Once you complete that, at least in theory, the databases are in sync and you can start using ScyllaDB as your primary database. In practice, since it takes time and you want to validate that everything is OK, you can run both databases at the same time for quite some time, and people do that, doing dual reads and validation. Only once you are absolutely convinced that everything is OK do you phase out DynamoDB. By the way, this migration scheme is good for migrating from any database to any database. I guess the unique part here is the way that we can listen to the DynamoDB stream and feed it to ScyllaDB, which makes the migration a little bit easier.
The way to do that is with an open source tool called ScyllaDB Migrator. It's built on Spark, and it has readers for Cassandra, ScyllaDB, and DynamoDB, and writers to Cassandra and ScyllaDB, and maybe to DynamoDB as well if you want to use it. Unsurprisingly, we mostly use it to migrate to ScyllaDB and not out of ScyllaDB, but I suspect it's possible. Another advantage of this tool, which is highly performant and highly parallelized, is that you can add code to the Spark workers and do minor or major transformations in the middle. So let's say that you're reading data but you don't want to put it as is into ScyllaDB; you want, I don't know, to increment, to decrement, to change the format, or whatnot. You can do it. The tool is completely open source, and you're more than welcome to use it with other projects or even contribute, I guess, more databases to it. If you want to read more about the tool, there is a good blog post by Ravid that describes this process. He was one of the main contributors to this Spark migrator. I put the link here, and if you're interested you can read more about it.
We are almost out of time, so I want to go back to the summary slide I presented at the start. What I tried to do in this presentation is prove to you that ScyllaDB is compatible with DynamoDB but can run anywhere. In particular I showed how it runs on ScyllaDB Cloud, but I promise it runs on Docker, GCP, on-prem, everywhere you want. I hope I proved to you that ScyllaDB is less expensive than DynamoDB in many use cases and has a much better P99 latency, but I encourage you to try it and test it yourself, maybe with the help of our support team if you have trouble benchmarking. And I showed you, not really proved to you, how you can migrate from DynamoDB to ScyllaDB without downtime. If you want an actual proof, visit the blog post that I mentioned earlier, which includes all the code. It's also on GitHub, and you are more than welcome to test it. With that I want to thank you. I will be in the chat during this call and after it to answer any questions you might have. [Applause]
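The migration flow described in the talk (replay stream updates into the new database, backfill the historical data, then validate with dual reads before cutting over) can be illustrated with a toy in-memory model. This is not how ScyllaDB Migrator is implemented; plain dictionaries stand in for the two databases, and the rule that the backfill skips keys already touched by the stream is an assumption of this sketch, added so that older scanned values never overwrite newer updates.

```python
from typing import Optional

# (key, new value); a value of None means the item was deleted.
Event = tuple[str, Optional[int]]


def replay_stream(target: dict, events: list[Event]) -> set[str]:
    """Apply captured updates to the target and return the keys they touched."""
    touched = set()
    for key, value in events:
        touched.add(key)
        if value is None:
            target.pop(key, None)
        else:
            target[key] = value
    return touched


def backfill(snapshot: dict, target: dict, touched: set[str]) -> None:
    """Copy historical items, skipping keys the stream has already updated."""
    for key, value in snapshot.items():
        if key not in touched:
            target[key] = value


def dual_read_mismatches(source: dict, target: dict) -> list[str]:
    """Keys whose values differ between the two databases."""
    keys = source.keys() | target.keys()
    return sorted(k for k in keys if source.get(k) != target.get(k))


# Historical data at the moment the stream was enabled (hypothetical).
snapshot = {"a": 1, "b": 2, "c": 3}
# Updates captured by the stream while the migration runs (hypothetical).
events = [("b", 20), ("c", None), ("d", 4)]
# State of the source database after those updates.
live_source = {"a": 1, "b": 20, "d": 4}

target: dict = {}
touched = replay_stream(target, events)
backfill(snapshot, target, touched)

print(dual_read_mismatches(live_source, target))  # prints: []
```

An empty mismatch list is the signal, in this toy model, that the two stores agree and the old one can be phased out; in a real migration the dual-read check would run against live traffic for as long as it takes to build that confidence.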