

The Consistency vs Throughput Tradeoff in Distributed Databases

21 minutes

In This NoSQL Presentation

It is well known that there are availability and latency tradeoffs that are required in order to achieve strong consistency in distributed systems. This talk will discuss whether or not there is a consistency vs throughput tradeoff in distributed database systems that guarantee ACID transactions.
ScyllaDB Summit 2023 Speaker – Daniel Abadi, University of Maryland College Park, Darnell-Kanal Professor of Computer Science


Daniel Abadi is the Darnell-Kanal Professor of Computer Science at the University of Maryland, where he performs research on database system architecture and implementation, especially at the intersection with scalable and distributed systems. He is best known for the development of the storage and query execution engines of the C-Store (column-oriented database) prototype and for deterministic, scalable, transactional distributed systems such as Calvin. An ACM Fellow and recipient of numerous awards, Abadi received his PhD in 2008 from MIT. He blogs at DBMS Musings and tweets at @daniel_abadi.

Video Transcript

I’m a professor at the University of Maryland, College Park, and I’m here today to talk about trade-offs in distributed systems, and in particular the consistency versus throughput trade-off in database systems.

In general, when we deal with distributed systems there are lots of trade-offs to consider. There’s no free lunch: with any decision you make, if you want to optimize one thing, you often have to give something else up. For example, there’s the well-known CAP theorem, which has been around for roughly 15 to 20 years by now. It’s very famous, and it explains very clearly that if you want a consistent version of the data across all replicas, wherever they may be in the world, it doesn’t come for free: you have to give up some availability. It’s possible that your system may become unavailable for writes, or sometimes even for reads, if you want a fully consistent system. That’s the CAP theorem, and it’s well known. Another example is the PACELC theorem, which says that you not only trade off availability for consistency, but in the common case you also have to trade off latency for consistency. If you want a fully consistent system, you have to pay for some amount of replication, and that takes time and causes increased latency in the system. These two trade-offs are fairly well agreed upon: if you want consistency, you get a system over which it’s much easier to write application programs, but you have to give up some availability and also some latency. The question I want to discuss today is: do you always have to give up throughput as well for consistency, or is there really no trade-off here? It’s an interesting question, and the answer is less obvious than you may think. So let’s go through some background first, before we get to distributed systems and in particular distributed database systems.
Let’s talk about a single-node, regular, old-fashioned database system. Say we have a simple retail application with two tables in the database. One is a widgets table indicating the things we’re selling: the price of each item in our current inventory and how much of it is left in stock. The other is a customers table with information about our customers, in particular how much store credit each one has available to buy items in our retail application. Two typical tables, and a typical transaction a customer may want to run is a purchase. For example, say customer 2 comes along, who currently has a store credit of 100, and wants to buy widget 3, which has a price of 79. How do we do this? In a single-node system, we would first read the relevant data: read the widgets table to get the item the customer wants to buy, and read the customers table to get the amount of credit the customer has available to pay for the item. There are different ways to implement this, but the traditional and most common way is to lock both things we’re going to read, so we don’t have to worry about other writes coming along, interfering with what we’re trying to do, and potentially causing isolation problems. So we lock widget 3, which is what we’re about to read, and we also lock customer 2, and once they’re both locked we can run the transaction. First we have to check how much stock is left. If there’s nothing in stock, we have to abort; we can’t run this purchase. Here we actually have exactly one left, so we’re fine. Similarly, if the customer didn’t have enough credit, that is, if customer 2’s store credit were smaller than the price, then we would also have to abort, but in this case we’re also fine, because 100 is at least 79. If either of these conditions failed, we’d have to abort the transaction; since both checks pass, we proceed. We decrement the stock, making the purchase, so the stock goes from one down to zero, and the credit goes from 100 down to 21, as we subtract the price from the store credit. Then we can release the locks, and we’re done with the transaction. So in a single-node system, this database transaction is fairly straightforward to run. It just requires a little bit of locking, but we get a correct result, and there’s no concern about any of the ACID properties: it guarantees atomicity, consistency, isolation, and durability. That’s great. But now, what happens if we have a distributed database system? It turns out this very simple transaction becomes much harder once the data is divided across multiple machines. Here we have the same exact two tables, the widgets table and the customers table, and customers are buying widgets, but now there’s one widget per node and two customers per node. That’s a simple partitioning of the original customers table and the original widgets table.
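As a concrete illustration, here is a minimal Python sketch of the single-node purchase transaction described above, using one lock per record in the spirit of two-phase locking. The table layout, names, and lock granularity are my own illustration, not from the talk or any real database.

```python
import threading

# Toy single-node "database": the two tables from the example,
# plus one lock per record (names and layout are illustrative).
widgets = {3: {"price": 79, "stock": 1}}
customers = {2: {"credit": 100}}
locks = {("widget", 3): threading.Lock(), ("customer", 2): threading.Lock()}

def buy(customer_id, widget_id):
    """Purchase transaction: lock both records, check, update, release."""
    # Lock everything we will read or write before looking at it, so no
    # concurrent writer can slip in between the check and the update.
    with locks[("widget", widget_id)], locks[("customer", customer_id)]:
        w = widgets[widget_id]
        c = customers[customer_id]
        if w["stock"] < 1:            # nothing left in stock: abort
            return "abort"
        if c["credit"] < w["price"]:  # not enough store credit: abort
            return "abort"
        w["stock"] -= 1               # stock goes 1 -> 0
        c["credit"] -= w["price"]     # credit goes 100 -> 21
        return "commit"
    # both locks are released automatically when the `with` block exits

print(buy(2, 3))  # -> commit (a second attempt would abort: no stock left)
```

Because both locks are taken before either check runs, no interleaving of concurrent purchases can oversell the stock or overdraw the credit, which is exactly the isolation guarantee the talk is describing.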
We only care about customer 2 buying widget 3, so let’s get rid of the two partitions that are not relevant to this particular transaction and keep just the two that are: on the bottom left we have widget 3, which the customer wants to buy, and on the top right we have customer 2, who is trying to buy that widget. So we now have two machines storing data relevant to the transaction. What do we do? The first step can be the same: we can still use locking. Not all distributed systems use locking, but to keep things simple I’ll assume we’re using it. So we go ahead and lock the two relevant tuples in the database: the bottom-left machine locks widget 3, and the top-right machine locks customer 2. These locks correspond to the first two lines of our transaction, where the reads of W and C caused those two records to be locked. So far, fine. The next step in our transaction is to check the stock, and this is also fine: the check of whether the stock is less than one can run entirely on the bottom-left machine, and if the stock is less than one it will abort; if it’s one or more, we don’t abort. However, the next statement is a little harder to run. Here we want to check whether the store credit is less than the price, and if so, abort. The problem is that C.credit is stored on the top-right machine and W.price is on the bottom-left machine. Two different machines store data relevant to running this if statement, so there has to be some communication across these nodes in order to run it. There are two ways to do this; let’s take one of them: the bottom-left machine, after locking the data item, sends the current price to the top-right machine. Now the top-right machine can evaluate the whole if statement, since it has both the price and the store credit: it checks whether the credit is less than the price, and if so it aborts; otherwise it continues. At this point, the bottom-left machine did not abort, because there was enough stock, and the top-right machine did not abort, because the store credit was enough. So now we can go forward with the last two lines of the transaction: we subtract one from the stock and subtract the price from the credit. On the bottom left the stock is now zero, and on the top right the credit is now 21, just like before. These pieces of code run separately on each machine. So far so good, but now the problem is that each machine does not know whether the other machine aborted, that is, whether the other machine was able to succeed with its responsibility for this transaction. So when we’re done with all the work of the transaction, we generally have to run a commit protocol, in which the machines communicate with each other to report whether they were able to do their part of the transaction; if so, we commit, and otherwise we abort. The most common such protocol is the two-phase commit protocol. The way it works is as two rounds of communication, to ensure full durability and atomicity. One machine initiates the protocol, say the top-right machine. It tells the other machine to prepare; the other machine then responds with whether it was able to commit the transaction, at least locally. If so, it says it’s ready; if not, it says it wants to abort. The initiating machine then gathers responses from every machine involved (here it’s just two machines) and delivers the final decision, in this case commit. The bottom-left machine then acknowledges the commit message, and at that point the transaction may commit. So we have two rounds of communication for the commit protocol, in addition to the round of communication before that. The key thing to note here is that while this protocol is running, the relevant records remain locked: widget 3 and customer 2 were locked throughout this whole transaction. We locked them at the beginning, when we needed to read, and we kept them locked not only during the transaction itself but even during the commit protocol after the transaction. This can cause a throughput problem. If there are other transactions that conflict with customer 2 or widget 3, i.e. they want to read or write customer 2 or widget 3, then they have to wait; because these records are locked, the other transactions must wait for the locks to be released. So when you have lots of conflicting transactions trying to access the same data and you hold locks for too long, it causes a general throughput reduction in the system. Notice that we never had this problem in the single-node system we saw before: there was no communication and no commit protocol, so there was no holding of locks during communication. By moving to a distributed system, all of a sudden we end up holding locks during communication, which is not that fast, and that can potentially cause a throughput reduction.
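The two-phase commit flow just described can be sketched as follows. This is a bare-bones illustration only (a real implementation needs timeouts, failure recovery, and durable logging), and all the names here are made up for the example.

```python
# Minimal two-phase commit sketch. A coordinator asks every participant
# to prepare; the transaction commits only if all of them vote "ready".
# Participants keep their locks until they learn the final decision.

class Participant:
    def __init__(self, name, local_ok):
        self.name = name
        self.local_ok = local_ok  # did the local part of the txn succeed?
        self.state = "locked"     # locks held until the decision arrives

    def prepare(self):
        # Phase 1 vote: "ready" if the local work succeeded, else "abort".
        return "ready" if self.local_ok else "abort"

    def finish(self, decision):
        # Phase 2: apply or undo the local changes, then release locks.
        self.state = decision

def two_phase_commit(participants):
    votes = [p.prepare() for p in participants]                  # round 1
    decision = "commit" if all(v == "ready" for v in votes) else "abort"
    for p in participants:
        p.finish(decision)                                       # round 2
    return decision

# Both machines from the example succeeded locally, so we commit:
nodes = [Participant("bottom-left", True), Participant("top-right", True)]
print(two_phase_commit(nodes))  # -> commit
```

Note how each participant’s `state` stays `"locked"` from the start of `prepare` until `finish` runs: that interval, covering both communication rounds, is exactly the lock-holding window the talk identifies as the source of the throughput problem.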
But that’s just committing one transaction on its own. What if we also have a replicated database? And that describes almost every distributed deployment, because you want high availability, you want performance (at least read performance), and you obviously want durability, so you need backup copies of your data. So pretty much every distributed system replicates data across different sets of replicas. The problem is that if we’re doing replication in a consistent fashion, such that each replica is guaranteed to see the same data at any point in time, then you end up having to hold locks during replication. Roughly speaking, that is; there are exceptions, and we’ll get to them soon. So what happens is that we first run the transaction on one replica. Here on the left side of the screen we have the final result of the transaction: the store credit is 21 and the stock is zero after running two-phase commit. This slide picks up where the previous slide left off: we first did the communication, then we ran two-phase commit, and now the left replica is done with the transaction. But before it commits, it wants to replicate the result of the transaction, because the right replica still has the old values: a store credit of 100 for customer 2, and a stock of one for widget 3. So before we commit, we make sure that all replicas agree with each other, before we release the locks. That way, once we commit, we can tell the user that no matter which replica you read from, everyone will see the same value. How do we do this? While holding locks, we replicate the data: we send the changes that replica 1 made over to replica 2. Now replica 2 also has a credit of 21 and a stock of zero, and it responds back that it’s done, and only at that point can replica 1 commit the transaction. So only after two-phase commit and after replication can we release the locks. Here we have yet another period of time during which conflicting transactions cannot run, and this one is typically even worse. Typically you run two-phase commit within a single region, since one replica is usually contained within one region, but if you have a truly global deployment and you’re replicating across replicas in different regions, now you have a major latency delay for this replication across geographical distances: something like 200 milliseconds to get across the world and back again, or even more. So latency-wise, that’s the PACELC theorem, but here we’re also seeing a throughput issue, because since we’re holding locks during replication, no conflicting transactions can run while replication is happening. If you have other transactions that want to access customer 2 or widget 3, they can’t run; they get delayed. And if you have a lot of conflicting transactions, you’ll see an overall throughput reduction as the system gets clogged up with too many waiting transactions.
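To see why holding locks through commit and replication hurts throughput on contended data, here is a back-of-the-envelope calculation. The round-trip counts and per-round latencies are my own illustrative assumptions, not numbers from the talk (apart from the roughly 200 ms cross-world round trip it mentions).

```python
# Conflicting transactions on the same record serialize behind its lock,
# so the per-record throughput ceiling is 1 / (lock hold time).

def max_conflicting_tps(local_work_ms, commit_rtts, replication_rtts, rtt_ms):
    """Upper bound on transactions/sec that all touch one hot record."""
    hold_ms = local_work_ms + (commit_rtts + replication_rtts) * rtt_ms
    return 1000.0 / hold_ms

# Single region: ~1 ms round trips, two rounds for 2PC, one for replication.
print(max_conflicting_tps(1, 2, 1, 1))    # 250.0 txns/sec on the hot record

# Cross-continent replication: a ~200 ms round trip dominates everything,
# capping the hot record at under 2 transactions per second.
print(max_conflicting_tps(1, 2, 1, 200))
```

The point of the arithmetic: the same round trips that show up as a latency cost in PACELC also shrink the throughput ceiling for any data item that many transactions contend on, because the lock is held for the entire communication window.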
To summarize, we saw two potential problems that come up by switching from a single-node system to a distributed system. Number one, latency increases: the transaction takes longer when going from a single-node to a multi-node deployment. First, because we have to run a commit protocol, which we need if we want to guarantee the atomicity, consistency, and durability of ACID; we can’t get away without it, and it takes time, which costs us latency. And by holding locks during that protocol, it costs us throughput as well. The second problem comes from replication. We saw that when you hold locks during replication, you end up increasing latency as you wait for the replication to occur while holding locks, and since you’re holding locks, that can cause a throughput reduction as well. So both because of replication and because of the commit protocol, you end up with a latency increase and a throughput reduction. Now, this whole discussion assumed we’re using a locking protocol, and that’s fine; it’s one way concurrency control is done. But the truth is that even if you’re not using a locking protocol, if you’re using an optimistic protocol or a multi-version protocol instead, it’s still the case that you cannot allow conflicting transactions to run while you’re doing two-phase commit and while you’re doing replication, because if you do, you may end up with some major correctness issues in the application. So although the example we saw used locking, this is also true for optimistic protocols, and it’s also true for multi-version protocols as well. OK, so if that’s the case, then I’m basically saying that we have two problems: a transaction throughput problem and also a transaction latency problem, both caused by commit protocols and replication. So why is it, then, that the PACELC theorem only talks about latency and not throughput? Let’s take a closer look at the theorem. It says that when there’s a partition, you have to trade off between availability and consistency. But if there’s no partition, in the regular baseline case, just running transactions the way they’re supposed to run, then either you guarantee consistency, and every replica sees the same value, in which case you have to pay the latency of replication to get every piece of data to every replica so that they agree, or you give up consistency: you get better latency, but now the replicas may disagree with each other. So that’s the PACELC trade-off: in the baseline case, a latency versus consistency trade-off. It would seem that there should be a throughput trade-off in the theorem as well, based on the arguments we’ve been making so far. The truth is that the throughput trade-off is not as fundamental as the latency trade-off. The reason is that no matter what you do, no matter what your workload looks like, you will always have a latency trade-off. As long as you need every replica to agree on the state of the database, it just takes time to get that data across to all of the replicas. Either you send all requests to the same replica, in which case you have other issues, availability among them, or you have to pay the latency cost of making sure that all replicas agree with each other. And this is true no matter what you’re accessing. Even if you have no conflicting transactions, and every transaction is touching something different, we still have to hold locks and perform replication, and not commit the transaction until replication is done, because we don’t know for sure, in advance, that they won’t conflict. So whether transactions conflict or not, you still pay the latency cost. With throughput, however, that’s not the case. The only throughput problems we saw came from holding locks, or otherwise preventing conflicting transactions from running, while performing two-phase commit and replication. If there are no conflicting transactions, we’re fine: there is actually no throughput reduction, because the only thing we’re doing is preventing conflicting transactions from running. If you have other, non-conflicting transactions to run, throughput is fine. So that’s point one. The other point is that you can simply not guarantee strong isolation. If you run transactions in such a way that it’s OK for two transactions running at the same time to interfere with each other, because your application is designed so that it doesn’t matter, and you can therefore reduce the isolation guarantee of the system, then you don’t need to hold locks while you do replication, only while you do two-phase commit, and again there is no throughput trade-off. Our third point is that there is a whole other class of systems, called deterministic systems, which are well beyond the scope of this talk (I think we’re running low on time at this point), but they work in such a way that they run two-phase commit and replication outside of transactional boundaries. In other words, they do the replication before the transaction even begins, so they don’t have to hold locks during replication, and they also mostly get rid of two-phase commit entirely. So in short: if there are no conflicting transactions, or we don’t care about strong isolation, or we use a deterministic system, if any of those things hold, then the throughput problem isn’t there.
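The first of those points, that only conflicting transactions pay the cost, can be demonstrated with a toy simulation: two transactions that lock disjoint records overlap their commit delays instead of serializing behind one lock. The record names, delay, and timing here are all invented for the illustration.

```python
import threading
import time

# Two transactions on disjoint records take different locks, so a long
# simulated commit/replication delay in one does not block the other.
locks = {"widget3": threading.Lock(), "widget7": threading.Lock()}
done = []

def txn(record, commit_delay_s):
    with locks[record]:              # lock held through "2PC + replication"
        time.sleep(commit_delay_s)   # stand-in for commit-protocol delay
        done.append(record)

start = time.monotonic()
threads = [threading.Thread(target=txn, args=("widget3", 0.2)),
           threading.Thread(target=txn, args=("widget7", 0.2))]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.monotonic() - start

# The two 0.2 s delays overlap: total time is ~0.2 s, not ~0.4 s.
print(sorted(done), round(elapsed, 2))
```

If both transactions instead locked the same record, the sleeps would serialize and the elapsed time would roughly double, which is the throughput reduction the talk attributes to conflicting workloads.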
And that’s why it’s not at the same level as the latency trade-off. In practice, and I guess this is the main takeaway message, there usually is a throughput trade-off: most of the time, if you want consistency, you’re going to pay throughput as well. But it’s not as fundamental, and therefore it’s not part of the PACELC theorem; it’s a little bit less of a fundamental trade-off. So that’s the end of the talk. Contact me with any questions, either during the session or afterwards; my contact details are on the slide. I’m very passionate about distributed systems and happy to talk with people about them. [Applause]
