Solving the Issue of Mysterious Database Benchmarking Results

Daniel Seybold, 19 minutes

Benchmarking tremendously helps to move the database industry and the database research community forward, especially since all database providers promise high performance and “unlimited” horizontal scalability. However, demonstrating these claims with comparable, transparent and reproducible database benchmarks is a methodological and technical challenge faced by every research paper, whitepaper, technical blog or customer benchmark. Moreover, running database benchmarks in the cloud adds unique challenges, since differences in infrastructure across cloud providers make apples-to-apples comparisons even more difficult.

With benchANT, we address these challenges by providing a fully automated benchmarking platform that produces comprehensive data sets, ensuring full transparency and reproducibility of the benchmark results. We apply benchANT in a multi-cloud context to benchmark ScyllaDB and other NoSQL databases using established open source benchmarks. These experiments demonstrate that, unlike many competitors, ScyllaDB is able to keep its performance and scalability promises. The talk covers not only an in-depth discussion of the performance results and their impact on cloud TCO, but also outlines how to specify fair and comparable benchmark scenarios and how to execute them. All discussed benchmarking data is released as open data on GitHub to ensure full transparency and reproducibility.



Video Transcript

Welcome to my talk, “Solving the Issue of Mysterious Database Benchmarking Results.” Before diving into the technical details, here are a few words about myself. I’m Daniel. I have an academic background: I did my PhD in computer science at Ulm University in Germany.

My research was focused on evaluating distributed systems, especially distributed database systems operated in the cloud. After finishing my PhD, my co-founders and I started benchANT, a benchmarking-as-a-service platform for database systems, where I’m responsible for product development and where I can, of course, pursue my interest in benchmarking database systems.

Having said that, today’s talk is centered around database performance and how you can achieve reliable, reproducible and transparent performance measurements. To address that, I will first briefly go over the question of whether database benchmarking is still relevant these days. Second, I will highlight some challenges of reliable database benchmarks, especially in the cloud. In the third part, I’m going to do a short demo based on our open database performance ranking, before concluding with some takeaways.

So let’s start with the first question: is database benchmarking still relevant? We’re all here because we’re interested in database systems, especially high-performance database systems, and there has been a tremendous change in the database landscape over the last decade. First of all, the idea of one size fits all is over. We still have the relational systems, but we also have NoSQL and NewSQL systems, and they all promise to provide the perfect solution for your data-intensive application. Second, these database systems are nowadays preferably operated in the cloud rather than on physical resources. Together with these two tremendous changes, there is a third: database-as-a-service, and especially serverless database-as-a-service, has reached the mainstream. It is already offered by many database vendors and also by cloud providers, and it might be the future of the database world.

With these significant changes come a lot of options for selecting the right database technology for your data-intensive use case. When it comes to performance, all the providers promise you a similar thing: high performance, unparalleled performance, and so on. But as a user, you might actually want to verify these claims, and in order to do so, you need data. Where can you get that data from? Let’s have a look. First of all, you will find a lot of information about the functional features just by reading the docs of the respective database technologies. When it comes to the costs of operating them in the cloud, you will have to do some data mining, due to the heterogeneous pricing models of different cloud vendors and database vendors. But for the interesting part, the performance and scalability of the system, it gets tricky. You could of course ask Google, or ask ChatGPT nowadays, what kind of performance the respective system is going to provide, and you will find blog posts, white papers or even scientific papers about that. But there are always problems with these sources. They might already be outdated due to the rapidly evolving technology domain, or they might be biased or vendor-driven. And all of them have one thing in common: most likely, they won’t provide you all the data needed to fully understand the performance results, let alone reproduce them. This is why benchmarking is still a challenge, but you can also get a lot of insights out of it if it’s done right. The academic perspective recently highlighted this as well: a report on database research stated again that database benchmarking has been around for a long time and has helped both the industry and the database research community.

It also highlights that benchmarking in the cloud adds another layer of complexity and makes it even harder to enable fair comparisons across different technologies.

There is also an industry view on benchmarking database systems. Basically all the big players on the cloud market and the database market highlight the need to run benchmarks yourself, in order to get an application-specific view of the data. Before making any decision about selecting a database system, you should thoroughly measure everything, with respect to the database technologies but also with respect to the available cloud offers. The same approach applies when performance engineering companies such as Percona address performance problems: you should do it in a systematic way and not just ask Google. With that, we can already answer the first question: database benchmarking is still relevant nowadays, maybe even more relevant than it was before, since the cloud adds another layer of complexity and you want in-depth measurements to understand the performance impact. And with benchmarks, you can do a lot more than compare different database technologies. For example, you can compare cloud provider performance, or do load and stress testing with respect to scalability and elasticity, which is an important aspect for serverless DBaaS offers. You can also do application-specific database optimizations by tuning the configurations and measuring the impact, and keep track of new releases and new cloud resource offers. All of this gives you the option to lower your TCO by running benchmarks to find a more cost-efficient solution.

But running these benchmarks in the cloud also comes with some challenges. In order to highlight them, I will briefly go over the typical process of running benchmarks in the cloud. There are basically four steps: allocating the resources, deploying and configuring the database cluster, deploying and executing the benchmark, and processing the objectives. Each of these steps requires a lot of domain knowledge. For the cloud, we now have a plethora of cloud resource offers: for example, over 500 on EC2 alone, not counting all the other publicly available cloud offers. When it comes to databases, there are over 850 database systems available, according to the Database of Databases provided by Carnegie Mellon University, which keeps track of this evolution. Nowadays, we also already have over 170 DBaaS offers; for example, right now there are over 10 DBaaS offers providing Cassandra Query Language (CQL) compatible services.
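The four steps can be pictured as a fixed sequence of functions applied to a per-run record. This is a minimal sketch, not benchANT’s implementation: `BenchmarkRun`, the step functions and all values are hypothetical stand-ins for real cloud SDK calls, deployment scripts and a benchmark client. Running the same ordered steps once per cluster size is what turns the process into a deterministic, repeatable study.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class BenchmarkRun:
    """Hypothetical record filled in by the four steps of one benchmark run."""
    nodes: int
    cloud: Dict[str, str] = field(default_factory=dict)
    database: Dict[str, str] = field(default_factory=dict)
    results: Dict[str, float] = field(default_factory=dict)
    steps_done: List[str] = field(default_factory=list)


# Stubs standing in for real tooling; names and values are illustrative only.
def allocate_resources(run: BenchmarkRun) -> None:
    run.cloud = {"provider": "example-cloud", "vm_type": "example-vm"}
    run.steps_done.append("allocate_resources")


def deploy_database(run: BenchmarkRun) -> None:
    run.database = {"system": "example-db", "cluster_size": str(run.nodes)}
    run.steps_done.append("deploy_database")


def execute_benchmark(run: BenchmarkRun) -> None:
    run.results = {"ops_per_sec": 0.0}  # a real benchmark client would fill this in
    run.steps_done.append("execute_benchmark")


def process_objectives(run: BenchmarkRun) -> None:
    run.steps_done.append("process_objectives")


PIPELINE: List[Callable[[BenchmarkRun], None]] = [
    allocate_resources,
    deploy_database,
    execute_benchmark,
    process_objectives,
]


def run_study(node_counts: List[int]) -> List[BenchmarkRun]:
    """Apply the same ordered steps once per cluster size."""
    runs = []
    for n in node_counts:
        run = BenchmarkRun(nodes=n)
        for step in PIPELINE:
            step(run)
        runs.append(run)
    return runs


if __name__ == "__main__":
    for r in run_study([1, 3, 9]):
        print(r.nodes, r.steps_done)
```

Keeping every step behind the same function signature makes it straightforward to swap one stub for real tooling without touching the loop that drives the study.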

When it comes to the database benchmarks, there are also different benchmark suites available. These benchmark suites allow you to run several workloads, from IoT-based to web-based to analytical workloads. And when it comes to the objectives, you usually evaluate performance, or performance versus cost, but also scalability and availability. Doing all of that manually is very time-consuming and very error-prone, and this whole process is not a one-shot exercise; it’s an iterative process. So the first thing you need is automation, because without an automated benchmarking process you won’t be able to run large-scale studies or keep track of the rapidly evolving technologies. The good thing is that, especially over the last two to three years, there have been several approaches to automating the benchmark process for database systems in the cloud. For example, there is the great tooling by ScyllaDB. There are also approaches by MongoDB that have been published at different scientific conferences. There is another scientific approach, FlexiBench, for running benchmarks of different database systems on Kubernetes. And there’s the Mowgli framework, the project I worked on back in my research days: an open source tool for running database benchmarks in a multi-cloud context. It is also the foundation of benchANT, which builds on top of these concepts and provides a benchmarking-as-a-service platform for database systems.

But automation alone doesn’t finish the job, so to say. With an automated process you get a deterministic execution, which is already a good thing, but in order to make the results fully transparent and fully reproducible, you need a lot more data. Here I just put four example questions that usually pop up when we look at benchmark results. So there is a lot more data that needs to be covered, in particular for each step. On the cloud level, you also need to collect the cloud provider and resource metadata. On the database level: which configurations were applied, and what the system utilization was, in order to rule out bottlenecks imposed by the benchmark setup. On the workload level: which fine-grained workload configurations were applied, for example data set size, query distribution patterns and so on. And for the results, you should of course provide not only the aggregated results but also the raw measurements, in a time-series-based manner. All of this needs to be compiled into a comprehensive data set that then ensures transparent and reproducible results.
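One way to keep the four levels of metadata and the raw measurements together is a single record per run, from which the aggregates are derived rather than stored on their own. The sketch below assumes a hypothetical `RunRecord` schema with raw throughput per one-second interval and raw per-request read latencies; the field names are illustrative and are not the format of the published benchANT data sets.

```python
import json
import statistics
from dataclasses import asdict, dataclass
from typing import Dict, List


@dataclass
class RunRecord:
    """Hypothetical per-run record covering the four levels named in the talk."""
    cloud: Dict[str, str]         # e.g. provider, region, VM type, kernel version
    database: Dict[str, str]      # e.g. version, applied configuration, cluster state
    workload: Dict[str, str]      # e.g. tool, data set size, request distribution
    throughput_ops: List[float]   # raw time series: ops/s per one-second interval
    read_latency_ms: List[float]  # raw per-request read latencies

    def aggregates(self) -> Dict[str, float]:
        """Derive summary numbers from the raw data so anyone can recompute them."""
        # 99 cut points; index 49 is the median, index 98 the 99th percentile.
        cuts = statistics.quantiles(self.read_latency_ms, n=100, method="inclusive")
        return {
            "avg_throughput_ops": statistics.fmean(self.throughput_ops),
            "min_throughput_ops": min(self.throughput_ops),
            "read_p50_ms": cuts[49],
            "read_p99_ms": cuts[98],
        }

    def to_json(self) -> str:
        """Serialize metadata, raw data and derived aggregates as one document."""
        payload = asdict(self)
        payload["aggregates"] = self.aggregates()
        return json.dumps(payload, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Illustrative inputs only, not real measurements.
    record = RunRecord(
        cloud={"provider": "example-cloud", "vm_type": "example-vm"},
        database={"system": "example-db", "version": "x.y"},
        workload={"tool": "example-benchmark", "distribution": "zipfian"},
        throughput_ops=[1000.0, 1000.0, 200.0, 1000.0],
        read_latency_ms=[float(ms) for ms in range(1, 101)],
    )
    print(record.aggregates())
```

With these illustrative inputs the average throughput is 800 ops/s, while the raw series also shows the one-second dip to 200 ops/s that the average alone would hide. This is the kind of detail that only the raw, time-series data makes visible.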

This concept is also followed by our platform. In the following, I will briefly demo our open database performance ranking, which contains a lot of performance data, and I’m going to demonstrate how we ensure transparent and reproducible results with it. But before jumping into these performance numbers, a short disclaimer: the ranking is not intended as the basis for your final database decision. It’s just a baseline for starting application-specific benchmarks, and there are some other aspects that are not covered yet; in particular, all the workloads are based on synthetic benchmark tools with a rather small data set size.

Feel free to play around with the ranking yourself or have a look at it. It’s completely open and available on our website, and I’m always happy to answer any questions that pop up regarding the ranking, so just drop me an email. With that, I will briefly jump over to the ranking. What we did is measure the performance of different database systems on different cloud providers for three workload types; you can find all the details below. I will now just show the results for ScyllaDB, where we had different scaling sizes ranging from one node to nine nodes. What you get out of the ranking is, for example, the average throughput, the read latencies and the write latencies, and also the monthly costs to operate that cluster on the respective cloud provider, plus a throughput-per-cost ratio. You also get some more details, such as the database version; here we haven’t included the latest version, 5.1, yet, as it is about to be released. You can find some details on the cloud resources as well. And from the smallest scaling size to the largest, we see that ScyllaDB scales very well here.

Just a few more insights that you can get: if we add some other NoSQL and NewSQL systems here, you’re going to see that ScyllaDB outperforms them all on AWS for this specific workload with these specific configurations: Cassandra, Couchbase, and CockroachDB, which is way further down. So this is one of the insights you can get out of the ranking. Second, you can also use it to compare the performance of different cloud providers. Here, for example, I filtered for Cassandra with the same scaling size but operated on different cloud providers. They’re all pretty similar, but in this case Alibaba did a good job compared to AWS and the others. You will also spot some results where, for example, Azure provides really low performance. But feel free to play with it yourself.

The interesting part is also that all the data is available on GitHub. There we not only provide the performance numbers, but all the applied configurations, so everything that can be tuned during the benchmark execution. We also have monitoring data, the cloud provider metadata and the virtual machine metadata, so you can see which operating system and which kernel version were used. You get a lot of database metadata as well: configuration files, cluster state and so on. With that data, you can reproduce the results via our platform, or you can rebuild the benchmark process yourself and reproduce the results. This has in fact already been done by some database vendors who were not happy with the results; they reproduced them themselves, and then they had to agree that with that data, they got the same results.
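The cost and scalability columns come down to simple arithmetic. The sketch below makes three assumptions that may not match the ranking’s exact definitions: throughput per cost is average ops/s divided by monthly cluster cost in USD, monthly cost is nodes × hourly VM price × 730 hours, and scaling efficiency is measured against perfect linear scaling from the smallest cluster. `ClusterResult` and `scaling_efficiency` are hypothetical names, and the example numbers are placeholders, not results from the ranking.

```python
from dataclasses import dataclass
from typing import Dict, List

HOURS_PER_MONTH = 730  # assumption: 8,760 hours per year / 12 months


@dataclass
class ClusterResult:
    nodes: int
    avg_throughput_ops: float     # average ops/s measured for this cluster size
    hourly_price_per_node: float  # assumed USD per VM hour (e.g. on-demand list price)

    def monthly_cost(self) -> float:
        return self.nodes * self.hourly_price_per_node * HOURS_PER_MONTH

    def throughput_per_cost(self) -> float:
        """ops/s per USD of monthly cost (one plausible definition of the ratio)."""
        return self.avg_throughput_ops / self.monthly_cost()


def scaling_efficiency(results: List[ClusterResult]) -> Dict[int, float]:
    """Measured throughput divided by perfect linear scaling from the smallest cluster."""
    base = min(results, key=lambda r: r.nodes)
    per_node = base.avg_throughput_ops / base.nodes
    return {r.nodes: r.avg_throughput_ops / (per_node * r.nodes) for r in results}


if __name__ == "__main__":
    # Placeholder numbers for illustration only, not measurements from the ranking.
    runs = [
        ClusterResult(nodes=1, avg_throughput_ops=10_000, hourly_price_per_node=0.5),
        ClusterResult(nodes=3, avg_throughput_ops=27_000, hourly_price_per_node=0.5),
        ClusterResult(nodes=9, avg_throughput_ops=72_000, hourly_price_per_node=0.5),
    ]
    for run in runs:
        print(run.nodes, run.monthly_cost(), round(run.throughput_per_cost(), 1))
    print(scaling_efficiency(runs))
```

With these placeholder inputs, the 3-node and 9-node clusters reach 0.9 and 0.8 of linear scaling. Because the per-node price is constant, throughput per cost drops in exactly the same proportion. This is why a system that scales well also tends to keep its cost efficiency as the cluster grows.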

Switching back to the slides: with that data, we are actually able to provide reproducible and transparent benchmark results.

In summary, database benchmarking can support a broad range of use cases, from cloud resource selection to database tuning and database comparisons. Running database benchmarks in the cloud is easy in the first place, but ensuring transparency and reproducibility is hard. A step towards that is to automate benchmark executions using a supportive framework or tooling, and in addition, the results need to include comprehensive metadata to ensure reproducibility. I hope I got you interested in what is required to ensure transparent benchmark results, and that you now know what to look for the next time a white paper or blog post comes out with database performance results. Thanks a lot for attending my talk. [Applause]
