ScyllaDB Cloud: Simplifying Deployment to the Public Cloud

21 minutes

In This NoSQL Presentation

ScyllaDB Cloud is ScyllaDB's managed database-as-a-service (DBaaS), available on AWS and Google Cloud. Find out how you can run a fast, managed NoSQL database that can keep up with your company's growth.

Tzach Livyatan, VP of Product, ScyllaDB

Tzach has a B.A. and an MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15-year career in development, system engineering and product management. In the past he worked in the telecom domain, focusing on carrier-grade systems, signaling, policy and charging applications.

Video Transcript

So, hi, everyone, and welcome to this session at ScyllaDB Summit 2022, ScyllaDB Cloud: Simplifying Deployment to the Public Cloud, or as I would like to call it, ScyllaDB Cloud, the best database-as-a-service out there for Big Data. I will try to prove my bold claim in the next 15 minutes or so. I’m Tzach Livyatan, VP of Product at ScyllaDB, and I want to thank Adar, the cloud product manager, for helping me prepare and create this deck.

Adar was supposed to present this deck but is a little bit under the weather, so I’m taking his place. I already introduced myself, so let’s jump directly to the topic.

So what is ScyllaDB Cloud? I’m assuming most of you are familiar with ScyllaDB, both the open-source and enterprise versions. It’s by now a well-known NoSQL database. ScyllaDB Cloud is the database-as-a-service offering of that same ScyllaDB, or, if you want, the cloud offering of this database. That means it’s a fully managed database on the cloud. It’s hosted on both AWS and Google Cloud, and by fully managed I mean that it’s completely supported by the ScyllaDB Cloud team. I’m talking about 24/7 monitoring, proactive support, rapid response, everything that comes with it. A bit of information I got from our VP of Solutions: our average response time to critical issues is less than 8 minutes. And our team is proactive in the sense that it’s not only responding fast to tickets. It will also find out about most issues before you do, and that’s one of the advantages of having the experts on the database as, basically, your DBA.

ScyllaDB is a highly available database, and that holds for ScyllaDB Cloud as well. You can run it in multiple data centers, which usually means multiple regions on AWS or GCP. All deployments are multizone by default. The database-as-a-service has existed for more than 3 years, and by now we have hundreds of production clusters managing 4 petabytes of data, and the numbers just keep growing.

Before I jump to ScyllaDB Cloud’s benefits, I want to give you a very quick demo of what ScyllaDB Cloud looks like. The ScyllaDB Cloud UX is not very fancy. You can see your active clusters, or you can launch a new cluster. As you can see here, I can choose AWS or Google, and I can choose the region and the instance type. We support many instance types on AWS.
Here they are: usually from the i3 families, i3 and i3en, but also t3.micro, a very low-cost instance usually used for evaluation or functional evaluation. If you want to move to production, you need more than that. You can choose the number of nodes, starting from three nodes. As we will see in a second, three nodes can already support a huge number of transactions or requests per second, but you can start a much bigger cluster if you want to. I won’t go in and do it now because it’s actually coming out of my personal credit card, so it might be a little bit expensive, but if you choose a smaller instance with fewer nodes, the number is lower.

So here in my personal account, I have two clusters right now. One of them is an evaluation cluster, as I mentioned earlier, running on t3.micro. The second one is a production cluster. It has only three nodes, but they are much bigger nodes, i3.16xlarge, and as you can see, this cluster supports over 800 requests per second. The cluster is not fully loaded. It’s only at 60 percent load, probably because I didn’t use enough loader machines to put traffic on it.

Let me switch to the Grafana dashboard. For those of you using ScyllaDB Open Source or Enterprise, it’s the same or a very similar monitoring stack to what you are used to. You can see here, for example, the overview dashboard. We have many dashboards. As you can ... Let me ... Okay. So you can see here the average number of writes and the average number of reads. Let’s look at the last 3 hours, for example, and refresh it. Otherwise the scale gets distorted because it only shows the change. Okay. This is more like it: you see over 400 writes, over 400 reads.

I want to mention the latency because I will touch on it a bit later. The tail latency of ScyllaDB is very, very low, and that applies to ScyllaDB Cloud as well. So you can see here, the 99th percentile latency for writes is a millisecond.
For reads it’s 3 milliseconds. I can assure you that’s the best latency you will get on any database-as-a-service out there, and don’t take my word for it. I encourage you to go ahead and test it. So that’s the overview dashboard. We have many, many other dashboards. I’m not going to deep-dive into the monitoring stack right now, but just for your information, we have OS-level metrics, CQL-level metrics, DynamoDB API metrics, and more and more dashboards. All the dashboards are available here.

But I want to go back to ScyllaDB Cloud’s benefits and value, and why you should consider ScyllaDB Cloud for your next project. At a very high level, I broke the advantages into four. The first one, and maybe the most important, is price to performance. You’ve already seen in the demo a three-node cluster that can run well over 800 requests per second. For those of you who can calculate quickly in your head, you can already evaluate the price compared to other database-as-a-service offerings like DynamoDB. It’s much, much lower, and I will show it in a graph later. I wrote here a fifth of the cost. Of course, it varies. It can be lower or higher than that. The second point is predictable low latency, which I will touch on later. The third is a fully managed service. I already explained what that means, but it also means that we do all the operations, like repairs and backups, everything that comes with it, automatically. And the fourth is that it’s developer-friendly. We’re going to touch on each of these four points in the next few minutes.

So let’s start with the price-to-performance ratio. We have a tool called the pricing calculator, which captures the current state of ScyllaDB Cloud pricing as well as other cloud pricing, and you can interactively compare different use cases and see how much each would cost you on each of the clouds.
In this case, the use case I chose is half a million requests per second for reads, half a million requests per second for writes, and a small data set of 20 terabytes. Of course, all the prices and the benchmarks are a point in time. Cloud prices do change, not in ScyllaDB, but other clouds do change their prices from time to time, and the benchmark results change depending on the use case, so I really encourage you not to take my word for the performance and pricing and to measure it for yourself. Even on ScyllaDB Cloud, or on any specific database, you can see big changes between use cases just by increasing the data size or the average item size, for example from a kilobyte to a megabyte. Of course, all the numbers would change, so I really encourage you to test with your own use case.

I chose this specific use case, and this is the result I got for on-demand monthly cost for ScyllaDB Cloud compared to DynamoDB, Astra, AWS Keyspaces, which is, by the way, built on DynamoDB, and the latest offering from MongoDB, the Atlas serverless option. I took this option from Mongo because they provide per-operation costs, which was easier for me to use. It’s very easy to see that ScyllaDB Cloud, at least for this specific use case, is much, much cheaper than the other offerings, which are way more expensive. Again, that might not be true in all cases. For example, off the top of my head, I can tell you that if you have only a few requests per second, DynamoDB will probably be less expensive than ScyllaDB, but if you have thousands or millions of requests per second, the difference becomes bigger and bigger. In this case, it’s pretty huge.

Latency: I already mentioned that latency is super important for us and for our customers, so this is from a benchmark that we did comparing ScyllaDB and DynamoDB in different use cases.
This might be a little bit misleading because DynamoDB usually behaves better than that, but still, I can absolutely guarantee you that ScyllaDB latency will be much better than DynamoDB’s, at both p95 and p99, and I have the link to the benchmark here if you want to deep-dive into these results.

So we talked about performance and cost structure. We talked about latency. Now I want to talk a little bit about security and the way that ScyllaDB Cloud is architected. The way that ScyllaDB Cloud works, and this might be the same as or different from your other database-as-a-service, is that each cluster, or each database if you will, gets dedicated VMs on the cloud, either AWS or GCP: dedicated VMs, dedicated storage, dedicated monitoring and admin stack. Everything is isolated. Everything is dedicated. This of course brings you great performance but also very good security, because nothing is shared between customers. Each customer has its own isolated environment, and of course it’s much easier for us as administrators to secure this kind of deployment. We also have SOC 2 Type 2 certification for ScyllaDB Cloud, and we are just in the last days of approving our ISO certification as well, which was quite an effort.

ScyllaDB Cloud is very flexible and friendly. You already saw that you can very easily run a development cluster, a test cluster or a production cluster. This diagram shows three nodes, but our customers can run as many nodes as they want. Usually they don’t need a huge number of nodes, just because ScyllaDB is so efficient with big machines and in the number of transactions it can support per node, but we do have customers that run, for example, in multiple data centers or regions with more than 50 ScyllaDB nodes across the globe. So it’s a very flexible deployment, and developer-friendly.

Why is it developer-friendly? First, because it’s compatible with two huge ecosystems in the database domain.
One is Apache Cassandra, and the second one is DynamoDB, and when I say compatible, I mean compatible at the protocol level. That means that every driver and every tool that works with either Cassandra or DynamoDB will work seamlessly with ScyllaDB. Also, maybe unlike other offerings, we’re fully compatible with enterprise and open-source ScyllaDB, in the sense that we didn’t put limitations on the way you can use the cluster. You can use multiple keyspaces, multiple users, and as many tables and materialized views as you want. Of course, each of these decisions might have performance implications, but we are not hard-limiting you to a specific number of, for example, tables or keyspaces. This means that you can use the power of ScyllaDB, and as you saw, ScyllaDB has a lot of power. You can use it to provision more than one keyspace or more than one use case on the same cluster, and that might mean that when the load from one type of use case is down, the other type of use case can take advantage of that. Together with a feature that we call service-level priority, this is super useful.

I already mentioned driver compatibility. You also have tools like CQL tracing, the CQL dashboard and the monitoring that I showed earlier. I do want to mention one specific tool for those of you who run ScyllaDB, Cassandra or DynamoDB and are looking to migrate to ScyllaDB Cloud. There is a tool called ScyllaDB Migrator, built on Spark, which makes it very easy to migrate either from ScyllaDB on-prem or from other databases. So if you’re interested, take a look and let me know if you have any questions.

Batteries-included monitoring stack: I already demoed the monitoring stack, so I’m not going to spend too much time on it. I do want to mention that although we are offering this stack, you are not married to it, in the sense that you can use your own monitoring stack.
There is a checkbox when you start a ScyllaDB cluster on the cloud for whether you want to export the metrics or not. If you choose to export the metrics, you can integrate with external tools like Datadog or another monitoring stack, and that way you can consolidate more than one ScyllaDB cluster into one monitoring dashboard, maybe with your own data pipeline, and you can also set up your own alerts, maybe to your own PagerDuty and such. So this is a popular option. Keep in mind that working with an external monitoring system doesn’t mean you cannot keep working with the Grafana dashboard. You can use both. It doesn’t cost you anything to continue using both systems, not in performance and not in actual cost.

Bring your own account is a very popular offering for AWS users. By bring your own account, we mean that ... Let me just show the diagram. I think it’s clearer. With bring your own account, you are running the ScyllaDB cluster in your own AWS account. The VMs, the backups, everything runs in your account, but it’s still fully managed by the ScyllaDB Cloud automation and the ScyllaDB Cloud team. That means that we are giving you minimal access, just to connect to the database over CQL, or over the REST API for the DynamoDB-compatible interface. We are not really allowing you to SSH into the machines or anything like that. That would make it very hard for us to support. So you have limited access to the database, just what you need, and we at ScyllaDB Cloud have very limited access to your account to operate the nodes. We are asking for minimal permissions, and the end result is that the cluster is fully managed by us. We will scale out, scale down, do upgrades, whatever it takes, but not more than that. So the data never leaves your account and your region, including for backups or anything like that. None of your data is actually exposed to the ScyllaDB back end. Everything stays in your account, and a lot of customers like this option.
One reason is the security and the responsibilities, as I mentioned earlier. The other is that you pay for the VMs directly to AWS, and in some cases customers prefer to do it this way. So bring your own account is very popular.

Before we wrap up, I want to mention just two use cases of ScyllaDB Cloud. I already mentioned that ScyllaDB Cloud has been a production environment for the last 3 years, and we have hundreds of clusters in production. One of our biggest and most important customers is iFood in Brazil. It’s a food delivery service. They moved to ScyllaDB from DynamoDB. They actually had a longer journey from SQL to NoSQL. You can read more about it in the blog post. I have the link here. By doing that, they saved a lot of money and are able to run in a multiregion environment, so if you want to read more about them and other customers, look at our blog posts. Another relatively old customer of ScyllaDB Cloud is GumGum, a company that deploys ScyllaDB clusters in a multiregion environment and by that significantly reduced the number of servers they are using, and of course the cost associated with that.

So, just a quick wrap-up of what we’ve seen so far. We’ve seen four big advantages of ScyllaDB Cloud. The first and probably the most important is the performance-per-cost ratio, which is the best in ScyllaDB Cloud compared to other similar databases. The second one was the support, and then the user-friendliness and the latency. For all four of these categories I tried to show that ScyllaDB is superior to other databases, but the work of course is never done, and we have huge plans for ScyllaDB Cloud for 2022. We are planning to enhance usability and observability and to integrate with more systems. We are planning to add the long-awaited cloud API that will allow you to control your cluster in a programmatic, secure way.
We are adding a lot of features to ScyllaDB itself, like the Raft consensus algorithm, and this will allow our cloud offering to become more elastic, in the sense that we can scale out and scale down much faster than we can right now. We are also adding more regions and more zones all the time, close to the rate at which the cloud providers are adding them.

The last thing I want to mention is that we are expanding to new machine types, and we have another session in this summit with Ken Krupa from AWS presenting the latest and greatest EC2 instances, in particular the i4 instances. We got lucky, and AWS was kind enough to share a preview instance of i4 so we can test with it. Our preliminary tests already show that we can run twice the throughput on i4 compared to i3, so you can imagine that this can automatically translate to half the machines and half the price of what you’re paying right now for the same throughput. This is huge, and we are very excited about getting these i4 machines and adding them to ScyllaDB Cloud once AWS makes them publicly available. If you’re interested in learning more about i4 and the new Arm instances, please check out the joint session with Ken Krupa. Very interesting stuff. And also thanks to Michal for helping me benchmark ... Not helping me, actually doing all the work in benchmarking the new AWS instances.

With that, I want to thank you guys. I put here a few useful resources for ScyllaDB Cloud. You can use the free trial option if you want to jump on the wagon and just try testing with ScyllaDB Cloud at no cost. Read more about ScyllaDB Cloud in our documentation, see the benchmark, or take the university class or lesson about ScyllaDB Cloud, and please let me know if you have any questions. So thank you very much.
