Ken leads the Specialized SA organization for Compute for the Americas at Amazon Web Services (AWS). He is part of a group of talented specialists within AWS’s Compute organization who help customers (as well as fellow Amazonians) get the most benefit out of the products within the AWS Compute portfolio.
Tzach has a B.A. and an M.Sc. in Computer Science (Technion, summa cum laude) and a 15-year career in development, systems engineering and product management. In the past he worked in the telecom domain, focusing on carrier-grade systems, signaling, and policy and charging applications.
Hi, everyone, and welcome to our Scylla Summit session, "New AWS Instances" and why they are perfect for ScyllaDB. I'm Tzach Livyatan, VP of Product for ScyllaDB, and presenting with me today is Ken Krupa, Head of Specialized SA, Compute at AWS; we'll let Ken introduce himself later. Before ScyllaDB, I was in product management at Oracle and other places, but I'm focusing on NoSQL and ScyllaDB these days. So let's get started.
The first chapter of this presentation will be the state of ScyllaDB on AWS. Then we'll switch to Ken to present the latest and greatest AWS instances, the I4 series. After that, I'll take control again and show some results from our early benchmarks of ScyllaDB on the I4. Spoiler alert: very good results. Last, we'll summarize and maybe share a little bit of our future plans for these instances and AWS in general.

So let's get started with the state of ScyllaDB on AWS. ScyllaDB has three variants, if you will. The first is open source: you can browse the code on GitHub, even compile it if you are into that kind of thing. The second is Scylla Enterprise, a closed-source version of Scylla with some extra features, some around performance, some around security. The last is Scylla Cloud, and I'm going to spend some time on it later. You can run Scylla on-prem, on your machine at home or wherever; you can run Scylla on the cloud, which is the most popular option these days, and AWS is the leading cloud for Scylla; or you can let us manage your deployment with Scylla Cloud. Scylla Cloud itself runs on AWS, and you can run it in two modes: one mode runs Scylla Cloud on our account, and the second, which is very popular, runs it on your own AWS account, so you pay AWS directly for the hardware, but we still manage the cluster for you. In addition to running Scylla VMs on AWS, you can also run Scylla in Docker and on Amazon EKS, a relatively new option from Scylla. If you're interested, check out Scylla Operator for Kubernetes for an example of how to run Scylla on EKS. Scylla is also available and certified on AWS Outposts. For those of you not familiar, AWS Outposts is like a mini region, or mini zone, to be more precise, of an AWS region that you can host on your site with AWS hardware, and Scylla is certified for this environment.
This is very useful if you're used to working with DynamoDB, because Scylla is compatible both with DynamoDB and Apache Cassandra, and if you're working with Dynamo, you can continue working with Scylla's Dynamo API on Outposts. So that's pretty exciting, and it's gaining momentum. One of the customers actually using the DynamoDB compatibility is GE Healthcare, and this API allows them to run both with Amazon DynamoDB and on-prem with Scylla. So with Scylla and AWS, you can do what is now called hybrid cloud, which is becoming very popular. As you know, EC2 and AWS have many, many, probably hundreds of instance types and instance families, and you're probably asking yourself, "What is the best instance type to run Scylla?" Scylla is a database focused on performance at low latency, and our tests show that you get the best combination of RAM, storage and CPU on the I3 and I3en families. These are the go-to instances for AWS users, and this is what we provide on Scylla Cloud, but that does not mean you are limited to these instances. If you want to run and manage Scylla on your own, you can choose whatever EC2 instance fits your needs. You can run Scylla with EBS, for example, but you probably won't get the same high performance numbers that you get on local NVMe SSDs, and that's doubly true with the new I4 instances. That's probably a good point to switch control to Ken to present the new instance types.
Thanks, Tzach. Appreciate that. Thanks for the introduction. Today I'm going to talk about AWS EC2 innovation, and I call it "beyond the box"; hopefully it'll become clear what I mean by that as I go through the slides. We're going to talk specifically about the instances that Tzach alluded to earlier, but first I wanted to talk about some other instances that are near and dear to me. These are some of my museum pieces. On the left is my very first computer, a Commodore 64. They say you never forget your first love, and here's mine. Next to that is one of the very first portable computers, about the size of an iPhone, which I had in college: a Tandy portable computer that runs BASIC and works as a scientific calculator. That thing actually still works, and I show it there on the screen turned on, next to a flash drive just to give you a perspective of its size. So who says that something from the late 1980s couldn't be small and cool from a computing perspective? And then next to that I've got a big rig, circa 2000, I believe, from back in the days when you built your own rigs. I put it next to a chair for scale. These are things my wife says I have to get rid of, but I won't let go for whatever reason. So, about me: you already know what my first computer was. I'm a native New Yorker, living in New Jersey now, with a comp sci background, and I've done tech ranging from mainframes, toward the end of when mainframes were going away, all the way to the cloud and just about everything in between. I've done a lot of app dev and worked on databases quite a bit; in fact, if you look at my profile, my last job was with a database company, and then a good amount of financial services, largely by geography but not exclusively. I joined AWS in August of 2020, and it has been great fun since. Okay.
With that out of the way, let's talk about today's topic: AWS EC2 instances. I'm going to go over how we innovate around EC2, just to give a baseline of how we got to where we are today. Then I'm going to delve into Nitro and Nitro SSD; this is important, as it's a key differentiator not just for EC2 instances in general but for I/O-optimized instances, which we're going to see benchmarked by Tzach a little bit later. Then I'll talk specifically about the EC2 storage-optimized instances, go into AWS silicon more broadly and then a little deeper into the Graviton processors, not too deep because we don't have all the time in the world, and then I'll wrap things up and turn things back over to Tzach. Okay. With that, let's go. When we think of CPUs in the very general sense, we think of x86, right, so Intel and AMD, and as you know, at AWS, we support those as well as our own processors. There are more kinds of processors when you think about what "processor" means more broadly, but by and large, when you look across EC2, you're going to see Intel, AMD and Graviton processors. I like to take a walk down memory lane from the perspective of AWS. Think of our first instance in 2006, the M1 that everybody knows and loves. We've come a long way since then, as you can see by these numbers, and though we talk about CPUs in the traditional sense, we go beyond CPUs: we've got GPUs, FPGAs, accelerators and a multitude of instances based on all these combinations. Look at some of the big, audacious numbers in 2021 for the capability of some of these compute instances, and at the sheer number of instances we have today: when you consider the categories, the capabilities, the instance sizes and the options, it nets out to a little over 475.
Now, you may ask, "Well, why do we do this?" and I'll quote what Werner said at re:Invent just this past year when addressing the customers in the audience: "You asked for this. It's your fault!" What he means, and what we mean by that, is that this is about addressing customer needs, and so we keep providing what our customers want; you all represent our road map. An interesting thing to look at, when we think of the history of EC2 instances (I mentioned the M1 goes back to 2006), is instance types from 2010 onward and particular moments in our history, in particular the introduction of the Nitro System in 2017. Now, Nitro didn't just improve security and performance but really our ability to innovate, as evidenced by the number of instance types we have, and if you look at this graph, we've accelerated quite a bit. So let's go into AWS Nitro a little and what's behind it. I'm going to talk about the Nitro cards, the security chip and the hypervisor. Nitro cards offload functions to their own hardware. We're familiar with this concept as hardware acceleration: if you've got a function that's done in software and you can abstract it and put it into hardware, things can go better. Our Nitro cards support things like I/O acceleration, VPCs, monitoring and security. The security chips not only offload the security aspects but reduce the attack surface, which is very important from a security perspective. And then you have the Nitro Hypervisor, a hypervisor made specifically for the Nitro System, which is what puts it all together and gives our virtualized instances bare-metal-like performance. I dwell on that a little because it's not something that is typical of large-scale data centers.
It's something we were able to do through our customers' needs and through our ability to innovate at scale. To get an idea of how the Nitro System is an innovation, let's look at this picture to see how things look conceptually. With classic virtualization, everything was inside the box, right? Think of a piece of compute that you rack and stack, or something you buy even for your desktop: you would download a hypervisor, run it, and you'd have the ability to carve up that machine and put multiple instances on it. That in and of itself is an innovation, but because these closed boxes have multiple abstraction layers inside them which allow for this innovation, the hypervisor itself, what we were able to say is, "Well, you know what? As long as we still meet the contract of these API layers, it doesn't matter whether the OS is talking directly to hardware or not." What it means for us is that we didn't have to stop at classic virtualization, with everything inside the box. So with Nitro, we were able to offload the things that are typically done inside the hypervisor into their own hardware. Okay? And that's what allows us, from a security perspective, to reduce the trusted computing base. It sounds almost like the opposite of what you'd want, but when you decrease the trusted computing base, you minimize the attack surface. That's an important thing to do, but just as importantly, from a performance perspective, all the networking, I/O and storage is separate, isolated and secure. And this capability that Nitro offers is not just limited to virtual machines; it can even be used with our bare-metal instances in the cloud.
Okay, so this is a very important innovation that's been in play since 2017, and it doesn't just stop with our new instances; it's also being made available to previous-generation instances. Other cloud providers may be deprecating instances and forcing customers to migrate, but we have lots of customers who are happy with the instances they've been using for the last 15 years, and they want to continue to run them and, at the appropriate time, upgrade. So we have announced the ability to benefit from the Nitro architecture even for some of the older instances. All right. A discussion of Nitro would not be complete without the Nitro SSDs. SSDs themselves are silicon, and we can even think of them as mini boxes, in this term that I use, in their own right. Some people refer to them as their own little databases that try to abstract what's going on in the flash storage so that it looks like traditional storage, or spinning disk, at least to the OS and to other components of the system. They have their own abstraction layers, like, in the case of SSDs, flash translation layers and so forth. Without getting into the details of these FTLs or NVMe, for which I'm not even the best person to talk about anyway, it's the notion of taking the opportunity to rewrite the implementations, leaving the necessary API abstractions in place, that allows us to make significant improvements, both in the context of I/O-intensive applications and within the context of the cloud itself. That's what we've done with Nitro SSD. Given that it's tightly integrated with our Nitro System, you can see on these slides the lower I/O latency compared to commercial SSDs, the ability to do quicker firmware upgrades and other things from an operational perspective, as well as the features you would expect from a secure platform.
All data stored on the disk is encrypted at rest with 256-bit AES, using ephemeral keys. Okay, so now we know about these Nitro pieces: we have Nitro, which offloads things and gives bare-metal performance to virtualized instances in the cloud, and we have the Nitro SSDs, which take advantage of the Nitro System for optimal performance and security. What happens when we add the Nitro SSDs to specific EC2 instances built around storage and I/O? Well, we'll start with the EC2 I4i instances, whose benchmarks Tzach is going to talk about. These are our fourth-generation I-series instances, powered by (this might be a little confusing on the slide) third-generation Intel Xeon, the Ice-Lake-based processors, and they take advantage of Nitro SSDs, so, not surprisingly, they're ideal for SQL databases, NoSQL databases and data analytics. Then, moving on to our own silicon, we have two Graviton-based instances in the I/O-optimized family. One flavor, the Im4gn, is more ideal for traditional data workloads, or what we think of these days as the expanded view of traditional data workloads: SQL, NoSQL, data analytics. The other one, on the right, the Is4gen, is indexed more towards stream processing, real-time databases and log analysis. We're going to hear about the Im4gn a little bit later, and these instance types, not surprisingly, use the Nitro SSDs, and we're going to see how they behave. All right. I mentioned Graviton as a chip set that we offer, in the context of these two instances. I want to go into Graviton a little more, since it's relatively new compared to other CPU families like x86, and I'll start by revisiting what may now be a somewhat rhetorical question: why do we build our own chips? As we saw with Nitro, we've been able to accelerate our innovation.
Going hand in hand with that is the ability to specialize on behalf of our customers, for when the quote, unquote "off-the-shelf" components don't quite meet their needs. We're perfectly happy to use generally available components, as we do in many of our offerings, but sometimes they don't provide exactly what our customers need, and that's when we specialize. And then speed, as I mention on this slide, is not just about performance but about speed to deliver, as evidenced by that earlier slide where the pace of innovation itself goes up significantly. And finally, at cloud scale, we can improve our insight into the operations of the hardware that we run in the cloud. Our custom silicon provides some of those capabilities, and some of the silicon innovation at AWS includes the aforementioned Nitro System and Nitro SSDs and the Graviton processors; just for completeness, I put some of our machine learning chips on this slide as well. We're not going to cover those today, but I did want to mention that we innovate in machine learning hardware and software at scale. Today, though, we're going to cover Nitro and Graviton together. The Graviton2 processor was first announced by AWS in 2019. ARM processors have been around for a while; we're familiar with them in our mobile phones, and if you do things with Raspberry Pis, as I do and others might, too, those are ARM processors. They power phones, tablets and other cool devices, but now they're starting to show up in laptops and servers, and our Graviton2 servers are an example.
Graviton2 is, not surprisingly, our second generation. It's based on 64-bit Arm Neoverse cores, with 64 cores per chip and over 30 billion transistors, or approximately that, and it was built with specific, targeted optimizations in mind: to reduce cost while still delivering performance. When we put these together, we see things like this. We've got the Nitro cards and the Graviton2 instances, and when you compare our Im4gn and Is4gen instances to I3 instances, you see things like 75 percent lower latency variability. Now, why is latency variability important? You might sometimes hear a term that's been cropping up in the storage space called tail latency, and the way I see it, in layman's terms, the tail of a graph holds things considered statistical outliers, but in certain contexts they can create big problems, particularly at scale. So reducing that variability associated with I/O latency has been very important to our customers. Then, from a memory perspective, another term you might hear is NUMA concerns: NUMA, non-uniform memory access, is an optimization strategy, if you will, that, depending upon your context, may or may not be appropriate. What we've done is take those NUMA concerns out by ensuring that you get the same latency from the various cores in the Graviton2 processor to DRAM. That's another way to remove variability. And lastly on this slide, there's no SMT, or hyperthreading, in this particular case: one vCPU is equal to one core. Why do we do that? Well, it's helpful to understand why hyperthreading was done in the first place and how it works.
In some contexts, when you have, say, two hardware threads on a core, there are cases where one hardware thread is idle, and in real-world scenarios this does happen. That's an opportunity for another thread that's saying, "Hey, maybe I can do a bit of work. I can leverage the same execution resources," and in these quote, unquote "ideal scenarios," that can work really well. As you see in this particular animation, when one goes to sleep, the other one goes and gets the resources, and we're all good. But what happens when there is contention, when they're both very active? This is something that is very real, particularly in the cloud, in highly concurrent environments. That's when they start to stack up. In these scenarios, you can get variability related to contention for the processor execution resources. When you have execution resources dedicated to each core, you don't have such contention, and that's another way to minimize variability. So what does it net out to? Since we announced Graviton2 in 2019, we have 12 different instances across 23 regions globally. If you look at the slide, it looks like there are nine, but the first three have both G and GD series, represented by the D in parens. These 12 different instances across 23 regions are a testament to the fact that customers are getting value out of this, and by customers, we mean partners as well. And they're not the only ones. Backing up a little bit: Graviton is easy to adopt because of the ubiquity of the ARM platform, whether in the open-source community or in commercial operating systems, and similarly for containers, as you can see here, not just our own but those out in the community as well. And then we benefit, and, in turn, our customers benefit, from using Graviton ourselves.
These are the various services inside of AWS that leverage Graviton. In terms of sharing a customer example, it's probably best to share one of our own, because we sometimes get asked about scale and many customers don't want to talk about the size of their own deployments, but we'll happily share ours: Amazon Prime Day. I'll let you folks read the slide, but what it refers to is a decision to trust the scale of Amazon Prime Day to our Graviton2 instances, and we did: 53,200 C6g instances in a three-region cluster. So it does scale, and it's another example of trusting Graviton for critical workloads. Okay, and now we're going to wrap things up with a mention of what's off in the distance. We announced Graviton3 at last year's re:Invent; there's always something new on the horizon. This is a recent announcement, so it's certainly not as available as Graviton2, but it ups the game quite a bit, with significant performance improvements above and beyond Graviton2; I think 25 percent higher performance per core is what we quote. It's the first DDR5 system in our data centers. But some of what we see goes beyond performance per core: when we announced the C7g instances, the first of this type, we saw other things, like significantly higher floating-point performance and significant performance improvements on cryptographic workloads. So it's not just in the core CPU where we see performance gains but in some of the workloads around general-purpose computing where it really shines, and what it nets out to is significantly more energy efficiency than comparable EC2 instances.
They're available in preview, and, again, it's just the C family of instances so far; we haven't yet announced an I/O-optimized one, so Graviton2 is still the main engine of our ARM offerings, but this demonstrates that Graviton3 is right behind, and there will be other innovations down the road. All right. So, wrapping things up, what did we cover? We took a trip down memory lane and learned that we went from one instance to more than 475 instances to fit a variety of workloads, based on customer need. We talked about the Nitro architecture, where we're able to offload functions that are typically handled by a hypervisor into hardware, allowing us to provide a lightweight hypervisor and also the ability to innovate in ways that go beyond just virtualization, as evidenced by the Nitro SSD. The Nitro SSD is another in-silicon innovation by AWS: when you go below the level of the box, as long as you meet the APIs of that device, you can do a lot at the implementation level to improve performance, scale and security. Along the way, we covered the I/O-optimized instance types: the I4i, the Intel-based instance type that we offer, and the Im4gn and Is4gen instances, which are Graviton2 instances. And with that, I'm going to conclude this portion of the presentation and turn it back over to Tzach to see how these work with real-world tests.
So thanks, Ken. That was very interesting, at least for me. Once we heard at Scylla that there are new I/O-optimized servers, of course we had to run and start benchmarking them, because, as a database, Scylla cares a lot about I/O, and I/O often determines the performance of the database. The three instance families that Ken mentioned and we tested are the Im4gn, which is ARM-based Graviton2 with more RAM; the Is4gen, which is also ARM-based but with more storage relative to CPU; and last but not least, the I4i, which is x86-based but also has the latest storage and, as you will see, actually showed the best performance for Scylla. Testing these instance types had a side effect, by the way: the work to compile Scylla, specifically Scylla 4.6 and beyond, for ARM also allows Mac users with the M1 to run Scylla natively on a Mac, something they couldn't do before. So let's see some results. We started benchmarking with instances that provide more disk relative to CPU and RAM, so we compared the new Is4gen to the I3en, which was the go-to family for storage-heavy instances so far. As you can immediately see, the performance of the new instances is much better, both for cache reads and writes and for disk reads and writes. But you might say the comparison is not completely fair, because we are comparing two different instance types, and the Is4gen instance actually has more cores in this case, 32 versus 24, so you might expect that. If you look at the quantity per price, which is usually the metric you want to look at (you want to compare how much money you pay for each property), you can see that for these instances, for the same price, you actually get more storage, which is the relevant parameter for this type of instance.
So if your workload requires more storage relative to CPU and RAM, you should now go to the Is4gen instead of what we now consider the old I3en. Next, we did a similar comparison of the new Im4gn to the old i3.8xlarge. The number of CPUs here is the same, and the parameters are very similar. Again, we are comparing the quantity per price, that is, how much throughput you get for the same money, and as you can see, both for disk-bound and CPU-bound workloads, you get much better results on the new instance. So you pay less money for more throughput. Okay, so which instance should you use in each case? I tried to formalize it in the following decision flow, and the main question is how much disk space you need per CPU. If you need more CPU relative to disk, you should go with the Im4gn, and if it's not yet available in your region, fall back to the existing I3. If you need more storage relative to CPU, you should go with the Is4gen, and if it's not yet available, fall back to the I3en, which is, again, more focused on storage. So we covered these two ARM-based servers, and we can see they are preferable, if they exist in your region. Now, I want to switch to some numbers on the latest I4i, the x86 instance. Before I present the actual results we got, let's do a quick review of the properties of these new and exciting instances. The CPU frequency on the I4i is higher, and we took the high-end machine of each instance kind: the I3, the I3en and the new I4i. The number of CPUs in the big machine is higher, and the i4i.32xlarge is a monster; I will show some numbers with Scylla later, and you can appreciate the number of transactions we expect to support on these huge machines. Scylla, by the way, is very good at handling this kind of high CPU count per machine, because it really scales linearly with the number of cores.
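The selection logic above can be sketched as a tiny helper. Note that the storage-per-vCPU cutoff below is an illustrative assumption of mine, not a figure from the talk; check current instance specs for real ratios:

```python
# Minimal sketch of the instance-selection flow described above.
# The 600 GB-per-vCPU cutoff is an assumed, illustrative threshold.

def pick_instance(gb_per_vcpu: float, arm_available: bool = True) -> str:
    """Suggest an instance family from desired local storage per vCPU."""
    if gb_per_vcpu > 600:
        # Storage-heavy: Is4gen, falling back to I3en where unavailable.
        return "is4gen" if arm_available else "i3en"
    # CPU/RAM-heavy relative to storage: Im4gn, falling back to I3.
    return "im4gn" if arm_available else "i3"

print(pick_instance(200))                        # im4gn
print(pick_instance(900, arm_available=False))   # i3en
```

The only real inputs are the storage-to-vCPU ratio your workload needs and regional availability of the ARM families, which is exactly the two-question flow described above.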
We have a unique shard-per-core architecture that allows Scylla to scale linearly; other products or applications sometimes cannot handle this number of cores, and their performance actually drops, or flattens out, as the number of cores gets higher and higher. Not the case for Scylla. As for the RAM in these instances, of course, with more CPUs comes more RAM, and that's not something we did; everything comes from EC2 and AWS. The storage on the I4i is similar in proportion to the I3, while the I3en has more storage, as expected. One thing I'm not going to deep-dive into, because I don't want to steal the thunder from our CTO Avi's presentation, is this chart, which comes from a tool we call Disk Explorer. It's a tool we use to measure the performance of disks, and we use it with EC2 and with other clouds. Spoiler alert: the performance of AWS is superior, especially on the new I4 instances. As Ken mentioned, a lot of work was done in EC2 to improve storage performance, and the Disk Explorer tool really shows that it shines, especially for Scylla. And here are the results, the I3 versus the I4. As you can see, on a similar kind of server with the same number of cores, we get more than twice the throughput on the I4, which is better than we expected, maybe better than Ken expected as well: more than twice the throughput on a similar machine, with better latency. And that includes the tail latency; we at Scylla care a lot about latency, and what you see here is actually the P99, the 99th percentile of latency, which is better on the I4, even though the throughput is twice as much. We are actually comparing two different kinds of workloads here, one reading and writing directly to the disk, bypassing the cache, and one where most of the reads come from the cache, and in both cases the performance of the I4 is more than twice that of the I3, so we are really, really excited about that, and just to give you an idea:
Earlier, I showed results for one server, but a typical Scylla cluster will have at least three nodes for redundancy. Here you can see the numbers for a three-node cluster, and as you can see, on just three i4i.16xlarge nodes, you can support way over a million requests per second, and that's a realistic workload. You can imagine that if you use the higher-end i4i.32xlarge, we're expecting at least twice the number of requests per second. Those of you working in the database domain can appreciate that these are very high numbers, huge numbers for a database, and all of that from just a three-node cluster, so we are super excited about this new instance. And I want to specifically thank Michal, who helped me do all of these benchmarks. So, going back to the decision flow: if the I4i is available in your region, go for it; at least for Scylla, it's much better and delivers superior performance. To summarize, Scylla and AWS have a long-term relationship; we started with the I2, moved to the I3, and now we are super excited to move to the I4. The preliminary benchmarks we did already show that it has superior performance, both in throughput and latency, over previous generations of EC2 instances. And with that, let me wrap up. Thank you very much, Ken, for your deep dive into Nitro and the new instances, and, again, thanks to the benchmarking team at Scylla for helping me gather the results. So thank you very much.
Thank you. Thank you, Tzach, and Michal as well, and the team who put the time in to do the benchmarking. I always love being pleasantly surprised by results. Those were great, and we look forward to continuing our partnership and learning more about how our stuff works for our customers.