Multi-cloud State for k8s: Anthos and ScyllaDB

19 minutes

This session was part of the ScyllaDB Summit 2022 livestream; the recording and slide deck, along with all other available session recordings and slides, are available on demand.

In This NoSQL Presentation

One cloud is hard enough, am I right? Now everyone expects that you can deploy containerized applications "everywhere" and things will "just work." Our customer sure did! Join Miles Ward, CTO at SADA, Google Cloud's #1 Implementation and Resale partner, on a detailed, data-filled exploration of the complexities and constraints of modern multi-cloud and hybrid scenarios, rooted in the pursuit of almighty uptime and SLO adherence. We'll show what worked, and what didn't, in a detailed architectural review, as well as demonstrate (and perf test live!) components of the final production system.

Jenn Viau, Staff Solutions Architect, SADA

Jenn is passionate about exploring new approaches, tools, and methodologies. She enjoys helping people find the “right” way and “right” tools to get things done.

Miles Ward, CTO, SADA

As Chief Technology Officer at SADA, Miles Ward leads SADA’s cloud strategy and solutions capabilities. His remit includes delivering next-generation solutions to challenges in big data and analytics, application migration, infrastructure automation, and cost optimization; reinforcing our engineering culture; and engaging with customers on their most complex and ambitious plans around Google Cloud.

Video Transcript

Miles: Hi, everybody. My name is Miles Ward, and this is Multi-cloud State for Kubernetes: Anthos and ScyllaDB, together at last. I am so excited about this presentation. I hope you are too. We are really going to dive into the technical nitty-gritty of how to extract maximum value from Google’s managed Kubernetes platform, that’s Anthos, and from the superior data management provided by our friends at ScyllaDB.

Oh, I can’t wait. I’m here, but I’m also joined by one of my incredible associate CTOs, Jenn Viau. Jenn, say hi to everybody.

Jenn: Hi, everybody. It’s nice to meet you.

Miles: So here we are today. Who are these two strange people? Where did they come from? Let’s do quick intros, and then we’ll dive into the demo. My role here as Chief Technology Officer is to lead our cloud strategy and solutions capabilities. That means plugging into customers, understanding what they need from public cloud and the broad set of software applications that sit inside of that environment, and helping them extract value from all those tools, right? Cloud is incredible, ScyllaDB is incredible, and the tools that we’re building really enable application developers to move much more quickly than they have before, but they’ve got to have a how-to guide. That’s the kind of stuff we work on. Before this, I worked at Google Cloud, building the solutions architecture function there, and I worked with folks at AWS before that, so about 12 years of bad decision making in public cloud. So I’m looking forward to helping everybody see how these kinds of tools work together. Jenn, maybe take it away with an intro.

Jenn: Yeah, so my role here is Associate CTO for Anthos. I help guide and shape some of the capabilities and best practices that we have here at SADA, help our customers get the best that they can get out of Anthos, and do really, really awesome things like getting ScyllaDB to run in a multi-cloud environment, which is the super, super exciting thing that we’re going to show today.

Miles: Jenn’s got it exactly right, and she is far too shy. She is an Anthos fellow. That’s the most difficult certification in the Google Cloud arsenal. It’s a gnarly test. I know because I wrote some of the questions on it, and I’m really excited to see her build the whole system that we’re taking a look at today. So what is the system? What are we talking about?
What is Anthos? What is ScyllaDB? You all know what ScyllaDB is, you’re at a ScyllaDB conference. Anthos is application management infrastructure from Google that automates deployment and operations, extending the capabilities of Kubernetes to environments around the world. So not just on Google Cloud, but on other cloud providers, and even on bare metal hardware in your own environment. That means you can use the same GKE control interface, all of the systems that Istio provides as a managed offering called Anthos Service Mesh, and a whole series of other tools that we’ll walk through, to ensure that the applications you build and design for the best managed Kubernetes environment, that’s GKE, work no matter what physical infrastructure you’re on. You can be on AWS, you can be wherever you want. ScyllaDB, as an incredible system for storing data and retrieving it, replicating it, distributing it all over the world, is a natural fit for Kubernetes environments, and we have lots of customers trying to sort out this problem: how do I build a system that works the way I expect it to in one cloud, but really operates that way no matter which cloud I’m on, and frankly, across those clouds? Right? How many folks saw outages from AWS over the last little bit? Trust me, Google will have an outage too at some point, you know, maybe in 2027 or 2092. Those outages are real risks for customers. So being able to build a system that’s highly available across cloud providers, not just across availability zones or regions, is a big deal. If we’re going to make that work, we’ve got to have an application. So what application should we use? Memes. Memes are the right application. Everybody likes a meme; they are critical for social cohesion and communication. It is important for us to build memes in the right way, so we have to meet the design goals of our app. Those include high-integrity storage.
So when I create a meme, I don’t want to get the wrong meme template in the background, and I don’t want my text to get gobbled up. I want my meme management service to be highly available: if one region, or one whole cloud, goes offline, it should keep working. And I want to make it so that if I have to handle more traffic, the system is scalable, so I can add more nodes to my Kubernetes cluster, whichever cluster needs it, and add more ScyllaDB instances to deal with the traffic. That’s the advantage of these scale-out services: I can add as much as I need. So if we’re going to build a meme-generation application, a system for creating and ranking memes, you can do no better than an impression of the original Memegen service built by Colin McMillen inside of Google, inside of Alphabet, which is part of the cultural zeitgeist inside the business. So we’ve built a tiny clone of Memegen, we call it Memes Gen for copyright purposes, and we’ll be using that as our demonstration environment for today. So how did you do that, Jenn? How did we make that work?

Jenn: Thank you. Well, we start with ScyllaDB, because ScyllaDB’s got this fantastic speed of replication, and we stuffed some pretty large Base64-encoded images into it to replicate across a bunch of clouds. And then we also used Anthos GKE, you see my little bit of regex in there. So we did it on AWS, we did it on GCP, and we did it on bare metal. And we did all of that inside of Kubernetes, right? ScyllaDB is running inside of Anthos GKE in all three of those environments. And then, to continue doing what I preach to everyone else, we did this with GitHub, we did it with Cloud Build, it’s full GitOps, with Anthos Config Management and all of those things. And then, on the architecture side, you can sort of see the way this looks. It’s rudimentary, but this is what we’re looking at, right?
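The images go into ScyllaDB as Base64-encoded text. Stripped of the database calls, the encode/decode round trip looks roughly like this; the function names here are illustrative, not taken from the actual Memes Gen code:

```python
import base64

def encode_image(raw_bytes: bytes) -> str:
    """Encode raw image bytes as Base64 text, suitable for a text column in ScyllaDB."""
    return base64.b64encode(raw_bytes).decode("ascii")

def decode_image(b64_text: str) -> bytes:
    """Decode the Base64 text back into the original image bytes."""
    return base64.b64decode(b64_text)

# Round trip: what goes in comes back out unchanged.
png_header = b"\x89PNG\r\n\x1a\n"
encoded = encode_image(png_header)
assert decode_image(encoded) == png_header
```

The trade-off is that Base64 inflates the payload by roughly a third, which is part of why the talk calls these "pretty large" values to replicate.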
We’ve got Google Cloud on one side, we’ve got AWS on the other side, and I’ve got an on-prem data center, an Anthos on bare metal environment, in the middle. Because we’re cheekily talking about memes, but we’re also talking about data and data replication, and how people would actually run applications, and often that’s in your own data center plus in a couple of clouds. So we’re doing hybrid multi-cloud data replication using the ScyllaDB Docker containers. And then, yeah, then we get into the demo.

Miles: I mean, if that’s the design, that’s the architecture, can you show us the app first, and then maybe take us through how it was designed?

Jenn: And there are memes. So, Memes Gen. We’ve got our lovely indicator up here that says we’re sitting on Anthos on GCP right now. And if I go over and look somewhere else and hit a refresh, we can now see that our frontend, at least, is saying we’re running on Anthos on AWS. These are our memes. We have a couple of functions; you can create your own, which means we’ve enabled the ability to upload images into our ScyllaDB to allow that to replicate across the world. So, frontend. The reason we chose to do it this way, and I’m going to go very quickly into the Kubernetes platform here, so that everybody can see what I’ve got. I’ve got, you know, AWS and Azure, and this is one of the benefits of the Anthos side of the house: all of my clusters are visible in this one place, which means I can go check them, I can go see how big they are, I can go see their utilization, I can do all of those things. So, just so we can see it, I’ve got my four clusters here. And then I’ve got all of my workloads. Oh, no, I have one that’s not running properly; we might have to fix that as part of this demo. But you can see the API is deployed, the UI is deployed, and our database is deployed.
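Hybrid multi-cloud replication like this is normally expressed in ScyllaDB (and CQL generally) by treating each environment as its own logical datacenter and replicating the keyspace across all of them with NetworkTopologyStrategy. A minimal sketch; the keyspace name, datacenter names, and replica counts are assumptions, not taken from the demo:

```cql
-- Hypothetical keyspace for the meme app; one logical DC per environment.
CREATE KEYSPACE IF NOT EXISTS memes
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'gcp-dc': 3,     -- replicas in the GKE-on-GCP cluster
    'aws-dc': 3,     -- replicas in the Anthos-on-AWS cluster
    'onprem-dc': 3   -- replicas on the bare metal cluster
  };
```

With a topology like this, a write in any one cloud is replicated to the others, which is what makes the cross-cloud refresh trick later in the demo work.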
And so let me go back over to this. This one is running on my bare metal environment, and it looks like there’s a problem with my ScyllaDB pod. I think, if I look at it, it’s probably because there’s a problem with the persistent volume. And there’s an Azure cluster over here, so why don’t we just bring Azure into the environment to give us our availability back? To do this, because I have put everything into GitHub, I have a GitHub repo that contains all of my application code for Memes Gen, including my ScyllaDB environment. For the purposes of this demo, I’m using the Docker configuration in my own StatefulSet for my ScyllaDB cluster; we’re working with the ScyllaDB Operator team to be able to do this with the full Operator stack, but for now it’s just the Docker container with my own StatefulSet. So you can see that I can specify all of that there. I’ve got everything up here in Config Management, which means adding my Azure environment back in is as simple as connecting it back to Config Management. So I’m going to very quickly do this. Over here, I get to specify my GitHub repository. Because I can’t remember any of this, I’m just going to copy and paste my GitHub URL.

Miles: Great developers copy, incredible developers paste.

Jenn: And I can talk about the ACM structure a little bit more after. Let me just make sure I have all of this set up properly. I’ve already set this cluster up; it does have the SSH key. I’ve used deploy keys, because it is GitHub and we do GitOps, so I have the deploy key in my GitHub environment hooked up to this cluster, which means I don’t have to do any of these extra pieces. And let me just go into... the sharing window is actually hiding my complete button. Here we go.
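The connection being made here, a cluster pointed at a GitHub repo with an SSH deploy key, corresponds to an Anthos Config Management spec along these general lines. The repo URL, branch, and directory are placeholders, and the field names follow the ConfigManagement API as I understand it, so treat this as a sketch rather than the demo's actual config:

```yaml
# Hypothetical Anthos Config Management spec for one cluster.
apiVersion: configmanagement.gke.io/v1
kind: ConfigManagement
metadata:
  name: config-management
spec:
  git:
    syncRepo: git@github.com:example-org/memes-gen-config.git  # placeholder repo
    syncBranch: main
    secretType: ssh      # the GitHub deploy key mentioned in the demo
    policyDir: "config"  # directory in the repo that ACM syncs from
```

Once this is applied, the cluster pulls its desired state from Git, which is why re-adding Azure is "as simple as connecting it back."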
So while this is happening, I’m going to also add Azure into our load balancer. Apologies, we’ve lost the screen. For that part of the load balancer, we’ve got a Google Cloud global load balancer sitting in front of all of this, passing information over to HAProxy to do the routing between my AWS, my Azure, and my GCP environments, and my bare metal when it’s not having persistent volume problems. So I’m just going to restart HAProxy over here. And as Azure comes into synchronization, we’ll be able to refresh our memes. And I’ll go create a meme, and we’ll watch it replicate. But the other part of this, so you’ve got a couple of things. I talked about the GitOps approach and all of that, but we’ve got Config Management here, and I did build all of this with Terraform. And you’ll notice there are a couple of sections in here. I have some security policies, and this is the part I wanted to mention about where Anthos Config Management benefits us, because I have four different environments. If we look at our memes, I have this little "powered by Anthos on AWS," I’ve got a "powered by Anthos on GCP." I can tell Azure is coming online because HAProxy is doing something weird. In order to do stuff like that super, super easily, I can do things like declare my clusters, and I can select where things are going. My frontend deployments have multiple environment variables, so I’ve done things with some selectors where we say this cluster, this cluster, this cluster gets whatever piece of information, but in the end, the only thing that’s different is that environment variable. So you can keep the same code and run it across all of those different environments. Let’s go check on my Config Management. This error thing is normal; it’s just pulling up all of my security policies, lovely security policies like banned image tags.
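The routing layer described here, a global load balancer handing requests to HAProxy, which fans out across the clouds, could be expressed in an haproxy.cfg roughly like this. The backend hostnames and health-check path are placeholders:

```
# Sketch of the cross-cloud routing tier; addresses are placeholders.
frontend memes_front
    bind *:80
    default_backend memes_clouds

backend memes_clouds
    balance roundrobin
    option httpchk GET /healthz        # drop a cloud when its frontend is down
    server gcp    gcp-frontend.example.internal:80    check
    server aws    aws-frontend.example.internal:80    check
    server azure  azure-frontend.example.internal:80  check
    server metal  onprem-frontend.example.internal:80 check
```

With `check` enabled on each server line, HAProxy stops sending traffic to an environment that fails its health check, which is how the bare metal cluster with the broken persistent volume drops out of rotation.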
We have a posture here where using the tag latest on your Kubernetes images is a really, really bad idea, and so we enforce that with a security policy. We’ve got some other ones, like you can’t run as a privileged container, no privilege escalation, all of those things, just to enforce security, and that’s all part of that toolkit. But while we’re doing all of that, let’s go back and look at Kubernetes, and we can see I’ve got an API pod, a UI pod, and a ScyllaDB pod. They’re all just finishing their little bootstrap. And they’re up, which means if I go over here and hit refresh a few times, there is Anthos on Azure: Memes Gen powered by Anthos on Azure. So, proof that we’re actually doing this. No, I’m not going to upload a Betty meme GIF. I’m going to make a new meme. And I’m really, really bad at making memes, so we’ll just do a test meme. Miles can probably make better memes than I can.

Miles: I’m happy to argue that I am truly gifted at meme construction, but it’s one thing to make a funny meme; it’s another thing to make a hosting infrastructure for said memes. So I’m feeling pretty good about it.

Jenn: What I’ll say is, everybody watching, I hope you saw the quick little transition there. I was on Anthos on Azure, a Memes Gen frontend on Azure. And I just refreshed, we hit GCP. We are on AWS. And that is part of the beauty of this: I didn’t configure it. Let me go and do that proof over here with my ScyllaDB cluster. On one of my clusters, I have a seed database; I have a CQL script, and on the first one we built out, I seeded the database with templates and all of those lovely things. This one I didn’t. All I did was give it the config map and the connector to the seed service that all of my other clusters are running, and it immediately started replicating the data.
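A "banned image tags" posture like the one enforced here is usually written as a Policy Controller / Gatekeeper constraint. Using the K8sDisallowedTags template from the Gatekeeper policy library, it might look something like this; the constraint name and match scope are illustrative, not the demo's actual policy:

```yaml
# Illustrative constraint: reject containers that use the ":latest" tag.
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sDisallowedTags
metadata:
  name: ban-latest-tag
spec:
  match:
    kinds:
      - apiGroups: [""]
        kinds: ["Pod"]
  parameters:
    tags: ["latest"]
```

Because the constraint lives in the synced Git repo, every cluster that joins Config Management picks up the same enforcement automatically.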
And that’s the beauty of being able to use a tool as fantastic as ScyllaDB to do this: I can just go add a new cloud, I can add another data center, I could add another region in AWS, in Azure, in insert-cloud-provider-of-choice here, and it would be this easy. And so that’s my lovely demo.

Miles: So I did make a new little meme. "Yeah, you did great, Jenn," from Lumbergh. I think that’s an important one to have at the top of the list. It comes up because we’re able to sort by newest, which is a function we added fairly recently. I made that one on the GKE cluster on GCP. It’s been propagated now through to the AWS side and over to the Azure side. So you can imagine, in an operations context, an engineer that has a limited set of resources in each of these different environments, is taking on a bunch of extra traffic, or has a hardware failure in one of the systems, or receives error telemetry, has some kind of systems offline. Being able to quickly, and I mean that was real quick, we did all of that in the span of a couple of minutes, bring online new cloud infrastructure, connect it into your Anthos distributed cluster, and make it so that not just the frontend application layer is running but the full data storage layer is distributed across those environments, that’s a huge operational advantage in comparison to even the best-practice ways of doing this stuff from two and three years ago. So it’s a big step forward, and we’re just really excited to show it to each of you. Really appreciate it, Jenn. Great job. Everybody give her a round of applause. Demos that work are great.

Miles: So I’m going to take back over on the screen. Good job.
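At the data layer, "just add another cloud" corresponds to extending the keyspace's replication map with the new datacenter and then streaming existing data onto it, along these general lines. The keyspace and datacenter names are assumptions carried over for illustration, not taken from the demo:

```cql
-- Add replicas in a newly joined Azure datacenter...
ALTER KEYSPACE memes
  WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'gcp-dc': 3, 'aws-dc': 3, 'onprem-dc': 3,
    'azure-dc': 3
  };
-- ...then stream existing data onto the new nodes, typically by
-- running `nodetool rebuild` (with an existing DC as the source)
-- on each node in azure-dc.
```

New writes replicate to the new datacenter as soon as the keyspace is altered; the rebuild step backfills the data that existed before it joined.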
This kind of work, the engineering design work, the requirements gathering, the ratification of a best-practices plan that comports with the security requirements and all of the best features available in these products, as well as working together with a team of engineers on our side to plug in with customers and help them learn these tools for themselves, is the core of the way SADA works with companies. So if you’re interested in doing this kind of project, or if you have an application that would benefit from high availability, if you’ve got a service that could take advantage of more than one cloud for reliability or consistency, for regionality, or for being able to meet compliance regulations, there are all sorts of reasons you might want to be in more than one, and Anthos is a great way to simplify that. And this fits inside of our time window with room to spare. Our teams at SADA make up over 12% of the global Anthos fellows; we’ve done hundreds of implementations, and we’re now a really material fraction of the global GCP footprint. We’re ready to work together with you to dive in on the hard problems, on the stuff that’s creative and unique and requires this kind of next-level engineering, to ensure you’re able to meet the requirements of your customers. To stay in touch, we’re super easy to reach: I’m just @milesward on Twitter, and Jenn is @jennviau on Twitter. It’s not all that complicated. And those are our email addresses, too. I’m just really excited about what we’ve been able to produce. If you have any questions about it, I know we’ll be getting together with people as part of the live show, so I’m looking forward to that. Thank you, everybody, for taking the time, and I’ll see you soon.
