Nick Stott is is a software engineer at Compose, with experience developing the company’s application architectures and container orchestration infrastructure.
He is speaking on a new Twelve Factor Manifesto that covers how to manage databases, such as Scylla, that must work reliably at Internet scale. Some of the 12 factors for apps can be anti-patterns for a stateful service. We asked Nick about his progress on improving simplicity and resilience for databases.
The original “12 factor app” design was written for web applications. What do you have to do differently for a database?
I’ve got a different 12 factors, and there’s some overlap. One of the things that’s interesting about the original 12 factors is that it doesn’t mention containers at all. I’m going to talk a little about how containers work and some of the useful things we learned from our history with containers. When we started in 2011 using containers it was a very static deployment and we only hosted MongoDB. This was before Docker and other tools.
The next iteration was more dynamic, we were provisioning containers on demand. But adding users on MongoDB was troublesome. Over three different MongoDB versions, they had three different ways to add users. It didn’t work for us. We couldn’t have all those if/else statements, so we needed to take that logic out of the orchestration.
All databases need similar sorts of things. You need to back up data, quiesce a node, manage clustering, and so on. We took those similar operations, made our orchestration layer able to handle those operations, then pushed everything else into the container.
So you have environment variables outside the container that are used to populate a config file in the container?
You can’t run Scylla just with environment variables, you need a config file. So we had a creation operation that takes the container’s environment and uses it to configure the database. The idea of storing these parameters in the environment is a really good idea.
How do you handle building the config file inside the container?
The configure step just calls a hook inside the container. It’s up to each database how it handles these hooks. It’s a migration of control from the orchestration layer into the containers themselves.
You can see it when when we’re talking about logs and metrics. If you’re really worried about how the heap is doing—the orchestration layer doesn’t really care about that. Each database is really responsible for managing itself. We call this tools and versions. We have the idea of a container overlay that has these ways of measuring the database.
Is this configuration and management all Compose code that runs inside the container?
It’s a layer that we write for each database, we call it the overlay. It maps what each database provides to what the orchestration layer expects.
How do you handle network configuration?
Originally, database nodes were accessible on the public Internet. There’s no real way to secure these nodes from an exploit. Elasticsearch doesn’t even offer any authentication or authorization. We needed some layer of protection. What we decided to do is have all the containers in a deployment have a private VLAN—we use Open vSwitch and an overlay network to manage the details of that—what matters is that all the cluster replication happens over the private network.
We open this up to the application using portal containers. For Elasticsearch we were using SSH and HAProxy. For mongo we’re using Haproxy and Mongo Router. Basically these containers are on the edge network and focus on just proxying traffic. We can deploy these hosts with different network parameters so the whole host doesn’t have to be exposed to the network. Our portals handle SSL termination, and the nodes use an encrypted VLAN for inter-cluster communication.
For databases with simple leader/follower semantics we can have a portal node that just detects the primary and forwards all traffic there. The client drivers don’t need to care about the internal state of the database. This was a problem with Scylla and Cassandra drivers because they’re a complex beast. They do node discovery which messes with this.
To solve this problem we basically just bundle a proxy with each database node. And these portals and database nodes are kept in lock step. This exposes one the big problems that docker and rocket and other containers have yet to deal with. They have this mentality of cattle not pets, which implies that you should not have to configure these containers individually, they should just work en masse. But when I configure a Scylla cluster, the portal nodes need to know the addresses of the Scylla nodes, and the Scylla nodes need to know the addresses of the portal nodes. That is something that is specifically outlined as one of our 12 factor: deployment network details need to be known in advance. Making applications do discovery leads to problems. That is something that is somewhat contentious—a lot of folks do not agree with it, but I think it’s really important.
One of the things that we do with these deployments—a deployment is a logical collection of containers—is that we have more than just portal and database nodes. We have a bunch of utility containers that we can add to a deployment. If a customer needs logging we can add a specific container for logging. If you’re using DataDog, we can add agents to the network and these agents don’t get exposed to the public Internet.
What else did you learn from extending your system to handle Scylla containers?
The thing that hurt us on Cassandra specifically, and this is a common factor of the JVM databases, is they’re really hard to scale. Because we’re on multi-tenant systems we prefer to scale up when we can. And this elastic growth has been something that is an important part of platform. With the JVM databases, this is tricky.
Something that’s different with Scylla is that we can horizontally scale, dynamically. The ability to add or remove nodes by customer demand or potentially automatically is a powerful feature.
What resources are you giving to new Scylla nodes?
The minimum node is 1GB RAM and 10GB of disk. We find that generally throttling memory is a good proxy for limiting other things as well. Scylla (beta) was particularly hard to limit, it’s a very aggressive database in using hardware! The CPU pinning that’s in place now is really good for multitenant machines.
What are you hoping to get out of Scylla Summit?
I’m interested in learning more about what people want from Compose. We have this thing called “engineer in support” where we take a week working with customers on customer facing tickets, and this is really valuable to us to talk to people who are learning our service. I like to build fun things, and better to build fun things that people are going to use. I’m looking forward to meeting people in the community.
Check out the whole agenda on our web site to learn about the rest of the talks—including technical talks from the Scylla team, the Scylla road map, and a hands-on workshop where you’ll learn how to get the most out of your Scylla cluster.
Going to Cassandra Summit? Immerse yourself in another day of NoSQL. Scylla Summit takes place the day before Cassandra Summit begins at the Hilton San Jose, adjacent to the San Jose Convention Center. Lunch and refreshments are provided.