Our most recent ScyllaDB developer conference and hackathon was a virtual event. As a hackathon team, we wanted to itch one of our own scratches and make testability a bit more simple and straightforward task. While many people these days are focusing on Kubernetes, we’ll explain below why we implemented this in Docker and are not aiming for Kubernetes.
What’s CCM (a.k.a history lesson)
Cassandra Distributed Tests (known as dtests, which you can find at cassandra-dtest) are the heart of functional testing for Cassandra and Scylla. It’s based on Cassandra Cluster Manager (CCM). CCM is also used in driver integration, to test them against a real Cassandra/Scylla cluster.
CCM is the script/library that allows a user to create, manage and remove a cluster. In Cassandra’s case, you can download a Cassandra version with the use of CCM from a git repository or from official tarballs.
For the past few few years we only supported using CCM out of the Scylla source tree, meaning you had to first compile Scylla locally to get dtests working. Later we had dbuild that help with the building part, but it still was a long process.
Then came the relocatable packages, which eased the process, but still the housekeeping of handling a few running clusters at the same time, was an issue for our CI. And it was a constant struggle for any new team member joining (on how to get a version, and how to setup it correctly for tests to run)
Ever since I’ve seen the cassandra-dtest and our fork of it, and our fork of CCM (scylla-ccm) I thought using Docker could help make it a much more pleasant experience for those who are using it for testing.
Packaging was a real struggle, from where to store it on S3 to how to lookup the correct version you want, and how to squeeze the poor install script into putting the different parts into place.
And all that from 3 different tarballs (that later become one); it’s a very fragile process.
One docker pull command from DockerHub eliminates almost all of it. And all packages are nicely installed and configured by the battle proven package manager, and not by all kinds of hacks we had in place.
With our current approach (relocatable package or running out of the source), every Scylla “node” is a bunch of local processes that need a lot of attention and housekeeping: how to select local addresses, make sure ports won’t conflict and keep tabs of each process id.
Docker is a perfect match for those things, since each container has its own network address and you can treat it almost as a real remote machine.
A nice side effect would be to help newcomers get up and running fast, since it also removes some of the dependencies we need to install, like Java and friends. Trading that with a working Docker, might prove a bit more problematic (yeah Fedora we are looking at you…), but we tend to be using it more and more in our daily routine.
Why Not Kubernetes?
While we came up with the idea some people suggested we might also support Kubernetes. In theory that sounds like a fantastic idea, but incorporating scylla-operator into the mix has a long list of challenges.
Since the usage of CCM is mainly for a functional test, you would like the test to control what is going on during the test. The nature of how a Kubernetes operator works kind of defies the notion you have control, since it would keep doing what it’s built for: fixing your cluster to the requirements you’ve set.
By the way, don’t worry, we are working very hard on testing the scylla-operator, but in the context of scylla-cluster-tests, where we aim for more realistic environments and run it as an example on top of GKE, and with bigger, more taxing loads. That whole effort is worth its own blog post (or even a series of them).
Since it was a pain mostly felt by the testing team, I had Fabio Gelcer, Shlomi Balalis, Alex Bykov and Oren Efraimov, all from our QA team, Yaron Kaikov from our release engineering team, and Aleksandar Janković from the Scylla Manager team joining me.
We use CCM mainly in two ways, or via a command line or via environment variables when invocating dtest runs.
The usage from command line, should now support
--docker option (instead of the previous
ccm create scylla-docker-cluster -n 3 --scylla --docker scylladb/scylla:4.1.8
And similar for the usage for dtest runs, a new environment variable was introduced:
SCYLLA_DOCKER_IMAGE=scylladb/scylla:latest nosetest ...
We are working on a dedicated branch for that effort:
We got a lots tests up and running in dtest, (1.5h running suite of 256 tests for sstableloader, as an example (hot straight out our IDE):
During the process we integrated with Github Actions for running integration tests on each PR (something we didn’t yet have for our ccm
We played with the idea of adding to the mix support for scylla-manager, but it didn’t get there in the 3 days we had.
The main target for this was using it in our dtest suite, which has around 1,800 tests that run daily. So we need to vet our work with all of those tests. That’s our next target to achieve before we think about switching all of our test jobs to use our new Docker based CCM.
We’re also considering replacing our runs in Continuous Integration (CI) to use Docker-ccm, since running the whole dtest suite takes about 6-8 hours in our current setup. Instead we are thinking of experimenting with something we’ve been contemplating for a long time now to be able to run them quicker: distributing the tests across a large number of EC2 machines with the spot-fleets jenkins plugin. (Yet again, something we’ll surely elaborate on once we have it fully operational, and in daily use.)
Hope you enjoyed this view into the kinds of work that we do behind the scenes. If you want to be part of ScyllaDB’s next internal hackathon, check out our job opportunities!