
ScyllaDB on Docker

This blog post is a short introduction to using the ScyllaDB Docker image. It covers starting a ScyllaDB node, accessing the nodetool and cqlsh utilities, starting a cluster of ScyllaDB nodes, configuring a data volume for storage, and configuring resource limits for the Docker container. For full documentation, see the image description on Docker Hub.

The instructions in this blog post assume that you have configured Docker so that you can run it as a regular user. This is usually done by adding the user to the docker group. See your platform-specific Docker installation documentation for how to do that (for example, the instructions for Fedora and Ubuntu). If you have not set up a docker group, prefix the docker commands with sudo so that they run with sufficient permissions.

Getting Started

To start a single ScyllaDB node instance in a Docker container, run:

$ docker run --name some-scylla -d scylladb/scylla

The docker run command starts a new Docker container named some-scylla in the background (the -d option), and that container runs the ScyllaDB server. You can confirm this with docker ps:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                          NAMES
616ee646cb9d        scylladb/scylla     "/docker-entrypoint.p"   4 seconds ago       Up 4 seconds        7000-7001/tcp, 9042/tcp, 9160/tcp, 10000/tcp   some-scylla

As the docker ps output shows, the image exposes ports 7000-7001 (inter-node RPC), 9042 (CQL), 9160 (Thrift), and 10000 (REST API).

To view the ScyllaDB server logs, use the docker logs command (piped to tail here to show only the last few lines):

$ docker logs some-scylla | tail
INFO  2016-11-09 10:27:48,191 [shard 6] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 4] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 3] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 1] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 2] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 7] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,191 [shard 5] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO  2016-11-09 10:27:48,193 [shard 0] database - Schema version changed to 36afc006-c075-3856-b1d3-188448b8618f
INFO  2016-11-09 10:27:48,193 [shard 0] storage_service - Starting listening for CQL clients on 172.17.0.2:9042...
INFO  2016-11-09 10:27:48,194 [shard 0] storage_service - Thrift server listening on 172.17.0.2:9160 ...
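Because the container starts in the background, a script may need to wait until the node accepts CQL connections before using it. The following is a minimal POSIX shell sketch. It assumes that the "Starting listening for CQL clients" message shown in the log above signals readiness, and that wording may differ between ScyllaDB versions. log_shows_cql_ready and wait_for_scylla are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the ScyllaDB image.

# Succeeds if the log text in $1 contains the CQL listener start-up message.
# Assumes the message wording shown in the log excerpt above.
log_shows_cql_ready() {
  case "$1" in
    *"Starting listening for CQL clients"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Polls `docker logs` every 2 seconds until the CQL listener message appears,
# or gives up after $2 seconds (default 120).
wait_for_scylla() {
  local name="$1" limit="${2:-120}" waited=0 logs
  while :; do
    # Capture the whole log first so no pipe is cut short by an early match.
    logs="$(docker logs "$name" 2>&1)" || logs=""
    if log_shows_cql_ready "$logs"; then
      return 0
    fi
    if [ "$waited" -ge "$limit" ]; then
      echo "timed out waiting for $name" >&2
      return 1
    fi
    sleep 2
    waited=$((waited + 2))
  done
}
```

For example, wait_for_scylla some-scylla 180 returns once the node logs that it is listening for CQL clients, or fails after about three minutes.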

The Docker image also includes ScyllaDB's utilities. The nodetool utility is a command-line tool for querying and managing a ScyllaDB cluster. A good first command is nodetool status, which displays information about the cluster state:

$ docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.2  125 KB     256     100.0%            c1906b2b-ce0c-4890-a9d4-8c360f111ad0  rack1

The cqlsh tool is an interactive Cassandra Query Language (CQL) shell for querying and manipulating data in the ScyllaDB cluster. To start an interactive session, run the following command:

$ docker exec -it some-scylla cqlsh
Connected to Test Cluster at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.

You can then run CQL queries against the cluster:

cqlsh> SELECT cluster_name FROM system.local;

 cluster_name
--------------
 Test Cluster

(1 rows)
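cqlsh can also run statements non-interactively, which is convenient in scripts. Its -e (--execute) option runs a single statement and exits, and when standard input is not a terminal it reads statements from stdin. A minimal sketch; run_cql and run_cql_file are hypothetical helpers, and schema.cql in the usage example is a placeholder file name:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the ScyllaDB image.

# Runs one CQL statement ($2) in container $1 and exits.
# -it is omitted because no interactive terminal is needed.
run_cql() {
  docker exec "$1" cqlsh -e "$2"
}

# Feeds a local .cql file ($2) to cqlsh in container $1 over stdin.
# -i keeps stdin open so cqlsh can read the statements from it.
run_cql_file() {
  docker exec -i "$1" cqlsh < "$2"
}
```

For example: run_cql some-scylla "SELECT cluster_name FROM system.local;" or run_cql_file some-scylla schema.cql.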

Starting a Cluster

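With the some-scylla instance running, you can form a cluster by starting new nodes that use its IP address as their seed. The following command looks up the IP address of some-scylla with docker inspect and passes it to the new node through the --seeds option: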

$ docker run --name some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"

You can use the nodetool status command to check when the new node is up and running. Once it has joined, both nodes are listed with status UN (Up/Normal):

$ docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
UN  172.17.0.3  177.48 KB  256     100.0%            097caff5-892d-412f-af78-11d572795d6f  rack1
UN  172.17.0.2  125 KB     256     100.0%            c1906b2b-ce0c-4890-a9d4-8c360f111ad0  rack1
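To grow the cluster further, a conservative approach is to add nodes one at a time and wait until each new node reports UN before starting the next. A minimal POSIX shell sketch, assuming the default bridge network, the some-scylla/some-scyllaN names used above, and the nodetool status layout shown above; count_up_normal, wait_for_up_nodes, and add_scylla_nodes are hypothetical helper names:

```shell
#!/bin/sh
# Hypothetical helpers, not part of the ScyllaDB image.

# Counts nodes in UN (Up/Normal) state in `nodetool status` output on stdin.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Waits until some-scylla sees at least $1 UN nodes, checking every 5 seconds
# for at most $2 attempts (default 60).
wait_for_up_nodes() {
  local want="$1" tries="${2:-60}" up
  while [ "$tries" -gt 0 ]; do
    up="$(docker exec some-scylla nodetool status 2>/dev/null | count_up_normal)"
    if [ "$up" -ge "$want" ]; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 5
  done
  echo "cluster did not reach $want UN nodes" >&2
  return 1
}

# Starts some-scylla$1 .. some-scylla$2 one at a time, seeding each with
# some-scylla, and waits for each new node to reach UN before the next.
# Assumes some-scylla .. some-scylla($1 - 1) are already running.
add_scylla_nodes() {
  local i="$1" last="$2" seed_ip
  seed_ip="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)" || return 1
  while [ "$i" -le "$last" ]; do
    docker run --name "some-scylla$i" -d scylladb/scylla --seeds="$seed_ip" >/dev/null || return 1
    wait_for_up_nodes "$i" || return 1
    i=$((i + 1))
  done
}
```

With some-scylla and some-scylla2 already running, add_scylla_nodes 3 4 would start some-scylla3 and some-scylla4 in turn.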

Configuring Data Volume for Storage

Docker's default container filesystem is inadequate for anything beyond testing ScyllaDB. For better storage performance, use Docker volumes.

To use a data volume, first create the ScyllaDB data directory /var/lib/scylla on the host. The ScyllaDB container uses this directory to store all of its data, so make sure it is on a suitable filesystem such as XFS:

$ sudo mkdir -p /var/lib/scylla/data /var/lib/scylla/commitlog
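Before mounting the directory, you can check which filesystem it is on. A minimal sketch for Linux hosts, assuming GNU coreutils, whose stat -f -c %T prints the filesystem type name (BSD and macOS stat use different flags); is_on_xfs is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if directory $1 is on an XFS filesystem.
# GNU `stat -f` reports on the filesystem, and %T is its type name (e.g. "xfs").
is_on_xfs() {
  local fstype
  fstype="$(stat -f -c %T "$1")" || return 1
  [ "$fstype" = "xfs" ]
}
```

For example: is_on_xfs /var/lib/scylla || echo "warning: /var/lib/scylla is not on XFS" >&2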

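Then launch the ScyllaDB container with Docker's --volume option to mount the host directory as a data volume in the container, and pass --developer-mode=0 to disable ScyllaDB's developer mode so that I/O tuning runs before the ScyllaDB node starts. Container names must be unique, so if the some-scylla container from the earlier examples still exists, remove it first (for example, with docker rm -f some-scylla):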

$ docker run --name some-scylla --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla --developer-mode=0

Configuring Resource Limits

By default, ScyllaDB uses all CPUs and all memory available to it.

To limit the resources ScyllaDB uses in a Docker container, use the --smp, --memory, and --cpuset command-line options, which are documented in the "Command-line options" section of the Docker image documentation.
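These options go after the image name: docker run passes everything after the image to the image's entrypoint, so they configure ScyllaDB itself rather than Docker. A minimal sketch; run_limited_scylla is a hypothetical helper, and the values in the usage example are illustrative:

```shell
#!/bin/sh
# Hypothetical helper: start a ScyllaDB container with explicit resource limits.
#   $1 container name
#   $2 value for --smp (number of CPUs/shards ScyllaDB uses)
#   $3 value for --memory (e.g. 4G)
#   $4 value for --cpuset (CPU list, e.g. 0-1)
# The options after the image name are handled by the ScyllaDB entrypoint.
run_limited_scylla() {
  docker run --name "$1" -d scylladb/scylla \
    --smp "$2" --memory "$3" --cpuset "$4"
}
```

For example, run_limited_scylla some-scylla 2 4G 0-1 starts a node that uses two CPUs (CPUs 0 and 1) and 4 GB of memory. Keeping --smp equal to the number of CPUs listed in --cpuset keeps the two options consistent.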

If you run multiple ScyllaDB instances on the same machine, we highly recommend enabling the --overprovisioned command-line option (--overprovisioned 1). It turns on optimizations that help ScyllaDB run efficiently in an overprovisioned environment, where CPUs and memory are shared with other workloads.
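As an illustration, two small nodes sharing one host could be started as follows. This is a minimal sketch: start_shared_pair is a hypothetical helper, the --smp and --memory values are illustrative, and it assumes no containers named some-scylla or some-scylla2 exist yet:

```shell
#!/bin/sh
# Hypothetical sketch: two small ScyllaDB nodes on the same host.
# --overprovisioned 1 enables the overprovisioned-environment optimizations;
# --smp 1 and --memory 1G are illustrative values that keep each node small.
start_shared_pair() {
  local seed_ip
  docker run --name some-scylla -d scylladb/scylla \
    --smp 1 --memory 1G --overprovisioned 1 || return 1
  # The second node joins the first, using its IP address as the seed.
  seed_ip="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)" || return 1
  docker run --name some-scylla2 -d scylladb/scylla \
    --smp 1 --memory 1G --overprovisioned 1 --seeds="$seed_ip"
}
```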

About Pekka Enberg

Pekka Enberg is a software engineer currently working on ScyllaDB. He has been a Linux kernel contributor for the past 10 years and has worked on various other open source projects. He has extensive background in UNIX kernels and the JVM.