
This blog post is a short introduction on how to use the ScyllaDB Docker image to start up a ScyllaDB node, access the nodetool and cqlsh utilities, start a cluster of ScyllaDB nodes, configure a data volume for storage, and configure resource limits of the Docker container. For full documentation, see the image description on Docker Hub.
Please note that the instructions in this blog post assume that you have configured Docker so that you can run it as a regular user. Usually, this is done by adding the user to a Docker group. See your platform-specific Docker installation documentation on how to do that (see, for example, instructions for Fedora and Ubuntu). If you have not configured a Docker group, you need to prefix the docker commands with sudo to have sufficient permissions to run them.
Getting Started
To start a single ScyllaDB node instance in a Docker container, run:
$ docker run --name some-scylla -d scylladb/scylla
The docker run command starts a new Docker instance named some-scylla in the background, which runs the ScyllaDB server:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
616ee646cb9d scylladb/scylla "/docker-entrypoint.p" 4 seconds ago Up 4 seconds 7000-7001/tcp, 9042/tcp, 9160/tcp, 10000/tcp some-scylla
As the docker ps output shows, the image exposes ports 7000-7001 (inter-node RPC), 9042 (CQL), 9160 (Thrift), and 10000 (REST API).
To access ScyllaDB server logs, you can use the docker logs command:
$ docker logs some-scylla | tail
INFO 2016-11-09 10:27:48,191 [shard 6] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 4] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 3] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 1] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 2] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 7] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,191 [shard 5] database - Setting compaction strategy of system_traces.node_slow_log to SizeTieredCompactionStrategy
INFO 2016-11-09 10:27:48,193 [shard 0] database - Schema version changed to 36afc006-c075-3856-b1d3-188448b8618f
INFO 2016-11-09 10:27:48,193 [shard 0] storage_service - Starting listening for CQL clients on 172.17.0.2:9042...
INFO 2016-11-09 10:27:48,194 [shard 0] storage_service - Thrift server listening on 172.17.0.2:9160 ...
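When scripting against the container, it can help to wait until the CQL server reports that it is listening. The following is a minimal sketch, assuming the image in use prints the same "Starting listening for CQL clients" message as in the log excerpt above (the wording may differ between ScyllaDB versions). The function names cql_ready and wait_for_cql are hypothetical helpers, not part of the image.

```shell
# Return success if the log text on stdin contains the CQL startup message.
# Assumption: the message wording matches the log excerpt shown above.
cql_ready() {
    grep -q 'Starting listening for CQL clients'
}

# Poll the container logs until the message appears or the attempts run out.
# $1 = container name, $2 = maximum attempts (default 30), one second apart.
wait_for_cql() {
    name=$1
    attempts=${2:-30}
    while [ "$attempts" -gt 0 ]; do
        if docker logs "$name" 2>&1 | cql_ready; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 1
    done
    echo "timed out waiting for CQL on $name" >&2
    return 1
}
```

For example, wait_for_cql some-scylla 60 would block for up to about a minute before giving up.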
The Docker image also has ScyllaDB’s utilities installed. Nodetool is a command line tool for querying and managing a ScyllaDB cluster. The simplest nodetool command is nodetool status, which displays information about the cluster state:
$ docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.17.0.2 125 KB 256 100.0% c1906b2b-ce0c-4890-a9d4-8c360f111ad0 rack1
The cqlsh tool is an interactive Cassandra Query Language (CQL) shell for querying and manipulating data in the ScyllaDB cluster.
To start an interactive session, run the following command:
$ docker exec -it some-scylla cqlsh
Connected to Test Cluster at 172.17.0.2:9042.
[cqlsh 5.0.1 | Cassandra 2.1.8 | CQL spec 3.2.1 | Native protocol v3]
Use HELP for help.
You can then run CQL queries against the cluster:
cqlsh> SELECT cluster_name FROM system.local;
cluster_name
--------------
Test Cluster
(1 rows)
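For scripts, where no interactive session is wanted, cqlsh can run a single statement passed with its -e option. The sketch below wraps that in a hypothetical helper named scylla_cql; it drops the -it flags on the assumption that the script has no terminal to attach.

```shell
# Run one CQL statement in a running ScyllaDB container and print the result.
# $1 = container name, $2 = CQL statement.
# No -it here: a script usually has no terminal to attach to.
scylla_cql() {
    docker exec "$1" cqlsh -e "$2"
}
```

A call such as scylla_cql some-scylla "SELECT cluster_name FROM system.local;" should print the same table as the interactive query above.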
Starting a Cluster
Now that we have a single some-scylla instance running, joining new nodes to form a cluster is easy:
$ docker run --name some-scylla2 -d scylladb/scylla --seeds="$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' some-scylla)"
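If you add more than one node, the same command can be wrapped in a small function. This is a sketch only: container_ip and join_node are hypothetical names, and the code assumes the containers run on Docker's default bridge network, where the .NetworkSettings.IPAddress field used above is populated (it is empty for containers attached only to user-defined networks).

```shell
# Print the IP address of a running container on Docker's default bridge.
# Uses the same docker inspect format string as the command above.
container_ip() {
    docker inspect --format='{{ .NetworkSettings.IPAddress }}' "$1"
}

# Start a new node that uses an existing container as its seed.
# $1 = name of the running seed container, $2 = name for the new node.
join_node() {
    seed_ip=$(container_ip "$1") || return 1
    if [ -z "$seed_ip" ]; then
        echo "no IP address found for $1" >&2
        return 1
    fi
    docker run --name "$2" -d scylladb/scylla --seeds="$seed_ip"
}
```

For example, join_node some-scylla some-scylla3 would start a third node seeded from the first one.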
You can use the nodetool status command to check when the new node is up and running:
$ docker exec -it some-scylla nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
-- Address Load Tokens Owns (effective) Host ID Rack
UN 172.17.0.3 177.48 KB 256 100.0% 097caff5-892d-412f-af78-11d572795d6f rack1
UN 172.17.0.2 125 KB 256 100.0% c1906b2b-ce0c-4890-a9d4-8c360f111ad0 rack1
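In the output above, UN means the node is Up and in the Normal state. A script can wait for the cluster to form by counting those lines. The helpers below are a hypothetical sketch (count_up_normal and wait_for_nodes are not part of ScyllaDB) and assume the status output keeps the column layout shown here.

```shell
# Count nodes reported as Up/Normal ("UN") in nodetool status output on stdin.
count_up_normal() {
    awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Poll until at least the wanted number of nodes are UN.
# $1 = container to query, $2 = wanted node count,
# $3 = maximum attempts (default 60), two seconds apart.
wait_for_nodes() {
    name=$1
    want=$2
    attempts=${3:-60}
    while [ "$attempts" -gt 0 ]; do
        up=$(docker exec "$name" nodetool status 2>/dev/null | count_up_normal)
        if [ "$up" -ge "$want" ]; then
            return 0
        fi
        attempts=$((attempts - 1))
        sleep 2
    done
    return 1
}
```

For the two-node cluster above, wait_for_nodes some-scylla 2 returns once both nodes are listed as UN.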
Configuring Data Volume for Storage
The default filesystem in Docker is inadequate for anything other than testing ScyllaDB, but you can use Docker volumes to improve storage performance.
To use data volumes, first create a ScyllaDB data directory, /var/lib/scylla, on the host. The ScyllaDB container uses it to store all data, so make sure it is on a suitable filesystem such as XFS:
$ sudo mkdir -p /var/lib/scylla/data /var/lib/scylla/commitlog
Then launch ScyllaDB instances using Docker’s --volume command line option to mount the host directory as a data volume in the container. Disable ScyllaDB’s developer mode so that I/O tuning runs before the ScyllaDB node starts:
$ docker run --name some-scylla --volume /var/lib/scylla:/var/lib/scylla -d scylladb/scylla --developer-mode=0
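Before mounting the directory, you may want to confirm which filesystem it is on. The following sketch assumes GNU coreutils stat, where -f -c %T prints the filesystem type name; fs_type and on_xfs are hypothetical helper names.

```shell
# Print the filesystem type of the given path.
# Assumption: GNU coreutils stat, where -f -c %T prints the type name.
fs_type() {
    stat -f -c %T "$1"
}

# Succeed only if the path is on XFS; otherwise print a warning to stderr.
on_xfs() {
    type_name=$(fs_type "$1") || return 1
    if [ "$type_name" != "xfs" ]; then
        echo "warning: $1 is on $type_name, not xfs" >&2
        return 1
    fi
}
```

Running on_xfs /var/lib/scylla before starting the container gives an early warning if the directory is on a different filesystem.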
Configuring Resource Limits
ScyllaDB utilizes all CPUs and all memory by default.
To configure resource limits for your Docker container, you can use the --smp, --memory, and --cpuset command line options documented in the section “Command-line options” of the Docker image documentation.
If you run multiple ScyllaDB instances on the same machine, it is highly recommended that you enable the --overprovisioned command line option, which enables certain optimizations for ScyllaDB to run efficiently in an overprovisioned environment.
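As an illustration, the options can be combined in a single docker run invocation. This is a sketch under assumptions: the value forms (a core count for --smp, a size such as 2G for --memory, and 1 to enable --overprovisioned) should be checked against the “Command-line options” section of the image documentation for your version, and run_limited is a hypothetical helper name.

```shell
# Start a ScyllaDB node restricted to a given number of cores and memory.
# $1 = container name, $2 = core count, $3 = memory size (for example 2G).
# Assumption: the value forms follow the image documentation; the numbers
# passed in are placeholders to adapt to your machine.
run_limited() {
    docker run --name "$1" -d scylladb/scylla \
        --smp "$2" --memory "$3" --overprovisioned 1
}
```

For example, run_limited some-scylla 2 2G would start a node limited to two cores and 2 GB of memory.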