Mutant Monitoring System Day 10 – Backup and Restore

By Phillip Tribble

May 3, 2018

IMPORTANT: Since the first publication of the Mutant Monitoring System we have made a number of updates to the concepts and code presented in the blog series. You can find the latest version now in ScyllaDB University.

This is part 10 of a series of blog posts that provides a story arc for ScyllaDB Training.

In the previous post, we learned how developers at Division 3 can create applications that interact with the Mutant Monitoring System, using the Cassandra libraries available in various programming languages. Due to recent violent events in the mutant community, Division 3 implemented a new policy requiring ScyllaDB Administrators to learn how to back up and restore the mutant data in the cluster. In this post, we will learn how to back up and restore data from the Mutant Monitoring System.

Starting the ScyllaDB Cluster

The ScyllaDB Cluster should be up and running with the data imported from the previous blog posts.

The MMS Git repository has been updated to provide the ability to automatically import the keyspaces and data. If you have the Git repository cloned already, you can simply do a “git pull” in the scylla-code-samples directory.

git clone https://github.com/scylladb/scylla-code-samples.git
cd scylla-code-samples/mms

Modify docker-compose.yml and add the following line under the environment: section of scylla-node1:

- IMPORT=IMPORT

Now we can build and run the containers:

docker-compose build
docker-compose up -d

After roughly 60 seconds, the existing MMS data will be automatically imported. When the cluster is up and running, we can run our application code.

Backing Up the Data

First, we need to back up our schema with the following commands:

docker exec -it mms_scylla-node1_1 bash
cqlsh -e "DESC SCHEMA" > /schema.cql
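Before relying on the dump, it can be worth checking that both keyspaces actually made it into the file. The sketch below assumes that the dump contains one line with CREATE KEYSPACE followed by the keyspace name for each keyspace. The helper name schema_has_keyspace is made up for this example and is not part of cqlsh or ScyllaDB.

```shell
# schema_has_keyspace FILE KEYSPACE
# Succeeds when FILE contains a "CREATE KEYSPACE <name> " statement.
# Assumption: the dump holds one such statement per keyspace.
schema_has_keyspace() {
    grep -qi "CREATE KEYSPACE $2 " "$1"
}

# Possible usage inside the container (an extra check, not an original step):
#   schema_has_keyspace /schema.cql tracking && echo "tracking found"
#   schema_has_keyspace /schema.cql catalog && echo "catalog found"
```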

With the schema backed up, we can create a snapshot of the two keyspaces used for the Mutant Monitoring System: tracking and catalog. Snapshots are taken using nodetool snapshot. The command first flushes the MemTables from memory to SSTables on disk, then creates a hard link for each SSTable in each keyspace. With time, SSTables are compacted, but the hard link keeps a copy of each file. As a result, snapshots take up an increasing amount of disk space.

nodetool snapshot tracking

Requested creating snapshot(s) for [tracking] with snapshot name [1524840557127]
Snapshot directory: 1524840557127

nodetool snapshot catalog

Requested creating snapshot(s) for [catalog] with snapshot name [1524840566260]
Snapshot directory: 1524840566260
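The hard-link behavior described above can be tried out safely in a scratch directory. This is only an illustration of the filesystem mechanism: the file and directory names are invented and no real ScyllaDB files are involved.

```shell
# Illustration only: imitate a snapshot hard link in a scratch directory.
# data.db and the "demo" snapshot name are made-up stand-ins.
work=$(mktemp -d)
mkdir -p "$work/table/snapshots/demo"

# Stand-in for an SSTable that was flushed to disk.
echo "mutant rows" > "$work/table/data.db"

# A snapshot adds a second name (hard link) for the same file; no data is copied.
ln "$work/table/data.db" "$work/table/snapshots/demo/data.db"

# Later, compaction removes the original name...
rm "$work/table/data.db"

# ...but the content stays on disk for as long as the snapshot link exists.
cat "$work/table/snapshots/demo/data.db"
```

This is also why old snapshots keep consuming disk space until they are removed.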

The snapshot is created in the ScyllaDB data directory, /var/lib/scylla/data, and has the following structure: keyspace_name/table_name-UUID/snapshots/snapshot_name.
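Given that layout, the snapshot directories of a keyspace can be located with find. The following is a minimal sketch: list_snapshots is a hypothetical helper name, and it assumes a find implementation that supports -mindepth and -maxdepth (GNU and BSD find both do).

```shell
# list_snapshots DATA_DIR KEYSPACE
# Prints every snapshot directory of one keyspace, sorted by path.
# Layout from the text: keyspace_name/table_name-UUID/snapshots/snapshot_name
list_snapshots() (
    data_dir=$1
    keyspace=$2
    # Relative to the keyspace directory, snapshot_name sits exactly 3 levels down.
    find "$data_dir/$keyspace" -mindepth 3 -maxdepth 3 -type d \
        -path '*/snapshots/*' | sort
)

# Possible usage inside the container:
#   list_snapshots /var/lib/scylla/data catalog
```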

Simulating Data Loss

All of the data is backed up on node1 only. Division 3 must now prepare to handle cyber attacks from the mutants and other external organizations. In this scenario, we will simulate such an attack by using cqlsh to delete the Mutant Monitoring Keyspaces.

cqlsh
drop keyspace tracking;
drop keyspace catalog;

To verify that the keyspaces are gone, run the following command:

describe keyspaces;

The output should be:

system_traces system_schema system

Great, we only have the default keyspaces now. We can now learn how to restore the data.

Restoring the Data

To restore the data, we first need to re-create the keyspaces from the backup done previously:

docker exec -it mms_scylla-node1_1 bash
cqlsh -e "SOURCE '/schema.cql'"

Run the nodetool drain command to ensure the data is flushed to the SSTables:

nodetool drain

Next, we will need to kill the ScyllaDB processes in each container and delete the contents of the commit log. In each container, run the following commands:

pkill scylla
rm -rf /var/lib/scylla/commitlog/*

Repeat for containers mms_scylla-node2_1 and mms_scylla-node3_1.

Return to mms_scylla-node1_1:

docker exec -it mms_scylla-node1_1 bash

The first snapshot restore will be for the catalog keyspace. We first need to find the original snapshot folder, which is the oldest directory in /var/lib/scylla/data/catalog:

cd /var/lib/scylla/data/catalog/
ls -al
total 16
drwx------ 4 root root 4096 Apr 27 15:03 .
drwxr-xr-x 1 scylla scylla 4096 Apr 27 14:40 ..
drwx------ 3 root root 4096 Apr 27 15:03 mutant_data-32afab304a2c11e8b776000000000002
drwx------ 4 root root 4096 Apr 27 14:58 mutant_data-dcffd0a04a2811e89412000000000000

Luckily, the modification times in the listing tell the two directories apart. The original data directory with the snapshot is the older one, mutant_data-dcffd0a04a2811e89412000000000000, and the current data directory created after importing the schema is mutant_data-32afab304a2c11e8b776000000000002. We can now copy the contents from the snapshot directory to the new data directory:

cp -rf mutant_data-dcffd0a04a2811e89412000000000000/snapshots/1524840566260/* mutant_data-32afab304a2c11e8b776000000000002/
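Picking the oldest and newest table directory by eye works for one table, but the lookup could also be scripted. The sketch below compares directory modification times, on the assumption that the directory holding the snapshot is still older than the one created by the schema import. The helper names oldest_table_dir and newest_table_dir are invented for this example.

```shell
# oldest_table_dir KEYSPACE_DIR TABLE
# Prints the TABLE-* directory with the oldest modification time.
oldest_table_dir() (
    oldest=""
    for dir in "$1/$2"-*; do
        [ -d "$dir" ] || continue
        if [ -z "$oldest" ] || [ "$dir" -ot "$oldest" ]; then
            oldest=$dir
        fi
    done
    [ -n "$oldest" ] && printf '%s\n' "$oldest"
)

# newest_table_dir KEYSPACE_DIR TABLE
# Prints the TABLE-* directory with the newest modification time.
newest_table_dir() (
    newest=""
    for dir in "$1/$2"-*; do
        [ -d "$dir" ] || continue
        if [ -z "$newest" ] || [ "$dir" -nt "$newest" ]; then
            newest=$dir
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
)

# Possible usage inside mms_scylla-node1_1, with the snapshot name printed earlier:
#   old=$(oldest_table_dir /var/lib/scylla/data/catalog mutant_data)
#   new=$(newest_table_dir /var/lib/scylla/data/catalog mutant_data)
#   cp -rf "$old"/snapshots/1524840566260/* "$new"/
```

It is still worth confirming the two paths against the ls output before copying, since modification times change whenever files are added to or removed from a directory.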

Repeat the process for restoring the tracking keyspace in /var/lib/scylla/data/tracking. When that is complete, we can start ScyllaDB in each container with the following commands:

export SCYLLA_CONF='/etc/scylla'
/usr/bin/scylla --developer-mode 1 --options-file /etc/scylla/scylla.yaml&
/usr/lib/scylla/jmx/scylla-jmx -l /usr/lib/scylla/jmx&

Repeat for mms_scylla-node2_1 and mms_scylla-node3_1. It will take a few minutes for the nodes to form a cluster. You should see messages like this:

INFO 2018-04-27 17:00:35,574 [shard 0] gossip - Connect seeds again ... (11 seconds passed)
INFO 2018-04-27 17:00:36,574 [shard 0] gossip - Connect seeds again ... (12 seconds passed)
INFO 2018-04-27 17:00:37,575 [shard 0] gossip - Connect seeds again ... (13 seconds passed)

When the nodes form a cluster, you should see messages similar to these:

INFO 2018-04-27 17:05:27,927 [shard 0] storage_service - Node 172.18.0.3 state jump to normal
INFO 2018-04-27 17:05:28,024 [shard 0] gossip - Favor newly added node 172.18.0.3
INFO 2018-04-27 17:05:28,058 [shard 0] storage_service - Node 172.18.0.2 state jump to normal
INFO 2018-04-27 17:05:29,034 [shard 0] gossip - Favor newly added node 172.18.0.2
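Instead of watching the logs, the cluster state could also be polled with nodetool status. The sketch below assumes that each node appears on a line beginning with a two-letter status/state code and that UN means Up/Normal. The helper names count_up_nodes and all_nodes_up are made up for this example.

```shell
# count_up_nodes
# Reads `nodetool status` output on stdin and prints how many lines
# start with "UN" (Up/Normal). Prints 0 when there are none.
count_up_nodes() {
    grep -c '^UN' || true
}

# all_nodes_up EXPECTED
# Succeeds when stdin reports exactly EXPECTED Up/Normal nodes.
all_nodes_up() {
    [ "$(count_up_nodes)" -eq "$1" ]
}

# Possible usage inside a container, for the three-node MMS cluster:
#   until nodetool status | all_nodes_up 3; do sleep 10; done
```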

After the nodes are online, we can run a repair and then verify that our data was restored properly. Start with the repair:

nodetool repair

You should see many messages like this:

INFO 2018-04-27 17:31:28,744 [shard 1] repair - Repair 302 out of 498 ranges, id=1, shard=1, keyspace=system_traces, table={sessions_time_idx, node_slow_log_time_idx, events, sessions, node_slow_log}, range=(1506223645386967449, 1537077568234766066]

This means that repair is syncing the data across the cluster. The entire process may take a few minutes. When the repair process is complete, we can run queries to ensure that the data is restored to both keyspaces:

cqlsh -e 'select * from tracking.tracking_data;'
cqlsh -e 'select * from catalog.mutant_data;'

Conclusion

In conclusion, we followed the directive from Division 3 to teach each ScyllaDB Administrator how to back up and restore a cluster. The process began with backing up the schema, followed by creating a snapshot of the data. We simulated an attack by deleting the keyspaces, then restored and repaired the cluster. Stay safe out there and back up as much as you can!

Next Steps

Learn more about ScyllaDB from our product page.
See what our users are saying about ScyllaDB.
Download ScyllaDB. Check out our download page to run ScyllaDB on AWS, install it locally in a Virtual Machine, or run it in Docker.
