
Saving Thousands by Running ScyllaDB on EC2 Spot Instances

ScyllaDB & Spotinst – A Reliable and Powerful Combination

Spot instances can save you a lot of cash, but what if you run a stateful service such as a NoSQL database? The main challenge is that every node in the cluster must preserve its entire state (IP address, data, and other configuration) across instance replacements. This blog describes how a ScyllaDB cluster can run on AWS EC2 Spot without losing consistency, with the help of Spotinst’s prediction technology and advanced stateful features.

What is ScyllaDB?

ScyllaDB is an open-source distributed NoSQL database. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughput and lower latency, and it supports the same protocols and file formats as Apache Cassandra. However, ScyllaDB is a complete rewrite in C++, whereas Apache Cassandra is written in Java. ScyllaDB is built on the Seastar framework, an asynchronous programming library that replaces threads, shared memory, mapped files, and other classic Linux programming techniques, and it provides its own disk I/O scheduler, which further boosts performance.

Benchmarks conducted by engineers at both ScyllaDB and third parties have demonstrated that ScyllaDB outperforms Apache Cassandra by up to 10x!

How ScyllaDB replicates its data between nodes

ScyllaDB provides always-on availability: automatic failover and replication across multiple nodes and data centers deliver reliable fault tolerance.

ScyllaDB, like Cassandra, uses a protocol called “gossip” to exchange metadata about the identities of nodes in a cluster and whether they are up or down. Since there is no single point of failure, there can be no single registry of a node’s state, so nodes must share information among themselves.
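
Each node therefore keeps its own gossip-derived view of the cluster. You can inspect the state a node holds about its peers with nodetool gossipinfo; the output below is abbreviated and illustrative (addresses and values are placeholders):

nodetool gossipinfo
# /192.168.1.2
#   generation: 1519640000
#   heartbeat: 4123
#   STATUS: NORMAL
#   DC: us-east
#   RACK: 1b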

How to run ScyllaDB on Spotinst

When architecting a new ScyllaDB cluster, spot instances probably won’t be the architect’s first choice: their unpredictable behavior, and the fact that they can be terminated with only two minutes’ notice, make it hard to maintain a stable cluster. This is why Elastigroup is a natural choice for this kind of environment.

Elastigroup provides 100% availability of a service on top of the spot market. It analyzes historical and real-time data to place the right bid for the right spot, choosing the instances that offer the best combination of low price and long expected lifetime. Using a predictive algorithm, it identifies changes in the spot market 15 minutes in advance and triggers a spot replacement seamlessly, without service interruption.

As part of the stateful feature, Elastigroup can retain the data volumes of the machine: any EBS volume attached to the instance is continuously snapshotted while the machine is running and is used as the block device mapping configuration upon replacement.
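
For reference, the same persistence behavior can also be enabled when creating the group programmatically. The sketch below is a rough illustration of such a call; the endpoint and field names are assumptions based on the Elastigroup API and should be verified against the current Spotinst documentation:

# A sketch of enabling stateful persistence via the Elastigroup API.
# Endpoint and field names are assumptions; check the Spotinst docs.
curl -X POST "https://api.spotinst.io/aws/ec2/group" \
  -H "Authorization: Bearer ${SPOTINST_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "group": {
          "name": "scylla-cluster",
          "strategy": {
            "persistence": {
              "shouldPersistPrivateIp": true,
              "shouldPersistRootDevice": true,
              "shouldPersistBlockDevices": true,
              "blockDevicesMode": "reattach"
            }
          }
        }
      }'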


In order to keep the same logical node running through a failure scenario, you need to keep in mind several things (a boot-time sanity-check sketch follows the list):

  • Private IP – Make sure that the new machine keeps the same private IP so the other nodes’ gossip can continue communicating with it.
  • Volume – The node must be reattached to the same storage and have the same volume it had before; without its data volume, the service will not be available.
  • Config file – scylla.yaml, located by default at /etc/scylla/scylla.yaml, must be preserved so the node comes back with the same configuration.
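
As a concrete illustration, here is a minimal boot-time sanity check for a replaced instance. The expected IP is a placeholder, and the paths assume ScyllaDB’s defaults (data under /var/lib/scylla, config under /etc/scylla):

#!/bin/bash
# Minimal sanity-check sketch for a replaced ScyllaDB spot instance.
EXPECTED_IP="192.168.1.1"   # the private IP this node must keep

ACTUAL_IP=$(hostname -I | awk '{print $1}')
[ "$ACTUAL_IP" = "$EXPECTED_IP" ] || echo "WARNING: private IP changed to $ACTUAL_IP"

# The data volume must be reattached and mounted where ScyllaDB expects it.
mountpoint -q /var/lib/scylla || echo "WARNING: data volume is not mounted"

# The node's cluster settings must survive the replacement.
[ -f /etc/scylla/scylla.yaml ] || echo "WARNING: /etc/scylla/scylla.yaml is missing"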

In the config file, you will need to set a few key values:

  • cluster_name – The name of the cluster. This setting prevents nodes in one logical cluster from joining another; all nodes in a cluster must have the same value.
  • listen_interface – The interface that ScyllaDB binds to for connecting to other nodes.
  • seeds – Seed nodes used during startup to bootstrap the gossip process and join the cluster.
  • rpc_address – IP address of the interface for client connections (Thrift, CQL).
  • broadcast_address – IP address of the interface for inter-node connections, as seen from other nodes in the cluster.

Rack Considerations

For better availability of your data, it’s recommended to spread the nodes across Availability Zones (AZs). This is configured by setting endpoint_snitch: Ec2Snitch in scylla.yaml, together with the cassandra-rackdc.properties file.

Let’s assume that you have a cluster set up in the us-east-1 region. If node1 is in us-east-1a and node2 is in us-east-1b, ScyllaDB considers these nodes to be in two different racks within the same data center: node1 in rack 1a and node2 in rack 1b.
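
You can verify how ScyllaDB mapped the zones by running nodetool status on any node; with Ec2Snitch the AZ suffix appears in the Rack column. The output below is illustrative:

nodetool status
# Datacenter: us-east
# ===================
# --  Address      Load    Tokens  Owns   Rack
# UN  192.168.1.1  1.2 GB  256     33.3%  1a
# UN  192.168.1.2  1.1 GB  256     33.4%  1b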

We will now show how to install a six-node cluster spread across two Availability Zones. Each zone hosts three nodes, two of which are seed nodes. The IPs are as follows:

US-DC1 (us-east-1a)

Node# Private IP 
Node1 192.168.1.1 (seed)
Node2 192.168.1.2 (seed)
Node3 192.168.1.3

US-DC2 (us-east-1b)

Node# Private IP 
Node4 192.168.1.4 (seed)
Node5 192.168.1.5 (seed)
Node6 192.168.1.6

On each ScyllaDB node, edit the scylla.yaml file. The following is an example of one node per DC:

US-DC1 node – 192.168.1.1

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.1"
listen_address: "192.168.1.1"

US-DC2 node – 192.168.1.4

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.4"
listen_address: "192.168.1.4"
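
After editing scylla.yaml, restart the service on each node and confirm that the node rejoined the ring (this assumes the standard scylla-server systemd unit):

sudo systemctl restart scylla-server
nodetool status   # the node should be listed as UN (Up/Normal)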

On each ScyllaDB node, edit the cassandra-rackdc.properties file with the relevant rack and data center information. Note that Ec2Snitch derives these values from the node’s EC2 region and Availability Zone, so the entries here should match them:

Nodes 1-3

dc=us-east
rack=1a

Nodes 4-6

dc=us-east
rack=1b

Spotinst Console configuration

When configuring Elastigroup, it’s important to enable the stateful feature so your data and network configuration persist when an instance is replaced after a spot interruption.

Open the Compute tab, then the Stateful section, and choose which elements to persist: the private IP and the root and data volumes.

It’s also recommended to run the nodetool drain command in the shutdown script section, to flush memtables to disk and gracefully stop accepting new connections.
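
A minimal shutdown script for this purpose could look like the sketch below; Elastigroup runs it before the instance is terminated:

#!/bin/bash
# Flush memtables to SSTables on disk and stop accepting new
# connections, so the node shuts down cleanly before the spot
# instance is reclaimed.
nodetool drain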

Let’s see how this works in real life!

Our test cluster consists of three ScyllaDB nodes, all running on spot instances with the stateful feature configured.

Once one of the instances is interrupted, the stateful feature creates a replacement with the same private IP and root/data volumes, and the node returns to the cluster.

ScyllaDB and Spotinst together provide a strong combination of extreme performance and cost reduction. To get started, go to the Spotinst website and start your free trial today!

This blog post was also published on the Spotinst blog.

Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.