
Saving Thousands by Running ScyllaDB on EC2 Spot Instances

ScyllaDB & Spotinst – A Reliable and Powerful Combination

Spot instances can save you a lot of cash, but what if you run a stateful service such as a NoSQL database? The main challenge is that every node in the cluster must preserve its entire state (IP address, data, and other configuration) across instance replacements. This blog describes how a ScyllaDB cluster can run on AWS EC2 Spot without losing consistency, with the help of Spotinst’s prediction technology and advanced stateful features.

What is ScyllaDB?

ScyllaDB is an open-source distributed NoSQL database. It was designed to be compatible with Apache Cassandra while achieving significantly higher throughput and lower latency, and it supports the same protocols and file formats as Apache Cassandra. However, ScyllaDB is a complete rewrite in C++, whereas Apache Cassandra is written in Java. ScyllaDB is built on the Seastar framework, an asynchronous programming library that replaces threads, shared memory, mapped files, and other classic Linux programming techniques, and it provides its own disk I/O scheduler, which further boosts performance.

Benchmarks conducted by engineers at both ScyllaDB and third parties have demonstrated that ScyllaDB outperforms Apache Cassandra by up to 10x!

How ScyllaDB replicates its data between nodes

ScyllaDB provides always-on availability: automatic failover and replication across multiple nodes and data centers deliver reliable fault tolerance.

ScyllaDB, like Cassandra, uses a protocol called “gossip” to exchange metadata about the identities of nodes in a cluster and whether they are up or down. Since there is no single point of failure, there can be no single registry of a node’s state, so nodes must share information among themselves.
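
Each node therefore keeps its own gossip-derived view of the cluster. You can inspect the state a node holds about its peers with nodetool gossipinfo; the output below is abbreviated and illustrative (addresses and values are placeholders):

nodetool gossipinfo
# /192.168.1.2
#   generation: 1519640000
#   heartbeat: 4123
#   STATUS: NORMAL
#   DC: us-east
#   RACK: 1b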

How to run ScyllaDB on Spotinst

When architecting a new ScyllaDB cluster, spot instances probably won’t be the architect’s first choice: their unpredictable behavior, and the fact that they can be terminated with only two minutes’ notice, make it hard to maintain a stable cluster. This is why Elastigroup is a natural choice for this kind of environment.

Elastigroup provides 100% availability of a service on top of the spot market. It analyzes historical and real-time data to place the right bid for the right spot, choosing the instances that offer the best combination of low price and long expected lifetime. Using a predictive algorithm, it identifies changes in the spot market 15 minutes in advance and triggers a spot replacement seamlessly, without service interruption.

As part of the stateful feature, Elastigroup can retain the data volumes of the machine: any EBS volume attached to the instance is continuously snapshotted while the machine is running and is used as the block device mapping configuration upon replacement.
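
For reference, the same persistence behavior can also be enabled when creating the group programmatically. The sketch below is a rough illustration of such a call; the endpoint and field names are assumptions based on the Elastigroup API and should be verified against the current Spotinst documentation:

# A sketch of enabling stateful persistence via the Elastigroup API.
# Endpoint and field names are assumptions; check the Spotinst docs.
curl -X POST "https://api.spotinst.io/aws/ec2/group" \
  -H "Authorization: Bearer ${SPOTINST_API_TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{
        "group": {
          "name": "scylla-cluster",
          "strategy": {
            "persistence": {
              "shouldPersistPrivateIp": true,
              "shouldPersistRootDevice": true,
              "shouldPersistBlockDevices": true,
              "blockDevicesMode": "reattach"
            }
          }
        }
      }'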


In order to keep the same logical node running through a failure scenario, you need to keep in mind several things (a boot-time sanity-check sketch follows the list):

  • Private IP – Make sure that the new machine keeps the same private IP so the other nodes’ gossip can continue communicating with it.
  • Volume – The node must be reattached to the same storage and have the same volume it had before; without its data volume, the service will not be available.
  • Config file – scylla.yaml, located by default at /etc/scylla/scylla.yaml, must be preserved so the node comes back with the same configuration.
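
As a concrete illustration, here is a minimal boot-time sanity check for a replaced instance. The expected IP is a placeholder, and the paths assume ScyllaDB’s defaults (data under /var/lib/scylla, config under /etc/scylla):

#!/bin/bash
# Minimal sanity-check sketch for a replaced ScyllaDB spot instance.
EXPECTED_IP="192.168.1.1"   # the private IP this node must keep

ACTUAL_IP=$(hostname -I | awk '{print $1}')
[ "$ACTUAL_IP" = "$EXPECTED_IP" ] || echo "WARNING: private IP changed to $ACTUAL_IP"

# The data volume must be reattached and mounted where ScyllaDB expects it.
mountpoint -q /var/lib/scylla || echo "WARNING: data volume is not mounted"

# The node's cluster settings must survive the replacement.
[ -f /etc/scylla/scylla.yaml ] || echo "WARNING: /etc/scylla/scylla.yaml is missing"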

In the config file, you will need to set a few key values:

  • cluster_name – The name of the cluster. This setting prevents nodes in one logical cluster from joining another; all nodes in a cluster must have the same value.
  • listen_interface – The interface that ScyllaDB binds to for connecting to other nodes.
  • seeds – Seed nodes used during startup to bootstrap the gossip process and join the cluster.
  • rpc_address – IP address of the interface for client connections (Thrift, CQL).
  • broadcast_address – IP address of the interface for inter-node connections, as seen from other nodes in the cluster.

Rack Considerations

For better availability of your data, it’s recommended to spread the nodes across Availability Zones (AZs). This is configured by setting endpoint_snitch: Ec2Snitch in scylla.yaml, together with the cassandra-rackdc.properties file.

Let’s assume that you have a cluster set up in the us-east-1 region. If node1 is in us-east-1a and node2 is in us-east-1b, ScyllaDB considers these nodes to be in two different racks within the same data center: node1 in rack 1a and node2 in rack 1b.
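
You can verify how ScyllaDB mapped the zones by running nodetool status on any node; with Ec2Snitch the AZ suffix appears in the Rack column. The output below is illustrative:

nodetool status
# Datacenter: us-east
# ===================
# --  Address      Load    Tokens  Owns   Rack
# UN  192.168.1.1  1.2 GB  256     33.3%  1a
# UN  192.168.1.2  1.1 GB  256     33.4%  1b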

We will now show how to install a six-node cluster spread across two Availability Zones. Each zone hosts three nodes, two of which are seed nodes. The IPs are as follows:

US-DC1 (us-east-1a)

Node# Private IP 
Node1 192.168.1.1 (seed)
Node2 192.168.1.2 (seed)
Node3 192.168.1.3

US-DC2 (us-east-1b)

Node# Private IP 
Node4 192.168.1.4 (seed)
Node5 192.168.1.5 (seed)
Node6 192.168.1.6

On each ScyllaDB node, edit the scylla.yaml file. The following is an example of one node per DC:

US-DC1 node – 192.168.1.1

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.1"
listen_address: "192.168.1.1"

US-DC2 node – 192.168.1.4

cluster_name: 'ScyllaDB_Cluster'
seeds: "192.168.1.1,192.168.1.2,192.168.1.4,192.168.1.5"
endpoint_snitch: Ec2Snitch
rpc_address: "192.168.1.4"
listen_address: "192.168.1.4"
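
After editing scylla.yaml, restart the service on each node and confirm that the node rejoined the ring (this assumes the standard scylla-server systemd unit):

sudo systemctl restart scylla-server
nodetool status   # the node should be listed as UN (Up/Normal)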

On each ScyllaDB node, edit the cassandra-rackdc.properties file with the relevant rack and data center information. Note that Ec2Snitch derives these values from the node’s EC2 region and Availability Zone, so the entries here should match them:

Nodes 1-3

dc=us-east
rack=1a

Nodes 4-6

dc=us-east
rack=1b

Spotinst Console configuration

When configuring Elastigroup, it’s important to enable the stateful feature so your data and network configuration persist when an instance is replaced after a spot interruption.

Open the Compute tab, then the Stateful section, and choose which elements to persist: the private IP and the root and data volumes.

It’s also recommended to run the nodetool drain command in the shutdown script section, to flush memtables to disk and gracefully stop accepting new connections.
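
A minimal shutdown script for this purpose could look like the sketch below; Elastigroup runs it before the instance is terminated:

#!/bin/bash
# Flush memtables to SSTables on disk and stop accepting new
# connections, so the node shuts down cleanly before the spot
# instance is reclaimed.
nodetool drain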

Let’s see how this works in real life!

Our test cluster consists of three ScyllaDB nodes, all running on spot instances with the stateful feature configured.

Once one of the instances is interrupted, the stateful feature creates a replacement with the same private IP and root/data volumes, and the node returns to the cluster.

ScyllaDB and Spotinst together provide a strong combination of extreme performance and cost reduction. To get started, go to the Spotinst website and start your free trial today!

This blog post was also published on the Spotinst blog.

Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.