See all blog posts

A Shard-Aware ScyllaDB C/C++ Driver

We are happy to announce the first release of a shard-aware C/C++ driver (connector library). It’s an API-compatible fork of Datastax cpp-driver 2.15.2, currently packaged for x86_64 CentOS 7 and Ubuntu 18.04 (with more to come!). It’s also easily compilable on most Linux distributions. The driver still works with Apache Cassandra and DataStax Enterprise (DSE), but when paired with ScyllaDB enables shard-aware queries, delivering even greater performance than before.

GET THE SCYLLA SHARD-AWARE C/C++ DRIVER

Why?

ScyllaDB C/C++ driver was forked from Datastax driver with the view to adding ScyllaDB-specific features. It’s a rich, fast, async, battle-proven piece of C++ code, although its interface is in pure C, which opens a lot of possibilities, like creating Python or Erlang bindings. Its newest feature, which will be discussed in this post, is shard-awareness.

We’ve already written a neat introduction into this concept so, instead of repeating all of that here, we’ll just briefly remind you of the key topics. In short: shard-awareness is the ability of the client application to query specific CPUs within the ScyllaDB cluster. As you know, ScyllaDB’s code follows the concept of a shard-per-core architecture, which means that every node is a multithreaded process whose every thread performs some relatively independent work. It also means that each piece of data stored in your ScyllaDB DB is bound to specific CPU(s). Unsurprisingly, also every CQL connection is served by a specific CPU, as you can see by querying system.clients table:

cqlsh> SELECT address, port, client_type, shard_id FROM system.clients;

 address   | port  | client_type | shard_id
-----------+-------+-------------+----------
 127.0.0.1 | 46730 |         cql |    2
 127.0.0.1 | 46732 |         cql |    3
 127.0.0.1 | 46768 |         cql |    0
 127.0.0.1 | 46770 |         cql |    1

(4 rows)

Each CQL connection is bound to a specific ScyllaDB shard (column shard_id). The contents of this table are local (unique to every node).

By default, scylla-cpp-driver opens a CQL connection to every shard on every node, so it can communicate directly with any CPU in the cluster. Now — by locally hashing the partition key in your query — the driver determines which shard, on which node, would be the best for execution of your statement. Then it’s just a matter of pumping the query up the right CQL connection. The best part is that all of that happens behind the scenes: client code is unaffected, so to get this ability you’ll just need to re-link your C/C++ application with libscylla-cpp-driver!

Note: The driver doesn’t do client-side parsing of queries, so it must rely on ScyllaDB to identify the partition keys in your queries. That’s why, for now, only prepared statements are shard-aware.

Note: To benefit from shard-awareness, token-aware routing needs to be enabled on the Cluster object. Don’t worry; it’s on by default.

How?

ScyllaDB nodes extend the CQL SUPPORTED message with information about their shard count, current shard ID, sharding algorithm, etc. Knowing all this, the client can take one of two approaches to establish the pools of per-shard-connections: “basic” and “advanced”.

In the basic mode driver just opens connections until all shards are reached and the goal set by cass_cluster_set_core_connections_per_host() is met. It works because ScyllaDB node assigns CQL connections to the least busy shards, so with sufficiently many connections all shards will eventually be hit. The downside to this is that usually some number of connections are opened unnecessarily, which led us to the idea of “advanced” shard selection.

In the advanced mode, available since ScyllaDB Open Source 4.3, CQL SUPPORTED message contains a field named SCYLLA_SHARD_AWARE_PORT. If set, it indicates that ScyllaDB listens to CQL connections on an additional port (by default 19042). This port works very much like the traditional 9042, with only one important difference: TCP connections incoming to this port are routed to specific shards depending on the client-side port number (aka. source port, local port). Clients can explicitly choose the target shard by setting the local port number to a value of N × node_shard_count + desired_shard_number where the allowed range for N is platform-specific. A call to:

cass_cluster_set_local_port_range(cluster, 50000, 60000);

tells the driver to use advanced shard-awareness and to bind its local sockets to ports from the range [50000, 60000). That’s all you need to do.

Note: Of course, the new port also does have a secure (SSL-ed) counterpart.

Note: Advanced shard selection falls back to the basic mode automatically in case of failure. Failure can happen for a number of reasons, such as when a client hides behind a NAT gateway or when SCYLLA_SHARD_AWARE_PORT is blocked. The fallback slows down the process of reconnecting, so we don’t recommend relying on it.

Installation

Our C/C++ driver releases are hosted on GitHub. If you are lucky to run CentOS 7, Ubuntu 18.04 or their relatives, you can simply download and install our packages along with their dependencies.

# Installation: CentOS 7
sudo yum install -y epel-release
sudo yum -y install libuv openssl zlib
wget https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-2.15.2-1.el7.x86_64.rpm https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-devel-2.15.2-1.el7.x86_64.rpm
sudo yum localinstall -y scylla-cpp-driver-2.15.2-1.el7.x86_64.rpm scylla-cpp-driver-devel-2.15.2-1.el7.x86_64.rpm

# Installation: Ubuntu 18.04
wget https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver_2.15.2-1_amd64.deb https://github.com/scylladb/cpp-driver/releases/download/2.15.2-1/scylla-cpp-driver-dev_2.15.2-1_amd64.deb
sudo apt-get update
sudo apt-get install -y ./scylla-cpp-driver_2.15.2-1_amd64.deb ./scylla-cpp-driver-dev_2.15.2-1_amd64.deb

For those working on cutting-edge Linux distributions we recommend compilation from sources. Once you have the tools (cmake, make, g++) and dependencies (libuv-dev, zlib-dev, openssl-dev) installed please check out the latest tagged revision and proceed according to the linked instruction.

Example

In this short example we’ll just open a number of connections and exit after a minute of sleep.

// conn.cpp - demonstration of advanced shard selection in C++
#include <cassandra.h>
#include <iostream>
#include <chrono>
#include <thread>
 
int main(int argc, char* argv[]) {
  CassCluster* cluster = cass_cluster_new();
  CassSession* session = cass_session_new();
   
  cass_cluster_set_num_threads_io(cluster, 2);
  cass_cluster_set_contact_points(cluster, "127.0.0.1");
  cass_cluster_set_core_connections_per_host(cluster, 7);
  // Now enable advanced shard-awareness
  cass_cluster_set_local_port_range(cluster, 50000, 60000);
   
  CassFuture* connect_future = cass_session_connect(session, cluster);
   
  if (cass_future_error_code(connect_future) == CASS_OK) {
    std::cout << "Connected\n";
    std::this_thread::sleep_for(std::chrono::seconds(60)); // [1]
  } else {
    std::cout << "Connection ERROR\n";
  }
   
  // Cleanup
  cass_future_free(connect_future);
  cass_session_free(session);
  cass_cluster_free(cluster);
  }

If you installed the driver from packages, compilation is a breeze:

g++ conn.cpp -lscylla-cpp-driver -o conn

If you built the driver from sources but you hesitate to install it system-wide, you can still compile the example with:

g++ conn.cpp /cpp-driver/build/libscylla-cpp-driver.so -Wl,-rpath,/cpp-driver/build/ -I /cpp-driver/include/ -o conn

Assuming that you have a running 1-node ScyllaDB cluster on localhost, the code above should greet you with “===== Using optimized driver!!! =====“. Then it would establish a session with 2 threads, each owning 7 connections (or the closest greater multiple of shard count on that instance of ScyllaDB). While the sleep [1] takes place, we can use cqlsh to peek at the client-side port numbers and at the distribution of connections among shards:

cqlsh> SELECT address, port, client_type, shard_id FROM system.clients;
 
 address   | port  | client_type | shard_id
-----------+-------+-------------+----------
 127.0.0.1 | 37534 |         cql | 0
 127.0.0.1 | 37536 |         cql | 1
 127.0.0.1 | 38207 |         cql | 4
 127.0.0.1 | 50000 |         cql | 0
 127.0.0.1 | 50001 |         cql | 1
 127.0.0.1 | 50002 |         cql | 2
 127.0.0.1 | 50003 |         cql | 3
 127.0.0.1 | 50004 |         cql | 4
 127.0.0.1 | 50005 |         cql | 5
 127.0.0.1 | 50006 |         cql | 6
 127.0.0.1 | 50007 |         cql | 7
 127.0.0.1 | 50008 |         cql | 0
 127.0.0.1 | 50009 |         cql | 1
 127.0.0.1 | 50010 |         cql | 2
 127.0.0.1 | 50011 |         cql | 3
 127.0.0.1 | 50012 |         cql | 4
 127.0.0.1 | 50013 |         cql | 5
 127.0.0.1 | 50014 |         cql | 6
 127.0.0.1 | 50015 |         cql | 7

(19 rows)

Results of running conn.cpp against an 8-shard, single-node cluster of ScyllaDB 4.3. There are two connections from cqlsh and one “control connection” from scylla-cpp-driver. Then 16 pooled connections follow (8 per each of 2 client threads), because the requested number of 7 was rounded up to the shard count. Please observe the advanced shard-awareness in action: the value in the port column determines shard_id of a pooled connection.

Summary

Shard-awareness has been present in Java, Go and Python drivers for a while, and we’re now actively developing a shard-aware Rust driver. Today we are delighted to welcome the C/C++ driver as a member of this honorable group. If you use the Datastax cpp-driver with ScyllaDB you should switch to scylla-cpp-driver right now — it’s basically a free performance boost for your applications. And there is more to come: CDC partitioner, LWT support, RAII interface… The future will definitely bring these and a few other reasons to follow our driver!

GET THE SHARD-AWARE C/C++ DRIVER FOR SCYLLA

Juliusz Stasiewicz

About Juliusz Stasiewicz

Juliusz Stasiewicz is a Software Engineer at ScyllaDB. He has also worked at a number of companies on C++ programming, including Samsung and DataStax.