fbpx
See all blog posts

Scylla Developer Hackathon: Rust Driver

Scylla’s internal developer conference and hackathon this past year was a lot of fun and very productive. One of the projects we put our efforts into was a new shard-aware Rust driver. We’d like to share with you how far we’ve already gotten, and where we want to take the project next.

Motivation

The CQL protocol, used both by Scylla and Cassandra, already has some drivers for the Rust programming language on the market – cdrs, cassandra-rs and others. Still, we have rigorous expectations towards the drivers, and in particular we really wanted the following:

  • asynchronicity
  • support for prepared statements
  • token-aware routing
  • shard-aware routing (Scylla-specific optimization)
  • paging support

Also, it would be nice to have the driver written in pure Rust, without having to use any unsafe code. Since none of the existing drivers fulfilled our strict expectations, it was clear what we have to do — write our own async CQL driver in pure Rust! And the ScyllaDB hackathon was the perfect opportunity to do just that.

After intensive work during the hackathon, we completed the first version of the scylla-rust-driver and made it open-source available here:

The Team

Here’s our hackathon team, discussing crucial design decisions for the new most popular CQL driver for Rust!

Rust driver hackathon team members beginning with Piotr Sarna in the upper left and clockwise: Pekka Enberg, Piotr Dulikowski, and Kamil Braun.

Design and API example

Our driver exposes a set of asynchronous methods which can be used to establish a CQL session, send all kinds of CQL queries, receive and parse results into native Rust types, and much more. At the core of our driver, there’s a class which represents a CQL session. After establishing a connection to the cluster, the aforementioned session can be used to execute all kinds of requests.

Here’s what you can currently do with the API:

API features include:

  • connect to a cluster
  • refresh cluster topology information
    • how many nodes are in the cluster
    • how many nodes are up
    • which nodes are responsible for which data partitions
  • perform a raw CQL query
    • not paged or with custom page size
  • prepare a CQL statement
  • execute a prepared statement
    • not paged or with custom page size

To see more comprehensive examples, take a look at https://github.com/scylladb/scylla-rust-driver/tree/main/examples

A snapshot of the documentation is available here:

https://psarna.github.io/scylla-rust-driver-docs/scylla/

Implementing the driver with Rust async/await and Tokio

Rust language already has built-in support for asynchronous programming through the async/await mechanism. Additionally, we decided to base the driver on the Tokio framework, which provides an asynchronous runtime for Rust along with many useful features.

Our first step was to implement connection pools used to connect to Scylla/Cassandra clusters and to ensure the driver can handle both unprepared and prepared CQL statements. In order to provide that, we meticulously followed the CQL v4 protocol specification and implemented the initial request types: STARTUP, QUERY, PREPARE and EXECUTE.

Having such a solid footing, we split the work to also provide proper paging and the ability to fetch topology information from the cluster. The latter was needed to make our driver token-aware and shard-aware.

Token awareness allows the driver to route the request straight to the right coordinator node which owns the particular partition, which avoids inter-node communication and generally lowers the overhead. Shard awareness is one step further and is only supported when using the driver to connect to Scylla. The idea is that the request ends up not only on the right node, but also on the right CPU core, thus avoiding inter-core communication and minimizing the overhead even further. Read more about Scylla shard awareness and its positive effect on performance in a great blog which described this optimization for a Python driver.

Interlude: fixing murmur3 by implementing it with bugs

Wait, what? That’s right, during the hackathon we ended up needing to rewrite a murmur3 hashing algorithm with bugs in order to stay fully compatible with Apache Cassandra!

When performing token awareness tests, I noticed that around 30% of all requests ended up on a wrong node. That shouldn’t happen, so we quickly started an investigation. We meticulously checked:

  1. That the topology information fetched from Scylla is indeed correct,
    and consistent with the output of `nodetool describering`
  2. That the token computations return correct results on the first 100 keys,
    which makes it highly unlikely that token computation is to blame

… and we shouldn’t have stopped at checking just 100 keys! It turns out that the first failure happened after we rerun the test for the first 10,000 keys. Further investigation showed that a similar problem occurred for our Golang friends: https://github.com/gocql/gocql/issues/1033.

In short, Cassandra’s murmur3 implementation handwritten in Java operates on signed integers, while the original algorithm used unsigned ones. That creates some subtle differences when shifting the values, which in turn translates to around 30% of tokens being calculated inconsistently with the Cassandra way.

We had no choice but to stop using a comfy crate from crates.io which provided us with a neat murmur3 algorithm implementation and instead we spent the whole night rewriting the algorithm by hand, bugs included™!

Results

We ran two simple benchmarks to see how scylla-rust-driver compares to other existing drivers.

The benchmark’s goal was to send 10 million prepared statements as fast as possible, given a fixed concurrency of 1,024. The usage of token-aware and shard-aware routing was allowed, if supported by the driver. All drivers were compared against the same 3-node Scylla cluster, each node having 2 shards. We compared against GoCQL (enhanced by us with shard awareness) and cdrs.

gocql scylla-rust-driver cdrs
Writes real 0m59.658s
user 15m21.846s
sys 2m32.438s
real 0m18.310s
user 1m44.170s
sys 0m36.318s
real 12m34.761s
user 2m14.253s
sys 6m21.757s
Reads real 1m6.276s
user 17m35.803s
sys 2m46.497s
real 0m23.928s
user 1m52.654s
sys 0m41.791s
real 12m50.929s
user 3m7.008s
sys 6m43.048s
Mixed
(reads and writes)
real 1m3.409s
user 17m23.127s
sys 2m29.905s
real 0m19.715s
user 1m51.372s
sys 0m35.209s
real 13m28.133s
user 2m44.918s
sys 6m42.705s

Output of Linux’ time command for processing 10 million prepared statements with a fixed concurrency of 1,024 using different drivers.

Source code of all benchmarks:

Future plans

The future of our project is very bright. As a matter of fact, it’s already scheduled for another year of hands-on work! A team of four talented students from the University of Warsaw will continue developing the driver from where we left off, as part of the ZPP program. This is the second time Scylla is proudly taking part in the program as a mentor. Here are the other projects from last year:

We also have an official roadmap, updated version of which can always be found in our repository (https://github.com/scylladb/scylla-rust-driver):

Done:

  • driver-side metrics
    • number of sent requests
    • latency percentiles of sent requests
    • number of errors
  • handling topology changes and presenting them to the user
  • CQL batch statements
  • custom error types
    • robust handling of various errors (e.g. repreparing statements)

In progress:

  • CQL authentication support
  • TLS support
  • configurable load balancing algorithms
  • configurable retry policies
  • query builders

Backlog:

  • CQL tracing
  • [additional] performance benchmarks against other drivers
    • cdrs
    • gocql
  • handling events pushed by the server
  • speculative execution
  • expanding the documentation
  • more correctness tests
  • more integration tests – preferably using Scylla’s ccm framework and Python
  • preparing the work to be published on crates.io
Piotr Sarna

About Piotr Sarna

Piotr is a software engineer very keen on open-source projects and C++. He previously developed an open-source distributed file system (LizardFS) and had a brief adventure with Linux kernel during an apprenticeship at Samsung Electronics. Piotr graduated from University of Warsaw with MSc in Computer Science.