Fast Database Framework and Network Stack

Scylla’s network stack, part of the Seastar framework and built on the Data Plane Development Kit (DPDK), is designed to squeeze the most out of modern hardware. As with every other Scylla component, we optimize heavily for multicore CPUs, manage memory efficiently, and maximize network throughput to deliver an ultra-fast NoSQL database for processing your big data.

Flexible and Powerful Database Networking Architecture

Scylla supports two networking modes, our own native Seastar network stack and the traditional Linux network stack, to deliver low-latency, high-throughput TCP connections.

Seastar Framework: Highly optimized network stack

Seastar is a framework that includes a network stack running in userspace. The stack uses the open source DPDK driver, with additional TCP/IP code implemented on top. It is transparently sharded, so multiple TCP/IP instances run side by side, one on each core. This mode provides low-latency, high-throughput networking: no system calls are required for communication, and no data copying occurs. Of the two modes, this one delivers the best performance. Seastar’s DPDK mode works on physical machines, virtual machines (with either device assignment or paravirtual devices), and containers.
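To make the idea of a sharded stack concrete, here is a minimal Python sketch of how packets for one connection can always be routed to the same per-core TCP/IP instance. It is an illustration under stated assumptions, not Seastar’s actual code: the names `shard_for_connection`, `ShardTcpStack`, and `dispatch` are hypothetical, and the SHA-256 hash is only a stand-in for whatever hash the real stack or the network card applies to the connection 4-tuple.

```python
import hashlib


def shard_for_connection(src_ip, src_port, dst_ip, dst_port, num_shards):
    """Pick the shard (core) that owns a TCP connection from its 4-tuple.

    Assumption: a stable hash of the 4-tuple modulo the shard count.
    """
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardTcpStack:
    """Hypothetical per-core TCP/IP instance that only sees its own connections."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        # 4-tuple -> bytes received; private to this shard, so no lock is needed
        self.connections = {}

    def on_packet(self, four_tuple, payload):
        received = self.connections.get(four_tuple, 0)
        self.connections[four_tuple] = received + len(payload)


def dispatch(stacks, four_tuple, payload):
    """Route a packet to the shard that owns its connection; return the shard index."""
    shard = shard_for_connection(*four_tuple, num_shards=len(stacks))
    stacks[shard].on_packet(four_tuple, payload)
    return shard
```

Because the shard is derived only from the connection’s 4-tuple, every packet of a given connection lands on the same instance, and no other core ever touches that connection’s state.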

Linux standard socket APIs

In this mode, Scylla uses the ordinary Linux networking APIs. Scylla preserves this mode for ease of application development and trivial deployment. Users can switch between modes dynamically and pick the networking mode that works best, whether for first-time use or a production installation.

Why a custom network stack helps increase database performance

  • Keeping the network stack in kernel space means that costly context switches are needed to perform network operations, and that data must be copied from kernel buffers to userspace buffers and vice versa. Scylla, using DPDK in userspace, avoids the costly context switching seen in other database systems.
  • Linux is a time-sharing system, and so must rely on slow, expensive interrupts to notify the kernel that there are new packets to be processed. Scylla, with its own built-in schedulers and network IO, avoids burdening the Linux kernel with such interrupts.
  • Linux is heavily threaded, so all data structures are protected with locks. While a huge effort has made Linux very scalable, this is not without limitations, and contention occurs at large core counts. Even without contention, the locking primitives themselves are relatively slow and impact network performance. Scylla runs one thread per hyperthread in a sharded, shared-nothing architecture, which avoids lock contention.
  • Apache Cassandra compaction thrashes the page cache, because it reads and writes everything, and after compaction the most frequently used data is likely to no longer be in the cache. Apache Cassandra has some workarounds for this problem, but the row cache is the most direct solution: compaction simply doesn’t touch the row cache, which remains populated with relevant data.
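The shared-nothing and polling points in the list above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Scylla’s implementation: the `Shard` class, the `owner_of` placement rule, and the helper names are hypothetical, and all shards are simulated in a single Python thread, whereas a real system would pin each shard to its own core.

```python
from collections import deque


class Shard:
    """Hypothetical shard: owns its data outright and polls an inbox for work."""

    def __init__(self, shard_id):
        self.shard_id = shard_id
        self.store = {}       # private data: only this shard reads or writes it
        self.inbox = deque()  # requests forwarded from other shards

    def poll(self):
        """Drain pending requests without waiting for an interrupt; return the count handled."""
        handled = 0
        while self.inbox:
            key, value = self.inbox.popleft()
            self.store[key] = value
            handled += 1
        return handled


def owner_of(key, shards):
    # Assumed placement rule for illustration: stable byte-sum hash modulo shard count
    return shards[sum(key.encode()) % len(shards)]


def put(key, value, shards):
    """Forward a write to the owning shard as a message instead of locking shared state."""
    owner_of(key, shards).inbox.append((key, value))


def run_until_idle(shards):
    """Poll every shard in turn until no shard has work left."""
    while sum(shard.poll() for shard in shards) > 0:
        pass
```

Since each key has exactly one owner and writes arrive as messages, no data structure is shared between shards and nothing needs a lock; each shard discovers new work by polling its own inbox.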

Not using the default Linux network stack wasn’t a “not-invented-here” (NIH) decision for ScyllaDB. Our development team has deep experience in OS development, and we wanted to focus on optimizing database performance. Early in Seastar’s development, we noticed that our sample Memcached application spent most of its time in the kernel. We couldn’t resist putting Seastar’s sharding and reactor design into action, and we quickly wrote our own TCP/IP stack. As the Memcached results show, Seastar-native TCP performs remarkably well compared with the Linux stack: Memcached Benchmark.

DPDK, the userspace network toolkit, is designed specifically for fast packet processing, usually in less than 80 CPU cycles per packet. It integrates seamlessly with Linux in order to take advantage of high-performance hardware. Traditionally, L2/L3 software appliances use DPDK to replace custom hardware with off-the-shelf x86 hardware. To the best of our knowledge, Scylla and Seastar are the first products to integrate a higher-layer application with DPDK kernel bypass. The ScyllaDB team is going to leverage this unique property and apply it to areas well beyond the basic database.

Let’s do this

Getting started takes only a few minutes. Scylla has an installer for every major platform. If you get stuck, we’re here to help.