Jul25

Thrift Support in Scylla 1.3

Subscribe to Our Blog

Thrift is the original Cassandra protocol. It’s a software framework for back-end services developed at Facebook, and now an Apache project. New Cassandra applications use the CQL query language, but because Thrift was heavily used in Apache Cassandra’s early years, several important integrations depend on it. Some examples are:

Newer integrations such as Apache Spark are all CQL. However, some integrations, and in-house applications at some companies also depend on Thrift. The Scylla team has therefore decided not to require future Scylla users to update their applications to CQL. We will instead support Thrift natively in Scylla 1.3 and future versions.

Scylla 1.3 is scheduled for release soon, and brings Thrift support to Scylla.

Implementation

Scylla is internally designed in a CQL-centric way. Instead of changing the internal model, the Thrift API is implemented on top of CQL: Thrift column families are mapped onto CQL tables, just as Thrift operations are implemented as operations on those tables.

To illustrate this, let’s consider a simple example of data that you might want to have in Scylla: a list of stories for a social reader application. In Thrift, using the deprecated cassandra-cli, we could define a story column family as follows:

create keyspace ks;
use ks;

create column family story 
with key_validation_class = UTF8Type
  and comparator = UTF8Type
  and column_metadata = [
    {column_name: date, validation_class: DateType },
    {column_name: content, validation_class: UTF8Type }
  ];

Listing 1: A Thrift column family for news stories in a simple “social reader” application

This is a static column family, as it defines all columns up-front, almost like you would when defining a CQL table. Using cqlsh we can describe the CQL table Scylla creates from this column family:

cqlsh> desc ks.story;

CREATE TABLE ks.story (
    key1 text PRIMARY KEY,
    content text,
    date timestamp 
) WITH COMPACT STORAGE

Listing 2: The equivalent CQL table, created by Scylla

This table maps precisely to the Thrift column family. Since we didn’t specify a key_alias, Scylla names the primary key key1 by default. We can use CQL to update the column’s name:

cqlsh> alter table ks.story rename key1 to url;
cqlsh> desc ks.story;

CREATE TABLE ks.story (
    url text PRIMARY KEY,
    content text,
    date timestamp
) WITH COMPACT STORAGE

Listing 4: Renaming default column names with cqlsh

Not all data can be represented with static column families. Another column family is the feed (which, in the social reader, can be RSS feeds, lists of URLs posted to a social site such as Twitter, or lists of stories submitted by a user). A feed has a feed URL, a fetch time, and a story URL.

 create column family feed
  with key_validation_class = UTF8Type
    and comparator = DateType
    and default_validation_class = UTF8Type;

Listing 5: A column family for news feeds

This is a dynamic column family, as each row may contain completely different sets of columns. For example, a feed for a particular user will look different from another user’s. The underlying CQL table is the following:

cqlsh> desc ks.feed;

CREATE TABLE ks.feed (
    key1 text,
    column1 timestamp,
    value text,
    PRIMARY KEY (key1, column1)
) WITH COMPACT STORAGE
    AND CLUSTERING ORDER BY (column1 ASC)

Listing 6: The equivalent CQL table, created by Scylla

In addition to having a partition key, the CQL table for a dynamic column family has a clustering key (something that doesn’t exist in Thrift) and a regular column, which respectively hold the names and values of dynamically inserted columns. Their names can be renamed as well, transparently to their usage from the Thrift API.

We can use cassandra-cli to insert and query some data over Thrift:

set story['fakeurl']['content'] = 'fake post';
Value inserted.
Elapsed time: 7.85 msec(s).
set story['fakeurl']['date'] = '2016-07-25 02:26:04';
Value inserted.
Elapsed time: 1.21 msec(s).

list story;
Using default limit of 100
Using default cell limit of 100
-------------------
RowKey: fakeurl
=> (name=content, value=fake post, timestamp=1469055076976000)
=> (name=date, value=2016-07-25 02:26Z, timestamp=1469055131379000)

The same results can be obtained by querying that data through CQL:

cqlsh:ks> select * from story;

 url     | content   | date
---------+-----------+--------------------------
 fakeurl | fake post | 2016-07-25 02:26:04+0000

While all tables created through the Thrift API are consumable from CQL, not all CQL tables are consumable from Thrift. Any CQL table not created with the with compact storage option cannot be used from Thrift.

Performance

Thrift performance is on par with CQL. The following benchmark consists of running cassandra-stress workloads against Scylla — write, read and mixed workloads, in that order — each executing 10M ops (with 100 threads). We establish a baseline by running cassandra-stress in CQL3 mode, and then repeat the workloads in Thrift mode. Note that cassandra-stress ran on the same machine as Scylla, with the Scylla process restricted to 4 CPUs with taskset and the cassandra-stress one to 12 CPUs; while the numbers don’t reflect what would be expected in a proper deployment, they nevertheless present a valid comparison of the two APIs.

Write workload:
                    +--------+--------+
                    | CQL3   | Thrift |
+-------------------+--------+--------+
| Rate              | 135865 | 129880 |
+-------------------+--------+--------+
| Latency 99th %ile | 2.8    | 2.9    |
+-------------------+--------+--------+
| Latency max       | 251.9  | 314.9  |
+-------------------+--------+--------+
Read workload:
                    +--------+--------+
                    | CQL3   | Thrift |
+-------------------+--------+--------+
| Rate              | 115270 | 116725 |
+-------------------+--------+--------+
| Latency 99th %ile | 1.7    | 1.7    |
+-------------------+--------+--------+
| Latency max       | 71.0   | 28.9   |
+-------------------+--------+--------+
Mixed workload:
                    +-------+--------+
                    | CQL3  | Thrift |
+-------------------+-------+--------+
| Rate              | 88077 | 89306  |
+-------------------+-------+--------+
| Latency 99th %ile | 5.3   | 5.9    |
+-------------------+-------+--------+
| Latency max       | 197.6 | 201.7  |
+-------------------+-------+--------+

The specs of the server are:

  • Fedora 24
  • Core I7-5960X @ 3Ghz (C-states disabled, XMP and Turbo enabled)
  • 64GB DDR4 2800Mhz
  • Samsung 950 Pro PCIe 256GB SSD

There are no independent benchmarks yet. Watch this space for more info on Scylla’s Thrift performance.

Limitations

Scylla supports the commonly used features of Thrift, but some functionality is not present in this release. Namely, we do not support:

  • Super columns;
  • Mixed column families, where you dynamically add columns to a static column family, are not supported;
  • Compressed query strings for the execute_cql3_query and prepare_cql3_query verbs;
  • Composite regular column names and values (for static column families).

Thrift-related issues are tagged [thrift-api] on the Scylla project on GitHub, and a full list of currently open issues can be seen here.

In the next installment of this series, we cover KairosDB, an open source time series database, and how it works with Scylla’s Thrift support. (Part 2)

Scylla Summit

Thrift at Scylla Summit

Thrift is one of many topics to be covered at the upcoming Scylla Summit. Come to Scylla Summit on September 6th, in San Jose, California, to learn more about Thrift and other new and upcoming Scylla features—along with info on how companies like IBM, Outbrain, Samsung SDS, Appnexus, Hulu, and Mogujie are using Scylla for better performance and faster development. Meet Scylla developers and devops users who will cover Scylla design, best practices, advanced tooling and future roadmap items.

Going to Cassandra Summit? Add another day of NoSQL Scylla Summit takes place the day before Cassandra Summit begins and takes place at the Hilton San Jose, adjacent to the San Jose convention Center. Lunch and refreshments are provided.

Register for Scylla Summit

Duarte NunesAbout Duarte Nunes

Duarte Nunes is a Software Engineer working on ScyllaDB. He has a background in concurrent programming, distributed systems and low-latency software. Prior to ScyllaDB, he worked on MidoNet, an open source distributed network virtualization platform, making it fast and scalable.


Tags: deep-dive, Thrift