ScyllaDB Student Projects: CQL-over-WebSocket

Introduction: University of Warsaw Student Projects

Since 2019, ScyllaDB has cooperated with the University of Warsaw by mentoring teams of students working on their Bachelor’s theses. After SeastarFS, the ScyllaDB Rust Driver and many other interesting projects, the 2021/2022 edition of the program brings us CQL-over-WebSocket: a way of interacting with ScyllaDB straight from your browser.

Motivation

The most popular tool for interacting with ScyllaDB is cqlsh. It’s a Python-based command line tool that lets you send CQL requests, browse their results, and so on. It’s based on the Python driver for Cassandra, and thus brings in a few dependencies: Python itself, the driver library and multiple indirect dependencies. Forcing users to install extra packages is hardly a “batteries included” experience, so we decided to try to create a terminal for interacting with ScyllaDB right from the Web browser.

Sounds easy, but there are a few major roadblocks to implementing such a tool. First of all, browsers do not allow pages to establish raw TCP connections. Instead, all communication is done either by issuing HTTP(S) requests or via the WebSocket protocol. WebSocket is interesting because it allows full-duplex connections directly from the browser, carrying a stream of framed messages over a single TCP connection. If you are a JavaScript developer, check out this page. Also, this article goes into greater detail on specific differences between HTTP and WebSocket.

Secondly, while Cassandra already has a JavaScript driver, it’s meant for Node.js and is not directly usable from the browser. Moreover, JavaScript lacks the type safety guarantees provided by more modern languages, e.g., TypeScript.

CQL is stateful by nature, and clients usually establish long-lived sessions in order to communicate with the database. That makes HTTP a rather bad candidate for a building block, given its stateless nature and verbosity. CQL is also a binary protocol, so transferring its data on top of a text-based one would cause way too much pain for performance-oriented developers.

Disqualifying HTTP makes the choice rather clear: WebSocket it is! Once this crucial design decision had been made, we could move on to implementing all the layers:

  1. WebSocket server implementation for Seastar
  2. Injecting a WebSocket server into ScyllaDB
  3. Making a TypeScript driver for CQL, intended specifically for web browsers
  4. Implementing an in-browser cqlsh terminal

Part I: WebSocket Server Implementation for Seastar

To receive WebSocket traffic in ScyllaDB, we first needed a WebSocket server implementation for Seastar, our asynchronous C++ framework. Seastar already has a functional HTTP server, which was very helpful, as the WebSocket handshake is based on HTTP. Like any other server implementation embedded in Seastar, this one also needed to be fast and latency-friendly.

Protocol Overview

The WebSocket protocol is specified in RFC 6455, and its browser-side behavior in the WHATWG WebSockets Living Standard. We also used a great tutorial on writing WebSocket servers published by Mozilla. In short, a minimal WebSocket server implementation needs to support the following procedures:

  • an HTTP-based handshake, which verifies that the client wishes to upgrade the connection to WebSocket, and optionally negotiates a subprotocol for the communication
  • exchanging data frames, which includes decoding the frame headers and unmasking the client’s data (clients must mask every frame they send, while servers must not)
  • handling PING and PONG control frames, which provide heartbeat capabilities for a WebSocket connection

Interface

The Seastar WebSocket server implementation is still experimental and subject to change, but it’s already fully functional and allows developers to implement their own custom WebSocket servers.

At the core of implementing your own WebSocket server in Seastar is the concept of a “handler”, which specifies how to handle an incoming stream of WebSocket data. Seastar automatically decodes the frames and unmasks the data, so developers simply operate on incoming and outgoing streams, as if they were a plain TCP connection. Each handler is bound to a specific subprotocol: if a client declares during the handshake that it wants to speak a particular subprotocol, it is routed to the matching handler. The simplest example is an echo server, which returns exactly the same data it receives.

A full demo application implementing an echo server can be found here: https://github.com/scylladb/seastar/blob/master/demos/websocket_demo.cc
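The handler-per-subprotocol idea can also be illustrated without Seastar. The following Python sketch is not Seastar’s API: register_handler, select_handler and echo are hypothetical names, and these handlers consume and produce whole messages rather than Seastar’s byte streams. During the handshake, the client lists its subprotocols in order of preference in the Sec-WebSocket-Protocol header; this sketch picks the first one with a registered handler, which is one reasonable server policy.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical handler type: consumes decoded incoming messages and yields
# outgoing ones. (Seastar's real handlers work on input/output byte streams.)
Handler = Callable[[Iterable[bytes]], Iterator[bytes]]

_handlers: Dict[str, Handler] = {}


def register_handler(subprotocol: str, handler: Handler) -> None:
    """Bind `handler` to one subprotocol name."""
    _handlers[subprotocol] = handler


def select_handler(protocol_header: str) -> Optional[Tuple[str, Handler]]:
    """Choose a handler from the client's Sec-WebSocket-Protocol header.

    The first listed subprotocol with a registered handler wins;
    None means that nothing the client offered is supported.
    """
    for name in (p.strip() for p in protocol_header.split(",")):
        if name in _handlers:
            return name, _handlers[name]
    return None


def echo(incoming: Iterable[bytes]) -> Iterator[bytes]:
    """Send back exactly the data that was received."""
    for message in incoming:
        yield message


register_handler("echo", echo)
```

With only “echo” registered, a client offering “chat, echo” is routed to the echo handler, while a client offering only “chat” gets no match.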

The server can be tested with a minimal WebSocket client, small enough to run directly in IPython.
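A minimal sketch of such a client, using only the Python standard library, is shown below. It performs the opening handshake, sends one masked binary frame (clients must mask everything they send) and reads one unmasked frame back. The default subprotocol name “echo” and all helper names are assumptions made for illustration.

```python
import base64
import os
import socket
import struct
from typing import Optional, Tuple


def handshake_request(host: str, port: int, path: str = "/",
                      subprotocol: str = "echo") -> Tuple[bytes, str]:
    """Build the opening handshake; returns (request bytes, key)."""
    # Sec-WebSocket-Key: base64 of 16 random bytes (RFC 6455, section 4.1).
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}:{port}\r\n"
        "Upgrade: websocket\r\n"
        "Connection: Upgrade\r\n"
        f"Sec-WebSocket-Key: {key}\r\n"
        "Sec-WebSocket-Version: 13\r\n"
        f"Sec-WebSocket-Protocol: {subprotocol}\r\n"
        "\r\n"
    )
    return request.encode("ascii"), key


def encode_client_frame(payload: bytes, opcode: int = 0x2,
                        mask: Optional[bytes] = None) -> bytes:
    """Encode one final (FIN) frame, masked as required for clients."""
    if mask is None:
        mask = os.urandom(4)
    header = bytes([0x80 | opcode])
    n = len(payload)
    if n < 126:
        header += bytes([0x80 | n])
    elif n < (1 << 16):
        header += bytes([0x80 | 126]) + struct.pack("!H", n)
    else:
        header += bytes([0x80 | 127]) + struct.pack("!Q", n)
    masked = bytes(b ^ mask[i % 4] for i, b in enumerate(payload))
    return header + mask + masked


def read_exact(sock: socket.socket, n: int) -> bytes:
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed by peer")
        buf += chunk
    return buf


def read_server_frame(sock: socket.socket) -> Tuple[int, bytes]:
    """Read one frame sent by the server (servers never mask)."""
    b0, b1 = read_exact(sock, 2)
    n = b1 & 0x7F
    if n == 126:
        (n,) = struct.unpack("!H", read_exact(sock, 2))
    elif n == 127:
        (n,) = struct.unpack("!Q", read_exact(sock, 8))
    return b0 & 0x0F, read_exact(sock, n)


def echo_once(host: str, port: int, message: bytes) -> bytes:
    """Connect, send `message` as one binary frame, return the reply."""
    with socket.create_connection((host, port)) as sock:
        request, _key = handshake_request(host, port)
        sock.sendall(request)
        # Read the HTTP response byte by byte so no frame data is consumed.
        response = b""
        while not response.endswith(b"\r\n\r\n"):
            response += read_exact(sock, 1)
        if not response.startswith(b"HTTP/1.1 101"):
            raise ConnectionError(response.decode("latin-1"))
        # For brevity, Sec-WebSocket-Accept is not verified here.
        sock.sendall(encode_client_frame(message))
        _opcode, payload = read_server_frame(sock)
        return payload
```

In IPython, calling echo_once("127.0.0.1", PORT, b"hello") against a running echo server should return b"hello", where PORT stands for whatever port the demo server listens on.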

Part II: Injecting a WebSocket server into ScyllaDB

Now that we have WebSocket support in Seastar, all that’s left to do server-side is to inject such a server into ScyllaDB. Fortunately, in the case of CQL-over-WebSocket, the implementation is rather trivial. ScyllaDB already has a fully functional CQL server, so all we need to do is reroute all the decoded WebSocket traffic into our CQL server, and then send all the responses back, as-is, via WebSocket straight to the client. At the time of writing, this support has not yet been merged upstream, but it’s already available for review: https://github.com/scylladb/scylla/pull/10921
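Because the WebSocket layer only carries bytes, what travels inside the WebSocket messages is ordinary CQL native protocol traffic. The Python sketch below assumes protocol v4 framing (a 9-byte header: version, flags, stream, opcode, body length). It builds the STARTUP request a client sends first and wraps it in a single masked binary WebSocket frame, as a browser would send it. Putting exactly one CQL frame in each WebSocket message is a simplifying assumption; since the server treats the traffic as a byte stream, frames could also be split across messages. The helper names are hypothetical.

```python
import struct
from typing import Dict

OPCODE_STARTUP = 0x01


def cql_string_map(entries: Dict[str, str]) -> bytes:
    """[string map]: a [short] count, then (key, value) pairs of [string],
    each [string] being a [short] byte length followed by UTF-8 bytes."""
    out = struct.pack("!H", len(entries))
    for key, value in entries.items():
        for text in (key, value):
            raw = text.encode("utf-8")
            out += struct.pack("!H", len(raw)) + raw
    return out


def cql_frame(opcode: int, body: bytes, stream: int = 0) -> bytes:
    # v4 request header: version 0x04, flags 0, stream (signed 16-bit),
    # opcode and 32-bit body length, all big-endian.
    return struct.pack("!BBhBI", 0x04, 0x00, stream, opcode, len(body)) + body


def ws_binary_message(payload: bytes, mask: bytes) -> bytes:
    """Wrap `payload` in one masked, final, binary (opcode 0x2) frame."""
    header = bytes([0x82])
    n = len(payload)
    if n < 126:
        header += bytes([0x80 | n])
    elif n < (1 << 16):
        header += bytes([0x80 | 126]) + struct.pack("!H", n)
    else:
        header += bytes([0x80 | 127]) + struct.pack("!Q", n)
    return header + mask + bytes(b ^ mask[i % 4] for i, b in enumerate(payload))


startup = cql_frame(OPCODE_STARTUP, cql_string_map({"CQL_VERSION": "3.0.0"}))
message = ws_binary_message(startup, mask=bytes([0x12, 0x34, 0x56, 0x78]))
```

After unmasking, the server hands the CQL server exactly the 31 bytes of the STARTUP frame; in protocol v4 the reply is a READY (0x02) or, when authentication is enabled, an AUTHENTICATE (0x03) frame, which travels back to the browser unchanged.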

To enable the CQL-over-WebSocket server, simply specify the chosen port in the scylla.yaml configuration file, e.g.:

cql_over_websocket_port: 8222

After that, ScyllaDB accepts WebSocket connections that declare their subprotocol to be “cql”.

Part III: TypeScript Driver for CQL

Neither ScyllaDB nor Cassandra had a native TypeScript driver, and the only existing JavaScript implementation was dedicated to server-side Node.js, which made it unusable from the browser. Thus, we jumped at the opportunity and wrote a TypeScript driver from scratch. The result is here: https://github.com/dfilimonow/CQL-Driver.

While still very experimental and, notably, lacking concurrency support, load balancing policies, and other vital features, the driver is already functional: it can send CQL requests and receive their responses straight from the browser via a WebSocket connection, which is exactly what we needed in order to implement a browser-only client.
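One job any such driver has, regardless of language, is turning the incoming bytes back into CQL frames. Each frame announces its body length in its 9-byte header, so the reader buffers data until a whole frame has arrived. The Python sketch below illustrates this under the assumption of protocol v4 framing; it is not code from the TypeScript driver, and FrameReader is a hypothetical name.

```python
import struct
from typing import List, NamedTuple

# v4 header: version, flags, stream (signed 16-bit), opcode, body length.
HEADER = struct.Struct("!BBhBI")


class Frame(NamedTuple):
    version: int
    flags: int
    stream: int
    opcode: int
    body: bytes


class FrameReader:
    """Reassembles CQL frames from arbitrarily split chunks of bytes."""

    def __init__(self) -> None:
        self._buf = bytearray()

    def feed(self, chunk: bytes) -> List[Frame]:
        """Append `chunk` and return every frame that is now complete."""
        self._buf += chunk
        frames = []
        while len(self._buf) >= HEADER.size:
            version, flags, stream, opcode, length = HEADER.unpack_from(self._buf, 0)
            end = HEADER.size + length
            if len(self._buf) < end:
                break  # body not fully received yet
            body = bytes(self._buf[HEADER.size:end])
            frames.append(Frame(version, flags, stream, opcode, body))
            del self._buf[:end]
        return frames
```

The stream field of each decoded frame is what CQL clients use to match responses to in-flight requests when several requests are outstanding on one connection.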

Part IV: In-browser cqlsh Terminal

The frontend of our project is an in-browser implementation of a cqlsh-like terminal. It’s a static web page written in TypeScript (compiled to JavaScript) using React and Material UI. It supports authentication, maintains shell history and presents query results in a table, with paging support.
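Paging in CQL works by the server returning, along with each page of rows, an opaque paging state that the client passes back to request the next page. The Python sketch below shows that loop. Here fetch_page stands in for whatever driver call executes a query with a page size and a paging state; both it and the in-memory fake_fetch used for demonstration are hypothetical, not the frontend’s actual code.

```python
from typing import Any, Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[Any], Optional[bytes]]
# Hypothetical driver call: (query, page_size, paging_state) -> (rows, next_state)
FetchPage = Callable[[str, int, Optional[bytes]], Page]


def pages(fetch_page: FetchPage, query: str, page_size: int = 100) -> Iterator[List[Any]]:
    """Yield one page of rows at a time until no paging state is returned."""
    state: Optional[bytes] = None
    while True:
        rows, state = fetch_page(query, page_size, state)
        yield rows
        if state is None:
            return


# Demonstration only: an in-memory stand-in for the database.
ROWS = [(i, f"user{i}") for i in range(250)]


def fake_fetch(query: str, page_size: int, paging_state: Optional[bytes]) -> Page:
    start = int(paging_state.decode()) if paging_state else 0
    end = start + page_size
    next_state = str(end).encode() if end < len(ROWS) else None
    return ROWS[start:end], next_state
```

A terminal UI can render each yielded page as a table and only request the next one when the user asks for more.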

The source code is available here: https://github.com/gbzaleski/ZPP-ScyllaDB-Front

Ideally, to provide a full “batteries included” experience, the page would be served by ScyllaDB itself from a local address, but that is not implemented yet.

Note that since this code isn’t available in a production release of ScyllaDB, getting it to work requires a custom-compiled version of ScyllaDB running locally. If you are curious about running this yourself, chat me up in the ScyllaDB user Slack community.

Source Code and the Paper

One of the outcomes of this project, aside from upstreaming changes into Seastar and ScyllaDB, is a Bachelor’s thesis. The paper is available here:

DOWNLOAD THE THESIS

and its official description and abstract can be found here:

READ THE ABSTRACT

Congratulations to the students who put in the time and effort to make this project so successful: Bartłomiej Kozaryna, Daniel Filimonow, Andrzej Stalke, and Grzegorz Zaleski. Thanks, team! And, as always, thank you to the instructors and staff of the University of Warsaw for such a great partnership. We’re now looking forward to the 2022/2023 edition of the student projects program, bringing new cool innovations and improvements to our ScyllaDB open source community!

About Piotr Sarna

Piotr is a software engineer very keen on open-source projects and C++. He previously developed an open-source distributed file system (LizardFS) and had a brief adventure with the Linux kernel during an apprenticeship at Samsung Electronics. Piotr graduated from the University of Warsaw with an MSc in Computer Science.