
Enter your email to watch this video and access the slide deck from the ScyllaDB Summit 2022 livestream. You’ll also get access to all available recordings and slides.

In This NoSQL Presentation

WebAssembly, also known as Wasm, is a binary format for representing executable code, designed to be easily embeddable into other projects. It's also a perfect candidate for a user-defined functions (UDFs) back-end due to its ease of integration, performance and popularity. ScyllaDB already supports user-defined functions expressed in WebAssembly in experimental mode, based on Wasmtime, an open-source runtime written natively in Rust. This talk will cover a few examples of how to create Wasm functions in ScyllaDB, how to combine them into powerful user-defined aggregates, and the future plans for integrating with Wasmtime and Rust even further.

Piotr Sarna, Software Engineer, ScyllaDB

Piotr is a software engineer very keen on open-source projects, C++ and Rust. He previously developed an open-source distributed file system (LizardFS) and had a brief adventure with the Linux kernel during an apprenticeship at Samsung Electronics. Piotr graduated from the University of Warsaw with an MSc in Computer Science.

Video Transcript

Hi, my name is Piotr, and today, I will be talking about how ScyllaDB integrates with WebAssembly, including current use cases and the future plans. I work on ScyllaDB Core daily, and lately I also started leading and maintaining the ScyllaDB Rust driver project. I also worked on WebAssembly integration, kind of like a weekend project. Let’s start with a short introduction to WebAssembly.

WebAssembly is a format for executable code, designed first and foremost to be portable and embeddable. As its name suggests, it’s a good fit for web applications, but not only that. It’s generally a good choice for an embedded language, especially since it’s also quite fast. One of WebAssembly’s core features is isolation. Each module is executed in a sandboxed environment separate from the host application, and such a limited-trust environment is really desired for an embedded language because it vastly reduces the risk of somebody running malicious code from within your project. Wasm is a binary format, but it also specifies a human-readable text format called WebAssembly Text Format, or WAT.

In order to integrate WebAssembly into a project, one needs to pick an engine to use. The most popular one is Google’s V8, implemented in C++, with support for JavaScript and a very, very rich feature set. It’s unfortunately also quite heavy and not very easy to integrate with asynchronous frameworks like Seastar, which is a building block of ScyllaDB. On the other side of this slide is Wasmtime, a smaller but not small project implemented in Rust. It only supports WebAssembly, not JavaScript, which also makes it more lightweight, and it has good support for asynchronous environments and has C++ bindings, which makes it a good fit for injecting into ScyllaDB.

For a proof-of-concept implementation in ScyllaDB, Wasmtime was used due to being less heavyweight than V8 and for having more potential for being async-friendly. Right now, we simply use the existing C++ bindings provided by Wasmtime, but we already have it on our roadmap to implement this whole integration layer in Rust and then compile it directly into ScyllaDB.

And so how would one create a WebAssembly program? First of all, modules can be coded directly in the WebAssembly text format.
It’s not the most convenient way, at least for me, due to Wasm’s limited type system and the specific syntax and lots of parentheses, but it’s, of course, possible. All you need, in this case, is a text editor and [Indistinct].

C or C++ enthusiasts can compile their language of choice to Wasm with the clang compiler. The binary interface is well-defined, and the resulting binaries are also quite well-optimized. Underneath, the code is compiled to WebAssembly with the use of the LLVM representation, which makes many optimizations possible. Rust also has the capability of producing Wasm output in its ecosystem, and a Wasm target is already supported in the official toolchain, cargo. And then there is AssemblyScript, which is a TypeScript-like language that compiles directly to WebAssembly. AssemblyScript is especially nice for quick experiments because, well, it’s a scripting language. It’s also the only language that was actually invented and designed with WebAssembly as a compilation target in mind.

What do we need WebAssembly for? Our first use case inside ScyllaDB is user-defined functions, also known as UDFs. It’s a CQL feature that allows defining a function in a given language and then using this function when querying the database. The function will be applied to the arguments by the database itself, and only then is the result sent to the user. It’s also possible to express nested calls and other more complex operations with UDFs, as you can see on the slide.

So user-defined functions are already cool, but their most important purpose is enabling user-defined aggregates. These are custom accumulators, which combine data from multiple rows from the database into potentially complex outputs. A user-defined aggregate consists of two functions: one for accumulating the result for each argument, and one for finalizing the result and transforming it to the output type.
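The two-function structure of an aggregate can be sketched in plain Rust. This is only an illustration of the accumulate-then-finalize idea, using average string length as the example. The names `State`, `accumulate`, `finalize` and `average_length` are hypothetical and are not ScyllaDB or Wasmtime APIs, and the sketch does not show how such functions would be compiled to Wasm or registered in the database.

```rust
/// Accumulator state: (sum of lengths, number of strings seen).
/// Hypothetical type for illustration only; not a ScyllaDB API.
type State = (i64, i64);

/// Accumulating function: folds one value into the running state.
/// Length is counted in bytes, which is what `str::len` returns.
fn accumulate(state: State, value: &str) -> State {
    (state.0 + value.len() as i64, state.1 + 1)
}

/// Finalizing function: turns the accumulated state into the output type.
/// Here the output is rendered text; an empty input yields "n/a"
/// (an assumption made for this sketch) to avoid dividing by zero.
fn finalize(state: State) -> String {
    if state.1 == 0 {
        return String::from("n/a");
    }
    format!("{:.2}", state.0 as f64 / state.1 as f64)
}

/// Drives both functions over a set of rows, roughly the way a database
/// would apply an aggregate: accumulate per row, then finalize once.
fn average_length(rows: &[&str]) -> String {
    let state = rows
        .iter()
        .fold((0i64, 0i64), |acc, row| accumulate(acc, row));
    finalize(state)
}

fn main() {
    let rows = ["wasm", "rust", "scylladb"];
    println!("{}", average_length(&rows));
}
```

Keeping the state small (a sum and a count) means each row can be processed as it arrives, and the division happens only once, in the finalizing step.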
On this slide, you can see an example of an aggregate that computes the average length of all requested strings. One function accumulates partial results by storing the sum of all lengths and the number of strings, and the finalizing function divides one by the other in order to return the result. I think, in this case, that this result is in the form of already-rendered text. As you can see, the potential here is quite large. User-defined aggregates allow using the database queries in a more powerful way, for instance, gathering complex statistics or transforming whole partitions into different formats and so on.

How do you create a user-defined function in WebAssembly? First, we need to write or compile a function to the Wasm text format. Then, the function body is simply registered in a CQL statement called CREATE FUNCTION, and that’s it. Note that the declared language here is xwasm, which stands for experimental Wasm. That’s because support for this language is currently still experimental in ScyllaDB. The exact interface for expressing types and return values, including [Indistinct] values, is kind of out of scope for this lightning talk, but there’s a design document I recommend everyone to read explaining all the specific choices.

Wasm-powered user-defined functions are already available in experimental mode in ScyllaDB, so don’t hesitate: spin up a test cluster and try it out. Here is how you can set your nodes up to enable experimental features, including WebAssembly for user-defined functions. Thanks. I encourage everyone to stay in touch, and feel free to reach out to me if you have any questions related to WebAssembly.
