Q&A with Nadav Harel on Project Alternator, our DynamoDB-compatible API
As we prepare for Scylla Summit 2019, we are producing a series of blogs highlighting this year’s featured presenters. And a reminder, if you’re not yet registered for Scylla Summit, please take the time to register now!
Today we are speaking with Nadav Harel, a ScyllaDB Distinguished Engineer who will be hosting the session Free and Open DynamoDB API for Everyone, which will take attendees on a tour of Project Alternator, ScyllaDB’s DynamoDB-compatible API.
Tell us a bit about your interests outside of Scylla. From what I understand, you also have an abiding interest in linguistics.
Indeed. By the turn of the century, Linux distributions started to add support for languages other than English, including my native tongue – Hebrew. But while I was writing parts of my MSc thesis in Hebrew, I noticed that a free Hebrew spell-checker was missing – and was not likely to suddenly appear unless a Hebrew speaker sat down and worked on it. So I did. The result was Hspell (http://hspell.ivrix.org.il/), which is used today by Linux distributions, as well as by Google and Apple. Beyond being a useful project, I really learned a lot during this work and had a lot of fun.
Beyond my fairly eclectic interests in computers and computer programming, I also enjoy the other sciences (I have an MSc in Mathematics, of all things), I am an avid fan of free software (even before it was called “open source” :-)), I am a history buff, and of course – I love to spend time with my family – my wife and three kids.
You joined ScyllaDB very early on — in 2013. What are some of the most significant events, turning points or other memories that have stayed with you over that time?
Working at ScyllaDB for six years has been one heck of a roller coaster ride! I learned a lot of exciting new technology and worked on diverse state-of-the-art problems. We actually started out writing an operating system kernel – OSv – from scratch, which was an amazing opportunity. But although this project was technically successful, it did not yield the business that we had hoped it would. So the biggest turning point for us was putting OSv aside: instead of running Cassandra on OSv and aiming for modest performance gains, we rewrote Cassandra completely and got huge performance gains (a 10-fold throughput increase).
Another significant memory I will always treasure from my time with ScyllaDB is the amazing people I’ve been working with. ScyllaDB’s founders, Dor Laor and Avi Kivity, are big believers in open-source and remote development methodologies, and this allowed us to attract some of the best software developers from around the globe to work with us. At my last count, we had developers from 15 different countries on 4 continents – each amongst the best and brightest in their field. This has been a great experience for me, both culturally and intellectually.
What key contributions did you make to the Scylla code base that you take greatest satisfaction in?
I contributed to many areas of the project, including the SSTable file format, repair, compaction, materialized views and – most recently – Alternator, so it’s hard to pick one. Maybe I’ll surprise you if I answer that perhaps the most gratifying was my contribution to documenting Seastar. Seastar is a new asynchronous-programming framework which we developed as the basis for Scylla. I wanted to make it easier for our new employees to learn Seastar, and moreover to encourage external projects to use Seastar too. So with funding from a European research project (which also wanted to use Seastar) I set out to write a Seastar tutorial, a draft of which you can find here. I can now feel satisfaction whenever a new developer – inside Scylla or in a growing number of other software projects – tells me that my tutorial helped him or her get started with Seastar.
How did you get involved with Project Alternator? Was it your idea to begin with?
Ever since Avi Kivity started implementing Scylla in late 2014, it has been clear that Cassandra compatibility was just the beginning for us. Using the same techniques that we used to build a much faster, lower-latency and more reliable implementation of Cassandra, we can implement additional database flavors and APIs.
Earlier this year, we announced Scylla Cloud, Scylla’s “database as a service” (DBaaS). Amazon’s DynamoDB is one of the most popular DBaaS, so it became a natural question how we can help users migrate their application from DynamoDB to Scylla. I can’t take the credit for the idea, but when the opportunity arose I decided to lead our effort to support the DynamoDB API in Scylla — so applications written for DynamoDB could simply run on Scylla, without any porting effort.
What parts of Project Alternator are you most satisfied with?
Alternator was a really fun project, and there were many satisfying parts in it. Piotr Sarna and I divided the work on Alternator, and cooperating with him was a real joy despite more than 3,000 kilometers separating us (Piotr lives in Poland, while I live in Israel). One part of Alternator with which I am particularly pleased is Alternator’s test framework, based on pytest. This framework made it easy for us to write extensive functional tests for our DynamoDB API implementation, and to compare its correctness against Amazon’s DynamoDB. Test-first development allowed us to build our DynamoDB-API compatibility more quickly, write higher quality code, and more easily refactor existing code without the risk of breaking features.
Finally, of course, the most satisfying part of the Alternator project was to see it all coming together: running a complex DynamoDB application on it for the first time (Amazon’s Tic-Tac-Toe game server demo), or running a massive 100,000 request-per-second workload on it for the first time.
What are the next steps for Project Alternator? What more needs to be done?
Although Alternator can already run several DynamoDB benchmarks and example applications, DynamoDB’s API is large and there are still several cases which we have not implemented – and we are working on completing them now. Alternator is an open-source project, so you can look at our bug tracker to see exactly which features we are still missing, or see a summary in alternator.md.
Beyond the small missing features, one important difference between Scylla’s and DynamoDB’s data models is that DynamoDB natively supports conditional updates (where an item is updated with new data, but only if its existing data matches some condition), whereas until recently Scylla did not support such operations. The best we could do was to approximate conditional writes with separate read and write operations, but that does not provide the desired isolation between concurrent operations: another client can change the item between our read and our write. These days, we are adding native support for conditional updates in Scylla, called LWT (lightweight transactions), and later we should start using them in Alternator too.
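To make the isolation problem concrete, here is a minimal sketch in Python. It is an illustration only: `TinyTable`, `conditional_put` and the two demo functions are hypothetical names for an in-memory stand-in, not Scylla, Alternator or DynamoDB code, and a simple lock stands in for whatever mechanism a real database uses. The first demo interleaves two clients that each read and then write separately, so both believe they succeeded; the second performs the check and the write as one atomic step, so only one succeeds.

```python
import threading


class TinyTable:
    """Hypothetical in-memory table used only for illustration."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()

    def get(self, key):
        return self._items.get(key)

    def put(self, key, value):
        # Unconditional write: overwrites whatever is there.
        self._items[key] = value

    def conditional_put(self, key, expected, value):
        """Write `value` only if the current value equals `expected`.

        The comparison and the write happen under one lock, so no other
        conditional_put can slip in between them.
        """
        with self._lock:
            if self._items.get(key) != expected:
                return False
            self._items[key] = value
            return True


def demo_read_then_write():
    """Approximate a conditional update with a separate read and write."""
    table = TinyTable()
    table.put("seat", "free")
    # Both clients read before either one writes (an unlucky interleaving).
    seen_by_alice = table.get("seat")
    seen_by_bob = table.get("seat")
    winners = []
    if seen_by_alice == "free":
        table.put("seat", "alice")
        winners.append("alice")
    if seen_by_bob == "free":  # stale read: the seat is already taken
        table.put("seat", "bob")
        winners.append("bob")
    return winners, table.get("seat")


def demo_conditional():
    """Same scenario, but check and write are a single atomic operation."""
    table = TinyTable()
    table.put("seat", "free")
    winners = [name for name in ("alice", "bob")
               if table.conditional_put("seat", "free", name)]
    return winners, table.get("seat")


if __name__ == "__main__":
    print(demo_read_then_write())
    print(demo_conditional())
```

In the read-then-write demo both clients report success even though only the last write survives, which is the kind of lost update that a native conditional update is meant to rule out.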
For people who have read up on Project Alternator already, what will you be speaking about at Scylla Summit beyond what has already been published?
We’ll be showing off the beta implementation of Alternator on Scylla Cloud, so that all our users can try it out themselves. This will be the first chance for people to get their hands on a running cluster using the Alternator API.
Thank you for taking the time to speak with me today. I am sure that everyone is looking forward to attending your presentation at Scylla Summit!