
Why, how, and when to use Secondary Indexing at Scylla Summit 2017

Scylla Summit 2017

An index in a database provides an easier and faster way to access data by looking it up through its attributes. For end users, this results in faster matching and queries. Scylla offers different types of indexes to choose from, and I sat down with Pekka Enberg of ScyllaDB to learn more about them ahead of his upcoming talk at Scylla Summit 2017. Let's begin the interview and learn more about Pekka and the latest news on indexing for Scylla.

Please tell us about yourself and what you do at ScyllaDB.

I’m Pekka, and I work at ScyllaDB as a Software Engineer. I have been with the company almost since it was founded. I have worked on Scylla’s CQL front-end (parser, AST, on-wire protocol) and schema management, as well as our Docker image and other release engineering tasks.

What will you be talking about at Scylla Summit 2017?

I will be giving a presentation on Scylla’s upcoming secondary index implementation, which builds upon the Materialized Views work ScyllaDB’s Duarte Nunes and Nadav Har’El are doing.

What are Materialized Views?

Materialized Views are a mechanism to automatically denormalize data from a base table into a view table that uses a different partition key. For example, a database user can create a Materialized View of a table with a specific query and have the results stored in another table for faster access. Applications would normally do this themselves, but Materialized Views shift that burden from the application to the server.
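To make the idea of server-side denormalization concrete, here is a minimal in-memory sketch. It is a toy model, not Scylla's implementation: the class name `BaseTableWithView`, its methods, and the `users`-style columns (`user_id`, `email`, `name`) are all hypothetical. The base table is keyed by `user_id`, the "view" holds the same rows keyed by `email`, and every write to the base table keeps the view in sync, including removing the stale view row when the viewed column changes.

```python
class BaseTableWithView:
    """Hypothetical toy model of a base table plus a materialized view.

    Base table partition key: user_id. View partition key: email.
    Assumes emails are unique, so each view partition holds one row.
    """

    def __init__(self):
        self.base = {}  # user_id -> row
        self.view = {}  # email -> row (denormalized copy keyed differently)

    def upsert(self, user_id, email, name):
        old = self.base.get(user_id)
        # If the viewed column changed, the old view row is now stale.
        if old is not None and old["email"] != email:
            del self.view[old["email"]]
        row = {"user_id": user_id, "email": email, "name": name}
        self.base[user_id] = row
        # The "server" maintains the view on every write, so the
        # application never writes the second copy itself.
        self.view[email] = row

    def get_by_id(self, user_id):
        return self.base.get(user_id)

    def get_by_email(self, email):
        return self.view.get(email)


t = BaseTableWithView()
t.upsert(1, "pekka@example.com", "Pekka")
t.upsert(1, "penberg@example.com", "Pekka")  # email changed
print(t.get_by_email("penberg@example.com")["user_id"])  # 1
print(t.get_by_email("pekka@example.com"))               # None
```

The point of the sketch is the `upsert` method: the application only writes to the base table, and the view, which lets you query by a different key, is updated as a side effect.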

What type of audience will be interested in your talk?

The talk will be interesting for both users who are planning to use Scylla’s secondary indexes and for people who are interested in Scylla’s internals.

Can you please tell me more about secondary indexes and your talk?

Secondary indexes in Apache Cassandra are implemented as local indexes, which means that the index is stored on the same node as the data it indexes. Because index size is proportional to the size of the indexed data, storing the full index on a single node would be impractical: it would limit the size of the index to the capacity of one node rather than the capacity of the whole cluster. The benefit of a local index is that writes are very fast, but the downside is that a read potentially has to query every node to perform a lookup, which makes local indexes scale poorly to large clusters.

In my talk, I will go over the current status of secondary indexes, their benefits, and when you should use them, and explain how they are implemented in Scylla.

How is secondary indexing related to materialized views?

Materialized Views solve the scalability issue of local indexes, but they come at a storage cost because, in the worst-case scenario, you need to duplicate the whole table. However, Materialized Views provide the necessary infrastructure to implement secondary indexes using global indexing, which is the implementation approach taken for Scylla.
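The difference between local and global indexing can be illustrated with a toy cluster model. This is only a sketch under simplifying assumptions, not how Scylla or Cassandra actually place data: the `Cluster` class, the `owner()` placement function (crc32 modulo the node count standing in for token ranges on a hash ring), and the `city` column are hypothetical. A local index keeps index entries on the node that stores the row, so writes stay on one node but a lookup by `city` must ask every node. A global index places entries on the node that owns the indexed value, so a write may touch an extra node but a lookup goes to one index node plus only the nodes holding matching rows.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4


def owner(key):
    """Hypothetical placement: map any key to a node.

    Real clusters use token ranges on a hash ring; crc32 % N is a stand-in.
    """
    return zlib.crc32(str(key).encode()) % NUM_NODES


class Cluster:
    def __init__(self):
        self.data = [dict() for _ in range(NUM_NODES)]  # per node: pk -> row
        # Local index: each node indexes only the rows it stores.
        self.local_index = [defaultdict(set) for _ in range(NUM_NODES)]
        # Global index: entries live on the node owning the indexed value.
        self.global_index = [defaultdict(set) for _ in range(NUM_NODES)]

    def insert(self, pk, city):
        node = owner(pk)
        self.data[node][pk] = {"pk": pk, "city": city}
        # Local index write stays on the same node as the data (cheap).
        self.local_index[node][city].add(pk)
        # Global index write may go to a different node (extra hop).
        self.global_index[owner(city)][city].add(pk)

    def query_local(self, city):
        # The indexed value says nothing about where matching rows live,
        # so every node has to be asked.
        contacted = set(range(NUM_NODES))
        pks = set()
        for node in contacted:
            pks |= self.local_index[node].get(city, set())
        return pks, contacted

    def query_global(self, city):
        # One lookup on the node owning the value, then fetch base rows
        # only from the nodes that actually hold matches.
        index_node = owner(city)
        pks = set(self.global_index[index_node].get(city, set()))
        contacted = {index_node} | {owner(pk) for pk in pks}
        return pks, contacted


c = Cluster()
for pk, city in [(1, "Helsinki"), (2, "Oslo"), (3, "Helsinki"), (4, "Paris")]:
    c.insert(pk, city)

pks_l, nodes_l = c.query_local("Helsinki")
pks_g, nodes_g = c.query_global("Helsinki")
print(sorted(pks_l), len(nodes_l))  # [1, 3] 4
print(sorted(pks_g), len(nodes_g))
```

Both queries return the same rows; what differs is the number of nodes contacted. With the local index that number always equals the cluster size, while with the global index it depends only on how many rows match, which is why global indexing scales better for lookups on large clusters. The extra storage the interview mentions shows up here as the second copy of index entries in `global_index`.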

Where can we learn more information about your talk?

I am currently writing a blog post on Secondary Indexes for the Scylla Developers Blog. Please look for it there in the near future.

How can people get in touch with you?

The best way to get in touch with me is through Twitter and our public Slack channel. That’s where I spend most of my time interacting with people.

Thank you very much, Pekka. We cannot wait to see your talk in person and learn more.

Scylla Summit is taking place in San Francisco, CA on October 24-25. Check out the full agenda on our website to learn about the rest of the talks, including technical talks from the Scylla team, the Scylla roadmap, and a hands-on workshop where you’ll learn how to get the most out of your Scylla cluster.