
Troubleshooting Large Rows and Large Cells in ScyllaDB

We’ve talked in the past about how ScyllaDB helps you find large partitions. But sometimes you need to get even more granular to get to the heart of what might be causing a hiccup in your database performance. Below we describe how to detect large rows and large cells in ScyllaDB.

While it tries its best to handle them, ScyllaDB is not optimized for very large rows or large cells. They require allocation of large, contiguous memory areas and therefore may increase latency. Rows may also grow over time. For example, many insert operations may add elements to the same collection, or a large blob can be inserted in a single operation.
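As a concrete (and entirely hypothetical) illustration, consider a table with a collection column; demodb.events and all of its columns below are invented for this sketch. Each repeated update appends another element to the same collection in the same row, so the row keeps growing, while a single oversized blob write produces a large cell:

CREATE TABLE demodb.events (
    id uuid,
    ts timestamp,
    tags set<text>,   -- collection that grows with every update below
    payload blob,     -- one oversized value here becomes a large cell
    PRIMARY KEY (id, ts)
);

-- Run many times over the lifetime of a row, each call adds another element:
UPDATE demodb.events
  SET tags = tags + {'new-tag'}
  WHERE id = 123e4567-e89b-12d3-a456-426655440000 AND ts = '2019-11-01 00:00:00';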

Similar to the large partitions table, the large rows and large cells tables are updated when SSTables are written or deleted, for example, on memtable flush or during compaction. We added the means to search for large rows and large cells when we added the SSTable “mc” format with ScyllaDB 3.0 and ScyllaDB Enterprise 2019.1. This SSTable format is enabled by default on ScyllaDB Open Source 3.1 and above.
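Because these tables are only updated when SSTables are written, freshly inserted data that is still sitting in memtables will not show up yet. If you want it scanned right away, you can force a memtable flush first (the keyspace and table names below are just examples):

nodetool flush demodb tmcr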

Find Large Rows

Let’s start with the system.large_rows table and break down what each of its columns and values represents:

Parameter         Description
keyspace_name     The keyspace name that holds the large row
table_name        The table name that holds the large row
sstable_name      The SSTable name that holds the large row
row_size          The size of the row, in bytes
clustering_key    The clustering key of the large row
compaction_time   The time when the compaction occurred

Next, let’s look at a simple CQL query that filters the large rows by keyspace and table. For example, if we were looking for the keyspace demodb and the table tmcr:

SELECT * FROM system.large_rows WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
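If the result set is long, note that in the schema shipped with recent versions the results are clustered by sstable_name and then by row_size in descending order, so within each SSTable the biggest rows come back first. A narrower projection plus a LIMIT keeps the output manageable (partition_key is assumed to be present, as in the schema sketched in the Storing section below):

SELECT sstable_name, row_size, partition_key, clustering_key
FROM system.large_rows
WHERE keyspace_name = 'demodb' AND table_name = 'tmcr'
LIMIT 10;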

Find Large Cells

To find large cells, let’s look at the system.large_cells table and, similarly, understand what each of its columns represents:

Parameter         Description
keyspace_name     The keyspace name that holds the large cell
table_name        The table name that holds the large cell
sstable_name      The SSTable name that holds the large cell
cell_size         The size of the cell, in bytes
clustering_key    The clustering key of the row that holds the large cell
column_name       The name of the column that holds the large cell
compaction_time   The time when the compaction occurred

The main differences between the large rows and large cells tables are that the latter records a cell_size instead of a row_size and adds a column_name, identifying exactly which column holds the oversized cell.

For example, if we were looking for the keyspace demodb and the table tmcr, we would use this CQL query:

SELECT * FROM system.large_cells WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
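Again, assuming the columns sketched in the Storing section below, selecting just the identifying fields keeps the output readable:

SELECT sstable_name, cell_size, partition_key, clustering_key, column_name
FROM system.large_cells
WHERE keyspace_name = 'demodb' AND table_name = 'tmcr'
LIMIT 10;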

Configure

Configuration of the large row and large cell detection thresholds in the scylla.yaml file uses the following parameters (a sample snippet follows the list):

  • compaction_large_row_warning_threshold_mb (default: 10 MB)
  • compaction_large_cell_warning_threshold_mb (default: 1 MB)
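A minimal scylla.yaml excerpt, should you want to adjust the thresholds and catch smaller offenders, might look like this (the values shown are simply the defaults):

# scylla.yaml
compaction_large_row_warning_threshold_mb: 10
compaction_large_cell_warning_threshold_mb: 1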

Once a threshold is exceeded, the relevant information is captured in the system.large_rows / system.large_cells table. In addition, a warning message is written to the ScyllaDB log (refer to our documentation on logging).

Storing

Large rows and large cells are stored in system tables with the following schemas:
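The exact definitions may vary slightly between releases; as a rough sketch of what they look like in ScyllaDB 3.1 (run DESC TABLE system.large_rows in cqlsh for the authoritative version on your cluster):

CREATE TABLE system.large_rows (
    keyspace_name text,
    table_name text,
    sstable_name text,
    row_size bigint,
    partition_key text,
    clustering_key text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, row_size, partition_key, clustering_key)
) WITH CLUSTERING ORDER BY (sstable_name ASC, row_size DESC, partition_key ASC, clustering_key ASC);

CREATE TABLE system.large_cells (
    keyspace_name text,
    table_name text,
    sstable_name text,
    cell_size bigint,
    partition_key text,
    clustering_key text,
    column_name text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, cell_size, partition_key, clustering_key, column_name)
) WITH CLUSTERING ORDER BY (sstable_name ASC, cell_size DESC, partition_key ASC, clustering_key ASC, column_name ASC);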

Expiring Data

To prevent stale entries from lingering, all rows in the system.large_rows and system.large_cells tables are inserted with a Time To Live (TTL) of 30 days.
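Thirty days works out to 2,592,000 seconds, and, assuming compaction_time is a regular (non-key) column as in the schema sketched above, you can check how long a given entry has left with the TTL() function:

SELECT keyspace_name, table_name, compaction_time, TTL(compaction_time)
FROM system.large_rows;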

Conclusion

Large rows and large cells are unfortunate but common artifacts of early data modeling with ScyllaDB. Users often don’t anticipate how their early data modeling decisions will impact performance until they go into production. That’s why we feel it is vital to put tools in users’ hands to easily detect and quickly troubleshoot these data phenomena.

Have you run into problems with large partitions, rows or cells that caused you some worried hours or sleepless nights? How did you solve them? We’d love to hear your war stories. Write to us, or join our Slack channel to tell us all about it.

In the meanwhile, if you want to improve your skills with ScyllaDB, make sure you take our ScyllaDB University course on data modeling, with sections for both beginners and advanced users. It’s completely free!

About Laura Novich

Laura, a Technical Writer Manager at ScyllaDB, has 15 years of documentation experience for start-ups and Fortune 500 companies alike. Before joining ScyllaDB, Laura was a technical writer at Verint Systems and Red Hat, where she wrote documentation for the KVM and Fedora projects. She is passionate about open source projects, innovation, and special needs children. Laura has presented at technical writing conferences, contributed to the Open Source magazine, and is currently writing her first book.