
Troubleshooting Large Rows and Large Cells in ScyllaDB

We’ve talked in the past about how ScyllaDB helps you find large partitions. But sometimes you need to get even more granular to get to the heart of what might be causing a hiccup in your database performance. Below we describe how to detect large rows and large cells in ScyllaDB.

While it tries its best to handle them, ScyllaDB is not optimized for very large rows or large cells. They require allocation of large, contiguous memory areas and therefore may increase latency. Rows may also grow over time. For example, many insert operations may add elements to the same collection, or a large blob can be inserted in a single operation.
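As a concrete (and entirely hypothetical) illustration, consider a table with a collection column; demodb.events and all of its columns below are invented for this sketch. Each repeated update appends another element to the same collection in the same row, so the row keeps growing, while a single oversized blob write produces a large cell:

CREATE TABLE demodb.events (
    id uuid,
    ts timestamp,
    tags set<text>,   -- collection that grows with every update below
    payload blob,     -- one oversized value here becomes a large cell
    PRIMARY KEY (id, ts)
);

-- Run many times over the lifetime of a row, each call adds another element:
UPDATE demodb.events
  SET tags = tags + {'new-tag'}
  WHERE id = 123e4567-e89b-12d3-a456-426655440000 AND ts = '2019-11-01 00:00:00';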

Similar to the large partitions table, the large rows and large cells tables are updated when SSTables are written or deleted, for example, on memtable flush or during compaction. We added the means to search for large rows and large cells when we added the SSTable “mc” format with ScyllaDB 3.0 and ScyllaDB Enterprise 2019.1. This SSTable format is enabled by default on ScyllaDB Open Source 3.1 and above.
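Because these tables are only updated when SSTables are written, freshly inserted data that is still sitting in memtables will not show up yet. If you want it scanned right away, you can force a memtable flush first (the keyspace and table names below are just examples):

nodetool flush demodb tmcr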

Find Large Rows

Let’s start with the system.large_rows table and break down what each of its columns and values represents:

Parameter         Description
keyspace_name     The keyspace name that holds the large row
table_name        The table name that holds the large row
sstable_name      The SSTable name that holds the large row
row_size          The size of the row, in bytes
clustering_key    The clustering key of the large row
compaction_time   The time when the compaction occurred

Next, let’s look at a simple CQL query that filters the large rows by keyspace and table. For example, if we were looking for the keyspace demodb and the table tmcr:

SELECT * FROM system.large_rows WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
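If the result set is long, note that in the schema shipped with recent versions the results are clustered by sstable_name and then by row_size in descending order, so within each SSTable the biggest rows come back first. A narrower projection plus a LIMIT keeps the output manageable (partition_key is assumed to be present, as in the schema sketched in the Storing section below):

SELECT sstable_name, row_size, partition_key, clustering_key
FROM system.large_rows
WHERE keyspace_name = 'demodb' AND table_name = 'tmcr'
LIMIT 10;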

Find Large Cells

To find large cells, let’s look at the system.large_cells table and, similarly, understand what each of its columns represents:

Parameter         Description
keyspace_name     The keyspace name that holds the large cell
table_name        The table name that holds the large cell
sstable_name      The SSTable name that holds the large cell
cell_size         The size of the cell, in bytes
clustering_key    The clustering key of the row that holds the large cell
column_name       The name of the column that holds the large cell
compaction_time   The time when the compaction occurred

The main differences between the large rows and large cells tables are that the latter records a cell_size instead of a row_size and adds a column_name, identifying exactly which column holds the oversized cell.

For example, if we were looking for the keyspace demodb and the table tmcr, we would use this CQL query:

SELECT * FROM system.large_cells WHERE keyspace_name = 'demodb' AND table_name = 'tmcr';
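Again, assuming the columns sketched in the Storing section below, selecting just the identifying fields keeps the output readable:

SELECT sstable_name, cell_size, partition_key, clustering_key, column_name
FROM system.large_cells
WHERE keyspace_name = 'demodb' AND table_name = 'tmcr'
LIMIT 10;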

Configure

Configuration of the large row and large cell detection thresholds in the scylla.yaml file uses the following parameters (a sample snippet follows the list):

  • compaction_large_row_warning_threshold_mb (default: 10 MB)
  • compaction_large_cell_warning_threshold_mb (default: 1 MB)
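A minimal scylla.yaml excerpt, should you want to adjust the thresholds and catch smaller offenders, might look like this (the values shown are simply the defaults):

# scylla.yaml
compaction_large_row_warning_threshold_mb: 10
compaction_large_cell_warning_threshold_mb: 1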

Once a threshold is exceeded, the relevant information is captured in the system.large_rows / system.large_cells table. In addition, a warning message is written to the ScyllaDB log (refer to our documentation on logging).

Storing

Large rows and large cells are stored in system tables with the following schemas:
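The exact definitions may vary slightly between releases; as a rough sketch of what they look like in ScyllaDB 3.1 (run DESC TABLE system.large_rows in cqlsh for the authoritative version on your cluster):

CREATE TABLE system.large_rows (
    keyspace_name text,
    table_name text,
    sstable_name text,
    row_size bigint,
    partition_key text,
    clustering_key text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, row_size, partition_key, clustering_key)
) WITH CLUSTERING ORDER BY (sstable_name ASC, row_size DESC, partition_key ASC, clustering_key ASC);

CREATE TABLE system.large_cells (
    keyspace_name text,
    table_name text,
    sstable_name text,
    cell_size bigint,
    partition_key text,
    clustering_key text,
    column_name text,
    compaction_time timestamp,
    PRIMARY KEY ((keyspace_name, table_name), sstable_name, cell_size, partition_key, clustering_key, column_name)
) WITH CLUSTERING ORDER BY (sstable_name ASC, cell_size DESC, partition_key ASC, clustering_key ASC, column_name ASC);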

Expiring Data

To prevent stale entries from lingering, all rows in the system.large_rows and system.large_cells tables are inserted with a Time To Live (TTL) of 30 days.
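Thirty days works out to 2,592,000 seconds, and, assuming compaction_time is a regular (non-key) column as in the schema sketched above, you can check how long a given entry has left with the TTL() function:

SELECT keyspace_name, table_name, compaction_time, TTL(compaction_time)
FROM system.large_rows;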

Conclusion

Large rows and large cells are unfortunate but common artifacts of early data modeling with ScyllaDB. Users often don’t anticipate how their early data modeling decisions will impact performance until they go into production. That’s why we feel it is vital to put tools in users’ hands to easily detect and quickly troubleshoot these data phenomena.

Have you run into problems with large partitions, rows or cells that caused you some worried hours or sleepless nights? How did you solve them? We’d love to hear your war stories. Write to us, or join our Slack channel to tell us all about it.

In the meanwhile, if you want to improve your skills with ScyllaDB, make sure you take our ScyllaDB University course on data modeling, with sections for both beginners and advanced users. It’s completely free!

About Laura Novich

Laura, a Technical Writer Manager at ScyllaDB, has 15 years of documentation experience for start-ups and Fortune 500 companies alike. Before joining ScyllaDB, Laura was a technical writer at Verint Systems and Red Hat, where she wrote documentation for the KVM and Fedora projects. She is passionate about open source projects, innovation, and special needs children. Laura has presented at technical writing conferences, contributed to the Open Source magazine, and is currently writing her first book.