
What’s Trending on the ScyllaDB Community Forum: Summer 2023

Since earlier this year, new and experienced “sea monsters” across the global ScyllaDB community have been connecting and learning from one another’s experiences over at the ScyllaDB Community Forum. It’s a great place to:

  • Get quick access to the most common getting started questions
  • Troubleshoot any issues you come across
  • Engage in in-depth discussions about new features, configuration tradeoffs, and deployment options
  • Search the archives to see how your peers are setting up similar integrations (e.g., ScyllaDB + JanusGraph + Tinkerpop)
  • Propose a new topic for us to cover in ScyllaDB University
  • Share your perspective on a ScyllaDB blog, ask questions about on-demand videos, or tell us more about what types of resources your team is looking for
  • Engage with the community: share how you’re using ScyllaDB (the fastest distributed database) and what you learned along the way, and get ideas from your peers

Say hello & share how you’re using ScyllaDB

Recent Trending NoSQL Discussions

Here’s a look at some of the forum’s recent trending discussions.

What’s the maximum number of records in a ScyllaDB table?

Discussion recap: Does having 1 trillion records in one table, with each record in a separate partition, make sense? Will it cause performance issues? ScyllaDB, the fastest NoSQL database, performs well with many partitions, especially small ones, because they ensure even data distribution among nodes and shards, preventing slowdowns caused by hot shards or nodes. ScyllaDB can handle a vast number of small partitions without any hard limit; in fact, many small partitions are the sweet spot for ScyllaDB’s performance and scalability.
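The even-distribution point can be sketched with a toy hash-partitioning loop. This is plain Python, not ScyllaDB internals: ScyllaDB actually uses the Murmur3 partitioner, while MD5 is used here only to keep the sketch dependency-free, and the shard count is arbitrary.

```python
import hashlib

# Illustrative sketch: hash each partition key and assign the partition
# to a shard. With many small partitions, the load spreads evenly, so no
# single shard becomes "hot".
NUM_SHARDS = 8

def shard_for(partition_key: str) -> int:
    # MD5 stands in for Murmur3 purely for illustration.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS

counts = [0] * NUM_SHARDS
for i in range(100_000):
    counts[shard_for(f"user:{i}")] += 1

spread = max(counts) - min(counts)
print(counts, spread)  # per-shard counts stay close to 100_000 / 8
```

With 100,000 partitions over 8 shards, every shard ends up near 12,500 partitions; the same uniformity is what lets a trillion small partitions scale without hot spots.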

View Full Discussion


What’s the maximum number of tables in a keyspace?

Discussion recap: ScyllaDB is known to work with a few thousand tables, but it’s not recommended. A large number of tables slows down topology changes as well as any other operation that works per table, such as repairs and backups.
We regularly test with 5,000 tables in one keyspace.

View Full Discussion


Different CPU consumption by ScyllaDB threads with different Linux kernels after the “nodetool drain” command

Discussion recap: The question concerns different CPU consumption by ScyllaDB threads on different Linux kernels after the nodetool drain command. All the results are from a single-node system, but multi-node systems behave nearly the same. The asker suspected that different kernel functions are called depending on the kernel version, which would explain the different behavior.
Is this known behavior?
Does it work as designed?
The answer is in the strace output. On older kernels, ScyllaDB falls back to using epoll for polling the kernel for I/O. This is less efficient and involves the database busy-polling the kernel, hence the 100% CPU usage even when idle. On newer kernels, where it is available, ScyllaDB uses the AIO kernel interface to poll for I/O completion, which is much more efficient.
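The difference between the two strategies can be sketched with Python’s `selectors` module. This is an illustration of busy-polling versus a blocking wait, not ScyllaDB’s reactor code; the spin cap and pipe are purely for the demo.

```python
import os
import selectors

# Busy-polling vs. blocking wait, in miniature. ScyllaDB's epoll
# fallback resembles the busy-poll loop (CPU spins even when idle);
# linux-aio lets the reactor wait efficiently for completions.
r, w = os.pipe()
sel = selectors.DefaultSelector()
sel.register(r, selectors.EVENT_READ)

# Busy poll: timeout=0 returns immediately whether or not data is ready,
# so an idle loop burns CPU doing nothing useful.
busy_spins = 0
while not sel.select(timeout=0):  # nothing readable yet -> keeps spinning
    busy_spins += 1
    if busy_spins >= 1000:        # cap the demo loop
        break

os.write(w, b"x")                 # now make the fd readable
events = sel.select(timeout=1)    # a blocking wait returns promptly
print(busy_spins, len(events))

sel.close()
os.close(r)
os.close(w)
```

The busy loop spins its full budget without accomplishing anything, which is exactly the 100% idle CPU the strace output revealed on older kernels.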

View Full Discussion


Redistribution of data when adding and removing nodes

Discussion recap: What happens (in terms of performance) when a node recovers after a disk failure? Will the streaming affect performance? ScyllaDB has CPU and I/O schedulers. User workloads and background maintenance work, such as repair, streaming, and compaction, use different scheduling groups. These groups are isolated from one another and have different priorities to ensure user-facing performance is not reduced.
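The isolation idea can be sketched as a toy share-based scheduler. This is illustrative only; ScyllaDB’s real CPU scheduler lives inside the Seastar framework and is far more sophisticated, and the group names and share values below are made up.

```python
from collections import deque

# Toy model: tasks queue up per scheduling group, and each group gets
# CPU slices in proportion to its shares, so background maintenance
# cannot starve user queries.
class Scheduler:
    def __init__(self):
        self.groups = {}  # name -> [shares, task queue]

    def add_group(self, name, shares):
        self.groups[name] = (shares, deque())

    def submit(self, name, task):
        self.groups[name][1].append(task)

    def run(self):
        # Each round, a group with N shares may run up to N queued tasks.
        order = []
        while any(q for _, q in self.groups.values()):
            for shares, q in self.groups.values():
                for _ in range(min(shares, len(q))):
                    order.append(q.popleft())
        return order

sched = Scheduler()
sched.add_group("user-queries", shares=4)  # high priority
sched.add_group("streaming", shares=1)     # background recovery work
for i in range(8):
    sched.submit("user-queries", f"query-{i}")
    sched.submit("streaming", f"stream-{i}")

order = sched.run()
print(order[:5])  # user queries dominate each round
```

Even while recovery streaming has work queued, four user queries run for every streaming task, which is the effect the scheduling groups are designed to produce.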

View Full Discussion


Does ScyllaDB limit data per row?

Discussion recap: There is no hard limit on row size. However, keep in mind that if a row or cell is too big, performance degradation might occur: the CPU takes more time to process the data, increasing latency. If the per-row data is very large, ScyllaDB takes more time to process the rows, mostly for writes. Read queries should be fine since ScyllaDB is a column-based database.

View Full Discussion


What are the differences between column families in Cassandra’s data model compared to Bigtable?

Discussion recap: When Cassandra was first introduced, it followed a schema-less data model similar to Bigtable. Rows could have any number of columns with name-value pairs, and rows did not have to adhere to a fixed schema. Over time, as Cassandra matured, developers realized the drawbacks of a schema-less approach. Schemas play a crucial role in ensuring application correctness. The typical use case didn’t involve rows with a thousand individually-named fields, but rather a smaller set of fields repeated for multiple entries. The concept of a “clustering key” was introduced, allowing the definition of fixed fields for each entry within a row. With the advent of CQL (Cassandra Query Language) around Cassandra 0.8, schema support was introduced. The “table” nomenclature replaced “column family,” and users could declare a known list of fields for a table.
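The modeling shift can be sketched with plain Python data structures (an analogy, not CQL; the sensor-reading example is invented for illustration):

```python
# Old schema-less style: one row, arbitrarily many individually-named
# columns, one ad-hoc column name per reading.
schemaless_row = {
    "temp_2023-07-01": 21.4,
    "temp_2023-07-02": 22.1,
    "temp_2023-08-01": 24.9,
}

# With a clustering key: a fixed set of fields, repeated once per entry,
# kept sorted within the partition by the clustering key (the date here).
clustered_partition = [
    {"date": "2023-07-01", "temp": 21.4},
    {"date": "2023-07-02", "temp": 22.1},
    {"date": "2023-08-01", "temp": 24.9},
]

# Because entries are sorted by the clustering key, a range scan
# ("all July readings") is a sequential slice rather than a scan of
# arbitrary column names.
july = [e for e in clustered_partition if e["date"].startswith("2023-07")]
print(len(july))
```

The second shape is what a CQL table with a clustering key expresses declaratively: a known list of fields, repeated and ordered per entry within a partition.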

View Full Discussion


What’s the best way to query a table in ScyllaDB: count, limit, or paging?

Discussion recap: It really depends on the application, the queries, and the specifics of the use case. Generally, paging should be used – otherwise, you risk adding pressure to the query coordinator, which may introduce latency and memory fragmentation.
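Why paging bounds memory pressure can be sketched with a generator standing in for a driver’s page fetches (the function names and page size below are illustrative, not a real driver API):

```python
# Instead of materializing the whole result set on the coordinator,
# paging fetches it one bounded chunk at a time.
def fetch_pages(rows, page_size):
    # Stand-in for a driver honoring a fetch/page size.
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]

rows = list(range(10_000))  # pretend query result
peak = 0                    # most rows held at once
total = 0
for page in fetch_pages(rows, page_size=1000):
    peak = max(peak, len(page))
    total += len(page)

print(peak, total)  # memory is bounded by one page, not the full result
```

All 10,000 rows are processed, but at most one page is resident at a time; an unpaged `count` or large `limit` would instead hold the full result on the coordinator.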

View Full Discussion

About Guy Shtub

Head of Training: Guy is experienced in creating products that people love. Previously, he co-founded two start-ups. Outside of the office, you can find him climbing, juggling, and generally getting off the beaten path. Guy holds a B.Sc. degree in Software Engineering from Ben Gurion University.