
Understanding and Mitigating Distributed Database Network Costs with ScyllaDB

Considerations and strategies that will help you optimize ScyllaDB’s performance as well as reduce networking costs

ScyllaDB is a high-performance NoSQL database that’s widely adopted for use cases that require high throughput and/or predictable low latency. As a distributed database, ScyllaDB continuously replicates data across nodes, which inevitably translates to networking costs. Such costs reflect the additional network bandwidth necessary to uphold the consistency and availability of data across multiple nodes.

Factors that influence distributed database network traffic, and thus costs, include:

  • Replication factor, which determines the number of data copies in the cluster. More copies mean higher availability, but also more data traversing the network (increasing cost).
  • Consistency level, which determines the number of nodes required to acknowledge the client operation before it is considered successful.
  • Other aspects like the payload size, Materialized Views, topology, and throughput.

Each factor plays a unique role in determining the overall network cost. Understanding each concept will help you optimize ScyllaDB’s performance as well as reduce networking costs.

Impact of Replication Factor and Consistency Level

ScyllaDB uses a distributed architecture that allows data to be stored and replicated across multiple nodes. The replication factor determines the number of copies of data that are stored across different nodes. The higher the replication factor, the higher the network traffic, since every write must be sent to more replicas to maintain consistency and availability.
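To make the relationship between replication factor and traffic concrete, here is a rough back-of-the-envelope sketch. The function name and the workload numbers are hypothetical, and the model deliberately ignores protocol overhead, compression, and retries. It assumes that, when the coordinator is itself a replica, one copy stays local and the remaining copies cross the network.

```python
def replication_write_bandwidth(writes_per_sec, payload_bytes,
                                replication_factor,
                                coordinator_is_replica=True):
    """Estimate inter-node bytes per second caused by replicating writes.

    Simplified model (an assumption, not a ScyllaDB formula): each write is
    sent once to every replica that is not the coordinator itself.
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    remote_copies = replication_factor
    if coordinator_is_replica:
        remote_copies -= 1  # one copy is written locally on the coordinator
    return writes_per_sec * payload_bytes * remote_copies


# Hypothetical workload: 10,000 writes/s with 1 KiB payloads.
for rf in (1, 3, 5):
    mb_per_sec = replication_write_bandwidth(10_000, 1_024, rf) / 1e6
    print(f"RF={rf}: ~{mb_per_sec:.1f} MB/s of replication traffic")
```

Under this model, traffic grows linearly with the number of remote copies, which is why the replication factor is usually the first value to check when estimating network costs.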

During reads, ScyllaDB applies an optimization to reduce traffic: the query coordinator requests the full data from only one replica and just a digest (a hash of the data) from the other replicas needed to satisfy the consistency level (for example, LOCAL_QUORUM). If the digests do not match, the coordinator initiates a process known as read repair. Requesting digests instead of full data reduces both network costs and latencies.
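The saving from digest reads can be estimated with a small sketch. The digest size below is an assumption chosen for illustration (the actual size depends on the hash algorithm in use), and the function name is hypothetical.

```python
DIGEST_BYTES = 16  # assumed digest size, for illustration only


def read_response_bytes(row_bytes, replicas_contacted, use_digests=True):
    """Estimate the bytes returned to the coordinator for one read.

    With digests, one replica returns the full row and the others return
    only a fixed-size hash; without digests, every replica returns the row.
    """
    if replicas_contacted < 1:
        raise ValueError("at least one replica must be contacted")
    if not use_digests:
        return row_bytes * replicas_contacted
    return row_bytes + DIGEST_BYTES * (replicas_contacted - 1)


# Hypothetical 4 KiB row read at quorum with RF=3 (two replicas contacted).
with_digests = read_response_bytes(4_096, 2)
without_digests = read_response_bytes(4_096, 2, use_digests=False)
print(with_digests, without_digests)
```

The larger the row, the more the digest optimization saves, since the digest size stays constant while the row size grows.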

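For write operations, the consistency level has minimal impact on network traffic. The write is still sent to every replica regardless of the consistency level; the consistency level only determines how many acknowledgments the coordinator waits for. Since acknowledgments are small messages, at most a small amount of bandwidth is affected.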

Impact of Payload

Payload size refers to the amount of data that is sent or received over the network. The larger the payload size, the higher the network amplification since more bandwidth is required to transmit and (afterward) replicate the data.

Larger payloads can also increase your storage footprint, as well as your CPU and memory utilization, undermining latencies. The following strategies help ensure that your ScyllaDB deployment remains efficient and performant, even with large payloads:

  • Efficient Data Modeling: By optimizing your data model, you can store data more efficiently, reducing the size of payloads. This involves strategies like using appropriate data types, denormalizing data where necessary, and avoiding the storage of large blobs or unnecessary data by following a query-driven approach.
  • Caching Strategies: Smaller payloads mean more room for caching. Intelligent caching can mitigate the impact of larger payloads on memory usage. By avoiding expensive data scans (or by properly specifying the BYPASS CACHE clause) you can ensure efficient memory usage, therefore avoiding round-trips to disk for your most valuable queries.
  • Asynchronous Processing: Asynchronous operations can help in managing larger payloads without an impact on throughput. They allow other operations to continue before the data operation is completed, thereby optimizing CPU utilization and response times.
  • Load Balancing and Partitioning: Proper load balancing along with evenly distributed access patterns on high cardinality keys ensures that large payloads don’t overwhelm specific nodes, preventing potential hotspots.
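As a minimal sketch of the asynchronous processing idea, the following example keeps several writes in flight while capping concurrency with a semaphore. The `fake_write` coroutine is a hypothetical stand-in for a real driver call; it is not part of any ScyllaDB driver API.

```python
import asyncio


async def fake_write(payload: bytes) -> int:
    """Hypothetical stand-in for an asynchronous driver write."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return len(payload)


async def write_all(payloads, max_in_flight=4):
    """Issue writes concurrently, never exceeding max_in_flight at once."""
    limit = asyncio.Semaphore(max_in_flight)

    async def guarded(payload):
        async with limit:
            return await fake_write(payload)

    # gather() returns results in the same order as the input payloads.
    return await asyncio.gather(*(guarded(p) for p in payloads))


if __name__ == "__main__":
    sizes = asyncio.run(write_all([b"a" * 10, b"b" * 20, b"c" * 30],
                                  max_in_flight=2))
    print(sizes)
```

Bounding the number of in-flight requests matters with large payloads: unbounded concurrency can exhaust client memory and overload individual nodes.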

Impact of Materialized Views

Materialized Views are a powerful feature in ScyllaDB, designed to improve read performance. They are essentially precomputed views that save the results of a query and are automatically updated whenever the base table changes. Given that, the use of Materialized Views results in:

  • Increased Data Transfer: Each Materialized View represents an additional copy of the data, which means that any write to the base table also results in writes to its corresponding view. This increases the volume of data transferred across the network.
  • Enhanced Read Performance: Although base tables paired up with a view increase the data transfer during write operations, they can significantly boost read performance by eliminating the need for complex joins and aggregations at read time.

Secondary Indexes are implemented under the same semantics as Materialized Views. Although indexes will typically (but not always) be considerably smaller than views, they may also contribute to elevated network traffic during operations.
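The write amplification introduced by views and indexes can be approximated as follows. This is an upper-bound sketch under stated assumptions: every base-table write is assumed to update every view and index, and views and indexes are assumed to use the same replication factor as the base table. The function name is hypothetical.

```python
def replica_writes_per_base_write(replication_factor,
                                  materialized_views=0,
                                  secondary_indexes=0):
    """Upper bound on replica writes triggered by one base-table write.

    Assumes each view and index is updated on every write, which overstates
    the cost when a write does not touch the relevant columns.
    """
    tables = 1 + materialized_views + secondary_indexes
    return tables * replication_factor


# Hypothetical table with RF=3, two views, and one index.
print(replica_writes_per_base_write(3, materialized_views=2,
                                    secondary_indexes=1))
```

Comparing the result with the base-table-only figure gives a quick sense of how much extra traffic each additional view or index may add.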

Multi-AZ, Multi-DC, and Single Availability Zone (SAZ) Topologies in ScyllaDB: Trade-offs and Considerations

With distributed databases like ScyllaDB, cluster topology is critical. It significantly influences data availability, resilience, and performance, as well as costs. Let’s evaluate three common setups: Multi-AZ, Multi-DC, and SAZ.

Multi-AZ Architecture

In a Multi-AZ setup, data is replicated across several availability zones but stays within a singular region.

Pros:

  • Enhanced Availability: By distributing data across multiple zones, you gain a safety net against zone-specific failures, ensuring uninterrupted data access.
  • Local Data Durability: Replicating data within the same region but across different zones enhances data durability.

Cons:

  • Increased Network Cost: Transmitting data across different zones, albeit within the same region, can elevate network costs, especially in cloud environments.

Multi-DC Setup

A Multi-DC configuration implies that data is replicated across several data centers, often spread across distinct geographical regions. Although it is perfectly possible to have multi-region clusters where data is confined to specific regions with no replication among them, this is beyond the scope of this article.

Pros:

  • Disaster Recovery: Should a catastrophic event strike a region, data remains accessible from other regions, providing robust business continuity capabilities.
  • Geo Replication: For globally distributed applications, having data replicated in multiple regions can reduce latency for end-users spread across the globe.

Cons:

  • Network Amplification Costs: Replicating data across regions intensifies data transfer volumes, leading to higher costs, especially in cloud deployments.

Single Availability Zone (SAZ)

In a SAZ configuration, all nodes of the ScyllaDB cluster reside within one availability zone.

Pros:

  • Reduced Network Costs: Keeping all nodes in one zone significantly curbs inter-node data transfer costs.
  • Simplified Configuration: Having all nodes in a unified zone simplifies setup and eliminates complexities of inter-zone data replication.
  • Minimized Latency: With nodes in close proximity, latencies are lower.

Cons:

  • Potential for Reduced Availability: Operating under a single zone introduces a single point of failure. If the zone faces an outage, downtime occurs.
  • Data Durability Concerns: Without replicas in other zones, there’s an elevated mean time to recovery (MTTR) in case of major zone-specific calamities.

For SAZ users, ScyllaDB Cloud ensures that nodes don’t reside on the same hardware via Placement Groups, further enhancing fault tolerance within the zone.
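To compare the replication-related transfer costs of a SAZ and a Multi-AZ deployment, consider this rough sketch. The price per GB is a placeholder, not a quoted cloud price, and the model assumes replicas are spread evenly across zones, the coordinator is itself a replica, and only write replication is counted (no reads, repairs, or streaming).

```python
SECONDS_PER_MONTH = 30 * 24 * 3600  # 30-day month, for simplicity


def monthly_cross_az_cost(writes_per_sec, payload_bytes,
                          replication_factor, zones, price_per_gb):
    """Rough monthly cost of replicating writes across availability zones."""
    if zones < 1 or replication_factor % zones != 0:
        raise ValueError("replication_factor must be a multiple of zones")
    replicas_per_zone = replication_factor // zones
    # Copies that must leave the coordinator's zone.
    cross_zone_copies = replication_factor - replicas_per_zone
    bytes_per_month = (writes_per_sec * payload_bytes
                       * cross_zone_copies * SECONDS_PER_MONTH)
    return bytes_per_month / 1e9 * price_per_gb


# Hypothetical workload: 10,000 writes/s, 1,000-byte payloads, RF=3.
PLACEHOLDER_PRICE = 0.02  # assumed price per GB, check your provider
print(monthly_cross_az_cost(10_000, 1_000, 3, zones=3,
                            price_per_gb=PLACEHOLDER_PRICE))
print(monthly_cross_az_cost(10_000, 1_000, 3, zones=1,
                            price_per_gb=PLACEHOLDER_PRICE))
```

In the single-zone case all replicas share a zone, so the modeled cross-AZ cost drops to zero; the price paid for that saving is the availability trade-off described above.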

Considerations for High Availability with Single AZ and Two DCs

Deploying ScyllaDB in a Single AZ (SAZ) is the most cost-efficient model, but it comes with a trade-off in high availability (HA). On the other end of the spectrum, a Multi-DC setup offers the highest availability at a higher cost. However, note that combining Multi-DC with SAZ may not always be the perfect solution.

Understanding the SAZ and 2DC Setup

This approach involves operating one data center within a SAZ to minimize costs, while the second data center — potentially in a different geographical location — serves as a failover to ensure data availability. While this sounds like an optimal balance between cost efficiency and HA, there are significant caveats:

  • Dual SAZ in a Single Region: This setup essentially doubles your infrastructure costs. Although you’re cutting down on inter-AZ data transfers, you’re also doubling your infrastructure, negating the cost benefits you might get from avoiding multi-AZ setups.
  • Dual SAZ in Multi-Region: This becomes even more expensive, especially if the application doesn’t specifically require multi-region replication. It doubles the infrastructure and introduces cross-region data transfer costs.
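The trade-off in the two caveats above can be expressed as simple arithmetic. The sketch below is illustrative only: all monthly figures are hypothetical inputs you would replace with your own estimates, and the function name is not part of any ScyllaDB tooling.

```python
def compare_topologies(infra_cost, cross_az_transfer_cost,
                       cross_dc_transfer_cost):
    """Compare monthly totals of one Multi-AZ DC versus two SAZ DCs.

    Dual SAZ doubles the infrastructure and replaces cross-AZ transfer
    with transfer between the two data centers.
    """
    multi_az = infra_cost + cross_az_transfer_cost
    dual_saz = 2 * infra_cost + cross_dc_transfer_cost
    cheaper = "dual_saz" if dual_saz < multi_az else "multi_az"
    return {"multi_az": multi_az, "dual_saz": dual_saz, "cheaper": cheaper}


# Hypothetical monthly figures in the same currency unit.
print(compare_topologies(infra_cost=10_000,
                         cross_az_transfer_cost=1_000,
                         cross_dc_transfer_cost=500))
```

With these inputs, the doubled infrastructure outweighs the transfer savings. Dual SAZ only wins in this model when transfer costs dominate infrastructure costs.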

When Does SAZ and 2DC Make Sense?

The only scenario where a SAZ combined with a 2DC strategy becomes cost-effective is when there’s a specific need for multi-region replication, perhaps due to business continuity requirements, geographic data regulations, or the need to serve a globally dispersed user base. In such cases, this setup allows you to leverage the benefits of geographical dispersion while avoiding constant cross-AZ data transfer costs.

Key Takeaway

It’s crucial for decision-makers to understand that while a hybrid approach of SAZ and 2DC might seem like a middle ground, it’s often closer to a high-cost model. This approach should only be considered when the application’s requirements justify the extra expense. Instead of looking at it as a default step up from SAZ, treat it as a specialized solution for particular operational needs.

Leveraging ScyllaDB Manager for Efficient Backup Strategies

Backup strategies in ScyllaDB require a careful balance between ensuring data durability and managing network costs. While it’s challenging to estimate network costs due to factors like application-specific patterns, ScyllaDB Manager simplifies this by offering robust, efficient, and manageable backup solutions.

ScyllaDB Manager is a centralized management system that automates recurrent maintenance tasks, reduces human error, and improves the predictability and efficiency of ScyllaDB. One of its key features is streamlining and optimizing backup processes:

  • Automated Backups: ScyllaDB Manager automates the backup process, scheduling regular backups according to your needs. This automation ensures data durability without the operational overhead of manual backups.
  • Backup Validation: The ‘backup validate’ feature verifies the integrity of the backups. It checks that all the data can be read and that the files are not corrupted, ensuring the reliability of your backups.
  • Efficient Data Storage with Deduplication: ScyllaDB Manager uses deduplication, storing only the changes since the last backup instead of a full copy every time. This process significantly reduces storage requirements and associated network costs.
  • Flexible Retention Policies: You can configure policies to retain backups for a specified period. After that period, older backups are automatically deleted, freeing up storage space.
  • Optimized Network Utilization: By relying on incremental backups and deduplication, ScyllaDB Manager minimizes the amount of data sent across the network, reducing network costs and avoiding bandwidth saturation.
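The deduplication idea can be illustrated with a short sketch. This is not ScyllaDB Manager's implementation; it only shows the principle that files already present in the backup location do not need to be uploaded again, which works because SSTable files are immutable once written. The file names and sizes are hypothetical.

```python
def plan_incremental_backup(local_sstables, backed_up):
    """Return the files still to upload and their total size in bytes.

    local_sstables: mapping of file name to size in bytes
    backed_up: set of file names already in the backup location
    """
    to_upload = {name: size for name, size in local_sstables.items()
                 if name not in backed_up}
    return sorted(to_upload), sum(to_upload.values())


# Hypothetical SSTable files on a node and in the backup location.
local = {"me-1-big-Data.db": 500, "me-2-big-Data.db": 300,
         "me-3-big-Data.db": 200}
already_uploaded = {"me-1-big-Data.db", "me-2-big-Data.db"}
print(plan_incremental_backup(local, already_uploaded))
```

In this example only one of three files crosses the network, so the transferred volume is a fraction of a full backup.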

For a practical illustration of how ScyllaDB Manager handles backups, see the documented strategies in its official documentation. These examples provide real-world scenarios and best practices for managing backups efficiently and reliably.

Balancing Act: Data Durability vs. Network Costs

Backups are essential for maintaining data integrity and availability, but they must be managed judiciously, especially considering network costs in cloud environments. A few things to consider:

  • Frequency: Align the frequency of backups with your data’s volatility and your acceptable Recovery Point Objective (RPO). More frequent backups reduce potential data loss, but they also consume more network bandwidth.
  • Backup Compression: ScyllaDB Manager utilizes rclone for efficient data transfers, which supports wire compression to reduce the size of the data being transferred over the network. Note that this is not the same as SSTable compression in ScyllaDB.
  • Backup Location: The proximity of your backup storage can significantly impact costs. Storing backups within the same data center or region is generally less expensive than cross-region or cross-provider transfers due to reduced data transfer costs.

Compression in ScyllaDB and Application-Level Strategies

ScyllaDB employs various data compression techniques, which are crucial for reducing network costs. The primary advantages are seen with client-side and node-to-node compression; both are pivotal in minimizing the data transmitted over the network. Additionally, application-level compression strategies can further enhance efficiency.

  • Client-Side Compression: ScyllaDB drivers support client-server compression, meaning data is compressed from the client’s side before being sent to the cluster, and vice-versa. This approach reduces the amount of data sent over the network, leading to lower bandwidth usage.
  • Node-to-Node Compression: ScyllaDB supports inter-node compression. Data exchanged between nodes within the cluster — for replication, satisfying consistency requirements, data streaming, as well as other operations — is compressed before transmission.
  • Application-Level Compression: Beyond ScyllaDB’s internal mechanisms, implementing compression at the application level can provide additional benefits. This involves compressing data payloads before they are sent to ScyllaDB, using standard compression libraries, or employing efficient serialization methods.

These strategies reduce the size of the data transmitted over the network, saving bandwidth. The savings are greatest for applications dealing with large data volumes or operating over constrained networks.

By understanding and leveraging client-side, node-to-node, and application-level compression strategies, users can significantly reduce the amount of data transmitted over the network. This leads to lower bandwidth consumption and decreased network-related costs.
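As a minimal sketch of application-level compression, the example below serializes a record to compact JSON and compresses it with the standard `zlib` module before it would be stored (for instance, in a blob column). The helper names `pack` and `unpack` are hypothetical, and the choice of JSON plus zlib is an assumption made for illustration; other serializers and codecs may suit your data better.

```python
import json
import zlib


def pack(record: dict) -> bytes:
    """Serialize a record to compact JSON and compress it."""
    raw = json.dumps(record, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, level=6)


def unpack(blob: bytes) -> dict:
    """Reverse pack(): decompress and parse the JSON payload."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical record with repetitive content, which compresses well.
record = {"sensor": "s-1", "readings": [20.5] * 500}
blob = pack(record)
raw_size = len(json.dumps(record).encode("utf-8"))
print(f"raw={raw_size} bytes, compressed={len(blob)} bytes")
```

Compressing in the application trades client CPU for bandwidth, and the compressed value stays small on every hop, including replication. Note that already-compressed or very small payloads may not shrink at all.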

Application-Specific Optimizations

Your application plays a crucial role in optimizing network amplification in ScyllaDB. By implementing certain strategies and best practices, the application deployment can help reduce the amount of data transmitted over the network, thereby minimizing network amplification. Here are some approaches to consider:

  • Data Modeling: Effective data modeling is essential to minimize network amplification. By carefully designing the data model and schema, you can avoid over-replication or excessive data duplication, leading to more efficient data storage and reduced network transmission. See the ScyllaDB University Data Modeling course to learn more.
  • Query Optimization: Optimizing queries can significantly impact network amplification. Ensure that queries are designed to retrieve only the necessary data, avoiding unnecessary data transfers over the network. This involves using appropriate WHERE clauses, limiting result sets, and optimizing data retrieval patterns. See this blog on query optimization to learn more.
  • Connection Pooling: Efficiently reusing database connections reduces the overhead of establishing new connections for each request. This reduces network round trips and minimizes network amplification caused by connection setup and teardown. Learn more in this video about How Optimizely (Safely) Maximizes Database Concurrency.
  • Application-Level Compression: Beyond the compression features provided by ScyllaDB and its drivers, application-level compression offers additional reductions in data transmission size. This involves integrating compression functionalities directly into your application, ensuring data is compressed before transmission to ScyllaDB.

ScyllaDB’s Zone-Awareness

Zone awareness is a feature of various ScyllaDB drivers that plays an important role in minimizing cross-AZ data transfer costs. In a standard ScyllaDB deployment, nodes are strategically distributed across multiple Availability Zones (AZs) to ensure heightened availability and fault tolerance. This distribution inherently involves data movement across nodes in different AZs, a process that continues even when using a zone-aware driver.

Zone awareness differs from other load balancing policies in the sense that it prioritizes coordinator nodes located under the same availability zone as the application. This preference for “local” nodes doesn’t eliminate cross-AZ data transfers, especially given the replication and other inter-node communication needs, but it does significantly reduce the volume of data transfers initiated by application requests.

By prioritizing local nodes, the zone-aware driver achieves more cost-effective access patterns. It leverages the fact that data is already replicated across AZs due to ScyllaDB’s distribution strategies, allowing for local access to data despite its wide distribution for fault tolerance. This approach not only reduces data transfer costs but also enhances application performance. Local requests typically have lower latency compared to those traversing AZs, leading to faster data access.

In summary, the zone-aware driver doesn’t alter the way data is distributed across your ScyllaDB deployment; instead, it optimizes access patterns to this distributed data. By intelligently routing requests to local nodes whenever possible, it capitalizes on ScyllaDB’s cross-AZ replication, fostering both cost efficiency and performance improvement.

For further details on zone awareness, see Piotr Grabowski’s P99 CONF talk, Conquering Load Balancing: Experiences from ScyllaDB Drivers.
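The routing preference described in this section can be sketched in a few lines. This is a simplified illustration, not the logic of any particular ScyllaDB driver; the `Node` type, addresses, and zone names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    address: str
    zone: str


def order_coordinators(replicas, client_zone):
    """Order candidate coordinators, preferring the client's own zone.

    Nodes in other zones are kept as fallbacks so requests still succeed
    when no local replica is available.
    """
    local = [n for n in replicas if n.zone == client_zone]
    remote = [n for n in replicas if n.zone != client_zone]
    return local + remote


# Hypothetical replica set for one partition, one replica per zone.
replicas = [Node("10.0.1.5", "az-a"), Node("10.0.2.7", "az-b"),
            Node("10.0.3.9", "az-c")]
print([n.address for n in order_coordinators(replicas, "az-b")])
```

Because each zone already holds a replica in this example, the client in "az-b" can be served by a local coordinator, and the other zones are contacted by the client only as fallbacks.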

Conclusion

Optimizing network costs in ScyllaDB is a multifaceted endeavor that requires a strategic approach to both application deployment and database configuration. Through the adept implementation of strategies such as effective data modeling, application-level and built-in compression techniques, and thoughtful load balancing, it’s possible to drastically reduce the volume of data traversing the network. This reduction minimizes network amplification, enhancing the price-performance of your ScyllaDB deployment.

It’s vital to recognize that network amplification costs are not static. They fluctuate based on various factors, including workload characteristics, database settings, and the chosen topology. This dynamic nature necessitates regular monitoring and proactive adjustments to both application behavior and database configuration. Such diligence ensures your ScyllaDB ecosystem is not only high-performing but also cost-optimized.

By embracing a comprehensive approach — one that harmonizes application-level tactics with ScyllaDB’s zone-aware drivers, data distribution methodologies, and compression mechanisms — you position your database deployment to leverage the best of ScyllaDB’s efficiencies. This holistic strategy ensures robust performance, network cost savings, and overall operational excellence, enabling you to extract the maximum value from your ScyllaDB investment.

About Ricardo Borenstein

Ricardo is a Senior Solutions Architect at ScyllaDB. He has over 10 years of experience leading pre-sales activities and technical guidance for data management solutions at fast-growing technology companies.