See all blog posts

How Hinted Handoff Works in ScyllaDB

Hinted

When data is written to ScyllaDB, one or more replicas may become unresponsive or unreachable. The reasons for that may range from a heavy load on a particular replica node, network congestion, hardware issues, etc. As a result, the write to a replica will fail, usually with the timeout error. To restore the consistency of the data across all replicas, a user will have to run a repair, which is a very expensive—and usually long—procedure.

Hinted Handoff is a new feature that is coming in ScyllaDB 2.1 that will reduce the level of data inconsistency from write failures. It mimics the Apache Cassandra Hinted Handoff implementation but ScyllaDB’s version won’t allow hints to consume all of your disk space. We delayed work on this feature because we hoped Hinted Handoff would be irrelevant if we had the automatic incremental repair. However, after some research, we reached the conclusion that Hinted Handoff would allow replicas to synchronize faster and more efficiently. Kudos to the Apache Cassandra designers.

So, how does Hinted Handoff work? Every time the coordinator fails to write to a replica, it will write a “hint.” When the replica node recovers and can be written again, the coordinator sends the pending hints to the replica node to ensure that it has the originally intended data. At first sight, this process resembles the repair operation and it may seem that if Hinted Handoff is enabled—that there is no need to run repair anymore. Unfortunately, this is not true. You find more details about why this is not true at the end of this post.

Enabling and configuring Hinted Handoff

The feature is controlled by three configuration parameters (via scylla.yaml or command line arguments):

  • hinted_handoff_enabled: Enables or disables the Hinted Handoff feature completely or enumerates DCs for which hints are allowed. By default “true” which enables hints to all nodes.
  • max_hint_window_in_ms: Does not generate hints if the destination node has been down for more than this value. The hints generation should resume once the node is seen as up. By default, it is set to 3 hours.
  • hints_directory: Directory where ScyllaDB will store hints. By default, it is $SCYLLA_HOME/hints.

Example:

If scylla.yaml has the following:

hinted_handoff_enabled: DC1, DC2, DC3

Hints are going to be enabled to nodes in data centers ‘DC1’, ‘DC2’ and ‘DC3’ only.

With the following enabled:

hinted_handoff_enabled: true

Hints are going to be enabled to all nodes.

Please note that hinted_handoff_enabled is not a list YAML element, but rather a string that may either have a case-insensitive “true|false” or “0|1” value or be a string that has a comma-separated list of DCs.

Hinted Handoff and Consistency

Hints are not counted towards the consistency level. The only exception is a Consistency Level: Any (see more on Consistency Levels here). This is because counting hints towards the consistency level of the corresponding write operation may break the contract that ensures the most recent data version is read when W + R > RF, where W is the consistency level used for writes, R for reads, and RF is the replication factor of the corresponding keyspace.

For example, let’s take a cluster with 3 nodes (A, B, and C) and a keyspace with RF=3 and we write a row with the key K with CL=QUORUM(2) and we use A as a coordinator. In this scenario, B is down when the write happens. Data will be written to C and A and then A will write a hint to B.

Let’s imagine that hints are counted towards consistency by the following timing:

  1. Node A receives the WRITE(K) command with CL=QUORUM.
  2. Node A issues a write to nodes A and C and writes a hint to node B.
  3. A write to node A and a write of the hint to node B completes and node A reports a success to the client.
  4. Node C goes down before the write completes.
  5. Node C goes up and node A goes down.
  6. A client issues a READ(K) command with CL=QUORUM.
  7. Neither node C nor node B has the row with the key K and the operation returns an empty result which is obviously not the most recent version of the data. BANG!!!

The empty result occurred because we counted the hint towards the consistency level of the WRITE operation in step three above.

Hinted Handoff guarantee

After all of the hints are successfully delivered to their replicas, there is going to be more data consistency among those replicas. But does delivering hints to their replicas ensure that all of the data is consistent and that repair does not need to be run? Unfortunately, storing the hint may fail by itself. Hinted Handoff doesn’t provide any guarantee—it is a best efforts approach. The user may be able to run repair less frequently but will still have to run a full repair to ensure data is consistent across the cluster.

About Vlad Zolotarov

Vlad specializes in networking, mostly L2. He has worked at on projects for Mellanox, the bnx2x Linux device driver for Broadcom, and on the ScaleMP Virtual Device System for network interfaces. Vlad studied at the Israel Institute of Technology (Technion) and holds a B.Sc. in Computer Science.