ScyllaDB Enterprise 2018.1.12 and 2018.1.13

The ScyllaDB team announces the release of ScyllaDB Enterprise 2018.1.12 and 2018.1.13, which are production-ready ScyllaDB Enterprise patch releases. ScyllaDB Enterprise 2018.1.12 and 2018.1.13 are bug fix releases for the 2018.1 branch, a stable branch of ScyllaDB Enterprise. As always, ScyllaDB Enterprise customers are encouraged to upgrade to ScyllaDB Enterprise 2018.1.13 in coordination with the ScyllaDB support team.

The major fix in 2018.1.12 is an update to the Gossip protocol, which improves the stability of a ScyllaDB cluster while nodes are being added. The following gossip-related bugs are fixed in this release:

  • When a node is added to a cluster, it announces its status through gossip, and other nodes start sending it write requests. At this point, the joining node may not yet have learned the tokens of the other nodes, which can cause error messages like:
    token_metadata - sorted_tokens is empty in first_token_index!
    storage_proxy - Failed to apply mutation from
    std::runtime_error (sorted_tokens is empty in first_token_index!) #3382
  • Waiting 30 seconds for gossip to stabilize is not enough. Instead, ScyllaDB now waits until the number of nodes reported by gossip stabilizes, that is, stays at the same value for at least 30 seconds. #2866
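
The "wait until the node count stabilizes" behavior described for #2866 can be illustrated with a small sketch. This is not ScyllaDB's implementation: the function name, the `get_node_count` callback, and the polling interval are all hypothetical, and the clock and sleep functions are injectable only so the logic can be exercised without real delays.

```python
import time
from typing import Callable


def wait_for_stable_node_count(
    get_node_count: Callable[[], int],   # hypothetical: returns the node count seen via gossip
    stable_for: float = 30.0,            # required stability window, in seconds
    poll_interval: float = 1.0,          # assumed polling period, not taken from ScyllaDB
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Return once get_node_count() has kept the same value for stable_for seconds."""
    last_count = get_node_count()
    stable_since = clock()
    while clock() - stable_since < stable_for:
        sleep(poll_interval)
        count = get_node_count()
        if count != last_count:
            # Membership changed, so restart the stability window.
            last_count = count
            stable_since = clock()
    return last_count
```

The point of the design is that a fixed delay can expire while membership is still changing, whereas this loop restarts its window every time the observed count changes.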

The major fix in 2018.1.13 addresses a ScyllaDB crash that occurs in rare cases while reading the old SSTable (ka/la) file format. The root cause is a bug in the SSTable reader that, in some rare cases, makes it present data of the next partition as belonging to the previous partition. Only users who use range deletions may be affected by this problem.

The error can manifest as incorrect query results (if the clustering ranges for the involved partitions do not overlap) or as crashes due to violation of internal constraints. If this happens during compaction or streaming, the error will persist.

Other issues fixed in 2018.1.12

  • In some cases, when --abort-on-lsa-bad-alloc is enabled, ScyllaDB aborts even though it’s not really out of memory #2924
  • Schema changes: a schema change statement can be delayed indefinitely when there are constant schema pulls #4436
  • Move from Python 3.4 to Python 3.6 for ScyllaDB scripts, following changes in CentOS EPEL repository
  • Operations that involve a flush, like restart or drain, may take too long even under low load, since the controller assigns a smaller number of shares than it can
  • row_cache: potential abort when populating cache concurrently with MemTable flush #4236
  • Repair: repair failed with an error message: std::system_error (error system:98, Address already in use)
  • Serialization of decimal and varint data types to JSON can cause an exception on the client side (e.g. CQLSh) #4348
  • Fix performance regression from solving an issue in Leveled Compaction Strategy
  • CQL: in rare cases, when executing a prepared select statement with a multi-column IN, the system can return incorrect results due to a memory violation. For example:
    SELECT * FROM atable WHERE id=17 AND (id1, id2, id3) IN ((1, 2 , 3),(4, 5 , 6))
  • ScyllaDB node crashes upon prepare request with multi-column IN restriction #3692, #3204

06 August 2019