Scylla Enterprise 2018.1.12 and 2018.1.13

The ScyllaDB team announces the release of Scylla Enterprise 2018.1.12 and 2018.1.13, production-ready patch releases that fix bugs in 2018.1, a stable branch of Scylla Enterprise. As always, Scylla Enterprise customers are encouraged to upgrade to Scylla Enterprise 2018.1.13 in coordination with the Scylla support team.

The major fix in 2018.1.12 is an update to the Gossip protocol, which improves the stability of a Scylla cluster when nodes are added. The following gossip-related bugs are fixed in this release:

  • When a node is added to a cluster, it announces its status through gossip, and the other nodes start sending it write requests. At that point, the joining node may not yet have learned the tokens of the other nodes, which can cause error messages like the following (#3382):
    token_metadata - sorted_tokens is empty in first_token_index!
    storage_proxy - Failed to apply mutation from 127.0.4.1#0: std::runtime_error (sorted_tokens is empty in first_token_index!)
  • A fixed 30-second wait for gossip to stabilize is not enough. Instead, Scylla now waits until the number of nodes reported by gossip stabilizes, that is, until it stays at the same value for at least 30 seconds. #2866
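The settle condition described in #2866 can be illustrated with a short sketch. This is not Scylla's implementation (Scylla itself is written in C++); the function name wait_for_gossip_to_settle, the get_node_count callback, and the one-second polling interval are hypothetical choices made only for illustration.

```python
import time
from typing import Callable


def wait_for_gossip_to_settle(
    get_node_count: Callable[[], int],
    settle_seconds: float = 30.0,
    poll_interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Block until get_node_count() has returned the same value for at
    least settle_seconds, then return that stable count.

    Hypothetical sketch: get_node_count stands in for "number of nodes
    currently known via gossip"; it is not a real Scylla API.
    """
    last_count = get_node_count()
    stable_since = clock()
    while True:
        sleep(poll_interval)
        count = get_node_count()
        now = clock()
        if count != last_count:
            # The cluster view changed: restart the settle window.
            last_count = count
            stable_since = now
        elif now - stable_since >= settle_seconds:
            # Unchanged for a full window: treat gossip as settled.
            return count
```

Unlike a single fixed 30-second sleep, any change in the node count restarts the window, so a node that appears late delays startup instead of being missed. Passing in clock and sleep lets the logic be exercised with a simulated clock rather than real waiting.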

The major fix in 2018.1.13 addresses a rare Scylla crash when reading SSTables in the old (ka/la) file format. The root cause is a bug in the SSTable reader that, in some rare cases, presents data from the next partition as belonging to the previous partition. Only users of range deletions may be affected by this problem.

The error can manifest as incorrect query results (if the clustering ranges of the involved partitions do not overlap) or as crashes caused by violated internal constraints. If this happens during compaction or streaming, the error will persist.

Other issues fixed in 2018.1.12

  • In some cases, when --abort-on-lsa-bad-alloc is enabled, Scylla aborts even though it is not actually out of memory #2924
  • Schema changes: a schema change statement can be delayed indefinitely when there are constant schema pulls #4436
  • Scylla scripts moved from Python 3.4 to Python 3.6, following changes in the CentOS EPEL repository
  • Operations that involve a flush, such as restart or drain, may take too long even under low load, because the controller assigns fewer shares than it could.
  • row_cache: potential abort when populating the cache concurrently with a memtable flush #4236
  • Repair: repair may fail with the error message: std::system_error (error system:98, Address already in use)
  • Serializing the decimal and varint data types to JSON can cause an exception on the client side (e.g. cqlsh) #4348
  • Fixed a performance regression introduced by the fix for an issue in the Leveled Compaction Strategy
  • CQL: in rare cases, executing a prepared SELECT statement with a multi-column IN restriction can return incorrect results due to a memory violation. For example:
    SELECT * FROM atable WHERE id=17 AND (id1, id2, id3) IN ((1, 2, 3), (4, 5, 6))
  • Scylla node crashes on a prepare request with a multi-column IN restriction #3692, #3204


06 August 2019