Scylla on Oracle Cloud Infrastructure: A Look at Stable Performance in the Event of a Node Failure
Scylla is now available on the Oracle Cloud Marketplace. The ScyllaDB team has completed testing and benchmarking of the bare metal instances available on Oracle Cloud Infrastructure (OCI). Scylla takes advantage of the excellent resources available on OCI bare metal servers: high CPU counts, ample DRAM, and large, fast NVMe drives. In our testing, we looked at ease of installation, throughput, and latency.
One particular item of interest we found during the testing is the ability to maintain performance in the event of a node failure. Node failures are a given in every distributed system. Scylla’s heat-weighted load balancing feature, introduced in Scylla 2.0, helps alleviate the impact on read performance once a node departs from the cluster.
Operators should not need any additional tuning to maintain a constant throughput from a system in case of node failure. The main task of the operator should be to bring back failed nodes. The following graph shows the impact of a node dropping from the cluster.
As can be seen above, the cluster is serving ~2M read operations per second when a node drops out of the ring. The operations per second served by the cluster are not affected by the missing node, as each of the other servers takes on the additional load. The coordinator node is informed that a node is missing and will not try to send query requests to it, eliminating dropped-request scenarios.
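To make the "each of the other servers takes the additional load" point concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes the request rate is split evenly across live nodes, which is a simplification and not how Scylla's load balancing is actually implemented. The function names and the four-node cluster size in the usage note are hypothetical; the article does not state how many nodes were in the tested cluster.

```python
def per_node_load(total_ops_per_sec: float, live_nodes: int) -> float:
    """Evenly split a fixed cluster-wide request rate across live nodes.

    Assumption: a perfectly even split, used for illustration only.
    """
    if live_nodes <= 0:
        raise ValueError("at least one live node is required")
    return total_ops_per_sec / live_nodes


def extra_load_fraction(nodes_before: int, nodes_lost: int = 1) -> float:
    """Fractional increase in per-node load after losing nodes (even split assumed)."""
    nodes_after = nodes_before - nodes_lost
    if nodes_after <= 0:
        raise ValueError("no live nodes would remain")
    return nodes_before / nodes_after - 1.0
```

For example, in a hypothetical four-node cluster serving 2,000,000 reads per second, `per_node_load(2_000_000, 4)` gives 500,000 operations per second per node, and `extra_load_fraction(4)` shows that each survivor would need roughly one third more headroom to hold the cluster-wide rate steady after one node drops.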
Normal operation, with all of the nodes online, shows that the system can support at least 4 million operations per second, while each node maintains over seven terabytes of data. Data is stored with a replication factor of three, and transactions are carried out at the quorum consistency level.
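The combination of a replication factor of three and quorum consistency is what lets requests keep succeeding with one replica down. A short sketch of the arithmetic, using the standard quorum definition of a majority of replicas (the helper names are illustrative and are not part of any Scylla API):

```python
def quorum(replication_factor: int) -> int:
    """Number of replicas that must respond for a quorum (a strict majority)."""
    if replication_factor < 1:
        raise ValueError("replication factor must be at least 1")
    return replication_factor // 2 + 1


def tolerated_replica_failures(replication_factor: int) -> int:
    """Replicas that can be unavailable while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)
```

With a replication factor of three, `quorum(3)` is 2 and `tolerated_replica_failures(3)` is 1, so the loss of a single node still leaves enough replicas to satisfy every quorum read and write.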