How ScyllaDB performs on the new I8g and I8ge instances, across different workload types
Let’s start with the bottom line. For ScyllaDB, the new Graviton4-based i8g instances deliver up to 2x the throughput of i4i with better latency, and i8ge delivers up to 2x the throughput of i3en, also with better latency. Benchmarks also show single-digit millisecond latency during maintenance operations like scaling. Fast and smooth scaling is an important part of the new ScyllaDB X Cloud offering.
The chart below shows ScyllaDB’s max throughput under a 10 ms P99 latency SLA for different workloads, on the old i4i and i3en and the new i8g and i8ge.
AWS recently launched the I8g and I8ge storage-optimized EC2 instances powered by AWS Graviton4 CPUs and 3rd-generation AWS Nitro SSDs. They’re designed for I/O-intensive workloads like real-time databases, search, and analytics (so a nice fit for ScyllaDB).
| Instance Family | Use Case | Number of vCPUs per instance | Storage |
|---|---|---|---|
| i8g | Compute bound | 2 to 96 | 0.5 to 22.5 TB |
| i8ge | Storage bound | 2 to 192 | 1.25 to 120 TB |
Reduced TCO in ScyllaDB Cloud
Based on our performance results, ScyllaDB users migrating to Graviton4 can reduce infrastructure requirements by up to 50% compared to the previous-generation i4i and i3en instances. This translates into a significantly lower total cost of ownership (TCO), since fewer nodes are needed to sustain the same workload.
These improvements stem from a few factors – both in the new instances themselves, and in the match between ScyllaDB and these instances.
The new I8g architecture features:
- vCPU-to-core mapping: On x86, each vCPU is one of two hyperthreads sharing a physical core; on i8g (ARM), each vCPU maps to a full physical core
- Larger caches: 64 KB instruction cache and 64 KB data cache per core, compared to 32 KB/48 KB on Intel (shared between the two hyperthreads)
- Faster storage and networking (see spec above)
In addition, ScyllaDB’s design allows it to take full advantage of the new server types:
- The shard-per-core architecture scales with linear performance to any number of cores
- The I/O scheduler takes full advantage of the 3rd-generation AWS Nitro SSDs, exploiting their higher I/O rates and lower latency without overloading them and driving latency back up
- ARM’s relaxed memory model suits Seastar applications. Since locks and fences are rare, the memory subsystem has more opportunities to reorder memory accesses to optimize performance.
What this means for you
i8g and i8ge are now available on ScyllaDB Cloud.
If you’re running ScyllaDB Cloud, the net impact is:
- Compute-bound workloads: Move from i4i to i8g. This should provide up to 2x throughput at the same ScyllaDB Cloud price.
- Storage-bound workloads: Move from i3en to i8ge. Here, you should expect up to 2x higher throughput at the same ScyllaDB Cloud price. Note that using the new ScyllaDB dictionary-based compression can lower the storage cost further.
For both use cases, ScyllaDB can keep the 10ms P99 latency SLA during maintenance operations, including scaling out and scaling down.
What we measured
- Max Throughput: The maximum requests per second the database can handle
- Max Throughput under SLA: The maximum requests per second under a P99 latency of 10 ms. Only throughput with latency below this SLA counts. This throughput can be sustained under any operation, such as scaling and repair. This is the number you should use when sizing your ScyllaDB database on i8g instances.
- P99 Latency: The P99 latency measured at the Max Throughput under SLA
Results
Read Workload – cached data
Cached data: working set size < available RAM, resulting in close to 100% cache hit rate.
| Instance type | Max throughput | Max Throughput Under Latency SLA | Improvement | P99 in ms |
|---|---|---|---|---|
| i4i.4xlarge | 1,062,578 | 750,000 | 100% | 7.84 |
| i8g.4xlarge | 1,434,215 | 1,300,000 | 135% | 6.29 |
| i3en.3xlarge | 585,975 | 550,000 | 100% | 4.37 |
| i8ge.3xlarge | 962,504 | 800,000 | 164% | 6.38 |
Read Workload – non-cached data, storage only
Non-cached data: working set size >> available RAM, resulting in 0% cache hit rate. When most of the data is not cached, storage becomes a significant factor for performance.
| Instance type | Max throughput | Max Throughput Under Latency SLA | Improvement | P99 in ms |
|---|---|---|---|---|
| i4i.4xlarge | 218,674 | 210,000 | 100% | 4.56 |
| i8g.4xlarge | 444,548 | 300,000 | 203% | 4.24 |
| i3en.3xlarge | 145,702 | 140,000 | 100% | 6.83 |
| i8ge.3xlarge | 259,693 | 255,000 | 178% | 7.95 |
Write Workload
| Instance type | Max throughput | Max Throughput Under Latency SLA | Improvement | P99 in ms |
|---|---|---|---|---|
| i4i.4xlarge | 289,154 | 150,000 | 100% | 2.4 |
| i8g.4xlarge | 689,474 | 600,000 | 238% | 4.02 |
| i3en.3xlarge | 217,072 | 200,000 | 100% | 5.42 |
| i8ge.3xlarge | 452,968 | 400,000 | 209% | 3.41 |
Tests under maintenance operations
ScyllaDB takes pride in testing under realistic use cases, including scaling out and in, repair, backups, and various failure tests.
The following results show the average P99 latency (across all nodes) for different maintenance operations on a 3-node cluster of i8ge.3xlarge, using the setup below.
Setup
- ScyllaDB version: 2025.3.1-20250907.2bbf3cf669bb
- DB node amount: 3
- DB instance types: i8ge.3xlarge
- Loader node amount: 4
- Loader instance type: c5.2xlarge
- Throughput: Read 41K, Write 81K, Mixed 35K
Results
Read Test: Read Latency
| Operation | Read P99 latency in ms |
|---|---|
| Base: Steady State | 0.95 |
| During Repair | 4.92 |
| During Add Node (scale out) | 2.68 |
| During Replace Node | 3.10 |
| During Decommission Node (downscale) | 2.44 |
Write Test: Write Latency
| Operation | Write P99 latency in ms |
|---|---|
| Steady State | 2.22 |
| During Repair | 3.24 |
| Add Node (scale out) | 2.49 |
| Replace Node | 3.07 |
| Decommission Node (downscale) | 2.37 |
Mixed Test: Write and Read Latency
| Operation | Write P99 Latency in ms | Read P99 Latency in ms |
|---|---|---|
| Steady state | 2.03 | 2.11 |
| During Repair | 3.21 | 4.70 |
| Add Node (scale out) | 2.19 | 2.71 |
| Replace Node | 3.00 | 3.37 |
| Decommission Node (downscale) | 2.20 | 3.05 |
The results indicate that ScyllaDB can meet the latency SLA during maintenance operations. This is critical for ScyllaDB Cloud, and in particular ScyllaDB X Cloud, where scaling out and in is automatic and can happen multiple times per day. It’s also critical in unexpected failure cases, when a node must be replaced rapidly without hurting availability or violating the latency SLA.
Test Setup
ScyllaDB cluster
- 3-node cluster
- i4i.4xlarge vs. i8g.4xlarge
- i3en.3xlarge vs. i8ge.3xlarge
Loaders
- Loader node amount: 4
- Loader instance type: c7i.8xlarge
Workload
- Replication Factor (RF): 3
- Consistency Level (CL): Quorum
- Data size: 650 GB for read/mixed, 1.5 TB for write

