The ZombieLoad Pragmatist: Tips for Surviving in a Post-Meltdown World
We live in a post-Meltdown world. After the discovery of Meltdown and Spectre, it was only a matter of time before bad actors realized that side channels were a promising attack vector and started looking for more.
Once the floodgates opened, the natural prediction was that side-channel attacks would keep coming. And, as predicted, there is a new processor vulnerability in town, or rather a family of vulnerabilities. Because these flaws were discovered by several different research teams, you may see them referred to by many names: RIDL, Fallout, MDS, or my personal favorite, “ZombieLoad.”
Meltdown and Spectre exploited speculative execution in the processor to allow unprivileged applications to read memory belonging to the operating system. Operating systems were patched to be stricter about the way the operating system’s memory is mapped, causing a slowdown in most applications. Scylla, due to its mostly-userspace architecture, escaped relatively unscathed.
ZombieLoad is an attack that exploits Intel’s Hyperthreading implementation, allowing attackers to infer the values of protected data held in microarchitectural components of the processor such as the store buffer, fill buffer, or load ports. Some vendors rushed to recommend that users disable Hyperthreading, and shortly afterwards Intel released a microcode update to protect against the issue.
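To see where a given Linux machine stands, recent kernels publish their own view of the MDS family under sysfs. The following is a minimal sketch, assuming a kernel new enough to expose `/sys/devices/system/cpu/vulnerabilities/mds` and `/sys/devices/system/cpu/smt/control`. The helper names `read_sysfs` and `summarize_mds` and the wording of the verdicts are illustrative choices, not part of any official tool.

```python
from pathlib import Path

# Standard Linux sysfs location; the files may be absent on older kernels or other OSes.
SYSFS_CPU = Path("/sys/devices/system/cpu")


def read_sysfs(relative_path, base=SYSFS_CPU):
    """Return the stripped contents of a sysfs file, or None if it cannot be read."""
    try:
        return (Path(base) / relative_path).read_text().strip()
    except OSError:
        return None


def summarize_mds(mds_status, smt_control):
    """Turn the kernel's MDS report into a short verdict (illustrative wording).

    mds_status:  contents of vulnerabilities/mds, or None if unavailable
    smt_control: contents of smt/control, or None if unavailable
    """
    if mds_status is None:
        return "unknown (kernel does not report MDS status)"
    if mds_status.startswith("Not affected"):
        return "not affected"

    buffers_cleared = mds_status.startswith("Mitigation")
    # SMT is not a concern if it is off, unsupported, or reported as mitigated.
    smt_safe = (
        smt_control in ("off", "forceoff", "notsupported")
        or "SMT disabled" in mds_status
        or "SMT mitigated" in mds_status
    )

    if buffers_cleared and smt_safe:
        return "mitigated (buffer clearing active, no SMT exposure reported)"
    if buffers_cleared:
        return "partially mitigated (SMT siblings may still leak)"
    return "vulnerable (update microcode and kernel)"


if __name__ == "__main__":
    status = read_sysfs("vulnerabilities/mds")
    smt = read_sysfs("smt/control")
    print("kernel report:", status)
    print("SMT control:  ", smt)
    print("verdict:      ", summarize_mds(status, smt))
```

The verdict is only as good as the kernel's own report, so treat it as a quick sanity check rather than an audit.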
While savvy organizations will be quick to follow such instructions and apply patches to protect against the known vulnerabilities through code, protecting against the vulnerabilities yet to come is possible only through solid and safe architectural and operational decisions. For example, if you fundamentally can’t trust your neighbors, you’re better off not having any.
One good example of how wise decisions can increase security is the decision by the major cloud providers not to oversubscribe CPUs. Because these new attacks rely on threads on the same core being able to observe each other, never assigning a fraction of a core to a VM, and never sharing one core among different VMs, mitigates these particular attacks.
For future attacks, security-conscious developers and infrastructure maintainers can protect themselves by minimizing the amount of shared infrastructure, which means shared VM and container infrastructure in particular. This was always a theoretical concern, but the new flaws push it front and center.
As usual, security is in a constant tug-of-war against economics and flexibility: for the application layer, the microservices architecture is just too convenient and powerful to pass up, so developers may want to look at alternative ways to make sure they are safe in the presence of side-channel attacks.
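The "never share a core" rule can be checked mechanically. On Linux, each logical CPU lists its hardware-thread siblings in `/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`. The sketch below assumes you have already collected those strings and that you know which logical CPUs each tenant is pinned to. The function names and the tenant-to-CPU mapping format are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def parse_cpu_list(text):
    """Expand a Linux CPU list such as "0-1,8" into [0, 1, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def physical_cores(siblings_by_cpu):
    """Group logical CPUs into physical cores.

    siblings_by_cpu maps a logical CPU number to the raw text of its
    topology/thread_siblings_list file.
    """
    return {frozenset(parse_cpu_list(text)) for text in siblings_by_cpu.values()}


def shared_cores(assignment, cores):
    """Return the cores whose hardware threads are split across tenants.

    assignment maps a tenant name to the set of logical CPUs it is pinned to.
    The result maps each offending core to the tenants that touch it.
    """
    tenants_per_core = defaultdict(set)
    for tenant, cpus in assignment.items():
        for core in cores:
            if core & set(cpus):
                tenants_per_core[core].add(tenant)
    return {core: names for core, names in tenants_per_core.items() if len(names) > 1}
```

For a hypothetical 4-core, 8-thread box where CPU `n` and CPU `n + 4` are siblings, pinning one VM to CPUs `{0, 1}` and another to `{4, 5}` leaves two cores shared between tenants, while pinning them to `{0, 4}` and `{1, 5}` leaves none.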
For infrastructure, the situation is different. Modern databases adopt horizontally scalable architectures and, as a side effect, usually don’t worry much about scaling up. Public cloud providers, on the other hand, have been trying to benefit from economies of scale and get the most out of their instances, which often means encouraging multi-tenancy on large physical servers.
Users also have the choice of being the single tenant on their own bare-metal instances, which implicitly protects against side-channel attacks by enforcing isolation. But if horizontal scaling is imposed by the database’s inability to scale up well, the extra resources of a large physical server will not translate into better performance or a lower node count. They are wasted, which tips the balance in favor of enduring a systemic security risk.
This is one of the reasons Scylla is such an attractive piece of infrastructure for engineers who want to protect against side-channel attacks: it scales linearly both up and out, so the user can choose between doubling the size of their machines while halving their number, or the other way around. It can efficiently utilize all of the cores of a big bare-metal box, meaning service providers aren’t losing out due to idle resources, and customers, seeing high system utilization, are getting the most return out of their infrastructure investments.
In Scylla, all operations, from foreground ingestion and reads to maintenance operations like data streaming and compactions, scale up linearly with the amount of resources (disk speed, number of cores, etc.).
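The up-versus-out trade-off is simple arithmetic once linear scaling is taken as a given. The sketch below assumes perfectly linear scaling with core count, as claimed above, and ignores replication factor, failure domains, and per-node overhead. The 384-core workload and the node sizes are made-up numbers for illustration only.

```python
import math


def nodes_needed(total_cores, cores_per_node):
    """Number of nodes required to supply total_cores, rounding up."""
    return math.ceil(total_cores / cores_per_node)


def topology_options(total_cores, node_sizes):
    """Map each candidate node size (in cores) to the node count it implies."""
    return {size: nodes_needed(total_cores, size) for size in node_sizes}


if __name__ == "__main__":
    # Hypothetical workload that needs 384 cores in total.
    for size, count in topology_options(384, [8, 16, 32, 64, 96]).items():
        print(f"{size:>3} cores/node -> {count:>2} nodes")
```

Under these assumptions, each doubling of node size halves the node count, so the same workload can run on 48 small nodes or 4 large ones.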
Another key benefit of scaling up is minimizing your attack surface. A few large nodes are far easier to defend than a sprawling horizontal cluster. Thus, if you redeploy from a sprawling containerized horizontal topology today to a vertically scaled bare-metal or reserved-instance topology tomorrow, many of the risks associated with ZombieLoad and other side-channel attacks are greatly reduced or even effectively eliminated.
At Scylla, we believe users should never have to make the fool’s choice between economics and security. Scylla’s ability to scale up and out means the user can run larger machines that effectively utilize the entire physical server, without breaking the bank. With nobody listening to the side channel, side-channel attacks, present or future, are of no concern.
ZombieLoad Survival Pocket Guide
- If you can’t trust your neighbors, don’t have any; avoid multi-tenancy where possible.
- If you need Hyperthreading for performance, make sure you are running systems that are isolated; again, avoid multi-tenancy if you can and use applications that dominate entire cores.
- Reduce your threat surface by having fewer, taller boxes. But make sure your systems, like your database, can utilize those resources and scale vertically.