Case Study: Zillow Scales Background and Real-time Workloads with ScyllaDB

By Dan Podhola, Principal Software Engineer, Zillow

About Zillow

With $2.7B of annual revenue in 2019, Zillow is the most visited real estate website in the United States. Zillow maintains data on approximately 110 million homes across the U.S., offering a range of features for home buyers, sellers, and renters, including value estimates, value changes of each home in a given time frame, aerial views of homes, and prices of comparable homes in the area. It also provides basic information on any given home.

The Challenge

To drive these services, the IT team at Zillow is responsible for processing property and listing records, mapping those to common identifiers, and translating messages into a common interchange format that can be shared across teams. A listing processor service uses ScyllaDB as a storage layer, taking messages from two queues, publishing to a Kinesis stream, and backfilling messages to an S3 bucket.

Consuming messages from multiple sources in the same cluster can be problematic, especially when those sources provide data from multiple queues and cannot guarantee message ordering.

Dan Podhola is a Principal Software Engineer at Zillow. Podhola explains, “Our biggest challenge is that we have a highly threaded application, and we receive two different message queues from two different producers. As a result, messages can be received out of order, simply by the nature of the queues. The application cannot go ‘back in time’ to reorder messages.”

The Solution

The Zillow team was able to solve the problem without resorting to application locks, or even using ScyllaDB’s lightweight transactions (LWTs). According to Podhola, “We provide the ScyllaDB write timestamp and employ a couple other tricks to provide correct and consistent data to our consuming services and avoid doing transactions.”
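The idea of supplying the write timestamp can be illustrated with a minimal sketch. It is not Zillow's code. It assumes each message carries a producer-assigned timestamp in microseconds, and it uses a small in-memory class to mimic the last-write-wins reconciliation that the database applies when a statement includes CQL's `USING TIMESTAMP` clause. The table name `listings`, its columns, and the class `LastWriteWinsTable` are hypothetical. The "couple other tricks" mentioned in the quote are not described in the source and are not modeled here.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# Illustrative CQL only: the table and column names are assumptions.
# USING TIMESTAMP lets the client, not the server, choose the write timestamp.
UPSERT_CQL = (
    "INSERT INTO listings (listing_id, payload) VALUES (?, ?) "
    "USING TIMESTAMP ?"
)


@dataclass
class Cell:
    value: str
    write_ts: int  # microseconds since the epoch, supplied by the writer


class LastWriteWinsTable:
    """In-memory stand-in that keeps, per key, the value with the highest timestamp."""

    def __init__(self) -> None:
        self._rows: Dict[str, Cell] = {}

    def upsert(self, key: str, value: str, write_ts: int) -> None:
        current = self._rows.get(key)
        # A write only replaces the stored value if its timestamp is newer,
        # so arrival order does not matter. Tie-breaking on equal timestamps
        # is simplified here: the existing value is kept.
        if current is None or write_ts > current.write_ts:
            self._rows[key] = Cell(value, write_ts)

    def get(self, key: str) -> Optional[str]:
        cell = self._rows.get(key)
        return cell.value if cell is not None else None


def apply_messages(table: LastWriteWinsTable, messages) -> None:
    """Apply (key, value, write_ts) messages in whatever order they arrive."""
    for key, value, write_ts in messages:
        table.upsert(key, value, write_ts)
```

In this sketch, a stale message that arrives late (for example, a price update stamped 1,000 arriving after one stamped 2,000) is silently discarded, so multiple threads consuming from two queues converge on the same final state without locks or lightweight transactions.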

Today, Zillow runs ScyllaDB in production on a cluster of three i3.4xlarge AWS instances. The listing service supports real-time workloads on a single c5.xlarge instance, which autoscales to three instances to support traffic spikes. The backfill services autoscale to 35 c5.xlarge instances.

According to Podhola, the real benefit of ScyllaDB is its ability to scale in support of both background processes and real-time, user-facing workloads.

“No one even realizes that we are processing the entirety of Zillow’s property and listings data, maybe to correct some data issue or change a business rule,” explains Podhola. “The real beauty is that, on just three nodes, we can scale up to 35 of the c5.xl instances and we’ll process over 6500 records per second, while also running the real-time workloads.”