Case Study: Dstillery Customizes Adtech Audiences at Scale with ScyllaDB

By Hang Chan, Site Reliability Engineer, Dstillery

About Dstillery

Dstillery is the leading applied data science company serving the marketing and advertising industries. Dstillery provides custom AI, tailored, and pre-built audience models, built on ten million attributes, to create the best audiences for companies across industries. Dstillery’s models score hundreds of millions of candidate members to identify the audience matching the brand’s ‘DNA’ at scale. Candidates are scored in and out of audiences every 24 hours, so audiences are always up-to-date and on-target.

Dstillery evaluates the trustworthiness of data sources by identifying correlations between location and online activity. Using data to create location-based audiences, Dstillery can identify customers who have previously visited a location or are predicted to visit one in the future. Advertisers can use this data to target messages to those audiences, and can layer on other location, web, and app behaviors to generate insights about these users. Location visits can also shed light on users from a web-based audience, revealing their real-world habits and helping marketers understand that audience’s activities and motivations.

The Challenge

To do this accurately at scale, Dstillery relies on device data, which is refreshed daily. Working with such a huge dataset requires the ability to read and write at the scale of hundreds of billions of requests per day. To complement that level of scale, Dstillery also needs incredibly low latency, with timeouts of 25 milliseconds or less.

Striving to achieve this level of performance and scale, Dstillery struggled with Cassandra. The team hit a barrier: no matter how many nodes were added to the cluster, the failure rate stayed the same. Even during periods of very low traffic, the team saw a 0.1% failure rate with Cassandra, which amounts to thousands of failures. In Dstillery’s business, thousands of failures significantly impact revenue and customer churn.

The team clearly needed to find an alternative database.

The Solution

After a thorough evaluation process, Dstillery chose to migrate to ScyllaDB.

“With Cassandra, we needed to tune about 40 different settings for garbage collection, JVM, memtables, compaction, and cache sizes,” said Hang Chan, Site Reliability Engineer at Dstillery. “Since ScyllaDB tunes itself to the workload, we just set the host IP, cluster name, seeds, and let it go, knowing that the performance will normalize itself over time without any manual intervention.”

By migrating to ScyllaDB, Dstillery reduced the failure rate to nearly 0.0%, or hundreds of failures, rather than the thousands of failures they endured with Cassandra. In one extreme case, ScyllaDB helped Dstillery reduce the failure rate from 14% to 0.8%.
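To put these percentages in perspective, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the request volume below is a hypothetical figure chosen so that a 0.1% failure rate yields "thousands" of failures, not a number reported by Dstillery, and the function names are invented for this example.

```python
def expected_failures(requests: int, failure_rate: float) -> int:
    """Number of failed requests for a given volume and failure rate (0.001 = 0.1%)."""
    return round(requests * failure_rate)


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which the failure rate dropped, relative to the starting rate."""
    return (before - after) / before


# ASSUMPTION: hypothetical low-traffic request volume, not a Dstillery figure.
HYPOTHETICAL_REQUESTS = 5_000_000

# Failures at the 0.1% rate reported for Cassandra, under the assumed volume.
cassandra_failures = expected_failures(HYPOTHETICAL_REQUESTS, 0.001)

# The extreme case from the text: 14% down to 0.8%.
extreme_case_drop = relative_reduction(0.14, 0.008)

print(f"Failures at 0.1%: {cassandra_failures}")
print(f"Relative reduction, 14% -> 0.8%: {extreme_case_drop:.1%}")
```

Under these assumptions, the drop from 14% to 0.8% works out to a relative reduction of roughly 94% in failed requests, whatever the actual request volume.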

Today, Dstillery runs ScyllaDB in 8 clusters of around 130 nodes in 2 data centers, using bare metal nodes with 40 CPU cores each. On one cluster, Dstillery reduced the server count by about 40% by switching to ScyllaDB.

ScyllaDB now stores the device location history that Dstillery’s algorithms use to detect inaccurate location data and to validate location data quality. The team also uses ScyllaDB as a source of various attributes for devices, and to map devices between internal systems and external partners.

“We still have around 200 Cassandra nodes that we didn’t migrate because they don’t have the performance requirements of the clusters we migrated to ScyllaDB,” said Chan. “These are mostly write-heavy workloads that don’t impact us as much.”

Chan also reported that Dstillery found support on the Slack channel and Google Group to be top-notch. “We were having an issue and Glauber [ScyllaDB’s VP of Field Engineering] suggested we try a new setting. I really appreciate that engineers like Glauber are always ready to help when you have issues.”

Looking forward, Dstillery is excited about the SSTable format change in ScyllaDB Open Source 3.0. Reducing storage usage will have a substantial impact, since storage usage has been one of the biggest reasons why Dstillery adds nodes.