Sizmek Turns to Scylla to Power Real-Time-Bidding Ad Platform

About Sizmek

Operating in more than 70 countries, Sizmek is one of the world’s largest independent advertising platforms. The company helps brands to build meaningful, long-lasting relationships with their customers by factoring in each moment of customer interaction to deliver and optimize media across channels including display, video, mobile, and social. Sizmek helps its clients make smarter ad placements by taking into account real-time data on marketing goals, advertising opportunities, user profiles, device, and the contextual attributes of each placement.

Leveraging all these critical attributes for its real-time bidding requires processing massive amounts of data. To streamline the process, Sizmek tries to get as much information on the context of the ad placements and the users as possible. It receives much of this data in regular updates from third-party vendors.

The Challenges of Real-Time Bidding

Bidding on and placing a contextually relevant ad on the specific page that a user is visiting leaves precious little time to act, especially when processing 240 billion such requests per day. That’s why Sizmek has a very stringent SLA for its real-time bidding service: just 70 milliseconds to decide whether to place an ad and, if so, which ad to display.

Sizmek’s first use case for Scylla was fast caching of contextual information about the pages on which it bids for ad placements on behalf of its 2,000 clients. Sizmek works with third-party vendors whose job is to provide updated information on users and page context. However, the round trip of a request sent to a vendor’s API would be prohibitively long. Instead, Sizmek caches all the information about page categories, so that subsequent bids can read it from the cache rather than from the vendor APIs.
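The caching pattern described above can be sketched as a cache-aside lookup with expiring entries. This is a minimal illustration, not Sizmek’s implementation: the `TTLCache` class is an in-memory stand-in for a Scylla table with a TTL, and `page_category` and `fetch_from_vendor` are hypothetical names introduced here. It also assumes a cache miss calls the vendor synchronously, which the source does not confirm.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in for a key-value table whose rows expire after a TTL.

    Assumption: in production this role would be played by the database;
    a dict is used here only so the sketch runs on its own.
    """

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (expiry timestamp, cached value)
        self._rows: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        row = self._rows.get(key)
        if row is None:
            return None
        expires_at, value = row
        if self._clock() >= expires_at:
            # The entry has outlived its TTL: drop it and report a miss.
            del self._rows[key]
            return None
        return value

    def put(self, key: str, value: str) -> None:
        self._rows[key] = (self._clock() + self._ttl, value)


def page_category(url: str, cache: TTLCache,
                  fetch_from_vendor: Callable[[str], str]) -> str:
    """Return the category for a page, calling the vendor only on a miss."""
    cached = cache.get(url)
    if cached is not None:
        return cached
    category = fetch_from_vendor(url)  # slow path: hypothetical vendor call
    cache.put(url, category)           # later bids for this page hit the cache
    return category
```

With this shape, only the first bid for a given page pays the vendor round trip; every later bid within the TTL window is served from the cache.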

Additionally, Sizmek must reference a set of IP addresses to ensure it is not placing ads on pages that have been deemed invalid. Every 24 hours, Sizmek receives a file of more than 60 million blacklisted IP addresses (bots, hijacked devices, etc.) from a third-party vendor and inserts it into Scylla. It also receives a refresh every 15 minutes.
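The lookup side of this check can be illustrated with a small sketch. It is a hedged, in-memory example only: the `IpBlocklist` class and its methods are hypothetical names, a Python set stands in for the Scylla table, and `load` assumes that each delivered file fully replaces the previous contents, which the source does not specify.

```python
import ipaddress
from typing import Iterable, Set, Union

IpAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


class IpBlocklist:
    """Hypothetical in-memory stand-in for a table of blacklisted IPs."""

    def __init__(self) -> None:
        self._blocked: Set[IpAddress] = set()

    def load(self, addresses: Iterable[str]) -> None:
        """Replace the current contents with a freshly delivered list.

        Assumption: a delivery supersedes the previous one entirely.
        """
        # Parsing normalizes equivalent spellings of the same address.
        self._blocked = {ipaddress.ip_address(a.strip()) for a in addresses}

    def is_blocked(self, address: str) -> bool:
        """Return True if the address is on the list.

        Raises ValueError if the string is not a valid IP address.
        """
        return ipaddress.ip_address(address.strip()) in self._blocked

    def __len__(self) -> int:
        return len(self._blocked)


if __name__ == "__main__":
    blocklist = IpBlocklist()
    # Documentation-range addresses, used purely as sample data.
    blocklist.load(["192.0.2.1", "2001:db8::1"])
    print(blocklist.is_blocked("192.0.2.1"))
    print(blocklist.is_blocked("192.0.2.2"))
```

Parsing each address before storing it means that different textual spellings of the same IPv6 address compare equal at lookup time.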

Accomplishing all of this calls for not only high throughput, but also very low and consistent latency.

“Scylla fit our needs perfectly because it doesn’t suffer the issues like garbage collection that come with other databases.”

Andrej Chu, Software Engineer, Sizmek

The Right Database for the Job

Sizmek considered several databases before choosing Scylla. The primary reasons for the choice were performance, latency, and ease of use. The team also wanted a database with Time to Live (TTL) capabilities. Ease of implementation was key for Sizmek’s Streaming Infrastructure Team, which was given a very aggressive timeline for taking its systems into production. “We picked Scylla and just got it done: seven data centers up and running in two months,” explains Sizmek software engineer Lubos Kosco.

Another software engineer at Sizmek, Andrej Chu, gives further insight into their selection of Scylla. “While we’re all impressed by Scylla’s speed, the more pressing issue for us was finding a database that would be able to give us really low latencies. Scylla fit our needs perfectly because it doesn’t suffer the issues like garbage collection that come with other databases.”

By the Numbers

  • 2,000 clients (ad bidders) requiring information
  • 240 billion requests per day
  • 7 data centers
  • 3-node Scylla cluster for each data center
  • 15,000 requests per second on each node of the Scylla cluster
  • 70 ms SLA for each bid
  • Read latencies under 1 ms at the 99th percentile
  • Write latencies of about 1 ms at the 99th percentile
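To put these figures in perspective, the short calculation below converts the daily request volume into an average per-second rate and multiplies out the stated per-node throughput. It is back-of-the-envelope arithmetic on the numbers listed above. The source does not say how bid requests map to Scylla requests, so the two results describe different things and should not be compared directly; the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Figures taken from the list above.
bid_requests_per_day = 240_000_000_000
data_centers = 7
nodes_per_data_center = 3
requests_per_second_per_node = 15_000

# Average bid-request rate, assuming load were spread evenly over the day.
avg_bid_requests_per_second = bid_requests_per_day / SECONDS_PER_DAY

# Aggregate Scylla request rate implied by the per-node figure.
total_nodes = data_centers * nodes_per_data_center
scylla_requests_per_second = total_nodes * requests_per_second_per_node

print(f"{avg_bid_requests_per_second:,.0f} bid requests/s on average")  # 2,777,778
print(f"{total_nodes} Scylla nodes in total")                           # 21
print(f"{scylla_requests_per_second:,} Scylla requests/s in aggregate")  # 315,000
```

Real traffic is unlikely to be uniform across the day, so peak rates would be higher than the average shown here.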