GumGum is a global media company that analyzes digital media in a new way, scanning sites for relevant or risky content before ads are served. Using the latest in deep learning technology, GumGum has spent ten years optimizing its content analysis capabilities. Today, GumGum’s contextual intelligence platform is unique in its ability to scan text, images, videos, and audio when evaluating online content.
The heart of GumGum’s platform is Verity, an artificial intelligence platform that provides sophisticated in-image and in-video contextual targeting. Verity uses natural language processing along with computer vision technology to analyze the context of a website. It is able not only to scan the images, but also to interpret those images in the context of the rest of the page to derive very accurate marketing segments. This enables GumGum to find the correct context for an ad. Verity is also able to target ads without relying on cookies, providing insulation against a more restrictive regulatory environment.
But at the end of the day, what powers GumGum is an ad server. According to Keith Sader, engineering manager of the Ad Server team at GumGum, “this is what I’d call a typical high scalability ad servicing system.” Requests come in from direct campaigns, from server-side providers who are looking for ads that GumGum is trying to supply, and from the publishers themselves, who have contracts to serve ads on GumGum’s system.
To support the system, GumGum runs a set of data stores. Campaign information, such as targeted platforms and associated creatives, is managed in a relational data store. DynamoDB is used for items dynamically linked to an ad, such as visitor cookie tracking, images, and web URLs.
The backend of the Verity platform is Tally. As its name suggests, Tally counts everything, including impressions, views, clicks, revenue, video streams, and play percentages, all in the course of serving advertisements. Tally lets the system decide which ad to serve based on a number of factors, such as proportions of impressions, revenue optimization, whether an ad has fulfilled its campaign goal, and so on.
“We got a much better and performant data system around the globe.”
– Keith Sader, Engineering Manager of the Ad Server Team, GumGum
The Tally backend was originally implemented in Cassandra, running in a cluster of 51 i32.xls instances around the globe, including Virginia, Oregon, Japan, and Ireland. The basic cost for Cassandra was almost $200,000 per year, not including staff time. With at least 20% of two staff members’ time spent on Cassandra each year, the additional operational cost ranged between $30,000 and $60,000 a year.
Sader explained that “Cassandra is fantastic if you can throw a development team at it, and we’re not going to do that. We’re making an ad server with computer vision, which is almost an entirely different problem.”
Beyond the operational overhead and costs, Cassandra presented other issues. It did not perform well under high loads, and adding new nodes to support more traffic proved to be a laborious manual procedure. The team also lost vendor support and was required to run a legacy version of Cassandra, 2.1.15.
In search of alternatives, the GumGum team started evaluating a range of vendors, including DataStax, DynamoDB, Redis, and ScyllaDB.
The GumGum team eliminated DataStax from the evaluation based on its high cost. They eliminated DynamoDB because it would require major changes to the Tally backend. Redis proved to be a poor fit because Tally is read-heavy and also requires consistent writes.
The team saw ScyllaDB as an obvious choice. As a fully managed service, ScyllaDB Cloud provides a good combination of price and performance, while virtually eliminating operational overhead. Thanks to ScyllaDB’s API compatibility with Cassandra, the migration would require no major changes to the Tally system.
Today, GumGum runs ScyllaDB on 21 servers in North America, six servers in Ireland, and three servers in Japan.
“(Moving to ScyllaDB Cloud) was the big win for us,” Sader explained. “For just $13,000 [more], we got a much better and performant data system around the globe. With ScyllaDB being more efficient at the hardware level, we will even be able to scale some of these node sizes back.”
By The Numbers
Cassandra Tally Cluster
Cluster size: 51 instances
Total annual cost: $199,252
Staff time: $30K – $60K
TCO: $229,252 – $259,252
ScyllaDB Tally Cluster
Cluster size: 30 instances
Total annual cost: $212,528
Staff time: N/A
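
The arithmetic behind these figures can be checked with a short Python sketch. It uses only the numbers listed above; the variable names are illustrative, and treating the ScyllaDB staff time ("N/A") as zero is an assumption made here for the comparison, not a figure from GumGum.

```python
# Annual figures in USD, taken from the "By The Numbers" list above.
CASSANDRA_INSTANCES = 51
CASSANDRA_BASE_COST = 199_252
CASSANDRA_STAFF_TIME = (30_000, 60_000)  # low and high estimates

SCYLLADB_INSTANCES = 30
SCYLLADB_BASE_COST = 212_528
SCYLLADB_STAFF_TIME = 0  # assumption: "N/A" is treated as zero

# Cassandra total cost of ownership: base cost plus the staff-time range.
cassandra_tco = tuple(CASSANDRA_BASE_COST + s for s in CASSANDRA_STAFF_TIME)

# Difference in base cost between the two clusters.
base_cost_delta = SCYLLADB_BASE_COST - CASSANDRA_BASE_COST

# Savings once Cassandra staff time is included (low and high estimates).
scylladb_tco = SCYLLADB_BASE_COST + SCYLLADB_STAFF_TIME
tco_savings = tuple(t - scylladb_tco for t in cassandra_tco)

# Reduction in cluster size.
instance_reduction = CASSANDRA_INSTANCES - SCYLLADB_INSTANCES

print(f"Cassandra TCO: ${cassandra_tco[0]:,} to ${cassandra_tco[1]:,}")
print(f"Base cost difference: ${base_cost_delta:,}")
print(f"TCO savings: ${tco_savings[0]:,} to ${tco_savings[1]:,}")
print(f"Instances removed: {instance_reduction}")
```

Under that assumption, the ScyllaDB cluster costs about $13,000 more per year in base cost but comes out ahead once the Cassandra staff time is counted, while running on 21 fewer instances.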