MediaMath is one of many companies coming to Scylla Summit 2017. Their technology and services enable marketers to reach audiences at scale, so they take big data and scalability seriously. John Turner of MediaMath will be giving a talk at Scylla Summit 2017 on how they use Apache Cassandra/Scylla to store user data and access it to support real-time bidding (RTB). Let’s see what John will be presenting.
Please tell us about yourself and what you do at MediaMath.
I am the VP of Engineering at MediaMath. Before joining MediaMath, I was CTO of a small startup that was acquired in 2014. Since then, I’ve led a team utilizing machine learning to transform a huge dataset into actionable audiences.
How did you get involved with machine learning?
I joined a startup in 2013 that was looking at using neural networks as the basis for modeling audiences. Prior to that I had only been exposed to AI in graduate school.
How do you think most organizations are using machine learning these days?
In the ad-tech space it is mostly manual modeling by teams of data scientists. We have taken a different approach, where our custom models are built by software and not people. This is closer to the way other companies use machine learning for things like recommendation systems. If you have a user base the size of Amazon or Netflix you can’t rely on people in this process.
What do you think are the barriers for organizations to adopt machine learning?
The size of the data, sparsity of the data, and processing time. It can take a very large cluster to process the data. It can take a very long time to get back results. These two things can add up to a large bill. I think the key is teaming up good engineers with good data scientists to build smart solutions.
What will you be talking about at Scylla Summit 2017?
I will be giving a talk on how MediaMath uses Apache Cassandra/Scylla to store user data and access it to support real-time bidding (RTB). RTB is the process of buying and selling online ad impressions through real-time auctions.
What are some of the outcomes of Applied Learning for Real-time for MediaMath?
Our platform was based on a large batch process that does all the heavy lifting. This worked fine up to a point, but to keep scaling as we plan, we need to be smarter about how we do things. Moving to a process that is more on-demand and streaming means we can compute only what is needed, and we can do it closer to the time it is needed.
What problems have you experienced with Apache Cassandra?
Performance at scale. My understanding is that there were struggles getting the response times low enough for our high demand as we scaled up the data. We spent a long time working with Apache Cassandra and feel better at this point, but there are still concerns around adding more data.
What type of audience will be interested in your talk?
People working in ad-tech or other big data fields.
Can you please tell me more about your talk?
In my talk, I will go over what MediaMath is and talk about the MediaMath Data Management Platform (DMP). DMP empowers marketers to control and activate their data seamlessly, giving them the tools to onboard, segment, understand and activate their data – in real time and without data loss – in omnichannel environments.
I will also go over the current architecture in detail, including adaptive segments and how we use Apache Cassandra. I will then talk about our future architecture that uses Scylla and how we are using it now.
Where can we learn more about your talk?
To learn more, check out our War Stories post on our developer blog. The post is part one of a multi-part series exploring the successes (and scars) that we’ve had while tuning Apache Cassandra to perform well in MediaMath’s Data Management Platform.
How can people get in touch with you?
The best way to get in touch with me is on Twitter.
Thank you very much, John. We cannot wait to see your talk in person and learn more. If you want to attend Scylla Summit 2017 and enjoy more than 40 talks like this one, please register here.
Scylla Summit is taking place in San Francisco, CA on October 24-25. Check out the full agenda on our website to learn about the rest of the talks—including technical talks from the Scylla team, the Scylla roadmap, and a hands-on workshop where you’ll learn how to get the most out of your Scylla cluster.
Apache® and Apache Cassandra® are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries. No endorsement by The Apache Software Foundation is implied by the use of these marks.