Natura’s Short and Straight Road from Cassandra to Scylla
Natura is a multibillion-dollar, Brazil-based consumer beauty products company that prides itself on its ecological friendliness, bioethics and sustainable business practices. Its operations span over 70 countries, 3,200 stores, 17,000 employees, and a human network of 1.8 million sales consultants. These operations support sales to over 100 million consumers.
The global powerhouse comprises three major brands: Natura, founded in Brazil; Aēsop, founded in Australia; and The Body Shop, founded in the United Kingdom.
Supporting Natura’s global operations requires managing a tremendous amount of consumer data. Felipe Moz, Big Data Engineer at Natura, took the time at Scylla Summit 2018 to describe Natura’s corporate data infrastructure, and why they migrated from DataStax Enterprise (a proprietary version of Apache Cassandra) to Scylla to continue scaling their business.
Natura’s architecture is front-ended with NGINX and Node.js. Data from user activity on their website is streamed over Apache Kafka into Apache Spark jobs, which handle about 160,000 RDDs of streaming data and around 40 “long batch” jobs each day.
In their architecture, Scylla replaced DSE for key-value data, while MongoDB is still used for document-oriented data. Natura replaced DSE to avoid JVM issues, after suffering significant performance problems.
Figure 1: Natura’s Big Data Architecture, deployed and managed using Docker and Kubernetes, includes NGINX, Apache Kafka and Apache Spark, NoSQL databases including MongoDB and Scylla, as well as a Talend integration to an Oracle SQL database.
They aren’t our provider. They are a part of our team.
— Felipe Moz, Natura
Beyond the product aspects of Scylla, Felipe emphasized the value of ScyllaDB’s enterprise support. “They know our use case and data modeling.” He also appreciated having direct access to developers, when needed, through Scylla’s Slack channel.
Then Felipe got down to the bottom line, comparing the cost of Scylla favorably against DataStax Enterprise. Provisioning hardware for DataStax Enterprise on Microsoft Azure required five DS14 v2 servers at $2,000 per node for 744 hours (a 31-day month), for a monthly cost of $10,000, or about $120,000 per year for the required hardware. With Scylla running on AWS i3.4xlarge instances, the monthly cost per node dropped to $913.54, bringing monthly hardware costs to around $4,600, or less than $55,000 annualized. This alone cut Natura’s annual hardware expenditure by more than half.
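The cost comparison above is simple arithmetic, and a short sketch makes it easy to check or rerun with other prices. It uses only the per-node figures quoted in the talk. One assumption is labeled: the Scylla cluster is taken to have five nodes, because that is what the "around $4,600" monthly total implies at $913.54 per node; the talk excerpt does not state the node count directly.

```python
# Hardware cost comparison using the per-node figures quoted in the talk.
# ASSUMPTION: five Scylla nodes, inferred from the ~$4,600/month total.

MONTHS_PER_YEAR = 12

def monthly_cost(nodes: int, cost_per_node_month: float) -> float:
    """Total monthly hardware cost for a cluster of identical nodes."""
    return nodes * cost_per_node_month

def annual_cost(monthly: float) -> float:
    """Annualize a monthly cost."""
    return monthly * MONTHS_PER_YEAR

# DSE on Azure: five DS14 v2 servers at $2,000 per node per 744-hour month.
dse_monthly = monthly_cost(5, 2_000.00)
# Scylla on AWS: i3.4xlarge at $913.54 per node per month (node count assumed).
scylla_monthly = monthly_cost(5, 913.54)

dse_annual = annual_cost(dse_monthly)
scylla_annual = annual_cost(scylla_monthly)
# Fraction of the DSE annual spend that no longer has to be paid.
savings_fraction = 1 - scylla_annual / dse_annual

print(f"DSE:    ${dse_monthly:,.2f}/month, ${dse_annual:,.2f}/year")
print(f"Scylla: ${scylla_monthly:,.2f}/month, ${scylla_annual:,.2f}/year")
print(f"Savings: {savings_fraction:.1%}")
```

With these inputs the Scylla cluster comes to $4,567.70 per month and $54,812.40 per year, about 54% less than the $120,000 DSE figure, which is consistent with "more than half".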
On top of the cost savings, Felipe noted significant performance benefits. On average, batch processing times on Scylla dropped to 10% of what they used to be on DataStax. In some cases, batch jobs that took 6 hours to run on DSE took less than 10 minutes to run on Scylla (1/36th, or about 2.8% of the time).
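The batch figure is easy to misstate as a percentage, so here is a minimal check of the ratio. It treats the quoted 10 minutes as an upper bound (the talk says "less than 10 minutes"), so the resulting fraction is the largest it could be and the speedup the smallest.

```python
# Ratio of new to old run time for the batch example quoted above:
# a 6-hour job on DSE versus a 10-minute upper bound on Scylla.
from fractions import Fraction

dse_minutes = 6 * 60      # 6 hours expressed in minutes
scylla_minutes = 10       # upper bound; the job took "less than 10 minutes"

ratio = Fraction(scylla_minutes, dse_minutes)   # exact fraction of the original time
speedup = dse_minutes / scylla_minutes          # how many times faster, at minimum

print(f"fraction of original time: {ratio} ({float(ratio):.2%})")
print(f"speedup: at least {speedup:.0f}x")
```

This prints a fraction of 1/36 (2.78%) and a speedup of at least 36x.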
For streaming analytics, Spark jobs that used to take 76 ms to complete on DSE were accomplished in as little as 6 ms on Scylla.
And for pure database performance, p95 write latencies dropped from 220 milliseconds (ms) on DSE to around 500 microseconds (μs) on Scylla.
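Because the streaming and latency figures mix milliseconds and microseconds, a small sketch that normalizes units before dividing helps put the improvements side by side. Note that the 6 ms streaming result is described as a best case ("as little as"), so the streaming factor below is a best-case ratio rather than a typical one.

```python
# Speedup factors for the streaming and write-latency figures quoted above.
# All times are normalized to microseconds before dividing.

MS_TO_US = 1_000  # microseconds per millisecond

def speedup(old_us: float, new_us: float) -> float:
    """How many times faster the new measurement is than the old one."""
    return old_us / new_us

# Spark streaming job: 76 ms on DSE vs. as little as 6 ms on Scylla (best case).
streaming = speedup(76 * MS_TO_US, 6 * MS_TO_US)
# p95 write latency: 220 ms on DSE vs. about 500 µs on Scylla.
p95_write = speedup(220 * MS_TO_US, 500)

print(f"streaming job speedup: ~{streaming:.1f}x")
print(f"p95 write latency improvement: ~{p95_write:.0f}x")
```

The streaming job comes out roughly 12.7x faster in the best case, and the p95 write latency improves by a factor of about 440.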
Scylla provided Natura with performance an order of magnitude greater than DataStax Enterprise for less than half the cloud server cost.
You can watch the Scylla Summit 2018 video below, or read Felipe’s slides on the related Tech Talk page.
If you’ve done comparisons of Scylla versus other NoSQL solutions in your own organization, or are about to conduct a performance-related comparison, we’d love to hear from you. Please drop us a line or feel free to join our Slack channel to discuss your results.