Case Study: CERN Optimizes Computing Resources with ScyllaDB

About CERN

Founded in 1954, the CERN laboratory was one of Europe’s first joint ventures and now has 22 member states. CERN uses the world’s largest and most complex scientific instruments to study the basic constituents of matter: the fundamental particles. These instruments are purpose-built particle accelerators and detectors. Accelerators boost beams of particles to high energies before the beams are made to collide with each other or with stationary targets, a process that gives physicists insights into the fundamental laws of nature.

One of seven experiments on CERN’s Large Hadron Collider, ALICE (A Large Ion Collider Experiment) studies the hadrons, electrons, muons and photons produced in the collisions of heavy nuclei. In the process, it creates matter that is much hotter than the sun.

NoSQL vs RDBMS Performance

In support of its nuclear research, CERN’s distributed computing infrastructure must scale massively. The laboratory’s AliEn framework coordinates resources from computing centers spread around the globe, supporting as many as 150K concurrent jobs and amassing over 80 petabytes of storage. The AliEn Global File Catalogue is a metadata index of every file in the experiment, with the files themselves spread across 80 computing centers on 5 continents. Data is stored from the moment the experiment records it and is then made available for other phases of research.

At CERN, things tend to grow very quickly. “We will have great enhancements in the experiments that will result in much more data,” explains Miguel Martinez Pedreira, Computer Engineer at CERN. For example, the next phase of ALICE is expected to have five times more computing resources, ten times more disk and tape storage and ten times more files to manage. Improving on the performance and scalability of the existing MySQL-based Global File Catalogue was therefore crucial, along with freeing up physical server space and controlling budget expenses for CERN’s growing team.

“We needed something that scales by design, not artificial sharding on MySQL,” Pedreira continues. “We looked at a number of different solutions before narrowing it down to Cassandra and ScyllaDB. These NoSQL options met our base requirements of high availability, horizontal scalability, no single point of failure, consistency and sharding transparently.” After thorough research, the team determined that CERN’s computing infrastructure needed NoSQL performance to scale with its growing organization.
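As a rough illustration of what transparent sharding can look like, the sketch below places keys on a hashed token ring, the general approach Cassandra and ScyllaDB take to distributing partitions across nodes. It is not CERN's or ScyllaDB's implementation: MD5 stands in for the real partitioner hash, and the node names, file paths and the `TokenRing` class are hypothetical.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a 64-bit token (MD5 is a stand-in, for illustration only)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class TokenRing:
    """Toy token ring: each node owns several virtual-node tokens on the ring."""

    def __init__(self, nodes, vnodes=64):
        # One ring entry per (node, virtual node) pair, sorted by token.
        self._ring = sorted(
            (token(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._tokens = [t for t, _ in self._ring]

    def owner(self, key: str) -> str:
        # The owner is the first ring entry whose token is greater than the
        # key's token, wrapping around to the start of the ring.
        i = bisect.bisect_right(self._tokens, token(key)) % len(self._ring)
        return self._ring[i][1]


def moved_fraction(keys, before, after):
    """Fraction of keys whose owning node differs between two rings."""
    moved = sum(1 for k in keys if before.owner(k) != after.owner(k))
    return moved / len(keys)


# Hypothetical catalogue entries and node names.
keys = [f"/alice/data/run{i:06d}/file.root" for i in range(10000)]
ring3 = TokenRing(["node-a", "node-b", "node-c"])
ring4 = TokenRing(["node-a", "node-b", "node-c", "node-d"])

print(f"keys moved after adding a node: {moved_fraction(keys, ring3, ring4):.1%}")
```

In this toy model, adding a fourth node relocates only the keys that the new node takes over. A scheme that assigns keys by a fixed modulus over the number of shards, by contrast, would typically reassign most keys when the shard count changes, and the application has to know about the shards.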

ScyllaDB Outperforms Cassandra

As part of its evaluation process, CERN conducted a series of performance tests. ScyllaDB’s throughput proved to be as much as 6X that of Apache Cassandra.

CERN found in its evaluation that, even after a painful tuning exercise, Cassandra was not saturating the resources available to it. “ScyllaDB exploited the resources of our machines to the fullest,” recalls Pedreira. “This is something we’re managing to do with ScyllaDB out of the box. On the other hand we’ve had problems doing this with Cassandra.”

Another ScyllaDB advantage has been its ability to keep things under control in CERN’s environment. “We don’t want to run huge clusters,” explains Pedreira. “We have constraints on space and budget and the people who work on this project. ScyllaDB is helping us to keep all this under control, while also keeping all our base requirements in place.”

CERN has also benefited from working closely with the ScyllaDB team. “The best thing about ScyllaDB so far, apart from the product’s performance out of the box, has been the team, which has been very supportive and helpful,” said Pedreira.