Jul 7

An Interview with Christopher Quinones on the newest graph database, JanusGraph

This is a modified version of an interview with Christopher Quinones, project lead and developer on Compose for JanusGraph, that was previously published on Compose.com.

Compose for JanusGraph is a fully managed, Apache TinkerPop-enabled, scalable graph database optimized for storing and querying highly interconnected data modeled as millions or billions of vertices and edges. Graph databases are particularly well suited to use cases such as fraud detection, counter-intelligence, behavior modeling, and social media, or wherever complex interactions and networks exist.

IBM Compose for JanusGraph uses Scylla for the backend. Why did you select Scylla, and how was your experience adopting it?

Christopher Quinones is the project lead on Compose for JanusGraph. He has a passion for finding bugs, fixing production code, and automating repetitive tasks he doesn’t like doing. He enjoys bowling in competitive leagues.

JanusGraph continues to evolve through the efforts of the open source community. At the time of our implementation, JanusGraph supported five different backends: BerkeleyDB, HBase, BigTable, Cassandra, and In Memory. All of these backends are good, but you need to understand your application’s requirements before selecting the right tool for the job.

In our case, our goal was to offer a highly-available service that is partition tolerant. Out of these backends, Cassandra fit the bill as it’s oriented towards availability and partition tolerance over consistency. As the name implies, the “In Memory” backend does not persist data outside of memory, which eliminated it as an option. BerkeleyDB is more geared towards data on a single node. HBase and BigTable are more oriented towards consistency and partition tolerance over availability.
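The selection reasoning above can be sketched as a simple filter over the five backends. This is only an illustration: the `Backend` record, its field names, and the `candidates` helper are hypothetical, and the characterizations are the ones given in the interview, not an authoritative classification of these systems.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Backend:
    """Hypothetical record summarizing how the interview describes a backend."""
    name: str
    persistent: bool   # data survives outside of memory
    multi_node: bool   # geared towards more than a single node
    cap_focus: str     # "AP", "CP", or "n/a", as characterized in the interview


BACKENDS = [
    Backend("BerkeleyDB", persistent=True, multi_node=False, cap_focus="n/a"),
    Backend("HBase", persistent=True, multi_node=True, cap_focus="CP"),
    Backend("BigTable", persistent=True, multi_node=True, cap_focus="CP"),
    Backend("Cassandra", persistent=True, multi_node=True, cap_focus="AP"),
    Backend("In Memory", persistent=False, multi_node=False, cap_focus="n/a"),
]


def candidates(backends, cap_focus):
    """Return names of persistent, multi-node backends with the given CAP focus."""
    return [
        b.name
        for b in backends
        if b.persistent and b.multi_node and b.cap_focus == cap_focus
    ]


# Requirement from the interview: availability and partition tolerance ("AP").
print(candidates(BACKENDS, "AP"))  # ['Cassandra']
```

Asking for `"CP"` instead would return HBase and BigTable, which matches the trade-off described above.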

We use Scylla as a drop-in replacement for Cassandra. It has numerous advantages over Cassandra, such as user-space networking, its own row cache (which avoids the page cache and thrashing), and a shared-nothing thread model.

We had experience operating a Cassandra cluster for a different service. With regard to Scylla adoption, we were able to drop it right into our architecture with a minor fix in our client to accommodate a difference in how the Cassandra Thrift client returns empty slices when connected to a Scylla cluster.
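To illustrate what “drop-in” can mean at the configuration level, here is a minimal sketch that renders two JanusGraph-style property sets. `storage.backend` and `storage.hostname` are JanusGraph configuration keys, and `cassandrathrift` was a backend value in JanusGraph releases of that period; the host names and the `render_properties` helper are hypothetical, and this is not the actual Compose configuration, which the interview does not show.

```python
def render_properties(settings):
    """Render a dict as Java-style .properties text, one key=value per line."""
    return "\n".join(f"{key}={value}" for key, value in settings.items())


# Hypothetical settings for a graph backed by a Cassandra cluster.
cassandra_settings = {
    "storage.backend": "cassandrathrift",
    "storage.hostname": "cassandra-1.example.com,cassandra-2.example.com",
}

# Assumption for illustration: pointing at Scylla changes only the host list,
# because Scylla speaks the same protocol as Cassandra.
scylla_settings = {
    **cassandra_settings,
    "storage.hostname": "scylla-1.example.com,scylla-2.example.com",
}

# Keys whose values differ between the two setups.
changed = {
    key
    for key in scylla_settings
    if scylla_settings[key] != cassandra_settings[key]
}

print(render_properties(scylla_settings))
print(changed)
```

In this sketch only the host list changes. The client-side fix mentioned above would live in application code rather than in these properties.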

You didn’t start out working on graph databases. What is your background and what did you have to learn to become a proficient user?

I started off my career as a software developer focusing on automating deployments and testing, developing Java-based applications and tooling, and consulting with large customers to automatically deploy new software across their enterprise.

Once you understand that graph databases are about modeling your data as nodes and relationships, and are very flexible, you really start to think about all of the problems you’re presented with as “graph problems.” Also, there is a lot of documentation and many tutorials out there from the Apache TinkerPop community, the JanusGraph community, and IBM developerWorks that helped me understand and experiment with graph data models and queries.
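As a rough illustration of thinking in nodes and relationships, the sketch below models accounts and devices as vertices and “uses” relationships as labeled edges, then finds accounts that share a device with a given account. The `Graph` class, the sample data, and the `accounts_sharing_device` helper are all hypothetical and kept in plain Python; this is not the JanusGraph or TinkerPop API.

```python
from collections import defaultdict


class Graph:
    """A tiny in-memory property graph: vertices plus labeled, directed edges."""

    def __init__(self):
        self.vertices = {}                   # vertex id -> properties
        self.out_edges = defaultdict(list)   # vertex id -> [(label, target id)]
        self.in_edges = defaultdict(list)    # vertex id -> [(label, source id)]

    def add_vertex(self, vid, **props):
        self.vertices[vid] = props

    def add_edge(self, src, label, dst):
        self.out_edges[src].append((label, dst))
        self.in_edges[dst].append((label, src))

    def out(self, vid, label):
        """Follow outgoing edges with the given label."""
        return [dst for lbl, dst in self.out_edges[vid] if lbl == label]

    def in_(self, vid, label):
        """Follow incoming edges with the given label."""
        return [src for lbl, src in self.in_edges[vid] if lbl == label]


g = Graph()
for account in ("alice", "bob", "carol"):
    g.add_vertex(account, kind="account")
for device in ("phone-1", "laptop-9"):
    g.add_vertex(device, kind="device")
g.add_edge("alice", "uses", "phone-1")
g.add_edge("bob", "uses", "phone-1")
g.add_edge("carol", "uses", "laptop-9")


def accounts_sharing_device(graph, account):
    """Walk account -> device -> account and drop the starting account."""
    shared = set()
    for device in graph.out(account, "uses"):
        shared.update(graph.in_(device, "uses"))
    shared.discard(account)
    return sorted(shared)


print(accounts_sharing_device(g, "alice"))  # ['bob']
```

In a TinkerPop-enabled database, a comparable question would typically be expressed as a Gremlin traversal that chains an `out('uses')` step with an `in('uses')` step.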

There’s a range of resources that I recommend for people getting started:

Inspecting sample applications is also a great way to learn more about using graph databases. Here are some that are worth taking a look at:

Because the project is built around open source projects, the official JanusGraph and TinkerPop documentation are also good to refer to, especially if you’re looking for more hands-on experience and knowledge with the tools at play.

Thanks, Christopher!

Tags: IBM Compose, JanusGraph