As we prepare for ScyllaDB Summit 2019, we’re highlighting some of this year’s featured presenters. And a reminder, if you’re not yet registered for ScyllaDB Summit, please take the time to register now!
Today we’ll share our conversation with Brian Hall of Expero, and Ryan Stauffer of Enharmonic. Together they will present back-to-back sessions entitled Incorporating JanusGraph into Your ScyllaDB Ecosystem and Powering a Graph Data System with ScyllaDB + JanusGraph. These two practitioners bring tremendous experience in getting the most out of your graph data.
There are countless technologies employed in big data and data analytics today. What paths did you follow to get involved with graph databases specifically?
Ryan: A few years ago, when I was working at a large Private Equity-backed holding company, I was working on bringing data together from across different subsidiaries into a single unified data model. It became clear that thinking in terms of rows, columns, and tables was taking us down some very awkward and indirect routes to data insights. On the other hand, a graph data model maps directly to real-world business concepts. If the model is laid out clearly, it can be much easier for non-engineer decision-makers to conceptualize. I’ve experimented and iterated with different pieces in the graph “tech stack,” but since then I’ve viewed graph databases as a core part of a solution for how to ask better questions and reach deeper insights.
Brian: At Expero, we build products and applications for our clients… and we bring the tools and technologies required to do the job. Only sometimes do they require graph databases. And that’s exactly how we got into this business. A client came to us with a problem that we all thought needed a graph database. As part of project work, we evaluated several for them to consider and then recommended to them what technology they should pick and why. They allowed us to write up and publish our findings. This was back in 2014 when not many people were talking about them and boom! our phone started ringing off the hook. Voila! We were in the graph database business.
JanusGraph is a well-known graph database solution. Its growth is in great part due to its flexibility via pluggable back-end storage systems. What makes ScyllaDB different from, and better than, the other back-end options you can use under JanusGraph?
Brian: While there are options for the back-end storage, ScyllaDB stands out for a couple of reasons: obviously first and foremost – performance. I don’t believe anyone would dispute that. Secondly, it really is simple to set up and maintain. While there are countless settings and knobs that can be adjusted on some of the other options, you don’t have to spend years learning how to tune ScyllaDB. It’s performant right away. Lastly, unlike some others, you have some measure of cloud provider portability if that’s important to you.
Ryan: Brian already hit on performance and ease of setup – which I certainly agree with, and which on their own make ScyllaDB the go-to storage backend choice for JanusGraph. I like being able to sidestep the JVM for this piece of the deployment (as opposed to dealing with HBase or Cassandra). I also feel very strongly about the commitment from the ScyllaDB team. They provide an incredible breadth of knowledge and technical insight into their product and how to operate it – all the way down to hardcore performance metrics, and explaining “why” things work the way they do. It’s all very reassuring. And when hiccups do arise around deployments or ops, I’ve found them to be incredibly accessible and skilled at helping to quickly resolve issues.
Gremlin/TinkerPop have now been around for a decade. GraphQL has been around since 2012. OpenCypher since 2015. Yet while these APIs aren’t new, there still seems to be confusion as to which to use, or not use. What’s your take?
Ryan: There’s no perfect answer to this question, and it’s certainly not going to be decided here. There’s a Big Question that underpins all of this: What are you trying to accomplish? Are you just looking for a new form of API to access your existing data? If so, you should probably at least investigate GraphQL (but, spoiler alert, it really has little to do with an actual “graph” model of data, nothing to do with graph databases, and nothing to do with graph algorithms).
On the other hand, if you’re looking for an expressive language that you can use to explore a graph at scale, I believe Gremlin is the way to go. I love the conceptual backing of Gremlin and TinkerPop. The language itself is expressive, and you can choose to write either imperatively or declaratively based on context and preference. You can also build and submit traversals in a variety of languages (Python, Ruby, etc.), which lets you avoid string manipulation while also streamlining automated query-building and testing.
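To make the point about avoiding string manipulation concrete, here is a minimal sketch in Python. It is an illustration only: the `Traversal` class is a hypothetical, simplified stand-in for a Gremlin language variant (it is not the gremlinpython API), and the `name` property and `owns` edge label are assumed purely for the example. The idea it shows is that a traversal built from chained method calls keeps its steps and arguments as data, so they can be inspected and tested without parsing query text.

```python
# Hypothetical, simplified stand-in for a Gremlin language variant.
# This is NOT the gremlinpython API; it only mimics the fluent style.


class Traversal:
    """Records traversal steps as (name, args) pairs instead of a query string."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def _add(self, name, *args):
        # Return a new object so a partial traversal can be reused safely.
        return Traversal(self.steps + [(name, args)])

    def V(self):
        return self._add("V")

    def has(self, key, value):
        return self._add("has", key, value)

    def out(self, label):
        return self._add("out", label)

    def values(self, key):
        return self._add("values", key)

    def to_gremlin(self):
        """Render the recorded steps as Gremlin-style text, for inspection only."""
        rendered = []
        for name, args in self.steps:
            rendered.append(f"{name}({', '.join(repr(a) for a in args)})")
        return "g." + ".".join(rendered)


def subsidiaries_of(company_name):
    # company_name travels as a plain argument; it is never spliced into text.
    # The "name" property and "owns" edge label are assumed for this example.
    return (Traversal().V()
            .has("name", company_name)
            .out("owns")
            .values("name"))


if __name__ == "__main__":
    t = subsidiaries_of("Acme")
    print(t.steps)         # the structured form that a test can assert on
    print(t.to_gremlin())  # a readable rendering of the same traversal
```

Because the traversal is an ordinary object, automated query-building code can compose it from smaller functions, and tests can compare `steps` directly instead of matching against concatenated strings.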
Brian: Practically speaking, when you select your graph database, you are implicitly also selecting your query and traversal language. Arguably, by selecting open standards you allow for graph database portability in the future and avoid vendor lock-in. By choosing JanusGraph, you are also choosing Gremlin/TinkerPop.
There is a movement afoot currently within the vendor community to lay down a universal query language, GQL, with many in the community participating. While that initiative is still very early, it would be good to align on a more consistent syntax and processing paradigm. We would then benefit from shared skillsets, as we do with classic ANSI SQL. RDBMS vendors ultimately all agreed on selects, joins and unions, and compete on more advanced features. I’m happy I don’t have to learn the syntax for a two-table customer/address join in any new SQL engine. I’d love the graph db market to get to the same place.
People in the ScyllaDB community are likely aware of your prior work, from Ryan’s blog on Powering a Graph Data System with ScyllaDB + JanusGraph, to Brian’s webinar on Architecting a Corporate Compliance Platform with Graph and NoSQL Databases. What will you present at ScyllaDB Summit that goes beyond what people have seen or read from you both in the past?
Ryan: My goal will be to highlight just how quickly you can get up and running with a pretty powerful deployment of JanusGraph and ScyllaDB. It’s easy to get stuck in documentation, or worry about a particular configuration setting, but really it doesn’t take much to get set up in a way that lets you start throwing meaningfully large amounts of data at your graph. I’m also really interested to hear what questions and experiences people in the audience have had. I find the fact that you can get everyone together in the same room to be the biggest draw of an event like this.
Brian: Since we’ll be tag teaming this event, we can cast the net a bit more broadly. I’ll be covering a lot more of the “art of the possible” with JanusGraph on top of ScyllaDB. Since this will be primarily a cohort of ScyllaDB folks who may know little to nothing about graphs in general, I want people to be introduced to graph and what it can do for them. There are a lot of mind-blowing things that are possible if you just know to look.
Lastly, we’d like to get to know a bit about you both outside of technology. What interests or hobbies do you each have beyond creating Big Data solutions?
Ryan: I think I’m an explorer at heart. My most recent big adventure was Iceland this past summer, but I also love adventuring around San Francisco. Sometimes that means seeking out new running routes (my current favorite is tracing the Embarcadero from Oracle Park, then cutting in and taking the stairs up to Coit Tower), and sometimes that means just relaxing at a new restaurant or bar. I also have a few computer music projects that help me reset my thoughts.
Brian: Primarily travel, food and wine with my friends and family. Recent journeys include Kenya, Tanzania and Spain and I’m looking forward to Austria and Hungary in the near future. To paraphrase Mark Twain: “travel is fatal to bigotry, prejudice and narrow-mindedness”… happy traveling!
Thank you both for taking the time to speak with me today. I’m sure our attendees will find your presentations incredibly valuable!