Developing Enterprise Consciousness: Building Modern Open Data Platforms

Rahul Xavier Singh, 21 minutes

ScyllaDB, alongside some of the other major distributed real-time technologies, gives businesses a unique opportunity to achieve enterprise consciousness - a business platform that delivers data to the people that need it when they need it anytime, anywhere.

This talk covers how modern tools in the open data platform can help companies synchronize data across their applications using open source tools and technologies and more modern low-code ETL/ReverseETL tools.

Topics

  • Business Platform Challenges
  • What Enterprise Consciousness Solves
  • How ScyllaDB Empowers Enterprise Consciousness
  • What can ScyllaDB do for Big Companies
  • What can ScyllaDB do for Smaller Companies

Video Transcript

Thanks, everybody, for joining us. I want to thank ScyllaDB for hosting me and for hosting this amazing conference. Today I’ll be talking about how to build modern open data platforms and, more specifically, how to try to achieve this thing called Enterprise Consciousness, which is real-time data synchronization across your enterprise.

A little bit about myself, just for context. I started my career building servers and switches and built out data centers. I got into applications, content-heavy and knowledge-heavy, and then I got into enterprise search because, you know, when you have a lot of content you need to search it. That’s how I got into big data, and that’s how I got into real-time data, which is what I focus on these days. I’m a subject matter expert in the Cassandra world, and ScyllaDB is one of the better Cassandra variants. It’s got more than that, but that’s why I’m here: I want to talk about how we can use those concepts with ScyllaDB.

Today I’m going to talk a little bit about the playbook that we use to build out platforms. You can always go to our website and see our principles there, but basically it’s broken up into design principles, which framework to use (we have a reference framework we recommend), and then an approach to managing the platform going forward. We’re probably going to get to only one or two of the use cases because we have a pretty short time frame here, but my goal is that you walk away with an understanding of how to build out open data platforms for your enterprise and how to make the right choices between cloud-native, open source, and commercial tools.

The big challenge that businesses have is that they’re trying to connect all of their platforms together in some way or form. This is a graphic from Gartner’s paper on digital business technology platforms, and these blocks are actually just categories of platforms; these aren’t even platforms themselves. When we say a business is trying to achieve Enterprise Consciousness, what we’re talking about is how we can get information connected and synchronized between systems so that both systems and people can get access to it when they need it.

And if you’ve ever read the Reactive Manifesto or the Twelve-Factor App, there are principles on how to build scalable applications. For us, Enterprise Consciousness goes beyond that and asks: can we do something similar for the whole enterprise? Can we make information available to people when they need it, of course secured and governed properly? How do we manage cold, warm, and hot data? But eventually what we’re really trying to get to is: can the enterprise be up and running all the time, with a recovery point objective and a recovery time objective of zero? With technologies like ScyllaDB, that’s possible today. But ScyllaDB is just a database, right? How does the rest of the enterprise benefit from this concept?

There are lots of challenges, and we’re focusing on data platforms because data platforms are at the center of the business platform. This isn’t anything new, and I didn’t come up with it, but essentially most businesses go through these phases of enterprise architecture. There’s a really good book called Enterprise Architecture as Strategy; it’s a Harvard Business School book, and I highly recommend reading it, for technical and non-technical leaders alike. Every business starts out with silos, then they start to standardize, and then they start to create this optimized data core. That optimized data core is what enables them to be modular and scalable in the future, and ultimately that’s where we get involved: my job is to help people get to that optimized data core. Over the years the technologies have changed, but this progression has not. Every company has to do this, and if they skip the step before the optimized data core, standardizing their platform, they basically have to do it again.

What is a data platform? Well, everybody has their own definition. I just say: you’ve got to get data in, you’ve got to get data out, and there’s a bunch of stuff that you have to do inside it. A data platform is not just one database; there are data warehouses, NoSQL databases, relational databases, and queues. The common element is that there are more and more ingress points for data coming in and more and more egress points for data going out, and ultimately your data platform needs to be able to service those requests. As a company grows, especially today, there’s more and more data, whether user-generated, machine-generated, or system-generated, that we need to account for.

Distributed data helps us with that transformation. Before, when we had monolithic relational databases, there was a limit that we would hit, or you’d just have to keep buying more and more expensive hardware, and of course more and more expensive software, to handle the amount of ingress and egress. With databases like Cassandra and Couchbase, expanded on by ScyllaDB, we have the ability to have operational, transactional data happening in the same system as analytical workloads and serving workloads. With cross-data-center replication, this masterless architecture allows the same exact data to be worked on with APIs bringing data in, APIs bringing data out, and some sort of query or analysis happening at the same time. That magic, when I first found out how Cassandra could do it, is really when I dove into Cassandra. In the ScyllaDB world, which we’ll learn about later, we take that concept and supercharge it with better memory management, and of course it’s written in C++, so it’s going to be much better. But the concept is the same: we have data in two different places, and we can act on it in two different places.
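
As a rough illustration of "data in two different places," the sketch below builds a CQL statement that replicates one keyspace to two data centers, so that one could take operational traffic while the other takes analytical queries. The keyspace name, the data center names, and the replica counts are assumptions made up for this example; real data center names must match whatever the cluster is configured with.

```python
def create_keyspace_cql(keyspace, replicas_per_dc):
    """Build a CREATE KEYSPACE statement that replicates to each data center."""
    parts = ["'class': 'NetworkTopologyStrategy'"]
    # One entry per data center: name -> number of replicas kept there.
    parts += [f"'{dc}': {count}" for dc, count in replicas_per_dc.items()]
    options = ", ".join(parts)
    return (
        "CREATE KEYSPACE IF NOT EXISTS " + keyspace
        + " WITH replication = {" + options + "};"
    )


# Hypothetical names: 'crm', 'dc_operational', and 'dc_analytics' are illustrative.
statement = create_keyspace_cql("crm", {"dc_operational": 3, "dc_analytics": 3})
print(statement)
```

The statement is only printed here; running it against a cluster would need a CQL client such as cqlsh or a driver session.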

The modern data platform is hard to describe. You can Google it; if you talk to ChatGPT it’ll give you some answer as well, and maybe we’ll talk about that at some point. But with the modern open data platform, everybody has a different perspective, right? What we try to do is say: let’s forget the technology for a second. Let’s think about the context of people, process, information, and systems; let’s think about the responsibility areas or departments in the organization; and let’s apply an approach to management using a common framework and the right tools.

So this is the first part of our playbook. The design principle is basically: don’t reinvent the wheel. If you go out on Google and look for what a modern open data platform is, or what the modern stacks out there are, there are tons of reference architectures. You’ll notice some logos recurring here and there, and not all of them are really geared towards big data. In the medium-to-small data space there are lots of companies saying, hey, you can do this, but when we get to big data there isn’t really a lot out there.

And so we take what we learn from everybody else, we take our playbook, and we go through this design process, the discovery process: let’s inventory all the people, all the processes, all the information or objects we need to deal with, and the systems where this data is coming from. Once we have that inventory, we can take a look at a highly opinionated framework focused on real-time big data. There are other databases that do real time, but they may not be big data, and our focus as a team is on helping people build big, global data platforms. Not every tool works with databases like ScyllaDB, so we have to think about curating this toolkit for what can work with a database like ScyllaDB, and whether it can scale like ScyllaDB.

We’re really focusing on the data side here, but there’s also a whole slew of tools in the user experience world that allow people to build applications pretty quickly. That’s kind of the new world of technology: everything is becoming low-code or no-code, and even the data engineering tools are becoming low-code or no-code. So the lift of creating an open data platform is much easier today than it was even six months or two years ago. The goal of managing the platform going forward should be not necessarily programming but configuring, or “clicks, not code,” which is one of Salesforce’s mantras. If we use the right automations and the right infrastructure-as-code tools, then the system should really run itself. Of course, if we want to add new data, we have to do some configuration. But if you are making a new platform today, use as many low-code and no-code tools as you can, because that’s the future. Why reinvent the wheel? Why write code that a robot is already writing for you?

I’ve talked about the framework. The framework for us is any component out there that meets our rubric: Is it distributed? Is it real time? Is it extendable (is it open, does it have APIs)? Is it something that we can automate? And of course, can we monitor and manage it properly? There are certain tools out there in the commercial world or the SaaS world that meet four out of five of these. Open source tools generally fit every single item in this checklist. An open source tool like ScyllaDB would meet them without a problem; you can talk about Spark, and it meets these criteria too. We go through this checklist because we want to make sure that if we recommend a tool or add a tool to our framework, it’s something that can scale to either millions of employees or millions of users, whichever way you look at it.
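
One way to make a checklist like this repeatable is to record the answers per tool and count them. The sketch below is a minimal illustration that reads the rubric as five yes/no questions, which is an interpretation of the talk rather than a published schema. The class name, the field names, and the two example evaluations are all hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricResult:
    """One tool's answers to the five rubric questions (field names are assumed)."""
    distributed: bool
    real_time: bool
    extendable: bool   # open, with APIs
    automatable: bool
    manageable: bool   # can be monitored and managed

    def score(self) -> int:
        # Count how many of the five questions were answered "yes".
        return sum(getattr(self, f.name) for f in fields(self))

    def gaps(self) -> list[str]:
        # Names of the questions answered "no".
        return [f.name for f in fields(self) if not getattr(self, f.name)]


# Hypothetical evaluations, for illustration only.
candidates = {
    "open_source_db": RubricResult(True, True, True, True, True),
    "closed_saas_tool": RubricResult(True, True, False, True, True),
}

for name, result in candidates.items():
    print(name, f"{result.score()}/5", result.gaps())
```

Keeping the gaps alongside the score shows which criterion a "four out of five" tool misses, which is usually the deciding detail.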

If you go the public cloud-native route with, let’s say, Amazon, you can build systems with what they offer; they have a great toolkit. Of course, the problem there is that you can only do it on Amazon. Let’s say you’re a company that Amazon starts competing with because they bought another company; then what do you do? Same thing with Microsoft: they’ve got a great toolkit, a combination of their own cloud-native services and partnerships with companies like Databricks, but ultimately it’s the same problem. You can do the whole architecture on Azure, AWS, or Google without a problem. The question really comes down to how you are growing in terms of features, and whether Amazon, Azure, or Google are offering features that meet your needs. For example, AWS has DynamoDB and Google has Bigtable. Both of these are kind of precursors to technologies like Cassandra and ScyllaDB, but they’re a little stuck where they are. What we can do now with ScyllaDB is light years ahead of what we can do with Bigtable or with DynamoDB. Dynamo is still a good API layer, but what we can do with ScyllaDB and the CQL layer is just amazing compared to what we could do with just DynamoDB or even Bigtable. It’s kind of funny: when we look at how to talk to Bigtable these days, we’re really just using tools from five or ten years ago.

So I’m not saying you should do everything on cloud, or that you shouldn’t do anything with public cloud-native technologies. What I’m recommending is to optimize: figure out the right tools to use from the cloud-native offerings, in combination with tools that we know are best of breed.

And there are lots of open core distributed data platform technologies out there, from the data storage layer to the processing and query layers, as well as streaming and stream processing. Every technology shown here is either open source or open core. There are also as-a-service offerings out there that mimic a particular technology, let’s say Amazon Keyspaces or Cosmos DB, but they’re closed in the sense that each is only available on its own cloud, AWS or Azure. However, if you use ScyllaDB, you can host that database on any cloud and essentially switch between clouds if you needed to.

As one of the clients said, the reason they care about openness is not necessarily that they’re going to switch clouds. Maybe they’re on Google Cloud, let’s say, and their software is meant for businesses. What if a customer comes and says, “I have everything on Amazon, and I only want to use your software on Amazon”? What do you do then? Am I going to send data to you on Microsoft or Google, a separate cloud, to process and bring back to me, or can I just host your tool in my cloud? If you’re building software, this is a big requirement: can it run on different clouds?

But if you’re building enterprises, you can think about what happens with Amazon, Google, or Azure. Not that they’re going to go down (I think that was an argument for multi-cloud before), but let’s say they start to get into your industry: how do you pivot from one cloud to the other? Open source technologies like ScyllaDB can allow you to switch between clouds, because you can literally move the database over, or even have it work on both clouds at the same time, without any downtime. That applies to Spark, and it applies to technologies like Kafka as well.

Beyond the actual data platforms out there, there are lots of great automation and integration tools available that are open source. We have data catalogs that are open source. We have orchestration tools like Airflow that can help with orchestrating not only infrastructure but also the different pipelines that are out there. We have low-code tools like Grouparoo and Airbyte, which we’ll take a look at a little bit later. We have systems like PostHog that can basically mimic different user analytics tools, and that data is stored in an open database. So in addition to the open data platforms, there’s a whole and growing collection of tools and technologies for which we would normally pay a SaaS company or a platform-as-a-service company, with the data locked in there. Today we can just bring it in-house.

Maybe five or ten years ago I would have said don’t host it yourself, it’s a pain in the neck. With Kubernetes, basically everything can be hosted pretty easily. I’m not saying that it’s a catch-all that makes everything easy, but with the standardization of Kubernetes across all the clouds, all those tools and technologies can be deployed pretty much the same exact way across all the different clouds. In the BI tools space we have a growing toolkit, like Metabase or Redash, and on the data processing side we have things like Airbyte; RudderStack is another competitor to Airbyte. So anytime we’re talking about open core or open data platforms, we’re talking about technologies that you can either pay the company to host for you or host yourself.

And when we bring all of these components together, we look at it as a computer, and a computer that can be built with different strategies. There can be an open source strategy to build that computer; there can be a Google-native strategy to build that computer. But ultimately it’s the combination of tools that is right for the business. No one thing meets all the requirements. Every business is slightly different, and your business may need something that is both open source and Google native, or maybe it’s AWS and Google and open source.

This reference architecture kind of brings all this together. It combines low-code and no-code tools for data processing and for user interfaces, as well as tools to bring out APIs pretty quickly. The lower level of it is really data- and DevOps-heavy. We put these things together because these are common elements; we didn’t come up with the architecture. Spark, Cassandra, and Kafka have been a trio that’s been used for many, many years. Today you don’t have to use that; you can use Spark, ScyllaDB, and, let’s say, Redpanda if you want to. They’re kind of synonymous technologies.

Once you have a platform, you have to manage it. Management for us is basically a simple way of documenting the stack: setup, training, administration, customization, and knowledge management. This is the single most important thing in a platform: once you build it, how are you going to manage it afterwards? We have a schema that makes it easy for people to have one knowledge base. Whether you’re doing Confluence or Markdown doesn’t really matter; what matters is a standardized way to make sure any platform you’re creating is documented like a software product that you’d be publishing for other people, except you keep it in-house for your team members.
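
To show what a standardized knowledge-base layout could look like in practice, here is a small sketch that generates a Markdown skeleton with one section per management area named above. The actual schema used by the speaker's team is not shown in the talk, so the function name, the heading layout, and the placeholder text are assumptions.

```python
# The five management areas named in the talk, used as section headings.
SECTIONS = [
    "Setup",
    "Training",
    "Administration",
    "Customization",
    "Knowledge Management",
]


def knowledge_base_skeleton(platform_name: str) -> str:
    """Return a Markdown outline with one placeholder section per area."""
    lines = [f"# {platform_name}", ""]
    for section in SECTIONS:
        lines += [f"## {section}", "", "TODO: document this area.", ""]
    return "\n".join(lines)


# 'Example Data Platform' is a hypothetical platform name.
skeleton = knowledge_base_skeleton("Example Data Platform")
print(skeleton)
```

Generating the same outline for every platform keeps the documentation comparable across teams, whichever wiki or repository ends up holding it.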

Bringing it all together for this Enterprise Consciousness, the standard data fabric: we talked about how Cassandra came out with the cool technologies that are out there, and ScyllaDB has gone light years ahead. It has better memory management and better CPU management. It also has a DynamoDB-compatible layer, if you want to use that to get off of AWS. And getting data into ScyllaDB is much easier now; there are lots of tools, like Airbyte and RudderStack, or you can continue to use Kafka Connect or Pulsar IO to do this.

And once it’s in there, there are technologies that you can use to serve it out pretty quickly.

You can also send data back using tools like Grouparoo. So you have data that’s coming in, you’re processing it, and you can serve it out. Why is this important for Enterprise Consciousness? This is the main reason we’re talking about it today: the technologies make it easier to get data in and get data out.

And this is what the complete picture looks like. Now, you may not use any of these tools, so I’m going to give you one specific example that may resonate. Imagine you have HubSpot, Mailchimp, and QuickBooks, and you want to get data into ScyllaDB. Airbyte is a good tool to bring it into an operational data store. You can do some Spark analysis to make your master data store, and then once you’re happy with it you can serve it out using GraphQL or the DynamoDB-compatible interface, the Alternator interface. If you want to send it back, you can use Grouparoo to send that data from your master data store out to HubSpot and QuickBooks. This technology is all low-code, and when you think about implementation time, you’re talking about days, not weeks and months.
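
The middle step of this example, turning rows from several sources into one master record per customer, can be sketched in a few lines. In the talk that step is done with Spark; the plain-Python version below is only a stand-in to show the idea. The sample rows, the field names, and the choice to match records on a normalized email address are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical rows as they might land in an operational data store.
operational_rows = [
    {"source": "hubspot", "email": "Ana@Example.com", "name": "Ana Diaz"},
    {"source": "mailchimp", "email": "ana@example.com", "subscribed": True},
    {"source": "quickbooks", "email": "ana@example.com", "balance": 120.0},
    {"source": "hubspot", "email": "bo@example.com", "name": "Bo Lee"},
]


def build_master_records(rows):
    """Merge per-source rows into one record per normalized email address."""
    master = defaultdict(lambda: {"sources": []})
    for row in rows:
        key = row["email"].strip().lower()  # assumed matching rule
        record = master[key]
        record["email"] = key
        record["sources"].append(row["source"])
        for field, value in row.items():
            if field not in ("source", "email"):
                record.setdefault(field, value)  # first source seen wins
    return dict(master)


master_records = build_master_records(operational_rows)
for email, record in master_records.items():
    print(email, record)
```

A real pipeline would also need a rule for conflicting values between sources; "first source seen wins" is just the simplest placeholder.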

Key takeaways: don’t reinvent the wheel. Identify your objectives. Identify which objects you want to move. Prioritize DevOps and DataOps, because somebody’s going to have to maintain it; let a robot maintain it and run it for you. Use open tools as the preference. And then, finally, document it: for the next employee or staff member that comes on, it shouldn’t be a chore to understand how the whole system works.

Thank you for having me here today, ScyllaDB. If you’re interested in some of the stuff we talked about, check us out; we have tons of great content on our website. We have a weekly webinar series for data engineering and for Cassandra topics, where we also cover ScyllaDB. Anytime you want to reach out, you can also reach out to me; happy to help. Thank you again, everybody, for coming, and I’m looking forward to talking to you in the future. [Applause]
