Tim Spann is a Developer Advocate at StreamNative, where he works with Apache Pulsar, Apache Flink, Apache NiFi, Apache MXNet, TensorFlow, Apache Spark, big data, the IoT, machine learning, and deep learning. Tim has over a decade of experience with the IoT, big data, distributed computing, streaming technologies, and Java programming. Previously, he was a Principal Field Engineer at Cloudera, a Senior Solutions Architect at AirisData, and a Senior Field Engineer at Pivotal. He blogs for DZone, where he is the Big Data Zone leader, and runs a popular meetup in Princeton on big data, the IoT, deep learning, streaming, NiFi, the blockchain, and Spark. Tim is a frequent speaker at conferences such as IoT Fusion, Strata, ApacheCon, DataWorks Summit Berlin, DataWorks Summit Sydney, and Oracle Code NYC. He holds a BS and MS in computer science.
David is a committer on the Apache Pulsar project, the author of “Pulsar in Action”, and co-author of “Practical Hive”. He currently serves as a Developer Advocate for StreamNative, where he focuses on strengthening the Apache Pulsar community through education and evangelization. Prior to that, he was a principal software engineer on the messaging team at Splunk and Director of Solutions for two big data startups: Streamlio and Hortonworks.
Hi, and welcome to my talk, “Sink Your Teeth into Streaming at Any Scale.” Today we’re going to talk about how Apache Pulsar and ScyllaDB can help you drive streaming analytics applications. My name is David Kjerrumgaard; I’m an Apache Pulsar committer and also the author of Pulsar in Action.
And so we think that, by adding these capabilities, ScyllaDB and StreamNative, which is a provider of Apache Pulsar, make a great system for building real-time analytics applications. You have all the different capabilities: a very fast distributed database system, a very fast stream storage system, and integration with all the different stream processing engines you need, so you can service different use cases. Again: real-time analytics, real-time machine learning, any high-speed data streaming pipelines you need for log analytics, IoT use cases, and calculating feature sets for machine learning models as the data comes in. All of these are made much easier with these two technologies working together. Next slide, please.

One last slide to talk about a reference architecture for how we envision putting all these pieces together, again as best practices. Tim started off the talk with the different tools you want in your tool set, and this is how we map them out together. You can have data coming into ScyllaDB directly or from Flink processing. These microservices, MQTT protocols, and sensor devices get fed into Pulsar; they are the core of the sources at the bottom. You can then use the Apache Flink SQL tool to do some analytics on that information, for example calculating running averages over time windows, and then publish the results back into an Apache Pulsar topic. From that particular topic you feed the data into ScyllaDB directly, again using the sink connector we talked about on the previous slides. So you have a continuous ETL pipeline doing extraction, transformation, loading, and analytics on that data and making it available, directly from your IoT sensor devices to consumable ScyllaDB tables, almost as fast as Pulsar and ScyllaDB can do it. You can also do additional analytics with visualization and query tools, things like Apache Pinot or Superset, which can either go directly against Pulsar or be pointed at other data sources as well. So this is the data-in, data-out reference architecture we envision for Pulsar and ScyllaDB working together. Next slide, please.

Last but not least, as we mentioned on the previous slide: the right tool for the right job. You bring a lot of tools to a real-time pipeline development exercise. The Flink SQL we mentioned here is for continuous analytics, for ingest, and for real-time joins, joining different data sets together. Let’s say you have IoT sensor data; you may want to join it to some static data to say this is in my plant A or plant B, this is this sensor type or that sensor type, continuously enriching that data. You can use things like Flink for that. Once you have that information, you publish it to Apache Pulsar, where you can do additional routing or transformation and store the data there, so it’s backed up. Then, once you attach a sink, you can route those pre-calculated values into ScyllaDB so you have instant real-time lookups.
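To make the Flink SQL step concrete, here is a minimal sketch of a continuous windowed-average query that reads sensor readings from one Pulsar topic and writes the aggregates to another. The topic names, field names, and Pulsar connector option keys ('topics', 'service-url', 'format') are assumptions for illustration and vary by Flink/Pulsar connector version, so treat them as placeholders rather than the exact configuration from the talk.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

// Sketch: Flink SQL job computing per-sensor one-minute averages from a Pulsar
// topic and publishing them back to Pulsar for the downstream ScyllaDB sink.
// Connector option names below are illustrative and version-dependent.
public class SensorAveragesJob {
    public static void main(String[] args) {
        TableEnvironment tEnv =
            TableEnvironment.create(EnvironmentSettings.inStreamingMode());

        // Source: raw IoT readings arriving on a (hypothetical) Pulsar topic.
        tEnv.executeSql(
            "CREATE TABLE sensor_readings (" +
            "  sensor_id STRING," +
            "  reading DOUBLE," +
            "  event_time TIMESTAMP(3)," +
            "  WATERMARK FOR event_time AS event_time - INTERVAL '5' SECOND" +
            ") WITH (" +
            "  'connector' = 'pulsar'," +
            "  'topics' = 'persistent://public/default/sensor-readings'," +
            "  'service-url' = 'pulsar://localhost:6650'," +
            "  'format' = 'json'" +
            ")");

        // Sink: pre-aggregated averages, written back to Pulsar so a
        // Cassandra-compatible sink connector can load them into ScyllaDB.
        tEnv.executeSql(
            "CREATE TABLE sensor_averages (" +
            "  sensor_id STRING," +
            "  window_end TIMESTAMP(3)," +
            "  avg_reading DOUBLE" +
            ") WITH (" +
            "  'connector' = 'pulsar'," +
            "  'topics' = 'persistent://public/default/sensor-averages'," +
            "  'service-url' = 'pulsar://localhost:6650'," +
            "  'format' = 'json'" +
            ")");

        // Continuous query: tumbling one-minute average per sensor.
        tEnv.executeSql(
            "INSERT INTO sensor_averages " +
            "SELECT sensor_id, " +
            "       TUMBLE_END(event_time, INTERVAL '1' MINUTE) AS window_end, " +
            "       AVG(reading) AS avg_reading " +
            "FROM sensor_readings " +
            "GROUP BY sensor_id, TUMBLE(event_time, INTERVAL '1' MINUTE)");
    }
}
```

Keeping the aggregation in Flink means Pulsar topics remain the hand-off point, so the sink connector only ever sees the pre-computed values it needs to write into ScyllaDB.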
These applications can then do things like feature calculations for a machine learning model: look up the key for a particular sensor type and get the last reading, or the averages for the last hour, two hours, or three hours. You can store massive amounts of data very quickly and access it in near real time. And then there’s also Apache Pinot for low-latency, instant query results, if you want to attach that to these other systems as well.
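On the serving side, here is a hedged sketch of that key lookup against ScyllaDB using the Cassandra-compatible Java driver. The iot.sensor_averages keyspace and table, the column names, and the contact point are hypothetical, chosen only to match the aggregates in the pipeline sketch above.

```java
import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.ResultSet;
import com.datastax.oss.driver.api.core.cql.Row;
import java.net.InetSocketAddress;

// Sketch: reading pre-computed windowed averages for one sensor out of ScyllaDB.
// Keyspace/table/column names and the host are assumptions for illustration.
public class SensorLookup {
    public static void main(String[] args) {
        try (CqlSession session = CqlSession.builder()
                .addContactPoint(new InetSocketAddress("scylla-host", 9042))
                .withLocalDatacenter("datacenter1")
                .build()) {

            // Partition-key lookup: the three most recent windowed averages.
            ResultSet rs = session.execute(
                "SELECT window_end, avg_reading " +
                "FROM iot.sensor_averages " +
                "WHERE sensor_id = ? LIMIT 3",
                "sensor-42");

            for (Row row : rs) {
                System.out.printf("%s -> %.2f%n",
                    row.getInstant("window_end"), row.getDouble("avg_reading"));
            }
        }
    }
}
```

Because the averages are computed upstream, this read is a simple partition-key lookup, which is where ScyllaDB’s low-latency access pays off for real-time feature serving.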
And thank you so much for attending our talk. Please stay in touch; both my contact information and Tim’s are here. Scan the QR code for more information and free downloads, and reach out to us on Twitter, by email, through our GitHub accounts, or on LinkedIn: all different ways to get hold of us. Hopefully you found this educational, and I’ll let Tim give any parting thoughts as well.

Yeah, definitely, stay in touch. There are a lot of different things you can do with these technologies, so reach out if you’re building something innovative and have questions, or want help solving some hard application problems.