See all blog posts

Analytics Show Time: Presto Powered by ScyllaDB

Analytics ScyllaDB Presto

Presto is a popular, open source, distributed ANSI SQL query engine. It is used to run ad-hoc, interactive analytic queries on many data sources including HDFS, S3, Cassandra, MySQL, Kafka, PostgreSQL, Redis and ScyllaDB.

Unlike Apache Hive, Presto is not a layer on top of Map Reduce (Hadoop). It was designed from scratch to execute SQL queries, on data sizes ranging from gigabytes to petabytes.

In particular, Presto is attractive for organizations who have multiple databases and are interested in running queries, including SQL JOIN, across more than one of them at a time. It has a simple install, which makes it easy to get started with (no ZooKeeper, thank you)

Presto logo

Presto connects to ScyllaDB using the same connector as Cassandra (hooray for driver compatibility) allowing you to take advantage of ScyllaDB’s superior throughput and latency.

I recently gave a short introduction to ScyllaDB and Presto at ScyllaDB Summit 2016. slides and video below.
(I’m the guy with the ScyllaDB t-shirt)

Try it yourself

Want to give Presto and ScyllaDB a spin? There is a Docker image just for that:

Run

sudo docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image

Provision ScyllaDB with CQLSh

$ sudo docker exec -it some-scylla-presto cqlsh
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use mykeyspace ;
CREATE TABLE air_quality_data (
    sensor_id text,
    time timestamp,
    co_ppm int,
    PRIMARY KEY (sensor_id, time)
);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:00', 17);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:01', 18);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:02', 19);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:03', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 30);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 31);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:10', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:00', 200);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:01', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:02', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:03', 401);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:04', 402);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:10', 1000);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:11', 2000);
exit;

Run Presto CLI

$ sudo docker exec -it some-scylla-presto ./presto --server localhost:8080 --catalog cassandra --schema default
presto:default> select sensor_id, avg(co_ppm) as AVG from cassandra.mykeyspace.air_quality_data group by sensor_id;

 sensor_id |       avg
-----------+--------------------
 your_home |  629.2857142857143 
 my_home   | 20.833333333333332 
(2 rows)

Any questions about ScyllaDB, Presto or ScyllaDB with Presto? Join the discussion at the ScyllaDB user group, get our blog RSS feed, or follow @ScyllaDB on Twitter.

About Tzach Livyatan

Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15 year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier grade systems, signalling, policy and charging applications.