Sep28

Analytics Show Time: Presto Powered by Scylla

Subscribe to Our Blog

Presto is a popular, open source, distributed ANSI SQL query engine. It is used to run ad-hoc, interactive analytic queries on many data sources including HDFS, S3, Cassandra, MySQL, Kafka, PostgreSQL, Redis and Scylla.

Unlike Apache Hive, Presto is not a layer on top of Map Reduce (Hadoop). It was designed from scratch to execute SQL queries, on data sizes ranging from gigabytes to petabytes.

In particular, Presto is attractive for organizations who have multiple databases and are interested in running queries, including SQL JOIN, across more than one of them at a time. It has a simple install, which makes it easy to get started with (no ZooKeeper, thank you)

Presto logo

Presto connects to Scylla using the same connector as Cassandra (hooray for driver compatibility) allowing you to take advantage of Scylla’s superior throughput and latency.

I recently gave a short introduction to Scylla and Presto at Scylla Summit 2016. slides and video below.
(I’m the guy with the Scylla t-shirt)

Try it yourself

Want to give Presto and Scylla a spin? There is a Docker image just for that:

Run

sudo docker run --name some-scylla-presto -d tzachl/scylla-and-presto-image

Provision Scylla with CQLSh

$ sudo docker exec -it some-scylla-presto cqlsh
cqlsh> CREATE KEYSPACE mykeyspace WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 1 };
use mykeyspace ;
CREATE TABLE air_quality_data (
    sensor_id text,
    time timestamp,
    co_ppm int,
    PRIMARY KEY (sensor_id, time)
);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:00', 17);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:01', 18);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:02', 19);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:03', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 30);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:04', 31);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('my_home', '2016-08-30 07:01:10', 20);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:00', 200);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:01', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:02', 201);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:03', 401);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:04', 402);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:10', 1000);
INSERT INTO air_quality_data(sensor_id, time, co_ppm) VALUES ('your_home', '2016-08-30 07:01:11', 2000);
exit;

Run Presto CLI

$ sudo docker exec -it some-scylla-presto ./presto --server localhost:8080 --catalog cassandra --schema default
presto:default> select sensor_id, avg(co_ppm) as AVG from cassandra.mykeyspace.air_quality_data group by sensor_id;

 sensor_id |       avg
-----------+--------------------
 your_home |  629.2857142857143 
 my_home   | 20.833333333333332 
(2 rows)

Any questions about Scylla, Presto or Scylla with Presto? Join the discussion at the Scylla user group, get our blog RSS feed, or follow @ScyllaDB on Twitter.

Tzach LivyatanAbout Tzach Livyatan

Tzach Livyatan has a B.A. and MSc in Computer Science (Technion, Summa Cum Laude), and has had a 15 year career in development, system engineering and product management. In the past he worked in the Telecom domain, focusing on carrier grade systems, signalling, policy and charging applications.


Tags: 3rd-party-integration, analytic, presto, summit, video