
Slow Query Logging in ScyllaDB

Motivation

In many real-life installations, one of the most important characteristics of the system is the longest response time. Naturally, the shorter, the better. Therefore, capturing a request that took a long time and understanding why it took so long is a critical and challenging task. Slow query tracing to the rescue!

Example

One of the common problems with ill-formed queries is that a query has to reach a remote Data Center in order to achieve the required consistency level.

Let’s see how Slow Query Logging is going to help us catch such problems.

Let’s create a clean cluster in an Amazon AWS environment with 3 nodes in the us-east region and 3 nodes in the us-west region. We will use Ec2MultiRegionSnitch for simplicity:

$ nodetool status
Datacenter: us-east
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  54.87.27.55    237.34 KB  256     ?       7bda100f-ea8e-4e20-8d11-54d64d76c661  1c
UN  54.90.129.115  247.68 KB  256     ?       ad71327a-b750-4817-8c78-79801d21dee3  1c
UN  50.19.22.61    247.68 KB  256     ?       44448021-f96d-4de9-b018-daa5853acea5  1c
Datacenter: us-west
===================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address        Load       Tokens  Owns    Host ID                               Rack
UN  52.53.167.135  235.41 KB  256     ?       00e6fe23-be5c-4d9b-b933-68030d0e6a5b  1c
UN  54.153.93.219  274.82 KB  256     ?       3212ee29-17fd-437c-9e9c-2b9b09f0fd36  1c
UN  54.67.34.6     321.74 KB  256     ?       ce0899a8-e898-465e-9c36-34e29cede5b4  1c

Then let’s create a keyspace and a table for the demonstration. Here and below we’ll run cqlsh from the node with the address 50.19.22.61:

cqlsh> CREATE KEYSPACE my_ks with replication = {'class': 'NetworkTopologyStrategy', 'us-east' : '2', 'us-west' : '2'};
cqlsh> CREATE table my_ks.my_cf1 ( c1 text PRIMARY KEY, c2 text);

Let’s add a row to it:

cqlsh> INSERT INTO my_ks.my_cf1 (c1, c2) VALUES ('key1', 'value1');

Let’s enable Slow Query Logging:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/slow_query?enable=true&ttl=86000&threshold=10000"

Now let’s read the row above with the “bad” consistency level, QUORUM:

cqlsh> CONSISTENCY QUORUM
Consistency level set to QUORUM.
cqlsh> SELECT * FROM my_ks.my_cf1 where c1='key1';

 c1   | c2
------+--------
 key1 | value1

(1 rows)

Now let’s disable Slow Query Logging:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/slow_query?enable=false&ttl=86000&threshold=10000"

Let’s see if we have anything in the system_traces.node_slow_log table:

cqlsh> SELECT * from system_traces.node_slow_log ;

 start_time                           | node_ip     | shard | command                                     | date                     | duration | parameters                                                                                                                                                                                | session_id                           | source_ip | table_names      | username
--------------------------------------+-------------+-------+---------------------------------------------+--------------------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------------------------+-----------+------------------+-----------
 bd212320-94d5-11e6-925a-000000000003 | 50.19.22.61 |    25 | SELECT * FROM my_ks.my_cf1 where c1='key1'; | 2016-10-18 01:54:00+0000 |   211827 | {'consistency_level': 'QUORUM', 'page_size': '100', 'query': 'SELECT * FROM my_ks.my_cf1 where c1=''key1'';', 'serial_consistency_level': 'SERIAL', 'user_timestamp': '1476755640338012'} | bd20ff20-94d5-11e6-925a-000000000003 | 127.0.0.1 | {'my_ks.my_cf1'} | anonymous

(1 rows)

Gotcha!!!
Now let’s check what actually happened there. Let’s look at the corresponding records in the system_traces.sessions and system_traces.events tables:

cqlsh> SELECT * from system_traces.sessions where session_id=bd20ff20-94d5-11e6-925a-000000000003;

 session_id                           | client    | command | coordinator | duration | parameters                                                                                                                                                                                | request            | started_at
--------------------------------------+-----------+---------+-------------+----------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+--------------------+--------------------------
 bd20ff20-94d5-11e6-925a-000000000003 | 127.0.0.1 |   QUERY | 50.19.22.61 |   211827 | {'consistency_level': 'QUORUM', 'page_size': '100', 'query': 'SELECT * FROM my_ks.my_cf1 where c1=''key1'';', 'serial_consistency_level': 'SERIAL', 'user_timestamp': '1476755640338012'} | Execute CQL3 query | 2016-10-18 01:54:00+0000

(1 rows)
cqlsh> SELECT * from system_traces.events where session_id=bd20ff20-94d5-11e6-925a-000000000003;

 session_id                           | event_id                             | activity                                                                                                                                                                                       | source      | source_elapsed | thread
--------------------------------------+--------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------+----------------+----------
 bd20ff20-94d5-11e6-925a-000000000003 | bd21232e-94d5-11e6-925a-000000000003 |                                                                                                                                                                            Parsing a statement | 50.19.22.61 |              1 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd2126f3-94d5-11e6-925a-000000000003 |                                                                                                                                                                         Processing a statement | 50.19.22.61 |             97 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd212c1a-94d5-11e6-925a-000000000003 | Creating read executor for token 1573573083296714675 with all: {54.90.129.115, 54.87.27.55, 54.67.34.6, 54.153.93.219} targets: {54.90.129.115, 54.87.27.55, 54.67.34.6} repair decision: NONE | 50.19.22.61 |            229 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd212cff-94d5-11e6-925a-000000000003 |                                                                                                                                                 read_digest: sending a message to /54.87.27.55 | 50.19.22.61 |            252 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd2131af-94d5-11e6-925a-000000000003 |                                                                                                                                                  read_digest: sending a message to /54.67.34.6 | 50.19.22.61 |            372 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd21335f-94d5-11e6-925a-000000000003 |                                                                                                                                                 read_data: sending a message to /54.90.129.115 | 50.19.22.61 |            415 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd21759b-94d5-11e6-925a-000000000003 |                                                                                                                                                    read_data: got response from /54.90.129.115 | 50.19.22.61 |           2111 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd217950-94d5-11e6-925a-000000000003 |                                                                                                                                                    read_digest: got response from /54.87.27.55 | 50.19.22.61 |           2206 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd417280-94d5-11e6-925a-000000000003 |                                                                                                                                                     read_digest: got response from /54.67.34.6 | 50.19.22.61 |         211747 | shard 25
 bd20ff20-94d5-11e6-925a-000000000003 | bd41750f-94d5-11e6-925a-000000000003 |                                                                                                                                                           Done processing - preparing a result | 50.19.22.61 |         211812 | shard 25

(10 rows)

If we look at the traces above we immediately see that one of the targets of this query is the node 54.67.34.6 from the remote DC. We then see that 50.19.22.61 (our Coordinator in the us-east DC) sends a digest request to 54.67.34.6 in us-west, and it takes 211375us (211747 - 372) until the response arrives.

That is clearly the reason why this query was so slow. Problem found!

A fix for this problem may be choosing a different CL, e.g. LOCAL_QUORUM.
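In cqlsh the fix boils down to switching the consistency level and re-running exactly the same query; a minimal sketch:

cqlsh> CONSISTENCY LOCAL_QUORUM
Consistency level set to LOCAL_QUORUM.
cqlsh> SELECT * FROM my_ks.my_cf1 where c1='key1';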

Indeed, if we run all the steps above with LOCAL_QUORUM instead of QUORUM, we will notice that no new entry is added to system_traces.node_slow_log, and the reason is simple: this time the query takes a mere 1243us. As you may see, it’s an absolutely different story!.. 😉
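Since the slow query log only records queries that exceed the threshold, a fast query like this one leaves no trace in node_slow_log. If you want to reproduce such a timing yourself, one option (an aside, not part of the original walkthrough) is cqlsh’s built-in client-side tracing, which prints the full trace, including elapsed times, right after every query:

cqlsh> TRACING ON
cqlsh> SELECT * FROM my_ks.my_cf1 where c1='key1';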

Final notes

Unlike other databases, we have decided not to hold back such important functionality: it is provided in our open source version. Query tracing is a basic feature, almost as basic as queries themselves. Additionally, we went all-in: we record not just the culprit query but also all of the intermediate timing traces, which help explain why the query was slow. A separate detailed-trace mechanism is not enough: in many cases, re-running the same query will not reproduce the long latency that caused it to be recorded as slow in the first place. Therefore we store both the query and its execution trace.

This, of course, doesn’t come for free.
On an AWS cluster with 3 c3.8xlarge nodes, when Slow Query Logging was enabled with a very high threshold, the throughput in a standard cassandra-stress read test behaved as follows (the number of threads was controlled with the "-rate threads=N" parameter; an example invocation is shown after the numbers below):

  • with a single thread, throughput dropped by 0.5%: from 903.2 to 898.6 ops/sec
  • with 250 threads, throughput dropped by 5.09%: from 116314.13 to 110390.63 ops/sec
  • with 500 threads, throughput dropped by 17.6%: from 176246.63 to 145297.75 ops/sec

The latency wasn’t affected at all: 1.1 ms (measured in the single-thread test).
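For reference, a read test like the ones above can be driven with something along these lines; the operation count, thread count and node address below are placeholders rather than the exact invocation used for the numbers above:

$ cassandra-stress read n=1000000 -rate threads=250 -node 54.87.27.55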

Naturally, the more loaded the system is CPU-wise, the heavier the penalty for the extra CPU that Slow Query Logging requires.

So, although the feature is disabled by default, it’s quite safe to enable it on systems where each shard is loaded no more than 50%-60% (use scyllatop "*reactor*load*" to verify that).
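scyllatop takes the metric pattern directly on the command line, so checking the per-shard reactor load is a one-liner:

$ scyllatop "*reactor*load*"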

node_slow_log table schema

Below is a description of system_traces.node_slow_log schema and its columns.
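The exact CQL definition, including column types and the primary key, can always be pulled from a live node with cqlsh, e.g.:

cqlsh> DESCRIBE TABLE system_traces.node_slow_log;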

Columns description

  • start_time and date: the time when the query began
  • node_ip: the address of the Coordinator node
  • shard: the shard ID on the Coordinator where the query has been handled
  • command: the query command, e.g. select * from my_ks.my_cf
  • duration: the duration of the query handling in microseconds
  • parameters: query parameters, like the parameters column in the system_traces.sessions table
  • session_id: the corresponding tracing session ID
  • source_ip: the address of the client that sent this query
  • table_names: a list of tables used by this query, where applicable
  • username: the user name used for authentication with this query
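For example, to get a quick overview of what was slow, on which node and shard, and for how long, one can select just a few of the columns listed above:

cqlsh> SELECT start_time, node_ip, shard, duration, command FROM system_traces.node_slow_log;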

How to enable and configure

By default, slow query logging is disabled.
There is a REST API that allows configuring the feature and querying its current configuration.
To set the parameters, run:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://<Node's address>:10000/storage_service/slow_query?enable=<true|false>&ttl=<in seconds>&threshold=<threshold in microseconds>"

e.g. in order to enable the feature on a node with the address 127.0.0.1, set the ttl to 86000s and the threshold to 10000us:

curl -X POST --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/slow_query?enable=true&ttl=86000&threshold=10000"

To get the current configuration run:

curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://<Node's address>:10000/storage_service/slow_query"

e.g. after the POST command above, the result of the query will look as follows:

$ curl -X GET --header "Content-Type: application/json" --header "Accept: application/json" "http://127.0.0.1:10000/storage_service/slow_query"
{"threshold": 10000, "enable": true, "ttl": 86000}

$

About Vlad Zolotarov

Vlad specializes in networking, mostly L2. He has worked on projects for Mellanox, on the bnx2x Linux device driver for Broadcom, and on the ScaleMP Virtual Device System for network interfaces. Vlad studied at the Israel Institute of Technology (Technion) and holds a B.Sc. in Computer Science.