Jun20

User use case: Mogujie

Guest post by FengLin (MengYe Shen), Mogujie

Mogujie is a large fashion retail site, with more than 3 billion Yuan turnover, serving more than 60 million online users, shopping for clothes, shoes, bags, accessories, cosmetics and beauty products. With that kind of traffic, a resilient, low-latency database is critical.

Mogujie has been running ScyllaDB for the last six months, most of that time in production. The current deployment is six servers, serving 600,000 requests per second with Scylla 1.0.1.

Mogujie has more than 60 million users

In November 2015 we started exploring columnar databases such as Cassandra. Cassandra was our first implementation of a column store database. Our target use case is real-time storage of monitoring data and providing offline query functionality.

Cassandra or HBase?

We chose Cassandra for its architecture with no single point of failure and its high scalability, combined with good performance for mixed read and write workloads. HBase was, at that time, missing those features.

1. Test Cassandra

Our target throughput was 50k transactions per second (TPS).
We deployed Cassandra on five on-premise servers, each with the following configuration:

  • 24 cores, Intel® Xeon® processor E5-2600 product family
  • 64GB memory
  • 60TB on magnetic hard drives
  • Gigabit Ethernet

For our test data model we created three tables, with the following settings:

// 1. Create keyspace
create keyspace sentry2 with replication = { 'class': 'SimpleStrategy', 'replication_factor': '2'};

// 2. Custom data type
create type sentry2.scmm (sum double, count int, min double, max double);

// 3. Data table
create table sentry2.datapoints (key text, time timestamp, data FROZEN <sentry2.scmm>, PRIMARY KEY (key, time));

// 4. Index table
create table sentry2.key_index (metric text, tags text, tagsMap map <text, text>, PRIMARY KEY (metric, tags));
create index tagMap_idx on sentry2.key_index (ENTRIES (tagsMap));

  • Schema: a User Defined Type (scmm) with the fields (sum double, count int, min double, max double)
  • Datapoints: a particular data point can be looked up by key and timestamp
  • Key_index: maps each metric to its multiple tags

At the time we created this data model, we were not very familiar with Cassandra, and we later realized that the model was not optimized. However, it was a good model for testing Cassandra, with the following results:

  1. Transactions Per Second (TPS) maintained at 30k ~ 40k / s. When increased to a higher rate we experienced dropped mutations
  2. After 2 to 3 days of continuous operation, Cassandra would crash

Given these issues, we would have had to enlarge our Cassandra cluster to handle the required capacity. Continuing the tests meant purchasing additional servers, so we suspended the Cassandra testing.

2. Meet ScyllaDB

We met ScyllaDB by chance, while trying to resolve the above issues. One of our team members discovered an open-source C++ implementation of Cassandra, called Scylla. We were excited! As C++ developers ourselves, we are less familiar with Java. We found information about Scylla on a Chinese site, oschina.net, and on the ScyllaDB website, and believed we had found the right solution to our issues.

At that time, we had only a limited number of servers that met the requirements: six machines. After deploying Scylla on those machines, our performance requirements were met: Scylla delivered 100k TPS. Our resource and efficiency issues were solved at once!

Thanks to Scylla’s superior performance, we saved three servers, along with the associated operational and capital costs. Scylla is highly efficient, so we meet the TPS requirements of our production environment with no problem. However, we still have to measure Scylla under a high write workload, when TPS exceeds 40k/s, to discover if and when mutations are dropped.
For this testing, we used Scylla version 0.11, with the same server setup mentioned above.

When we concluded our testing with Scylla, we found that it met our expectations. So we decided to drop further tests with Cassandra and move Scylla to our production environment.

Issues and lessons learned

As expected, we did face some challenges while deploying Scylla, which was pre-GA at the time. To help us mitigate those issues, we received excellent support from the Scylla team: Tzach Livyatan, Asias He, and others. In our initial tests Scylla 0.11 had good performance, but did suffer from problems and glitches. Every problem we encountered in the following days was solved in follow-up releases or earlier.

One of the key new concepts we had to learn was correct data modeling. Scylla and Cassandra share a few concepts that need to be understood (definitions and example from the original on Stack Overflow):

  • Primary-key: a combination of a partition key and a clustering key;
  • Partition-key: determines how data is distributed across nodes;
  • Composite key: a primary key made up of multiple columns;
  • Clustering-key: determines the ordering of rows within a partition;
create table stackoverflow (
      k_part_one text,
      k_part_two int,
      k_clust_one text,
      k_clust_two int,
      k_clust_three uuid,
      data text,
      PRIMARY KEY ((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)
  );
  • Primary-key: ((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)
  • Partition-key: (k_part_one, k_part_two)
  • Clustering-key: k_clust_one, k_clust_two, k_clust_three

When performing a CQL query, specifying the partition key is the right way to locate the correct node quickly.

According to the Scylla and Cassandra recommendations, no single partition (row) should consume more than 50% of memory; for Scylla, the limit is total_machine_memory / (number_of_lcores * 2), that is, half of each logical core's share of memory. Initially, we violated this advice, which led us to extremely wide rows (many GBs). To solve this issue, we remodeled the data so that each row is smaller than 10MB.
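To make the formula concrete, here is a minimal sketch that evaluates it for a server like the ones described earlier (64GB of memory, 24 cores). It assumes that all 24 cores count as logical cores and that 64GB means 64 GiB; the function name is hypothetical and is not part of Scylla.

```python
GIB = 1024 ** 3


def max_partition_bytes(total_machine_memory, number_of_lcores):
    """Partition size limit quoted above: total memory / (lcores * 2)."""
    if number_of_lcores <= 0:
        raise ValueError("number_of_lcores must be positive")
    return total_machine_memory / (number_of_lcores * 2)


# Assumed hardware: 64 GiB of memory, 24 logical cores.
limit = max_partition_bytes(64 * GIB, 24)
print(f"{limit / GIB:.2f} GiB")  # prints "1.33 GiB"
```

Under these assumptions the limit is a little over 1 GiB per partition, so rows of many GBs were well over it, while the remodeled rows of under 10MB sit far below it.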

Lessons learned

  • Data Model. Before using Scylla, I highly recommend understanding your data model better. Understanding your data will help you design an efficient system.
  • Monitoring. I recommend using a Scylla monitoring application, to help you monitor and observe Scylla operations.

Hitting hardware limitations, and solving them

Disk

After the trial with Scylla, we found that its performance exceeded our expectations. In actual use, however, we found that when we increased the query rate to 300K/s, requests started to time out. Our initial suspect was the network card, but once we started monitoring with netdata, we discovered that the timeouts were in fact the result of disk IOPS.

When the arrival rate reached 300K/s, disk utilization hit 100%, which increased Scylla latencies. It looked like we would have to follow the ScyllaDB team’s recommendation and move from spinning disks to SSDs. To delay the move to SSDs a little longer, we decided to optimize our data model, updating the schema from

create table sentry2.datapoints (key text, time timestamp, data FROZEN <sentry2.scmm>, PRIMARY KEY (key, time));

to

CREATE TABLE sentrydata.datapoints (
 metric bigint,
 tags bigint,
 wholehour bigint,
 data map <int, frozen <tuple <double, bigint, bigint, double, double >>>,
 PRIMARY KEY (metric, tags, wholehour)
);

With the new schema we insert 360 data points at a time, which allows about 100 million data points per minute. With this update, the bottleneck moved from the disk to the network card.
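One way to read the new schema is that each row holds the data points of one hour for a given metric and tag set, keyed inside the map by an offset within that hour. That reading is an assumption: the post does not say what wholehour or the map's int key contain. Under that assumption, grouping points into per-hour maps could be sketched as follows (the function names are hypothetical):

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def bucket(epoch_seconds):
    """Split a timestamp into (wholehour, offset_in_hour); assumed semantics."""
    wholehour = epoch_seconds - epoch_seconds % SECONDS_PER_HOUR
    return wholehour, epoch_seconds - wholehour


def group_by_hour(points):
    """Group (epoch_seconds, value) pairs into one offset -> value map per hour."""
    rows = defaultdict(dict)
    for epoch_seconds, value in points:
        wholehour, offset = bucket(epoch_seconds)
        rows[wholehour][offset] = value
    return dict(rows)


rows = group_by_hour([(7200, 1.0), (7210, 2.0), (10800, 3.0)])
print(rows)  # prints "{7200: {0: 1.0, 10: 2.0}, 10800: {0: 3.0}}"
```

Each resulting map would become a single write for its (metric, tags, wholehour) row, so many data points travel in one request instead of one request per point.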

AIO Optimization

To use the full power of asynchronous I/O (AIO), the Scylla documentation recommends kernel version 3.15 or later. Since we had an earlier kernel version, we had to use developer mode. Once we moved to a later kernel version, our tests showed a throughput increase of about 20%. Below are the test details:

  • Test Platform – Server
    • 3 machines
    • 40 cores
    • 64GB memory
    • 60TB mechanical hard drive
    • Gigabit Ethernet
  • Test Platform – Client
    • 4 machines
    • 24 cores
    • 64GB memory
    • 400GB SSD
    • Gigabit Ethernet
  • Table structure
CREATE TABLE sentrydata.newdatapoint1 (
 metric bigint,
 tags bigint,
 wholehour bigint,
 data map <int, frozen <tuple <double, bigint, bigint, double, double >>>,
 PRIMARY KEY (metric, tags, wholehour)
);
  • Results for kernel version 3.10
Data points per request | Requests / min | Inserted data points / min | CPU | RAM | Commit disk utilization | Commit disk BW | Disk utilization | Disk BW | Ingress network BW | Egress network BW
1 | 15M | 15M | 75% | 62.70% | 2.60% | 45M | 99% | 37M | 510 | 448
60 | 1.5M | 92M | 50% | 62.70% | 5.10% | 95M | 94% | 20M | 840 | 573
360 | 288K | 102M | 30-40% | 63% | 5.50% | 105M | 80~98% | 15M | 915 | 623
3600 | 288K | 102M | 10-30% | 62.80% | 6.50% | 110M | 70~95% | 14M | 925 | 640
  • Results for kernel version 3.18
Data points per request | Requests / min | Inserted data points / min | CPU | RAM | Commit disk utilization | Commit disk BW | Disk utilization | Disk BW | Ingress network BW | Egress network BW
1 | 17M | 17M | 75% | 60% | 2.33% | 59M | 99% | 44M | 610 | 550
60 | 1.908M | 114.2M | 35~40% | 60.70% | 6.50% | 110M | 99% | 36.5M | 930 | 640
360 | 357K | 128.2M | 24%~30% | 61% | 7.80% | 110M | 67~90% | 21M | 934 | 575
3600 | 35.8K | 128~130M | 10%~30% | 62.80% | 6.50% | 110M | 50~70% | 10~22M | 937 | 480
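As a rough cross-check of the throughput gain, the sketch below compares the inserted data points per minute reported for the two kernels. The figures are copied from the results above; for the 3600 row on kernel 3.18, which is given as a range (128~130M), the lower bound is used, which is a simplifying assumption.

```python
# Inserted data points per minute (in millions), keyed by data points per request.
KERNEL_3_10 = {1: 15.0, 60: 92.0, 360: 102.0, 3600: 102.0}
KERNEL_3_18 = {1: 17.0, 60: 114.2, 360: 128.2, 3600: 128.0}  # 3600: lower bound of 128~130M


def gain_percent(before, after):
    """Relative throughput increase from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


gains = {
    size: round(gain_percent(KERNEL_3_10[size], KERNEL_3_18[size]), 1)
    for size in KERNEL_3_10
}
for size, gain in gains.items():
    print(f"{size} data points per request: +{gain}%")
```

With these inputs the gain is about 13% for single-point inserts and about 24% to 26% for the batched cases, which brackets the figure of roughly 20%.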

Summary

Once we had Scylla up and running we took several steps to remove bottlenecks and improve performance:

  • To bypass the spinning disk I/O problem, we moved to a data model that inserts data points in batches of 60
  • To bypass network bottlenecks, we increased the batch size further, to 360 and 3600
  • Moving to a later kernel and enabling Scylla production mode increased performance by about 20%

4. Future plans

We plan to migrate more and more applications to Scylla. We are now planning to:

  • Build data backup and restore architecture
  • Integrate Spark, for off-line analysis
  • Use Scylla to store Kafka time index offset, etc.
  • Create a storage platform (relatively long-term plans)

My team and I are very excited about using Scylla and Seastar and have confidence in the team behind these products. We are constantly learning from the ideas in these projects, hoping to apply similar methods and concepts to our other products.

Lastly, although Scylla is still growing, it really is a super powerful product. ScyllaDB team: I hope you continue to develop good products. My team and I will continue to follow your contributions.
