User use case: Mogujie

By ScyllaDB Team

June 20, 2016

Guest post by FengLin (MengYe Shen), Mogujie

Mogujie is a large fashion retail site with a turnover of more than 3 billion yuan, serving more than 60 million online users who shop for clothes, shoes, bags, accessories, cosmetics, and beauty products. With that kind of traffic, a resilient, low-latency database is critical.

Mogujie has been running ScyllaDB for the last six months, most of that time in production. The current deployment is six servers, serving 600,000 requests per second with ScyllaDB 1.0.1.

In November 2015 we started exploring wide-column databases such as Cassandra. Cassandra was our first implementation of a column-store database. Our target use case was real-time storage of monitoring data, along with offline query functionality.

Cassandra or HBase?

We decided to use Cassandra for its architecture with no single point of failure and its high scalability, while maintaining good performance for mixed read and write workloads. HBase was, at that time, missing those features.

1. Test Cassandra

Our target throughput was 50k transactions per second (TPS).
We deployed Cassandra on five on-premise servers. Each server had the following configuration:

24 cores, Intel® Xeon® processor E5-2600 product family
64GB memory
60TB on magnetic hard drives
Gigabit Ethernet

For our test data model we created a keyspace, a user-defined type, and two tables, with the following settings:

// 1. Create keyspace
create keyspace sentry2 with replication = { 'class': 'SimpleStrategy', 'replication_factor': '2'};

// 2. User-defined type
create type sentry2.scmm (sum double, count int, min double, max double);

// 3. Data table
create table sentry2.datapoints (key text, time timestamp, data FROZEN <sentry2.scmm>, PRIMARY KEY (key, time));

// 4. Index table
create table sentry2.key_index (metric text, tags text, tagsMap map <text, text>, PRIMARY KEY (metric, tags));
create index tagMap_idx on sentry2.key_index (ENTRIES (tagsMap));

scmm: a user-defined type with the fields (sum double, count int, min double, max double)
Datapoints: a particular data point can be looked up by key and timestamp
Key_index: maps each metric to its multiple tags

At the time we created this data model we were not very familiar with Cassandra, and we later realized that the model was not optimized. However, it was a good enough model to test Cassandra, with the following results:

Transactions per second (TPS) held at 30k to 40k. When we increased the rate beyond that, we experienced dropped mutations.
After 2 to 3 days of continuous operation, Cassandra would crash.

Given these issues, we were prepared to enlarge our Cassandra cluster to handle the additional capacity requirements. However, continuing the tests would have required purchasing additional servers, so we suspended the Cassandra testing.

2. Meet ScyllaDB

We met ScyllaDB by chance while trying to resolve the above issues. One of our team members discovered an open-source C++ implementation of Cassandra called ScyllaDB. We were excited: as C++ developers ourselves, we are less familiar with Java. We found information about ScyllaDB on a Chinese site, oschina.net, and on the ScyllaDB website, and believed we had found the right solution to our issues.

At that time, we had only a limited number of servers that met the requirements. We had just six machines, but after deploying ScyllaDB on them our performance SLA requirements were met! ScyllaDB delivered 100k TPS. Our resource and efficiency issues were solved at once!

Thanks to ScyllaDB’s superior performance we saved three servers, along with the associated operational and capital costs. ScyllaDB is highly efficient, so we meet our production TPS requirements with no problem. However, we still had to measure ScyllaDB under a high write workload, with TPS exceeding 40k, to discover if and when mutations would be dropped.
For these tests we used ScyllaDB version 0.11, with the same server setup mentioned above.

When we concluded our testing, we found that ScyllaDB met our expectations. We therefore decided to drop further tests with Cassandra and move ScyllaDB to our production environment.

Issues and lessons learned

As expected, we did face some challenges while deploying ScyllaDB, which was pre-GA at the time. To help us mitigate those issues, we received excellent support from the ScyllaDB team: Tzach Livyatan, Asias He, and others. In our initial tests ScyllaDB 0.11 performed well, but did suffer from problems and glitches. Every problem we encountered in the following days was solved in a follow-up release, or sooner.

One of the key new concepts we had to learn was correct data modeling. ScyllaDB and Cassandra share a few concepts that need to be understood (definitions and example originally from Stack Overflow):

Primary-key: the combination of the partition key and the clustering key;
Partition-key: determines how data is distributed across nodes;
Composite key: a primary key made up of more than one column;
Clustering-key: determines the ordering of rows within a partition;

create table stackoverflow (
      k_part_one text,
      k_part_two int,
      k_clust_one text,
      k_clust_two int,
      k_clust_three uuid,
      data text,
      PRIMARY KEY ((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)
  );

Primary-key: ((k_part_one, k_part_two), k_clust_one, k_clust_two, k_clust_three)
Partition-key: (k_part_one, k_part_two)
Clustering-key: k_clust_one, k_clust_two, k_clust_three

When performing a CQL query, specifying the partition key is the way to locate the right node quickly.

According to ScyllaDB and Cassandra recommendations, no single partition (row) should consume more than 50% of memory; for ScyllaDB the limit is total_machine_memory / (number_of_lcores * 2), that is, half of each logical core's share of memory. Initially we violated this advice, which led to extremely wide rows (many GBs). To solve this issue, we remodeled the data so that each row is smaller than 10MB.
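
As a rough illustration of the formula above, the following sketch computes the limit for one machine. The inputs (64GB read as 64 GiB, and 24 logical cores, taken from the Cassandra test servers) are assumptions for the example, and the function name is hypothetical rather than part of any ScyllaDB tool.

```python
# Sketch of the per-partition size guideline quoted above:
#   limit = total_machine_memory / (number_of_lcores * 2)
GIB = 1024 ** 3


def max_partition_bytes(total_machine_memory: int, number_of_lcores: int) -> float:
    """Return the recommended upper bound for a single partition, in bytes."""
    if number_of_lcores <= 0:
        raise ValueError("number_of_lcores must be positive")
    return total_machine_memory / (number_of_lcores * 2)


# Assumed example inputs: a server with 64 GiB of memory and 24 logical cores.
limit = max_partition_bytes(64 * GIB, 24)
limit_gib = limit / GIB
print(f"{limit_gib:.2f} GiB")  # prints "1.33 GiB"
```

Under these assumptions the ceiling is a little over 1 GiB per partition, so rows of many GBs exceed it, while the 10MB target stays far below it.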

Lessons Learned

Data Model. Before using ScyllaDB, I highly recommend understanding your data model better. Understanding your data will help you design an efficient system.
Monitoring. I recommend using a ScyllaDB monitoring application to help you observe ScyllaDB operations.

Hitting hardware limitations, and solving them

Disk

After the trial with ScyllaDB, we found that its performance exceeded our expectations. In the course of using it, however, we found that when we increased query rates to 300K/s, requests would time out. Our initial suspect was the network card, but once we started monitoring with netdata, we discovered that the timeouts were in fact caused by disk IOPS.

When the request rate reached 300K/s, disk utilization hit 100%, which increased ScyllaDB latencies. It looked like we had to follow the ScyllaDB team’s recommendation and move from spinning disks to SSDs. To delay the move to SSDs a little longer, we decided to optimize our data model, updating the schema from

create table sentry2.datapoints (key text, time timestamp, data FROZEN <sentry2.scmm>, PRIMARY KEY (key, time));

to

CREATE TABLE sentrydata.datapoints (
 metric bigint,
 tags bigint,
 wholehour bigint,
 data map <int, frozen <tuple <double, bigint, bigint, double, double >>>,
 PRIMARY KEY (metric, tags, wholehour)
);

With this model we insert 360 data points at a time, allowing 100 million data points per minute. With this update the bottleneck moved from disk to the network card.
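
One way to picture the hourly layout is the sketch below, which groups incoming points into one map per (metric, tags, wholehour) row. It rests on assumptions not stated in this post: that wholehour is the Unix timestamp truncated to the hour, that the map key is the offset in seconds within that hour, and that points arrive every 10 seconds. The helper names are hypothetical, and a plain float stands in for the tuple value.

```python
from collections import defaultdict

SECONDS_PER_HOUR = 3600


def bucket_key(metric: int, tags: int, ts: int) -> tuple:
    """Map a Unix timestamp to (metric, tags, wholehour); assumed layout."""
    wholehour = ts - ts % SECONDS_PER_HOUR
    return (metric, tags, wholehour)


def group_points(points):
    """Group (metric, tags, ts, value) points into one map per hourly row.

    Returns {(metric, tags, wholehour): {offset_seconds: value}}, mirroring
    the `data map<int, ...>` column; each inner dict would be one batched write.
    """
    rows = defaultdict(dict)
    for metric, tags, ts, value in points:
        key = bucket_key(metric, tags, ts)
        rows[key][ts % SECONDS_PER_HOUR] = value
    return dict(rows)


# 360 points at an assumed 10-second interval fill exactly one hourly row.
start = 1466380800  # 2016-06-20 00:00:00 UTC
points = [(1, 7, start + 10 * i, float(i)) for i in range(360)]
rows = group_points(points)
print(len(rows), len(rows[(1, 7, start)]))
```

Grouping this way turns 360 single-point writes into one write per row, which is consistent with the load shifting from disk to the network card.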

AIO Optimization

To use the full power of asynchronous I/O (AIO), the ScyllaDB documentation recommends kernel version 3.15 or later. Since we had an earlier kernel version, we had to use developer mode. Once we moved to a later kernel version, our tests showed a throughput increase of about 20%. Below are the test details:

Test Platform – Server
- 3 machines
- 40 cores
- 64GB memory
- 60TB mechanical hard drive
- Gigabit Ethernet
Test Platform – Client
- 4 machines
- 24 cores
- 64GB memory
- 400GB SSD
- Gigabit Ethernet
Table structure

CREATE TABLE sentrydata.newdatapoint1 (
 metric bigint,
 tags bigint,
 wholehour bigint,
 data map <int, frozen <tuple <double, bigint, bigint, double, double >>>,
 PRIMARY KEY (metric, tags, wholehour)
);

Results for kernel version 3.10

Data points per request	Requests / min	Inserted data points / min	CPU	RAM	Commit disk utilization	Commit disk BW	Data disk utilization	Data disk BW	Ingress network BW	Egress network BW
1	15M	15M	75%	62.70%	2.60%	45M	99%	37M	510	448
60	1.5M	92M	50%	62.70%	5.10%	95M	94%	20M	840	573
360	288K	102M	30-40%	63%	5.50%	105M	80~98%	15M	915	623
3600	288K	102M	10-30%	62.80%	6.50%	110M	70~95%	14M	925	640

Results for kernel version 3.18

Data points per request	Requests / min	Inserted data points / min	CPU	RAM	Commit disk utilization	Commit disk BW	Data disk utilization	Data disk BW	Ingress network BW	Egress network BW
1	17M	17M	75%	60%	2.33%	59M	99%	44M	610	550
60	1.908M	114.2M	35~40%	60.70%	6.50%	110M	99%	36.5M	930	640
360	357K	128.2M	24%~30%	61%	7.80%	110M	67~90%	21M	934	575
3600	35.8K	128~130M	10%~30%	62.80%	6.50%	110M	50~70%	10~22M	937	480
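
The gain between the two kernels can be checked directly from the inserted data points column of the two tables. The short sketch below does that arithmetic for the rows with a single reported value; the 3600-point row is left out because the 3.18 figure is given as a range. The variable and function names are illustrative only.

```python
# Inserted data points per minute (in millions), copied from the two tables.
# Keys are the number of data points per request.
kernel_3_10 = {1: 15.0, 60: 92.0, 360: 102.0}
kernel_3_18 = {1: 17.0, 60: 114.2, 360: 128.2}


def percent_gain(before: float, after: float) -> float:
    """Relative improvement of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


gains = {n: percent_gain(kernel_3_10[n], kernel_3_18[n]) for n in kernel_3_10}
for n, gain in gains.items():
    print(f"{n:>4} points/request: +{gain:.1f}%")
```

The per-row gains come out at roughly 13%, 24%, and 26%, which brackets the figure of about 20% quoted in the text.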

Summary

Once we had ScyllaDB up and running we took several steps to remove bottlenecks and improve performance:

To work around the spinning-disk I/O bottleneck, we moved to a data model that inserts data points in batches of 60
To work around network bottlenecks, we increased the batch size further, to 360 and 3600
Moving to a later kernel and enabling ScyllaDB production mode increased performance by about 20%

4. Future plans

We plan to migrate more and more applications to ScyllaDB. We are now planning to:

Build data backup and restore architecture
Integrate Spark, for off-line analysis
Use ScyllaDB to store Kafka time index offset, etc.
Create a storage platform (relatively long-term plans)

My team and I are very excited to be using ScyllaDB and Seastar, and we have confidence in the team behind these products. We are constantly learning from the ideas in these projects, hoping to apply similar methods and concepts to our other products.

Lastly, although ScyllaDB is still growing, it really is a super powerful product. ScyllaDB team: I hope you continue to develop good products. My team and I will continue to follow your contributions.


