Full text search is required in many human-facing applications, for example where users retrieve and insert data using partial or wildcard input, spell correction, and autocompletion. An additional benefit of full text search is the ability to retrieve multiple results sorted by relevance. Lucene, the common parent to Solr and Elasticsearch: The most popular textual search engine in the market is Lucene. It is used by Solr, Elasticsearch, Lucidworks and other text search tools. Lucene is a great search engine. It is extremely fast, stable, and you probably can’t get much better than […]
Benchmarking is no easy task, especially when comparing databases with different “engines” under the hood. You want your benchmark to be fair, to run each database on its optimal setup and hardware, and to keep the comparison as apples-to-apples as possible. (For more on this topic, see our webinar on the “Do’s and Don’ts of Benchmarking Databases.”) We kept this in mind when conducting this Scylla versus Cassandra benchmark, which compares Scylla and Cassandra on AWS EC2, using cassandra-stress as the load generator. Most benchmarks compare different software stacks on the same hardware and try to max out the throughput. […]
KairosDB, a time-series database, provides simple and reliable tooling to ingest and retrieve chronologically created data, such as sensor information or metrics. Scylla provides a large-scale, highly reliable and available backend to store large quantities of time-series data. Together, KairosDB and Scylla provide a highly available time-series solution with an efficiently tailored front-end framework and a backend database with a fast ingestion rate.
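As a rough illustration of the ingest side described above, the following Python sketch builds a datapoints payload and posts it to KairosDB's REST ingest path (`/api/v1/datapoints`). The metric name, tag values, and base URL are hypothetical placeholders, not taken from the post, and the sketch assumes a KairosDB instance is reachable at the given address.

```python
import json
import urllib.request


def build_datapoints(metric, readings, tags):
    """Build a payload with one metric and [timestamp_ms, value] pairs."""
    return [{
        "name": metric,
        "datapoints": [[int(ts_ms), value] for ts_ms, value in readings],
        "tags": tags,
    }]


def post_datapoints(base_url, payload):
    """Send the payload to the ingest endpoint and return the HTTP status."""
    # Assumption: base_url points at a running KairosDB REST service.
    req = urllib.request.Request(
        f"{base_url}/api/v1/datapoints",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status


# Hypothetical sensor metric with two readings one second apart.
payload = build_datapoints(
    "sensor.temperature",
    [(1500000000000, 21.5), (1500000001000, 21.7)],
    {"sensor_id": "s-001"},
)
```

Calling `post_datapoints("http://localhost:8080", payload)` would send the two readings, where the host and port are assumptions about the local setup.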
A fast in-memory database provides benefits we can all appreciate, such as optimal latency and throughput for workloads. What if you could use extremely fast NVMe drives to achieve similar latency and throughput? The scope of this blog post is to examine the outcomes of using an in-memory-like database combined with NVMe drives to resolve cold-cache and data persistence challenges. In this experiment, various testing scenarios were run with Scylla and Intel® Optane™ SSD DC P4800X drives with the goal of providing a solution with the performance of an in-memory-like database without compromises on throughput, latency, […]
In mid-2015, Intel and Micron jointly unveiled a new kind of non-volatile memory storage device named 3D XPoint (pronounced “cross-point”), announced as up to 1000x faster than NAND. Now that 3D XPoint is generally available, we can start testing it. 3D XPoint uses electrical resistance and is considered to be bit addressable. Endurance is also much better with 3D XPoint: the stated rating is 30 full drive writes per day for 5 years. 3D XPoint developers indicate that it is based on changes in resistance of the bulk material. […]
Introduction: A highly available time-series solution requires an efficiently tailored front-end framework and a backend database with a fast ingestion rate. KairosDB provides a simple and reliable way to ingest and retrieve sensor information or metrics, while Scylla provides a highly reliable and performant backend database that scales indefinitely, and can store large quantities of time-series data.
A parallel full table scan is faster! By running a traditional serial full table scan on 475 million partitions (screenshot 1) from one client with a single connection per node, Scylla achieves only 42,110 rows per second. However, by using an efficient, parallel full table scan (screenshot 2), a single client scans the same 475 million partitions at a rate of 510,752 rows per second, 12x faster!
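One common way to parallelize a full table scan is to split the partitioner's token ring into many subranges and query them concurrently. The sketch below shows that idea under stated assumptions, and it is not necessarily the method used in the post: it assumes the signed 64-bit token space of the Murmur3 partitioner, uses hypothetical table and key names, and takes the per-range scan as a caller-supplied function (`scan_range`) so the example runs without a database driver.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: signed 64-bit token space, as used by the Murmur3 partitioner.
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1


def split_token_range(num_ranges):
    """Split the token ring into contiguous, non-overlapping inclusive subranges."""
    step = (MAX_TOKEN - MIN_TOKEN + 1) // num_ranges
    ranges = []
    start = MIN_TOKEN
    for i in range(num_ranges):
        # The last subrange absorbs any remainder so the whole ring is covered.
        end = MAX_TOKEN if i == num_ranges - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def build_query(table, pk, start, end):
    """Build a CQL statement for one subrange (table and pk names are hypothetical)."""
    return (
        f"SELECT * FROM {table} "
        f"WHERE token({pk}) >= {start} AND token({pk}) <= {end}"
    )


def parallel_scan(scan_range, num_ranges, concurrency):
    """Call scan_range(start, end) for every subrange and sum the row counts."""
    ranges = split_token_range(num_ranges)
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return sum(pool.map(lambda r: scan_range(*r), ranges))


# Stand-in for a real driver call: pretend every subrange returns 10 rows.
total_rows = parallel_scan(lambda start, end: 10, num_ranges=100, concurrency=8)
```

In a real client, `scan_range` would execute the statement from `build_query` and count the returned rows. Using many more subranges than worker threads keeps every node busy while limiting how much work any single query does. For scale, if each partition held one row (an assumption), 475 million rows at 42,110 rows per second would take about 11,280 seconds (roughly 3.1 hours), versus about 930 seconds (roughly 15.5 minutes) at 510,752 rows per second.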
Background: Samsung MSL (Memory Solutions Lab) recently released benchmark results from a YCSB evaluation they conducted. We are thrilled to share Samsung’s results, which reiterate previous benchmark findings that Scylla performs 10X better than Apache Cassandra. If you have high-end hardware, you can expect the same results. On smaller machines, the difference is in the range of 1.5X to 3X. We recommend using larger machines to reduce both your node count and your Total Cost of Ownership.