Full text search is required in many human-facing applications, such as those where users retrieve and insert data in a datastore based on partial or wildcard information, with spell correction and autocompletion. An additional benefit of full text search is the ability to retrieve multiple results sorted by relevance. Lucene, the common parent to Solr and Elasticsearch: The most popular textual search engine on the market is Lucene. It is used by Solr, Elasticsearch, Lucidworks and other text search tools. Lucene is a great search engine. It is extremely fast, stable, and you probably can’t get much better than […]
In part 2 of our Scylla and Spark series, we will delve more deeply into the way data transformations are executed by Spark, and then move on to the higher-level SQL and DataFrame interfaces.
This article will shed light on the performance penalties of running Scylla on Docker and discuss the tuning steps to solve them.
Welcome to part 1 of an in-depth series of posts revolving around the integration of Spark and Scylla. In this series, we will delve into many aspects of a Scylla/Spark solution: from the architectures and data models of the two products, through strategies for transferring data between them, up to optimization techniques and operational best practices. The series will include many code samples, which you are encouraged to run locally, modify and tinker with. The GitHub repo contains the docker-compose.yaml file, which you can use to easily run everything locally. In this post, we will introduce the main stars […]
During our recent webinar, ‘Analytics Showtime: Spark Powered by Scylla’ (now available on-demand), there were several questions that we found worthy of additional discussion.
KairosDB, a time-series database, provides simple and reliable tooling to ingest and retrieve chronologically created data, such as sensor readings or metrics. Scylla provides a large-scale, highly reliable and available backend to store large quantities of time-series data. Together, KairosDB and Scylla provide a highly available time-series solution with an efficiently tailored front-end framework and a backend database with a fast ingestion rate.
Organizations are continuing to adopt Solid State Drives (SSDs) in their data centers for optimal performance and lower latencies. With that in mind, it only makes sense to use them with a database solution like Scylla to get the most bang for your buck. One of the popular SSDs that organizations are now adopting in their data centers is the Samsung Z-SSD drive. In this post, we will go over the Z-SSD and see how Scylla users can benefit from the drives.
What fascinates me most about databases is how they can be used for storing time series data. This use case is important for Internet of Things (IoT) devices and data analytics. Almost everyone can relate to a time series database use case, as many of us have devices at home that collect and send data, such as a smart thermostat, or a phone or wrist device that tracks fitness activity. Wouldn’t it be nice to know how the infrastructure works behind the scenes and how to create a time series database? In this post, I will […]
When creating applications that communicate with a database such as Scylla, it is crucial that the programming language being used has support for database connectivity. Go, for example, was introduced in 2009, but it took time for the community to add packages and drivers for database communication. Luckily, in 2014, GoCQL was created to address developers’ needs.
Scylla is now available on the Oracle Cloud Marketplace. The ScyllaDB team has completed testing and benchmarking of the bare metal instances available at Oracle Cloud Infrastructure (OCI). Scylla takes advantage of the excellent resources available on OCI bare metal servers: high CPU counts, ample DRAM, and fast and large NVMe drives. In our testing, we looked at ease of installation, throughput, and latency performance.