
Extreme Scale. Extreme Performance.

Purpose-built for high performance at scale

Precision recall

Tuned ANN indexes maintain high recall (e.g., 98%) across billions of vectors with high dimensionality (e.g., 1536+ dimensions).
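Recall@k is measured by comparing an index's answer against brute-force ground truth. A minimal sketch of that measurement, using toy 8-dimensional vectors and a simulated ANN result (not ScyllaDB's actual index):

```python
import random

def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbours the approximate index returned."""
    return len(set(approx_ids) & set(exact_ids)) / k

def exact_top_k(query, vectors, k):
    """Ground truth: brute-force nearest neighbours by squared L2 distance."""
    d = lambda vid: sum((a - b) ** 2 for a, b in zip(query, vectors[vid]))
    return sorted(vectors, key=d)[:k]

random.seed(0)
vectors = {i: [random.random() for _ in range(8)] for i in range(100)}
query = [random.random() for _ in range(8)]

truth = exact_top_k(query, vectors, 10)
# Simulate an ANN index that missed one of the ten true neighbours:
approx = truth[:9] + [next(i for i in vectors if i not in truth)]
print(recall_at_k(approx, truth, 10))  # 0.9
```

In production the ground truth is computed offline on a sample of queries, since brute force over billions of vectors is exactly what the ANN index exists to avoid.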

Real-time index rebuilds

Online rebuilds keep queries serving while you refresh your embeddings — without downtime or performance degradation.

Real-time search

New embeddings become searchable in real time without traditional ML pipeline lag or batch processing delays.

Real-time upserts

Upserts are handled via incremental index maintenance on dedicated vector nodes, delivering sub-millisecond write performance.
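Conceptually, an upsert mutates the live index rather than triggering a rebuild, so searches see the new vector immediately. A toy in-memory sketch of that behavior (not ScyllaDB's actual engine):

```python
class TinyVectorIndex:
    """Toy stand-in for incremental index maintenance: upserts mutate
    the live index in place, so searches reflect new vectors at once."""

    def __init__(self):
        self.vectors = {}

    def upsert(self, key, vector):
        self.vectors[key] = vector  # insert or overwrite; no rebuild step

    def search(self, query, k=1):
        d = lambda item: sum((a - b) ** 2 for a, b in zip(query, item[1]))
        return [key for key, _ in sorted(self.vectors.items(), key=d)[:k]]

idx = TinyVectorIndex()
idx.upsert("a", [0.0, 0.0])
idx.upsert("b", [1.0, 1.0])
idx.upsert("a", [0.9, 0.9])         # upsert moves "a" near the query
print(idx.search([0.8, 0.8], k=1))  # ['a']
```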

Consistent P99 latency

Vector and operational workloads run on dedicated nodes so neither can impact the other’s latency, even during spikes.

Predictable scaling

Scale search compute and data storage independently, so you only pay for the resources you actually need.

Supporting real-world vector search

Target Recall

95%+

Accurate search results

Throughput

350+ QPS

More queries per vCPU

P99 Latency

< 15ms

Consistent tail latency

Key performance comparisons

Platform     Workload (vectors / dims / k)   Throughput (QPS)   P99 Latency
MongoDB      15.3M / 2048D / k=10            20                 10ms
Pinecone     10M / 768D / k=10               75                 30ms
OpenSearch   10M / 768D / k=100              60                 6.8ms
ScyllaDB     10M / 768D / k=10               350+               7.6ms

Platform     Cost per Hour   $/QPS/Hour
MongoDB      $3.54           $0.177
Pinecone     $3.96           $0.053
OpenSearch   $1.85           $0.030
ScyllaDB     $1.16           $0.003
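The $/QPS/hour column is simply hourly cost divided by sustained throughput, which is what makes it a fair efficiency comparison across platforms. Reproducing two rows of the table:

```python
def cost_per_qps_hour(hourly_cost, qps):
    """Cost efficiency: dollars per hour normalized by sustained queries/sec."""
    return hourly_cost / qps

# Values taken from the comparison table above:
print(round(cost_per_qps_hour(3.54, 20), 3))   # MongoDB  -> 0.177
print(round(cost_per_qps_hour(1.16, 350), 3))  # ScyllaDB -> 0.003
```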

Architectures for real-time AI

The secret to our efficiency is total isolation. By decoupling the vector indexing engine from the main database storage engine, we eliminate the “noisy neighbor” effect that plagues integrated vector solutions.

Customer success with real-time AI

Serving 5 Million Features per Second

Tripadvisor uses ScyllaDB on AWS to power real-time ML personalization. At peak, they handle ~500K ops/sec with P99 latencies of 1-3 ms. Their feature store serves up to 5 million static features/sec and 0.5 million user features/sec.

Read More

Facial Recognition for Driver Safety

Nauto applies AI to camera and sensor data for fleet safety. ScyllaDB provides the fast, unified data layer needed for on-the-fly facial recognition and driver behavior analysis. ScyllaDB replaced a fragmented stack of Redis, Elasticsearch, Kafka, and Postgres.

Read More

Foundations of a Feature Store

Medium built a fast, scalable feature store on ScyllaDB to drive its content recommendations. ScyllaDB powers the “lists” data layer in their ML infrastructure, enabling rapid retrieval of personalized story lists and features for users.

Read More

Predictive Analytics of Equipment

TRACTIAN uses ScyllaDB to handle continuous streams of time-series sensor data for real-time ML in industrial IoT. After replacing MongoDB, they achieved 10x better throughput and latency, enabling faster predictive maintenance analytics across their customer base.

Read More

Hear from the Experts

Explore tradeoffs and strategies related to real-time AI at scale – including high-volume feature ingestion, fast retrieval, and low-latency vector search.

Felipe Cardeneti Mendes
Technical Director

Gui Nogueira
Technical Director

Ready to Get Started?

Frequently Asked Questions

What is ScyllaDB Vector Search?

ScyllaDB Vector Search enables real-time similarity search at scale. It allows you to store high-dimensional embeddings alongside operational data and query them with millisecond latency within a single database, eliminating the need for a standalone vector database.

Do I need a separate vector database?

No. ScyllaDB integrates vector search directly into the database. This eliminates data duplication, synchronization pipelines, extra infrastructure, and operational complexity.

What workloads is ScyllaDB Vector Search used for?

Common workloads include:

  • Retrieval-augmented generation (RAG)
  • Real-time recommendations and personalization
  • Semantic search and feature stores
  • Fraud detection and anomaly detection
  • Industrial IoT predictive analytics

Learn more: https://www.scylladb.com/scale-real-time-ai/
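The retrieval step behind several of these workloads (RAG, recommendations, semantic search) reduces to a top-k similarity search over stored embeddings. A minimal sketch with toy three-dimensional vectors; real systems use model-generated embeddings and a database query rather than an in-memory dict:

```python
def top_k(query_vec, docs, k=2):
    """Rank documents by dot-product similarity to the query embedding."""
    score = lambda item: sum(q * d for q, d in zip(query_vec, item[1]))
    return [doc_id for doc_id, _ in
            sorted(docs.items(), key=score, reverse=True)[:k]]

# Toy corpus: document id -> embedding (hypothetical values).
docs = {
    "doc1": [1.0, 0.0, 0.0],
    "doc2": [0.0, 1.0, 0.0],
    "doc3": [0.9, 0.1, 0.0],
}
print(top_k([1.0, 0.0, 0.0], docs, k=2))  # ['doc1', 'doc3']
# In a RAG pipeline, the texts of the returned ids would be
# concatenated into the LLM prompt as retrieved context.
```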

Can I store embeddings alongside my operational data?

Yes. You can store embeddings alongside user data, product data, and features. This allows you to retrieve the vector and its associated metadata in a single query, significantly simplifying application logic.

Learn more: https://docs.scylladb.com/manual/stable/features/vector-search.html
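A toy illustration of that single-query pattern: one lookup returns both the nearest embedding and its row metadata, instead of a vector-database hit followed by a second fetch from an operational store. The row schema and names here are hypothetical:

```python
def nearest_row(query, rows):
    """Return the full row (embedding plus metadata) whose vector is
    closest to the query, in a single lookup."""
    d = lambda item: sum((a - b) ** 2
                         for a, b in zip(query, item[1]["embedding"]))
    return min(rows.items(), key=d)

# Hypothetical product rows keyed by SKU:
rows = {
    "sku-1": {"embedding": [0.1, 0.9], "name": "running shoe", "price": 79},
    "sku-2": {"embedding": [0.8, 0.2], "name": "trail boot", "price": 129},
}
key, row = nearest_row([0.9, 0.1], rows)
print(key, row["name"])  # sku-2 trail boot
```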

How does ScyllaDB Vector Search perform at scale?

ScyllaDB is designed for extreme-scale AI workloads, supporting 1B+ vectors. Benchmarks show P99 latency as low as ~1.7 ms and throughput up to ~252,000 queries per second.

Benchmark: https://www.scylladb.com/2025/12/01/scylladb-vector-search-1b-benchmark/

How does ScyllaDB compare to other vector search platforms?

In high-demand workloads (10M vectors, 768D), ScyllaDB offers up to 14x higher efficiency.

  • ScyllaDB: 350+ QPS | 7.6ms P99 | $0.003 Cost/QPS/Hour
  • Pinecone: 75 QPS | 30ms P99 | $0.053 Cost/QPS/Hour
  • MongoDB: 20 QPS | 10ms P99 | $0.177 Cost/QPS/Hour
  • OpenSearch: 60 QPS | 6.8ms P99 | $0.030 Cost/QPS/Hour

What vector dimensions does ScyllaDB support?

ScyllaDB supports high-dimensional embeddings up to 16,000 dimensions. It uses tuned ANN (Approximate Nearest Neighbor) indexes to maintain high recall (95%+) even at billion-scale.

Docs: https://docs.scylladb.com/manual/stable/features/vector-search.html

How does ScyllaDB isolate vector workloads from operational workloads?

The secret is total isolation. By decoupling the vector indexing engine from the main database storage engine on dedicated vector nodes, ScyllaDB eliminates the “noisy neighbor” effect. This ensures that heavy vector workloads do not impact standard operational database performance.

Trending Resources for ScyllaDB Real-Time AI