When I was building a RAG-based knowledge system for an enterprise client last year, I had to pick a vector database. The internet was full of benchmark comparisons that told me nothing useful. Every database was “the fastest” on its own benchmark. None of them talked about the things that actually matter when you’re running a production system.
So I tested five of them myself, with my actual data and my actual queries, and made a decision based on what I learned. Here’s the honest breakdown.
Why You Need a Vector Database
If you’re building anything with AI that needs to retrieve relevant information — RAG systems, semantic search, recommendation engines, similarity matching — you need a way to store and query vector embeddings efficiently.
You could store vectors in PostgreSQL with pgvector, and for small datasets (under 100K vectors), that’s often the right call. But once you need sub-100ms queries on millions of vectors with filtering, you need a purpose-built solution.
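To make that baseline concrete, here is what every vector store is conceptually doing: scoring a query embedding against stored vectors by cosine similarity. This is a toy sketch in plain Python (the `nearest` helper and the 2-dimensional vectors are illustrative, not any library's API); the linear O(n) scan it performs is exactly the cost that a purpose-built index like HNSW avoids at scale.

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, vectors, k=2):
    # Brute-force scan: score every stored vector, sort, take top-k.
    scored = sorted(vectors.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

vectors = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(nearest([1.0, 0.05], vectors, k=2))  # ['a', 'b']
```

At 2 million 1536-dimensional vectors, this scan is millions of dot products per query, which is why sub-100ms latency requires an approximate index rather than brute force.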
The five databases I evaluated: Qdrant, Pinecone, Weaviate, Milvus, and Chroma.
What I Actually Tested
I loaded each database with 2 million document embeddings (1536 dimensions, OpenAI text-embedding-3-small) and ran three types of queries:
- Pure similarity search — find the 10 nearest vectors to a query
- Filtered search — find nearest vectors where metadata matches specific criteria (department, date range, document type)
- Batch operations — insert 10K vectors, update metadata on 1K vectors, delete 500 vectors
I measured query latency (p50 and p99), indexing speed, memory usage, and how painful each one was to set up and operate.
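For reference, here is how latency percentiles can be computed from raw samples, using the nearest-rank method. The sample numbers are illustrative; the point is that p99 surfaces the tail (the slow queries your users notice) that averages hide.

```python
def percentile(samples, p):
    # Nearest-rank percentile: sort, then index into the sorted list.
    s = sorted(samples)
    idx = max(0, min(len(s) - 1, round(p / 100 * len(s)) - 1))
    return s[idx]

latencies_ms = [12, 15, 14, 13, 90, 16, 12, 14, 13, 250]
print(percentile(latencies_ms, 50))  # 14
print(percentile(latencies_ms, 99))  # 250
```

Note how the p50 (14 ms) looks healthy while the p99 (250 ms) reveals the outliers; this is why I report both.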
The Results
Qdrant — My Top Pick for Most Use Cases
Qdrant stood out in almost every category. Written in Rust, it’s fast and memory-efficient. But what sold me was the developer experience and the filtering capabilities.
Filtering is a first-class feature. Unlike some databases where filtering happens after the similarity search (which can miss relevant results), Qdrant integrates filtering into the search algorithm itself. This matters hugely when your queries are “find similar documents, but only from the engineering department, published after January 2025.”
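Why post-filtering misses results is easy to show with a toy example. Here similarity scores are synthetic random numbers and the `pre_filter`/`post_filter` helpers are hypothetical, not any database's API; the structure of the two approaches is what matters.

```python
import random

random.seed(0)

# Toy corpus: (similarity score, department) pairs. In a real system
# the score comes from vector distance; here it is synthetic.
docs = [(random.random(), random.choice(["engineering", "sales"]))
        for _ in range(1000)]

def post_filter(docs, k):
    # Search first, filter after: take the global top-k, then drop
    # non-matching hits. Can return fewer than k results.
    top = sorted(docs, key=lambda d: d[0], reverse=True)[:k]
    return [d for d in top if d[1] == "engineering"]

def pre_filter(docs, k):
    # Filter inside the search: only matching docs compete for top-k.
    matching = [d for d in docs if d[1] == "engineering"]
    return sorted(matching, key=lambda d: d[0], reverse=True)[:k]

print(len(post_filter(docs, 10)), len(pre_filter(docs, 10)))
```

Post-filtering silently returns fewer than k results whenever non-matching documents crowd the top of the ranking; integrating the filter into the search, as Qdrant does, always fills the requested k from matching documents.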
Self-hosting is straightforward. A single Docker container gets you running. Scaling to a cluster is well-documented. For clients with data residency requirements in the UAE, this is non-negotiable.
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(host="localhost", port=6333)

# Create collection
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1536, distance=Distance.COSINE),
)

# Search with filtering
results = client.query_points(
    collection_name="documents",
    query=query_vector,  # a precomputed 1536-dim embedding of the query text
    query_filter={
        "must": [
            {"key": "department", "match": {"value": "engineering"}},
            {"key": "date", "range": {"gte": "2025-01-01"}},
        ]
    },
    limit=10,
)
Downsides: The managed cloud offering is newer than Pinecone’s, and the community is smaller.
Pinecone — Best for Teams Who Don’t Want to Manage Infrastructure
Pinecone is fully managed — there’s nothing to deploy, patch, or scale yourself. If your team doesn’t have DevOps capacity and you just want vectors to work, Pinecone delivers.
Setup is trivial. API key, SDK install, and you’re querying in 5 minutes.
The serverless tier is cost-effective for development and low-traffic production use.
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("documents")

# Upsert vectors
index.upsert(vectors=[
    {"id": "doc1", "values": embedding, "metadata": {"department": "engineering"}},
])

# Query with filter
results = index.query(
    vector=query_vector,
    top_k=10,
    filter={"department": {"$eq": "engineering"}},
)
Downsides: No self-hosting option. You’re locked into their cloud. Costs can escalate with high query volumes. The metadata filtering, while improved, still has limitations compared to Qdrant’s payload indexing.
Weaviate — Best for Multi-Modal and GraphQL Fans
Weaviate has an opinionated architecture built around “objects” rather than raw vectors. It supports hybrid search (combining vector and keyword search) out of the box, which is excellent for RAG systems where pure semantic search sometimes misses exact keyword matches.
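One common way to merge a keyword ranking with a vector ranking, similar in spirit to Weaviate's ranked fusion but a simplified sketch rather than its actual implementation, is Reciprocal Rank Fusion: each document's score is the sum of reciprocal ranks across the two result lists, so no score normalization between BM25 and cosine similarity is needed.

```python
def rrf(vector_ranking, keyword_ranking, k=60):
    # Reciprocal Rank Fusion: merge two rankings without needing
    # their scores to be on a comparable scale. k dampens the
    # influence of top ranks; 60 is the value from the original paper.
    scores = {}
    for ranking in (vector_ranking, keyword_ranking):
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

print(rrf(["d1", "d2", "d3"], ["d3", "d1", "d4"]))  # ['d1', 'd3', 'd2', 'd4']
```

Documents that appear high in both lists ("d1", "d3") beat documents that appear in only one, which is exactly the behavior you want when semantic search and exact keyword matches disagree.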
Built-in vectorization modules let you send text directly and Weaviate handles embedding generation. This simplifies the pipeline if you’re comfortable with the coupling.
GraphQL API is either a strength or a weakness depending on your team. If you like GraphQL, Weaviate’s query interface is powerful and expressive. If you prefer simple REST or gRPC, it adds complexity.
Downsides: Higher memory footprint than Qdrant. The object-oriented model can feel heavyweight when you just want to store and query vectors. Documentation, while comprehensive, can be hard to navigate.
Milvus — Best for Very Large Scale
Milvus is designed for billion-scale vector search. If you’re indexing hundreds of millions to billions of vectors, Milvus handles it with its distributed architecture.
For my 2 million vector test, it was overkill. Setup was more complex (etcd, MinIO, and Pulsar dependencies), and the operational overhead is higher than Qdrant or Pinecone.
Use Milvus when your scale truly demands it. For most production RAG systems (under 50M vectors), simpler options serve you better.
Chroma — Best for Prototyping
Chroma is the SQLite of vector databases. It runs in-process, needs no separate server, and gets you from zero to working prototype in minutes.
I use Chroma during development and for proof-of-concept demos. When the project moves to production, I migrate to Qdrant or Pinecone. Chroma’s simplicity is its strength and its limitation — it doesn’t scale for production workloads.
What Actually Matters
After running these tests, here are the criteria I’d tell anyone to evaluate on — forget the synthetic benchmarks.
1. Filtering Performance
If your application needs filtered queries (and most production apps do), this is the most important criterion. Test with your actual filter patterns and measure latency, not just throughput.
2. Deployment Model
Can you self-host? Do you need to for compliance? How much operational overhead can your team handle? A managed service that costs 3x more but needs zero maintenance might be cheaper when you factor in engineer time.
3. Cost at Your Scale
Calculate costs at your expected query volume, not the vendor’s example. Include storage, compute, network transfer, and any per-query charges. Many teams are surprised when their $50/month prototype becomes a $2,000/month production bill.
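A back-of-the-envelope sketch of that calculation. Every unit price below is made up for illustration; substitute your vendor's actual rates. The storage estimate follows from the test setup above: 2M vectors x 1536 dimensions x 4 bytes per float is roughly 12 GB before index overhead.

```python
def monthly_cost(queries_per_day, per_1k_query_usd, storage_gb, per_gb_month_usd):
    # Hypothetical pricing model: per-query charges plus storage.
    # Real vendors add compute, network egress, and replica costs.
    query_cost = queries_per_day * 30 / 1000 * per_1k_query_usd
    storage_cost = storage_gb * per_gb_month_usd
    return round(query_cost + storage_cost, 2)

# 50K queries/day at a hypothetical $0.50 per 1K queries, ~12 GB stored
print(monthly_cost(queries_per_day=50_000, per_1k_query_usd=0.5,
                   storage_gb=12, per_gb_month_usd=0.33))  # 753.96
```

Run the same arithmetic at 10x your expected traffic before you commit: per-query pricing that looks trivial at prototype volume is usually what turns the $50 bill into the $2,000 one.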
4. Developer Experience
How long does it take a new team member to write their first query? Are the docs clear? Are error messages helpful? SDK quality varies dramatically between these tools.
5. Update and Delete Performance
Benchmarks focus on inserts and queries. In production, you’ll update metadata, delete expired documents, and re-index frequently. Test these operations too.
My Recommendation
Default choice: Qdrant. Self-hostable, fast, excellent filtering, great developer experience, reasonable operational overhead. It handles the 80% case well.
If you can’t self-host: Pinecone. Fully managed, reliable, and the serverless pricing makes it cost-effective for moderate scale.
If you need hybrid search: Weaviate. The built-in keyword + vector search combination is genuinely useful for RAG.
If you have billions of vectors: Milvus. Nothing else handles that scale as well.
If you’re prototyping: Chroma. Get your proof of concept working, then migrate.
The best vector database is the one that fits your constraints — scale, compliance, team skills, and budget. Don’t over-index on benchmarks. Test with your data, your queries, and your operational reality.
If you’re building a RAG system or semantic search and want help choosing and implementing the right vector database, reach out. I’ve shipped these systems in production and can save you weeks of trial and error.
Table of Contents
- Why You Need a Vector Database
- What I Actually Tested
- The Results
- Qdrant — My Top Pick for Most Use Cases
- Pinecone — Best for Teams Who Don’t Want to Manage Infrastructure
- Weaviate — Best for Multi-Modal and GraphQL Fans
- Milvus — Best for Very Large Scale
- Chroma — Best for Prototyping
- What Actually Matters
- 1. Filtering Performance
- 2. Deployment Model
- 3. Cost at Your Scale
- 4. Developer Experience
- 5. Update and Delete Performance
- My Recommendation