What Is a Vector Database?
A vector database stores and searches high-dimensional vectors (embeddings) generated by AI models. It enables semantic search — finding content by meaning rather than exact keyword matches. It's the storage layer for RAG applications, recommendation systems, and similarity search.
How a Vector Database Works
Traditional databases search by exact matches: WHERE title = 'React'. Vector databases find semantically similar items: 'frontend framework tutorial' would match documents about React, Vue, and Angular even without those exact words.
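The contrast between exact matching and similarity matching can be sketched in a few lines of plain Python. This is a toy illustration, not how any particular vector database is implemented: the document titles, the 3-dimensional vectors, and the function names are all made up for the example, and a real system would get its embeddings from a model rather than from hand-written numbers.

```python
import math

# Hypothetical 3-dimensional embeddings, hand-written for illustration.
# Real embeddings come from a model and have hundreds of dimensions.
DOCS = {
    "Getting started with React hooks": [0.9, 0.1, 0.0],
    "Vue component basics": [0.8, 0.2, 0.1],
    "Tuning PostgreSQL indexes": [0.1, 0.9, 0.2],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def keyword_search(term):
    """Substring match on titles, similar in spirit to a SQL text filter."""
    return [title for title in DOCS if term.lower() in title.lower()]


def semantic_search(query_vector, top_k=2):
    """Rank every document by similarity to the query vector."""
    ranked = sorted(
        DOCS,
        key=lambda title: cosine_similarity(query_vector, DOCS[title]),
        reverse=True,
    )
    return ranked[:top_k]


# Assumed embedding of the query "frontend framework tutorial".
QUERY_VECTOR = [0.85, 0.15, 0.05]

keyword_hits = keyword_search("frontend framework tutorial")
semantic_hits = semantic_search(QUERY_VECTOR)
print(keyword_hits)
print(semantic_hits)
```

The keyword search finds nothing because no title contains the query text, while the similarity search ranks the two frontend documents above the PostgreSQL one. This version compares the query against every stored vector, which is fine for a handful of documents but is exactly the cost that a vector database's index is meant to avoid.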
Popular vector databases: Pinecone (managed), Weaviate (open-source), Qdrant (open-source), Chroma (lightweight), and Cloudflare Vectorize (edge). PostgreSQL with pgvector adds vector search to your existing Postgres database.
Why Developers Use Vector Databases
Vector databases power RAG chatbots, semantic search, recommendation engines, image similarity search, and anomaly detection. If you're building an AI application that has to retrieve relevant content by meaning, you will need some form of vector search, whether from a dedicated vector database or from an extension to a database you already run.
Key Concepts
- Embedding — A numerical vector (array of floats) that represents text, images, or other data in a way that captures meaning
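- Cosine Similarity — A common similarity measure for comparing vectors, based on the angle between them: 1.0 means they point in the same direction, 0 means they are orthogonal (unrelated), and -1.0 means they point in opposite directions. Cosine distance is 1 minus this value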
- Indexing — Algorithms (HNSW, IVF) that enable fast approximate nearest neighbor search across millions of vectors
- Dimensions — The size of each vector — typically 768 or 1536 dimensions for text embeddings
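To make the cosine similarity and dimensions concepts concrete, here is a minimal sketch in plain Python. The function name and the example vectors are chosen for illustration only; production systems usually rely on an optimized library or on the database's own distance operators.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    # Vectors from different embedding models often have different
    # dimensions and cannot be compared with each other.
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Same direction, different length: similarity is approximately 1.0.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: similarity is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite direction: similarity is approximately -1.0.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])

print(same_direction, orthogonal, opposite)
```

Note that the measure ignores vector length and depends only on direction, which is why `[1.0, 2.0]` and `[2.0, 4.0]` count as a perfect match.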
Frequently Asked Questions
Do I need a separate vector database or can I use PostgreSQL?
For prototyping and small datasets, the pgvector extension for PostgreSQL works well. For production workloads with millions of vectors, dedicated vector databases (Pinecone, Qdrant) typically offer better performance and scaling.
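As a rough idea of what the pgvector route looks like, the sketch below builds the SQL for a similarity query from Python. The `documents` table, its columns, the 3-dimensional `vector(3)` type, and the helper names are hypothetical. The `%s` placeholders assume a psycopg-style driver, and nothing here connects to a database. The `<=>` operator is pgvector's cosine distance operator, so ordering by it in ascending order returns the most similar rows first.

```python
def to_vector_literal(values):
    """Format numbers as a pgvector text literal such as '[0.1,0.2,0.3]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


# Hypothetical schema: a tiny 3-dimensional embedding column for illustration.
SETUP_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(3)
);
"""

# Order by cosine distance (<=>) so the nearest rows come first.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def build_search(query_vector, limit=5):
    """Return the SQL text and parameters for a nearest-neighbor query."""
    return SEARCH_SQL, (to_vector_literal(query_vector), limit)


sql, params = build_search([0.1, 0.2, 1], limit=3)
print(sql)
print(params)
```

With a driver such as psycopg, the returned pair would be passed to `cursor.execute(sql, params)`. The column's declared dimension has to match the embedding model in use, so `vector(3)` would become something like `vector(768)` or `vector(1536)` in practice.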
How do I choose between vector databases?
If you want managed simplicity, use Pinecone. If you want open-source flexibility, use Weaviate or Qdrant. If you're on Cloudflare, use Vectorize. If you want to keep everything in Postgres, use pgvector.