Embeddings are numerical representations (vectors) that capture the meaning of text, images, or other data. Similar items get similar vectors, enabling semantic search, recommendations, and clustering. They're the foundation of modern AI search and RAG systems.

How Embeddings Work

An embedding model converts 'How to learn React' into a vector like [0.23, -0.45, 0.12, ...] with 768 or 1536 dimensions. 'React tutorial for beginners' gets a very similar vector. 'Best pizza in NYC' gets a very different one. This lets you search by meaning.
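"Search by meaning" usually boils down to cosine similarity between vectors. A minimal sketch, using made-up 4-dimensional vectors in place of real 768+-dimensional model output:

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors standing in for real embedding-model output.
learn_react = [0.23, -0.45, 0.12, 0.80]
react_tutorial = [0.25, -0.40, 0.10, 0.78]   # close to learn_react
pizza_nyc = [-0.70, 0.61, -0.22, 0.05]       # unrelated topic

print(cosine_similarity(learn_react, react_tutorial))  # high (> 0.95)
print(cosine_similarity(learn_react, pizza_nyc))       # low (negative here)
```

The same function works unchanged on real 768- or 1536-dimensional vectors; only the list length grows.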

Popular embedding models: OpenAI text-embedding-3-small, Cohere embed-v3, Google Gecko, and open-source models like BGE and E5 via Hugging Face. Cloudflare Workers AI includes bge-base-en-v1.5, usable on its free tier.

Why Developers Use Embeddings

Embeddings power semantic search (finding relevant docs without exact keyword matches), RAG (retrieving context for LLMs), recommendation systems (similar products/content), and deduplication (finding near-duplicate content).
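Semantic search, RAG retrieval, and recommendations all reduce to the same operation: rank stored vectors by similarity to a query vector. A minimal sketch, assuming the embeddings were pre-computed by some model (the vectors and document names here are invented):

```python
from math import sqrt

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical pre-computed document embeddings (real ones come from a model).
docs = {
    "react-intro": [0.90, 0.10, 0.00],
    "react-hooks": [0.85, 0.15, 0.05],
    "pizza-guide": [0.00, 0.20, 0.95],
}

# Embedding of the user query, e.g. "how do I learn React?"
query_vec = [0.88, 0.12, 0.02]

# Rank every document by similarity to the query, best first.
ranked = sorted(docs, key=lambda d: cosine(query_vec, docs[d]), reverse=True)
print(ranked)  # React docs first, pizza guide last
```

At scale you would hand this ranking step to a vector database rather than looping in Python, but the underlying math is the same.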

Key Concepts

  • Semantic Similarity — Embeddings that are close in vector space represent semantically similar concepts
  • Embedding Model — A neural network that converts text/images into fixed-size vectors — choose based on quality vs cost
  • Dimensionality — Vector size (768, 1024, 1536 dimensions) — larger captures more nuance but costs more storage
  • Batch Processing — Generate embeddings for many items at once — most APIs support batches of 100+ inputs
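The batch-processing point above is mostly a chunking exercise: split your corpus into API-sized groups before sending it off. A sketch, assuming a batch limit of 100 inputs per request (limits vary by provider):

```python
def batched(items: list[str], size: int = 100):
    """Yield successive chunks of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

texts = [f"document {n}" for n in range(250)]
batches = list(batched(texts, size=100))

# 250 texts -> 3 requests of 100, 100, and 50 inputs.
print([len(b) for b in batches])
```

Each chunk would then be passed as the `input` of a single embedding API call instead of making one request per document.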

Frequently Asked Questions

What embedding model should I use?

OpenAI text-embedding-3-small offers a strong balance of quality and cost for most projects. For free/open-source, use BGE-base or E5-large via Hugging Face. For edge deployment, Cloudflare Workers AI offers bge-base-en-v1.5 on its free tier.

How much do embeddings cost?

OpenAI charges $0.02 per million tokens for text-embedding-3-small. Embedding 10,000 documents of about 500 tokens each (5 million tokens total) costs roughly $0.10. Open-source models on your own GPU cost nothing beyond compute.
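The arithmetic behind that estimate, assuming the $0.02-per-million-token price and an average document length of 500 tokens:

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, text-embedding-3-small

def embedding_cost(num_docs: int, avg_tokens_per_doc: int) -> float:
    """Estimated cost in USD to embed a corpus once."""
    total_tokens = num_docs * avg_tokens_per_doc
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS

print(embedding_cost(10_000, 500))  # 5M tokens -> 0.10 (ten cents)
```

Swap in your own corpus size and average token count; token counts come from the model's tokenizer, not character length.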