What Are Vector Embeddings and How They Work
From Text to Numbers
Computers cannot natively compare the meaning of two sentences. They can compare strings character by character, but "fix broken login" and "troubleshoot authentication failure" share no words at all, yet mean nearly the same thing. Embeddings solve this by converting each sentence into an array of floating-point numbers (a vector) such that sentences with similar meanings produce vectors that are close together.
An embedding model takes a string of text as input and outputs a vector with a fixed number of dimensions. OpenAI's text-embedding-3-small produces vectors of 1,536 numbers; Cohere's embed-v4 produces 1,024; an open-source model like BGE-large also produces 1,024. Each number represents some learned aspect of the text's meaning, though individual numbers are not directly interpretable by humans. What matters are the relationships between vectors: texts with similar meanings produce vectors that are close together in the high-dimensional space, while texts with different meanings produce vectors that are far apart.
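To make the fixed-dimension property concrete, here is a minimal sketch, assuming the OpenAI Python SDK and an OPENAI_API_KEY in the environment (any embedding provider follows the same pattern):

from openai import OpenAI

client = OpenAI()
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="fix broken login",
)
vector = response.data[0].embedding
print(len(vector))  # 1536, no matter how long or short the input text is

Whether the input is three words or three paragraphs, the output vector always has the same length, which is what makes vectors from the same model directly comparable.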
How Embedding Models Learn
Embedding models are neural networks trained on massive datasets of text pairs that are known to be related. During training, the model sees millions of examples: a question paired with its answer, a sentence paired with its paraphrase, a search query paired with the document it should find. For each pair, the model adjusts its internal weights to produce vectors that are close together for related text and far apart for unrelated text. This training process is called contrastive learning.
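The objective itself is compact enough to sketch. Below is a simplified, illustrative in-batch contrastive loss (InfoNCE style) written with NumPy; real training runs this over a deep network with backpropagation, but the loss is the part that pulls related pairs together and pushes unrelated ones apart:

import numpy as np

def contrastive_loss(anchors: np.ndarray, positives: np.ndarray,
                     temperature: float = 0.05) -> float:
    # Row i of `anchors` should match row i of `positives`;
    # every other row in the batch serves as a negative example.
    anchors = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    positives = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    # Cosine similarity of every anchor to every candidate in the batch
    logits = anchors @ positives.T / temperature
    # Cross-entropy where the correct "class" for row i is column i
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))

rng = np.random.default_rng(0)
anchors = rng.normal(size=(4, 8))  # toy batch: 4 pairs, 8 dimensions
mismatched = contrastive_loss(anchors, rng.normal(size=(4, 8)))
matched = contrastive_loss(anchors, anchors + 0.01 * rng.normal(size=(4, 8)))
# `matched` is far lower: near-identical pairs already sit close together

Lowering this loss across millions of pairs is exactly what nudges the model's weights toward the behavior described above.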
After training on hundreds of millions of text pairs, the model generalizes. It can produce meaningful embeddings for text it has never seen because it has learned the underlying patterns of how concepts relate to each other. The model does not memorize that "database connection pooling" and "DB pool configuration" are related. Instead, it learns general patterns about how words combine to express meaning, so it can recognize novel paraphrases, synonyms, and conceptual similarities.
The quality of training data directly determines the quality of embeddings. Models trained on diverse, high-quality text pairs (academic papers, technical documentation, question-answer pairs, web text) produce better general-purpose embeddings. Models trained on domain-specific data (legal contracts, medical records, source code) produce better embeddings for that domain but may perform worse on general text. This is why Voyage's code-specific model outperforms general-purpose models on code retrieval but underperforms them on general knowledge retrieval.
Dimensionality: What the Numbers Mean
The number of dimensions in an embedding vector determines how much information it can encode. A 2-dimensional embedding could distinguish between "positive" and "negative" sentiment along one axis and "formal" and "informal" tone along another, but that is all. A 1,536-dimensional embedding can capture thousands of subtle semantic distinctions: topic, intent, domain, formality, specificity, sentiment, entity references, and relationships between concepts.
More dimensions generally mean more expressive power, but with diminishing returns. Benchmarks show that moving from 384 to 768 dimensions gives a significant quality improvement, moving from 768 to 1,536 gives a smaller but still meaningful improvement, and moving from 1,536 to 3,072 gives only a marginal improvement on most tasks. The cost of more dimensions is storage (each vector takes proportionally more space) and compute (distance calculations take proportionally longer).
Matryoshka representation learning is a technique where the model is trained so that the first N dimensions of a larger vector are themselves a useful embedding. OpenAI's text-embedding-3 models support this: you can request 3,072 dimensions for maximum quality, or truncate to 1,536 or 768 dimensions when storage is more constrained. The truncated vector is less expressive but still meaningful, because the model was trained to front-load the most important information into the earlier dimensions.
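Truncation itself is nothing more than slicing and re-normalizing, as this sketch shows (the 3,072-dimensional vector here is random stand-in data; with a real Matryoshka-trained model, similarity computed on the truncated vectors closely tracks similarity on the full vectors):

import numpy as np

def truncate_embedding(vec: np.ndarray, dims: int) -> np.ndarray:
    # Keep the first `dims` dimensions, then re-normalize to unit length
    # so cosine similarity remains well behaved.
    truncated = vec[:dims]
    return truncated / np.linalg.norm(truncated)

full = np.random.default_rng(0).normal(size=3072)
full = full / np.linalg.norm(full)
small = truncate_embedding(full, 768)
print(small.shape)  # (768,): the same vector, 4x cheaper to store and compare

With OpenAI's text-embedding-3 models you can also pass a dimensions parameter to the embeddings API and receive the truncated vector directly.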
What Embeddings Capture (and What They Miss)
Embeddings capture semantic similarity, topical relevance, conceptual relationships, and paraphrase equivalence. They are excellent at understanding that "ways to speed up the API" and "API performance optimization techniques" are about the same topic. They handle vocabulary mismatch, informal language, and partial descriptions well because they encode meaning rather than words.
Embeddings struggle with several categories. Exact identifiers (error codes, version numbers, UUIDs) are treated as opaque tokens with weak semantic signal. Negation is poorly captured: "services that use Redis" and "services that do not use Redis" produce nearly identical embeddings. Temporal references are weak: "last week" and "last year" look similar in embedding space. Quantitative differences are compressed: "costs $5" and "costs $500" may produce similar vectors because the model focuses on the topic (cost) rather than the specific number.
These limitations explain why vector search alone is not sufficient for production retrieval. Hybrid search (adding keyword matching) addresses the exact-identifier problem. Knowledge graph traversal addresses the relationship-following problem. Metadata filtering addresses the temporal and quantitative problems. A complete retrieval system uses embeddings as one signal among several, not as the sole retrieval mechanism.
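To make the hybrid idea concrete, here is an illustrative sketch (not any particular library's API) that blends cosine similarity with a crude exact-match keyword score, so a query containing an identifier like "ERR_4032" can still rank documents that mention it verbatim. The token overlap stands in for a real lexical scorer such as BM25, and the alpha weight is an invented tuning knob:

import numpy as np

def keyword_score(query: str, doc: str) -> float:
    # Fraction of query tokens that appear verbatim in the document.
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / max(len(q_tokens), 1)

def hybrid_score(query_vec: np.ndarray, doc_vec: np.ndarray,
                 query: str, doc: str, alpha: float = 0.7) -> float:
    # Weighted blend of the semantic and lexical signals.
    cosine = np.dot(query_vec, doc_vec) / (
        np.linalg.norm(query_vec) * np.linalg.norm(doc_vec))
    return alpha * cosine + (1 - alpha) * keyword_score(query, doc)

Exact identifiers that embeddings treat as opaque tokens get full credit from the lexical term, while the cosine term still rewards semantic matches.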
Embeddings in Practice
Using an embedding model in production involves two operations: indexing and querying. During indexing, you pass each document through the model and store the resulting vector in a vector database. During querying, you pass the search query through the same model and find the stored vectors closest to the query vector. Using the same model for both is essential because different models produce incompatible vector spaces. A vector from OpenAI's model cannot be meaningfully compared to a vector from Cohere's model.
# Generating embeddings. This sketch assumes the OpenAI Python SDK
# (pip install openai) and an OPENAI_API_KEY in the environment;
# any embedding provider follows the same pattern.
from openai import OpenAI
import numpy as np

client = OpenAI()

def embed_text(text: str, model: str = "text-embedding-3-small") -> list[float]:
    # One API call per text; returns 1,536 floats for this model
    response = client.embeddings.create(model=model, input=text)
    return response.data[0].embedding
# Comparing two texts by embedding similarity
text_a = "how to configure database connection pooling"
text_b = "setting up DB pool management and tuning"
text_c = "the weather is sunny today"
vec_a = np.array(embed_text(text_a))
vec_b = np.array(embed_text(text_b))
vec_c = np.array(embed_text(text_c))
# Cosine similarity
sim_ab = np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))
sim_ac = np.dot(vec_a, vec_c) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_c))
# sim_ab will be high (~0.85) because same topic
# sim_ac will be low (~0.15) because unrelated topics

Adaptive Recall uses embeddings as one of four retrieval signals. When you store a memory, it is embedded for vector similarity search. When you recall, the query is embedded and compared to stored vectors. But the vector similarity score is combined with cognitive activation (based on recency and access frequency), knowledge graph spreading activation (based on entity connections), and confidence scoring (based on corroboration history) to produce a final ranking that is more accurate than any single signal alone.
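As a purely hypothetical sketch of how several signals might be blended (the weights and names below are invented for illustration and do not reflect Adaptive Recall's actual scoring formula):

def rank_score(vector_sim: float, activation: float,
               graph_score: float, confidence: float) -> float:
    # Hypothetical linear blend; each input is assumed to be
    # normalized to the 0..1 range before combining.
    weights = {"vector": 0.4, "activation": 0.2, "graph": 0.2, "confidence": 0.2}
    return (weights["vector"] * vector_sim
            + weights["activation"] * activation
            + weights["graph"] * graph_score
            + weights["confidence"] * confidence)

The point is not the specific weights but the architecture: each signal covers a failure mode the others miss.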
Vector embeddings are just the beginning. Adaptive Recall adds cognitive scoring, graph traversal, and confidence weighting to find what similarity alone misses.
Get Started Free