
How to Set Up pgvector for Vector Search

pgvector adds vector storage and similarity search to PostgreSQL, letting you run vector queries alongside your existing relational data without a separate database. This guide covers installation, schema design, HNSW indexing, similarity queries, and production tuning. If you already run PostgreSQL, pgvector is the fastest path to working vector search.

Why pgvector

pgvector eliminates the need for a separate vector database in your infrastructure. Your application data, metadata, and vectors all live in one database, which means you can join vector search results with relational data in a single query. There is no synchronization between systems, no additional authentication layer, and no separate backup strategy. For applications under 10 million vectors, pgvector handles the workload on a single PostgreSQL instance with HNSW indexing.

The trade-off is that pgvector is a single-node solution. It scales vertically (bigger machine, more RAM) but not horizontally (multiple machines). Dedicated vector databases like Pinecone, Qdrant, and Weaviate offer distributed architectures that scale to billions of vectors. If your application needs that scale, pgvector is not the right choice. For the majority of applications, which operate in the hundreds of thousands to low millions of vectors, pgvector is simpler, cheaper, and fast enough.

Step-by-Step Setup

Step 1: Install the pgvector extension.
pgvector is available as a package for most PostgreSQL distributions. On managed PostgreSQL services (AWS RDS, Google Cloud SQL, Azure Database, Supabase, Neon), pgvector is pre-installed and you just need to enable it. On self-managed PostgreSQL, install the package and then enable the extension.
```shell
# Ubuntu/Debian (PostgreSQL 16)
sudo apt install postgresql-16-pgvector

# Amazon Linux / RHEL
sudo yum install pgvector_16

# macOS with Homebrew
brew install pgvector

# Or compile from source
git clone https://github.com/pgvector/pgvector.git
cd pgvector
make
sudo make install
```
```sql
-- Enable in your database
CREATE EXTENSION IF NOT EXISTS vector;

-- Verify installation
SELECT extversion FROM pg_extension WHERE extname = 'vector';
```
Step 2: Create a table with a vector column.
Add a vector column to your documents table. The dimension must match your embedding model's output size. For OpenAI text-embedding-3-small, use 1536. For text-embedding-3-large, use 3072 (or a truncated size like 1536 if using Matryoshka); note that pgvector's HNSW index supports at most 2,000 dimensions for the vector type, so 3072-dimensional embeddings must be truncated or stored as halfvec to be indexed. For Cohere embed-v4, use 1024.
```sql
CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    metadata JSONB DEFAULT '{}',
    embedding vector(1536),
    created_at TIMESTAMPTZ DEFAULT now()
);

-- If adding to an existing table
ALTER TABLE documents ADD COLUMN embedding vector(1536);
```
Step 3: Insert embeddings.
Generate embeddings from your text using your embedding model's API, then insert them as vector literals. pgvector accepts vectors as bracket-enclosed comma-separated values.
```python
import json

import psycopg2
from psycopg2.extras import execute_values

# Generate embedding (using your preferred embedding API)
def get_embedding(text: str) -> list:
    # Replace with your embedding model call
    # Returns a list of floats
    pass

conn = psycopg2.connect("dbname=myapp")
cur = conn.cursor()

# Insert a document with its embedding
text = "PostgreSQL connection pooling with PgBouncer..."
embedding = get_embedding(text)
cur.execute(
    "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
    (text, json.dumps(embedding)),
)
conn.commit()

# Batch insert for efficiency
data = [
    (doc["content"], json.dumps(get_embedding(doc["content"])))
    for doc in documents
]
execute_values(
    cur,
    "INSERT INTO documents (content, embedding) VALUES %s",
    data,
    template="(%s, %s::vector)",
)
conn.commit()
```
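pgvector parses vectors from text as bracketed, comma-separated floats, which is why `json.dumps` on a Python list works as a literal. To make the wire format explicit, here is a minimal sketch of a formatter; the helper name is ours, not a pgvector or psycopg2 API:

```python
# Hypothetical helper: render a Python list of floats as a pgvector
# text literal, e.g. "[0.1,0.2,0.3]". json.dumps produces an
# equivalent form; this just makes the format visible.
def to_vector_literal(values: list) -> str:
    return "[" + ",".join(repr(float(v)) for v in values) + "]"

print(to_vector_literal([0.1, 0.2, 0.3]))  # [0.1,0.2,0.3]
```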
Step 4: Build an HNSW index.
Without an index, pgvector performs exact nearest neighbor search by scanning every vector. This is accurate but slow for large tables. An HNSW (Hierarchical Navigable Small World) index enables fast approximate nearest neighbor search with 95 to 99% recall at sub-millisecond latency.
```sql
-- Create HNSW index for cosine distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops);

-- For dot product (inner product) distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_ip_ops);

-- For L2 (Euclidean) distance
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_l2_ops);

-- With tuning parameters
CREATE INDEX idx_documents_embedding ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 128);

-- Note: these are alternatives. Pick the operator class that matches
-- the distance operator your queries use; index names must be unique.
```
Index build time: HNSW index construction is CPU-intensive. For 1 million 1536-dimensional vectors, expect 10 to 30 minutes of build time depending on hardware. The index must fit in memory for optimal query performance. Monitor pg_stat_progress_create_index during construction to track progress.
Step 5: Run similarity queries.
pgvector provides distance operators for each metric. The <=> operator computes cosine distance (1 minus cosine similarity), <#> computes negative inner product, and <-> computes L2 distance. Lower values mean more similar for all operators.
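As a sanity check on these definitions, the three operators can be reproduced in a few lines of plain Python (no database required). The function names here are ours, chosen to mirror the operators:

```python
import math

a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]  # parallel to a, so cosine distance is 0

def cosine_distance(x, y):
    # What <=> computes: 1 minus cosine similarity
    dot = sum(xi * yi for xi, yi in zip(x, y))
    return 1 - dot / (math.hypot(*x) * math.hypot(*y))

def neg_inner_product(x, y):
    # What <#> computes: negative dot product
    return -sum(xi * yi for xi, yi in zip(x, y))

def l2_distance(x, y):
    # What <-> computes: Euclidean distance
    return math.dist(x, y)

print(round(cosine_distance(a, b), 6))   # 0.0  (identical direction)
print(neg_inner_product(a, b))           # -28.0
```

Note that for all three, smaller output means "more similar", which is why every query orders ascending by the operator expression.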
```sql
-- Find the 10 most similar documents (cosine distance)
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- With metadata filtering
SELECT id, content, 1 - (embedding <=> $1::vector) AS similarity
FROM documents
WHERE metadata->>'category' = 'technical'
  AND created_at > now() - interval '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 10;

-- Hybrid: combine with full-text search
WITH vector_matches AS (
    SELECT id, 1 - (embedding <=> $1::vector) AS vscore
    FROM documents
    ORDER BY embedding <=> $1::vector
    LIMIT 20
), text_matches AS (
    SELECT id,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', $2)) AS tscore
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', $2)
    ORDER BY tscore DESC
    LIMIT 20
)
SELECT COALESCE(v.id, t.id) AS id,
       COALESCE(v.vscore, 0) * 0.7 + COALESCE(t.tscore, 0) * 0.3 AS score
FROM vector_matches v
FULL OUTER JOIN text_matches t ON v.id = t.id
ORDER BY score DESC
LIMIT 10;
```
Step 6: Tune for production.
Two HNSW parameters control the speed/accuracy trade-off. ef_search controls how many candidates the index evaluates at query time: higher values give better recall but slower queries. m and ef_construction control index quality at build time: higher values give better recall but slower index building and more memory usage.
```sql
-- Increase ef_search for better recall (default is 40)
SET hnsw.ef_search = 100;

-- Check index size
SELECT pg_size_pretty(pg_relation_size('idx_documents_embedding'));

-- Monitor query performance
EXPLAIN ANALYZE
SELECT id FROM documents
ORDER BY embedding <=> '[0.1, 0.2, ...]'::vector
LIMIT 10;

-- Ensure enough shared memory for the index
-- In postgresql.conf:
--   shared_buffers = 4GB          (or 25% of total RAM)
--   effective_cache_size = 12GB   (or 75% of total RAM)
--   maintenance_work_mem = 2GB    (for index building)
```
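To know whether an ef_search value is high enough, measure recall directly: run the same query against the HNSW index and against an exact scan, then compare the top-k ID sets. The metric itself is simple set overlap; a minimal sketch (the function name and sample IDs are illustrative):

```python
def recall_at_k(approx_ids, exact_ids):
    # Fraction of the true top-k that the approximate search also returned.
    exact = set(exact_ids)
    return len(exact & set(approx_ids)) / len(exact)

# Example: HNSW returned 9 of the 10 true nearest neighbours
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
exact  = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(recall_at_k(approx, exact))  # 0.9
```

Raise ef_search until recall on a sample of representative queries reaches your target (commonly 0.95 or higher), then stop: further increases only add latency.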

Scaling Considerations

pgvector handles up to roughly 5 to 10 million 1536-dimensional vectors on a single machine with 32 GB or more of RAM. Beyond that, consider partitioning the table by a natural partition key (tenant, date, category) so each partition's index fits comfortably in memory. If you need to scale beyond what a single machine can handle, consider migrating to a distributed vector database or using pgvector with Citus for distributed PostgreSQL.
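A back-of-envelope check on the "fits in memory" guidance: raw vector storage alone is n × dims × 4 bytes (float4), before the HNSW graph and PostgreSQL page overhead are added. A sketch of the arithmetic:

```python
def raw_vector_bytes(n_vectors: int, dims: int) -> int:
    # float4 storage only; the HNSW graph and page overhead add more on top.
    return n_vectors * dims * 4

gib = raw_vector_bytes(5_000_000, 1536) / 2**30
print(f"{gib:.1f} GiB")  # 28.6 GiB
```

Five million 1536-dimensional vectors already consume about 28.6 GiB of raw float storage, which is why 32 GB of RAM is roughly the floor for that scale and why partitioning becomes attractive beyond it.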

For applications using Adaptive Recall, vector storage and indexing are managed as part of the memory infrastructure. You do not need to set up, tune, or maintain pgvector or any other vector database yourself. The system handles embedding, indexing, and querying alongside the cognitive scoring and knowledge graph traversal that complement vector similarity.

Skip the database setup. Adaptive Recall provides managed vector search combined with cognitive scoring and graph traversal, with no infrastructure to maintain.

Try It Free