
How Many Vectors Can pgvector Handle

pgvector handles 1 to 10 million vectors at production latency on a single PostgreSQL instance, depending on hardware and vector dimensions. With 1,536-dimensional vectors, a machine with 32 GB of RAM handles roughly 1 million vectors with HNSW indexing at sub-5ms query latency. With 64 GB of RAM, this extends to roughly 2.5 million. Beyond 10 million vectors, query latency climbs because the HNSW index no longer fits entirely in memory, and dedicated vector databases with horizontal scaling become the better choice.

The Memory Equation

pgvector's performance is almost entirely determined by whether the HNSW index fits in RAM. The index must be traversed during every query, and each hop in the traversal reads a node from the index. When the index is in memory, each hop takes nanoseconds. When it spills to disk, each hop becomes a disk read that takes microseconds to milliseconds, and query latency jumps 10 to 100 times.

The memory required for a pgvector HNSW index is roughly: (num_vectors * dimensions * 4 bytes) * 3. The first factor is the raw vector size in float32. The multiplier of 3 accounts for HNSW graph structure overhead (node connections, metadata). For 1,536-dimensional vectors:

# Memory calculation for 1536-dim vectors

Vectors | Raw size | HNSW index | Total RAM needed
--------|----------|------------|-----------------
100K    | 0.57 GB  | 1.7 GB     | ~4 GB
500K    | 2.9 GB   | 8.6 GB     | ~16 GB
1M      | 5.7 GB   | 17 GB      | ~32 GB
2M      | 11.5 GB  | 34 GB      | ~64 GB
5M      | 28.7 GB  | 86 GB      | ~128 GB
10M     | 57.4 GB  | 172 GB     | ~256 GB

These are approximate, and the actual overhead depends on HNSW parameters (m, ef_construction), metadata size, and PostgreSQL's own memory usage. The "Total RAM needed" column includes headroom for PostgreSQL's shared buffers, OS caches, and other tables. For a dedicated vector search machine, you can allocate more RAM to the index, but most production PostgreSQL instances serve other workloads alongside pgvector.
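
To sanity-check these numbers from psql rather than by hand, here is a minimal sketch of the same formula; the row counts are illustrative, and the index name in the final comment is a hypothetical placeholder:

-- Estimate raw float32 size and approximate HNSW index size
-- using the formula: num_vectors * dimensions * 4 bytes * 3
SELECT n AS vectors,
       pg_size_pretty(n * 1536 * 4)     AS raw_float32,
       pg_size_pretty(n * 1536 * 4 * 3) AS approx_hnsw_index
FROM (VALUES (100000::bigint), (1000000), (5000000)) AS t(n);

-- Compare the estimate against a real index once it exists:
-- SELECT pg_size_pretty(pg_relation_size('your_hnsw_index_name'));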

Query Latency by Scale

When the index fits in RAM, pgvector HNSW queries return in 1 to 5 milliseconds for top-10 results at ef_search = 64. This latency is stable from 100K to several million vectors because HNSW traversal visits a logarithmic number of nodes relative to the total graph size. Doubling the number of vectors adds roughly one more hop, not double the time.
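
In practice this is one session-level setting plus an ordinary ORDER BY ... LIMIT query. A minimal sketch, assuming a documents table with an embedding vector(1536) column (as in the partitioning example later in this article) and $1 standing in for the query embedding:

-- hnsw.ef_search defaults to 40; raising it trades latency for recall
SET hnsw.ef_search = 64;

-- Top-10 nearest neighbors by cosine distance
SELECT id, content, embedding <=> $1 AS cosine_distance
FROM documents
ORDER BY embedding <=> $1
LIMIT 10;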

When the index exceeds RAM, latency degrades non-linearly. At 110% of RAM capacity, latency may increase by 2 to 3 times as the least-recently-accessed index pages get evicted and reloaded. At 150% of RAM, latency increases 5 to 10 times. At 200% of RAM, latency becomes unpredictable (10ms to 200ms) depending on cache hit rates. This cliff behavior is why sizing RAM to comfortably hold the index is critical.

Real-World Sizing by Cloud Instance

The abstract memory calculations become concrete when mapped to actual cloud instances. Here is what each common PostgreSQL hosting tier handles for 1,536-dimensional vectors with HNSW indexing:

# AWS RDS PostgreSQL with pgvector

Instance       | RAM    | Max vectors | Monthly cost
---------------|--------|-------------|-------------
db.t4g.medium  | 4 GB   | ~50K        | ~$50
db.r6g.large   | 16 GB  | ~400K       | ~$180
db.r6g.xlarge  | 32 GB  | ~1M         | ~$360
db.r6g.2xlarge | 64 GB  | ~2.5M       | ~$720
db.r6g.4xlarge | 128 GB | ~5M         | ~$1,440

# DigitalOcean / Hetzner VPS (self-managed PG)

VPS size | RAM   | Max vectors | Monthly cost
---------|-------|-------------|-------------
4 GB     | 4 GB  | ~50K        | ~$24
8 GB     | 8 GB  | ~150K       | ~$48
16 GB    | 16 GB | ~400K       | ~$96
32 GB    | 32 GB | ~1M         | ~$192
64 GB    | 64 GB | ~2.5M       | ~$384

These estimates assume pgvector shares the instance with your application database. If the PostgreSQL instance is dedicated solely to vector search with no other tables or workloads, you can push roughly 30% higher because more RAM is available for the HNSW index. The max vector counts assume enough headroom for PostgreSQL shared buffers, OS page cache, and other memory consumers.

Strategies for Scaling Beyond Single-Machine Limits

Table partitioning divides your vectors into partitions by a natural key (tenant ID, date, category). Each partition has its own HNSW index that is smaller and more likely to fit in RAM. Queries that include the partition key only scan the relevant partition. This works well for multi-tenant applications where each tenant's data is queried independently.

-- Partition by tenant for a multi-tenant SaaS
CREATE TABLE documents (
    id BIGSERIAL,
    tenant_id INTEGER NOT NULL,
    content TEXT,
    embedding vector(1536),
    PRIMARY KEY (id, tenant_id)
) PARTITION BY HASH (tenant_id);

-- Create 16 partitions
CREATE TABLE documents_p0 PARTITION OF documents
    FOR VALUES WITH (modulus 16, remainder 0);
CREATE TABLE documents_p1 PARTITION OF documents
    FOR VALUES WITH (modulus 16, remainder 1);
-- ... through p15

-- Each partition gets its own HNSW index.
-- With 2M total vectors across 100 tenants,
-- each partition's index is ~1/16 of the total size.
CREATE INDEX ON documents_p0 USING hnsw (embedding vector_cosine_ops);
CREATE INDEX ON documents_p1 USING hnsw (embedding vector_cosine_ops);
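
A query that pins the partition key lets the planner prune to a single partition, so only that partition's HNSW index is traversed. A sketch, with $1 as the query embedding:

-- Partition pruning: only the partition holding tenant 42 is scanned
SELECT id, content
FROM documents
WHERE tenant_id = 42
ORDER BY embedding <=> $1
LIMIT 10;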

Partitioning effectively multiplies your capacity. A 32 GB instance that handles 1M vectors in a single table can handle 3 to 4M vectors across well-distributed partitions because queries only load the relevant partition's index into memory.

Quantization reduces vector size, extending how many vectors fit in RAM. pgvector supports halfvec (float16, 2 bytes per dimension) for a 2x reduction with negligible quality impact. For deeper compression, store scalar-quantized vectors (float32 to int8, a 4x reduction with under 1% recall loss) outside pgvector, use them as a first-pass filter, then rescore the top candidates with full-precision vectors.
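
For the halfvec option, pgvector (0.7.0 and later) supports half-precision expression indexes: the table keeps full-precision float32 vectors while the index stores float16, roughly halving index RAM. A minimal sketch against the documents table from above:

-- Half-precision HNSW index; the float32 column is untouched
CREATE INDEX ON documents USING hnsw
    ((embedding::halfvec(1536)) halfvec_cosine_ops);

-- Queries must use the same cast expression for the index to apply
SELECT id, content
FROM documents
ORDER BY embedding::halfvec(1536) <=> $1::halfvec(1536)
LIMIT 10;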

Dimension reduction using Matryoshka models lets you truncate 1,536-dim vectors to 768 or even 384 dimensions with controlled quality loss, halving or quartering the memory requirement at the cost of some embedding expressiveness. Combined with quantization, the savings multiply: truncating 1,536-dim float32 (6 KB per vector) to 768-dim float16 (1.5 KB per vector) is a 4x reduction, and 384-dim float16 (0.75 KB per vector) reaches 8x.
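
With pgvector 0.7.0+, truncation can happen at index time via an expression index over subvector, so no second column or re-embedding pass is needed. This sketch assumes the embeddings come from a Matryoshka-trained model:

-- Index only the first 768 of 1,536 dimensions, halving index memory
CREATE INDEX ON documents USING hnsw
    ((subvector(embedding, 1, 768)::vector(768)) vector_cosine_ops);

-- Query with the matching expression; cosine distance normalizes
-- internally, which truncated Matryoshka vectors require
SELECT id, content
FROM documents
ORDER BY subvector(embedding, 1, 768)::vector(768)
         <=> subvector($1::vector(1536), 1, 768)::vector(768)
LIMIT 10;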

Partial indexes limit the HNSW index to a subset of rows. If most queries filter by a time range (last 90 days, last year), create a partial HNSW index on recent data and use exact search or a separate index for older data. This reduces the active index size to the hot dataset while keeping historical data accessible.

-- Partial index for recent data only. Index predicates must use
-- IMMUTABLE expressions, so now() cannot appear here; use a fixed
-- cutoff and move it forward when you rebuild the index
CREATE INDEX idx_recent_vectors ON documents
    USING hnsw (embedding vector_cosine_ops)
    WHERE created_at > '2025-01-01';

-- Queries within the time range use the smaller index
-- Queries outside the range fall back to a sequential scan
-- Refresh periodically by dropping and recreating with a new cutoff
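
The partial index is only eligible when the query's own filter implies the index predicate. A usage sketch, with $1 as the query embedding:

-- created_at > '2025-03-01' implies the index's cutoff, so the
-- smaller partial HNSW index can serve this query
SELECT id, content
FROM documents
WHERE created_at > '2025-03-01'
ORDER BY embedding <=> $1
LIMIT 10;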

If none of these strategies provide enough headroom, it is time to migrate to a horizontally scalable vector database (Qdrant, Weaviate, Pinecone) that distributes the index across multiple machines. pgvector is excellent up to its single-machine limits, but it does not support distributed search. The migration path is straightforward: export vectors and metadata from PostgreSQL, load them into the new database, and update your query code to use the new client.

Skip the capacity planning. Adaptive Recall manages vector storage and scaling as part of its hosted infrastructure.
