Are 1,536 Dimensions Better Than 768 for Embeddings?
What the Benchmarks Show
MTEB benchmark results across multiple models show diminishing returns from additional dimensions. Moving from 384 to 768 dimensions typically improves retrieval NDCG@10 by 3 to 5%. Moving from 768 to 1,536 dimensions improves it by 1 to 3%. Moving from 1,536 to 3,072 dimensions improves it by under 1%. The largest quality gains come at lower dimensions, and by 768 dimensions, most of the meaningful semantic information is already captured.
OpenAI's text-embedding-3-large model (3,072 native dimensions) supports Matryoshka dimension reduction, allowing you to test this directly. Truncating from 3,072 to 1,536 reduces MTEB scores by about 1.5%. Truncating from 1,536 to 768 reduces scores by another 2%. Truncating from 768 to 384 reduces scores by 3 to 4%. The practical implication is that for most applications, the difference between 768 and 1,536 dimensions is smaller than the difference caused by changing chunk size, switching embedding models, or adding hybrid search.
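If you want to reproduce this comparison on your own corpus, the OpenAI API exposes Matryoshka truncation directly through the `dimensions` parameter. A minimal sketch, assuming the `openai` v1 Python package and an `OPENAI_API_KEY` in the environment (the example input is illustrative):

```python
# Embed the same texts at several dimension counts so you can compare
# retrieval quality on your own data rather than benchmark averages.
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str], dims: int) -> list[list[float]]:
    """Embed with text-embedding-3-large truncated to `dims` dimensions.

    The API returns unit-normalized vectors even after truncation, so
    results at different sizes are directly comparable with cosine similarity.
    """
    resp = client.embeddings.create(
        model="text-embedding-3-large",
        input=texts,
        dimensions=dims,  # Matryoshka truncation happens server-side
    )
    return [d.embedding for d in resp.data]

# Same corpus, three dimension counts: index each and compare recall@10.
for dims in (3072, 1536, 768):
    vectors = embed(["employment discrimination based on age"], dims)
```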
The Storage and Performance Trade-Off
Every additional dimension adds 4 bytes per vector (in float32). Doubling dimensions from 768 to 1,536 doubles the storage for raw vectors, roughly doubles the HNSW index size, and scales the cost of each distance computation linearly. For 1 million vectors:
Dimensions | Raw size | HNSW index | Query time (approx)
-----------|----------|------------|--------------------
384        | 1.4 GB   | 4.3 GB     | ~1.5 ms
768        | 2.9 GB   | 8.6 GB     | ~2.5 ms
1024       | 3.8 GB   | 11.5 GB    | ~3.0 ms
1536       | 5.7 GB   | 17.1 GB    | ~4.0 ms
3072       | 11.5 GB  | 34.4 GB    | ~6.0 ms

The storage difference determines how many vectors fit in RAM, which directly affects whether pgvector can handle your workload or whether you need a dedicated vector database. With 32 GB of RAM, you can comfortably index 2 million 768-dimensional vectors but only 1 million 1,536-dimensional vectors. Choosing lower dimensions effectively doubles your capacity without changing hardware.
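The arithmetic behind these capacity limits is easy to script. A back-of-envelope sketch, assuming the roughly 3x index-to-raw overhead implied by the table above (actual overhead depends on HNSW parameters such as M and ef_construction):

```python
# Estimate how much RAM an HNSW index over float32 vectors will need.
HNSW_OVERHEAD = 3.0  # index size / raw vector size, as implied by the table

def index_size_gb(num_vectors: int, dims: int, bytes_per_value: int = 4) -> float:
    raw_bytes = num_vectors * dims * bytes_per_value
    return raw_bytes * HNSW_OVERHEAD / 1024**3

print(index_size_gb(2_000_000, 768))   # ~17.2 GB -> fits in 32 GB RAM
print(index_size_gb(1_000_000, 1536))  # ~17.2 GB -> same budget, half the vectors
```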
When Higher Dimensions Matter
Specialized domains with subtle distinctions. In legal document retrieval, the difference between a case about "employment discrimination based on age" and "employment discrimination based on disability" is subtle but critical. Higher dimensions help the model encode these fine distinctions. In general-purpose documentation, the differences between topics are usually large enough that 768 dimensions captures them well.
Very large corpora. With 10 million documents, the probability of having many semantically similar documents increases. Higher dimensions help distinguish between them. With 100K documents, there are fewer near-duplicates to distinguish, and lower dimensions suffice.
Multi-lingual retrieval. Cross-lingual embeddings need to encode both the semantic meaning and the language-specific nuances. Higher dimensions give the model more room to represent both. For English-only corpora, this is not a factor.
Quantization as an Alternative to Fewer Dimensions
If storage is your concern but you want to keep higher-dimensional vectors for quality, quantization offers a better trade-off than reducing dimensions. Scalar quantization converts each float32 value (4 bytes) to int8 (1 byte), giving a 4x storage reduction with under 1% recall loss. Product quantization compresses vectors further by encoding groups of dimensions together, achieving 8 to 16x reduction with 2 to 5% recall loss.
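To illustrate the mechanism, here is a minimal symmetric scalar quantization sketch in NumPy. Production implementations (FAISS, Qdrant) calibrate ranges more carefully, often per dimension; this is the simplest per-vector variant:

```python
import numpy as np

def quantize_int8(v: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float32 values into [-127, 127] with one scale factor per vector."""
    scale = np.abs(v).max() / 127.0
    q = np.round(v / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

v = np.random.randn(1536).astype(np.float32)
q, scale = quantize_int8(v)
print(q.nbytes, v.nbytes)                      # 1536 vs 6144 bytes: 4x smaller
print(np.abs(dequantize(q, scale) - v).max())  # small reconstruction error
```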
The difference between quantization and dimension reduction is what information you lose. Reducing from 1,536 to 768 dimensions discards half of the learned semantic features entirely. Scalar quantization keeps all 1,536 features but represents each one with less numerical precision. In practice, the precision loss from int8 quantization is smaller than the information loss from halving dimensions, because the important semantic distinctions are encoded across many dimensions, not concentrated in high-precision individual values.
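Rather than taking this on faith, you can measure the distortion on your own vectors. A harness sketch, where `sample_embeddings.npy` is a hypothetical file holding an (n, 1536) matrix of unit-normalized embeddings from a Matryoshka model:

```python
import numpy as np

def cosine_matrix(A: np.ndarray) -> np.ndarray:
    A = A / np.linalg.norm(A, axis=1, keepdims=True)
    return A @ A.T

def distortion(X: np.ndarray, X_compressed: np.ndarray) -> float:
    """Mean absolute change in pairwise cosine similarity."""
    return float(np.abs(cosine_matrix(X) - cosine_matrix(X_compressed)).mean())

X = np.load("sample_embeddings.npy")  # hypothetical file of real vectors

# Scheme 1: int8 quantization -- all 1,536 dims kept at lower precision.
scale = np.abs(X).max(axis=1, keepdims=True) / 127.0
X_int8 = np.round(X / scale).astype(np.int8).astype(np.float32) * scale

# Scheme 2: keep only the first 768 dims (meaningful for Matryoshka models).
X_half = X[:, :768]

print("int8 distortion:      ", distortion(X, X_int8))
print("truncation distortion:", distortion(X, X_half))
```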
# Storage comparison for 1 million vectors
Approach | Dims | Bytes/vector | Total size
---------------------------|------|--------------|----------
float32, 1536 dims | 1536 | 6,144 | 5.7 GB
float32, 768 dims | 768 | 3,072 | 2.9 GB
int8 quantized, 1536 dims | 1536 | 1,536 | 1.4 GB
PQ compressed, 1536 dims | 1536 | ~384 | 0.36 GB
# int8 at 1536 dims uses LESS storage than float32 at 768 dims
# while retaining more retrieval quality

Most modern vector databases support quantization natively. Qdrant offers scalar and product quantization configurable per collection. Weaviate supports product quantization through its PQ module. pgvector supports halfvec (float16) for a 2x reduction. If your vector database supports quantization, use 1,536 dimensions with scalar quantization rather than reducing to 768 dimensions at full precision.
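As one concrete example, enabling scalar quantization in Qdrant takes a few lines at collection creation time. A sketch with placeholder collection name and URL (Qdrant keeps the original float32 vectors on disk, so it can optionally rescore with full precision):

```python
from qdrant_client import QdrantClient, models

client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="docs",
    vectors_config=models.VectorParams(size=1536, distance=models.Distance.COSINE),
    quantization_config=models.ScalarQuantization(
        scalar=models.ScalarQuantizationConfig(
            type=models.ScalarType.INT8,
            always_ram=True,  # keep the compact int8 vectors in memory
        )
    ),
)
```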
Matryoshka Embeddings: The Best of Both Worlds
Matryoshka representation learning, supported by OpenAI's text-embedding-3 family and several open-source models (nomic-embed-text, mxbai-embed-large), trains the model so that the first N dimensions of a larger vector are themselves a useful embedding. This means you embed once at full dimensionality and can truncate later without re-embedding. A 3,072-dimensional vector from text-embedding-3-large can be truncated to 1,536 dimensions for storage, with the guarantee that those 1,536 dimensions are the most informative ones (they were trained to be).
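The serving side is just prefix truncation plus renormalization, since dropping dimensions shrinks the vector norm. A sketch, with `embeddings_3072.npy` as a hypothetical archive of stored full-dimension vectors:

```python
import numpy as np

def truncate(vectors: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` values, then renormalize for cosine similarity."""
    v = vectors[:, :dims]
    return v / np.linalg.norm(v, axis=1, keepdims=True)

full = np.load("embeddings_3072.npy")  # embed once at full dimensionality
serve_1536 = truncate(full, 1536)      # what you index today
# Later, if storage allows, re-cut at higher dims without re-embedding:
serve_3072 = truncate(full, 3072)
```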
This eliminates the need to choose dimensions at embedding time. Embed at the maximum, store at whatever dimensionality your storage budget allows, and increase later if you upgrade your infrastructure. The only cost is embedding at the higher dimension count in the first place, which for OpenAI's models is priced identically to the lower count, since pricing is per token rather than per dimension.
Practical Recommendation
Start with whatever dimensionality your chosen embedding model defaults to (typically 768 or 1,024 for open-source models, 1,536 for OpenAI's small model). Measure recall@10 on your evaluation dataset. If recall is below your target, the cause is almost certainly the embedding model, chunk size, or missing content rather than insufficient dimensions. Increasing dimensions is the optimization of last resort, after you have optimized everything else.
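A recall@10 measurement is a few lines if you have a labeled evaluation set. A sketch, assuming unit-normalized query and corpus embeddings and a list mapping each query to the set of relevant corpus row indices:

```python
import numpy as np

def recall_at_k(queries: np.ndarray, corpus: np.ndarray,
                relevant: list[set[int]], k: int = 10) -> float:
    """Fraction of each query's relevant documents found in the top k."""
    scores = queries @ corpus.T                 # cosine sim for unit vectors
    topk = np.argsort(-scores, axis=1)[:, :k]
    hits = [len(set(topk[i]) & relevant[i]) / len(relevant[i])
            for i in range(len(relevant))]
    return float(np.mean(hits))

# Run once per candidate configuration (768 vs 1,536, quantized vs not)
# and compare the numbers on your own data, not on benchmark averages.
```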
If storage is tight, apply scalar quantization before reducing dimensions. If you use a Matryoshka model, embed at full dimensions and truncate in production if storage becomes a constraint. This gives you the option to increase quality later without re-embedding. And if you are evaluating whether to upgrade from 768 to 1,536, run the comparison on your own data first. The 1 to 3% improvement on benchmarks may be 0% or 5% on your specific queries, and only your data can tell you which.
Adaptive Recall handles embedding dimensions, storage, and retrieval optimization as part of its managed pipeline. Focus on your application, not vector infrastructure tuning.
Try It Free