Can Vector Search Work Without an External Database?
In-Memory Search with NumPy
The simplest vector search implementation is a NumPy matrix multiplication. Store all vectors in a 2D NumPy array, and for each query, compute the dot product (or cosine similarity) between the query vector and all stored vectors. This is exact nearest neighbor search with perfect recall, and NumPy's optimized BLAS operations make it fast enough for tens of thousands of vectors.
import numpy as np

class SimpleVectorSearch:
    def __init__(self):
        self.vectors = []
        self.documents = []
        self.matrix = None  # cached normalized matrix, rebuilt after adds

    def add(self, text: str, embedding: list):
        self.documents.append(text)
        self.vectors.append(embedding)
        self.matrix = None  # invalidate cache

    def search(self, query_embedding: list, top_k: int = 5):
        if self.matrix is None:
            # Normalize rows once so a plain dot product is cosine similarity
            self.matrix = np.array(self.vectors)
            norms = np.linalg.norm(self.matrix, axis=1, keepdims=True)
            self.matrix = self.matrix / norms
        query = np.array(query_embedding)
        query = query / np.linalg.norm(query)
        similarities = self.matrix @ query
        top_indices = np.argsort(similarities)[-top_k:][::-1]
        return [
            {"text": self.documents[i], "score": float(similarities[i])}
            for i in top_indices
        ]

    def save(self, path: str):
        # np.savez appends ".npz" to the filename automatically
        np.savez(path,
                 vectors=np.array(self.vectors),
                 documents=np.array(self.documents))

    def load(self, path: str):
        data = np.load(path + ".npz", allow_pickle=True)
        self.vectors = data["vectors"].tolist()
        self.documents = data["documents"].tolist()
        self.matrix = None

This approach handles up to roughly 50K to 100K vectors with sub-100ms query times. Beyond that, exact search becomes slow because every query computes similarity against every stored vector. The O(n) query complexity means doubling the vectors doubles the query time.
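A quick way to see that linear scaling for yourself is to time queries at two collection sizes. This is a minimal sketch using random vectors as stand-ins for real embeddings (dimension 256 is chosen arbitrarily to keep it fast):

import time
import numpy as np

rng = np.random.default_rng(0)

for n in (10_000, 20_000):
    store = SimpleVectorSearch()
    for i in range(n):
        store.add(f"doc {i}", rng.standard_normal(256).tolist())
    query = rng.standard_normal(256).tolist()
    store.search(query)  # warm-up: builds and normalizes the cached matrix
    start = time.perf_counter()
    for _ in range(20):
        store.search(query)
    elapsed = (time.perf_counter() - start) / 20
    print(f"{n} vectors: {elapsed * 1000:.2f} ms per query")

Doubling n should roughly double the per-query time, which is the signal that it is time to move to an indexed approach.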
FAISS: Fast In-Memory and File-Based Search
FAISS (Facebook AI Similarity Search) is a library that provides both exact and approximate nearest neighbor search with optimized C++ implementations. It runs in-process (no server), supports file-based persistence, and handles millions of vectors efficiently through approximate indexes such as HNSW.
import faiss
import numpy as np

# Exact search (good for under 100K vectors)
dimension = 1536
index = faiss.IndexFlatIP(dimension)  # inner product (cosine for normalized vectors)

# Add vectors (FAISS requires float32); embeddings is a list of vectors
# from your embedding model
vectors = np.array(embeddings, dtype=np.float32)
faiss.normalize_L2(vectors)  # normalize in place so inner product = cosine
index.add(vectors)

# Search
query = np.array([query_embedding], dtype=np.float32)
faiss.normalize_L2(query)
distances, indices = index.search(query, k=10)

# Save to disk
faiss.write_index(index, "my_index.faiss")

# Load from disk
index = faiss.read_index("my_index.faiss")

# For larger datasets, use an HNSW index (approximate, but much faster)
index_hnsw = faiss.IndexHNSWFlat(dimension, 32, faiss.METRIC_INNER_PRODUCT)  # 32 links per node
index_hnsw.hnsw.efSearch = 64  # higher = better recall, slower queries
index_hnsw.add(vectors)  # vectors were already normalized above
# Now handles millions of vectors with sub-millisecond queries

FAISS is the right choice when you need fast vector search without a database server. It is commonly used in serverless functions (load the index from S3 at cold start), CLI tools, Jupyter notebooks, and batch processing pipelines. The trade-off is that FAISS does not support concurrent writes from multiple processes, so it is best for read-heavy or single-writer workloads.
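As an illustration of the serverless pattern, here is a sketch of loading a FAISS index from S3 at cold start in an AWS Lambda handler. The bucket and key names are placeholders, and boto3 is assumed to be packaged with the function:

import boto3
import faiss
import numpy as np

_index = None  # module-level cache survives warm invocations

def get_index():
    # Hypothetical bucket/key; downloaded once per cold start, reused afterwards
    global _index
    if _index is None:
        s3 = boto3.client("s3")
        s3.download_file("my-bucket", "indexes/my_index.faiss", "/tmp/my_index.faiss")
        _index = faiss.read_index("/tmp/my_index.faiss")
    return _index

def handler(event, context):
    # Assumes the caller sends a pre-computed query embedding in the event
    query = np.array([event["embedding"]], dtype=np.float32)
    faiss.normalize_L2(query)
    distances, indices = get_index().search(query, 5)
    return {"indices": indices[0].tolist(), "scores": distances[0].tolist()}

The module-level cache is what makes this economical: the S3 download and index load happen once per container, and subsequent warm invocations pay only the query cost.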
ChromaDB: Embedded Database
ChromaDB runs as an embedded database (in-process, no server) with automatic embedding and persistence. It wraps HNSW indexing with a document-oriented API that handles embedding, storage, and querying in a few lines of code. For prototyping and small-scale applications, it is the fastest path to working vector search.
import chromadb

client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection("docs")

# Add documents (ChromaDB can embed automatically)
collection.add(
    documents=["Database connection pooling guide...",
               "Authentication troubleshooting..."],
    ids=["doc1", "doc2"]
)

# Search
results = collection.query(
    query_texts=["how to configure connection pools"],
    n_results=5
)

When You Need a Real Database
In-memory and embedded approaches stop working well when you need concurrent access from multiple application instances, real-time updates from multiple writers, vectors that exceed available RAM, or enterprise features like replication, backups, and monitoring. At that point, pgvector (if you have PostgreSQL) or a dedicated vector database (Qdrant, Pinecone, Weaviate) is the right step up.
The good news is that starting with a simple approach and migrating later is straightforward. The vector search interface (embed query, find top-k similar, return results) is the same regardless of the backend. Switching from NumPy to FAISS to pgvector to Qdrant changes the implementation but not the API contract.
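One way to keep that contract explicit is a small interface that every backend implements. This is a sketch of one possible design using typing.Protocol, not a prescribed pattern:

from typing import Protocol

class VectorSearchBackend(Protocol):
    def add(self, text: str, embedding: list) -> None: ...
    def search(self, query_embedding: list, top_k: int = 5) -> list[dict]: ...

# SimpleVectorSearch above already satisfies this Protocol; a FAISS-,
# pgvector-, or Qdrant-backed class with the same two methods can be
# swapped in without touching calling code.
def retrieve(backend: VectorSearchBackend, query_embedding: list) -> list[dict]:
    return backend.search(query_embedding, top_k=5)

Application code depends only on the Protocol, so migrating backends becomes a matter of writing one new class rather than rewriting retrieval logic.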
Adaptive Recall handles vector storage, search, and scaling as a managed service. Start simple, scale without migration.