Cognitive Scoring for AI Retrieval
On This Page
- The Problem with Similarity-Only Retrieval
- What Cognitive Scoring Is
- The Four Scoring Dimensions
- Reranking: Where Cognitive Scoring Lives
- Two-Stage Retrieval Architecture
- Encoder Models and Scoring Approaches
- Lessons from Human Cognition
- Cognitive Scoring in Production
- Implementation Guides
- Core Concepts
- Common Questions
The Problem with Similarity-Only Retrieval
Vector similarity search computes a distance between the embedding of a query and the embeddings of stored content, returning the closest matches. This works well for finding text that is semantically related to the query, but it treats every stored item as equally valid regardless of when it was written, how often it has proven useful, what other knowledge it connects to, or whether it has been contradicted by newer information. Two documents with identical similarity scores could differ enormously in actual usefulness.
Consider a customer support system that has accumulated thousands of product-related memories over two years. A user asks about configuring two-factor authentication. Vector search returns the five most similar memories, which might include setup instructions from three different product versions, a workaround for a bug that was patched eight months ago, and the current documentation. All five score within a few percentage points of each other on similarity. The system has no way to distinguish the current answer from the obsolete ones because similarity measures topical relevance, not temporal validity or reliability.
This problem compounds over time. The more memories a system accumulates, the more stale, contradictory, and redundant entries compete with current, accurate answers in the similarity rankings. Manual curation can manage this for small, slow-changing knowledge bases, but it breaks down for systems that ingest hundreds or thousands of memories per month across diverse topics. The retrieval mechanism itself needs to understand which results are most likely to be correct and current, not just which ones share the most vocabulary with the query.
Cognitive scoring solves this by adding dimensions beyond semantic similarity. A memory retrieved ten times in the past week gets an activation boost that a memory untouched for six months does not. A memory connected to the query through entity-graph associations gets a spreading activation bonus. A memory corroborated by three independent sources gets a confidence weight that an unverified observation does not. These factors combine with vector similarity to produce a final ranking that reflects not just topical relevance but actual answer quality.
What Cognitive Scoring Is
Cognitive scoring is a retrieval ranking method inspired by ACT-R, a cognitive architecture developed at Carnegie Mellon University that models how human memory works. The core insight is that human recall does not operate on similarity alone. When you remember something, your brain considers how recently you encountered it, how many times you have used it, what you are currently thinking about (which primes related memories), and how well-established the information is in your understanding. ACT-R provides mathematical equations for each of these factors, validated across thousands of experiments over more than forty years.
Adaptive Recall implements these equations as a real-time scoring layer that runs on top of vector search results. The system first retrieves candidate memories using standard embedding similarity (finding memories that are about the right topic), then reranks those candidates using cognitive scoring (finding the best answer among the relevant ones). The combined score determines the final ranking returned to the application.
Unlike heuristic approaches that assign arbitrary weights to metadata fields, cognitive scoring uses equations derived from empirical research on human memory. The decay function follows a power law calibrated against human forgetting data. The spreading activation mechanism uses graph-distance weighting validated against priming experiments. The confidence model uses corroboration and contradiction counts that mirror how humans build certainty through repeated exposure to consistent information. Every parameter has a known default value grounded in experimental data, and every parameter is tunable for domain-specific needs.
The practical effect is retrieval that behaves like consulting a knowledgeable colleague rather than running a keyword search. A colleague naturally prioritizes recent information, well-established facts, and contextually relevant knowledge. They do not surface everything they have ever learned with equal weight. Cognitive scoring replicates this behavior through mathematics rather than intuition.
The Four Scoring Dimensions
Base-Level Activation: Recency and Frequency
Base-level activation tracks how recently and how frequently a memory has been accessed. Every time a memory is stored, retrieved, or updated, the system records the timestamp. Activation is computed as the natural logarithm of a sum in which each recorded access contributes according to its age, following the ACT-R base-level learning equation. Recent accesses contribute more than old ones, and the contribution follows a power-law decay rather than an exponential one, which means memories fade gradually rather than dropping off a cliff.
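The equation itself is compact. Here is a minimal sketch in Python, assuming access events are stored as Unix timestamps; the function name and the one-second age floor are illustrative choices, not Adaptive Recall's internals:

```python
import math
import time

def base_level_activation(access_times, d=0.5, now=None):
    """ACT-R base-level learning: B = ln(sum_j age_j ** -d), where age_j is
    how long ago the j-th access happened and d is the decay rate."""
    now = time.time() if now is None else now
    ages = [max(now - t, 1.0) for t in access_times]  # 1-second floor avoids 0 ** -d
    if not ages:
        return float("-inf")  # never accessed: effectively no activation
    return math.log(sum(age ** -d for age in ages))
```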
For AI retrieval, base-level activation solves the stale data problem. When a product's configuration changes, the new documentation accumulates activation through regular access while the old documentation decays. When a developer adopts a new framework, examples using the new framework build activation while old-framework examples fade. This happens automatically through usage patterns without any manual tagging or expiration dates.
The decay rate is tunable. Fast-moving domains like customer support benefit from aggressive decay (a value around 0.7) that strongly favors recent information. Stable domains like legal research benefit from slower decay (around 0.3) that preserves historical context. The default value of 0.5 provides a balanced curve that works well for most applications.
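To see what the decay parameter changes, here is a small comparison that reuses the base_level_activation sketch above, scoring the same access history at the three decay rates just mentioned:

```python
import time  # base_level_activation is defined in the earlier sketch

hour, day = 3600.0, 86400.0
now = time.time()
history = [now - hour, now - 30 * day]  # one fresh access, one month-old access
for d in (0.3, 0.5, 0.7):
    print(d, round(base_level_activation(history, d=d, now=now), 2))
# Higher d discounts the month-old access more steeply; at 0.7 the score
# is driven almost entirely by the recent access.
```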
Spreading Activation: Contextual Connections
Spreading activation models how context primes retrieval. When you query about "authentication errors," you are not just looking for text that contains those words. You want information about login flows, session management, token validation, API keys, and other concepts that connect to authentication through your knowledge structure. Spreading activation captures these connections by propagating activation energy through entity associations in the knowledge graph.
Adaptive Recall automatically extracts entities from every stored memory and builds a knowledge graph of associations. When a retrieval query mentions an entity, activation flows from that entity to all memories connected to it through the graph. The activation strength decays with graph distance: direct connections (depth 1) receive full weight, and indirect connections through an intermediate entity (depth 2) receive roughly half weight. This prevents distant, tangentially related memories from polluting results while still capturing meaningful contextual associations.
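A sketch of the traversal, assuming the entity graph is an adjacency map and each entity indexes the memories that mention it (both data structures and the function are illustrative assumptions; the per-hop decay of 0.5 matches the "roughly half weight" behavior described above):

```python
from collections import deque

def spreading_boosts(query_entities, graph, memories_by_entity,
                     max_hops=1, decay_per_hop=0.5):
    """Propagate activation from query entities through the entity graph.
    Memories attached to a query entity (depth 1 in the text) get full
    weight; memories one intermediate entity away (depth 2) get half."""
    boosts = {}
    for source in query_entities:
        seen = {source}
        frontier = deque([(source, 0)])
        while frontier:
            entity, hops = frontier.popleft()
            weight = decay_per_hop ** hops
            for mem_id in memories_by_entity.get(entity, ()):
                boosts[mem_id] = max(boosts.get(mem_id, 0.0), weight)
            if hops < max_hops:
                for neighbor in graph.get(entity, ()):
                    if neighbor not in seen:
                        seen.add(neighbor)
                        frontier.append((neighbor, hops + 1))
    return boosts
```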
Spreading activation is what allows cognitive scoring to find results that pure vector search misses. A query about "users getting 403 errors on the dashboard" might have low text similarity to a memory about "we updated the RBAC policy for admin endpoints last Tuesday," but spreading activation through shared entities like permissions, dashboard, and access control connects them. The developer finds the relevant change without needing to guess the exact phrasing that was used when the memory was stored.
Confidence Weighting: Reliability Scoring
Confidence weighting tracks how reliable a memory is based on corroboration and contradiction evidence. Every memory starts with a default confidence score (5.0 on a scale of 0 to 10). When the consolidation process finds multiple independent memories that support the same claim, confidence increases. When it finds memories that directly contradict the claim, confidence decreases. Over time, well-established facts accumulate high confidence while unverified observations and one-off remarks remain at lower confidence.
This dimension matters most for systems that ingest information from multiple sources or over long periods. A memory that says "our API rate limit is 100 requests per minute" might have been true when it was stored, but a later memory might say "we raised the rate limit to 500 RPM." The contradiction detection lowers confidence on the older memory, effectively promoting the newer, correct information even before the old memory fully decays from disuse.
Confidence weighting also protects high-value, well-established knowledge from decay. Memories with confidence above 8.0 are treated as consolidated knowledge and receive partial protection from activation decay. This means that core facts, like your application's architecture decisions or your team's coding conventions, remain accessible even during periods when they are not actively queried, because their high confidence signals that they represent stable, verified knowledge.
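A minimal sketch of the bookkeeping, assuming the consolidation process emits corroboration and contradiction counts per memory. The step sizes and function names are illustrative assumptions; only the 0-to-10 scale, the 5.0 default, and the 8.0 consolidation threshold come from the description above:

```python
DEFAULT_CONFIDENCE = 5.0         # starting score on the 0-10 scale
CONSOLIDATION_THRESHOLD = 8.0    # above this, activation decay is partially suppressed

def update_confidence(confidence, corroborations=0, contradictions=0,
                      step_up=0.5, step_down=1.0):
    """Raise confidence on supporting evidence, lower it on contradictions.
    Step sizes here are illustrative, not Adaptive Recall's parameters."""
    confidence += step_up * corroborations - step_down * contradictions
    return max(0.0, min(10.0, confidence))

def is_consolidated(confidence):
    return confidence > CONSOLIDATION_THRESHOLD
```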
Decay: Controlled Forgetting
Decay is the mechanism that keeps retrieval focused on current, relevant information by gradually reducing the activation of unused memories. Without decay, a memory system that has been running for a year contains every observation, correction, update, and superseded fact ever stored, all competing equally for retrieval positions. With decay, unused memories fade from the top rankings, creating space for current information to surface.
The decay function follows a power law rather than exponential decay. This is important because power-law decay preserves a long tail: memories fade quickly in the first hours and days after storage, but the rate of fading slows over time. A memory untouched for three months still retains some activation, which means it can be recalled if directly queried, but it will not crowd out recent, actively used memories in general retrieval results.
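The difference is easy to see numerically. A small comparison, where the decay rate and half-life are arbitrary illustrative values:

```python
def power_law(age_hours, d=0.5):
    return max(age_hours, 1.0) ** -d

def exponential(age_hours, half_life_hours=24.0):
    return 0.5 ** (age_hours / half_life_hours)

for age in (1, 24, 24 * 90):  # 1 hour, 1 day, ~3 months
    print(f"{age:>5}h  power={power_law(age):.4f}  exp={exponential(age):.4g}")
# The power law still retains about 2% of its weight at 3 months; the
# exponential is effectively zero after a few weeks -- the cliff the
# power-law curve avoids.
```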
Controlled forgetting is a feature, not a bug. The human brain forgets the vast majority of what it perceives, and this forgetting is adaptive: it keeps recall focused on information that is statistically likely to be useful based on past access patterns. ACT-R formalizes this principle, and Adaptive Recall applies it to AI memory. The result is a system that self-curates through usage rather than requiring manual cleanup.
Reranking: Where Cognitive Scoring Lives
Cognitive scoring operates as a reranking layer, not a replacement for vector search. The retrieval pipeline has two stages: candidate retrieval (using vector similarity to find topically relevant memories) and candidate reranking (using cognitive scoring to order those candidates by overall quality). This architecture means you keep the speed benefits of vector search for the broad filtering stage while adding the intelligence of cognitive scoring for the precision ranking stage.
The reranking approach is standard in information retrieval research. Search engines have used two-stage retrieval for decades, first using fast inverted index lookups to find candidate documents, then using more expensive relevance models to rank them. The innovation in cognitive scoring is the specific factors used for reranking: instead of click-through rates, page authority, or freshness heuristics, the system uses scientifically validated equations from cognitive science.
Reranking typically operates on the top 20 to 100 candidates from vector search. This keeps the computational cost low because the expensive cognitive scoring calculations (especially graph traversal for spreading activation) run on tens of items rather than the full memory store. For most configurations, the reranking step adds 15 to 40 milliseconds to retrieval latency, which is imperceptible to users.
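A sketch of the reranking step, assuming each candidate arrives from vector search as a dict carrying its similarity score plus the precomputed cognitive metadata. The linear blend and the weights are illustrative assumptions; Adaptive Recall's exact combination is not specified here:

```python
def cognitive_rerank(candidates, weights=None):
    """Order vector-search candidates by similarity plus cognitive factors.
    Each candidate is assumed to have 'similarity', 'activation',
    'spreading_boost', and 'confidence' fields."""
    w = weights or {"similarity": 1.0, "activation": 1.0,
                    "spreading": 0.5, "confidence": 0.3}
    def score(c):
        return (w["similarity"] * c["similarity"]
                + w["activation"] * c["activation"]
                + w["spreading"] * c["spreading_boost"]
                + w["confidence"] * c["confidence"] / 10.0)  # normalize 0-10 scale
    return sorted(candidates, key=score, reverse=True)
```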
Two-Stage Retrieval Architecture
A two-stage retrieval system separates the fast filtering stage from the precision ranking stage. The first stage uses an efficient model (typically a bi-encoder that produces vector embeddings) to find an initial candidate set. The second stage uses a more expensive model (a cross-encoder, an LLM, or in Adaptive Recall's case, cognitive scoring equations) to rerank those candidates for final presentation.
The reason for two stages is the fundamental trade-off between speed and accuracy in retrieval. Bi-encoders can compare a query against millions of documents in milliseconds because the document embeddings are precomputed and stored in an index. But bi-encoders are limited in accuracy because the query and document are encoded independently without any cross-attention. Cross-encoders are more accurate because they process the query and document together, allowing richer interaction between them, but they are orders of magnitude slower because they cannot precompute document representations.
Cognitive scoring adds a third dimension to this architecture by incorporating temporal, relational, and reliability factors that neither bi-encoders nor cross-encoders capture. A cross-encoder can tell you which document is most relevant to the query in terms of semantic content, but it cannot tell you which document is most current, most frequently validated, or most connected to the current context through entity associations. Cognitive scoring fills this gap, and it runs efficiently because it operates on precomputed metadata (access timestamps, entity connections, confidence values) rather than requiring expensive model inference.
The full pipeline in Adaptive Recall works as follows: vector similarity narrows the field to the top candidates, cognitive scoring reranks those candidates based on activation, spreading activation, confidence, and decay, and the final ranked list is returned to the application. Each stage adds value that the others cannot provide, and the total latency remains under 100 milliseconds for typical configurations.
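Put together, the two stages are only a few lines of orchestration. A sketch assuming a store object that exposes a vector_search method and the cognitive_rerank function from the previous section (both interfaces are illustrative):

```python
def retrieve(query, store, top_k=5, candidate_pool=50):
    candidates = store.vector_search(query, limit=candidate_pool)  # stage 1: broad filter
    ranked = cognitive_rerank(candidates)                          # stage 2: precision rank
    return ranked[:top_k]
```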
Encoder Models and Scoring Approaches
Understanding the landscape of scoring approaches helps contextualize where cognitive scoring fits. Bi-encoders (like the models from OpenAI, Voyage, or Cohere that produce embedding vectors) are the workhorses of modern retrieval. They encode queries and documents independently into fixed-length vectors, and retrieval is a nearest-neighbor search in the vector space. They are fast, scalable, and effective for semantic matching, but they miss nuances that require cross-attention between query and document.
Cross-encoders process a query-document pair together through a transformer, producing a relevance score that captures fine-grained interactions between the query and document tokens. Models trained on the MS MARCO dataset, along with BGE-reranker and Cohere Rerank, achieve higher accuracy than bi-encoders on standard benchmarks. The cost is speed: you cannot precompute cross-encoder scores, so every query-document pair requires a model inference pass. This limits cross-encoders to reranking small candidate sets (typically 10 to 100 items).
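For comparison, here is what cross-encoder reranking looks like with the open-source sentence-transformers library and a MiniLM model trained on MS MARCO. This illustrates the general approach; it is not part of Adaptive Recall:

```python
from sentence_transformers import CrossEncoder

model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "how do I configure two-factor authentication?"
docs = [
    "Enable two-factor authentication under Settings > Security.",
    "Reset your password from the login page.",
]
scores = model.predict([(query, doc) for doc in docs])  # one inference per pair
ranked = [doc for _, doc in sorted(zip(scores, docs), reverse=True)]
```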
ColBERT and similar late-interaction models represent a middle ground. They precompute per-token representations for documents but defer the query-document interaction to retrieval time, achieving accuracy close to cross-encoders with speed closer to bi-encoders. These models are gaining popularity for applications that need better accuracy than bi-encoders without the latency of cross-encoders.
LLM-as-a-judge is another reranking approach where a large language model evaluates each candidate's relevance to the query. This can produce high-quality rankings because LLMs understand context, nuance, and implicit relevance that embedding models miss. The trade-off is significant latency (hundreds of milliseconds to seconds per evaluation) and cost (each evaluation consumes LLM tokens). This approach works for low-volume, high-stakes applications but is impractical for high-throughput systems.
Cognitive scoring is distinct from all of these because it does not rely on model inference at reranking time. The scoring factors (activation, graph connections, confidence) are precomputed and stored as metadata alongside each memory. Reranking is a mathematical combination of these precomputed values with the vector similarity score, which makes it fast (under 40 milliseconds) and cheap (no API calls, no GPU inference). It captures dimensions that none of the model-based approaches address: temporal dynamics, usage patterns, and reliability evidence.
Lessons from Human Cognition
The scientific foundation of cognitive scoring comes from decades of research on how human memory retrieval works. Several key findings from cognitive psychology directly inform the scoring model.
The recency effect is one of the most robust findings in memory research. When people are asked to recall items from a list, they reliably recall the most recent items best. This is not just a laboratory artifact; it reflects a deep principle of memory: recently accessed information is statistically more likely to be relevant to current needs because the world has temporal structure. Things that happened recently are more likely to still be true and actionable than things that happened long ago. Base-level activation implements this principle with mathematical precision.
The spacing effect shows that information reviewed at increasing intervals is retained better than information crammed in a single session. This maps to the observation that memories accessed regularly across different contexts build stronger activation than memories accessed many times in a single burst. A coding pattern used once a week for three months is more reliably accessible than one used twenty times in a single afternoon and never again. Base-level activation naturally captures this pattern because each access event contributes independently to the activation sum.
Semantic priming experiments show that encountering a word speeds up recognition of related words. Seeing "doctor" makes "nurse" faster to recognize. This is spreading activation in action: activating one concept propagates energy to connected concepts through associative links. In AI retrieval, this means that the entities mentioned in a query should boost the retrieval of memories connected to those entities through the knowledge graph, even when the text similarity is low.
Confidence calibration research shows that humans track the reliability of their memories, and this tracking is generally accurate. Facts that you have encountered multiple times from multiple sources feel more certain than facts you heard once from an uncertain source. Adaptive Recall's confidence weighting implements this through corroboration counting: memories supported by multiple independent sources accumulate higher confidence, and this confidence affects their ranking in retrieval results.
Cognitive Scoring in Production
Running cognitive scoring in a production environment raises practical questions about latency, storage overhead, and maintenance. The answers depend on the specific configuration, but the general picture is that cognitive scoring adds minimal overhead to a well-architected retrieval system.
Latency impact is the primary concern. In Adaptive Recall's implementation, cognitive scoring adds 15 to 40 milliseconds to retrieval latency, depending on the size of the entity graph and the number of candidates being reranked. The base-level activation calculation is a simple mathematical operation on precomputed values (under 1 millisecond). Graph traversal for spreading activation is the most expensive component (5 to 20 milliseconds). Confidence weighting is a lookup and multiplication (under 1 millisecond). For applications where total retrieval latency must stay under 50 milliseconds, you can disable spreading activation and use only base-level activation and confidence weighting, reducing the overhead to under 5 milliseconds.
Storage overhead is modest. Each memory needs an array of access timestamps (typically capped at the most recent 100 accesses), a list of entity connections (usually 3 to 10 per memory), a confidence score (single float), and a corroboration count (single integer). For a store of 100,000 memories, this adds roughly 50 to 100 MB of metadata beyond the content and embeddings you are already storing.
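Concretely, the per-memory metadata can be as small as the following sketch. The caps and typical counts come from the figures above; the field names and exact schema are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class MemoryMetadata:
    access_times: list[float] = field(default_factory=list)  # capped at 100 most recent
    entities: list[str] = field(default_factory=list)        # typically 3-10 per memory
    confidence: float = 5.0                                   # 0-10 scale, 5.0 default
    corroborations: int = 0                                   # supporting-evidence count
```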
Maintenance consists of two background processes. The decay process periodically recomputes base-level activation values for all memories, which can run during off-peak hours and takes seconds to minutes depending on store size. The consolidation process reviews recent memories for contradictions, corroboration, and merge opportunities, which runs as a scheduled job and typically processes a few hundred memories per run. Neither process affects retrieval latency because they update precomputed values that are read (not computed) during retrieval.
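A sketch of the off-peak decay job, assuming memory objects carry the access history sketched above and a stored activation field that retrieval reads directly (the job structure is illustrative, not the actual implementation):

```python
import math
import time

def refresh_activations(memories, d=0.5):
    """Batch job: recompute stored activation from access history so that
    retrieval reads a precomputed value instead of computing one per query."""
    now = time.time()
    for mem in memories:
        ages = [max(now - t, 1.0) for t in mem.access_times]
        mem.activation = math.log(sum(a ** -d for a in ages)) if ages else float("-inf")
```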
Implementation Guides
- Reranking and Scoring
Core Concepts
- Scoring Models and Approaches
- Factors and Trade-offs
Common Questions
Add cognitive scoring to your retrieval system without building it yourself. Adaptive Recall runs the full scoring pipeline on every retrieval call.
Get Started Free