
Can You Do Cognitive Scoring Without an LLM Call?

Yes. Cognitive scoring runs entirely on precomputed metadata (access timestamps, entity connections, and confidence values) using mathematical equations rather than model inference. No LLM call, no embedding API call, and no GPU is needed at scoring time. The only computation is arithmetic on stored values, which takes under 40 milliseconds for a typical candidate set. This makes cognitive scoring dramatically cheaper and faster than model-based reranking approaches.

What Runs at Query Time

When a retrieval query arrives, the cognitive scoring pipeline performs four calculations for each candidate in the reranking set, none of which involve a model call (a code sketch at the end of this section makes the arithmetic concrete):

Base-level activation: Read the candidate's access timestamps array, compute the time difference from the current time for each entry, apply the power-law decay equation, sum the results, and take the logarithm. This is pure arithmetic on stored timestamps. Total time: under 0.1 milliseconds per candidate.

Spreading activation: Extract entities from the query (if not already cached), look up those entities in the prebuilt entity graph, find which candidates share entities with the query, and compute a weighted overlap score. The entity graph is prebuilt and stored in memory or a fast key-value store. Total time: 5 to 20 milliseconds for the full candidate set, depending on graph size.

Confidence weighting: Read the candidate's confidence score (a single float value, precomputed during consolidation) and normalize it to a multiplier range. Total time: negligible.

Score combination: Multiply and add the four scoring components (vector similarity, base-level activation, spreading activation, confidence) with their respective weights. Total time: negligible.

The total scoring time is dominated by the graph lookup in spreading activation. Everything else is basic math on values already stored alongside each memory. There is no API call, no network request to an inference service, and no GPU computation.
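
To make the arithmetic concrete, here is a minimal Python sketch of all four calculations. It is an illustration under stated assumptions, not Adaptive Recall's actual API: the Memory fields, the weights, and the decay exponent d = 0.5 are placeholders, and base-level activation follows the standard ACT-R form B = ln(Σ_j (t_now − t_j)^(−d)).

```python
import math
import time
from dataclasses import dataclass, field

@dataclass
class Memory:
    vector_similarity: float      # from the vector search stage
    access_timestamps: list       # Unix seconds of past accesses
    entities: set = field(default_factory=set)  # prebuilt graph connections
    confidence: float = 1.0       # precomputed during consolidation

def base_level_activation(mem, now, d=0.5):
    # ACT-R power-law decay: B = ln(sum_j (now - t_j) ** -d).
    # Pure arithmetic on stored timestamps; no model call.
    total = sum((now - t) ** -d for t in mem.access_timestamps if t < now)
    return math.log(total) if total > 0 else 0.0

def spreading_activation(mem, query_entities):
    # Weighted overlap between query entities and this memory's entities;
    # the entity graph lookup behind query_entities is a plain KV read.
    if not query_entities:
        return 0.0
    return len(mem.entities & query_entities) / len(query_entities)

def confidence_multiplier(mem, low=0.5, high=1.0):
    # Map the stored confidence float onto a multiplier range.
    return low + (high - low) * mem.confidence

def score(mem, query_entities, now, w_sim=0.5, w_act=0.3, w_spread=0.2):
    # Weighted combination of the components, scaled by confidence.
    # (A real system would normalize activation into a comparable range;
    # that step is elided here.)
    raw = (w_sim * mem.vector_similarity
           + w_act * base_level_activation(mem, now)
           + w_spread * spreading_activation(mem, query_entities))
    return raw * confidence_multiplier(mem)

# Rank a candidate set returned by vector search.
now = time.time()
candidates = [
    Memory(0.82, [now - 3600, now - 86400], {"billing", "invoice"}, 0.9),
    Memory(0.85, [now - 30 * 86400], {"billing"}, 0.4),
]
ranked = sorted(candidates, key=lambda m: score(m, {"billing"}, now),
                reverse=True)
```

Everything here is set intersections and floating-point math, which is why the whole pass stays in the tens of milliseconds.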

Where LLM Calls Do Happen

While scoring itself does not require LLM calls, some related processes do. Entity extraction uses an LLM at write time, not at query time: when a new memory is stored, the model extracts entities from the content to build graph connections. Consolidation (the background process that detects contradictions and updates confidence) may use an LLM to compare memory contents. These are write-path and maintenance operations, not query-path operations, so they do not affect retrieval latency or per-query cost.
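
As a rough illustration of that split, the write path might look like the sketch below; llm, store, and entity_graph are hypothetical stand-ins, not a real API.

```python
# Hypothetical write-path sketch: the only LLM call happens when a memory
# is stored, so per-query retrieval never waits on it.
def store_memory(content, llm, store, entity_graph):
    entities = llm.extract_entities(content)  # LLM call, write time only
    memory_id = store.save(content=content, entities=entities)
    for entity in entities:
        # Prebuild the graph that spreading activation reads at query time.
        entity_graph.setdefault(entity, set()).add(memory_id)
    return memory_id
```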

The query embedding step (converting the query text into a vector for similarity search) does require an embedding model call, but this is part of the vector search stage, not the cognitive scoring stage. Cognitive scoring operates after vector search has already returned candidates.
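
The two-stage split might look like the following sketch, assuming hypothetical embed() and vector_index.search() helpers and reusing score() from the earlier sketch. Only stage 1 touches a model (the embedder); stage 2 is arithmetic only.

```python
import time

def retrieve(query, query_entities, vector_index, k=50, top_n=10):
    query_vector = embed(query)                        # embedding model call
    candidates = vector_index.search(query_vector, k)  # stage 1: vector search
    now = time.time()
    ranked = sorted(candidates,                        # stage 2: cognitive scoring
                    key=lambda m: score(m, query_entities, now),
                    reverse=True)
    return ranked[:top_n]
```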

Cost Comparison

| Reranking Method | Per-Query Cost | Latency | Infrastructure |
|---|---|---|---|
| Cognitive scoring | $0 (no API calls) | 15-40 ms | CPU only |
| Cross-encoder (local) | $0 (but GPU needed) | 50-200 ms | GPU required |
| Cross-encoder (hosted) | $0.001-0.003 | 100-200 ms | API call |
| LLM-as-a-judge | $0.005-0.02 | 500-2000 ms | API call |

At 10,000 queries per day, cross-encoder API costs run $10 to $30 per day, and LLM-as-a-judge costs $50 to $200 per day. Cognitive scoring costs nothing per query because all the computation is local arithmetic on precomputed values. The only cost is the storage for metadata (access timestamps, entity connections, confidence values), which is modest: roughly 1 KB per memory.
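
The daily figures are a single multiplication per method; this snippet reproduces them.

```python
# Worked version of the daily-cost arithmetic above.
queries_per_day = 10_000
per_query_cost = {
    "cross-encoder (hosted)": (0.001, 0.003),
    "LLM-as-a-judge": (0.005, 0.02),
}
for method, (low, high) in per_query_cost.items():
    print(f"{method}: ${low * queries_per_day:.0f}-"
          f"{high * queries_per_day:.0f} per day")
# cross-encoder (hosted): $10-30 per day
# LLM-as-a-judge: $50-200 per day
```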

When You Might Still Want an LLM

Cognitive scoring excels at the dimensions it measures: recency, usage patterns, entity relationships, and reliability. It does not address semantic precision, which is where model-based reranking shines. If your primary retrieval problem is that vector similarity returns documents about the right topic but not the right specific answer, a cross-encoder provides value that cognitive scoring cannot replace.

The practical question is whether your retrieval errors are primarily semantic (wrong content ranked first) or contextual (right content ranked below stale or unreliable content). For dynamic memory stores, contextual errors dominate, and cognitive scoring addresses them efficiently. For static knowledge bases, semantic errors dominate, and model-based reranking is more useful. Many production systems benefit from both.
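
One common arrangement, sketched under the same illustrative assumptions as above (score() from the earlier sketch, a hypothetical cross_encoder client): let cognitive scoring cut the candidate set down cheaply, then spend the model call only on the short list.

```python
# Hybrid sketch: cheap arithmetic first, expensive model second.
def hybrid_rerank(query, candidates, query_entities, now,
                  cross_encoder, shortlist=10):
    cheap = sorted(candidates,
                   key=lambda m: score(m, query_entities, now),
                   reverse=True)[:shortlist]
    # Cross-encoder rescoring (hypothetical API) runs on 10 docs, not 50+.
    return cross_encoder.rescore(query, cheap)
```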

Multi-factor reranking with zero per-query API cost. Adaptive Recall runs cognitive scoring on precomputed metadata, keeping retrieval fast and cheap.
