How to Build a Confidence Scoring Pipeline
Why Confidence Matters for Retrieval
Without confidence scoring, every memory has equal authority. A user mentioning "I think we use PostgreSQL 14" in passing gets the same retrieval weight as a deployment runbook that documents PostgreSQL 15 with connection details, migration history, and version-specific configuration. When both memories match a query about database configuration, the system has no way to prefer the authoritative source over the casual remark.
Confidence scoring solves this by tracking evidence. The deployment runbook gets corroborated every time someone references it, accesses the database details, or stores a new memory that aligns with its claims. The casual remark either gets corroborated (someone else confirms PostgreSQL 14) or contradicted (the runbook says 15). Over time, the confidence scores diverge, and retrieval naturally surfaces the well-established answer.
Step-by-Step Implementation
Use a 0-to-10 scale with a default starting value of 5.0 for newly stored memories. Define threshold values that trigger specific behaviors: memories above 8.0 are protected from decay (they represent well-established knowledge), memories below 2.0 are candidates for archival or deletion (they have been contradicted or never corroborated), and memories at the default 5.0 are unverified observations that have not yet accumulated evidence in either direction.
CONFIDENCE_DEFAULT = 5.0
CONFIDENCE_MIN = 0.0
CONFIDENCE_MAX = 10.0
CONFIDENCE_PROTECTED = 8.0 # resist decay above this
CONFIDENCE_ARCHIVE = 2.0 # candidate for removal below this
CORROBORATION_BOOST = 0.5 # per corroborating source
CONTRADICTION_PENALTY = 1.5 # per contradicting source
MIN_CORROBORATIONS = 3 # required for high confidence

When a new memory is stored, compare it against existing memories that share entities or topic overlap. Use semantic similarity to find memories making similar claims. If the new memory supports an existing claim (similarity above a threshold and no contradictory signals), increment the corroboration count on the existing memory and boost its confidence score.
def detect_corroboration(new_memory, existing_memories, threshold=0.85):
    corroborated = []
    new_emb = new_memory['embedding']
    new_entities = set(new_memory['entities'])
    for mem in existing_memories:
        # must share at least one entity
        shared = new_entities.intersection(set(mem['entities']))
        if not shared:
            continue
        sim = cosine_similarity(new_emb, mem['embedding'])
        if sim >= threshold:
            corroborated.append(mem['id'])
    return corroborated

The entity overlap check prevents false corroboration between unrelated memories that happen to use similar vocabulary. Two memories must be about the same entities and make similar claims to count as corroborating each other.
Contradictions are harder to detect than corroboration because they require understanding that two statements conflict rather than simply differing. A practical approach uses entity overlap plus semantic analysis: if two memories share entities but make claims that an LLM judges as contradictory, flag them. For systems that cannot afford LLM calls on every store operation, use heuristics like detecting negation words, different numerical values for the same metric, or different version numbers for the same software.
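For that cheap-heuristic path, here is a minimal sketch that flags mismatched numbers or version strings and asymmetric negation between two memory texts. The function name, the regex, and the assumption that memories carry raw text are illustrative, not a prescribed implementation; the entity-plus-similarity detector that follows catches conflicts these string checks miss.

```python
import re

def heuristic_conflict(text_a, text_b):
    # Extract version-like numeric tokens (e.g. "14", "15.2"); if both
    # texts mention numbers but share none, they may state different
    # values for the same fact.
    nums_a = set(re.findall(r'\b\d+(?:\.\d+)*\b', text_a))
    nums_b = set(re.findall(r'\b\d+(?:\.\d+)*\b', text_b))
    if nums_a and nums_b and nums_a.isdisjoint(nums_b):
        return True
    # Negation asymmetry: one text negates a claim, the other does not.
    negations = {'not', 'no', 'never', "don't", "doesn't", 'without'}
    neg_a = any(w in text_a.lower().split() for w in negations)
    neg_b = any(w in text_b.lower().split() for w in negations)
    return neg_a != neg_b
```

These checks produce false positives (two unrelated numbers trip the version test), so treat a heuristic hit as a flag for review rather than an automatic penalty.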
def detect_contradictions(new_memory, existing_memories,
                          entity_overlap_min=2):
    candidates = []
    new_entities = set(new_memory['entities'])
    for mem in existing_memories:
        shared = new_entities.intersection(set(mem['entities']))
        if len(shared) < entity_overlap_min:
            continue
        # high entity overlap but moderate text similarity
        # suggests same topic, different claims
        sim = cosine_similarity(new_memory['embedding'], mem['embedding'])
        if 0.4 <= sim <= 0.75:
            candidates.append({
                'memory_id': mem['id'],
                'shared_entities': list(shared),
                'similarity': sim
            })
    return candidates

Apply confidence adjustments using bounded arithmetic. Each corroborating source adds a fixed boost (typically 0.5 points), and each contradiction applies a penalty (typically 1.5 points, larger than the boost because false information is more damaging than missing information). Clamp the result to the 0-10 range.
def update_confidence(memory, corroborations=0, contradictions=0):
    current = memory.get('confidence', CONFIDENCE_DEFAULT)
    adjustment = (corroborations * CORROBORATION_BOOST -
                  contradictions * CONTRADICTION_PENALTY)
    new_confidence = current + adjustment
    new_confidence = max(CONFIDENCE_MIN, min(CONFIDENCE_MAX, new_confidence))
    memory['confidence'] = new_confidence
    memory['corroboration_count'] = memory.get('corroboration_count', 0) + corroborations
    memory['contradiction_count'] = memory.get('contradiction_count', 0) + contradictions
    return new_confidence

Do not allow a memory to reach protected status (above 8.0 confidence) until it has been corroborated by at least three independent sources. This evidence gate prevents a single repeated observation from being treated as established fact. The corroboration count must come from distinct memory storage events, not from the same source restating the same claim.
def apply_evidence_gate(memory):
    if memory['confidence'] > CONFIDENCE_PROTECTED:
        if memory.get('corroboration_count', 0) < MIN_CORROBORATIONS:
            memory['confidence'] = CONFIDENCE_PROTECTED
    return memory['confidence']

Use the confidence score as a multiplier on the combined retrieval score. Normalize confidence to a range that does not completely suppress low-confidence memories (they might still be the only relevant result) but meaningfully favors high-confidence ones. A linear mapping from confidence 0-10 to a multiplier of 0.5-1.0 works well: even a zero-confidence memory retains half its retrieval score, while a fully corroborated memory gets the full score.
Running Confidence Updates
Confidence scoring can run synchronously (on every store operation) or asynchronously (in periodic consolidation batches). Synchronous updates detect corroboration and contradictions immediately but add latency to store operations. Asynchronous updates batch the analysis into periodic runs, which keeps store operations fast but delays confidence adjustments.
Adaptive Recall uses a hybrid approach. Basic corroboration detection (entity overlap plus similarity threshold) runs synchronously on store, adding minimal latency. Deep contradiction analysis and cross-reference validation run asynchronously through the reflect tool, which performs comprehensive consolidation on a configurable schedule.
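The hybrid pattern can be sketched as a thin scheduler: a cheap synchronous check on store, with expensive analysis deferred to a queue that a periodic batch job drains. The queue, function names, and injected callbacks here are assumptions for illustration, not Adaptive Recall's actual API.

```python
from collections import deque

pending_deep_analysis = deque()  # memory ids awaiting the batch pass

def on_store(memory, store, detect_corroboration, update_confidence):
    # Synchronous path: cheap corroboration check on every store.
    hits = detect_corroboration(memory, list(store.values()))
    for mem_id in hits:
        update_confidence(store[mem_id], corroborations=1)
    # Defer expensive contradiction analysis to the batch pass.
    pending_deep_analysis.append(memory['id'])
    store[memory['id']] = memory

def consolidation_pass(store, deep_analyze):
    # Asynchronous path: drain the queue on a schedule.
    while pending_deep_analysis:
        mem_id = pending_deep_analysis.popleft()
        if mem_id in store:
            deep_analyze(store[mem_id], store)
```

The design keeps store latency bounded by the cost of one similarity scan, while contradiction detection and cross-reference validation can take as long as they need off the hot path.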
Adaptive Recall runs evidence-gated confidence scoring automatically. Every memory is corroborated, validated, and scored without manual intervention.