
How to Build Diff-Based Memory Updates

Diff-based updates modify existing memories in place when information changes, instead of storing a new memory alongside the old one. This prevents the accumulation of near-duplicate entries that differ only in small details, preserves the activation history and confidence of the original memory, and keeps the memory store lean. The update tool in Adaptive Recall handles this automatically, but understanding the mechanics helps you build similar functionality into custom systems.

Before You Start

Diff-based updates work best when your system can reliably identify that incoming information is an update to an existing memory rather than genuinely new knowledge. This requires entity overlap detection and semantic similarity comparison between the incoming content and your existing memories. If you do not have entity extraction in place, start there, because entity matching is the most reliable signal for identifying update candidates.

You also need versioning capability in your storage backend, because updates should be reversible. If an update applies incorrect information to a well-established memory, you need the ability to roll back to the previous version.

Step-by-Step Implementation

Step 1: Detect when an update is needed.
When new information arrives, check whether it modifies something already in the memory store. Compute entity overlap and semantic similarity between the incoming content and existing memories. If the incoming content shares three or more entities with an existing memory and has cosine similarity above 0.80, it is likely an update rather than new knowledge. If the content is semantically similar but makes a different factual claim (detected through contradiction analysis), it is definitely an update that supersedes the existing version.
def find_update_target(new_content, new_entities, memory_store,
                       entity_threshold=3, sim_threshold=0.80):
    new_embedding = generate_embedding(new_content)
    # Pull the nearest candidates, then confirm each with entity
    # overlap plus cosine similarity before treating it as an update target.
    candidates = memory_store.search(new_embedding, limit=10)
    for candidate in candidates:
        shared = set(new_entities).intersection(candidate['entities'])
        sim = cosine_similarity(new_embedding, candidate['embedding'])
        if len(shared) >= entity_threshold and sim >= sim_threshold:
            return candidate
    return None
Step 2: Compute the diff.
Compare the new information against the existing memory to understand what changed. For structured memories with clear factual claims, extract the claims from both versions and identify additions, removals, and modifications. For unstructured text memories, use an LLM to summarize the differences. The diff should capture what specific information changed so the update can be applied precisely rather than simply overwriting the entire memory.
def compute_diff(existing_content, new_content):
    prompt = f"""Compare these two pieces of information and identify what changed.
Return a structured diff.

Existing: {existing_content}
New: {new_content}

Return JSON with: added (new facts), removed (old facts no longer true),
modified (facts that changed), unchanged (facts that remain the same)."""
    response = llm_call(prompt)
    return parse_json(response)
Step 3: Apply the update.
Modify the existing memory content based on the computed diff. Replace modified facts with their new versions, add new facts, and remove outdated facts. After modifying the content, regenerate the vector embedding because the text has changed. Re-extract entities in case the update introduces new entity references or removes old ones. Critically, preserve the memory's access history, because the updated memory should retain its established activation rather than starting from zero.
def apply_update(memory, diff, memory_store):
    # Store the previous version for rollback.
    memory['previous_versions'] = memory.get('previous_versions', [])
    memory['previous_versions'].append({
        'content': memory['content'],
        'updated_at': memory.get('updated_at', memory['created_at']),
        'entities': memory['entities'][:],
    })

    # Apply the diff to the content, then refresh the derived fields.
    updated_content = merge_content(memory['content'], diff)
    memory['content'] = updated_content
    memory['embedding'] = generate_embedding(updated_content)
    memory['entities'] = extract_entities(updated_content)
    memory['updated_at'] = time.time()

    memory_store.put(memory)
    return memory
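The merge_content helper used above is not defined in this guide. A minimal sketch is shown below, under two assumptions that are not part of the original code: memory content is stored as newline-separated facts, and each entry in the diff's modified list is a dict with 'old' and 'new' keys.

```python
def merge_content(existing_content, diff):
    """Rebuild memory text from a structured diff (sketch).

    Assumes content is newline-separated facts and that diff['modified']
    holds {'old': ..., 'new': ...} pairs -- both assumptions.
    """
    facts = [f for f in existing_content.split('\n') if f.strip()]
    # Drop facts that are no longer true.
    facts = [f for f in facts if f not in diff.get('removed', [])]
    # Replace modified facts with their new versions.
    replacements = {m['old']: m['new'] for m in diff.get('modified', [])}
    facts = [replacements.get(f, f) for f in facts]
    # Append newly added facts.
    facts.extend(diff.get('added', []))
    return '\n'.join(facts)
```

For memories that are free-form prose rather than fact lists, a more robust approach is to have an LLM re-render the merged text from the diff.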
Step 4: Update metadata.
Adjust the memory's metadata to reflect the update. Bump the modification timestamp to the current time. If the update corroborates existing information (adds detail without contradicting), increase the confidence score slightly. If the update contradicts and replaces information, keep the confidence stable or reduce it slightly, because the correction indicates the previous version was wrong. Add the update event to the access history so the memory gains activation from being modified, which is appropriate because an update demonstrates ongoing relevance.
def update_metadata(memory, diff):
    memory['updated_at'] = time.time()
    memory['access_times'].append(time.time())
    if diff.get('modified') or diff.get('removed'):
        # Contradiction correction: hold confidence steady.
        pass
    elif diff.get('added') and not diff.get('removed'):
        # Pure corroboration: boost confidence.
        memory['confidence'] = min(10.0, memory['confidence'] + 0.3)
        memory['corroboration_count'] = memory.get(
            'corroboration_count', 1) + 1
    return memory
Step 5: Handle rollbacks.
Store the previous version of the memory content before applying each update. If a rollback is needed, restore the previous content, regenerate the embedding, re-extract entities, and adjust the modification timestamp. Limit the number of stored versions to prevent unbounded growth; keeping the last three to five versions is sufficient for most applications. Adaptive Recall handles versioning internally through its update tool, maintaining a rollback history for every modified memory.
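The rollback flow can be sketched as follows. The embedding function is injected as embed_fn so the example stays self-contained, entities are restored from the version snapshot written in Step 3, and the five-version cap is an assumption, not a fixed rule:

```python
import time

def rollback(memory, memory_store, embed_fn, max_versions=5):
    """Restore the most recent previous version of a memory (sketch).

    embed_fn is the embedding function, injected for self-containment;
    max_versions caps retained history (an assumption; three to five
    versions is typical).
    """
    versions = memory.get('previous_versions', [])
    if not versions:
        raise ValueError('no previous version to roll back to')
    previous = versions.pop()
    # Restore content and entities from the snapshot, then refresh the
    # derived embedding because the text has changed back.
    memory['content'] = previous['content']
    memory['entities'] = previous['entities']
    memory['embedding'] = embed_fn(memory['content'])
    memory['updated_at'] = time.time()
    # Cap retained versions to prevent unbounded growth.
    memory['previous_versions'] = versions[-max_versions:]
    memory_store.put(memory)
    return memory
```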

When to Update vs When to Create

Not every piece of related information should trigger an update. The rule is: if the new information modifies or supersedes a specific fact in an existing memory, update that memory. If the new information adds a genuinely new perspective, use case, or piece of knowledge that happens to be related to an existing memory, create a new memory and let the entity graph connect them.

For example, if a memory says "the deployment pipeline uses GitHub Actions" and new information says "the deployment pipeline was migrated to GitLab CI," that is an update because it supersedes a specific fact. But if the new information says "we added a staging environment to the deployment pipeline," that is new knowledge that should be a separate memory, because the original fact about the pipeline tool is still true and the staging environment is additional information.
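This decision rule can be sketched as a small router. Both helpers are injected callables: find_target stands in for Step 1's find_update_target, and contradicts is a hypothetical contradiction-analysis check (for example, an LLM or NLI call), neither of which is defined here:

```python
def decide_action(new_content, new_entities, memory_store,
                  find_target, contradicts):
    """Route incoming information: update an existing memory or create
    a new one. find_target behaves like Step 1's find_update_target;
    contradicts(new, old) is a hypothetical contradiction check."""
    target = find_target(new_content, new_entities, memory_store)
    if target is None:
        # Nothing sufficiently similar exists: genuinely new knowledge.
        return ('create', None)
    if contradicts(new_content, target['content']):
        # Supersedes a specific fact in the existing memory: update in place.
        return ('update', target)
    # Related but purely additive: create a new memory and let the
    # entity graph connect it to the existing one.
    return ('create', target)
```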

Preventing Update Storms

In high-volume systems, multiple updates to the same memory can arrive in rapid succession. If each update triggers an embedding regeneration and entity re-extraction, the compute costs add up quickly. Batch updates by debouncing: when an update arrives for a memory that was updated within the last few minutes, queue the update rather than applying it immediately. After the debounce period, apply all queued updates as a single operation. This reduces compute costs while ensuring that all updates are eventually applied.
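A debounce queue along these lines can implement the batching described above. The five-minute window and the in-memory dict layout are assumptions; a production system would likely persist the queue and run flush_due on a timer:

```python
import time
from collections import defaultdict

DEBOUNCE_SECONDS = 300  # five-minute window (assumption; tune per workload)

class UpdateDebouncer:
    """Queue rapid-fire updates to one memory and flush them as a batch."""

    def __init__(self, debounce_seconds=DEBOUNCE_SECONDS, clock=time.time):
        self.debounce_seconds = debounce_seconds
        self.clock = clock  # injectable for testing
        self.queues = defaultdict(list)  # memory_id -> pending diffs
        self.last_applied = {}           # memory_id -> timestamp

    def submit(self, memory_id, diff):
        """Return a batch to apply now, or [] if the diff was queued."""
        now = self.clock()
        last = self.last_applied.get(memory_id, float('-inf'))
        if now - last >= self.debounce_seconds and not self.queues[memory_id]:
            # Memory is cold: apply this update immediately.
            self.last_applied[memory_id] = now
            return [diff]
        # Recently updated: queue the diff for a later combined apply.
        self.queues[memory_id].append(diff)
        return []

    def flush_due(self):
        """Return queued batches whose debounce window has elapsed."""
        now = self.clock()
        due = {}
        for memory_id, diffs in list(self.queues.items()):
            last = self.last_applied.get(memory_id, float('-inf'))
            if diffs and now - last >= self.debounce_seconds:
                due[memory_id] = self.queues.pop(memory_id)
                self.last_applied[memory_id] = now
        return due
```

The caller applies each returned batch as a single operation, so the embedding regeneration and entity re-extraction run once per batch instead of once per update.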

Diff-based updates, versioning, and rollback are built into the update tool. Modify memories without losing history.

Try It Free