How to Keep a Knowledge Graph Updated Over Time
Why Graphs Go Stale
Knowledge graphs go stale because the reality they model changes faster than the graph is updated. A team migrates from MySQL to PostgreSQL, but the graph still says "order service uses MySQL." A developer leaves the company, but the graph still shows them as the maintainer of three services. An API endpoint is deprecated, but the graph still connects applications to it. Each stale triple produces retrieval results that are technically connected to the query but factually wrong.
The rate of staleness depends on your domain. Infrastructure graphs change weekly as services are deployed, scaled, and reconfigured. Personnel graphs change monthly as people join, leave, and switch teams. Conceptual graphs (technology comparisons, architectural patterns) change slowly but can shift dramatically when a new version launches or a technology is deprecated. Understanding the change velocity of your domain tells you how often to update.
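One way to make that cadence explicit is a small schedule keyed by domain. This is only a sketch: the domain categories, the intervals, and the names REFRESH_INTERVALS_DAYS and refresh_due are illustrative assumptions, not recommendations.

from datetime import datetime

# Illustrative refresh intervals keyed by domain; tune to your own change velocity.
REFRESH_INTERVALS_DAYS = {
    "infrastructure": 7,   # deployments and reconfigurations land weekly
    "personnel": 30,       # joins, departures, and team moves land monthly
    "conceptual": 90,      # comparisons and patterns shift slowly
}

def refresh_due(domain, last_refreshed):
    # True when a domain's portion of the graph is due for re-validation
    interval = REFRESH_INTERVALS_DAYS.get(domain, 30)
    return (datetime.now() - last_refreshed).days >= interval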
Step-by-Step Maintenance
Monitor the sources that feed your knowledge graph for changes. If your graph was built from documentation, watch for file modifications using filesystem events, git webhooks, or polling. If your graph was built from API responses, schedule periodic re-fetches and compare against cached versions. If your graph was built from conversation logs or support tickets, monitor the stream for new entries that mention known entities.
import hashlib
import json

class ChangeDetector:
    # Tracks a content hash per document so changed sources can be detected
    # without reprocessing the entire corpus.
    def __init__(self, state_file="graph_state.json"):
        self.state_file = state_file
        self.state = self._load_state()

    def _load_state(self):
        try:
            with open(self.state_file) as f:
                return json.load(f)
        except FileNotFoundError:
            return {}

    def has_changed(self, doc_id, content):
        # hash the current content and compare it with the last recorded hash
        content_hash = hashlib.sha256(content.encode()).hexdigest()
        previous = self.state.get(doc_id)
        if previous != content_hash:
            self.state[doc_id] = content_hash
            return True
        return False

    def save(self):
        with open(self.state_file, 'w') as f:
            json.dump(self.state, f)

When a source document changes, re-extract entities and relationships from that document only. Do not reprocess the entire corpus for every change. Tag the new extractions with the source document ID and a timestamp so you can track provenance. This keeps extraction costs proportional to the volume of changes rather than the total corpus size.
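A minimal driver sketch tying the detector to that incremental path. It assumes documents arrive as (doc_id, content) pairs, however you load them, and calls the incremental_update function defined next; run_incremental_pass is a hypothetical name, not part of any library.

def run_incremental_pass(documents, graph_db):
    # documents: iterable of (doc_id, content) pairs from whatever sources
    # feed the graph (files, API responses, ticket streams)
    detector = ChangeDetector()
    changed = [(doc_id, content) for doc_id, content in documents
               if detector.has_changed(doc_id, content)]
    incremental_update(changed, graph_db)  # defined below
    detector.save()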
def incremental_update(changed_docs, graph_db):
    for doc_id, content in changed_docs:
        # extract entities and relationships from the changed document only
        chunks = chunk_text(content)
        new_entities = []
        new_triples = []
        for chunk in chunks:
            extraction = extract_from_chunk(chunk)
            new_entities.extend(extraction["entities"])
            new_triples.extend(extraction["relationships"])
        # diff against existing graph data for this document
        existing = graph_db.get_triples_by_source(doc_id)
        changes = diff_triples(existing, new_triples)
        # apply the additions, confidence decays, and reinforcements
        apply_graph_changes(graph_db, changes, doc_id)

Compare newly extracted triples with the triples currently in the graph from the same source document. Classify each triple as: new (exists in the extraction but not in the graph), unchanged (exists in both with the same subject, predicate, and object), modified (same subject and object but different predicate, or same subject and predicate but different object), or deleted (exists in the graph but not in the new extraction). Each classification drives a different update action. With the set-based diff below, a modified triple simply surfaces as a paired deletion and addition: the old version loses confidence while the new version is inserted.
def diff_triples(existing, extracted):
    existing_set = {(t["subject"], t["predicate"], t["object"])
                    for t in existing}
    extracted_set = {(t["subject"], t["predicate"], t["object"])
                     for t in extracted}
    return {
        "new": extracted_set - existing_set,
        "deleted": existing_set - extracted_set,
        "unchanged": existing_set & extracted_set
    }

Do not blindly overwrite existing triples with new extractions. Use confidence scores to determine the appropriate action. New triples from a single extraction start at moderate confidence (0.6 to 0.7). Triples that appear in multiple extractions or from multiple sources accumulate confidence. Triples that are "deleted" (not found in a re-extraction) are not removed immediately. Instead, reduce their confidence by a fixed amount (0.1 to 0.2) and only archive them when confidence drops below a threshold (around 0.3). This prevents extraction noise from destabilizing the graph.
from datetime import datetime

def apply_graph_changes(graph_db, changes, source_doc):
    # new triples enter at moderate confidence with provenance and a timestamp
    for s, p, o in changes["new"]:
        graph_db.upsert_triple(s, p, o,
                               confidence=0.7,
                               source=source_doc,
                               updated=datetime.now())
    # triples missing from the re-extraction decay instead of being deleted
    for s, p, o in changes["deleted"]:
        current = graph_db.get_triple(s, p, o)
        if current:
            new_conf = current["confidence"] - 0.15
            if new_conf < 0.3:
                graph_db.archive_triple(s, p, o)
            else:
                graph_db.update_confidence(s, p, o, new_conf)
    # triples confirmed by the re-extraction gain confidence, capped at 0.95
    for s, p, o in changes["unchanged"]:
        current = graph_db.get_triple(s, p, o)
        if current and current["confidence"] < 0.95:
            graph_db.update_confidence(s, p, o,
                                       min(current["confidence"] + 0.05, 0.95))

When a new extraction says "checkout service uses Braintree" but the graph says "checkout service uses Stripe," you have a contradiction. Do not silently overwrite. Instead, keep both triples with a contradiction flag and reduced confidence on the older triple. Log the contradiction for review. In many cases, both may be true (the service migrated, or uses both), and only a human can determine the correct resolution.
def handle_contradiction(graph_db, new_triple, existing_triple):
    # reduce confidence on existing
    graph_db.update_confidence(
        existing_triple["subject"],
        existing_triple["predicate"],
        existing_triple["object"],
        existing_triple["confidence"] * 0.7
    )
    # add new triple at moderate confidence
    graph_db.upsert_triple(
        new_triple["subject"],
        new_triple["predicate"],
        new_triple["object"],
        confidence=0.6,
        contradicts=existing_triple["id"]
    )
    # log for review
    log_contradiction(existing_triple, new_triple)

Incremental updates catch changes as they happen, but drift accumulates from sources that change without triggering detection. Schedule a full re-extraction and validation cycle weekly or monthly depending on your domain's change velocity. The full cycle reprocesses all source documents, compares the complete extracted graph against the current graph, and generates a report of discrepancies. This catches the changes that incremental updates miss.
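A sketch of what that scheduled full pass might look like, reusing the same chunk_text, extract_from_chunk, and diff_triples helpers as the incremental path. The full_validation_cycle name and the report-instead-of-apply behavior are assumptions about how the review step is wired, not a fixed recipe.

def full_validation_cycle(all_docs, graph_db):
    # re-extract every source document and collect discrepancies into a
    # report for review instead of applying them automatically
    report = {"new": set(), "deleted": set(), "unchanged": set()}
    for doc_id, content in all_docs:
        new_triples = []
        for chunk in chunk_text(content):
            extraction = extract_from_chunk(chunk)
            new_triples.extend(extraction["relationships"])
        existing = graph_db.get_triples_by_source(doc_id)
        changes = diff_triples(existing, new_triples)
        for key in report:
            report[key] |= changes[key]
    return report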
Automated Maintenance with Adaptive Recall
Adaptive Recall handles graph maintenance as part of its memory consolidation process. When memories are consolidated (merged, updated, or archived), the entities and relationships associated with those memories are re-evaluated. New entities are added. Entities whose source memories have been archived have their confidence reduced. Contradictions between memories propagate to the graph as reduced confidence on conflicting triples. This keeps the graph aligned with the current state of the memory system without requiring a separate maintenance pipeline.
Let your knowledge graph maintain itself. Adaptive Recall updates entity connections automatically during memory consolidation.
Get Started Free