How to Implement Compilation-Stage Knowledge
Why Raw Chunks Are Not Enough
Traditional RAG indexes raw document chunks and retrieves them as-is. The problem is that raw chunks are optimized for reading in sequence within a document, not for answering isolated questions. A paragraph about database configuration makes sense in the context of the deployment guide but may be cryptic when retrieved on its own. A comparison between two services might span three pages of a design document, and no single chunk captures the full comparison.
Compilation-stage knowledge creates new artifacts that are optimized for retrieval and answering. A "service comparison" artifact synthesizes information from multiple documents into a single retrievable unit. An "entity profile" for PostgreSQL compiles every mention of PostgreSQL across all documents into a comprehensive reference. A "FAQ layer" pre-answers the 100 most common questions so retrieval returns a direct answer rather than a source paragraph that contains the answer somewhere in the middle.
This is analogous to how compiled code differs from source code. The source code is written for human reading. The compiled code is optimized for machine execution. Compilation-stage knowledge transforms human-readable documents into retrieval-optimized artifacts while keeping the source documents as the authoritative reference.
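To make the artifact idea concrete, a compiled artifact can be represented as a small record stored alongside raw chunks. The shape below is illustrative rather than a fixed schema; it matches the dicts the steps in this guide build.

from typing import List, TypedDict

class CompiledArtifact(TypedDict, total=False):
    type: str               # "topic_summary", "derived_fact", or "entity_profile"
    text: str               # the retrieval-optimized content
    source_docs: List[str]  # back-references to the authoritative documents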
Step-by-Step Implementation
Analyze your query logs (or anticipate query patterns if you are building a new system) to identify categories of questions that your current RAG handles poorly. Common categories include: broad overview questions ("describe the architecture"), entity-specific questions ("what does Service X do"), comparison questions ("how does A differ from B"), and procedural questions ("what are the steps to deploy"). Each category maps to a specific compilation artifact.
# Analyze query logs to find question patterns
patterns = {
    "overview": ["describe", "overview", "explain", "how does"],
    "entity": ["what is", "what does", "who maintains", "tell me about"],
    "comparison": ["compare", "difference between", "vs", "better"],
    "procedural": ["how to", "steps to", "process for", "guide to"]
}

def categorize_queries(query_log):
    categories = {k: [] for k in patterns}
    for query in query_log:
        for category, keywords in patterns.items():
            # Simple substring matching; swap in regexes if your
            # query phrasing is more varied.
            if any(kw in query.lower() for kw in keywords):
                categories[category].append(query)
                break
    return categories
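A quick illustration of the categorizer on a hypothetical log (the sample queries are invented):

sample_log = [
    "How does the billing pipeline work?",
    "Difference between Redis and Memcached",
    "Steps to deploy the staging environment",
]
for category, queries in categorize_queries(sample_log).items():
    print(category, len(queries))
# High-volume categories tell you which artifacts to build first.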
SUMMARY_PROMPT = """Summarize the following document cluster in 200-300
words. Focus on: what these documents cover, the key entities and
relationships, and the most important facts a reader would need.
Documents:
{documents}"""
import anthropic

# Assumes ANTHROPIC_API_KEY is set in the environment.
client = anthropic.Anthropic()

def build_summary_layer(document_clusters):
    summaries = []
    for cluster_name, docs in document_clusters.items():
        doc_text = "\n\n---\n\n".join(d.text for d in docs)
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=500,
            messages=[{"role": "user",
                       "content": SUMMARY_PROMPT.replace(
                           "{documents}", doc_text)}]
        )
        summaries.append({
            "type": "topic_summary",
            "cluster": cluster_name,
            "text": response.content[0].text,
            "source_docs": [d.id for d in docs]
        })
    return summaries
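The step above says summaries are embedded and stored alongside the raw chunks, but build_summary_layer only returns them. A minimal indexing sketch, where embed and vector_store are placeholders for whatever embedding function and vector index your existing pipeline already uses:

def index_artifacts(artifacts, embed, vector_store):
    # embed and vector_store are assumptions: your pipeline's
    # embedding function and vector index.
    for artifact in artifacts:
        vector_store.add(
            vector=embed(artifact["text"]),
            metadata=artifact,  # type and source references survive retrieval
        )

Because compiled artifacts share the index with raw chunks, a single retrieval pass surfaces whichever form answers the query best.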
FACT_EXTRACTION_PROMPT = """Read this document and extract 5-10
question-answer pairs that someone might ask about this content.
Each answer should be self-contained (understandable without the
original document).
Return as JSON: [{"question": "...", "answer": "...",
"source_section": "..."}]
Document:
{document}"""
import json

def extract_facts(documents):
    facts = []
    for doc in documents:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            messages=[{"role": "user",
                       "content": FACT_EXTRACTION_PROMPT.replace(
                           "{document}", doc.text)}]
        )
        doc_facts = json.loads(response.content[0].text)
        for fact in doc_facts:
            fact["source_doc"] = doc.id
            fact["type"] = "derived_fact"
        facts.extend(doc_facts)
    return facts
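One indexing detail worth making explicit: embed the question text of each fact rather than the answer, so the artifact's vector sits close to the phrasing users actually type. A sketch, with the same placeholder embed and vector_store as above:

def index_facts(facts, embed, vector_store):
    for fact in facts:
        # The question drives similarity; the answer rides along
        # as metadata and is returned directly on a match.
        vector_store.add(
            vector=embed(fact["question"]),
            metadata=fact,
        )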
PROFILE_PROMPT = """Create a comprehensive profile for the entity
"{entity}" based on all the passages below. Include:
- What it is and its purpose
- Key relationships to other entities
- Important technical details
- Current status or configuration
Passages mentioning {entity}:
{passages}"""
def build_entity_profiles(entities, document_chunks):
    profiles = []
    for entity in entities:
        mentions = [c for c in document_chunks
                    if entity.lower() in c.text.lower()]
        if len(mentions) < 2:
            continue  # skip entities without enough cross-document signal
        # Cap the passages to keep the prompt bounded.
        passages = "\n\n---\n\n".join(m.text for m in mentions[:20])
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            messages=[{"role": "user",
                       "content": PROFILE_PROMPT
                           .replace("{entity}", entity)
                           .replace("{passages}", passages)}]
        )
        profiles.append({
            "type": "entity_profile",
            "entity": entity,
            "text": response.content[0].text,
            "source_chunks": [m.id for m in mentions]
        })
    return profiles
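Where does the entities list come from? Full named-entity recognition works, but a simpler starting point is frequency mining against a seed list of known names, which is the assumption in this sketch:

from collections import Counter

def mine_entities(document_chunks, candidate_names, min_mentions=3):
    # candidate_names is an assumption: a seed list of service,
    # product, or team names from a registry or glossary.
    counts = Counter()
    for chunk in document_chunks:
        text = chunk.text.lower()
        for name in candidate_names:
            if name.lower() in text:
                counts[name] += 1
    return [name for name, count in counts.items() if count >= min_mentions]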
Compiled knowledge becomes stale when source documents change. Set up incremental recompilation that tracks document changes and re-generates only the affected artifacts. A full recompilation processes every document. An incremental recompilation only processes documents that changed since the last run and regenerates the summaries, facts, and entity profiles that depend on those documents.

def incremental_recompile(changed_doc_ids, compiled_index):
    # Find all compiled artifacts that reference changed docs.
    # Note: derived facts store a single "source_doc" rather than a list.
    stale = [artifact for artifact in compiled_index
             if any(doc_id in artifact.get("source_docs", [])
                    or doc_id in artifact.get("source_chunks", [])
                    or doc_id == artifact.get("source_doc")
                    for doc_id in changed_doc_ids)]
    # Regenerate stale artifacts by re-running the builders above
    # for the affected inputs.
    for artifact in stale:
        if artifact["type"] == "topic_summary":
            regenerate_summary(artifact)
        elif artifact["type"] == "derived_fact":
            regenerate_facts(artifact)
        elif artifact["type"] == "entity_profile":
            regenerate_profile(artifact)
    return len(stale)
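Detecting which documents changed is left outside incremental_recompile. A common approach is content hashing at ingest time; this sketch assumes you persist the hash map between runs:

import hashlib

def detect_changed_docs(documents, previous_hashes):
    # previous_hashes: {doc_id: sha256 hex digest} saved from the last run.
    changed, current = [], {}
    for doc in documents:
        digest = hashlib.sha256(doc.text.encode("utf-8")).hexdigest()
        current[doc.id] = digest
        if previous_hashes.get(doc.id) != digest:
            changed.append(doc.id)
    return changed, current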
How This Relates to Memory Consolidation

Compilation-stage knowledge and memory consolidation solve the same problem from different angles. Compilation transforms raw documents into retrieval-optimized artifacts at index time. Memory consolidation transforms accumulated memories into refined, current, deduplicated knowledge over time. Both create derived knowledge that is better for retrieval than the raw source material.
Adaptive Recall performs continuous compilation through its consolidation pipeline. As memories accumulate, the system merges related memories, updates entity profiles in the knowledge graph, resolves contradictions, and adjusts confidence scores. The result is a memory store where each memory is retrieval-optimized: it carries entity connections for graph traversal, confidence scores for ranking, and recency metadata for freshness. You get the benefits of compilation-stage knowledge without building a separate compilation pipeline.
Let your memory system compile itself. Adaptive Recall continuously consolidates and optimizes stored knowledge for better retrieval.
Try It Free