
How to Implement Compilation-Stage Knowledge

Compilation-stage knowledge means pre-processing your documents at index time to create derived artifacts, such as summaries, entity profiles, pre-computed answers, and relationship maps, that are stored alongside the raw chunks and retrieved when they match a query better than the source material. This shifts computational work from query time (where latency matters) to index time (where latency is far less critical), and it produces higher-quality retrieval because the compiled artifacts are specifically designed to answer common query patterns.
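As a concrete sketch of the idea (the record shape and field names here are illustrative, not taken from any particular library), raw chunks and compiled artifacts can live in the same index as records of one shape, differing only in their type and in what gets embedded versus what gets returned:

```python
from dataclasses import dataclass, field

@dataclass
class IndexRecord:
    """One retrievable unit: either a raw chunk or a compiled artifact."""
    record_id: str
    type: str        # "chunk", "topic_summary", "derived_fact", "entity_profile"
    text: str        # what the LLM sees when this record is retrieved
    embed_text: str  # what gets embedded (may differ from text)
    source_ids: list = field(default_factory=list)  # provenance back to raw docs

# A raw chunk embeds and returns the same text; a compiled fact may embed
# the question while returning the answer. (Example content is invented.)
chunk = IndexRecord("c1", "chunk",
                    "Set max_connections to 200 before enabling pooling.",
                    "Set max_connections to 200 before enabling pooling.",
                    ["doc-7"])
fact = IndexRecord("f1", "derived_fact",
                   "Answer: max_connections should be 200 in production.",
                   "What should max_connections be in production?",
                   ["doc-7"])
```

Keeping both record kinds in one index lets a single vector search rank chunks and artifacts together, so the artifact wins only when it genuinely matches the query better.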

Why Raw Chunks Are Not Enough

Traditional RAG indexes raw document chunks and retrieves them as-is. The problem is that raw chunks are optimized for reading in sequence within a document, not for answering isolated questions. A paragraph about database configuration makes sense in the context of the deployment guide but may be cryptic when retrieved on its own. A comparison between two services might span three pages of a design document, and no single chunk captures the full comparison.

Compilation-stage knowledge creates new artifacts that are optimized for retrieval and answering. A "service comparison" artifact synthesizes information from multiple documents into a single retrievable unit. An "entity profile" for PostgreSQL compiles every mention of PostgreSQL across all documents into a comprehensive reference. A "FAQ layer" pre-answers the 100 most common questions so retrieval returns a direct answer rather than a source paragraph that contains the answer somewhere in the middle.

This is analogous to how compiled code differs from source code. The source code is written for human reading. The compiled code is optimized for machine execution. Compilation-stage knowledge transforms human-readable documents into retrieval-optimized artifacts while keeping the source documents as the authoritative reference.

Step-by-Step Implementation

Step 1: Identify compilable knowledge.
Analyze your query logs (or anticipate query patterns if you are building a new system) to identify categories of questions that your current RAG handles poorly. Common categories include: broad overview questions ("describe the architecture"), entity-specific questions ("what does Service X do"), comparison questions ("how does A differ from B"), and procedural questions ("what are the steps to deploy"). Each category maps to a specific compilation artifact.
```python
# Analyze query logs to find question patterns
patterns = {
    "overview": ["describe", "overview", "explain", "how does * work"],
    "entity": ["what is", "what does", "who maintains", "tell me about"],
    "comparison": ["compare", "difference between", "vs", "better"],
    "procedural": ["how to", "steps to", "process for", "guide to"],
}

def categorize_queries(query_log):
    categories = {k: [] for k in patterns}
    for query in query_log:
        for category, keywords in patterns.items():
            if any(kw in query.lower() for kw in keywords):
                categories[category].append(query)
                break
    return categories
```
Step 2: Build summary layers.
Generate hierarchical summaries at index time. For each document, create a paragraph-level summary and a document-level summary. For each cluster of related documents, create a topic-level summary. These summaries are embedded and stored alongside the raw chunks. When a broad query arrives, the topic-level summary is more relevant (and more retrievable) than any individual chunk from the source documents.
```python
SUMMARY_PROMPT = """Summarize the following document cluster in 200-300 words.
Focus on: what these documents cover, the key entities and relationships,
and the most important facts a reader would need.

Documents:
{documents}"""

def build_summary_layer(document_clusters):
    summaries = []
    for cluster_name, docs in document_clusters.items():
        doc_text = "\n\n---\n\n".join(d.text for d in docs)
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=500,
            messages=[{"role": "user", "content": SUMMARY_PROMPT.replace(
                "{documents}", doc_text)}],
        )
        summaries.append({
            "type": "topic_summary",
            "cluster": cluster_name,
            "text": response.content[0].text,
            "source_docs": [d.id for d in docs],
        })
    return summaries
```
Step 3: Pre-compute derived facts.
For common question patterns, extract the specific answers from your documents and store them as standalone fact artifacts. A fact artifact contains the question, the answer, and a reference to the source document. When a user asks a matching question, the fact artifact retrieves with higher relevance than the source paragraph because it is specifically phrased as a question-answer pair.
```python
FACT_EXTRACTION_PROMPT = """Read this document and extract 5-10 question-answer
pairs that someone might ask about this content. Each answer should be
self-contained (understandable without the original document).
Return as JSON: [{"question": "...", "answer": "...", "source_section": "..."}]

Document:
{document}"""

def extract_facts(documents):
    facts = []
    for doc in documents:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2000,
            messages=[{"role": "user", "content": FACT_EXTRACTION_PROMPT.replace(
                "{document}", doc.text)}],
        )
        doc_facts = json.loads(response.content[0].text)
        for fact in doc_facts:
            fact["source_doc"] = doc.id
            fact["type"] = "derived_fact"
        facts.extend(doc_facts)
    return facts
```
Embedding strategy: Embed the question portion of each fact artifact, not the answer. This ensures that the artifact retrieves when a user asks a similar question. The answer is returned as the context for the LLM, which produces a more focused response than a raw document chunk would.
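This split between what is embedded and what is returned can be sketched as follows. The `embed` function here is assumed to be supplied by whatever embedding model you use; the toy bag-of-words version below exists only so the example runs on its own:

```python
def index_fact_artifacts(facts, embed):
    """Embed the question side of each fact; keep the answer as the payload.

    `embed` is any text -> vector function (assumed to come from your
    embedding model of choice).
    """
    entries = []
    for fact in facts:
        entries.append({
            "vector": embed(fact["question"]),  # matched against user queries
            "payload": fact["answer"],          # returned to the LLM as context
            "source_doc": fact.get("source_doc"),
        })
    return entries

# Toy embedding for illustration: bag-of-words over a tiny vocabulary.
vocab = ["deploy", "postgres", "config", "steps"]
def toy_embed(text):
    words = text.lower().replace("?", "").split()
    return [words.count(w) for w in vocab]

entries = index_fact_artifacts(
    [{"question": "What are the steps to deploy?",
      "answer": "Build, push the image, then run the migration job.",
      "source_doc": "doc-3"}],
    toy_embed,
)
```

Because the stored vector comes from the question, a user query like "steps to deploy" lands close to the artifact even though the answer text shares few words with the query.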
Step 4: Create entity profiles.
For each entity that appears frequently across your documents, compile a single profile that aggregates all information about that entity from every source. The profile includes what the entity is, its relationships to other entities, key facts, and references to source documents. This solves the fragmentation problem where information about a single entity is scattered across dozens of documents and no single chunk gives a complete picture.
```python
PROFILE_PROMPT = """Create a comprehensive profile for the entity "{entity}"
based on all the passages below. Include:
- What it is and its purpose
- Key relationships to other entities
- Important technical details
- Current status or configuration

Passages mentioning {entity}:
{passages}"""

def build_entity_profiles(entities, document_chunks):
    profiles = []
    for entity in entities:
        mentions = [c for c in document_chunks
                    if entity.lower() in c.text.lower()]
        if len(mentions) < 2:
            continue
        passages = "\n\n---\n\n".join(m.text for m in mentions[:20])
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=1000,
            messages=[{"role": "user", "content": PROFILE_PROMPT
                       .replace("{entity}", entity)
                       .replace("{passages}", passages)}],
        )
        profiles.append({
            "type": "entity_profile",
            "entity": entity,
            "text": response.content[0].text,
            "source_chunks": [m.id for m in mentions],
        })
    return profiles
```
Step 5: Schedule recompilation.
Compiled knowledge becomes stale when source documents change. Set up incremental recompilation that tracks document changes and re-generates only the affected artifacts. A full recompilation processes every document. An incremental recompilation only processes documents that changed since the last run and regenerates the summaries, facts, and entity profiles that depend on those documents.
```python
def incremental_recompile(changed_doc_ids, compiled_index):
    # Find all compiled artifacts that reference changed docs
    stale = [artifact for artifact in compiled_index
             if any(doc_id in artifact.get("source_docs", [])
                    or doc_id in artifact.get("source_chunks", [])
                    for doc_id in changed_doc_ids)]
    # Regenerate stale artifacts
    for artifact in stale:
        if artifact["type"] == "topic_summary":
            regenerate_summary(artifact)
        elif artifact["type"] == "derived_fact":
            regenerate_facts(artifact)
        elif artifact["type"] == "entity_profile":
            regenerate_profile(artifact)
    return len(stale)
```
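One simple way to produce `changed_doc_ids` is to hash each document's content and compare against the hashes recorded at the last compilation run. This is a sketch under that assumption; a real system might rely on source timestamps or change webhooks instead:

```python
import hashlib

def find_changed_docs(documents, previous_hashes):
    """Return ids of docs whose content hash differs from the last run,
    plus the fresh hash map to persist for the next run.

    documents: {doc_id: text}; previous_hashes: {doc_id: sha256 hex digest}.
    """
    changed, current_hashes = [], {}
    for doc_id, text in documents.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        current_hashes[doc_id] = digest
        if previous_hashes.get(doc_id) != digest:
            changed.append(doc_id)  # edited doc, or one never seen before
    return changed, current_hashes

prev = {"doc-1": hashlib.sha256(b"old text").hexdigest()}
changed, fresh = find_changed_docs(
    {"doc-1": "new text", "doc-2": "brand new"}, prev)
```

Persisting `fresh` after each run keeps the comparison cheap: the next invocation only rehashes content, never re-reads the compiled index.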

How This Relates to Memory Consolidation

Compilation-stage knowledge and memory consolidation solve the same problem from different angles. Compilation transforms raw documents into retrieval-optimized artifacts at index time. Memory consolidation transforms accumulated memories into refined, current, deduplicated knowledge over time. Both create derived knowledge that is better for retrieval than the raw source material.

Adaptive Recall performs continuous compilation through its consolidation pipeline. As memories accumulate, the system merges related memories, updates entity profiles in the knowledge graph, resolves contradictions, and adjusts confidence scores. The result is a memory store where each memory is retrieval-optimized: it carries entity connections for graph traversal, confidence scores for ranking, and recency metadata for freshness. You get the benefits of compilation-stage knowledge without building a separate compilation pipeline.

Let your memory system compile itself. Adaptive Recall continuously consolidates and optimizes stored knowledge for better retrieval.

Try It Free