Can LLMs Build Knowledge Graphs Automatically?
How Automatic Construction Works
The LLM reads a chunk of text and identifies the entities (people, technologies, services, concepts) and the relationships between them (depends on, is maintained by, uses). The output is structured as JSON, with each entity having a name, type, and aliases, and each relationship having a subject, predicate, and object. These structured extractions are loaded into a graph database or triple store, and the accumulated triples form the knowledge graph.
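As a concrete illustration, here is what one extraction result might look like and how it flattens into triples. The field names follow the structure described above; the specific entity and relationship values are hypothetical examples, not output from any particular model.

```python
# Hypothetical shape of one LLM extraction result for a single text chunk.
extraction = {
    "entities": [
        {"name": "checkout_service", "type": "service", "aliases": ["checkout"]},
        {"name": "orders_database", "type": "database", "aliases": ["orders DB"]},
    ],
    "relationships": [
        {"subject": "checkout_service", "predicate": "stores_data_in",
         "object": "orders_database"},
    ],
}

def to_triples(extraction):
    """Flatten one extraction into (subject, predicate, object) triples."""
    return [(r["subject"], r["predicate"], r["object"])
            for r in extraction["relationships"]]

print(to_triples(extraction))
# → [('checkout_service', 'stores_data_in', 'orders_database')]
```

Accumulating these triples across all chunks, after deduplication, is what produces the graph.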
The extraction runs on every document in your corpus during the initial graph construction and on new or changed documents during maintenance. For a corpus of 10,000 document chunks, the entire extraction process takes a few hours and costs $5 to $80 in LLM API calls depending on the model used. This is dramatically faster and cheaper than the months of manual knowledge engineering that graph construction required before LLMs.
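The cost figure is simple arithmetic over token volume. The sketch below assumes an illustrative price of $1 per million input tokens and ~500 tokens per chunk; real runs also pay for output tokens, which is part of why the range above spans $5 to $80.

```python
def estimate_cost(num_chunks, tokens_per_chunk, price_per_million_tokens):
    """Rough API cost for extraction, counting input tokens only."""
    total_tokens = num_chunks * tokens_per_chunk
    return total_tokens / 1_000_000 * price_per_million_tokens

# 10,000 chunks of ~500 tokens at an assumed $1 per million input tokens:
print(estimate_cost(10_000, 500, 1.0))  # → 5.0
```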
What LLMs Get Right
Domain-agnostic extraction. Unlike traditional NER models that recognize only pre-trained entity types (Person, Organization, Location), LLMs extract any entity type you describe in the prompt. Tell the model to look for "services, databases, teams, APIs, and configuration settings" and it finds them. This flexibility means you can build a knowledge graph for any domain without training data.
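In practice this flexibility amounts to parameterizing the prompt with your own entity types. A minimal sketch (the prompt wording and helper name are illustrative, not a fixed API):

```python
ENTITY_TYPES = ["service", "database", "team", "API", "configuration setting"]

def build_extraction_prompt(chunk, entity_types=ENTITY_TYPES):
    """Assemble a domain-specific extraction prompt for one text chunk."""
    return (
        "Extract entities and relationships from the text below.\n"
        f"Entity types to look for: {', '.join(entity_types)}.\n"
        "Return JSON with 'entities' and 'relationships' keys.\n\n"
        f"Text:\n{chunk}"
    )

print(build_extraction_prompt("The checkout service stores records in Postgres."))
```

Swapping the list for "drugs, dosages, and contraindications" or "parties, clauses, and obligations" retargets the same pipeline to a new domain with no retraining.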
Relationship typing. LLMs can assign typed predicates to relationships rather than just noting co-occurrence. They distinguish between "uses," "depends on," "is maintained by," and "is documented in," which produces a graph where traversal can follow specific relationship types rather than all connections.
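Typed predicates are what make selective traversal possible. A minimal sketch over a triple list (the example triples are hypothetical):

```python
def neighbors(triples, node, predicate=None):
    """Follow outgoing edges from `node`, optionally restricted to one predicate."""
    return [o for s, p, o in triples
            if s == node and (predicate is None or p == predicate)]

triples = [
    ("checkout_service", "uses", "redis"),
    ("checkout_service", "depends_on", "payment_gateway"),
    ("checkout_service", "is_maintained_by", "payments_team"),
]
print(neighbors(triples, "checkout_service", "depends_on"))  # → ['payment_gateway']
print(neighbors(triples, "checkout_service"))  # all three neighbors
```

With only co-occurrence edges, the first query would be impossible: every query would return all three neighbors.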
Implicit relationship inference. LLMs can identify relationships that are implied rather than explicitly stated. If a paragraph says "the checkout service handles payment processing, storing transaction records in the orders database," the LLM infers the relationship (checkout_service, stores_data_in, orders_database) even though the text does not state it as a direct fact. This captures connections that rule-based extraction would miss.
What LLMs Get Wrong
Entity inconsistency. The same entity may be extracted as "PostgreSQL" in one chunk, "Postgres" in another, and "the database" in a third. Without explicit resolution logic, the graph ends up with three separate nodes for the same entity. Entity resolution (merging duplicates) must be handled as a post-processing step, using string matching, alias lists, and occasionally LLM-based disambiguation.
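A minimal resolution pass might combine an alias table with fuzzy string matching, as sketched below (the alias entries and threshold are illustrative; generic mentions like "the database" still need LLM-based disambiguation, since no string similarity links them to "PostgreSQL"):

```python
from difflib import SequenceMatcher

ALIASES = {"postgres": "PostgreSQL", "postgresql": "PostgreSQL"}

def resolve_entity(name, known_entities, aliases=ALIASES, threshold=0.85):
    """Map a raw entity mention to a canonical node, or keep it as new."""
    key = name.lower()
    if key in aliases:                  # exact alias hit
        return aliases[key]
    for canonical in known_entities:    # fuzzy match against existing nodes
        if SequenceMatcher(None, key, canonical.lower()).ratio() >= threshold:
            return canonical
    return name                         # no match: treat as a new entity

known = ["PostgreSQL", "Redis"]
print(resolve_entity("Postgres", known))  # → 'PostgreSQL'
print(resolve_entity("Kafka", known))     # → 'Kafka' (new node)
```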
Predicate proliferation. Without a controlled vocabulary, the LLM generates dozens of slightly different predicate wordings for the same relationship: "uses," "utilizes," "relies on," "depends on," "is built with." Each becomes a separate relationship type in the graph, making traversal inconsistent. Providing a controlled predicate vocabulary in the prompt (10 to 20 predicate types) largely solves this, but some normalization is still needed.
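The residual normalization can be a simple synonym map from free-form wordings onto the controlled vocabulary, sketched here with illustrative entries:

```python
# Synonym map onto a small controlled vocabulary; entries are illustrative.
PREDICATE_MAP = {
    "uses": "uses", "utilizes": "uses", "is built with": "uses",
    "relies on": "depends_on", "depends on": "depends_on",
    "is maintained by": "maintained_by",
}

def normalize_predicate(raw, mapping=PREDICATE_MAP, fallback="related_to"):
    """Collapse free-form predicate wordings onto canonical relationship types."""
    return mapping.get(raw.strip().lower(), fallback)

print(normalize_predicate("Utilizes"))   # → 'uses'
print(normalize_predicate("talks to"))   # → 'related_to'
```

Unmapped predicates fall back to a generic type rather than spawning a new relationship type per wording; reviewing the fallback bucket periodically is a cheap way to grow the vocabulary.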
Hallucinated relationships. LLMs sometimes infer relationships that are plausible but not stated in the text. If the text mentions Redis and the authentication service in the same paragraph, the model might extract (authentication_service, uses, Redis) even if the text does not say this. These hallucinated triples add noise to the graph. Confidence scoring and evidence tracking (requiring the model to cite the text that supports each relationship) help filter low-confidence extractions.
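A filtering step along these lines might look as follows; the field names (`confidence`, `evidence`) and the threshold are assumptions about how the extraction output is structured:

```python
def filter_triples(candidates, min_confidence=0.7):
    """Keep only relationships that cite supporting text and clear a confidence bar."""
    kept = []
    for r in candidates:
        has_evidence = bool(r.get("evidence", "").strip())
        if has_evidence and r.get("confidence", 0.0) >= min_confidence:
            kept.append((r["subject"], r["predicate"], r["object"]))
    return kept

candidates = [
    # Plausible co-occurrence, but no supporting quote: dropped.
    {"subject": "auth_service", "predicate": "uses", "object": "redis",
     "confidence": 0.4, "evidence": ""},
    # Explicitly supported by the source text: kept.
    {"subject": "checkout_service", "predicate": "stores_data_in",
     "object": "orders_database", "confidence": 0.9,
     "evidence": "storing transaction records in the orders database"},
]
print(filter_triples(candidates))
```

Requiring a verbatim evidence quote has a second benefit: the quote can be checked against the source chunk, catching cases where the model fabricated its own support.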
Context boundary issues. When documents are chunked for processing, relationships that span chunk boundaries may be missed entirely. If entity A is mentioned in chunk 1 and entity B is mentioned in chunk 2, and the relationship between them spans the boundary, neither chunk contains enough information for extraction. Overlapping chunks help, but some cross-boundary relationships are inevitably lost.
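Overlapping chunking is straightforward to implement; a sketch over a token list (sizes are illustrative):

```python
def chunk_with_overlap(tokens, size=200, overlap=50):
    """Split a token list into fixed-size chunks that share `overlap` tokens."""
    step = size - overlap
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]

tokens = list(range(500))
chunks = chunk_with_overlap(tokens)
print(len(chunks))                        # → 3
print(chunks[0][-50:] == chunks[1][:50])  # → True (shared boundary region)
```

The overlap region means an entity mentioned near a boundary appears in two chunks, giving the extractor two chances to see it alongside its neighbors. Relationships whose span exceeds the overlap are still lost, which is why some pipelines add a second, document-level extraction pass.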
Practical Accuracy Expectations
| Metric | First pass | After prompt tuning |
|---|---|---|
| Entity precision | 85-90% | 92-96% |
| Entity recall | 70-80% | 80-90% |
| Relationship precision | 70-80% | 82-90% |
| Relationship recall | 60-70% | 70-80% |
These numbers come from evaluations across multiple domains (software engineering, customer support, legal documents). The "after prompt tuning" column reflects 2 to 3 iterations of refining the extraction prompt based on error analysis. Precision is more important than recall for knowledge graphs because false entities and relationships degrade retrieval quality, while missed entities simply fail to help (without actively hurting).
Making It Fully Automatic
Adaptive Recall makes knowledge graph construction fully automatic by running entity extraction on every memory stored through the MCP tools. The extraction pipeline is pre-tuned for memory content (observations, facts, decisions, technical details). Extracted entities are resolved against the existing graph using alias matching and string similarity. Relationships carry confidence scores that accumulate with corroboration and decay with contradiction. The result is a knowledge graph that builds itself as you store memories, with no manual extraction, no prompt tuning, and no graph maintenance required.
Build a knowledge graph by just storing memories. Adaptive Recall handles entity extraction, relationship identification, and graph construction automatically.
Try It Free