How to Extract Entities and Relationships with LLMs
LLMs vs Traditional NER
Traditional named entity recognition (NER) models like spaCy's en_core_web_lg recognize a fixed set of entity types: Person, Organization, Location, Date. They are fast (milliseconds per document), free to run locally, and highly accurate on the types they were trained on. But they cannot extract domain-specific entities like "microservice," "API endpoint," "deployment pipeline," or "memory consolidation job" without custom training data, which requires hundreds of annotated examples per entity type.
LLM-based extraction works on any entity type you can describe in natural language. You specify the types you want in the prompt, and the model identifies them. The trade-offs are cost (an API call per chunk), latency (seconds rather than milliseconds), and occasional hallucination (the model may infer entities that are not explicitly stated in the text). For knowledge graph construction, the flexibility of LLMs usually outweighs the cost, especially during the initial build when you are discovering what entity types exist in your data.
Step-by-Step Implementation
Start with 5 to 10 entity types that cover your domain. For a software engineering knowledge base, you might use: Service, Database, Person, Team, Technology, API, Configuration, and Concept. For a customer support knowledge base: Product, Feature, Issue, Resolution, Customer Segment, and Policy. Having a defined vocabulary prevents the LLM from inventing inconsistent categories.
Similarly, define 10 to 20 relationship types. Common relationships include: depends_on, is_maintained_by, uses, is_part_of, is_configured_with, is_authored_by, relates_to, replaces, and causes. More specific predicates produce a more useful graph, but too many predicates make the extraction inconsistent. Start narrow and expand based on what you see in the extracted data.
The prompt should include your entity types, your relationship vocabulary, an instruction to return structured JSON, and one or two examples showing the expected output format. Few-shot examples dramatically improve extraction consistency. Use Claude's tool use or structured output features when available, as they enforce the output schema and reduce JSON parsing failures.
EXTRACTION_PROMPT = """Analyze the following text and extract:
1. ENTITIES: Things mentioned in the text. For each entity provide:
- name: canonical name (use full names, not abbreviations)
- type: one of [Service, Database, Person, Team, Technology, API, Concept]
- aliases: other names used for this entity in the text
2. RELATIONSHIPS: How entities relate to each other. For each:
- subject: entity name (must match an extracted entity)
- predicate: one of [depends_on, uses, is_maintained_by, is_part_of,
is_configured_with, relates_to, replaces, causes]
- object: entity name (must match an extracted entity)
- evidence: the phrase from the text that supports this relationship
Return valid JSON with keys "entities" and "relationships".
Example output:
{
"entities": [
{"name": "Order Service", "type": "Service", "aliases": ["order-svc"]},
{"name": "PostgreSQL", "type": "Database", "aliases": ["Postgres"]}
],
"relationships": [
{"subject": "Order Service", "predicate": "uses",
"object": "PostgreSQL",
"evidence": "The order service stores all records in PostgreSQL"}
]
}
Text to analyze:
{text}"""Split documents into chunks of 500 to 1,000 tokens with 100 to 200 token overlap. The overlap ensures that relationships spanning chunk boundaries are captured in at least one chunk. Process each chunk independently through the extraction prompt. Track which chunk each entity and relationship came from for provenance and validation.
from anthropic import Anthropic
import json

client = Anthropic()

def extract_from_chunk(chunk, chunk_id):
    prompt = EXTRACTION_PROMPT.replace("{text}", chunk)
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=4000,
        messages=[{"role": "user", "content": prompt}]
    )
    try:
        data = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        # Malformed JSON: skip this chunk rather than crash the pipeline.
        # A production version could retry with an explicit JSON-only instruction.
        return {"entities": [], "relationships": []}
    # Tag each entity and relationship with its source chunk for provenance.
    for e in data.get("entities", []):
        e["source_chunk"] = chunk_id
    for r in data.get("relationships", []):
        r["source_chunk"] = chunk_id
    return data

LLMs occasionally produce malformed JSON, entity types not in your vocabulary, or relationships referencing entities not in the extraction. Validate each extraction result against your schema. Reject entities with unknown types (or map them to the closest known type). Reject relationships where the subject or object does not match an extracted entity. Log rejections for prompt refinement.
VALID_TYPES = {"Service", "Database", "Person", "Team",
               "Technology", "API", "Concept"}
VALID_PREDICATES = {"depends_on", "uses", "is_maintained_by",
                    "is_part_of", "is_configured_with",
                    "relates_to", "replaces", "causes"}

def validate(data):
    valid_entities = []
    entity_names = set()
    for e in data.get("entities", []):
        if e.get("type") in VALID_TYPES:
            valid_entities.append(e)
            entity_names.add(e["name"])
    valid_rels = []
    for r in data.get("relationships", []):
        if (r.get("predicate") in VALID_PREDICATES
                and r.get("subject") in entity_names
                and r.get("object") in entity_names):
            valid_rels.append(r)
    return {"entities": valid_entities, "relationships": valid_rels}

After extracting from all chunks, merge entities that refer to the same thing. Use a three-tier approach: exact match after case normalization, alias matching (check if one entity's name appears in another's alias list), and string similarity (Levenshtein or Jaro-Winkler distance above 0.85). For ambiguous cases, batch the candidates and ask the LLM whether they refer to the same entity. Update all relationship references to use the canonical entity name after merging.
Randomly sample 50 chunks and their extractions. For each, check: Were all entities in the text identified (recall)? Are all extracted entities actually present in the text (precision)? Are relationships correctly identified and typed? Track precision and recall separately for entities and relationships. Aim for 85%+ entity precision and 75%+ relationship precision on the first iteration, then refine your prompt based on the specific error patterns you find.
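A small scoring helper for the spot-check. It assumes you record each chunk's gold annotations and predictions as sets of (name, type) pairs, which is a convention chosen here for illustration:

def entity_scores(predicted, gold):
    # predicted, gold: sets of (name, type) tuples for one chunk.
    tp = len(predicted & gold)  # true positives
    precision = tp / len(predicted) if predicted else 1.0
    recall = tp / len(gold) if gold else 1.0
    return precision, recall

The same helper scores relationships if you pass sets of (subject, predicate, object) triples instead.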
Cost Optimization
LLM-based extraction costs add up at scale. Processing 10,000 chunks of 800 tokens each at $0.003 per 1,000 input tokens costs roughly $24 for input, plus output costs. Three strategies reduce this: use a smaller model (Claude Haiku or GPT-4o mini) for initial extraction, reserving the larger model for validation passes on low-confidence results; batch multiple short chunks into a single prompt to reduce per-call overhead; and cache extraction results so re-processing a document only touches changed sections.
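The input-token arithmetic above, as a quick estimator (the $0.003 rate is this example's assumption, not a current price list):

def input_cost_usd(chunks, tokens_per_chunk, usd_per_1k_tokens):
    # e.g. 10,000 chunks x 800 tokens = 8M input tokens -> $24.00
    return chunks * tokens_per_chunk / 1_000 * usd_per_1k_tokens

input_cost_usd(10_000, 800, 0.003)  # 24.0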
Adaptive Recall runs entity extraction automatically during memory storage, using an optimized extraction pipeline tuned for memory content. The cost is included in the memory storage operation, so you do not need to manage extraction infrastructure or optimize prompts for your specific domain.
Let the extraction happen automatically. Adaptive Recall extracts entities and builds relationship graphs as you store memories.
Try It Free