Subject-Predicate-Object Extraction Explained
What a Triple Represents
Every triple answers a question of the form "how does entity A relate to entity B?" The subject is the entity the statement is about. The predicate is the relationship type. The object is the entity or value the subject is related to. Together, they form the smallest complete statement of fact that a knowledge graph can store.
Examples from a software engineering domain:
Subject           Predicate          Object
checkout-service  depends_on         PostgreSQL
checkout-service  communicates_with  Stripe API
platform-team     maintains          checkout-service
PostgreSQL        deployed_on        AWS RDS
checkout-service  uses               React 18
Sarah Chen        leads              platform-team

These six triples encode a small but useful knowledge graph. From "checkout-service" you can traverse to find its database (PostgreSQL), its payment provider (Stripe API), the team that maintains it (platform-team), and the team's leader (Sarah Chen). None of these connections require text similarity. The graph stores them as explicit, traversable edges.
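The traversal described above can be sketched in a few lines of Python. This is a minimal in-memory illustration, not a production graph store; the helper names (build_index, objects_of, subjects_of) are hypothetical and chosen only for this example.

```python
from collections import defaultdict

# The six example triples from this section.
TRIPLES = [
    ("checkout-service", "depends_on", "PostgreSQL"),
    ("checkout-service", "communicates_with", "Stripe API"),
    ("platform-team", "maintains", "checkout-service"),
    ("PostgreSQL", "deployed_on", "AWS RDS"),
    ("checkout-service", "uses", "React 18"),
    ("Sarah Chen", "leads", "platform-team"),
]

def build_index(triples):
    """Index edges by subject and by object so they can be followed either way."""
    outgoing = defaultdict(list)  # subject -> [(predicate, object)]
    incoming = defaultdict(list)  # object  -> [(predicate, subject)]
    for s, p, o in triples:
        outgoing[s].append((p, o))
        incoming[o].append((p, s))
    return outgoing, incoming

def objects_of(outgoing, subject, predicate):
    """Follow an edge forward: subject --predicate--> ?"""
    return [o for p, o in outgoing[subject] if p == predicate]

def subjects_of(incoming, obj, predicate):
    """Follow an edge backward: ? --predicate--> object"""
    return [s for p, s in incoming[obj] if p == predicate]

outgoing, incoming = build_index(TRIPLES)

database = objects_of(outgoing, "checkout-service", "depends_on")
teams = subjects_of(incoming, "checkout-service", "maintains")
leaders = [who for team in teams for who in subjects_of(incoming, team, "leads")]

print(database, teams, leaders)
```

Each lookup is an exact match on entity and predicate names, which is why no text similarity is involved: the answer to "who leads the team that maintains checkout-service?" comes from following two stored edges.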
Triple Structure in Detail
The triple format comes from the Resource Description Framework (RDF), a W3C standard for representing structured data on the web. RDF identifies entities and predicates with URIs (IRIs as of RDF 1.1), which ensures global uniqueness, and also allows literal values as objects. In practice, most AI applications use simplified triples with string identifiers rather than full URIs, because the knowledge graph operates within a single application rather than across the web.
Subject: Always an entity. Never a literal value, an unnamed abstract concept, or an incomplete reference. The subject is the thing being described. In a well-constructed graph, every subject corresponds to a node in the graph.
Predicate: The relationship type. Predicates should be drawn from a controlled vocabulary so that traversal queries work consistently. "depends_on," "maintained_by," "stores_data_in" are good predicates. "is related to" is too vague to be useful for traversal. The predicate determines the semantics of the edge.
Object: Either an entity (creating an edge to another node) or a literal value (a string, number, or date that describes a property of the subject). "checkout-service depends_on PostgreSQL" has an entity object. "PostgreSQL version_is 15.4" has a literal object. Entity objects create traversable connections. Literal objects store attributes.
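One way to encode these three rules is a small typed record. The sketch below is an assumption about how an application might model triples, not a standard API; the names Entity, LiteralValue, Triple, and is_edge are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Union

@dataclass(frozen=True)
class Entity:
    """A named node in the graph."""
    name: str

# Literal objects: plain values that describe the subject.
LiteralValue = Union[str, int, float, date]

@dataclass(frozen=True)
class Triple:
    subject: Entity
    predicate: str
    obj: Union[Entity, LiteralValue]

    def __post_init__(self):
        # The subject must always be an entity, never a literal.
        if not isinstance(self.subject, Entity):
            raise TypeError("subject must be an Entity, never a literal")

    @property
    def is_edge(self) -> bool:
        """True when the object is an entity, i.e. a traversable connection."""
        return isinstance(self.obj, Entity)

edge = Triple(Entity("checkout-service"), "depends_on", Entity("PostgreSQL"))
attr = Triple(Entity("PostgreSQL"), "version_is", "15.4")

print(edge.is_edge, attr.is_edge)
```

Keeping entity objects and literal objects distinguishable at the type level lets a loader decide whether a triple becomes an edge between two nodes or a property on a single node.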
Extracting Triples from Text
There are three approaches to triple extraction, each suited to different situations.
LLM-Based Extraction
The most flexible approach. Give the LLM a text passage and ask it to identify all relationships between entities as triples. The LLM uses its understanding of language to parse complex sentences, resolve coreferences, and identify implied relationships. This works for any domain and produces typed predicates that match your vocabulary when you include it in the prompt.
The trade-off is cost and throughput. Each passage requires an LLM API call that costs roughly $0.003 to $0.015 and takes 1 to 3 seconds, depending on the model and the passage length. For batch processing of large document sets, the cost adds up. For real-time extraction during memory storage, the latency is usually acceptable because users expect a brief processing delay.
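The two pieces of LLM-based extraction that do not depend on a particular provider are building the prompt with the predicate vocabulary and validating what comes back. The sketch below covers only those two pieces. The prompt wording, the JSON output shape, and the function names are all assumptions for illustration, and the actual API call is deliberately left out because it varies by provider; a canned response string stands in for the model's output.

```python
import json

# Hypothetical controlled vocabulary, included in the prompt.
PREDICATES = ["depends_on", "communicates_with", "maintained_by", "deployed_on", "uses"]

def build_prompt(passage: str, predicates: list[str]) -> str:
    """Ask for triples as a JSON array restricted to the given predicates."""
    return (
        "Extract every relationship between entities in the passage as a JSON "
        'array of objects with keys "subject", "predicate", "object".\n'
        f"Use only these predicates: {', '.join(predicates)}.\n"
        "Return only the JSON array.\n\n"
        f"Passage:\n{passage}"
    )

def parse_triples(response_text: str, predicates: list[str]) -> list[tuple[str, str, str]]:
    """Keep only well-formed triples whose predicate is in the vocabulary."""
    try:
        items = json.loads(response_text)
    except json.JSONDecodeError:
        return []  # the model did not return valid JSON
    if not isinstance(items, list):
        return []
    allowed = set(predicates)
    triples = []
    for item in items:
        if not isinstance(item, dict):
            continue
        s, p, o = item.get("subject"), item.get("predicate"), item.get("object")
        if all(isinstance(v, str) and v.strip() for v in (s, p, o)) and p in allowed:
            triples.append((s.strip(), p, o.strip()))
    return triples

# Canned text standing in for a model response (no API call is made here).
canned = (
    '[{"subject": "checkout-service", "predicate": "depends_on", "object": "PostgreSQL"},'
    ' {"subject": "checkout-service", "predicate": "is related to", "object": "Stripe API"}]'
)
print(parse_triples(canned, PREDICATES))
```

Filtering on the vocabulary after the call matters because a model can drift from the predicates it was given; the second item in the canned response is dropped for that reason.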
Dependency Parse Patterns
For text where relationships follow consistent grammatical patterns, you can use dependency parsing to extract triples without an LLM. Parse the sentence, find the subject and object of each verb, and map the verb to a predicate type. "The checkout service uses PostgreSQL" has a clear subject (checkout service), verb (uses), and object (PostgreSQL) that map directly to a triple.
This approach is fast and free (spaCy's dependency parser runs locally) but brittle. It fails on complex sentences, passive voice, relative clauses, and any sentence structure that deviates from simple subject-verb-object. It also cannot resolve coreferences or extract implicit relationships. Use it as a supplementary method, not a primary extraction approach.
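The pattern itself is simple enough to show without a parser. In the sketch below the dependency parses are written out by hand so the example runs with no model installed; in practice they would come from a parser such as spaCy. The Token fields, the dependency labels (nsubj, dobj, compound, and so on), and the verb-to-predicate map are assumptions made for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str
    lemma: str
    pos: str   # part of speech, e.g. "VERB"
    dep: str   # dependency label, e.g. "nsubj", "dobj"
    head: int  # index of this token's head within the sentence

# Hypothetical mapping from verb lemmas to vocabulary predicates.
VERB_TO_PREDICATE = {"use": "uses", "depend": "depends_on"}

def phrase(tokens, i):
    """Token i plus any compound modifiers attached to it, in sentence order."""
    parts = [j for j, t in enumerate(tokens)
             if j == i or (t.head == i and t.dep == "compound")]
    return " ".join(tokens[j].text for j in parts)

def extract_svo(tokens):
    """Emit a triple for each known verb that has both an nsubj and a dobj child."""
    triples = []
    for v, verb in enumerate(tokens):
        if verb.pos != "VERB" or verb.lemma not in VERB_TO_PREDICATE:
            continue
        subjects = [i for i, t in enumerate(tokens) if t.head == v and t.dep == "nsubj"]
        objects = [i for i, t in enumerate(tokens) if t.head == v and t.dep == "dobj"]
        for s in subjects:
            for o in objects:
                triples.append((phrase(tokens, s), VERB_TO_PREDICATE[verb.lemma], phrase(tokens, o)))
    return triples

# "The checkout service uses PostgreSQL" (hand-written parse)
active = [
    Token("The", "the", "DET", "det", 2),
    Token("checkout", "checkout", "NOUN", "compound", 2),
    Token("service", "service", "NOUN", "nsubj", 3),
    Token("uses", "use", "VERB", "ROOT", 3),
    Token("PostgreSQL", "PostgreSQL", "PROPN", "dobj", 3),
]

# "PostgreSQL is used by the checkout service" (hand-written parse)
passive = [
    Token("PostgreSQL", "PostgreSQL", "PROPN", "nsubjpass", 2),
    Token("is", "be", "AUX", "auxpass", 2),
    Token("used", "use", "VERB", "ROOT", 2),
    Token("by", "by", "ADP", "agent", 2),
    Token("the", "the", "DET", "det", 6),
    Token("checkout", "checkout", "NOUN", "compound", 6),
    Token("service", "service", "NOUN", "pobj", 3),
]

print(extract_svo(active))
print(extract_svo(passive))
```

The passive sentence states the same fact but yields nothing, because the pattern only looks for an active subject and a direct object. Handling it would need a second pattern, which is how rule sets of this kind grow and why the approach is described as brittle.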
Co-occurrence Based
The simplest approach: if two entities appear in the same sentence or paragraph, they are related. This produces a connected graph with minimal effort but the relationships are untyped (you know the entities co-occur but not how they relate). Co-occurrence graphs are useful for initial exploration and for measuring entity connectivity, but they lack the semantic precision needed for reliable traversal in production systems.
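A minimal sketch of sentence-level co-occurrence, assuming the entity list is already known and using naive case-insensitive substring matching (a real pipeline would use an entity recognizer). The sample text and the function name cooccurrence_edges are made up for this example.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed to be known in advance for this illustration.
ENTITIES = ["checkout-service", "PostgreSQL", "Stripe API", "AWS RDS"]

def cooccurrence_edges(text, entities):
    """Count how often each pair of entities appears in the same sentence."""
    counts = Counter()
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        lowered = sentence.lower()
        present = sorted(e for e in entities if e.lower() in lowered)
        # Sorting gives each unordered pair a single, stable key.
        for a, b in combinations(present, 2):
            counts[(a, b)] += 1
    return counts

text = (
    "The checkout-service stores orders in PostgreSQL. "
    "PostgreSQL runs on AWS RDS. "
    "The checkout-service calls the Stripe API and writes the result to PostgreSQL."
)

edges = cooccurrence_edges(text, ENTITIES)
for pair, weight in edges.most_common():
    print(pair, weight)
```

The result is a weighted, undirected edge list. The weights say how strongly two entities are associated, but nothing in the output distinguishes "stores orders in" from "runs on", which is the loss of semantic precision described above.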
Predicate Design
The quality of your predicate vocabulary determines the quality of your graph traversal. Too few predicates and the graph loses semantic precision ("relates_to" tells you nothing useful). Too many predicates and the graph becomes sparse (traversal misses connections because the same semantic relationship is split across multiple predicate types).
Start with 10 to 15 predicates for a new domain. Group them into categories: structural (depends_on, part_of, uses), organizational (maintained_by, owned_by, created_by), informational (documented_in, described_in), and temporal (replaced_by, deprecated_since). Each predicate should represent a distinct, useful relationship that supports queries your users actually ask.
Predicate directionality matters. "checkout-service depends_on PostgreSQL" and "PostgreSQL depended_on_by checkout-service" express the same relationship but from different directions. Pick one canonical direction for each predicate and be consistent. The convention is to use the direction that reads most naturally as a sentence: subject predicate object should form a readable statement.
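Canonical direction can be enforced at write time with a small normalization step. The sketch below assumes a hypothetical vocabulary in which depends_on and uses are the canonical forms and their inverses are rewritten by swapping subject and object; the specific predicate sets are illustrative only.

```python
# Hypothetical canonical predicates for this example.
CANONICAL = {"depends_on", "uses", "part_of", "maintained_by"}

# Inverse forms mapped to the canonical predicate they should be stored as.
INVERSES = {
    "depended_on_by": "depends_on",
    "used_by": "uses",
}

def normalize(triple):
    """Return the triple in canonical direction, or reject unknown predicates."""
    s, p, o = triple
    if p in CANONICAL:
        return (s, p, o)
    if p in INVERSES:
        # Same relationship stated from the other side: swap the ends.
        return (o, INVERSES[p], s)
    raise ValueError(f"unknown predicate: {p!r}")

print(normalize(("PostgreSQL", "depended_on_by", "checkout-service")))
print(normalize(("checkout-service", "depends_on", "PostgreSQL")))
```

Both calls produce the same stored triple, so a traversal that follows depends_on edges finds the relationship regardless of how the source text phrased it. Rejecting unknown predicates outright is one design choice; logging and mapping them later is another.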
Reification: Metadata About Triples
Sometimes you need to say something about a triple itself, not just the entities it connects. "As of January 2026, checkout-service uses PostgreSQL with confidence 0.9, based on the architecture document" adds temporal scope, confidence, and provenance to the triple. This is reification: treating a statement as an entity that can have its own properties.
In practice, reification is usually implemented by attaching metadata properties to the edge rather than by creating a separate statement entity. Property graph databases such as Neo4j support edge properties natively. Relational triple stores add metadata columns to the triples table. The essential metadata fields are confidence (how sure the extraction system is), source (which document the triple came from), and timestamp (when the triple was created or last validated).
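The relational variant, with metadata columns on the triples table, can be sketched with SQLite from the Python standard library. The table layout, column names, and sample rows are assumptions for illustration (the second row and both dates are invented), and this is not a description of any particular product's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE triples (
        subject      TEXT NOT NULL,
        predicate    TEXT NOT NULL,
        object       TEXT NOT NULL,
        confidence   REAL NOT NULL CHECK (confidence BETWEEN 0 AND 1),
        source       TEXT NOT NULL,   -- where the triple was extracted from
        validated_at TEXT NOT NULL    -- ISO date of creation or last validation
    )
""")

# Illustrative rows; the second one is a made-up low-confidence extraction.
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("checkout-service", "uses", "PostgreSQL", 0.9, "architecture document", "2026-01-01"),
        ("checkout-service", "uses", "Redis", 0.4, "chat transcript", "2025-06-01"),
    ],
)

def trusted_edges(conn, subject, min_confidence):
    """Outgoing edges for a subject, filtered and ordered by confidence."""
    return conn.execute(
        "SELECT predicate, object FROM triples "
        "WHERE subject = ? AND confidence >= ? "
        "ORDER BY confidence DESC",
        (subject, min_confidence),
    ).fetchall()

print(trusted_edges(conn, "checkout-service", 0.8))
```

With the metadata stored alongside each edge, a traversal can skip weakly supported relationships, and a maintenance job can select rows whose validated_at is old enough to need re-checking.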
Adaptive Recall stores triples with full reification metadata. Each entity relationship carries a confidence score that evolves through the memory consolidation process, evidence links back to the source memories, and temporal markers that indicate when the relationship was established and last confirmed. This metadata enables the system to prioritize well-supported connections during graph traversal and to identify stale relationships that need re-validation.
Adaptive Recall extracts subject-predicate-object triples automatically from every memory you store. The knowledge graph grows organically, and spreading activation uses the connections for smarter retrieval.