How to Build a Chatbot That Remembers Users
Before You Start
You need a working chatbot that can hold a basic conversation, whether built with a framework like LangChain, a platform like Botpress, or direct LLM API calls. You also need a persistent storage layer for memories. This can be a vector database (Pinecone, Weaviate, Qdrant), a managed memory service like Adaptive Recall, or even PostgreSQL with pgvector for smaller applications. The critical requirement is that your storage supports both text search and user-scoped queries, because you need to retrieve memories for a specific user without mixing in other users' data.
You also need a user identification strategy. Memories are only useful if you can associate them with the right user when they return. This typically means authentication (login, API key) or a persistent identifier (cookie, device ID, session token). Anonymous chatbots can use memory within a session but cannot provide cross-session continuity without some form of user identification.
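The user-scoping requirement can be illustrated with a minimal in-memory store. This is a sketch, not a production design: a real deployment would back the same contract with pgvector or a vector database, and the class and method names here are illustrative.

```python
from collections import defaultdict

class MemoryStore:
    """Minimal in-memory sketch of user-scoped memory storage.
    The contract matters more than the backend: every read and
    write is keyed by user_id, so users' memories never mix."""

    def __init__(self):
        self._memories = defaultdict(list)  # user_id -> list of memory dicts

    def store(self, user_id, text, category="general"):
        self._memories[user_id].append({"text": text, "category": category})

    def recall(self, user_id, query):
        # Naive substring match stands in for semantic search; the key
        # point is that results never cross user boundaries.
        return [m for m in self._memories[user_id]
                if query.lower() in m["text"].lower()]

store = MemoryStore()
store.store("user-1", "Prefers concise answers")
store.store("user-2", "Works on a Django project")
print(store.recall("user-1", "concise"))  # only user-1's memories surface
```

The same shape carries over to a real backend: the `user_id` filter becomes a metadata filter on the vector query, applied before ranking, never after.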
Step-by-Step Implementation
Before writing code, decide what your chatbot should remember. Not every piece of conversation is worth storing. Define categories of memorable information based on your chatbot's purpose. A customer support chatbot should remember: the user's product or plan, previous issues and their resolutions, stated preferences (communication style, technical level, timezone), and unresolved problems. A personal assistant should remember: the user's role and responsibilities, active projects and deadlines, recurring tasks, and stated preferences. A coding assistant should remember: the user's tech stack, coding conventions, architectural decisions, and known bugs or workarounds. For each category, define the expected format: free text, structured key-value pairs, or entities with relationships. Structured memories are easier to retrieve accurately but harder to extract automatically. Start with free text memories and add structure as you identify patterns.
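The category scheme above can be pinned down as a small schema before any extraction code exists. This is a hypothetical sketch: the field names and category lists are illustrative choices, not a prescribed format.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Illustrative memory record: free text plus just enough
    structure (category, entities) to filter and rank later."""
    content: str                 # standalone fact, e.g. "User is on the Pro plan"
    category: str                # one of the bot-specific categories below
    entities: list = field(default_factory=list)
    confidence: float = 1.0      # starts at 1.0, adjusted over time

# Example category sets per chatbot purpose (assumed names):
SUPPORT_BOT_CATEGORIES = ["plan", "past_issues", "preferences", "open_problems"]
CODING_BOT_CATEGORIES = ["tech_stack", "conventions", "decisions", "known_bugs"]

m = Memory(content="User prefers short, non-technical answers",
           category="preferences", entities=["communication style"])
```

Starting with a schema this thin keeps the "free text first, structure later" path open: new fields can be added as patterns emerge without migrating existing memories.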
The extraction layer processes conversation turns and identifies information worth storing as long-term memories. The simplest approach uses an LLM call at the end of each conversation (or every N turns) with a prompt like: "Review this conversation and extract discrete facts about the user that would be useful to remember in future conversations. Output each fact as a separate item with a category label." This approach costs one additional LLM call per extraction cycle but produces good results because the model can understand context, resolve references, and distinguish between important facts and conversational filler. More sophisticated extraction runs continuously, processing each turn as it arrives and deciding in real time whether it contains memorable content. This is more responsive but requires careful deduplication to avoid storing the same fact multiple times as the user reinforces it across turns.
import anthropic

client = anthropic.Anthropic()

def extract_memories(conversation_messages, user_id):
    extraction_prompt = """Review this conversation and extract discrete facts
about the user that would be useful to remember in future conversations.
For each fact, output a JSON object with:
- "content": the fact in a clear, standalone sentence
- "category": one of "preference", "project", "decision", "personal", "technical"
- "entities": list of key entities mentioned
Only extract facts that would be useful across sessions. Skip greetings,
acknowledgments, and conversational filler.
Output a JSON array of extracted facts."""
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2000,
        messages=[
            {"role": "user", "content": extraction_prompt + "\n\n" +
                format_conversation(conversation_messages)}
        ]
    )
    return parse_extracted_facts(response.content[0].text)

Store extracted memories in a system that supports semantic search with user-scoped filtering. Each memory should be stored with: the memory text, the user ID it belongs to, an embedding vector (generated from the memory text), extracted entities, the category label, a timestamp, and a confidence score (initially 1.0, adjusted over time as the memory is corroborated or contradicted). If you are using Adaptive Recall, the store tool handles all of this automatically, generating embeddings, extracting entities, building knowledge graph connections, and initializing cognitive scores. If you are building from scratch, you need to generate embeddings (using an embedding model like text-embedding-3-small), store them in a vector database with metadata filtering, and build your own entity extraction pipeline.
# Using Adaptive Recall's MCP tools
async def store_memory(memory_text, user_id):
    result = await mcp_client.call_tool("adaptive-recall", "store", {
        "content": memory_text,
        "metadata": {"user_id": user_id}
    })
    return result

# Or using direct vector storage
from datetime import datetime

async def store_memory_manual(memory_text, user_id, category, entities):
    embedding = await get_embedding(memory_text)
    await vector_db.upsert({
        "id": generate_id(),
        "values": embedding,
        "metadata": {
            "text": memory_text,
            "user_id": user_id,
            "category": category,
            "entities": entities,
            "created_at": datetime.now().isoformat(),
            "confidence": 1.0
        }
    })

When a user sends a message, query the memory store for relevant memories before calling the LLM. The recall query should use the current message as the semantic search input, filtered to the current user's memories only. Retrieve 5 to 15 memories (depending on your context budget), format them as a "Known context about this user" section, and insert them into the system prompt or as a separate context block. Position the recalled memories after the system prompt instructions but before the conversation history, so the model treats them as established context rather than conversation content. Include the confidence score or a recency indicator so the model can weight uncertain or outdated memories appropriately.
async def assemble_context(user_message, user_id, system_prompt, history):
    memories = await recall_memories(user_message, user_id, limit=10)
    memory_context = "## Known context about this user\n"
    for mem in memories:
        memory_context += f"- {mem['text']}\n"
    messages = [
        {"role": "user", "content": system_prompt + "\n\n" + memory_context},
    ]
    messages.extend(history[-10:])  # last 10 messages (5 turns)
    messages.append({"role": "user", "content": user_message})
    return messages

Memories are not static. Users change jobs, switch projects, update preferences, and correct previous statements. Your chatbot needs to handle memory updates gracefully. When the extraction layer detects information that contradicts an existing memory (the user now says they use Python, but a stored memory says JavaScript), it should update the existing memory rather than creating a duplicate. When a user explicitly corrects the chatbot ("No, my name is Sarah, not Sara"), the system should update the memory immediately and acknowledge the correction. Implement a forget mechanism for when users request deletion of their data, and a decay mechanism that gradually reduces the confidence of memories that have not been accessed or corroborated recently. Adaptive Recall handles all of these operations through its update, forget, and consolidation tools with built-in contradiction detection.
Testing memory-backed chatbots requires multi-session test scenarios that verify the system remembers correctly across time gaps. Create test scripts that simulate: a user establishing preferences in session 1, then returning in session 2 and verifying the chatbot uses those preferences without being asked. A user correcting information in session 1, then returning in session 2 and verifying the correction persisted. A user discussing topic A in session 1, topic B in session 2, then asking about topic A in session 3 and verifying the chatbot recalls session 1 context correctly. A user who has not interacted in 30 days returning and verifying the chatbot still has relevant memories but does not over-rely on potentially outdated information. Run these tests with real LLM calls rather than mocked responses, because the model's behavior with recalled memories is difficult to predict without actual generation.
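The shape of a multi-session test can be shown with a deterministic stand-in bot. The `FakeMemoryBot` below exists only to make the harness runnable on its own; in a real suite you would call your production chatbot with live LLM calls, as noted above, and assert on its actual replies.

```python
class FakeMemoryBot:
    """Deterministic stand-in for illustrating multi-session tests.
    Long-term memory survives between chat() calls; conversation
    history is passed in per session and deliberately discarded."""
    def __init__(self):
        self.memories = {}  # user_id -> list of remembered facts

    def chat(self, user_id, message, history):
        if "keep answers" in message.lower():
            self.memories.setdefault(user_id, []).append("prefers short answers")
            return "Noted."
        prefix = ("[short] " if "prefers short answers"
                  in self.memories.get(user_id, []) else "")
        return prefix + "A vector database stores embeddings."

def test_preference_persists_across_sessions():
    bot = FakeMemoryBot()
    # Session 1: user establishes a preference, session ends.
    bot.chat("u1", "Please keep answers short.", history=[])
    # Session 2: fresh, empty history. Only long-term memory can
    # carry the preference across the gap.
    reply = bot.chat("u1", "What is a vector database?", history=[])
    assert reply.startswith("[short]"), "preference did not survive the session gap"

test_preference_persists_across_sessions()
```

The other scenarios (corrections, topic recall after intervening sessions, 30-day gaps) follow the same pattern: establish state in one session, start the next with empty history, and assert the behavior rather than the stored data.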
Common Pitfalls
The most common mistake is storing too much. If every sentence in every conversation becomes a memory, the recall results become noisy and the chatbot starts surfacing irrelevant context that confuses the model. Be selective: only store information that would be genuinely useful in a future conversation. A good heuristic is to ask "would a human assistant bother writing this down?" If the answer is no, do not store it.
The second most common mistake is poor recall ranking. Retrieving the 10 most semantically similar memories often surfaces old, outdated, or tangentially related content. Cognitive scoring that incorporates recency, access frequency, and confidence produces dramatically better recall results than pure vector similarity. A memory that was stored yesterday and has been accessed three times should rank higher than a semantically similar memory from six months ago that has never been retrieved.
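One way to see the effect is a weighted score that blends similarity with recency, access frequency, and confidence. The weights, decay constant, and saturation point below are assumed tuning parameters for illustration, not values from any particular system.

```python
import math
from datetime import datetime, timedelta

def cognitive_score(memory, query_similarity, now=None,
                    w_sim=0.6, w_recency=0.2, w_freq=0.1, w_conf=0.1):
    """Illustrative ranking score. Pure vector search uses only
    query_similarity; this adds recency, frequency, and confidence."""
    now = now or datetime.now()
    age_days = (now - memory["created_at"]).days
    recency = math.exp(-age_days / 30)                  # ~monthly decay
    frequency = min(memory["access_count"] / 10, 1.0)   # saturates at 10 accesses
    return (w_sim * query_similarity + w_recency * recency
            + w_freq * frequency + w_conf * memory["confidence"])

now = datetime(2025, 6, 1)
fresh = {"created_at": now - timedelta(days=1), "access_count": 3, "confidence": 1.0}
stale = {"created_at": now - timedelta(days=180), "access_count": 0, "confidence": 1.0}
# At equal similarity, yesterday's thrice-accessed memory outranks the
# six-month-old never-retrieved one.
```

In practice these weights are worth tuning against logged recall queries: the right balance depends on how quickly facts in your domain go stale.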
The third mistake is not handling contradictions. Users update their preferences, change their minds, and correct previous statements. A memory system that only appends new memories without checking for contradictions will accumulate conflicting information that confuses the model. Build contradiction detection into your extraction pipeline, or use a memory system that handles this automatically.
Build a chatbot that remembers. Adaptive Recall handles extraction, storage, recall, and lifecycle management through seven MCP tools, so you can add persistent memory to any chatbot in minutes.
Get Started Free