
How to Build a Tool Router for AI

A tool router selects which tools to present to the model for each user query, solving the problem of large tool sets that overwhelm the context window and confuse tool selection. Instead of passing all 50 tools to every API call, the router analyzes the user's message and selects the 5 to 10 most relevant tools, reducing token usage, improving selection accuracy, and enabling your agent to scale to hundreds of tools without degrading performance.

Before You Start

Tool routing becomes necessary when your agent has more than 10 to 15 tools. Below that threshold, passing all tools to the model works well because the total token cost is modest and the model can distinguish between them reliably. If your agent has fewer than 10 tools and selection accuracy is above 90%, routing adds complexity without meaningful benefit. Read the tool selection problem article to understand when routing becomes necessary.

You need a working function calling implementation and a set of tool schemas. You also need logs of real user queries (or realistic test queries) to evaluate whether your routing layer selects the right tools. Without evaluation data, you cannot tell whether routing is helping or hurting.

Step-by-Step Implementation

Step 1: Categorize your tools by domain.
Group tools into logical categories that reflect the systems or task types they handle. An enterprise assistant might have categories like CRM (get_customer, update_customer, search_customers), orders (get_order, create_order, cancel_order), support (create_ticket, search_tickets, escalate), and analytics (run_report, get_metrics, export_data). Each category represents a coherent domain that a user query would typically address in a single interaction.

Good categories are mutually exclusive at the intent level: a user asking about an order is unlikely to need the analytics tools, and a user requesting a report is unlikely to need the ticket creation tools. Overlapping categories cause the router to select too many tools, defeating the purpose of routing. If two categories frequently co-occur in real usage, consider merging them.
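The category structure above can be captured as a simple mapping from category names to tool names. This is a minimal sketch using the example tools from Step 1; the names are illustrative, not a fixed API.

```python
# Hypothetical category map for the enterprise assistant described above.
TOOL_CATEGORIES = {
    "crm": ["get_customer", "update_customer", "search_customers"],
    "orders": ["get_order", "create_order", "cancel_order"],
    "support": ["create_ticket", "search_tickets", "escalate"],
    "analytics": ["run_report", "get_metrics", "export_data"],
}

def tools_for_categories(categories):
    """Expand a set of routed categories into the tool names to pass to the model."""
    selected = []
    for category in categories:
        selected.extend(TOOL_CATEGORIES.get(category, []))
    return selected
```

Whatever routing strategy you choose in the following steps, its output ultimately flows through a lookup like this to produce the tool list for the API call.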

Step 2: Implement keyword-based routing as a baseline.
The simplest routing strategy maps keywords in the user's message to tool categories. Build a keyword dictionary where each category has a list of trigger words and phrases: the orders category triggers on "order", "shipment", "delivery", "tracking"; the support category triggers on "ticket", "issue", "problem", "help", "complaint". When the user's message contains keywords matching a category, include all tools from that category.
CATEGORY_KEYWORDS = {
    "orders": ["order", "shipment", "delivery", "tracking", "shipping", "package"],
    "support": ["ticket", "issue", "problem", "help", "complaint", "bug", "error"],
    "crm": ["customer", "account", "profile", "subscription", "user", "contact"],
    "analytics": ["report", "metrics", "dashboard", "data", "chart", "stats"],
}

def keyword_route(message):
    message_lower = message.lower()
    matched = set()
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(kw in message_lower for kw in keywords):
            matched.add(category)
    if not matched:
        matched = set(CATEGORY_KEYWORDS.keys())  # fallback: include all categories
    return matched

Keyword routing is fast, deterministic, and easy to debug. Its weakness is that it misses semantic matches: "Why was I charged twice?" should route to the orders and billing tools, but the keywords "charged" and "twice" might not appear in the keyword dictionary. Use keyword routing as a baseline that handles the obvious cases, then layer smarter strategies on top for the ambiguous ones.

Step 3: Add embedding-based similarity matching.
Embed each tool's description into a vector and store them in a lightweight vector index. At query time, embed the user's message and find the top N most similar tool descriptions. This catches semantic matches that keyword routing misses: "Why was I charged twice?" embeds close to the description of a billing dispute tool even though there is no keyword overlap.
from openai import OpenAI
import numpy as np

client = OpenAI()

# Pre-compute tool embeddings (do once at startup)
tool_embeddings = {}
for tool in all_tools:
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=f"{tool['name']}: {tool['description']}",
    )
    tool_embeddings[tool['name']] = resp.data[0].embedding

def embedding_route(message, top_k=8):
    resp = client.embeddings.create(
        model="text-embedding-3-small",
        input=message,
    )
    query_vec = np.array(resp.data[0].embedding)
    scores = {}
    for name, emb in tool_embeddings.items():
        scores[name] = np.dot(query_vec, np.array(emb))
    sorted_tools = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [name for name, score in sorted_tools[:top_k]]
Step 4: Build a classifier for intent-based routing.
For the highest-accuracy routing, train or prompt a lightweight classifier that determines the user's intent and maps it to tool categories. You can use a small fine-tuned model, a few-shot prompted call to a fast model like Haiku, or a traditional ML classifier trained on labeled query data. The classifier approach handles the ambiguous cases that both keyword and embedding routing struggle with.

A prompted classifier using a fast, inexpensive model adds about 100 to 200 milliseconds of latency and costs a fraction of a cent per query. For agents handling complex, ambiguous queries across many tool categories, this small overhead pays for itself many times over through improved tool selection accuracy and reduced retry rates.

import anthropic

anthropic_client = anthropic.Anthropic()

def classifier_route(message):
    response = anthropic_client.messages.create(
        model="claude-haiku-4-5-20251001",
        max_tokens=100,
        system=(
            "Classify the user message into one or more tool categories. "
            "Return only the category names, comma-separated. "
            "Categories: orders, support, crm, analytics, billing, content."
        ),
        messages=[{"role": "user", "content": message}],
    )
    categories = [c.strip() for c in response.content[0].text.split(",")]
    return categories
Step 5: Combine strategies with a scoring layer.
Each routing strategy has different strengths: keywords catch explicit mentions, embeddings catch semantic similarity, classifiers catch intent. Combine them by scoring each tool across all strategies and selecting the tools with the highest combined scores. Weight the strategies based on your observed accuracy: if the classifier outperforms embeddings on your query distribution, give it higher weight.
def combined_route(message, top_k=8):
    keyword_cats = keyword_route(message)
    embedding_tools = embedding_route(message, top_k=15)
    classifier_cats = classifier_route(message)

    scores = {}
    for tool in all_tools:
        score = 0.0
        if tool['category'] in keyword_cats:
            score += 1.0
        if tool['name'] in embedding_tools:
            rank = embedding_tools.index(tool['name'])
            score += (15 - rank) / 15.0
        if tool['category'] in classifier_cats:
            score += 2.0  # classifier gets highest weight
        scores[tool['name']] = score

    sorted_tools = sorted(scores.items(), key=lambda x: x[1], reverse=True)
    return [name for name, score in sorted_tools[:top_k]]
Step 6: Add memory-based routing for returning users.
Users develop patterns: a support agent repeatedly uses the same three tools for most queries, a developer always starts by searching the knowledge base. Memory-based routing boosts tools that have been successfully used in recent interactions with the same user or for similar query types. Adaptive Recall's cognitive scoring naturally supports this: tool usage memories receive activation boosts from recency and frequency, so frequently used tools surface higher in recall results when the routing layer queries for relevant tool history.

To implement memory-based routing, store a brief observation after each successful tool interaction: "Used get_order_status to check shipping for customer X. Returned shipped with FedEx tracking." Before routing a new query, recall recent tool usage memories for the current user. Boost the scores of tools that appear in recent successful interactions. This creates a feedback loop where the router learns user-specific patterns and pre-selects the tools most likely to be needed.
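The recency and frequency boost described above can be sketched with a plain in-memory log. This is a minimal illustration, not Adaptive Recall's actual API: a real system would query a memory store instead of the `usage_log` dict, and the half-life and cap are assumed values to tune against your data.

```python
import time
from collections import defaultdict

# Hypothetical per-user log of successful tool calls: (user_id, tool) -> timestamps
usage_log = defaultdict(list)

def record_success(user_id, tool_name):
    """Store an observation after a successful tool interaction."""
    usage_log[(user_id, tool_name)].append(time.time())

def memory_boost(user_id, tool_name, half_life_s=7 * 24 * 3600):
    """Score boost that decays with recency and grows with frequency."""
    now = time.time()
    boost = 0.0
    for ts in usage_log[(user_id, tool_name)]:
        boost += 0.5 ** ((now - ts) / half_life_s)  # exponential recency decay
    return min(boost, 2.0)  # cap so memory never outweighs intent signals
```

Adding `memory_boost(user_id, tool['name'])` into the combined scoring loop from Step 5 gives recently successful tools a head start without letting habit override an explicit intent signal.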

Evaluating Your Router

Measure routing quality with three metrics. Coverage: what percentage of queries result in the correct tool being included in the routed set? If coverage is below 95%, the router is filtering out tools the model needs. Precision: what percentage of routed tools are actually used by the model? Low precision means the router is including too many irrelevant tools. Latency: how much time does the routing step add to total response time? Routing should add less than 300 milliseconds; if it adds more, simplify the strategy.
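Coverage and precision can be computed offline from logged queries. A minimal harness, assuming each labeled example records which tools the model needed and which routed tools it actually used (the field names here are illustrative):

```python
def evaluate_router(route_fn, examples):
    """examples: dicts with 'query' (str), 'needed' (set), 'used' (set)."""
    covered = 0
    routed_total = 0
    used_total = 0
    for ex in examples:
        routed = set(route_fn(ex["query"]))
        if ex["needed"] <= routed:   # every needed tool made it into the routed set
            covered += 1
        routed_total += len(routed)
        used_total += len(routed & ex["used"])
    coverage = covered / len(examples)
    precision = used_total / routed_total if routed_total else 0.0
    return coverage, precision
```

Run this harness against each strategy (keyword, embedding, classifier, combined) on the same query set; the comparison tells you which strategy deserves the highest weight in Step 5.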

Build a tool router that learns from usage. Adaptive Recall tracks tool outcomes and usage patterns through cognitive scoring, enabling memory-powered routing that improves with every interaction.
