How to Chain Multiple Tool Calls in Sequence
Before You Start
You need a working function calling implementation with the tool execution loop described in the function calling guide. The core loop already supports tool chaining because it continues processing until the model produces a final text response. What this guide adds is handling for parallel calls within a chain, progress tracking for long workflows, and recovery strategies when a step fails mid-chain.
Step-by-Step Implementation
The basic tool loop is the foundation of all tool chaining. Send the user's message to the model along with tool definitions. If the model responds with tool calls instead of text, execute them and send the results back. Repeat until the model produces a text response. Each iteration of this loop is one "link" in the chain, and the model decides how many links are needed based on the complexity of the request.
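Here is a minimal sketch of that loop, assuming the Anthropic Python SDK and a run_tool dispatcher like the one from the function calling guide; the model name is a placeholder and run_tool is assumed to return a string:

import anthropic

client = anthropic.Anthropic()

def run_chain(messages, tools):
    # Repeat until the model answers with text instead of tool calls.
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-20250514",  # placeholder; use your model
            max_tokens=1024,
            tools=tools,
            messages=messages,
        )
        if response.stop_reason != "tool_use":
            return response  # final text answer; the chain is complete

        # One link in the chain: record the model's tool calls, run them,
        # and feed the results back in a single user message.
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type == "tool_use":
                output = run_tool(block.name, block.input)  # your dispatcher
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })
        messages.append({"role": "user", "content": results})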
The model naturally chains tools when the task requires it. Ask "What is the status of the most recent order for customer jane@example.com?" and the model will first call get_customer to find the customer ID, then call get_recent_orders with that customer ID to find the latest order, then call get_order_status with the order ID from the result. Each call depends on the previous result, so the model generates them one at a time, receiving each result before deciding on the next call.
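To make that concrete, here is one way the three tools might be declared. The names come from the example above, but the descriptions and schemas are illustrative assumptions:

tools = [
    {
        "name": "get_customer",
        "description": "Look up a customer record by email address.",
        "input_schema": {
            "type": "object",
            "properties": {"email": {"type": "string"}},
            "required": ["email"],
        },
    },
    {
        "name": "get_recent_orders",
        "description": "List a customer's orders, most recent first.",
        "input_schema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
    {
        "name": "get_order_status",
        "description": "Get the current status of an order.",
        "input_schema": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    },
]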
When the model determines that multiple independent tool calls are needed, it emits them all in a single response. Your execution layer should detect multiple tool_use blocks, execute them concurrently, and return all results in a single tool_result message. This reduces total latency from the sum of all calls to the duration of the slowest call.
import asyncio
import json

async def execute_tools_parallel(tool_calls):
    # Run a batch of independent tool calls concurrently and return
    # their results as tool_result blocks.
    async def run_one(call):
        try:
            func = tool_registry[call.name]  # maps tool names to Python functions
            # Run the (synchronous) tool in a worker thread so calls overlap.
            result = await asyncio.to_thread(func, **call.input)
            return {"tool_use_id": call.id, "content": json.dumps(result)}
        except Exception as e:
            # Report the failure to the model instead of crashing the chain.
            return {"tool_use_id": call.id, "content": json.dumps({"error": str(e)})}

    tasks = [run_one(call) for call in tool_calls]
    results = await asyncio.gather(*tasks)
    return [{"type": "tool_result", **r} for r in results]

A common real-world example: the user asks "Compare my last three orders." The model calls get_order_details three times in parallel, once for each order ID. Executed sequentially, the three lookups run back to back; executed in parallel, they complete together in the time of the slowest call. For tools that hit external APIs with 200 to 500 millisecond response times, this optimization saves seconds per interaction.
Sequential dependencies are handled naturally by the tool loop: the model sees the result of call N before generating call N+1. You do not need to build a dependency graph or plan the execution order. The model's reasoning handles this because it generates each tool call based on all the information it has seen so far, including previous tool results.
What you do need to ensure is that the full conversation history, including all previous tool calls and results, is passed to the model in every API call. If you truncate the history to save tokens and accidentally remove a tool result that a later call depends on, the model loses context and either re-calls the tool (wasting time and money) or hallucinates the missing data. For long chains, this means the message list grows with each step, so monitor token usage and consider summarizing early steps if the context window approaches its limit.
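As a sketch of one possible compaction policy, where the token heuristic, the budget, and the summarize_step helper are all assumptions for illustration:

import json

def approx_tokens(messages):
    # Rough heuristic: about four characters per token, enough for a budget check.
    return len(json.dumps(messages, default=str)) // 4

def maybe_compact_history(messages, budget=150_000, keep_recent=6):
    # If the history is approaching the context limit, replace the oldest
    # tool rounds with a one-message summary and keep recent rounds intact.
    # Take care not to summarize away a tool result a later call depends on.
    if approx_tokens(messages) <= budget or len(messages) <= keep_recent + 1:
        return messages
    old, recent = messages[1:-keep_recent], messages[-keep_recent:]
    summary = {
        "role": "user",
        "content": "[Summary of earlier steps: "
                   + " | ".join(summarize_step(m) for m in old)  # hypothetical summarizer
                   + "]",
    }
    return [messages[0], summary] + recent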
When a tool chain takes more than a few seconds, the user sees a blank or spinning interface and wonders what is happening. Stream status updates during execution so the user knows the agent is working and can see what progress has been made. Track each completed step and surface it to the user through streaming text or a progress indicator.
class ChainTracker:
    # Records each tool call in a chain so progress can be shown to the
    # user and included in error context.

    def __init__(self):
        self.steps = []
        self.current_step = 0

    def record_step(self, tool_name, status, summary):
        # status is "success" or "error"; summary is a short human-readable note.
        self.steps.append({
            "step": self.current_step,
            "tool": tool_name,
            "status": status,
            "summary": summary,
        })
        self.current_step += 1

    def get_progress_summary(self):
        completed = [s for s in self.steps if s["status"] == "success"]
        failed = [s for s in self.steps if s["status"] == "error"]
        return {
            "completed": len(completed),
            "failed": len(failed),
            "steps": self.steps,
        }

For user-facing progress, stream brief updates between tool calls: "Looking up your account... Found it. Checking recent orders... Found 3 orders. Getting details for the most recent..." This running commentary keeps the user engaged and informed, especially for chains that take 5 to 10 seconds or more.
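One way to wire the tracker into the tool loop, sketched with hypothetical run_tool and emit_progress hooks standing in for your dispatcher and UI streaming layer:

tracker = ChainTracker()

# Inside the tool loop, after each tool call finishes:
for block in response.content:
    if block.type == "tool_use":
        try:
            output = run_tool(block.name, block.input)
            tracker.record_step(block.name, "success", str(output)[:100])
            emit_progress(f"Finished {block.name} ({tracker.current_step} steps done)")
        except Exception as e:
            tracker.record_step(block.name, "error", str(e))
            emit_progress(f"{block.name} failed: {e}")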
When a tool fails partway through a chain, the model needs to know what succeeded before the failure so it can communicate partial progress to the user and suggest next steps. Include the chain progress in the error context so the model has the full picture.
Consider a three-step chain: look up customer (success), check their order (success), process a refund (failure, the refund service is down). The error message sent to the model should be: "Refund processing failed: the refund service is currently unavailable (HTTP 503). Previous steps completed successfully: found customer Jane Doe (ID: 12345), found order #A6789 (shipped, $47.99)." This gives the model everything it needs to tell the user: "I found your order #A6789 for $47.99, but the refund system is temporarily unavailable. I can try again in a few minutes, or you can call our support line at 1-800-EXAMPLE to process the refund immediately."
Never silently swallow mid-chain errors. A common mistake is catching the exception at the execution layer and returning a generic "error occurred" message that loses the context of what was accomplished before the failure. Preserve and communicate the full chain state, both successes and failures, so the model can give the user an accurate and actionable response.
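A minimal sketch of that idea, reusing the ChainTracker from above to attach completed steps to the failing tool_result (the is_error flag marks the result as a failure for the model):

import json

def build_error_result(tool_use_id, error, tracker):
    # Attach the chain's completed steps to the failure so the model can
    # report partial progress instead of a bare "error occurred".
    completed = [
        f"{s['tool']}: {s['summary']}"
        for s in tracker.steps if s["status"] == "success"
    ]
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "is_error": True,
        "content": json.dumps({
            "error": str(error),
            "completed_steps": completed,
        }),
    }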
Chain Length Limits
Set a maximum chain length (typically 10 to 25 iterations) to prevent runaway loops. If the model repeatedly calls tools without converging on a final answer, a hard limit stops the loop and prompts the model to summarize what it has learned so far and explain what it could not complete. Without this limit, a confused model can loop indefinitely, burning tokens and time while the user waits.
The right limit depends on your use case. Simple assistants with focused tool sets rarely need more than 5 iterations. Complex agents that orchestrate multi-system workflows may legitimately need 15 to 20 iterations for their most involved tasks. Set the limit based on the longest legitimate chain you observe in testing, then add a small buffer.
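A bounded version of the loop might look like the following sketch, where call_model and handle_tool_calls stand in for the model-turn and tool-execution pieces shown earlier:

MAX_CHAIN_STEPS = 15  # a step or two past the longest legitimate chain seen in testing

def run_chain_bounded(messages, tools):
    for _ in range(MAX_CHAIN_STEPS):
        response = call_model(messages, tools)  # one model turn
        if response.stop_reason != "tool_use":
            return response  # converged on a final text answer
        messages = handle_tool_calls(messages, response)  # execute and append results

    # Limit reached: force a wrap-up instead of looping indefinitely.
    messages.append({
        "role": "user",
        "content": "Stop calling tools. Summarize what you have learned so far "
                   "and explain what you could not complete.",
    })
    return call_model(messages, tools=[])  # no tools offered, so the model must answer in text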
Chain tools with memory that persists across sessions. Adaptive Recall stores tool outcomes so your agent remembers past chains and avoids repeating failed approaches.
Get Started Free