What Is Function Calling and How LLMs Use Tools
The Core Problem Function Calling Solves
Language models are trained on text and produce text. They cannot access live databases, check real-time inventory, look up a customer's account, send an email, or interact with any external system; everything they know is frozen in their training data. Before function calling, the only way to give a model access to external data was to stuff everything into the prompt, a brute-force approach that is expensive, limited by the context window, and unworkable for dynamic data that changes between requests.
Function calling solves this by giving the model a structured mechanism to request specific data or actions from external systems. Instead of trying to memorize every customer's order status (impossible) or having every order loaded into the prompt (impractical), the model can call a get_order_status function with a specific order ID and receive the current status in real time. This pattern extends to any external capability: search, computation, communication, data manipulation, monitoring, or system administration.
How the Mechanism Works
Function calling involves four participants: the developer (who defines the tools), the model (which decides when and how to use them), the application (which executes the calls), and the external system (which provides the actual data or performs the action).
The developer creates tool definitions that describe each available function. A tool definition includes a name (search_products), a description ("Searches the product catalog by keyword, category, or price range"), and a parameter schema (a JSON Schema object describing the function's inputs). These definitions are passed to the model as part of the API request, alongside the system prompt and user messages. The model reads the tool definitions and uses them to decide when to call tools and how to construct the arguments.
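For concreteness, here is what such a definition might look like. This sketch uses the Anthropic Messages API's tool format (name, description, input_schema); other providers use the same three ingredients under slightly different field names, and the specific parameters shown are illustrative.

```python
search_products_tool = {
    "name": "search_products",
    "description": (
        "Searches the product catalog by keyword, category, or price range. "
        "Use this when the user asks about product availability, pricing, "
        "or recommendations from the catalog."
    ),
    "input_schema": {  # a standard JSON Schema object
        "type": "object",
        "properties": {
            "query": {
                "type": "string",
                "description": "Keywords to match against product names and descriptions",
            },
            "category": {
                "type": "string",
                "description": "Optional category filter, e.g. 'electronics'",
            },
            "max_price": {
                "type": "number",
                "description": "Optional upper bound on price, in USD",
            },
        },
        "required": ["query"],
    },
}
```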
When the user asks a question that requires external data, the model recognizes the need and generates a tool call instead of (or alongside) text. The tool call is a structured output containing the tool name and a JSON object of arguments. This is not the model "running" the function; it is the model requesting that the application run the function. The model has no ability to execute code or access external systems directly. It can only produce a structured description of what it wants done.
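What the model actually emits is a small structured object. In the Anthropic API it arrives as a tool_use content block shaped like the sketch below (the id value is illustrative); other APIs return an equivalent name-plus-arguments structure.

```python
# The structured tool call the application receives (illustrative values).
tool_call = {
    "type": "tool_use",              # content block type in the Anthropic API
    "id": "toolu_abc123",            # correlation id, echoed back with the result
    "name": "get_order_status",      # which function the model wants run
    "input": {"order_id": "A1234"},  # arguments, matching the input_schema
}
```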
The application receives the tool call, validates the arguments, executes the corresponding function, and captures the result. The result is then sent back to the model as a tool result message, a new message in the conversation that contains the function's output. The model reads this result and generates its next response, which incorporates the data from the function. This response might be a final answer to the user, or it might be another tool call (starting another cycle of the loop) if the model needs additional information to complete its task.
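A minimal sketch of one full round trip, using the Anthropic Python SDK. The model id and the get_order_status implementation are placeholders, and error handling is omitted:

```python
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
MODEL = "claude-sonnet-4-20250514"  # substitute a current model id

def get_order_status(order_id: str) -> dict:
    """Placeholder for a real lookup against the order system."""
    return {"order_id": order_id, "status": "shipped", "eta": "2 days"}

tools = [{
    "name": "get_order_status",
    "description": "Looks up the current status of an order by its order ID.",
    "input_schema": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}]

messages = [{"role": "user", "content": "What is the status of my order #A1234?"}]

# 1. The model responds with a tool call instead of text.
response = client.messages.create(model=MODEL, max_tokens=1024,
                                  tools=tools, messages=messages)

if response.stop_reason == "tool_use":
    tool_use = next(b for b in response.content if b.type == "tool_use")

    # 2. The application validates and executes the corresponding function.
    result = get_order_status(**tool_use.input)

    # 3. The output goes back to the model as a tool result message...
    messages.append({"role": "assistant", "content": response.content})
    messages.append({"role": "user", "content": [{
        "type": "tool_result",
        "tool_use_id": tool_use.id,
        "content": json.dumps(result),
    }]})

    # 4. ...and the model incorporates it into its next response.
    final = client.messages.create(model=MODEL, max_tokens=1024,
                                   tools=tools, messages=messages)
    print(final.content[0].text)
```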
Before Native Function Calling
Before model providers built function calling into their APIs, developers used prompt engineering to achieve similar results. The approach typically involved instructing the model in the system prompt to "output a JSON block when you need to call a function," then parsing the model's text output to extract the JSON, running the function, and appending the result to the conversation. This worked, but it was fragile. The model might format the JSON incorrectly, include extra text around it, forget to use it when appropriate, or hallucinate function names that did not exist.
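The pattern looked roughly like this sketch, where complete() stands in for a hypothetical plain text-completion call. Every step of it could fail in exactly the ways described above:

```python
import json
import re

SYSTEM = (
    "When you need external data, output exactly one JSON block like:\n"
    '{"function": "get_order_status", "arguments": {"order_id": "..."}}'
)

def extract_call(model_text: str):
    """Pull the first JSON object out of free-form model output.
    Fragile: breaks on malformed JSON, surrounding prose, or nested braces."""
    match = re.search(r"\{.*\}", model_text, re.DOTALL)
    if not match:
        return None  # the model forgot the format, or answered in prose
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # the model mangled the JSON
```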
Native function calling eliminated these problems by making tool use a first-class feature of the model API. The model receives tool definitions in a dedicated format, generates tool calls as a distinct output type (not embedded in text), and the API provides clear signals about when a tool call was generated (through stop reason codes or content block types). This structural separation makes tool calls reliable enough for production use, with first-attempt accuracy rates above 95% for well-designed schemas.
How Models Decide When to Call Tools
The model's decision to call a tool is based on the same reasoning process it uses for all generation: it evaluates the user's message, the system instructions, the available tools, and the conversation history, and produces the output (text or tool call) that best satisfies the request. The model does not follow explicit "if user says X, call tool Y" rules. Instead, it reasons about whether a tool call would help produce a better response.
If the user asks "What is machine learning?" the model can answer from its training data without calling any tools. If the user asks "What is the status of my order #A1234?" the model recognizes that it does not have this information and that a tool (get_order_status) could provide it. If the user asks "Summarize the weather forecast for New York and suggest what to wear," the model might call a weather tool, receive the forecast data, and then use its reasoning abilities to generate outfit suggestions based on the conditions.
The model uses tool descriptions to make selection decisions. When multiple tools are available, the model reads each tool's description to determine which one is most relevant to the current request. Clear, detailed descriptions that explain when to use each tool produce better selection decisions than vague descriptions that leave the model guessing. This is why schema design is one of the highest-leverage activities in building tool-using agents.
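The difference is easy to see side by side. Both definitions below describe the same hypothetical function (schemas omitted for brevity); the second gives the model the selection criteria it needs:

```python
# Vague: forces the model to guess when this tool applies.
vague = {"name": "search", "description": "Searches stuff."}

# Clear: states what it searches, what it returns, and when (not) to use it.
clear = {
    "name": "search_products",
    "description": (
        "Searches the product catalog by keyword, category, or price range. "
        "Returns matching products with names, prices, and stock levels. "
        "Use this for questions about products we sell; do NOT use it for "
        "order status, shipping, or account questions."
    ),
}
```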
Single-Turn vs. Multi-Turn Tool Use
In single-turn tool use, the model calls one tool, receives the result, and generates a final response. This covers simple use cases like data lookups, calculations, and single-action operations. The user asks a question, the model calls one function, and the answer comes back. Most interactions fall into this category.
In multi-turn tool use, the model calls multiple tools across several iterations of the execution loop. The first call might retrieve data that informs the second call, which produces results that trigger a third call. This enables complex workflows: look up a customer, check their subscription tier, verify they are eligible for a discount, apply the discount, and confirm the change. The model chains these calls naturally, using each result to inform its next decision.
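Structurally, multi-turn tool use is just the single round trip from earlier wrapped in a loop that runs until the model stops asking for tools. A compact sketch, reusing the client, MODEL, and tools from above and assuming a hypothetical execute_tool dispatcher that routes each call to the right function:

```python
def run_agent_loop(messages: list, max_iterations: int = 10) -> str:
    """Keep executing tool calls until the model produces a final answer."""
    for _ in range(max_iterations):  # hard cap prevents runaway loops
        response = client.messages.create(model=MODEL, max_tokens=1024,
                                          tools=tools, messages=messages)
        if response.stop_reason != "tool_use":
            return response.content[0].text  # final answer, no more tools

        # Execute every tool call in this turn and return the results.
        messages.append({"role": "assistant", "content": response.content})
        results = [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(execute_tool(block.name, block.input)),
        } for block in response.content if block.type == "tool_use"]
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Agent did not finish within the iteration limit")
```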
Parallel tool use is a special case where the model generates multiple independent tool calls in a single response. When the model determines that it needs data from several sources and those lookups are independent of each other (checking weather in three different cities, fetching details for three different orders), it can request all of them at once. The application executes them in parallel and returns all results together, reducing total latency to the duration of the slowest call instead of the sum of all calls.
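Because the calls are independent, the application can fan them out with ordinary concurrency primitives. A sketch using Python's standard library, again assuming the hypothetical execute_tool dispatcher:

```python
import json
from concurrent.futures import ThreadPoolExecutor

def run_tool_calls_in_parallel(tool_blocks) -> list[dict]:
    """Execute independent tool calls concurrently; total latency is
    bounded by the slowest call, not the sum of all calls."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_blocks))) as pool:
        futures = [pool.submit(execute_tool, b.name, b.input)
                   for b in tool_blocks]
        # Results come back in request order, so each tool_result can be
        # matched to the tool_use_id that produced it.
        return [{
            "type": "tool_result",
            "tool_use_id": block.id,
            "content": json.dumps(future.result()),
        } for block, future in zip(tool_blocks, futures)]
```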
Function Calling and Memory
Function calling becomes significantly more powerful when combined with persistent memory. Without memory, every conversation starts from scratch: the model has no knowledge of previous tool calls, past results, or user-specific patterns. With memory, the model can recall what tools it has used before for this user, what results they produced, and what strategies worked or failed. This transforms tool use from stateless request-response into a learning system that improves with every interaction.
Adaptive Recall provides this memory layer through its seven-tool MCP interface. After each tool call, the agent stores an observation about the outcome. In future interactions, the recall tool retrieves relevant past outcomes, and the knowledge graph connects tool results to the entities they involve. The cognitive scoring ensures that recent, frequently accessed, and well-corroborated tool memories surface first, so the agent's tool knowledge stays current and relevant.
Give your tools a memory that learns. Adaptive Recall stores tool outcomes, builds entity connections, and improves retrieval quality through cognitive scoring.
Get Started Free