
Function Calling: OpenAI vs Anthropic vs Google

All three major model providers support function calling, but their implementations differ in important ways. Anthropic's Claude uses a content-block model where tool calls appear as structured blocks alongside text. OpenAI's GPT-4 uses a separate tool_calls array in the response. Google's Gemini uses function declarations with a slightly different schema format. These differences affect how you structure your code, handle responses, and support multi-provider deployments.

Schema Definition Differences

The biggest practical difference between providers is how you define tool schemas. All three use JSON Schema under the hood, but the wrapping format and field names differ.

Anthropic (Claude) uses a tools array at the top level of the API request. Each tool has name, description, and input_schema fields. The input_schema is standard JSON Schema. Claude processes tool definitions as part of the model's context, which means tool definitions consume tokens from the context window. This is important for agents with many tools because the total token cost of definitions can be significant.

# Anthropic Claude tool definition
{
  "name": "get_weather",
  "description": "Returns current weather for a city.",
  "input_schema": {
    "type": "object",
    "properties": {
      "city": {"type": "string", "description": "City name"},
      "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
    },
    "required": ["city"]
  }
}

OpenAI (GPT-4) uses a tools array where each entry has a type field (always "function") and a function object containing name, description, and parameters. The parameters field is JSON Schema, same as Claude's input_schema but with a different field name. OpenAI also supports strict mode, which forces the model to produce outputs that exactly conform to the schema, eliminating a class of type and format errors at the cost of slightly more constrained model behavior.

# OpenAI GPT-4 tool definition
{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Returns current weather for a city.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {"type": "string", "description": "City name"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
      },
      "required": ["city"]
    },
    "strict": true
  }
}

Google (Gemini) uses function_declarations within a tools array. Each declaration has name, description, and parameters. Google uses its own OpenAPI-based schema format rather than pure JSON Schema, which means some JSON Schema features (like complex $ref patterns, oneOf, or allOf) are not supported. For straightforward schemas, the difference is minimal, but for complex parameter structures, you may need to simplify your schemas for Gemini compatibility.
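For comparison with the examples above, a Gemini declaration for the same get_weather tool might look like the following. This is a sketch of the documented shape; note that field casing varies between the REST API (camelCase) and the Python SDK (snake_case, shown here).

```json
{
  "function_declarations": [
    {
      "name": "get_weather",
      "description": "Returns current weather for a city.",
      "parameters": {
        "type": "object",
        "properties": {
          "city": {"type": "string", "description": "City name"},
          "units": {"type": "string", "enum": ["celsius", "fahrenheit"]}
        },
        "required": ["city"]
      }
    }
  ]
}
```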

Response Format Differences

Response handling is where switching providers requires the most code changes, because each API returns tool calls in a structurally different place.

Claude returns tool calls as content blocks within the response's content array. A single response can contain both text blocks and tool_use blocks, intermixed in any order. The stop_reason field is "tool_use" when the model wants you to execute tools before continuing. Each tool_use block has an id, name, and input. You return results using the same id in a tool_result content block.
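A minimal Python sketch of that loop, operating on a hand-written response dict shaped like Anthropic's documented format (the execute_tool function is a hypothetical stand-in for your own tool logic):

```python
import json

# Hand-written stand-in for a Claude API response.
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "text", "text": "Let me check the weather."},
        {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
         "input": {"city": "Paris", "units": "celsius"}},
    ],
}

def execute_tool(name, args):
    # Hypothetical executor; replace with real tool implementations.
    return {"temp_c": 18}

tool_results = []
if response["stop_reason"] == "tool_use":
    for block in response["content"]:
        if block["type"] == "tool_use":
            result = execute_tool(block["name"], block["input"])
            tool_results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],  # must echo the tool_use id
                "content": json.dumps(result),
            })

# All results go back to Claude in a single user message.
followup = {"role": "user", "content": tool_results}
```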

GPT-4 returns tool calls in a separate tool_calls array on the assistant message, alongside (not mixed with) the text content. The finish_reason is "tool_calls" when the model wants tool execution. Each call has an id, function.name, and function.arguments (as a JSON string that you must parse). Results are returned as separate messages with role "tool" and the matching tool_call_id.
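The equivalent sketch for a GPT-4-style assistant message, again with a hypothetical executor, shows the two spots that trip people up: arguments arrive as a JSON string, and each result is its own role-"tool" message:

```python
import json

# Hand-written stand-in for a GPT-4 assistant message.
message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_abc", "type": "function",
         "function": {"name": "get_weather",
                      "arguments": '{"city": "Paris", "units": "celsius"}'}},
    ],
}

def execute_tool(name, args):
    # Hypothetical executor; replace with real tool implementations.
    return {"temp_c": 18}

tool_messages = []
for call in message.get("tool_calls") or []:
    # arguments is a JSON string, not a parsed object.
    args = json.loads(call["function"]["arguments"])
    result = execute_tool(call["function"]["name"], args)
    # Each result is a separate message with role "tool".
    tool_messages.append({
        "role": "tool",
        "tool_call_id": call["id"],
        "content": json.dumps(result),
    })
```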

Gemini returns tool calls as function_call parts within the response's parts array. The structure nests the call inside a part object, with function_call.name and function_call.args. Results are returned as function_response parts. The nesting is slightly deeper than Claude's or GPT-4's approaches, requiring more traversal in your parsing code.
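A sketch of that deeper traversal, using a hand-written dict in the Python SDK's snake_case shape (the executor is hypothetical):

```python
# Hand-written stand-in for a Gemini response.
response = {
    "candidates": [{
        "content": {
            "role": "model",
            "parts": [
                {"function_call": {"name": "get_weather",
                                   "args": {"city": "Paris"}}},
            ],
        },
    }],
}

def execute_tool(name, args):
    # Hypothetical executor; replace with real tool implementations.
    return {"temp_c": 18}

response_parts = []
for part in response["candidates"][0]["content"]["parts"]:
    call = part.get("function_call")
    if call:  # note the extra nesting level versus Claude or GPT-4
        result = execute_tool(call["name"], call["args"])
        response_parts.append({
            "function_response": {"name": call["name"],
                                  "response": {"result": result}},
        })
```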

Parallel Tool Use

All three providers support generating multiple tool calls in a single response, but they handle it differently. Claude emits multiple tool_use content blocks in a single response, and you return multiple tool_result blocks in a single user message. GPT-4 emits multiple entries in the tool_calls array, and you return each result as a separate tool-role message. Gemini emits multiple function_call parts and expects multiple function_response parts.

The practical impact is on your execution layer. With Claude, you iterate over content blocks, collect all tool_use blocks, execute them (ideally in parallel), and return all results in a single message. With GPT-4, the same pattern applies but with different data structures. The logic is the same, but the code is different enough that you need provider-specific handling.
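A sketch of that execution layer for Claude-style blocks, running independent calls concurrently with a thread pool and packing every result into one user message (the tool blocks and executor are hypothetical):

```python
import json
from concurrent.futures import ThreadPoolExecutor

def execute_tool(name, args):
    # Hypothetical dispatch stand-in for real tool implementations.
    return {"tool": name, "ok": True}

# Two tool_use blocks as they might appear in one Claude response.
tool_blocks = [
    {"type": "tool_use", "id": "toolu_01", "name": "get_weather",
     "input": {"city": "Paris"}},
    {"type": "tool_use", "id": "toolu_02", "name": "get_time",
     "input": {"city": "Paris"}},
]

# Execute independent calls in parallel.
with ThreadPoolExecutor() as pool:
    results = list(pool.map(
        lambda b: execute_tool(b["name"], b["input"]), tool_blocks))

# All results return in a single user message, id-matched to each call.
followup = {
    "role": "user",
    "content": [
        {"type": "tool_result", "tool_use_id": b["id"],
         "content": json.dumps(r)}
        for b, r in zip(tool_blocks, results)
    ],
}
```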

Streaming Behavior

Streaming tool calls is where provider differences cause the most implementation complexity. With streaming, the model's response arrives as a sequence of events rather than a complete response, and tool calls are built up incrementally.

Claude streams tool_use blocks as content_block_start, content_block_delta (containing partial JSON), and content_block_stop events. You need to accumulate the JSON deltas and parse the complete call when the block stops. Text blocks and tool blocks can be interleaved in the stream.
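A sketch of that accumulation over a hand-written event list shaped like Anthropic's documented streaming events (a real stream would be iterated from the SDK instead):

```python
import json

# Hand-written stand-in for a Claude event stream.
events = [
    {"type": "content_block_start", "index": 1,
     "content_block": {"type": "tool_use", "id": "toolu_01",
                       "name": "get_weather", "input": {}}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": '{"city": "Pa'}},
    {"type": "content_block_delta", "index": 1,
     "delta": {"type": "input_json_delta", "partial_json": 'ris"}'}},
    {"type": "content_block_stop", "index": 1},
]

pending = {}  # block index -> {"name": ..., "json": accumulated string}
calls = []
for event in events:
    if (event["type"] == "content_block_start"
            and event["content_block"]["type"] == "tool_use"):
        pending[event["index"]] = {"name": event["content_block"]["name"],
                                   "json": ""}
    elif event["type"] == "content_block_delta" and event["index"] in pending:
        pending[event["index"]]["json"] += event["delta"]["partial_json"]
    elif event["type"] == "content_block_stop" and event["index"] in pending:
        block = pending.pop(event["index"])
        # Only now is the JSON complete enough to parse.
        calls.append({"name": block["name"],
                      "input": json.loads(block["json"])})
```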

GPT-4 streams tool calls through delta objects that contain incremental argument strings. You accumulate the argument string until the stream completes, then parse the full JSON. GPT-4 sends a dedicated "tool_calls" key in the delta, making it straightforward to detect and accumulate.
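The same idea for GPT-4-style deltas, sketched over hand-written chunks (the index field ties fragments to the right call when several stream in parallel):

```python
import json

# Hand-written stand-in for streamed GPT-4 delta chunks.
chunks = [
    {"tool_calls": [{"index": 0, "id": "call_abc",
                     "function": {"name": "get_weather", "arguments": ""}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '{"city": '}}]},
    {"tool_calls": [{"index": 0, "function": {"arguments": '"Paris"}'}}]},
]

calls = {}  # call index -> accumulated call
for delta in chunks:
    for tc in delta.get("tool_calls", []):
        slot = calls.setdefault(tc["index"],
                                {"id": None, "name": None, "arguments": ""})
        if tc.get("id"):
            slot["id"] = tc["id"]          # id arrives once, in the first chunk
        fn = tc.get("function", {})
        if fn.get("name"):
            slot["name"] = fn["name"]      # so does the function name
        slot["arguments"] += fn.get("arguments", "")

# Parse only after the stream completes.
parsed = {i: json.loads(c["arguments"]) for i, c in calls.items()}
```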

Gemini's streaming for function calls is simpler because Gemini typically sends the entire function_call part in a single chunk rather than streaming it incrementally. This makes parsing easier but means you do not get partial tool call information while the model is still generating.

Forcing Tool Use

Sometimes you want to require the model to call a specific tool rather than letting it decide. All three providers support this but with different mechanisms. Claude supports tool_choice with values "auto" (model decides), "any" (must use a tool, model picks which one), or a specific tool name (must use this exact tool). GPT-4 uses the same tool_choice parameter with similar options: "auto", "required", or a specific function specification. Gemini uses tool_config.function_calling_config.mode with values "AUTO", "ANY", or "NONE", plus an allowed_function_names list to restrict which tools can be called.
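Side by side, the request parameters for forcing the get_weather tool might look like this (a sketch of the documented parameter shapes, shown as plain dicts):

```python
# Sketch: forcing a specific tool on each provider.

# Anthropic: tool_choice naming one tool.
anthropic_choice = {"type": "tool", "name": "get_weather"}

# OpenAI: tool_choice with a function specification.
openai_choice = {"type": "function", "function": {"name": "get_weather"}}

# Gemini: tool_config restricting the callable set.
gemini_config = {
    "function_calling_config": {
        "mode": "ANY",
        "allowed_function_names": ["get_weather"],
    },
}
```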

Practical Recommendations

If you are building for a single provider, use that provider's native format and take advantage of provider-specific features like OpenAI's strict mode or Claude's content block flexibility. If you need to support multiple providers, build an abstraction layer that translates a common tool definition format into each provider's specific format. The schema differences are mechanical and can be handled with simple mapping functions. The response format differences are the harder part because the data structures are meaningfully different.
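A sketch of those mapping functions, starting from a common format that is an assumption of this example rather than any standard:

```python
# A hypothetical provider-neutral tool definition.
COMMON = {
    "name": "get_weather",
    "description": "Returns current weather for a city.",
    "schema": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def to_anthropic(tool):
    # Claude: flat object with input_schema.
    return {"name": tool["name"], "description": tool["description"],
            "input_schema": tool["schema"]}

def to_openai(tool, strict=False):
    # GPT-4: wrapped in a function object with parameters.
    fn = {"name": tool["name"], "description": tool["description"],
          "parameters": tool["schema"]}
    if strict:
        fn["strict"] = True
    return {"type": "function", "function": fn}

def to_gemini(tool):
    # Gemini: a function declaration; complex JSON Schema features
    # (oneOf, $ref) may need simplification at this step.
    return {"name": tool["name"], "description": tool["description"],
            "parameters": tool["schema"]}
```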

For tool-heavy agents where function calling reliability is critical, Claude and GPT-4 both achieve above 95% first-attempt accuracy on well-designed schemas. The choice between them often comes down to other factors: Claude's longer context windows for agents that need extensive tool definitions, GPT-4's strict mode for schemas where argument format compliance is critical, or Gemini's integration with Google Cloud services for agents in that ecosystem.

Regardless of provider, the quality of your tool schemas matters more than the choice of model. A poorly designed schema produces bad tool calls on any model. A well-designed schema produces reliable tool calls across providers. Invest in schema design before optimizing for provider-specific features.

Build tool-using agents that work with any provider. Adaptive Recall integrates through MCP and REST, providing persistent memory for tool outcomes regardless of which model or API you use.
