How to Deploy a Remote MCP Server in Production
Before You Start
You need a working MCP server that runs locally with stdio transport. This guide covers converting it to a production HTTP service. You also need a hosting environment (any platform that runs Docker containers: AWS ECS, Google Cloud Run, Fly.io, Railway, or a plain VM with Docker) and a domain name or static IP for the server URL.
Step-by-Step Deployment
Replace the stdio transport with Streamable HTTP. The tool, resource, and prompt definitions stay identical. Only the server startup code changes. For Python servers using FastMCP, this is a single parameter change. For TypeScript servers, swap the transport class.
Python (FastMCP):
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("my-tools", host="0.0.0.0", port=8080)  # host/port are FastMCP settings, set on the constructor
# ... all tool definitions unchanged ...
if __name__ == "__main__":
    mcp.run(transport="streamable-http")
TypeScript:
import express from "express";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless mode: a fresh transport per request, no session tracking.
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });
  res.on("close", () => transport.close());
  await server.connect(transport); // server is your existing McpServer instance
  await transport.handleRequest(req, res, req.body);
});

app.listen(8080);
Every production MCP server needs authentication. The simplest approach is API key validation: check the Authorization header on each request and reject requests without a valid key. For multi-user or enterprise deployments, implement OAuth 2.1 instead.
import os

# Ignore empty entries so an unset API_KEYS can never validate a blank token.
VALID_KEYS = {k for k in os.environ.get("API_KEYS", "").split(",") if k}

def validate_request(headers):
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    token = auth[7:]
    return token in VALID_KEYS
Integrate validation into your HTTP handler so every MCP request is checked before reaching your tool logic. Return a 401 response for invalid or missing credentials.
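One way to wire this in, if you serve the FastMCP app through its ASGI interface rather than mcp.run(), is Starlette middleware. This is a minimal sketch, assuming the validate_request helper above and the Python SDK's streamable_http_app() method; adapt it to whatever framework wraps your server:
import uvicorn
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Let unauthenticated health checks through; guard everything else.
        if request.url.path == "/health":
            return await call_next(request)
        if not validate_request(request.headers):
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        return await call_next(request)

app = mcp.streamable_http_app()  # Starlette app built from the FastMCP instance
app.add_middleware(AuthMiddleware)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)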
Create a Dockerfile that installs dependencies, copies your server code, and sets the entrypoint. Use a minimal base image to reduce attack surface and image size.
Python example:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "server.py"]TypeScript example:
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist/ ./dist/
EXPOSE 8080
CMD ["node", "dist/index.js"]Add a health endpoint that your orchestrator or load balancer can poll. This is a simple HTTP GET endpoint that returns 200 when the server is ready to handle requests. Separate it from the MCP endpoint so health checks do not interfere with protocol handling.
# Python with FastAPI/Flask alongside MCP
@app.get("/health")
def health():
    return {"status": "ok"}
Configure your container orchestrator to check this endpoint. For Docker, add a HEALTHCHECK instruction, as shown below. For Kubernetes, use a livenessProbe. For cloud platforms like Cloud Run or Fly.io, set the health check path in the deployment configuration.
Build and push the Docker image, deploy it to your hosting platform, and note the public URL. Update your MCP client configurations to use the remote URL with authentication headers.
docker build -t registry.example.com/my-mcp-server:latest .
docker push registry.example.com/my-mcp-server:latest
Client configuration for the remote server:
{
  "mcpServers": {
    "my-tools": {
      "type": "url",
      "url": "https://mcp.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
Instrument your server with structured logging and metrics. Track request latency, error rates, tool invocation counts, and authentication failures. Use your existing monitoring stack (Prometheus, Datadog, CloudWatch, or any platform that accepts HTTP metrics) because MCP servers are standard HTTP services.
import functools
import logging
import time

logger = logging.getLogger("mcp-server")

def instrument_tool(tool_name, handler):
    # functools.wraps preserves the handler's signature, which FastMCP
    # inspects when it generates the tool's input schema.
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = await handler(*args, **kwargs)
            duration = time.time() - start
            logger.info(f"tool={tool_name} duration={duration:.3f}s status=ok")
            return result
        except Exception as e:
            duration = time.time() - start
            logger.error(f"tool={tool_name} duration={duration:.3f}s error={e}")
            raise
    return wrapper
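To use it, wrap a handler before registering it. An illustrative sketch (search_docs is a hypothetical tool name):
async def search_docs(query: str) -> str:
    """Hypothetical tool: search the documentation index."""
    ...

# Register the instrumented version instead of the raw handler.
mcp.tool(name="search_docs")(instrument_tool("search_docs", search_docs))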
Scaling Considerations
MCP requests are stateless at the protocol level (each request is independent), which means you can run multiple server instances behind a load balancer. Horizontal scaling works the same as for any HTTP service: add more instances to handle more concurrent requests.
If your tools have state (a database connection, a cache, or in-memory data), make sure that state is shared across instances or externalized to a database. The server process should be disposable, meaning any instance can handle any request without relying on in-process state from previous requests.
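For example, a tool that caches results in a process-local dict will behave inconsistently behind a load balancer, while a shared store works from any instance. A minimal sketch, assuming redis-py and an illustrative REDIS_URL (the remember tool is hypothetical):
import os
import redis.asyncio as redis

# One store shared by every instance; the connection string is illustrative.
store = redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

@mcp.tool()
async def remember(key: str, value: str) -> str:
    """Store a value in shared state rather than process memory."""
    await store.set(key, value)
    return "stored"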
For memory servers specifically, the storage backend (database, vector store) handles persistence. The MCP server is a thin translation layer between the MCP protocol and your storage backend. This separation means you scale the servers and the storage independently based on their respective bottlenecks.
TLS and Network Security
Always serve production MCP endpoints over HTTPS. Use a reverse proxy (nginx, Caddy, or your cloud platform's load balancer) to terminate TLS in front of your server. The server itself can listen on plain HTTP internally while the proxy handles certificate management and encryption. This is simpler than managing TLS certificates in your application code and lets you use standard certificate automation like Let's Encrypt or cloud-managed certificates.
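With Caddy, for example, the whole proxy configuration fits in a few lines and certificates are provisioned automatically (the domain is illustrative):
mcp.example.com {
    reverse_proxy localhost:8080
}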
Skip the deployment work. Adaptive Recall is already deployed, monitored, and scaled as a production MCP service. Connect in two minutes.
Get Started Free