How to Deploy a Remote MCP Server in Production
Before You Start
You need a working MCP server that runs locally with stdio transport. This guide covers converting it to a production HTTP service. You also need a hosting environment (any platform that runs Docker containers: AWS ECS, Google Cloud Run, Fly.io, Railway, or a plain VM with Docker) and a domain name or static IP for the server URL.
Step-by-Step Deployment
Replace the stdio transport with Streamable HTTP. The tool, resource, and prompt definitions stay identical. Only the server startup code changes. For Python servers using FastMCP, this is a single parameter change. For TypeScript servers, swap the transport class.
Python (FastMCP):
from mcp.server.fastmcp import FastMCP
mcp = FastMCP("my-tools", host="0.0.0.0", port=8080)  # host/port are FastMCP settings, set on the constructor
# ... all tool definitions unchanged ...
if __name__ == "__main__":
    mcp.run(transport="streamable-http")
TypeScript:
import express from "express";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const app = express();
app.use(express.json());

app.post("/mcp", async (req, res) => {
  // Stateless mode: a fresh transport per request, no session tracking.
  const transport = new StreamableHTTPServerTransport({
    sessionIdGenerator: undefined,
  });
  res.on("close", () => transport.close());
  await server.connect(transport); // server is your existing McpServer instance
  await transport.handleRequest(req, res, req.body);
});

app.listen(8080);
Every production MCP server needs authentication. The simplest approach is API key validation: check the Authorization header on each request and reject requests without a valid key. For multi-user or enterprise deployments, implement OAuth 2.1 instead.
import os

# Ignore empty entries so an unset API_KEYS can never validate a blank token.
VALID_KEYS = {k for k in os.environ.get("API_KEYS", "").split(",") if k}

def validate_request(headers):
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        return False
    token = auth[7:]
    return token in VALID_KEYS
Integrate validation into your HTTP handler so every MCP request is checked before reaching your tool logic. Return a 401 response for invalid or missing credentials.
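One way to wire this in, if you serve the FastMCP app through its ASGI interface rather than mcp.run(), is Starlette middleware. This is a minimal sketch, assuming the validate_request helper above and the Python SDK's streamable_http_app() method; adapt it to whatever framework wraps your server:
import uvicorn
from starlette.middleware.base import BaseHTTPMiddleware
from starlette.responses import JSONResponse

class AuthMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        # Let unauthenticated health checks through; guard everything else.
        if request.url.path == "/health":
            return await call_next(request)
        if not validate_request(request.headers):
            return JSONResponse({"error": "unauthorized"}, status_code=401)
        return await call_next(request)

app = mcp.streamable_http_app()  # Starlette app built from the FastMCP instance
app.add_middleware(AuthMiddleware)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8080)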
Create a Dockerfile that installs dependencies, copies your server code, and sets the entrypoint. Use a minimal base image to reduce attack surface and image size.
Python example:
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "server.py"]TypeScript example:
FROM node:20-slim
WORKDIR /app
COPY package*.json ./
RUN npm ci --omit=dev
COPY dist/ ./dist/
EXPOSE 8080
CMD ["node", "dist/index.js"]Add a health endpoint that your orchestrator or load balancer can poll. This is a simple HTTP GET endpoint that returns 200 when the server is ready to handle requests. Separate it from the MCP endpoint so health checks do not interfere with protocol handling.
# Python with FastAPI/Flask alongside MCP
@app.get("/health")
def health():
    return {"status": "ok"}
Configure your container orchestrator to check this endpoint. For Docker, add a HEALTHCHECK instruction, as shown below. For Kubernetes, use a livenessProbe. For cloud platforms like Cloud Run or Fly.io, set the health check path in the deployment configuration.
Build and push the Docker image, deploy it to your hosting platform, and note the public URL. Update your MCP client configurations to use the remote URL with authentication headers.
docker build -t registry.example.com/my-mcp-server:latest .
docker push registry.example.com/my-mcp-server:latest
Client configuration for the remote server:
{
  "mcpServers": {
    "my-tools": {
      "type": "url",
      "url": "https://mcp.example.com/mcp",
      "headers": {
        "Authorization": "Bearer your-api-key"
      }
    }
  }
}
Instrument your server with structured logging and metrics. Track request latency, error rates, tool invocation counts, and authentication failures. Use your existing monitoring stack (Prometheus, Datadog, CloudWatch, or any platform that accepts HTTP metrics) because MCP servers are standard HTTP services.
import functools
import logging
import time

logger = logging.getLogger("mcp-server")

def instrument_tool(tool_name, handler):
    # functools.wraps preserves the handler's signature, which FastMCP
    # inspects when it generates the tool's input schema.
    @functools.wraps(handler)
    async def wrapper(*args, **kwargs):
        start = time.time()
        try:
            result = await handler(*args, **kwargs)
            duration = time.time() - start
            logger.info(f"tool={tool_name} duration={duration:.3f}s status=ok")
            return result
        except Exception as e:
            duration = time.time() - start
            logger.error(f"tool={tool_name} duration={duration:.3f}s error={e}")
            raise
    return wrapper
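To use it, wrap a handler before registering it. An illustrative sketch (search_docs is a hypothetical tool name):
async def search_docs(query: str) -> str:
    """Hypothetical tool: search the documentation index."""
    ...

# Register the instrumented version instead of the raw handler.
mcp.tool(name="search_docs")(instrument_tool("search_docs", search_docs))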
Scaling Considerations
MCP requests are stateless at the protocol level (each request is independent), which means you can run multiple server instances behind a load balancer. Horizontal scaling works the same as for any HTTP service: add more instances to handle more concurrent requests.
If your tools have state (a database connection, a cache, or in-memory data), make sure that state is shared across instances or externalized to a database. The server process should be disposable, meaning any instance can handle any request without relying on in-process state from previous requests.
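For example, a tool that caches results in a process-local dict will behave inconsistently behind a load balancer, while a shared store works from any instance. A minimal sketch, assuming redis-py and an illustrative REDIS_URL (the remember tool is hypothetical):
import os
import redis.asyncio as redis

# One store shared by every instance; the connection string is illustrative.
store = redis.from_url(os.environ.get("REDIS_URL", "redis://localhost:6379"))

@mcp.tool()
async def remember(key: str, value: str) -> str:
    """Store a value in shared state rather than process memory."""
    await store.set(key, value)
    return "stored"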
For memory servers specifically, the storage backend (database, vector store) handles persistence. The MCP server is a thin translation layer between the MCP protocol and your storage backend. This separation means you scale the servers and the storage independently based on their respective bottlenecks.
TLS and Network Security
Always serve production MCP endpoints over HTTPS. Use a reverse proxy (nginx, Caddy, or your cloud platform's load balancer) to terminate TLS in front of your server. The server itself can listen on plain HTTP internally while the proxy handles certificate management and encryption. This is simpler than managing TLS certificates in your application code and lets you use standard certificate automation like Let's Encrypt or cloud-managed certificates.
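With Caddy, for example, the whole proxy configuration fits in a few lines and certificates are provisioned automatically (the domain is illustrative):
mcp.example.com {
    reverse_proxy localhost:8080
}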
Skip the deployment work. Adaptive Recall is already deployed, monitored, and scaled as a production MCP service. Connect in two minutes.
Get Started Free