Deploying Microsoft MCP Gateway on Kubernetes for Enterprise AI Agents
A hands-on guide to deploying Microsoft MCP Gateway on Kubernetes — architecture, step-by-step setup, enterprise security, observability, and scaling for production AI agent workloads.
Mohammed Kafeel
Machine Learning Researcher
By 2026, the average enterprise AI platform connects dozens of AI agents to hundreds of tools - databases, APIs, code repos, ticketing systems. Without a governed routing layer, every agent-to-tool connection is a bespoke integration, a security blind spot, and an ops nightmare waiting to happen.
That's exactly the problem that deploying Microsoft MCP Gateway on Kubernetes solves. The MCP Gateway gives you a single, session-aware reverse proxy that manages, secures, and scales all your MCP server connections - inside a Kubernetes cluster you already control.
This guide walks you through everything: what MCP Gateway is, how its architecture works, a step-by-step deploy, enterprise security hardening, observability, and scaling for production. Let's get into it.
What Is Microsoft MCP Gateway?
MCP Gateway is a reverse proxy and management layer for MCP servers. It sits between your AI agents and your tools, handling routing, auth, and lifecycle management - so you don't have to wire each connection manually.
The Problem MCP Solves First
The Model Context Protocol (MCP) is an open standard (originally from Anthropic, now broadly adopted) that defines how AI agents communicate with external tools and data sources. Think of it as USB-C for AI: a universal connector so any agent can talk to any tool without custom glue code. (For the basics, see what MCP is.)
Without MCP, every agent-tool integration is one-off. With MCP, you write the tool once as an MCP server and any compliant agent can call it.
So What Does MCP Gateway Add?
MCP servers are great individually. But at enterprise scale - 50 teams, 200 tools, multiple AI agents - you need more than just the protocol. You need:
- Centralized routing so agents don't need to know where each server lives
- Session affinity so multi-turn conversations stay coherent
- Lifecycle management to deploy, update, and delete MCP servers via API
- Centralized security - one auth layer, not one per server
That's MCP Gateway. It's an open-source project from Microsoft, available at github.com/microsoft/mcp-gateway under the MIT license. As of June 2026, it has 706 stars and 74 forks - active, real-world adoption. (For the broader pattern, compare an MCP gateway vs a direct connection.)
Why Deploy MCP Gateway on Kubernetes?
Kubernetes is the right runtime for MCP Gateway because the gateway was designed to run on it. It uses Kubernetes-native primitives - StatefulSets, headless services, namespaces - to deliver the session affinity and lifecycle management that enterprise AI agents need.
Scalability for Enterprise Workloads
A single MCP server pod handling hundreds of concurrent agent sessions will buckle. Kubernetes lets you scale horizontally - add more gateway replicas, add more MCP server instances - without changing your agent code. (We dig into the high-volume case in scaling an MCP gateway.)
- Horizontal Pod Autoscaler (HPA) scales gateway pods based on CPU/memory or custom metrics
- StatefulSets for MCP server instances ensure stable network identities across restarts
- Namespaces isolate teams and workloads cleanly (the foundation for multi-tenant Kubernetes deployments where each client needs hard data boundaries)
Built-in Session-Aware Routing
This is the big one. Standard HTTP load balancers are stateless - they don't care which backend pod handles your request. MCP conversations aren't stateless. A multi-turn agent session needs to hit the same MCP server instance every time.
MCP Gateway solves this with session-aware stateful routing: every request carrying a session_id is consistently routed to the same pod. Kubernetes headless services make this work at scale.
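The idea of session affinity can be sketched in a few lines of Python. This is an illustration only, not the gateway's actual routing code: it assumes a fixed list of pod names (hypothetical here) and maps each session_id to one of them with a stable hash, so repeated requests for the same session always land on the same pod.

```python
import hashlib


def pick_pod(session_id: str, pods: list[str]) -> str:
    """Deterministically map a session ID to one pod from a fixed list."""
    if not pods:
        raise ValueError("no pods available")
    # A cryptographic hash is stable across processes, unlike Python's hash().
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(pods)
    return pods[index]


# Hypothetical pod names for illustration.
pods = ["mcp-example-0", "mcp-example-1", "mcp-example-2"]
first = pick_pod("session-abc", pods)
second = pick_pod("session-abc", pods)
print(first == second)  # True: same session, same pod
```

A plain hash-and-modulo scheme remaps sessions whenever the pod count changes, which is one reason a real implementation would more likely keep an explicit session-to-pod record.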
Centralized Security and Policy Enforcement
Instead of configuring auth on every individual MCP server, you configure it once on the gateway. Microsoft Entra ID bearer token authentication and RBAC role checks happen at the gateway layer - before traffic ever reaches an MCP server pod.
This means:
- One place to audit all AI agent access
- One place to rotate credentials
- One place to enforce network policies
MCP Gateway Architecture Deep Dive
MCP Gateway has two distinct planes: a control plane for management and a data plane for live traffic. Understanding both is essential before you deploy.
Control Plane - Adapter and Tool Management
The control plane exposes RESTful CRUD APIs for managing your MCP server ecosystem:
Adapter Management (/adapters):
- POST /adapters - Deploy and register a new MCP server
- GET /adapters - List all servers the caller can access
- GET /adapters/{name}/status - Check deployment health
- GET /adapters/{name}/logs - Stream server logs
- PUT /adapters/{name} - Update a deployment
- DELETE /adapters/{name} - Remove a server
Tool Management (/tools):
- POST /tools - Register a tool with its MCP definition and container image
- GET /tools/{name}/status - Check tool deployment status
- GET /tools/{name}/logs - Access tool server logs
The control plane also manages Agents and Sessions (preview) when Azure AI Foundry is configured - enabling full LLM-driven agent runs streamed over Server-Sent Events.
Data Plane - Live MCP Traffic Routing
The data plane is where agent requests actually flow:
- POST /adapters/{name}/mcp - Direct streamable HTTP connection to a named MCP server, with session affinity
- POST /mcp - Routes to the Tool Gateway Router, an intelligent MCP server that dynamically dispatches tool calls to the right registered tool server
The Tool Gateway Router itself runs as multiple instances behind the gateway for high availability. It knows every registered tool definition and routes calls accordingly.
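Conceptually, dynamic dispatch comes down to a lookup from tool name to the server that hosts it. The Python sketch below shows that idea under stated assumptions: the ToolRouter class, the tool name, and the server URL are all hypothetical and do not reflect the router's real data model.

```python
class ToolRouter:
    """Toy registry that maps tool names to the server hosting them."""

    def __init__(self) -> None:
        self._servers: dict[str, str] = {}

    def register(self, tool_name: str, server_url: str) -> None:
        # Later registrations for the same tool name replace earlier ones.
        self._servers[tool_name] = server_url

    def resolve(self, tool_name: str) -> str:
        try:
            return self._servers[tool_name]
        except KeyError:
            raise LookupError(f"no server registered for tool {tool_name!r}") from None


router = ToolRouter()
# Hypothetical tool and in-cluster URL.
router.register("weather", "http://weather-0.weather-headless.adapter.svc.cluster.local:8080/mcp")
print(router.resolve("weather"))
```

The agent only names the tool; where that tool runs stays a gateway-side concern.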
Authentication & Authorization
Every request - both control plane and data plane - goes through Entra ID bearer token validation first.
RBAC is role-based at the resource level:
- mcp.admin - Full read/write access to all adapters and tools
- mcp.engineer (or custom roles) - Read access to resources where the role is listed in requiredRoles
- Resource creator - Always has read/write access to their own resources
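The three rules above can be written as a pair of small predicates. This is a minimal sketch of the described policy, not the gateway's implementation: the Caller and Resource types are hypothetical, and it assumes that an empty requiredRoles list grants read access to nobody beyond the creator and admins.

```python
from dataclasses import dataclass

ADMIN_ROLE = "mcp.admin"


@dataclass(frozen=True)
class Caller:
    user_id: str
    roles: frozenset[str] = frozenset()


@dataclass(frozen=True)
class Resource:
    name: str
    created_by: str
    required_roles: frozenset[str] = frozenset()


def can_write(caller: Caller, resource: Resource) -> bool:
    """Admins and the resource creator may write."""
    return ADMIN_ROLE in caller.roles or caller.user_id == resource.created_by


def can_read(caller: Caller, resource: Resource) -> bool:
    """Writers may read; so may callers holding any listed role."""
    return can_write(caller, resource) or bool(caller.roles & resource.required_roles)


adapter = Resource("mcp-example", created_by="alice", required_roles=frozenset({"mcp.engineer"}))
bob = Caller("bob", frozenset({"mcp.engineer"}))
print(can_read(bob, adapter), can_write(bob, adapter))  # True False
```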
Metadata Store - Cosmos DB
The gateway's Metadata Manager persists all adapter and tool definitions in Azure Cosmos DB. In production, this gives you a distributed, durable store for server and tool metadata - decoupled from the gateway pods themselves. In local dev mode, a lightweight in-memory store is used instead.
Control Plane vs Data Plane: Quick Reference
| Aspect | Control Plane | Data Plane |
|---|---|---|
| Purpose | Manage MCP server lifecycle | Route live agent traffic |
| Key endpoints | /adapters, /tools, /agents | /adapters/{name}/mcp, /mcp |
| Auth | Entra ID bearer token + RBAC | Entra ID bearer token + RBAC |
| Consumers | DevOps/platform engineers | AI agents, MCP clients |
| State | Cosmos DB (metadata) | Session affinity (in-memory/distributed) |
| Scaling | Low traffic, management ops | High throughput, latency-sensitive |
Prerequisites Before You Deploy
Before you run a single command, make sure you have these in place:
- .NET 8 SDK - the gateway is built on ASP.NET Core
- Docker Desktop with Kubernetes enabled (for local dev)
- kubectl configured and pointing at your target cluster
- Azure subscription with Owner access (for Entra ID app registration, Cosmos DB, and AKS)
- Helm 3.x - for Helm-based deployments
- Access to a container registry - Azure Container Registry (ACR) for production, or a local registry (localhost:5000) for dev
- Azure CLI installed and authenticated (az login)
For a local dev setup, Docker Desktop's built-in Kubernetes is all you need. For production MCP Gateway deployments on AKS, you'll need an active Azure subscription and the Azure CLI.
Step-by-Step: Deploying MCP Gateway on Kubernetes
You can go from zero to a running MCP Gateway in under 30 minutes - locally with Docker Desktop, or to AKS with the one-click Azure template. Here's the full local path first, then the AKS notes.
Step 1: Clone the Repository
git clone https://github.com/microsoft/mcp-gateway.git
cd mcp-gateway
The repo includes everything: the .NET gateway service, sample MCP servers, Kubernetes manifests under deployment/k8s/, and Bicep templates for Azure.
Step 2: Set Up a Local Container Registry
For local/dev deployments, spin up a local Docker registry. This is where you'll push your MCP server images.
docker run -d -p 5000:5000 --name registry registry:2.7
For production, use Azure Container Registry (ACR) instead. The Azure deployment template provisions one automatically.
Step 3: Build and Push the MCP Server Image
Build the sample MCP server and push it to your local registry:
docker build -f sample-servers/mcp-example/Dockerfile sample-servers/mcp-example \
-t localhost:5000/mcp-example:1.0.0
docker push localhost:5000/mcp-example:1.0.0
This mcp-example server is a working reference implementation - great for validating your setup before you bring in your own MCP servers.
Step 4: Build and Publish the MCP Gateway
Publish the gateway service image to your local registry using the included publish profile:
dotnet publish dotnet/Microsoft.McpGateway.Service/src/Microsoft.McpGateway.Service.csproj \
-c Release /p:PublishProfile=localhost_5000.pubxml
Also publish the Tool Gateway Router (needed for dynamic tool routing via /mcp):
dotnet publish dotnet/Microsoft.McpGateway.Tools/src/Microsoft.McpGateway.Tools.csproj \
-c Release /p:PublishProfile=localhost_5000.pubxml
Step 5: Deploy to Kubernetes
Apply the included Kubernetes manifests. The repo ships a complete local-deployment.yml that covers the Deployment, Service, ConfigMap, and namespace setup:
kubectl apply -f deployment/k8s/local-deployment.yml
Here's what a representative gateway Deployment manifest looks like (simplified from the repo pattern):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcpgateway
  namespace: adapter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcpgateway
  template:
    metadata:
      labels:
        app: mcpgateway
    spec:
      containers:
        - name: mcpgateway
          image: localhost:5000/mcpgateway:latest
          ports:
            - containerPort: 8080
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: "Development"
            - name: GatewaySettings__Secret
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: gateway-secret
---
apiVersion: v1
kind: Service
metadata:
  name: mcpgateway-service
  namespace: adapter
spec:
  selector:
    app: mcpgateway
  ports:
    - port: 8000
      targetPort: 8080
  type: ClusterIP
Production note: In production, replace ClusterIP with a LoadBalancer or configure an Ingress. The Azure deployment template provisions an Application Gateway for this automatically.
Step 6: Verify the Deployment
Check that your pods are running and services are exposed:
kubectl get pods -n adapter
kubectl get services -n adapter
You should see mcpgateway pods in Running state and the mcpgateway-service listed. Enable port forwarding to test locally:
kubectl port-forward -n adapter svc/mcpgateway-service 8000:8000
Step 7: Register Your First MCP Adapter
With the gateway running, register your first MCP server (adapter) via the control plane API:
curl -X POST http://localhost:8000/adapters \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mcp-example",
    "imageName": "mcp-example",
    "imageVersion": "1.0.0",
    "description": "Example MCP server for testing",
    "requiredRoles": []
  }'
The gateway will deploy the MCP server as a pod in the cluster and register it in the metadata store. Check its status:
curl http://localhost:8000/adapters/mcp-example/status
In cloud mode with Entra ID enabled, add Authorization: Bearer <token> to every request. Acquire a token with:
az account get-access-token --resource $clientId
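If you would rather script the registration than type curl commands, the same request can be assembled with Python's standard library. The sketch below only builds the request object; it reuses the payload fields from the curl example and treats the bearer token as optional. The build_adapter_request helper is a hypothetical name, not part of any gateway SDK.

```python
import json
import urllib.request
from typing import Optional


def build_adapter_request(base_url: str, adapter: dict, token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a POST /adapters request."""
    headers = {"Content-Type": "application/json"}
    if token:
        # Needed only in cloud mode with Entra ID enabled.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/adapters",
        data=json.dumps(adapter).encode("utf-8"),
        headers=headers,
        method="POST",
    )


adapter = {
    "name": "mcp-example",
    "imageName": "mcp-example",
    "imageVersion": "1.0.0",
    "description": "Example MCP server for testing",
    "requiredRoles": [],
}
request = build_adapter_request("http://localhost:8000", adapter)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping construction separate from sending makes the request easy to inspect or unit test before it touches a live gateway.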
Step 8: Connect an AI Agent
Once the adapter is deployed, any MCP-compatible AI agent can connect to it via the gateway's data plane endpoint:
http://localhost:8000/adapters/mcp-example/mcp
For VS Code with GitHub Copilot, add this to .vscode/mcp.json:
{
  "servers": {
    "mcp-example": {
      "url": "http://localhost:8000/adapters/mcp-example/mcp"
    }
  }
}
For Azure OpenAI agents or AutoGen multi-agent setups, point the MCP client transport at the same URL. For dynamic tool routing (where the agent doesn't need to know which server handles which tool), use the /mcp endpoint instead - the Tool Gateway Router handles dispatch automatically.
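For a custom client, it helps to see what actually travels to the data plane endpoint. MCP messages are JSON-RPC 2.0, and the streamable HTTP transport in the MCP specification carries a session in the Mcp-Session-Id header. The sketch below builds such messages and headers without sending them. The client name, the "echo" tool, and its arguments are hypothetical, and whether the gateway keys its affinity on this header is an assumption.

```python
import itertools
from typing import Optional

_request_ids = itertools.count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request with a fresh numeric id."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


def mcp_headers(session_id: Optional[str] = None, token: Optional[str] = None) -> dict:
    """Headers for a streamable HTTP POST to an MCP endpoint."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        headers["Mcp-Session-Id"] = session_id
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


init = jsonrpc_request("initialize", {
    "protocolVersion": "2025-03-26",
    "capabilities": {},
    "clientInfo": {"name": "example-client", "version": "0.1.0"},  # hypothetical client
})
call = jsonrpc_request("tools/call", {"name": "echo", "arguments": {"text": "hi"}})  # hypothetical tool
print(call["method"], mcp_headers(session_id="abc123"))
```

In practice an MCP client SDK handles this handshake for you; the sketch is mainly useful when debugging traffic through the gateway.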
Enterprise Security Best Practices
Security isn't an afterthought in MCP Gateway - it's built into the architecture. But you still need to configure it correctly for production.
Enforce Entra ID Authentication on Every Request
- Register an app in Entra ID, expose an access scope, and set the clientId in your gateway configuration
- In production, set ASPNETCORE_ENVIRONMENT=Production - this activates MSAL/Entra ID auth and disables the dev-mode anonymous access
- Authorize Azure CLI and VS Code as client applications for developer access
- Use Workload Identity (Managed Identity) for service-to-service auth - no static credentials in pods
Apply RBAC to Limit Tool Access by Team/Role
- Create custom Entra app roles like mcp.engineer, mcp.admin, mcp.readonly
- When registering adapters, set requiredRoles to control which roles can read that server
- Only the resource creator and mcp.admin can write - this is enforced by the gateway, not by you
- Audit role assignments quarterly; remove stale access promptly
Use Network Policies to Isolate MCP Pods
- Deploy MCP servers in a dedicated namespace (adapter) separate from other workloads
- Apply Kubernetes NetworkPolicy resources to restrict ingress to MCP pods - only the gateway should be able to reach them directly
- Use Private Endpoints for Cosmos DB and ACR in production; no public internet exposure for backend services
Rotate Secrets with Azure Key Vault + CSI Driver
- Store GatewaySettings__Secret (the gateway-to-tool-router shared secret) in Azure Key Vault
- Mount secrets into pods using the Secrets Store CSI Driver - no secrets in environment variables or ConfigMaps
- Enable automatic rotation: the CSI driver re-syncs secrets on a configurable interval without pod restarts
Observability: Monitoring Your MCP Gateway
You can't manage what you can't see. MCP Gateway ships with solid observability hooks - you just need to wire them up.
Logs
- Use kubectl logs -n adapter <pod-name> for quick debugging
- In production, ship logs to Azure Monitor / Log Analytics via the Azure Monitor agent or a Fluent Bit DaemonSet
- The gateway exposes per-adapter log access via GET /adapters/{name}/logs - useful for debugging a specific MCP server without kubectl access
- Structured JSON logging is enabled by default; filter by session_id to trace a specific agent conversation
Metrics
- The gateway exposes Prometheus-compatible metrics on the standard /metrics endpoint
- Deploy the Prometheus Operator on your cluster and add a ServiceMonitor resource pointing at the gateway service
- Key metrics to watch: request latency by adapter, session count, error rate per tool, pod restarts
- Visualize in Grafana with a custom dashboard - or import metrics into Azure Monitor via the Azure Managed Prometheus integration on AKS
Tracing
- MCP Gateway supports OpenTelemetry for distributed tracing across agent calls
- Configure the OTLP exporter to send traces to Azure Monitor Application Insights or a self-hosted Jaeger/Tempo instance
- Traces span the full path: agent request → gateway auth → adapter routing → MCP server response
- This is invaluable for debugging latency in multi-hop agent workflows (e.g., AutoGen orchestrator → gateway → 3 tool servers)
Scaling MCP Gateway in Production
The gateway is designed to scale horizontally. Here's how to do it right.
Horizontal Pod Autoscaler (HPA) for Gateway Pods
Add an HPA to automatically scale gateway replicas based on CPU utilization:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcpgateway-hpa
  namespace: adapter
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcpgateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
For production, consider custom metrics (e.g., active session count) via the KEDA (Kubernetes Event-Driven Autoscaling) operator for more precise scaling.
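To get a feel for how the HPA reacts to load, the documented scaling rule can be reproduced in Python: desired replicas equal the current replicas multiplied by the ratio of observed to target utilization, rounded up and clamped to the configured bounds. This is a simplified sketch. It models the default 10% tolerance but ignores stabilization windows, pod readiness, and missing metrics.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float,
    min_replicas: int,
    max_replicas: int,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA rule: ceil(current * observed / target), clamped."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no change
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# Values mirror the manifest above: target 70% CPU, 2 to 10 replicas.
print(desired_replicas(2, 90, 70, 2, 10))   # 3
print(desired_replicas(2, 72, 70, 2, 10))   # 2 (within tolerance)
print(desired_replicas(8, 140, 70, 2, 10))  # 10 (capped at maxReplicas)
```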
StatefulSets for Session-Aware MCP Server Instances
MCP servers that maintain per-session state should run as StatefulSets, not Deployments. StatefulSets give each pod a stable network identity (for example, mcp-example-0 and mcp-example-1 for the manifest below) that the gateway uses for session affinity routing.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mcp-example
  namespace: adapter
spec:
  serviceName: "mcp-example-headless"
  replicas: 3
  selector:
    matchLabels:
      app: mcp-example
  template:
    metadata:
      labels:
        app: mcp-example
    spec:
      containers:
        - name: mcp-example
          image: localhost:5000/mcp-example:1.0.0
          ports:
            - containerPort: 8080
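The stable identities follow a fixed naming pattern, which is what makes them usable as routing targets. The Python sketch below derives the per-pod DNS names Kubernetes assigns for the StatefulSet above. It assumes a headless Service named mcp-example-headless (clusterIP: None) exists in the adapter namespace and that the cluster uses the default cluster.local domain.

```python
def pod_dns_names(
    statefulset: str,
    service: str,
    namespace: str,
    replicas: int,
    cluster_domain: str = "cluster.local",
) -> list[str]:
    """Per-pod DNS names for a StatefulSet governed by a headless Service."""
    return [
        f"{statefulset}-{ordinal}.{service}.{namespace}.svc.{cluster_domain}"
        for ordinal in range(replicas)
    ]


names = pod_dns_names("mcp-example", "mcp-example-headless", "adapter", 3)
for name in names:
    print(name)
# mcp-example-0.mcp-example-headless.adapter.svc.cluster.local
# mcp-example-1.mcp-example-headless.adapter.svc.cluster.local
# mcp-example-2.mcp-example-headless.adapter.svc.cluster.local
```

Because ordinals survive restarts, a session pinned to mcp-example-1 still resolves to the same name after that pod is rescheduled.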
Using AKS for Managed Kubernetes
For production MCP Gateway deployments on Azure, Azure Kubernetes Service (AKS) is the recommended platform. The repo includes a one-click Azure deployment template that provisions:
| Resource | Type |
|---|---|
| mg-aks-<label> | AKS Cluster |
| mgreg<label> | Azure Container Registry |
| mg-storage-<label> | Cosmos DB Account |
| mg-aag-<label> | Application Gateway |
| mg-ai-<label> | Application Insights |
| mg-identity-<label> | Managed Identity |
AKS gives you Workload Identity (no credential management), Azure CNI for network policy support, and native integration with Azure Monitor - everything you need to deploy MCP servers on Kubernetes in production.
MCP Gateway vs Traditional API Gateway: What's Different?
MCP Gateway isn't a replacement for Kong or Azure APIM - it's a complement. It handles things a traditional API gateway simply wasn't built for.
| Feature | Traditional API Gateway | MCP Gateway |
|---|---|---|
| Session-aware routing | ❌ | ✅ |
| MCP protocol support | ❌ | ✅ |
| AI agent lifecycle management | ❌ | ✅ |
| Tool catalog management | ❌ | ✅ |
| Standard HTTP REST | ✅ | ✅ |
| Rate limiting | ✅ | ✅ (via APIM integration) |
| Auth (OAuth/Entra ID) | ✅ | ✅ |
| Kubernetes-native StatefulSet routing | ❌ | ✅ |
| Per-adapter log access API | ❌ | ✅ |
| MCP server deployment management | ❌ | ✅ |
The sweet spot: use Azure API Management (APIM) in front of MCP Gateway for rate limiting, an external developer portal, and API versioning. Let MCP Gateway handle everything specific to the MCP protocol and AI agent lifecycle.
Real-World Enterprise Use Cases
Multi-Agent Orchestration with AutoGen or Semantic Kernel
Teams building MCP workflows for enterprise AI agents with Microsoft's AutoGen or Semantic Kernel use MCP Gateway as the central tool broker. Each agent in the orchestration connects to /mcp, and the Tool Gateway Router dispatches calls to the right tool server - whether that's a database query tool, a code execution server, or a document retrieval service. No agent needs to know the topology.
Secure Tool Access for GitHub Copilot Extensions
GitHub Copilot extensions in VS Code connect to MCP servers for context-aware assistance. In regulated enterprises, you can't expose raw MCP servers to developer workstations. MCP Gateway sits in the middle: Copilot connects to the gateway endpoint, Entra ID validates the developer's identity, and RBAC ensures they only see the tools their team is authorized to use.
Centralized AI Tool Governance for Regulated Industries
In financial services and healthcare, every AI tool access needs an audit trail. MCP Gateway's centralized auth and logging means every tool call - which agent, which tool, which user identity, at what time - flows through a single observable point. Combined with Azure Monitor and Log Analytics, compliance teams get the audit logs they need without instrumenting each MCP server individually.
Key Takeaways
- MCP Gateway solves the enterprise AI tooling problem - it's the governed, observable layer between your AI agents and your tools.
- It's Kubernetes-native by design. StatefulSets, headless services, and namespace isolation are first-class features, not afterthoughts.
- The control plane and data plane are separate concerns. DevOps manages adapters via REST APIs; agents consume tools via the data plane endpoints.
- Security is built in. Entra ID auth and RBAC are core features - not plugins. In production, combine them with network policies and Key Vault secret rotation.
- Observability is ready to wire up. Prometheus metrics, OpenTelemetry tracing, and per-adapter log APIs are available out of the box.
- MCP Gateway deployments on AKS are one-click. The Bicep template provisions AKS, ACR, Cosmos DB, Application Gateway, Application Insights, and Managed Identity together.
- It's MIT-licensed and actively maintained at github.com/microsoft/mcp-gateway.
FAQ
What is Microsoft MCP Gateway?
Microsoft MCP Gateway is an open-source reverse proxy and management layer for Model Context Protocol (MCP) servers. It provides session-aware stateful routing, lifecycle management (deploy/update/delete via REST APIs), and centralized Entra ID authentication for MCP servers running in Kubernetes. Think of it as the control tower for all your AI agent tool connections.
Is MCP Gateway production-ready?
The core gateway - adapter management, data plane routing, Entra ID auth, Cosmos DB metadata store, and AKS deployment - is production-ready. The Agents and Sessions subsystem (LLM-driven agent runs via /sessions/run) is currently in preview and is recommended for single-replica evaluation deployments only. For production multi-agent workloads, use the gateway as a routing and management layer and run your LLM orchestration in AutoGen or Semantic Kernel.
Does MCP Gateway work with AKS?
Yes - AKS is the primary production deployment target. The repo includes a one-click Azure deployment template (Bicep) that provisions an AKS cluster, ACR, Cosmos DB, Application Gateway, and Managed Identity together. The gateway runs as a Kubernetes Deployment on AKS, with Workload Identity for credential-less authentication to Azure services.
How does MCP Gateway handle authentication?
In production (cloud mode), every request to both the control plane and data plane requires an Entra ID bearer token. The gateway validates the token, extracts the caller's identity and roles, and enforces RBAC at the resource level. In local dev mode, anonymous access is available for rapid iteration. The gateway also supports a shared secret (GatewaySettings__Secret) for secure service-to-service communication between the gateway and the Tool Gateway Router.
Can I use MCP Gateway without Azure?
Partially. The gateway itself is open-source .NET and runs on any Kubernetes cluster - no Azure required for the core routing and management functionality. However, the metadata store defaults to Cosmos DB (Azure), and the auth layer is built around Microsoft Entra ID. For a fully non-Azure setup, you'd need to swap the metadata store implementation and auth provider - which is possible but requires code changes.
What AI agents are compatible with MCP Gateway?
Any agent or client that speaks the Model Context Protocol over streamable HTTP can connect to MCP Gateway. This includes GitHub Copilot (via VS Code MCP server configuration), Azure OpenAI agents using the MCP client SDK, AutoGen multi-agent frameworks, Semantic Kernel with MCP plugin support, and any MCP-compatible client using the url transport pointing at /adapters/{name}/mcp or /mcp.
Useful Resources



