Deploying Microsoft MCP Gateway on Kubernetes for Enterprise AI Agents

A hands-on guide to deploying Microsoft MCP Gateway on Kubernetes — architecture, step-by-step setup, enterprise security, observability, and scaling for production AI agent workloads.

Mohammed Kafeel

Machine Learning Researcher

June 24, 2026 · 15 min read

Enterprise AI platforms increasingly connect dozens of AI agents to hundreds of tools - databases, APIs, code repos, ticketing systems. Without a governed routing layer, every agent-to-tool connection is a bespoke integration, a security blind spot, and an ops nightmare waiting to happen.

That's exactly the problem Microsoft MCP Gateway solves when you deploy it on Kubernetes. The gateway gives you a single, session-aware reverse proxy that manages, secures, and scales all your MCP server connections - inside a Kubernetes cluster you already control.

This guide walks you through everything: what MCP Gateway is, how its architecture works, a step-by-step deploy, enterprise security hardening, observability, and scaling for production. Let's get into it.


What Is Microsoft MCP Gateway?

MCP Gateway is a reverse proxy and management layer for MCP servers. It sits between your AI agents and your tools, handling routing, auth, and lifecycle management - so you don't have to wire each connection manually.

The Problem MCP Solves First

The Model Context Protocol (MCP) is an open standard (originally from Anthropic, now broadly adopted) that defines how AI agents communicate with external tools and data sources. Think of it as USB-C for AI: a universal connector so any agent can talk to any tool without custom glue code. (For the basics, see what MCP is.)

Without MCP, every agent-tool integration is one-off. With MCP, you write the tool once as an MCP server and any compliant agent can call it.

So What Does MCP Gateway Add?

MCP servers are great individually. But at enterprise scale - 50 teams, 200 tools, multiple AI agents - you need more than just the protocol. You need:

  • Centralized routing so agents don't need to know where each server lives
  • Session affinity so multi-turn conversations stay coherent
  • Lifecycle management to deploy, update, and delete MCP servers via API
  • Centralized security - one auth layer, not one per server

That's MCP Gateway. It's an open-source project from Microsoft, available at github.com/microsoft/mcp-gateway under the MIT license. As of June 2026, it has 706 stars and 74 forks, a sign of active community interest. (For the broader pattern, compare an MCP gateway vs a direct connection.)


Why Deploy MCP Gateway on Kubernetes?

Kubernetes is the right runtime for MCP Gateway because the gateway was designed for it. It uses Kubernetes-native primitives - StatefulSets, headless services, namespaces - to deliver the session affinity and lifecycle management that enterprise AI agents need.

Scalability for Enterprise Workloads

A single MCP server pod handling hundreds of concurrent agent sessions quickly becomes both a bottleneck and a single point of failure. Kubernetes lets you scale horizontally - add more gateway replicas, add more MCP server instances - without changing your agent code. (We dig into the high-volume case in scaling an MCP gateway.)

  • Horizontal Pod Autoscaler (HPA) scales gateway pods based on CPU/memory or custom metrics
  • StatefulSets for MCP server instances ensure stable network identities across restarts
  • Namespaces isolate teams and workloads cleanly (the foundation for multi-tenant Kubernetes deployments where each client needs hard data boundaries)

Built-in Session-Aware Routing

This is the big one. Standard HTTP load balancers are stateless - they don't care which backend pod handles your request. MCP conversations aren't stateless. A multi-turn agent session needs to hit the same MCP server instance every time.

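MCP Gateway solves this with session-aware stateful routing: every request carrying a given session_id is consistently routed to the same MCP server pod. Kubernetes headless services make this work at scale: a headless service (clusterIP: None) publishes a DNS record for each pod instead of a single load-balanced virtual IP, so traffic can be sent to one specific pod rather than whichever pod a load balancer picks.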
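
The article doesn't spell out the gateway's internal routing algorithm, so here is a hedged sketch of one way to get session affinity: rendezvous (highest-random-weight) hashing. It illustrates the idea only and is not the gateway's code; the pod names are hypothetical and follow the StatefulSet <name>-<ordinal> convention.

```python
import hashlib


def pick_pod(session_id: str, pods: list[str]) -> str:
    """Choose the pod that owns a session using rendezvous (HRW) hashing.

    Each (session, pod) pair gets a deterministic score and the highest score
    wins, so a session keeps landing on the same pod while the pod set is
    unchanged. Removing a pod only moves the sessions that pod owned.
    """
    if not pods:
        raise ValueError("no MCP server pods available")

    def score(pod: str) -> int:
        digest = hashlib.sha256(f"{session_id}|{pod}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(pods, key=score)


# Hypothetical pod names following the StatefulSet <name>-<ordinal> pattern.
pods = ["mcp-example-0", "mcp-example-1", "mcp-example-2"]
first = pick_pod("session-abc", pods)
print(first)
```

Rendezvous hashing keeps a session on its pod while the pod set is stable, but adding a pod can move some existing sessions onto it. That is one reason a production router might also record each session's pod explicitly once the session starts.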

Centralized Security and Policy Enforcement

Instead of configuring auth on every individual MCP server, you configure it once on the gateway. Microsoft Entra ID bearer token authentication and RBAC role checks happen at the gateway layer - before traffic ever reaches an MCP server pod.

This means:

  • One place to audit all AI agent access
  • One place to rotate credentials
  • One place to enforce network policies

MCP Gateway Architecture Deep Dive

MCP Gateway has two distinct planes: a control plane for management and a data plane for live traffic. Understanding both is essential before you deploy.

Control Plane - Adapter and Tool Management

The control plane exposes RESTful CRUD APIs for managing your MCP server ecosystem:

Adapter Management (/adapters):

  • POST /adapters - Deploy and register a new MCP server
  • GET /adapters - List all servers the caller can access
  • GET /adapters/{name}/status - Check deployment health
  • GET /adapters/{name}/logs - Stream server logs
  • PUT /adapters/{name} - Update a deployment
  • DELETE /adapters/{name} - Remove a server

Tool Management (/tools):

  • POST /tools - Register a tool with its MCP definition and container image
  • GET /tools/{name}/status - Check tool deployment status
  • GET /tools/{name}/logs - Access tool server logs
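
To make the adapter endpoints above concrete, here's a small standard-library Python sketch that builds (but doesn't send) control-plane requests. The helper name and the operation keys are hypothetical; the HTTP methods and paths come straight from the list above, and the body fields mirror the registration payload used in Step 7.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

# Control-plane adapter operations from the list above: (HTTP method, path template).
ADAPTER_OPS = {
    "create": ("POST", "/adapters"),
    "list": ("GET", "/adapters"),
    "status": ("GET", "/adapters/{name}/status"),
    "logs": ("GET", "/adapters/{name}/logs"),
    "update": ("PUT", "/adapters/{name}"),
    "delete": ("DELETE", "/adapters/{name}"),
}


def build_adapter_request(base_url: str, op: str, name: Optional[str] = None,
                          body: Optional[dict] = None,
                          token: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) a control-plane request for an adapter operation."""
    method, template = ADAPTER_OPS[op]
    if "{name}" in template:
        if not name:
            raise ValueError(f"operation {op!r} needs an adapter name")
        path = template.format(name=urllib.parse.quote(name, safe=""))
    else:
        path = template
    headers = {}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    if token:
        # Needed in cloud mode, where the gateway validates Entra ID bearer tokens.
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(base_url.rstrip("/") + path, data=data,
                                  headers=headers, method=method)


req = build_adapter_request("http://localhost:8000", "status", name="mcp-example")
print(req.get_method(), req.full_url)
```

Pass the result to urllib.request.urlopen() to actually send it. The /tools endpoints follow the same pattern.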

The control plane also manages Agents and Sessions (preview) when Azure AI Foundry is configured - enabling full LLM-driven agent runs streamed over Server-Sent Events.

Data Plane - Live MCP Traffic Routing

The data plane is where agent requests actually flow:

  • POST /adapters/{name}/mcp - Direct streamable HTTP connection to a named MCP server, with session affinity
  • POST /mcp - Routes to the Tool Gateway Router, an intelligent MCP server that dynamically dispatches tool calls to the right registered tool server

The Tool Gateway Router itself runs as multiple instances behind the gateway for high availability. It knows every registered tool definition and routes calls accordingly.
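
How the Tool Gateway Router indexes tools internally isn't documented here. As a rough mental model only, the Python sketch below keeps a hypothetical map from tool name to the server hosting it and picks a destination for each MCP tools/call message, whose params carry the tool name and its arguments. The class name and server URLs are made up.

```python
from dataclasses import dataclass, field


@dataclass
class ToolRouter:
    """Toy model of a router that maps tool names to the server hosting them."""

    routes: dict[str, str] = field(default_factory=dict)

    def register(self, tool_name: str, server_url: str) -> None:
        if tool_name in self.routes:
            raise ValueError(f"tool {tool_name!r} is already registered")
        self.routes[tool_name] = server_url

    def list_tools(self) -> list[str]:
        # Roughly what a router would advertise in answer to tools/list.
        return sorted(self.routes)

    def route(self, message: dict) -> str:
        """Return the server URL that should receive a JSON-RPC tools/call message."""
        if message.get("method") != "tools/call":
            raise ValueError("only tools/call messages are dispatched to tool servers")
        tool = message["params"]["name"]
        if tool not in self.routes:
            raise LookupError(f"unknown tool {tool!r}")
        return self.routes[tool]


router = ToolRouter()
router.register("query_orders", "http://orders-tool.example:8080/mcp")  # hypothetical
router.register("search_docs", "http://docs-tool.example:8080/mcp")     # hypothetical
call = {"jsonrpc": "2.0", "id": 7, "method": "tools/call",
        "params": {"name": "search_docs", "arguments": {"query": "vpn setup"}}}
print(router.route(call))
```

The agent never sees these URLs; it only names the tool, which is what lets the topology change without touching agent code.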

Authentication & Authorization

Every request - both control plane and data plane - goes through Entra ID bearer token validation first.

RBAC is role-based at the resource level:

  • mcp.admin - Full read/write access to all adapters and tools
  • mcp.engineer (or custom roles) - Read access to resources where the role is listed in requiredRoles
  • Resource creator - Always has read/write access to their own resources
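
The three rules above can be written down as a short Python sketch. It models the documented behavior rather than reproducing the gateway's implementation. One assumption to flag: the article doesn't say what an empty requiredRoles list means, so this sketch treats it as granting no additional readers beyond the creator and mcp.admin.

```python
from dataclasses import dataclass

ADMIN_ROLE = "mcp.admin"


@dataclass(frozen=True)
class Resource:
    name: str
    created_by: str                                   # identity that registered it
    required_roles: frozenset[str] = frozenset()      # roles granted read access


def can_read(user: str, roles: set[str], res: Resource) -> bool:
    if ADMIN_ROLE in roles or user == res.created_by:
        return True
    # Assumption: an empty requiredRoles list grants no extra readers.
    return bool(roles & res.required_roles)


def can_write(user: str, roles: set[str], res: Resource) -> bool:
    # Write access is limited to the resource creator and mcp.admin.
    return ADMIN_ROLE in roles or user == res.created_by


tool = Resource("mcp-example", created_by="alice",
                required_roles=frozenset({"mcp.engineer"}))
print(can_read("bob", {"mcp.engineer"}, tool), can_write("bob", {"mcp.engineer"}, tool))
```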

Metadata Store - Cosmos DB

The gateway's Metadata Manager persists all adapter and tool definitions in Azure Cosmos DB. In production, this gives you a distributed, durable store for server and tool metadata - decoupled from the gateway pods themselves. In local dev mode, a lightweight in-memory store is used instead.
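
Swapping Cosmos DB in production for an in-memory store in local dev implies a storage abstraction behind the Metadata Manager. Here's a hedged Python sketch of that pattern: a hypothetical AdapterStore interface plus an in-memory implementation; a Cosmos DB-backed class would implement the same methods. None of these names come from the repo.

```python
import copy
from abc import ABC, abstractmethod
from typing import Optional


class AdapterStore(ABC):
    """Persistence interface for adapter metadata (hypothetical)."""

    @abstractmethod
    def upsert(self, adapter: dict) -> None: ...

    @abstractmethod
    def get(self, name: str) -> Optional[dict]: ...

    @abstractmethod
    def delete(self, name: str) -> bool: ...

    @abstractmethod
    def list_all(self) -> list[dict]: ...


class InMemoryAdapterStore(AdapterStore):
    """Dev-mode store: fast, but everything is lost when the pod restarts."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def upsert(self, adapter: dict) -> None:
        # Deep-copy so callers can't mutate stored state behind the store's back.
        self._items[adapter["name"]] = copy.deepcopy(adapter)

    def get(self, name: str) -> Optional[dict]:
        item = self._items.get(name)
        return copy.deepcopy(item) if item is not None else None

    def delete(self, name: str) -> bool:
        return self._items.pop(name, None) is not None

    def list_all(self) -> list[dict]:
        return [copy.deepcopy(v) for _, v in sorted(self._items.items())]
```

Keeping the gateway pods stateless and pushing metadata into a store like this is what lets you scale or restart gateway replicas without losing adapter definitions, provided the production implementation is durable.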

Control Plane vs Data Plane: Quick Reference

| Aspect | Control Plane | Data Plane |
|---|---|---|
| Purpose | Manage MCP server lifecycle | Route live agent traffic |
| Key endpoints | /adapters, /tools, /agents | /adapters/{name}/mcp, /mcp |
| Auth | Entra ID bearer token + RBAC | Entra ID bearer token + RBAC |
| Consumers | DevOps/platform engineers | AI agents, MCP clients |
| State | Cosmos DB (metadata) | Session affinity (in-memory/distributed) |
| Scaling | Low traffic, management ops | High throughput, latency-sensitive |

Prerequisites Before You Deploy

Before you run a single command, make sure you have these in place:

  • .NET 8 SDK - the gateway is built on ASP.NET Core
  • Docker Desktop with Kubernetes enabled (for local dev)
  • kubectl configured and pointing at your target cluster
  • Azure subscription with Owner access (for Entra ID app registration, Cosmos DB, and AKS)
  • Helm 3.x - for Helm-based deployments
  • Access to a container registry - Azure Container Registry (ACR) for production, or a local registry (localhost:5000) for dev
  • Azure CLI installed and authenticated (az login)

For a local dev setup, Docker Desktop's built-in Kubernetes is all you need. For production deployments on AKS, you'll need an active Azure subscription and the Azure CLI.


Step-by-Step: Deploying MCP Gateway on Kubernetes

You can go from zero to a running MCP Gateway in under 30 minutes - locally with Docker Desktop, or to AKS with the one-click Azure template. Here's the full local path first, then the AKS notes.

Step 1: Clone the Repository

git clone https://github.com/microsoft/mcp-gateway.git
cd mcp-gateway

The repo includes everything: the .NET gateway service, sample MCP servers, Kubernetes manifests under deployment/k8s/, and Bicep templates for Azure.

Step 2: Set Up a Local Container Registry

For local/dev deployments, spin up a local Docker registry. This is where you'll push your MCP server images.

docker run -d -p 5000:5000 --name registry registry:2.7

For production, use Azure Container Registry (ACR) instead. The Azure deployment template provisions one automatically.

Step 3: Build and Push the MCP Server Image

Build the sample MCP server and push it to your local registry:

docker build -f sample-servers/mcp-example/Dockerfile sample-servers/mcp-example \
  -t localhost:5000/mcp-example:1.0.0

docker push localhost:5000/mcp-example:1.0.0

This mcp-example server is a working reference implementation - great for validating your setup before you bring in your own MCP servers.

Step 4: Build and Publish the MCP Gateway

Publish the gateway service image to your local registry using the included publish profile:

dotnet publish dotnet/Microsoft.McpGateway.Service/src/Microsoft.McpGateway.Service.csproj \
  -c Release /p:PublishProfile=localhost_5000.pubxml

Also publish the Tool Gateway Router (needed for dynamic tool routing via /mcp):

dotnet publish dotnet/Microsoft.McpGateway.Tools/src/Microsoft.McpGateway.Tools.csproj \
  -c Release /p:PublishProfile=localhost_5000.pubxml

Step 5: Deploy to Kubernetes

Apply the included Kubernetes manifests. The repo ships a complete local-deployment.yml that covers the Deployment, Service, ConfigMap, and namespace setup:

kubectl apply -f deployment/k8s/local-deployment.yml

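Here's what a representative gateway Deployment manifest looks like (simplified from the repo pattern). It expects a Secret named gateway-secrets, with a gateway-secret key, to already exist in the adapter namespace: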

apiVersion: apps/v1
kind: Deployment
metadata:
  name: mcpgateway
  namespace: adapter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mcpgateway
  template:
    metadata:
      labels:
        app: mcpgateway
    spec:
      containers:
        - name: mcpgateway
          image: localhost:5000/mcpgateway:latest
          ports:
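            - containerPort: 8080
          # Example values: CPU-based autoscaling measures utilization against this request.
          resources:
            requests:
              cpu: 250m
              memory: 256Mi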
          env:
            - name: ASPNETCORE_ENVIRONMENT
              value: "Development"
            - name: GatewaySettings__Secret
              valueFrom:
                secretKeyRef:
                  name: gateway-secrets
                  key: gateway-secret
---
apiVersion: v1
kind: Service
metadata:
  name: mcpgateway-service
  namespace: adapter
spec:
  selector:
    app: mcpgateway
  ports:
    - port: 8000
      targetPort: 8080
  type: ClusterIP

Production note: In production, replace ClusterIP with a LoadBalancer or configure an Ingress. The Azure deployment template provisions an Application Gateway for this automatically.

Step 6: Verify the Deployment

Check that your pods are running and services are exposed:

kubectl get pods -n adapter
kubectl get services -n adapter

You should see mcpgateway pods in Running state and the mcpgateway-service listed. Enable port forwarding to test locally:

kubectl port-forward -n adapter svc/mcpgateway-service 8000:8000

Step 7: Register Your First MCP Adapter

With the gateway running, register your first MCP server (adapter) via the control plane API:

curl -X POST http://localhost:8000/adapters \
  -H "Content-Type: application/json" \
  -d '{
    "name": "mcp-example",
    "imageName": "mcp-example",
    "imageVersion": "1.0.0",
    "description": "Example MCP server for testing",
    "requiredRoles": []
  }'

The gateway will deploy the MCP server as a pod in the cluster and register it in the metadata store. Check its status:

curl http://localhost:8000/adapters/mcp-example/status

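In cloud mode with Entra ID enabled, add an Authorization: Bearer <token> header to every request. Acquire a token for the gateway's Entra ID app registration, where $clientId holds that registration's client ID: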

az account get-access-token --resource $clientId

Step 8: Connect an AI Agent

Once the adapter is deployed, any MCP-compatible AI agent can connect to it via the gateway's data plane endpoint:

http://localhost:8000/adapters/mcp-example/mcp

For VS Code with GitHub Copilot, add this to .vscode/mcp.json:

{
  "servers": {
    "mcp-example": {
      "type": "http",
      "url": "http://localhost:8000/adapters/mcp-example/mcp"
    }
  }
}

For Azure OpenAI agents or AutoGen multi-agent setups, point the MCP client transport at the same URL. For dynamic tool routing (where the agent doesn't need to know which server handles which tool), use the /mcp endpoint instead - the Tool Gateway Router handles dispatch automatically.
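
Under the hood, speaking MCP over streamable HTTP means POSTing JSON-RPC 2.0 messages to the endpoint URL. A client opens with initialize, sends a notifications/initialized notification, then issues calls such as tools/list and tools/call. If the server's initialize response carries an Mcp-Session-Id header, the client must echo it on every later request, which is the kind of identifier session-aware routing can key on. This Python sketch only builds messages and headers; the protocol version string and the echo tool are assumptions.

```python
import itertools
from typing import Optional

# Assumption: a dated MCP spec revision your server supports. The server's
# initialize result reports the version it actually agreed to use.
PROTOCOL_VERSION = "2025-03-26"

_next_id = itertools.count(1)


def rpc_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request; the response is matched back to it by id."""
    return {"jsonrpc": "2.0", "id": next(_next_id), "method": method, "params": params}


def initialize(client_name: str, client_version: str) -> dict:
    return rpc_request("initialize", {
        "protocolVersion": PROTOCOL_VERSION,
        "capabilities": {},
        "clientInfo": {"name": client_name, "version": client_version},
    })


def initialized_notification() -> dict:
    # Notifications have no id, so there is no JSON-RPC response to them.
    return {"jsonrpc": "2.0", "method": "notifications/initialized"}


def list_tools() -> dict:
    return rpc_request("tools/list", {})


def call_tool(name: str, arguments: dict) -> dict:
    return rpc_request("tools/call", {"name": name, "arguments": arguments})


def http_headers(session_id: Optional[str] = None, token: Optional[str] = None) -> dict:
    headers = {
        "Content-Type": "application/json",
        # Streamable HTTP servers may reply with plain JSON or an SSE stream.
        "Accept": "application/json, text/event-stream",
    }
    if session_id:
        # Echo the Mcp-Session-Id returned on the initialize response.
        headers["Mcp-Session-Id"] = session_id
    if token:
        # Needed when the gateway runs in cloud mode with Entra ID.
        headers["Authorization"] = f"Bearer {token}"
    return headers
```

In practice an MCP client SDK handles this handshake for you; the sketch is mainly useful for reading gateway logs or testing the endpoint by hand.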


Enterprise Security Best Practices

Security isn't an afterthought in MCP Gateway - it's built into the architecture. But you still need to configure it correctly for production.

Enforce Entra ID Authentication on Every Request

  • Register an app in Entra ID, expose an access scope, and set the clientId in your gateway configuration
  • In production, set ASPNETCORE_ENVIRONMENT=Production - this activates MSAL/Entra ID auth and disables the dev-mode anonymous access
  • Authorize Azure CLI and VS Code as client applications for developer access
  • Use Workload Identity (Managed Identity) for service-to-service auth - no static credentials in pods

Apply RBAC to Limit Tool Access by Team/Role

  • Create custom Entra app roles like mcp.engineer, mcp.admin, mcp.readonly
  • When registering adapters, set requiredRoles to control which roles can read that server
  • Only the resource creator and mcp.admin can write - this is enforced by the gateway, not by you
  • Audit role assignments quarterly; remove stale access promptly

Use Network Policies to Isolate MCP Pods

  • Deploy MCP servers in a dedicated namespace (adapter) separate from other workloads
  • Apply Kubernetes NetworkPolicy resources to restrict ingress to MCP pods - only the gateway should be able to reach them directly
  • Use Private Endpoints for Cosmos DB and ACR in production; no public internet exposure for backend services
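
As a starting point for the NetworkPolicy bullet, this Python sketch emits a NetworkPolicy as JSON (kubectl apply accepts JSON as well as YAML). It assumes the gateway pods carry the app: mcpgateway label from the Deployment manifest, that MCP servers listen on port 8080 like mcp-example, and that your cluster's network plugin enforces NetworkPolicy. If Tool Gateway Router pods must also reach tool servers, add their app label (whatever your deployment uses) to allowed_apps.

```python
import json


def mcp_ingress_policy(namespace: str = "adapter",
                       allowed_apps: tuple[str, ...] = ("mcpgateway",),
                       port: int = 8080) -> dict:
    """Pods in `namespace` (except the allowed callers themselves) accept TCP
    traffic on `port` only from pods whose app label is in `allowed_apps`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "mcp-servers-gateway-only", "namespace": namespace},
        "spec": {
            # Select every pod except the allowed callers (e.g. the gateway).
            "podSelector": {"matchExpressions": [
                {"key": "app", "operator": "NotIn", "values": list(allowed_apps)},
            ]},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # A podSelector without namespaceSelector matches pods in this namespace.
                "from": [{"podSelector": {"matchExpressions": [
                    {"key": "app", "operator": "In", "values": list(allowed_apps)},
                ]}}],
                "ports": [{"protocol": "TCP", "port": port}],
            }],
        },
    }


print(json.dumps(mcp_ingress_policy(), indent=2))
```

Apply it with something like python netpol.py | kubectl apply -f - (the file name is arbitrary). Because pods in allowed_apps are excluded from this policy, give the gateway its own ingress policy that admits only your ingress controller or Application Gateway.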

Rotate Secrets with Azure Key Vault + CSI Driver

  • Store GatewaySettings__Secret (the gateway-to-tool-router shared secret) in Azure Key Vault
  • Mount secrets into pods using the Secrets Store CSI Driver - no secrets in environment variables or ConfigMaps
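  • Enable automatic rotation: the CSI driver re-syncs the mounted secret files on a configurable poll interval without restarting the pod. The application still has to re-read the file to pick up a new value, and a secret exposed as an environment variable only changes after a pod restart.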
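
Since rotation updates a file on disk rather than the running process, the application has to notice the change. Here's a hedged Python sketch of that pattern: cache the secret and re-read it whenever the file's modification time changes. The mount path is hypothetical, and whether the gateway itself can read GatewaySettings__Secret from a file instead of an environment variable depends on how it loads configuration, so check that before switching.

```python
import os
from typing import Optional


class MountedSecret:
    """Serve a secret from a mounted file, re-reading it only when the file changes."""

    def __init__(self, path: str) -> None:
        self._path = path
        self._mtime_ns: Optional[int] = None
        self._value = ""

    def get(self) -> str:
        # os.stat follows symlinks, so this works whether the file is rewritten
        # in place or swapped atomically behind a symlink.
        mtime_ns = os.stat(self._path).st_mtime_ns
        if mtime_ns != self._mtime_ns:
            with open(self._path, encoding="utf-8") as f:
                self._value = f.read().strip()
            self._mtime_ns = mtime_ns
        return self._value


# Hypothetical path: use whatever mountPath your pod spec gives the CSI volume.
gateway_secret = MountedSecret("/mnt/secrets-store/gateway-secret")
```

Calling get() per request costs one stat() call, which keeps the secret fresh without a background thread.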

Observability: Monitoring Your MCP Gateway

You can't manage what you can't see. MCP Gateway ships with solid observability hooks - you just need to wire them up.

Logs

  • Use kubectl logs -n adapter <pod-name> for quick debugging
  • In production, ship logs to Azure Monitor / Log Analytics via the Azure Monitor agent or a Fluent Bit DaemonSet
  • The gateway exposes per-adapter log access via GET /adapters/{name}/logs - useful for debugging a specific MCP server without kubectl access
  • Structured JSON logging is enabled by default; filter by session_id to trace a specific agent conversation
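
Filtering by session_id is easy to script. This Python sketch reads JSON log lines (for example, piped from kubectl logs) and yields the records for one session, skipping anything that isn't a JSON object. The session_id key is an assumption; the actual field name and nesting depend on the gateway's log schema.

```python
import json
from typing import Iterable, Iterator


def records_for_session(lines: Iterable[str], session_id: str,
                        key: str = "session_id") -> Iterator[dict]:
    """Yield parsed JSON log records whose `key` field equals `session_id`."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines such as startup banners or stack traces
        if isinstance(record, dict) and record.get(key) == session_id:
            yield record
```

For example, a small script can loop over records_for_session(sys.stdin, "<session-id>") while reading the output of kubectl logs -n adapter <pod-name>.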

Metrics

  • The gateway exposes Prometheus-compatible metrics on the standard /metrics endpoint
  • Deploy the Prometheus Operator on your cluster and add a ServiceMonitor resource pointing at the gateway service
  • Key metrics to watch: request latency by adapter, session count, error rate per tool, pod restarts
  • Visualize in Grafana with a custom dashboard - or import metrics into Azure Monitor via the Azure Managed Prometheus integration on AKS

Tracing

  • MCP Gateway supports OpenTelemetry for distributed tracing across agent calls
  • Configure the OTLP exporter to send traces to Azure Monitor Application Insights or a self-hosted Jaeger/Tempo instance
  • Traces span the full path: agent request → gateway auth → adapter routing → MCP server response
  • This is invaluable for debugging latency in multi-hop agent workflows (e.g., AutoGen orchestrator → gateway → 3 tool servers)
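
OpenTelemetry propagates trace context between services using the W3C Trace Context traceparent header by default. Normally an OpenTelemetry SDK on the agent side creates and injects this header for you; the standard-library Python sketch below only shows what it contains, so you can recognize it in request logs or hand-craft one while testing. Whether the gateway continues a trace from an incoming header depends on its instrumentation.

```python
import re
import secrets

# version-traceid-parentid-flags in lowercase hex (W3C Trace Context, version 00 only).
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def new_traceparent(sampled: bool = True) -> str:
    trace_id = secrets.token_hex(16)  # 16 random bytes -> 32 hex chars
    span_id = secrets.token_hex(8)    # 8 random bytes -> 16 hex chars
    return f"00-{trace_id}-{span_id}-{'01' if sampled else '00'}"


def parse_traceparent(value: str) -> tuple[str, str, bool]:
    match = _TRACEPARENT.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed traceparent: {value!r}")
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are explicitly invalid in the specification.
    if trace_id == "0" * 32 or span_id == "0" * 16:
        raise ValueError("all-zero trace or span id")
    return trace_id, span_id, bool(int(flags, 16) & 0x01)  # bit 0 = sampled
```

Send the generated value as a traceparent request header; the shared trace ID is what lets a tracing backend stitch the agent, gateway, and MCP server spans into one trace.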

Scaling MCP Gateway in Production

The gateway is designed to scale horizontally. Here's how to do it right.

Horizontal Pod Autoscaler (HPA) for Gateway Pods

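Add an HPA to automatically scale gateway replicas based on CPU utilization. Utilization is computed as a percentage of each container's CPU request, so the gateway container must declare resources.requests.cpu for this target to work: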

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mcpgateway-hpa
  namespace: adapter
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mcpgateway
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70

For production, consider custom metrics (e.g., active session count) via the KEDA (Kubernetes Event-Driven Autoscaling) operator for more precise scaling.
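
To sanity-check maxReplicas and the 70% target, it helps to know the core rule from the Kubernetes HPA documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance (0.1 by default), then clamped to the min/max bounds. This Python sketch implements only that rule; the real controller also applies stabilization windows, scaling policies, and special handling for unready pods.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int,
                     max_replicas: int, tolerance: float = 0.1) -> int:
    """Core HPA scaling rule for a single Utilization metric (simplified)."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # within tolerance: leave the replica count alone
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 2 replicas averaging 105% of their CPU request against a 70% target.
print(desired_replicas(2, 105, 70, min_replicas=2, max_replicas=10))  # prints 3
```

Note that the ceiling makes scale-up aggressive for small replica counts, while minReplicas: 2 keeps a second pod available for failover even at idle.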

StatefulSets for Session-Aware MCP Server Instances

MCP servers that maintain per-session state should run as StatefulSets, not Deployments. StatefulSets give each pod a stable network identity (mcp-example-0, mcp-example-1) that the gateway uses for session affinity routing.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mcp-example
  namespace: adapter
spec:
  serviceName: "mcp-example-headless"
  replicas: 3
  selector:
    matchLabels:
      app: mcp-example
  template:
    metadata:
      labels:
        app: mcp-example
    spec:
      containers:
        - name: mcp-example
          image: localhost:5000/mcp-example:1.0.0
          ports:
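            - containerPort: 8080
---
# Headless Service (clusterIP: None) referenced by serviceName above; it gives
# each pod a stable DNS name instead of one load-balanced virtual IP.
apiVersion: v1
kind: Service
metadata:
  name: mcp-example-headless
  namespace: adapter
spec:
  clusterIP: None
  selector:
    app: mcp-example
  ports:
    - port: 8080
      targetPort: 8080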
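
Each pod of a StatefulSet whose serviceName points at a headless Service gets a stable DNS name of the form <pod-name>.<service>.<namespace>.svc.<cluster-domain>, where pod names are <statefulset>-<ordinal>. This Python sketch lists those names for the StatefulSet above, assuming the default cluster domain cluster.local (your cluster may be configured with a different one).

```python
def statefulset_pod_dns(statefulset: str, service: str, namespace: str,
                        replicas: int, cluster_domain: str = "cluster.local") -> list[str]:
    """Stable per-pod DNS names: <statefulset>-<ordinal>.<service>.<namespace>.svc.<domain>."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return [f"{statefulset}-{i}.{service}.{namespace}.svc.{cluster_domain}"
            for i in range(replicas)]


for name in statefulset_pod_dns("mcp-example", "mcp-example-headless", "adapter", 3):
    print(name)
```

These names survive pod restarts, which is what lets a router send a session back to, say, mcp-example-1 specifically.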

Using AKS for Managed Kubernetes

For production deployments on Azure, Azure Kubernetes Service (AKS) is the recommended platform. The repo includes a one-click Azure deployment template that provisions:

| Resource | Type |
|---|---|
| mg-aks-<label> | AKS Cluster |
| mgreg<label> | Azure Container Registry |
| mg-storage-<label> | Cosmos DB Account |
| mg-aag-<label> | Application Gateway |
| mg-ai-<label> | Application Insights |
| mg-identity-<label> | Managed Identity |

AKS gives you Workload Identity (no credential management), Azure CNI for network policy support, and native integration with Azure Monitor - everything you need to run MCP servers on Kubernetes in production.


MCP Gateway vs Traditional API Gateway: What's Different?

MCP Gateway isn't a replacement for Kong or Azure APIM - it's a complement. It handles things a traditional API gateway simply wasn't built for.

| Feature | Traditional API Gateway | MCP Gateway |
|---|---|---|
| Session-aware routing | ❌ | ✅ |
| MCP protocol support | ❌ | ✅ |
| AI agent lifecycle management | ❌ | ✅ |
| Tool catalog management | ❌ | ✅ |
| Standard HTTP REST | ✅ | ✅ |
| Rate limiting | ✅ | ✅ (via APIM integration) |
| Auth (OAuth/Entra ID) | ✅ | ✅ |
| Kubernetes-native StatefulSet routing | ❌ | ✅ |
| Per-adapter log access API | ❌ | ✅ |
| MCP server deployment management | ❌ | ✅ |

The sweet spot: use Azure API Management (APIM) in front of MCP Gateway for rate limiting, an external developer portal, and API versioning. Let MCP Gateway handle everything specific to the MCP protocol and AI agent lifecycle.


Real-World Enterprise Use Cases

Multi-Agent Orchestration with AutoGen or Semantic Kernel

Teams building enterprise AI agent workflows on MCP with Microsoft's AutoGen or Semantic Kernel use MCP Gateway as the central tool broker. Each agent in the orchestration connects to /mcp, and the Tool Gateway Router dispatches calls to the right tool server - whether that's a database query tool, a code execution server, or a document retrieval service. No agent needs to know the topology.

Secure Tool Access for GitHub Copilot Extensions

GitHub Copilot extensions in VS Code connect to MCP servers for context-aware assistance. In regulated enterprises, you can't expose raw MCP servers to developer workstations. MCP Gateway sits in the middle: Copilot connects to the gateway endpoint, Entra ID validates the developer's identity, and RBAC ensures they only see the tools their team is authorized to use.

Centralized AI Tool Governance for Regulated Industries

In financial services and healthcare, every AI tool access needs an audit trail. MCP Gateway's centralized auth and logging means every tool call - which agent, which tool, which user identity, at what time - flows through a single observable point. Combined with Azure Monitor and Log Analytics, compliance teams get the audit logs they need without instrumenting each MCP server individually.


Key Takeaways

  • MCP Gateway solves the enterprise AI tooling problem - it's the governed, observable layer between your AI agents and your tools.
  • It's Kubernetes-native by design. StatefulSets, headless services, and namespace isolation are first-class features, not afterthoughts.
  • The control plane and data plane are separate concerns. DevOps manages adapters via REST APIs; agents consume tools via the data plane endpoints.
  • Security is built in. Entra ID auth and RBAC are core features - not plugins. In production, combine them with network policies and Key Vault secret rotation.
  • Observability is ready to wire up. Prometheus metrics, OpenTelemetry tracing, and per-adapter log APIs are available out of the box.
  • Deploying MCP Gateway to AKS is a one-click operation in Azure. The Bicep template provisions AKS, ACR, Cosmos DB, Application Gateway, Application Insights, and Managed Identity together.
  • It's MIT-licensed and actively maintained at github.com/microsoft/mcp-gateway.

FAQ

What is Microsoft MCP Gateway?

Microsoft MCP Gateway is an open-source reverse proxy and management layer for Model Context Protocol (MCP) servers. It provides session-aware stateful routing, lifecycle management (deploy/update/delete via REST APIs), and centralized Entra ID authentication for MCP servers running in Kubernetes. Think of it as the control tower for all your AI agent tool connections.

Is MCP Gateway production-ready?

The core gateway - adapter management, data plane routing, Entra ID auth, Cosmos DB metadata store, and AKS deployment - is production-ready. The Agents and Sessions subsystem (LLM-driven agent runs via /sessions/run) is currently in preview and is recommended for single-replica evaluation deployments only. For production multi-agent workloads, use the gateway as a routing and management layer and run your LLM orchestration in AutoGen or Semantic Kernel.

Does MCP Gateway work with AKS?

Yes - AKS is the primary production deployment target. The repo includes a one-click Azure deployment template (Bicep) that provisions an AKS cluster, ACR, Cosmos DB, Application Gateway, and Managed Identity together. The gateway runs as a Kubernetes Deployment on AKS, with Workload Identity for credential-less authentication to Azure services.

How does MCP Gateway handle authentication?

In production (cloud mode), every request to both the control plane and data plane requires an Entra ID bearer token. The gateway validates the token, extracts the caller's identity and roles, and enforces RBAC at the resource level. In local dev mode, anonymous access is available for rapid iteration. The gateway also supports a shared secret (GatewaySettings__Secret) for secure service-to-service communication between the gateway and the Tool Gateway Router.

Can I use MCP Gateway without Azure?

Partially. The gateway itself is open-source .NET and runs on any Kubernetes cluster - no Azure required for the core routing and management functionality. However, the metadata store defaults to Cosmos DB (Azure), and the auth layer is built around Microsoft Entra ID. For a fully non-Azure setup, you'd need to swap the metadata store implementation and auth provider - which is possible but requires code changes.

What AI agents are compatible with MCP Gateway?

Any agent or client that speaks the Model Context Protocol over streamable HTTP can connect to MCP Gateway. This includes GitHub Copilot (via VS Code MCP server configuration), Azure OpenAI agents using the MCP client SDK, AutoGen multi-agent frameworks, Semantic Kernel with MCP plugin support, and any MCP-compatible client using the url transport pointing at /adapters/{name}/mcp or /mcp.


Useful Resources