The Three-Layer AI Agent Stack: MCP, A2A, and Streamable HTTP Explained

MCP, A2A, and Streamable HTTP are the three protocols that form the modern AI agent stack. Here's exactly how they fit together, and why it matters for every developer building with AI.

Mohammed Kafeel

Machine Learning Researcher

June 24, 2026 · 11 min read

AI agents are everywhere - but most of them still can't talk to each other. Every integration is a one-off, custom-coded bridge that breaks the moment something changes. That's the problem the industry is finally solving, and it's solving it with a three-layer protocol stack: MCP, A2A, and Streamable HTTP.

By the end of this post, you'll know exactly what each protocol does, how they fit together, and where to start building.


What Is the AI Agent Stack?

The AI agent stack is the set of protocols and infrastructure that lets AI agents connect to tools, data, and other agents. Without it, every integration is a fragile, hand-rolled piece of code that only works in one context.

Think about how the web works. HTTP moves data. HTML structures content. DNS resolves addresses. No single company owns any of it, and any browser can talk to any server. That's why the web scaled.

Agentic AI is building its own version of that stack. And right now, three protocols are doing the heavy lifting.

  • MCP - connects agents to tools and data sources
  • A2A - connects agents to other agents
  • Streamable HTTP - moves all of it over the network

Without a standard stack, you're writing custom glue code forever. With it, any agent can plug into any tool or collaborate with any other agent - regardless of who built it.


Layer 1: What Is the Model Context Protocol (MCP)?

MCP is an open standard for connecting AI applications to external systems. Anthropic launched it on November 25, 2024, and donated it to the Linux Foundation on December 9, 2025, where it now lives under the Agentic AI Foundation (AAIF). (For a full primer, see what MCP is.)

The best analogy? MCP is the USB-C port for AI. Before USB-C, every device had a different cable. Now there's one standard connector that works everywhere. MCP does the same thing for AI agents and tools.

The AAIF co-founders are Anthropic, Block, and OpenAI. Supporting members include AWS, Google, Microsoft, Cloudflare, GitHub, and Bloomberg. This isn't a startup project - it's the industry converging on a standard.

As of late 2025, MCP has over 10,000 published servers and 97 million+ monthly SDK downloads across Python and TypeScript.

How Does MCP Work?

Every MCP interaction involves three participants:

  • Host - the AI application (Claude, VS Code, Cursor)
  • Client - the component inside the host that manages communication
  • Server - the external service that provides tools or data (GitHub, Notion, a database)

All messages use JSON-RPC 2.0 - a lightweight, well-understood remote procedure call format. The client sends a request; the server sends back a response. Simple. (For a deeper look at the MCP architecture and how host, client, and server divide responsibilities, see our dedicated breakdown.)
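
To make the request/response shape concrete, here is a minimal sketch of a JSON-RPC 2.0 exchange in plain Python. `tools/call` is the MCP method for invoking a tool; the `get_weather` tool, its arguments, and the returned text are hypothetical, and the helper names are ours rather than part of any SDK.

```python
import json
from itertools import count

_ids = count(1)  # request ids only need to be unique within a session


def make_request(method: str, params: dict) -> dict:
    """Build a JSON-RPC 2.0 request object."""
    return {"jsonrpc": "2.0", "id": next(_ids), "method": method, "params": params}


def make_result(request: dict, result: dict) -> dict:
    """Build the matching response; the shared id ties it to the request."""
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


# Hypothetical tool name and arguments, for illustration only.
request = make_request(
    "tools/call", {"name": "get_weather", "arguments": {"city": "Paris"}}
)
response = make_result(
    request, {"content": [{"type": "text", "text": "18 C, clear"}]}
)

print(json.dumps(request))
print(json.dumps(response))
```

The `id` field is what lets a client match each response to the request that produced it, even when several requests are in flight.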

MCP supports two transport mechanisms:

  • STDIO - for local tools. The host launches the MCP server as a subprocess and communicates via standard input/output. Zero network overhead.
  • Streamable HTTP - for remote/cloud servers. We'll cover this in detail in Layer 3.
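
The STDIO transport can be sketched in a few lines: the server reads one JSON-RPC message per line from standard input and writes one response per line to standard output. This is a toy loop under stated assumptions, not an SDK implementation. It answers only `ping`, skips initialization, and does not handle notifications; the in-memory streams stand in for the pipe a real host would provide.

```python
import io
import json
import sys


def handle(message: dict) -> dict:
    """Answer one JSON-RPC request. Only 'ping' is implemented in this toy server."""
    if message.get("method") == "ping":
        return {"jsonrpc": "2.0", "id": message["id"], "result": {}}
    # -32601 is the standard JSON-RPC "Method not found" error code.
    return {
        "jsonrpc": "2.0",
        "id": message.get("id"),
        "error": {"code": -32601, "message": "Method not found"},
    }


def serve(stdin=sys.stdin, stdout=sys.stdout) -> None:
    """Read newline-delimited JSON messages and write one response per line."""
    for line in stdin:
        if line.strip():
            stdout.write(json.dumps(handle(json.loads(line))) + "\n")
            stdout.flush()


# Simulate the host's side of the pipe with in-memory streams.
fake_in = io.StringIO('{"jsonrpc": "2.0", "id": 1, "method": "ping"}\n')
fake_out = io.StringIO()
serve(fake_in, fake_out)
reply = json.loads(fake_out.getvalue())
print(reply)
```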

What Can You Do with MCP?

The real-world use cases are already here:

  • Claude Code generating full web apps by reading Figma designs directly through an MCP server
  • Enterprise chatbots connecting to multiple internal databases simultaneously - no custom connectors needed
  • AI models creating 3D designs and sending them directly to a 3D printer via an MCP server
  • Developer tools like VS Code and Cursor using MCP to access GitHub, run terminal commands, and query documentation

Put simply, MCP is the layer that gives your AI agent hands. Without MCP, an agent can only think. With MCP, it can act.


Layer 2: What Is the Agent2Agent (A2A) Protocol?

A2A is an open protocol that lets AI agents communicate with, delegate to, and coordinate with other AI agents. Google launched it on April 9, 2025, with support from 50+ partners at launch - including Atlassian, Box, Salesforce, SAP, ServiceNow, Workday, PayPal, Deloitte, McKinsey, and Accenture.

Here's the key distinction you need to remember:

MCP connects agents to tools. A2A connects agents to other agents.

MCP is about giving an agent access to a calendar API. A2A is about one agent asking a different agent to handle the scheduling - and getting a result back. (We unpack this fully in MCP vs A2A.)

By April 2026, the number of supporting organizations had grown to over 150, including Microsoft, AWS, and IBM. Google has since donated the protocol to the Linux Foundation for vendor-neutral governance.

How Does A2A Work?

A2A was built on five design principles:

  1. Embrace agentic capabilities - agents collaborate in natural, unstructured ways, not just as tools
  2. Build on existing standards - HTTP, SSE, JSON-RPC (easy to integrate with existing stacks)
  3. Secure by default - enterprise-grade auth from day one
  4. Support long-running tasks - from instant responses to multi-day workflows with human-in-the-loop
  5. Modality agnostic - text, audio, video, whatever the task needs

The core concepts in A2A:

  • Agent Card - a JSON file that advertises what an agent can do. Think of it as a business card for AI. Other agents discover capabilities by reading it.
  • Task - the unit of work. It has a lifecycle and can complete in milliseconds or run for days.
  • Artifact - the output of a completed task (a document, a dataset, a booking confirmation).
  • Client Agent - the agent that requests work to be done.
  • Remote Agent - the specialist agent that actually does the work.
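
As a rough illustration of discovery, the sketch below models an Agent Card as a Python dict and picks agents by skill tag. The field names follow the general shape of an A2A Agent Card, but this is an illustrative subset, and every value (agent name, URL, skills) is invented. `find_agents` is a hypothetical helper, not part of any A2A library.

```python
# Illustrative subset of an Agent Card; all values are invented.
scheduler_card = {
    "name": "Scheduling Agent",
    "description": "Finds meeting slots and books interviews.",
    "url": "https://agents.example.com/scheduler",
    "version": "1.0.0",
    "capabilities": {"streaming": True},
    "skills": [
        {
            "id": "book-meeting",
            "name": "Book a meeting",
            "description": "Schedules a meeting for a set of attendees.",
            "tags": ["calendar", "scheduling"],
        }
    ],
}


def find_agents(cards: list[dict], tag: str) -> list[str]:
    """Return the names of agents advertising at least one skill with this tag."""
    return [
        card["name"]
        for card in cards
        if any(tag in skill.get("tags", []) for skill in card.get("skills", []))
    ]


print(find_agents([scheduler_card], "scheduling"))
```

A client agent would typically fetch cards like this over HTTP and then decide which remote agent to send a Task to.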

Real-world example: A hiring manager asks their AI assistant to find software engineering candidates. The assistant (client agent) reads Agent Cards to find specialist agents, then delegates: one agent sources candidates from LinkedIn, another checks availability and schedules interviews, a third runs background checks. Each agent reports back via A2A. The hiring manager sees a unified result - without knowing or caring which agents handled which piece. (For the nuts and bolts of multi-agent workflow design, see our step-by-step guide.)
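
The Task lifecycle in that example can be sketched as a small state machine. The state names below mirror ones used by A2A, but the transition table is our simplifying assumption, and the `Task` class is a hypothetical stand-in rather than the protocol's actual data model.

```python
from dataclasses import dataclass, field

# Assumed transition table covering a subset of task states.
ALLOWED = {
    "submitted": {"working", "canceled"},
    "working": {"input-required", "completed", "failed", "canceled"},
    "input-required": {"working", "canceled"},
}


@dataclass
class Task:
    id: str
    instruction: str
    state: str = "submitted"
    artifacts: list = field(default_factory=list)

    def advance(self, new_state: str) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in ALLOWED.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state


task = Task(id="task-1", instruction="Source software engineering candidates")
task.advance("working")
task.advance("input-required")  # e.g. waiting on the hiring manager
task.advance("working")
task.artifacts.append({"name": "shortlist", "text": "3 candidates"})
task.advance("completed")
print(task.state, task.artifacts)
```

The `input-required` pause is what makes human-in-the-loop and multi-day workflows possible: the task stays open until someone supplies what the remote agent needs.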

MCP vs A2A - What's the Difference?

| Feature | MCP | A2A |
|---|---|---|
| Purpose | Connect agents to tools & data | Connect agents to other agents |
| Created by | Anthropic (now Linux Foundation/AAIF) | Google (now Linux Foundation) |
| Launched | November 2024 | April 2025 |
| Analogy | USB-C port for AI | Phone network for AI agents |
| Use case | Access a database, run a tool | Delegate a task to a specialist agent |
| Relationship | Complementary | Complementary |

They're not competing. You'll use both. MCP gives each agent its tools; A2A lets agents hand work off to each other.


Layer 3: What Is Streamable HTTP in MCP?

Streamable HTTP is MCP's modern transport mechanism for remote and cloud-based connections. It was introduced in the 2025-03-26 spec update, where it replaced the HTTP+SSE transport defined in protocol version 2024-11-05. (For the full transport landscape, see our MCP transport comparison.)

If MCP is the USB-C standard, Streamable HTTP is the cable that makes it work over the internet - not just on your local machine.

How Does Streamable HTTP Work?

The server runs as an independent process that can handle multiple client connections simultaneously. It exposes a single HTTP endpoint (like https://example.com/mcp) that supports both POST and GET.

Here's the flow:

  1. Client sends a POST request to the MCP endpoint with a JSON-RPC message in the body
  2. The client must include an Accept header listing both application/json and text/event-stream
  3. The server decides how to respond:
    • Simple response? Returns Content-Type: application/json with a single JSON blob
    • Complex, multi-part response? Opens an SSE stream with Content-Type: text/event-stream and sends multiple messages over time
  4. The client must handle both cases
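
The client side of this flow can be sketched with the standard library alone. The code below builds the POST request with the required Accept header and parses either kind of response. It is a minimal sketch that works on a fully buffered body, whereas a real client would read the SSE stream incrementally. The function names and the endpoint URL are assumptions for illustration.

```python
import json
import urllib.request


def build_post(endpoint: str, message: dict) -> urllib.request.Request:
    """Build (but do not send) the POST carrying one JSON-RPC message."""
    return urllib.request.Request(
        endpoint,
        data=json.dumps(message).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json, text/event-stream",
        },
    )


def parse_sse(body: str) -> list[dict]:
    """Collect the JSON payload of each SSE event (a blank line ends an event)."""
    messages, data_lines = [], []
    for line in body.splitlines() + [""]:
        if line.startswith("data:"):
            data_lines.append(line[5:].removeprefix(" "))
        elif line == "" and data_lines:
            messages.append(json.loads("\n".join(data_lines)))
            data_lines = []
    return messages


def parse_response(content_type: str, body: str) -> list[dict]:
    """Handle both cases: a single JSON blob or a stream of SSE events."""
    media_type = content_type.split(";")[0].strip().lower()
    if media_type == "application/json":
        return [json.loads(body)]
    if media_type == "text/event-stream":
        return parse_sse(body)
    raise ValueError(f"unexpected Content-Type: {content_type}")


stream = 'data: {"id": 1}\n\ndata: {"id": 2}\n\n'
print(parse_response("text/event-stream", stream))
```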

For listening to server-initiated messages, the client can issue an HTTP GET to open a standing SSE stream.

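Session management uses an Mcp-Session-Id header. The server can assign a session ID when the connection is initialized, and the client sends it back on every subsequent request. The ID should be a cryptographically secure token, and it ties logically related interactions together across multiple requests.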
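
One way a server might issue and look up session IDs is sketched below. `SessionStore` is a hypothetical in-memory helper; a production server would add expiry, persistence, and whatever error responses the spec calls for when an ID is missing or unknown.

```python
import secrets
from typing import Optional


class SessionStore:
    """In-memory session registry, for illustration only."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict] = {}

    def create(self) -> str:
        """Issue a new unguessable session id and register empty state for it."""
        session_id = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe ASCII
        self._sessions[session_id] = {}
        return session_id

    def lookup(self, headers: dict[str, str]) -> Optional[dict]:
        """Return the state for the request's Mcp-Session-Id, or None if unknown."""
        # HTTP header names are case-insensitive, so normalize before matching.
        by_name = {name.lower(): value for name, value in headers.items()}
        return self._sessions.get(by_name.get("mcp-session-id", ""))


store = SessionStore()
sid = store.create()
print(store.lookup({"Mcp-Session-Id": sid}) is not None)
```

Using `secrets` rather than `random` matters here: session IDs act as bearer-style identifiers, so they need to be unpredictable.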

Security is built in: servers must validate the Origin header on all incoming connections to prevent DNS rebinding attacks, and remote deployments authenticate with OAuth 2.1, using RFC 8707 resource indicators to bind tokens to the server they were issued for.
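
Origin validation is simple to sketch. The allowlist below is hypothetical, and letting requests through when no Origin header is present is a policy assumption on our part (non-browser clients often omit the header); a stricter server could reject those too.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; a real deployment would load this from configuration.
ALLOWED_ORIGINS = {"https://app.example.com", "http://localhost:3000"}


def origin_allowed(headers: dict, allowed=ALLOWED_ORIGINS) -> bool:
    """Return True if the request's Origin is on the allowlist."""
    origin = next(
        (value for name, value in headers.items() if name.lower() == "origin"),
        None,
    )
    if origin is None:
        return True  # assumed policy: requests without an Origin are not from a browser
    parts = urlsplit(origin)
    normalized = f"{parts.scheme}://{parts.netloc}".lower()
    return normalized in allowed


print(origin_allowed({"Origin": "https://app.example.com"}))
print(origin_allowed({"Origin": "http://evil.example"}))
```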

Streamable HTTP vs STDIO - When to Use Which?

| | STDIO | Streamable HTTP |
|---|---|---|
| Best for | Local tools, CLI integrations | Remote/cloud servers |
| Network overhead | None | Low |
| Streaming | No | Yes (via SSE) |
| Auth support | No | Yes (OAuth 2.1) |
| Session management | No | Yes (Mcp-Session-Id) |
| Use case | Desktop apps, local dev | Production cloud deployments |

Start with STDIO when you're experimenting locally. Move to Streamable HTTP when you're deploying to production.

Why "Streamable"?

The name captures the key design choice: the server gets to decide, per response, whether to return a single JSON blob or open a streaming connection.

For a simple lookup - "what's the weather?" - a single JSON response is fine. For a long-running task - "analyze this 500-page document and summarize the key risks" - the server opens an SSE stream and sends progress updates, partial results, and the final answer as they become available.
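
Here is a rough server-side sketch of that per-response choice. `respond` and its `stages` argument are hypothetical: an empty list stands for a quick lookup, and a non-empty list stands for long-running work reported stage by stage. `notifications/progress` is MCP's progress notification; the rest is simplified for illustration.

```python
import json


def sse_event(message: dict) -> str:
    """Serialize one JSON-RPC message as a single SSE event."""
    return f"data: {json.dumps(message)}\n\n"


def respond(request: dict, stages: list[str]) -> tuple[str, list[str]]:
    """Return (content_type, body_chunks) for one request."""
    final = {"jsonrpc": "2.0", "id": request["id"], "result": {"done": True}}
    if not stages:
        # Quick request: a single JSON blob is enough.
        return "application/json", [json.dumps(final)]
    # Long-running request: stream progress events, then the final result.
    token = request.get("params", {}).get("_meta", {}).get("progressToken")
    chunks = [
        sse_event({
            "jsonrpc": "2.0",
            "method": "notifications/progress",
            "params": {
                "progressToken": token,
                "progress": i,
                "total": len(stages),
                "message": stage,
            },
        })
        for i, stage in enumerate(stages, start=1)
    ]
    chunks.append(sse_event(final))
    return "text/event-stream", chunks


quick = respond({"jsonrpc": "2.0", "id": 1, "method": "tools/call"}, [])
slow = respond(
    {"jsonrpc": "2.0", "id": 2, "method": "tools/call",
     "params": {"_meta": {"progressToken": "abc"}}},
    ["reading document", "extracting risks"],
)
print(quick[0], len(quick[1]))
print(slow[0], len(slow[1]))
```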

This flexibility is what makes Streamable HTTP powerful for real agentic workloads. Full deprecation of the old HTTP+SSE transport is expected by June 30, 2026.


The Full AI Agent Stack in Action

Let's walk through a real end-to-end example to see all three layers working together.

The request: "Research the top 5 competitors and book a meeting with the sales team."

Here's what actually happens:

  1. MCP (Layer 1) kicks in first. The agent uses MCP servers to connect to a web search tool, the company CRM, and a calendar API. It has the tools it needs.

  2. A2A (Layer 2) handles delegation. The agent recognizes this is two distinct jobs. It reads Agent Cards to find a specialist research agent and a specialist scheduling agent. Via A2A, it sends a Task to the research agent: "find the top 5 competitors." Simultaneously, it sends a Task to the scheduling agent: "find a 30-minute slot with the sales team this week."

  3. Streamable HTTP (Layer 3) carries the traffic. The MCP tool calls to remote servers flow over Streamable HTTP, and the A2A task delegation rides on the same building blocks (HTTP, JSON-RPC, and SSE). Streaming progress updates flow back over those connections, so the user sees real-time updates as results come in.

The three-layer analogy that makes this stick:

  • MCP is the power outlets - every agent plugs into the tools it needs
  • A2A is the electrical grid - it routes work between agents across the system
  • Streamable HTTP is the wiring - it carries the current reliably from source to destination
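
The delegation step in the walkthrough can be sketched with `asyncio`. The two agent functions are local stubs standing in for remote A2A agents, and the values they return are placeholders, not real data. The point is only the shape: two Tasks sent at the same time, with the results merged for the user.

```python
import asyncio


async def research_agent(instruction: str) -> dict:
    """Stub for a remote research agent; returns placeholder competitors."""
    await asyncio.sleep(0.01)  # stands in for network and model latency
    return {"agent": "research", "artifact": [f"Competitor {n}" for n in range(1, 6)]}


async def scheduling_agent(instruction: str) -> dict:
    """Stub for a remote scheduling agent; returns a placeholder slot."""
    await asyncio.sleep(0.01)
    return {"agent": "scheduling", "artifact": "Thursday 14:00, 30 minutes"}


async def handle_request() -> dict:
    # Both tasks are delegated concurrently, as in the walkthrough.
    research, meeting = await asyncio.gather(
        research_agent("find the top 5 competitors"),
        scheduling_agent("find a 30-minute slot with the sales team this week"),
    )
    return {"competitors": research["artifact"], "meeting": meeting["artifact"]}


result = asyncio.run(handle_request())
print(result)
```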

Why This Stack Is a Big Deal

Before these protocols existed, every AI integration was a custom build. You'd write bespoke code to connect your AI to Salesforce, then write different bespoke code to connect it to your database, then write more bespoke code to get two AI systems to hand work off to each other. Fragile. Expensive. Not reusable.

Now the equation changes completely.

  • Any agent can connect to any tool via MCP - no custom connectors
  • Any agent can delegate to any other agent via A2A - no custom orchestration code
  • All of it runs over Streamable HTTP - no custom transport layer

The industry backing makes this more than a technical experiment. The Linux Foundation governs both MCP and A2A. The founding and supporting members of the AAIF include Anthropic, OpenAI, Block, Google, Microsoft, AWS, Cloudflare, and Bloomberg. On the A2A side: Salesforce, SAP, ServiceNow, Workday, and 150+ others.

When this many major players align on the same standards, those standards become infrastructure. That's what's happening here.


How to Get Started with the AI Agent Stack

You don't need to implement all three layers at once. Here's a practical path:

Step 1: Clarify your use case. Do you need an agent to access tools or data? → Start with MCP. Do you need multiple agents to collaborate? → Add A2A once MCP is working.

Step 2: Explore MCP servers. Visit modelcontextprotocol.io - there are 10,000+ published servers covering GitHub, Slack, Google Drive, Notion, databases, and more. You probably don't need to build one from scratch.

Step 3: Try a local MCP integration first. Use STDIO transport for your first experiment. No network setup, no auth configuration - just a subprocess and standard I/O. Get comfortable with the Host → Client → Server pattern before adding network complexity.

Step 4: Move to Streamable HTTP for production. When you're ready to deploy to the cloud, switch to Streamable HTTP transport. Set up your single /mcp endpoint, implement the Accept header handling on the client side, and add OAuth 2.1 for auth.

Step 5: Explore A2A. Check out google.github.io/A2A for the spec, code samples, and worked examples. Start by implementing an Agent Card for your agent - that's the entry point to the A2A ecosystem.

Step 6: Follow the ecosystem. The AAIF (Agentic AI Foundation) at the Linux Foundation is where governance happens. Watch lfaidata.foundation for spec updates, new working groups, and community contributions.


FAQ

What is MCP in AI?

MCP (Model Context Protocol) is an open standard that lets AI applications connect to external tools and data sources. It was created by Anthropic, launched in November 2024, and donated to the Linux Foundation's Agentic AI Foundation in December 2025. Think of it as a universal connector - like USB-C - that lets any AI agent plug into any tool without custom integration code.

What is the difference between MCP and A2A?

MCP connects an AI agent to tools and data (databases, APIs, calendars). A2A connects an AI agent to other AI agents (enabling delegation, collaboration, and multi-agent workflows). They're complementary, not competing. In a real system, you'll typically use both: MCP gives each agent its capabilities, A2A lets agents hand work off to each other.

What is Streamable HTTP in MCP?

Streamable HTTP is MCP's transport mechanism for remote and cloud-based connections. It uses a single HTTP endpoint supporting both POST (to send messages) and GET (to open a streaming connection). The server can respond with a simple JSON blob for quick requests, or open an SSE stream for long-running tasks. It replaced the older HTTP+SSE transport in the March 2025 spec update (2025-03-26).

Do I need all three protocols for an AI agent?

Not necessarily. A simple local agent that just needs tool access can get by with MCP over STDIO. You add Streamable HTTP when you deploy to the cloud, and you add A2A when you need multiple agents to collaborate. Think of it as a progression: start with MCP, scale up from there.

Is MCP open source?

Yes. MCP is fully open source under the Apache 2.0 license. It's governed by the Agentic AI Foundation (AAIF), a directed fund under the Linux Foundation. The spec, SDKs, and server implementations are all publicly available at modelcontextprotocol.io.

Who created the A2A protocol?

Google created and launched the Agent2Agent (A2A) protocol on April 9, 2025, with contributions from 50+ technology partners at launch. Google has since donated the protocol to the Linux Foundation for vendor-neutral, community-driven governance. The protocol is open source under the Apache 2.0 license.

What replaced HTTP+SSE in MCP?

Streamable HTTP replaced the HTTP+SSE transport from protocol version 2024-11-05. The new transport was introduced in the 2025-03-26 spec update. The key improvement: a single endpoint handles both simple JSON responses and streaming SSE connections, making it more flexible and easier to deploy. Full deprecation of the old HTTP+SSE transport is expected by June 30, 2026.

