How to Build Multi-Agent Workflows with MCP Task Delegation

A hands-on guide to building production-ready multi-agent workflows with MCP task delegation — architecture patterns, Python code, state management, and best practices for 2026.

Mohammed Kafeel

Machine Learning Researcher

June 24, 2026 · 14 min read

What if your AI agent could delegate tasks to 10 specialized sub-agents - automatically - without you writing a single routing function?

That's not a thought experiment anymore. With the Model Context Protocol (MCP), it's a production pattern that teams are shipping right now. The catch? Most tutorials stop at "hello world" and leave you stranded when things break at scale.

This guide doesn't do that. You'll get the full picture: architecture, three core delegation patterns, a step-by-step build walkthrough with real async Python, a concrete sales-research example with benchmarks, and every pitfall we've hit in production so you don't have to.


What Is MCP and Why Does It Matter for Multi-Agent Systems?

MCP (Model Context Protocol) is an open standard introduced by Anthropic in November 2024. The simplest way to think about it: it's the USB-C port for AI agents. Before MCP, every agent needed its own bespoke connector to every tool - a custom Salesforce integration here, a hand-rolled Fireflies API wrapper there. MCP replaces that chaos with a single, standardized client-server interface using JSON-RPC 2.0 over stdio or HTTP. (New to the protocol? Start with what MCP is.)
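To make the wire format concrete, here is a minimal sketch of the JSON-RPC 2.0 messages behind a single tool call. The tools/call method and its name/arguments params follow the MCP specification; the tool name get_deal_status, the deal ID, and the hand-written server reply are illustrative assumptions, and a real client library builds these messages for you.

```python
import json

def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

def parse_tool_result(raw: str, expected_id: int) -> list:
    """Return the content items of a tools/call response, or raise on error."""
    msg = json.loads(raw)
    if msg.get("id") != expected_id:
        raise ValueError("response id does not match request id")
    if "error" in msg:
        err = msg["error"]
        raise RuntimeError(f"JSON-RPC error {err['code']}: {err['message']}")
    return msg["result"]["content"]

request = build_tool_call(1, "get_deal_status", {"deal_id": "D-4821"})

# Hypothetical server reply, hand-written for illustration.
response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {
        "content": [{"type": "text", "text": "stage=Negotiation"}],
        "isError": False,
    },
})
content = parse_tool_result(response, expected_id=1)
```

The same envelope is used whether the transport is stdio or HTTP, which is why one client can talk to any compliant server.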

The protocol has three components: MCP Servers expose tools and data; MCP Clients are the connectors inside your agent application; and MCP Hosts are the AI applications (your orchestrators) that coordinate everything. By June 2025, the spec had matured to include OAuth-based authorization (built on OAuth 2.1), structured JSON tool output, and elicitation support.

Why does this matter for multi-agent AI architecture? Single-agent systems hit a ceiling fast. Context windows overflow on complex tasks, a generalist agent can't match a specialist's output quality, and you can't parallelize work inside one LLM call. An MCP multi-agent system solves all three: each agent stays focused, agents run in parallel, and MCP gives them a shared, secure way to call the same tools. (MCP handles tool access while a separate protocol handles agent-to-agent coordination - we draw that line clearly in MCP vs A2A protocol.)

One critical thing to get right upfront: MCP is not an orchestration engine. It doesn't decide which agent runs next or manage workflow state. That's your job - and frameworks like LangGraph are built for exactly that. (For where MCP sits among the protocols, see the AI agent stack architecture.)


How Does MCP Task Delegation Actually Work?

The core model is orchestrator → MCP server → sub-agents. Here's the flow in plain terms.

  1. A supervisor (orchestrator) agent receives a high-level goal from the user.
  2. It breaks the goal into subtasks using its own reasoning (usually an LLM call with a structured prompt).
  3. Each subtask maps to a specialized sub-agent, registered as a callable tool in the supervisor's MCP context.
  4. The supervisor invokes those tools via MCP - the sub-agents execute, call their own MCP servers (databases, APIs, external services), and return structured results.
  5. The supervisor aggregates the results and produces the final output.

The key insight is that sub-agents look like tools to the supervisor. The supervisor doesn't need to know how a CRM agent works internally - it just calls crm_agent.get_deal_status(deal_id) and gets back a Pydantic model. This is what makes MCP agent handoffs clean and composable. (If you're fuzzy on what counts as a tool vs. a resource, see MCP tools vs resources vs prompts.) If a sub-agent's own MCP server needs to reason mid-call, that's where MCP sampling for agent queries comes in.

Latency-wise, the MCP protocol itself adds roughly 10 ms of p95 overhead per tool call - negligible for most workflows, but worth budgeting when you're chaining 5+ agents in sequence.

⚠️ Important: The supervisor's routing logic is where most bugs live. If the supervisor misroutes a task, you get a silent wrong answer, not a loud error. Always add structured output validation on every handoff.
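As a rough illustration of validating a handoff, the sketch below checks both the shape of a sub-agent's result and whether it answers the question that was asked. It uses only the standard library so it runs anywhere; in the stack this guide builds, a Pydantic model would play the role of the hypothetical DealStatus class, and the field names are assumptions.

```python
from dataclasses import dataclass

class HandoffValidationError(ValueError):
    """Raised when a sub-agent result does not match the expected shape."""

@dataclass(frozen=True)
class DealStatus:
    deal_id: str
    stage: str
    amount: float

    @classmethod
    def from_payload(cls, payload: dict) -> "DealStatus":
        missing = [k for k in ("deal_id", "stage", "amount") if k not in payload]
        if missing:
            raise HandoffValidationError(f"missing fields: {missing}")
        amount = payload["amount"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(amount, bool) or not isinstance(amount, (int, float)):
            raise HandoffValidationError("amount must be numeric")
        return cls(str(payload["deal_id"]), str(payload["stage"]), float(amount))

def accept_handoff(payload: dict, requested_deal_id: str) -> DealStatus:
    """Validate shape first, then check the result answers the question asked."""
    result = DealStatus.from_payload(payload)
    if result.deal_id != requested_deal_id:
        raise HandoffValidationError(
            f"asked about {requested_deal_id}, got {result.deal_id}"
        )
    return result

ok = accept_handoff(
    {"deal_id": "D-4821", "stage": "Negotiation", "amount": 850000}, "D-4821"
)
```

The second check is the one that catches a misroute: a well-formed answer about the wrong deal passes schema validation but fails the identity comparison.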


The 3 Core Multi-Agent Patterns with MCP

Pattern 1 - Handoffs (Single Agent Delegates to Specialized Sub-Agents)

What it is: The supervisor receives a task, identifies which specialist can handle it, and hands it off entirely. The sub-agent completes the work and returns a result. Control flows back to the supervisor.

When to use it: Customer support triage, document routing, intent classification followed by domain-specific processing.

Real-world example: A user asks "What's the renewal risk for Acme Corp?" The supervisor hands off to a CRM agent (Salesforce), which returns deal data. Done.

| Pros | Cons |
|---|---|
| Simple to reason about and debug | Sub-agents can't collaborate directly |
| Low latency (single hop) | Supervisor becomes a bottleneck |
| Easy to test in isolation | Poor fit for tasks requiring cross-agent context |
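A minimal sketch of the handoff pattern, using plain asyncio stubs rather than real agents. The keyword table is a stand-in for the supervisor's LLM-based intent classification, and the agent names and return shapes are assumptions made for illustration.

```python
import asyncio
from typing import Awaitable, Callable, Dict

Agent = Callable[[str], Awaitable[dict]]

async def crm_agent(task: str) -> dict:
    # Stub: a real agent would call its MCP tools here.
    return {"agent": "crm_agent", "answer": f"CRM data for: {task}"}

async def support_agent(task: str) -> dict:
    return {"agent": "support_agent", "answer": f"Ticket triage for: {task}"}

# Keyword routing stands in for the supervisor's intent classification.
ROUTES: Dict[str, Agent] = {
    "renewal": crm_agent,
    "deal": crm_agent,
    "ticket": support_agent,
}

async def handoff(task: str) -> dict:
    """Pick exactly one specialist, hand the task off, and return its result."""
    for keyword, agent in ROUTES.items():
        if keyword in task.lower():
            return await agent(task)
    # Fail loudly instead of guessing: misroutes are otherwise silent.
    raise LookupError(f"no specialist registered for task: {task!r}")

result = asyncio.run(handoff("What's the renewal risk for Acme Corp?"))
```

Raising on an unroutable task is a deliberate choice here: a loud failure is easier to debug than a plausible answer from the wrong specialist.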

Pattern 2 - Sequential Chaining (Pipeline Workflows)

What it is: Output from Agent A becomes input to Agent B, which feeds Agent C. Each agent transforms the data and passes it downstream - a classic pipeline.

When to use it: Document processing pipelines, data enrichment workflows, multi-step research where each step depends on the previous one.

Real-world example: Raw meeting transcript → Summarizer Agent → Sentiment Agent → Risk Scorer Agent → final report.

| Pros | Cons |
|---|---|
| Easy to reason about data flow | Errors cascade downstream |
| Each step is independently testable | Total latency = sum of all steps |
| Great for ETL-style AI workflows | Not suitable for parallel workloads |
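The pipeline can be sketched with stub stages, each taking the previous stage's output. The stage logic below (first-sentence summary, keyword sentiment) is deliberately trivial and purely illustrative; the point is the shape of the chain and stopping at the first failure so bad data does not cascade.

```python
import asyncio

async def summarizer(transcript: str) -> dict:
    # Stub summarizer: keeps the first sentence.
    return {"summary": transcript.split(".")[0].strip()}

async def sentiment(state: dict) -> dict:
    # Stub sentiment: treats the word "concern" as a negative signal.
    score = -0.5 if "concern" in state["summary"].lower() else 0.5
    return {**state, "sentiment": score}

async def risk_scorer(state: dict) -> dict:
    return {**state, "risk": "high" if state["sentiment"] < 0 else "low"}

async def run_pipeline(transcript: str) -> dict:
    """Run the stages in order; each stage receives the previous stage's output."""
    state = await summarizer(transcript)
    for stage in (sentiment, risk_scorer):
        try:
            state = await stage(state)
        except Exception as exc:
            # Stop at the first failure and name the stage that broke.
            raise RuntimeError(f"pipeline failed at {stage.__name__}") from exc
    return state

report = asyncio.run(
    run_pipeline("Buyer raised a pricing concern. Next call Friday.")
)
```

Because each stage is an ordinary async function, it can be unit-tested on its own before being placed in the chain.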

Pattern 3 - Agent Graphs (Parallel + Branching Topologies)

What it is: Multiple agents run in parallel or conditionally, based on workflow state. This is the most powerful pattern - and the most complex. LangGraph's StateGraph is purpose-built for this.

When to use it: Any workflow where independent subtasks can run simultaneously (e.g., querying Salesforce and Fireflies at the same time), or where the next step depends on the result of a conditional check.

Real-world example: A sales research workflow fires a CRM agent, a web research agent, and a meeting notes agent in parallel. The supervisor waits for all three, then runs a risk-scoring agent on the combined output.

| Pros | Cons |
|---|---|
| Dramatically lower end-to-end latency | Harder to debug; non-deterministic execution order |
| Scales to complex, real-world tasks | Requires careful state management |
| Supports conditional branching and loops | Higher engineering overhead upfront |
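The fan-out/fan-in shape of this pattern can be sketched with asyncio.gather, independent of any framework. The agents are stubs whose sleep simulates tool and LLM latency; the names and delays are assumptions. In LangGraph the same topology would be expressed as parallel branches in a StateGraph.

```python
import asyncio
import time

async def fake_agent(name: str, delay: float) -> dict:
    """Stand-in for a sub-agent; the sleep simulates tool and LLM latency."""
    await asyncio.sleep(delay)
    return {"agent": name, "ok": True}

async def fan_out_fan_in() -> tuple:
    start = time.perf_counter()
    # Fan out: the three independent agents run concurrently.
    results = await asyncio.gather(
        fake_agent("crm_agent", 0.05),
        fake_agent("research_agent", 0.10),
        fake_agent("meeting_notes_agent", 0.08),
        return_exceptions=True,  # one failure should not cancel the others
    )
    # Fan in: keep successful results for the downstream risk-scoring step.
    succeeded = [r for r in results if not isinstance(r, Exception)]
    return succeeded, time.perf_counter() - start

results, elapsed = asyncio.run(fan_out_fan_in())
```

gather returns results in the order the calls were passed, not the order they finished, so downstream code can rely on positions. The elapsed time should track the slowest agent rather than the sum of all three.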

Step-by-Step: Building Your First MCP Multi-Agent Workflow

We'll build a supervisor-worker system using LangGraph for orchestration and langchain-mcp-adapters for MCP connectivity. The pattern here is production-grade - async Python, structured state, real error handling.

Step 1: Define Your Agent Roles and Responsibilities

Before writing a line of code, write down what each agent does - and what it explicitly does not do. Single-responsibility is non-negotiable.

A good agent definition answers three questions: What data does it need as input? What tool(s) does it call? What structured type does it return?

# agent_roles.py
from pydantic import BaseModel
from typing import Optional

class CRMResult(BaseModel):
    deal_id: str
    stage: str
    amount: float
    close_date: str
    sentiment_score: Optional[float] = None

class ResearchResult(BaseModel):
    company: str
    recent_news: list[str]
    competitor_mentions: list[str]

class SupervisorOutput(BaseModel):
    deal_health_score: float  # 0-100, the scale the supervisor is prompted to use
    risk_factors: list[str]
    recommended_actions: list[str]
    confidence: float
    processing_time_seconds: float

Step 2: Set Up Your MCP Server(s)

Each sub-agent connects to its own MCP server. You can run them locally via stdio or remotely via streamable HTTP. Here's a minimal MCP server definition using the official Python SDK:

# crm_mcp_server.py
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("crm-server")

@mcp.tool()
async def get_deal_status(deal_id: str) -> dict:
    """Fetch deal data from CRM by deal ID."""
    # Replace with your actual Salesforce/CRM API call
    return {
        "deal_id": deal_id,
        "stage": "Negotiation",
        "amount": 850000.0,
        "close_date": "2026-07-31",
    }

if __name__ == "__main__":
    mcp.run()  # Runs on stdio by default

(If this is your first server, our build an MCP server in Python walkthrough covers the setup in full.)

Step 3: Create Your Specialized Sub-Agents

Each sub-agent is a ReAct agent built with LangGraph's create_react_agent, loading its tools from an MCP server. Keep the system prompt narrow and explicit.

# agents.py
import os
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o", temperature=0)

async def create_crm_agent():
    """CRM agent: reads deal data from Salesforce via MCP."""
    client = MultiServerMCPClient({
        "crm_server": {
            "command": "python",
            "args": ["./crm_mcp_server.py"],
            "transport": "stdio",
        }
    })
    tools = await client.get_tools()

    return create_react_agent(
        model=llm,
        tools=tools,
        prompt=(
            "You are a CRM specialist. "
            "Fetch deal data and return structured results only. "
            "Do not add commentary or analysis."
        ),
        name="crm_agent",
    )

async def create_research_agent():
    """Web research agent: fetches recent news and competitor signals."""
    client = MultiServerMCPClient({
        "search_server": {
            "url": "http://localhost:8001/mcp",
            "transport": "streamable_http",
        }
    })
    tools = await client.get_tools()

    return create_react_agent(
        model=llm,
        tools=tools,
        prompt=(
            "You are a web research specialist. "
            "Find recent news and competitor mentions for the given company. "
            "Return structured data only."
        ),
        name="research_agent",
    )

Step 4: Build the Supervisor/Orchestrator Agent

The supervisor knows about all sub-agents and decides who does what. Register sub-agents as tools using LangGraph's create_supervisor helper.

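# supervisor.py
from langgraph_supervisor import create_supervisor

# Sub-agent factories and the shared model come from agents.py (Step 3)
from agents import create_crm_agent, create_research_agent, llm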

async def build_supervisor():
    crm_agent = await create_crm_agent()
    research_agent = await create_research_agent()

    supervisor = create_supervisor(
        model=llm,
        agents=[crm_agent, research_agent],
        prompt=(
            "You are a deal intelligence supervisor. "
            "For any deal analysis request:\n"
            "1. Send the deal_id to crm_agent for CRM data.\n"
            "2. Send the company name to research_agent for market context.\n"
            "3. Synthesize both results into a deal health score (0-100) "
            "and a list of risk factors.\n"
            "Always delegate - do not perform research yourself."
        ),
        output_mode="full_history",
    )
    return supervisor.compile()

Step 5: Wire Up Task Delegation with MCP Tools

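With the supervisor compiled, invoking the workflow is straightforward. Pass a thread_id to identify the session; it only gives you continuity across calls once the graph is compiled with a checkpointer (see the state management section below).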

# main.py
import asyncio
from supervisor import build_supervisor

async def main():
    graph = await build_supervisor()

    result = await graph.ainvoke(
        {
            "messages": [{
                "role": "user",
                "content": "Analyze deal D-4821 for Acme Corp renewal risk."
            }]
        },
        config={"configurable": {"thread_id": "session-acme-001"}}
    )

    # Final message contains the supervisor's synthesized output
    print(result["messages"][-1].content)

if __name__ == "__main__":
    asyncio.run(main())

Step 6: Add State Management and Context Passing

Agents are stateless by default. You need to pass context explicitly - either inline in the message payload or via an external store. More on this in the state management section below.

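# Pass structured context with every delegation
import json

workflow_context = {
    "workflow_id": "wf-acme-20260624",
    "deal_id": "D-4821",
    "company": "Acme Corp",
    "requested_by": "joe.smith@company.com",
    "priority": "high",
}

async def analyze_with_context(graph):
    """Invoke the compiled supervisor graph with the context inlined as JSON."""
    return await graph.ainvoke(
        {
            "messages": [{
                "role": "user",
                # json.dumps gives agents valid JSON instead of a Python dict repr
                "content": f"Analyze deal. Context: {json.dumps(workflow_context)}"
            }]
        },
        config={"configurable": {"thread_id": workflow_context["workflow_id"]}}
    )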

Step 7: Test and Observe Your Workflow

Test each agent in isolation before composing. Call crm_agent.ainvoke(...) directly with known inputs and assert the output matches your Pydantic schema. Only then wire it into the supervisor.
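One way to sketch such an isolation test with only the standard library is to mock the MCP tool and check what the agent does with it. The crm_agent function below is a hypothetical thin wrapper, not the LangGraph agent from Step 3; the field names mirror the CRM server from Step 2.

```python
import asyncio
from unittest.mock import AsyncMock

async def crm_agent(deal_id: str, get_deal_status) -> dict:
    """Hypothetical thin agent: calls its MCP tool and normalizes the result."""
    raw = await get_deal_status(deal_id=deal_id)
    return {
        "deal_id": str(raw["deal_id"]),
        "stage": str(raw["stage"]),
        "amount": float(raw["amount"]),
        "close_date": str(raw["close_date"]),
    }

async def check_crm_agent_in_isolation() -> dict:
    # The mock replaces the MCP tool, so no server, network, or LLM is involved.
    fake_tool = AsyncMock(return_value={
        "deal_id": "D-4821", "stage": "Negotiation",
        "amount": 850000, "close_date": "2026-07-31",
    })
    result = await crm_agent("D-4821", fake_tool)

    # Known input in, schema-shaped output out.
    assert set(result) == {"deal_id", "stage", "amount", "close_date"}
    assert isinstance(result["amount"], float)
    fake_tool.assert_awaited_once_with(deal_id="D-4821")
    return result

checked = asyncio.run(check_crm_agent_in_isolation())
```

With a real LangGraph agent the same idea applies: substitute mocked tools for the MCP-loaded ones and assert on the parsed final message.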

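For observability, enable LangSmith tracing by setting two environment variables:

import os

os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ only accepts strings, so fail clearly if the key is missing
api_key = os.getenv("LANGSMITH_API_KEY")
if not api_key:
    raise RuntimeError("Set LANGSMITH_API_KEY before enabling tracing")
os.environ["LANGCHAIN_API_KEY"] = api_key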

Every agent call, tool invocation, and handoff now appears in your LangSmith dashboard with latency, token counts, and full input/output traces.


Real-World Example - AI Sales Research Workflow

Let's make this concrete. Here's the exact workflow that TrueFoundry documented in their April 2026 production case study - a supervisor managing a Salesforce CRM agent, a Fireflies meeting notes agent, and a web research agent to produce a deal health score and prep brief.

The benchmark numbers reported in that case study:

  • Processing time: ~2.3 seconds end-to-end for a full deal brief
  • MCP latency overhead: ~10ms p95 per tool call
  • Prep time reduction: 60% compared to manual tab-switching
  • Pipeline velocity improvement: 20% increase in deal progression

Here's the supervisor setup, adapted from TrueFoundry's production code:

# sales_supervisor.py
import os
from langchain_mcp_adapters.client import MultiServerMCPClient
from langgraph.prebuilt import create_react_agent
from langgraph_supervisor import create_supervisor
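from langchain_openai import ChatOpenAI

# create_crm_agent and create_research_agent are defined in agents.py (Step 3)
from agents import create_crm_agent, create_research_agent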

llm = ChatOpenAI(model="gpt-4o", temperature=0)

async def create_fireflies_agent():
    """Meeting notes agent: retrieves and analyzes call transcripts."""
    client = MultiServerMCPClient({
        "fireflies": {
            "url": os.getenv("FIREFLIES_MCP_URL"),
            "transport": "streamable_http",
            "headers": {
                "Authorization": f"Bearer {os.getenv('FIREFLIES_KEY')}"
            },
        }
    })
    tools = await client.get_tools()

    return create_react_agent(
        model=llm,
        tools=tools,
        prompt=(
            "You are a Fireflies meeting assistant. "
            "Summarize call transcripts, extract action items, "
            "and detect sentiment signals. "
            "Return structured data only - no extra commentary."
        ),
        name="fireflies_agent",
    )

async def create_sales_supervisor():
    """Supervisor that orchestrates CRM, Fireflies, and research agents."""
    crm_agent = await create_crm_agent()
    fireflies_agent = await create_fireflies_agent()
    research_agent = await create_research_agent()

    return create_supervisor(
        model=llm,
        agents=[crm_agent, fireflies_agent, research_agent],
        prompt=(
            "You are a deal intelligence supervisor managing three agents:\n"
            "- crm_agent: Salesforce deal data, stage, amount, close date\n"
            "- fireflies_agent: meeting transcripts, sentiment, action items\n"
            "- research_agent: web news, competitor mentions, leadership changes\n\n"
            "For each deal analysis request:\n"
            "1. Delegate to all three agents (they can run in parallel).\n"
            "2. Synthesize results into a Deal Health Score (0-100).\n"
            "3. List risk factors with severity (High/Medium/Low).\n"
            "4. Recommend 3 specific next actions.\n"
            "Do not do any research yourself."
        ),
        add_handoff_back_messages=True,
        output_mode="full_history",
    ).compile()

The output for a real deal looks like this:

Deal Health Score: 68/100 (At-Risk)

Risk Factors:
  [HIGH]  CFO transition may delay Q3 decision (LinkedIn, July 14)
  [MED]   Competitor "TechRival" mentioned 3x in recent calls
  [MED]   Budget sensitivity flagged in 2 of last 3 Fireflies calls

Recommended Actions:
  1. Schedule exec-to-exec call before July 25 to re-anchor on value
  2. Prepare ROI one-pager addressing CFO's cost concerns
  3. Request mutual action plan to lock in close date

Confidence: 87% | Processing time: 2.3s

That output used to take 45 minutes of tab-switching. Now it takes 2.3 seconds.


How Do You Manage State and Context Across Agents?

This is where most MCP multi-agent systems break in production. Agents are stateless by default. Each MCP tool call is a fresh invocation with no memory of what happened before - unless you explicitly pass that context.

There are three practical approaches:

1. Inline context in the message payload (simplest, works for short workflows)

Pass a workflow_id, relevant IDs, and any shared metadata directly in the message, as in Step 6. Every agent can read it from the conversation history.

2. LangGraph's built-in checkpointer (recommended for most production workflows)

LangGraph's SqliteSaver or PostgresSaver persists the full graph state between invocations. You get pause/resume, time-travel debugging, and automatic state recovery after failures. Because the workflows in this guide run through ainvoke, the snippet uses the async saver; from_conn_string is a context manager, and setup() creates the checkpoint tables on first use.

import os
from langgraph.checkpoint.postgres.aio import AsyncPostgresSaver

async def run_with_checkpointer(supervisor, inputs, config):
    # supervisor is the uncompiled graph returned by create_supervisor(...)
    async with AsyncPostgresSaver.from_conn_string(os.environ["DATABASE_URL"]) as checkpointer:
        await checkpointer.setup()
        graph = supervisor.compile(checkpointer=checkpointer)
        return await graph.ainvoke(inputs, config=config)

3. External Redis store (best for high-throughput parallel workflows)

When multiple agents need to read and write shared state concurrently, Redis gives you atomic operations and sub-millisecond reads. Use it to store intermediate results that agents need to share without going through the supervisor.

# state_management.py
import redis
import json
from typing import Any

# External state store for long-running or parallel workflows
redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)

def save_workflow_state(workflow_id: str, state: dict[str, Any]) -> None:
    """Persist workflow state to Redis with a 24-hour TTL."""
    redis_client.setex(
        f"workflow:{workflow_id}",
        86400,  # 24 hours
        json.dumps(state)
    )

def load_workflow_state(workflow_id: str) -> dict[str, Any] | None:
    """Load workflow state from Redis."""
    raw = redis_client.get(f"workflow:{workflow_id}")
    return json.loads(raw) if raw else None

⚠️ Warning: Never store sensitive PII or credentials in workflow state. Use references (e.g., deal_id) and let each agent fetch what it needs via its own MCP server with proper auth.
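A small sketch of that rule in practice: filter the context through an allow-list before it is persisted, so only reference IDs and non-sensitive metadata reach the state store. The key names are assumptions borrowed from the Step 6 context; an allow-list is used instead of a deny-list so that newly added fields stay out by default.

```python
import json

# Assumed allow-list: reference IDs and non-sensitive metadata only.
PERSISTABLE_KEYS = {"workflow_id", "deal_id", "company", "priority"}

def to_persistable_state(context: dict) -> str:
    """Keep only allow-listed keys and serialize the result for storage."""
    safe = {k: v for k, v in context.items() if k in PERSISTABLE_KEYS}
    return json.dumps(safe, sort_keys=True)

state_json = to_persistable_state({
    "workflow_id": "wf-acme-20260624",
    "deal_id": "D-4821",
    "requested_by": "joe.smith@company.com",  # PII: must not be stored
})
```

The resulting string can be handed to whichever store you use; agents that need the requester's identity fetch it through their own authenticated MCP server.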


What Are the Best Practices for MCP Multi-Agent Workflows?

We've run these systems in production. Here's what actually matters:

  1. Keep agents single-responsibility. An agent that does "CRM + research + summarization" is three agents badly dressed as one. When it breaks, you won't know which part failed.

  2. Use structured outputs everywhere (Pydantic models). Unstructured string outputs between agents are a reliability disaster. Define a Pydantic model for every agent's return type and validate at every handoff.

  3. Implement retry logic and fallbacks. MCP tool calls fail. Networks blip. Use tenacity for automatic retries with exponential backoff, and define a fallback behavior (e.g., return cached data, skip the agent, escalate to human). For high-stakes steps, wire in a human-in-the-loop approval gate before the agent acts.

    from tenacity import retry, stop_after_attempt, wait_exponential
    
    @retry(stop=stop_after_attempt(3), wait=wait_exponential(min=1, max=10))
    async def call_crm_agent_with_retry(deal_id: str):
        return await crm_agent.ainvoke({"messages": [...]})
    
  4. Add observability - tracing and per-agent logging. You can't debug what you can't see. LangSmith, Langfuse, or OpenTelemetry give you per-agent latency, token usage, and full input/output traces. Set this up before you go to production, not after.

  5. Secure your MCP servers (auth + RBAC). The June 2025 MCP spec update tightened the protocol's OAuth 2.1-based authorization model for MCP servers. Use it. Every MCP server should require a bearer token, and access should be scoped to the minimum required tools. Don't give a meeting-notes agent access to your CRM write tools.

  6. Test agents in isolation before composing. Write unit tests for each agent with mocked MCP tools. Validate that inputs and outputs match your Pydantic schemas. Composing untested agents is the fastest way to produce a system that fails in ways you can't explain. (For catching regressions across the whole pipeline, see our guide to testing multi-agent workflows.)
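The retry-then-fallback behavior from practice 3 can be sketched without any third-party library. The version below retries only on ConnectionError, backs off exponentially, and returns a clearly labeled fallback once attempts run out; the function names, the delays, and the fallback shape are assumptions for illustration.

```python
import asyncio

async def call_with_retry(call, *, attempts=3, base_delay=0.01, fallback=None):
    """Retry an async call with exponential backoff, then return a fallback."""
    for attempt in range(attempts):
        try:
            return await call()
        except ConnectionError:
            if attempt == attempts - 1:
                break
            await asyncio.sleep(base_delay * 2 ** attempt)
    # All attempts failed: degrade gracefully and label the result as fallback.
    return {"source": "fallback", "data": fallback}

calls = {"count": 0}

async def flaky_crm_call():
    """Hypothetical tool call that fails twice, then succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("network blip")
    return {"source": "live", "data": {"stage": "Negotiation"}}

async def always_down():
    raise ConnectionError("server unreachable")

live = asyncio.run(call_with_retry(flaky_crm_call))
degraded = asyncio.run(call_with_retry(always_down, fallback={"stage": "unknown"}))
```

Tagging the result with its source lets the supervisor lower its confidence, or escalate to a human, when it is working from fallback data.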


Common Pitfalls to Avoid

⚠️ These are the mistakes we see most often in MCP multi-agent systems:

  • Treating MCP as an orchestration engine. MCP handles tool connectivity. It doesn't manage workflow state, routing logic, or retry behavior. You need a real orchestration layer (LangGraph, CrewAI, or custom) on top.

  • Skipping state management. "We'll add Redis later" is how you end up with agents that lose context mid-workflow and produce nonsensical outputs. Design your state model before you write your first agent.

  • No error handling on handoffs. If a sub-agent returns an error or an empty result, the supervisor needs explicit logic to handle it - retry, fallback, or escalate. Without this, one flaky API call silently corrupts your entire workflow output.

  • Over-engineering the graph topology. Start with the simplest pattern that works (usually Handoffs or Sequential Chaining). Add parallelism and branching only when you have a measured latency problem to solve.

  • Ignoring latency budgets. Each agent hop adds latency. A 5-agent sequential chain with 2s per step = 10s total response time. Map out your latency budget before designing the topology. Use parallel execution (Pattern 3) where independent subtasks allow it.
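The latency arithmetic above is easy to put into a small budget calculator. This is a simplified model under stated assumptions: every step pays one MCP hop at the roughly 10 ms p95 figure quoted earlier, parallel agents overlap completely, and the supervisor's own reasoning time is ignored.

```python
MCP_OVERHEAD_S = 0.010  # the ~10 ms p95 per-call figure quoted earlier

def sequential_latency(step_latencies: list) -> float:
    """A chain pays for every step plus one MCP hop per step."""
    return sum(step_latencies) + MCP_OVERHEAD_S * len(step_latencies)

def parallel_latency(step_latencies: list) -> float:
    """Independent agents run concurrently, so the slowest one dominates."""
    return max(step_latencies) + MCP_OVERHEAD_S

steps = [2.0] * 5  # five agents at 2 s each
seq = sequential_latency(steps)
par = parallel_latency(steps)
```

Under these assumptions the five-step chain costs about 10.05 s, while running the same five agents in parallel costs about 2.01 s, which is the gap Pattern 3 is meant to close.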


MCP vs. Other Orchestration Approaches

Here's how MCP fits into the broader AI agent workflow landscape. The key thing to internalize: MCP and orchestration frameworks aren't competing - they're complementary layers.

| Approach | Role | Best For | Limitation |
|---|---|---|---|
| MCP | Connectivity protocol (Agent ↔ Tool) | Standardizing tool access across any framework | Not an orchestrator - needs a framework on top |
| LangGraph | Graph-based orchestration engine | Complex, stateful, production-grade workflows with branching and loops | Steeper learning curve; requires graph thinking |
| CrewAI | Role-based orchestration framework | Rapid prototyping; simple sequential or hierarchical agent crews | Less control over state; harder to customize routing |
| Direct API calls | No orchestration layer | Simple single-agent tools, quick scripts | Zero reusability, no state management, brittle at scale |

The industry consensus by mid-2026: use LangGraph + MCP for production systems that need explicit state control and complex routing. Use CrewAI + MCP for fast prototyping where you want role-based delegation without writing graph logic. Never use direct API calls for anything you'll run more than once.


Frequently Asked Questions

What is MCP task delegation?

MCP task delegation is the pattern where a supervisor agent breaks a high-level goal into subtasks and routes each one to a specialized sub-agent via MCP tool calls. The sub-agent executes the task using its own MCP-connected tools (APIs, databases, services) and returns a structured result. The supervisor then aggregates all results. It's delegation by tool invocation - clean, typed, and auditable.

Is MCP the same as LangGraph or CrewAI?

No - they operate at different layers. MCP is a connectivity protocol that standardizes how agents call tools and access data. LangGraph and CrewAI are orchestration frameworks that manage workflow logic, state, and agent routing. In a production MCP multi-agent system, you typically use MCP inside LangGraph or CrewAI - MCP handles the tool connections, the framework handles the coordination.

How many agents can an MCP workflow handle?

Technically, there's no hard limit in the MCP spec. Practically, 3–7 specialized agents is the sweet spot for most production workflows. Beyond that, coordination overhead and debugging complexity grow faster than the benefits. Scale up based on measured need, not ambition.

How do I handle failures in an MCP multi-agent workflow?

Layer your defenses. First, add retry logic with exponential backoff (use tenacity) on every MCP tool call. Second, define fallback behaviors for each agent. Third, use LangGraph's checkpointing so a mid-workflow failure doesn't lose all progress. Fourth, set timeout thresholds per agent and escalate to a human reviewer if a threshold is breached.
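The fourth layer, per-agent timeouts with escalation, can be sketched with asyncio.wait_for. The escalation here just returns a status record; what a real system does at that point (open a ticket, page a reviewer) is left as an assumption, as are the agent names and thresholds.

```python
import asyncio

async def run_with_timeout(agent_call, *, timeout_s: float, agent_name: str) -> dict:
    """Run one agent call; escalate instead of hanging if it exceeds its budget."""
    try:
        result = await asyncio.wait_for(agent_call(), timeout=timeout_s)
        return {"status": "ok", "agent": agent_name, "result": result}
    except asyncio.TimeoutError:
        # Hypothetical escalation path: a real system might notify a reviewer.
        return {"status": "escalated", "agent": agent_name, "reason": "timeout"}

async def slow_agent():
    await asyncio.sleep(0.2)
    return "late answer"

async def fast_agent():
    return "quick answer"

slow = asyncio.run(
    run_with_timeout(slow_agent, timeout_s=0.05, agent_name="research_agent")
)
fast = asyncio.run(
    run_with_timeout(fast_agent, timeout_s=0.05, agent_name="crm_agent")
)
```

wait_for cancels the slow call when the timeout fires, so a stuck agent does not keep consuming resources after the workflow has moved on.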

Can MCP workflows run in parallel?

Yes - and this is one of the biggest performance wins. In Pattern 3 (Agent Graphs), independent sub-agents can execute simultaneously. In LangGraph, you do this by adding parallel branches in your StateGraph. The sales research example above fires the CRM, Fireflies, and web research agents concurrently, which is how it achieves the 2.3-second end-to-end time despite querying three external systems.

What frameworks work best with MCP for multi-agent systems?

LangGraph is the strongest choice for production systems - it gives you explicit state management, time-travel debugging, and fine-grained control over routing. CrewAI is excellent for rapid prototyping with role-based crews. PydanticAI is worth watching for teams that want tight type safety throughout. All three have native MCP integrations via langchain-mcp-adapters or their own MCP client libraries. The A2A protocol (Google, April 2025) is emerging as a complement for agent-to-agent communication across organizational boundaries.


Ready to Build?

You've got the full picture now - architecture, patterns, code, benchmarks, and the pitfalls to dodge. The next step is yours.

Pick the simplest workflow you want to automate, define two or three agent roles, stand up your MCP servers, and wire it together with LangGraph. Start with Pattern 1 (Handoffs). Get it working end-to-end. Add parallelism once you've measured the latency and know where the bottleneck is.

If this guide saved you hours of research, share it with your team. And if you build something interesting with it, we'd genuinely love to hear about it.

