Multi-Agent AI Systems — Complete Guide: Patterns, Frameworks, and Production Deployment
Multi-agent AI systems use multiple AI models working together, each with specialized roles, tools, and capabilities, to accomplish complex tasks that a single LLM call handles poorly. This guide covers architectures, frameworks, real-world patterns, and what it takes to deploy multi-agent systems reliably in production.
Orchestrator
agent that coordinates other agents
Tool use
agents call APIs, databases, code executors
AutoGen
Microsoft's multi-agent framework
CrewAI
role-based multi-agent framework
What is a Multi-Agent System?
Multi-agent architecture overview
A multi-agent system is a network of AI agents, each with a specific role, set of tools, and memory. Agents communicate through messages. An orchestrator agent breaks down complex tasks and delegates to specialist agents — researcher, writer, code executor, reviewer. The result is greater capability than any single agent, with each agent optimized for its role.
User submits complex task
The task is too large or multi-dimensional for one agent: "Research competitor pricing, write a report, validate the data, and format for the board."
Orchestrator breaks it down
The orchestrator agent decomposes the task into subtasks and assigns each to the appropriate specialist agent with the right tools.
Research agent gathers data
The research agent uses web search, database queries, or API calls to gather relevant information. It returns structured findings to the orchestrator.
Writer agent composes content
The writer agent takes the research output and composes a structured document, report, or response following the required format.
Reviewer agent validates
The reviewer agent checks the output for accuracy, completeness, and quality. It either approves or returns specific revision requests.
Final output delivered
The orchestrator collects all outputs, resolves any conflicts, and delivers the final result to the user.
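The six steps above can be sketched in plain Python, with stub functions standing in for LLM-backed agents. Every name here (`research_agent`, `writer_agent`, `reviewer_agent`, `orchestrate`, `max_revisions`) is hypothetical and chosen for illustration; a real system would replace each stub with a model call and real tools.

```python
from dataclasses import dataclass


@dataclass
class Review:
    approved: bool
    feedback: str = ""


def research_agent(task: str) -> list[str]:
    # Stub: a real agent would call search, database, or API tools.
    return [f"finding about {task}"]


def writer_agent(task: str, findings: list[str], feedback: str = "") -> str:
    # Stub: a real agent would prompt an LLM with the findings and feedback.
    draft = f"Report on {task}: " + "; ".join(findings)
    if feedback:
        draft += " [revised: " + feedback + "]"
    return draft


def reviewer_agent(draft: str) -> Review:
    # Stub rule: approve only drafts that have been revised at least once.
    if "[revised" in draft:
        return Review(approved=True)
    return Review(approved=False, feedback="add sources")


def orchestrate(task: str, max_revisions: int = 3) -> str:
    """Delegate to specialists, loop on review feedback, return the result."""
    findings = research_agent(task)
    draft = writer_agent(task, findings)
    for _ in range(max_revisions):
        review = reviewer_agent(draft)
        if review.approved:
            return draft
        draft = writer_agent(task, findings, review.feedback)
    return draft  # Revision budget exhausted: return the latest draft.


report = orchestrate("competitor pricing")
print(report)
```

The revision loop is bounded by `max_revisions` so that a reviewer that never approves cannot keep the orchestrator running indefinitely.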
Agent Architecture Patterns
| Pattern | How it works | Use case |
|---|---|---|
| Pipeline (Sequential) | Chain: A → B → C → D, each step feeds the next | Document processing, content pipeline, ETL workflows where order matters |
| Supervisor | Orchestrator delegates to specialist sub-agents | Research + writing + coding tasks, complex multi-domain workflows |
| Peer-to-peer (Debate) | Agents discuss, critique, and vote on decisions | Code review, fact-checking, consensus tasks requiring adversarial review |
| Hierarchical | Tree of orchestrators managing sub-orchestrators | Enterprise-scale tasks simulating departments of agents |
| Parallel fan-out | Orchestrator spawns multiple agents simultaneously | Tasks that can be parallelized: analyzing multiple documents at once |
| Map-reduce | Fan out to process N items, aggregate results | Summarizing 100 articles, processing large datasets in parallel |
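As a rough illustration of the parallel fan-out and map-reduce rows, the sketch below runs a stub "summarizer" over several documents concurrently and then aggregates the results. The functions `summarize`, `combine`, and `map_reduce` are hypothetical stand-ins for LLM-backed agents, and a thread pool is only one possible way to fan out.

```python
from concurrent.futures import ThreadPoolExecutor


def summarize(document: str) -> str:
    # Stub "map" agent: a real one would call an LLM to summarize the text.
    return document.split(".")[0].strip()


def combine(summaries: list[str]) -> str:
    # Stub "reduce" agent: a real one would ask an LLM to merge the summaries.
    return " | ".join(summaries)


def map_reduce(documents: list[str], max_workers: int = 4) -> str:
    # Fan out: process documents concurrently, up to max_workers at a time.
    # executor.map preserves input order, so results line up with documents.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        summaries = list(executor.map(summarize, documents))
    # Reduce: aggregate the partial results into one answer.
    return combine(summaries)


docs = [
    "Prices rose 4%. Details follow.",
    "Churn fell. More data below.",
    "Margins held steady. See appendix.",
]
print(map_reduce(docs))
```

Threads fit this pattern because model calls are I/O-bound; the `max_workers` cap also doubles as a crude limit on how many requests are in flight at once.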
Building a Multi-Agent System with LangGraph
from langchain_anthropic import ChatAnthropic
from langgraph.graph import StateGraph, END
from typing import TypedDict, Annotated
import operator

# Define shared state passed between all agents
class AgentState(TypedDict):
    task: str
    research: str
    draft: str
    review: str
    final: str
    messages: Annotated[list, operator.add]

llm = ChatAnthropic(model="claude-sonnet-4-6")

def research_agent(state: AgentState):
    """Agent 1: Research the topic"""
    response = llm.invoke([
        {"role": "system", "content": "You are a research expert. Gather key facts and cite sources."},
        {"role": "user", "content": f"Research thoroughly: {state['task']}"}
    ])
    return {"research": response.content, "messages": [response]}

def writer_agent(state: AgentState):
    """Agent 2: Write based on research"""
    response = llm.invoke([
        {"role": "system", "content": "You are a professional writer. Be clear and structured."},
        {"role": "user", "content": f"Write about: {state['task']}\n\nResearch: {state['research']}"}
    ])
    return {"draft": response.content}

def reviewer_agent(state: AgentState):
    """Agent 3: Review and provide specific feedback"""
    response = llm.invoke([
        {"role": "system", "content": "You are a critical editor. Be specific about improvements."},
        {"role": "user", "content": f"Review this draft:\n\n{state['draft']}\n\nList specific improvements needed."}
    ])
    return {"review": response.content}

def finalize_agent(state: AgentState):
    """Agent 4: Incorporate review feedback"""
    response = llm.invoke([
        {"role": "user", "content": f"Revise based on review:\n\nDraft: {state['draft']}\n\nReview: {state['review']}"}
    ])
    return {"final": response.content}

# Build the graph
workflow = StateGraph(AgentState)
workflow.add_node("research", research_agent)
workflow.add_node("writer", writer_agent)
workflow.add_node("reviewer", reviewer_agent)
workflow.add_node("finalize", finalize_agent)

# Sequential pipeline
workflow.set_entry_point("research")
workflow.add_edge("research", "writer")
workflow.add_edge("writer", "reviewer")
workflow.add_edge("reviewer", "finalize")
workflow.add_edge("finalize", END)

app = workflow.compile()
result = app.invoke({"task": "Explain quantum computing for developers", "messages": []})
print(result["final"])
Key Frameworks Comparison
LangGraph (LangChain)
Graph-based workflow with explicit state management. Best for complex conditional flows, human-in-the-loop, and long-running agents. Production-ready with LangSmith observability and checkpoint/resume support.
AutoGen (Microsoft)
Conversational agent framework where agents talk to each other via messages. Best for research tasks and for code generation paired with code execution. It is easy to prototype with and includes built-in Python code execution.
CrewAI
Role-based agents organized into crews with tasks. High-level abstraction — define agents, tasks, and process type (sequential or hierarchical). Best for structured team-like workflows that mirror human org structures.
Claude Agent SDK (Anthropic)
Native Anthropic SDK for building agents with tool use, computer use, and multi-turn conversations. Best when building production agents specifically with Claude that need tight integration with Anthropic features.
Swarm (OpenAI)
Lightweight, experimental framework for agent handoffs and multi-agent coordination. Simple API: agents hand off to each other based on function return values. Good for exploring agent patterns without framework overhead; OpenAI describes it as educational and points to its Agents SDK for production use.
Semantic Kernel (Microsoft)
Enterprise-focused agent framework with .NET and Python support. Plugins, planners, and memory. Best for enterprises already invested in the Microsoft Azure AI ecosystem.
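The handoff idea mentioned under Swarm, where an agent passes control by returning another agent, can be illustrated without any framework. The sketch below is not Swarm's API; the `Agent` dataclass, the `run` loop, and the `triage` and `billing` agents are hypothetical names used only to show the control flow.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Agent:
    name: str
    # handle returns (reply, next_agent); next_agent=None means "done".
    handle: Callable[[str], Tuple[str, Optional["Agent"]]]


def billing_handle(message: str):
    return "Refund issued.", None


billing = Agent("billing", billing_handle)


def triage_handle(message: str):
    # Returning another agent is the handoff signal.
    if "refund" in message.lower():
        return "Routing to billing.", billing
    return "How can I help?", None


triage = Agent("triage", triage_handle)


def run(start: Agent, message: str, max_handoffs: int = 5) -> list[str]:
    """Follow handoffs until an agent returns no successor."""
    transcript = []
    agent = start
    for _ in range(max_handoffs + 1):
        reply, next_agent = agent.handle(message)
        transcript.append(f"{agent.name}: {reply}")
        if next_agent is None:
            break
        agent = next_agent
    return transcript


print(run(triage, "I want a refund"))
```

In a real framework the routing decision would come from the model choosing a tool or function, not from a keyword check; the `max_handoffs` cap guards against two agents handing off to each other forever.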
Tool Use — Extending Agent Capabilities
import anthropic
import json

client = anthropic.Anthropic()

# Define tools the agent can call
tools = [
    {
        "name": "web_search",
        "description": "Search the web for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    },
    {
        "name": "get_database_record",
        "description": "Fetch a record from the product database",
        "input_schema": {
            "type": "object",
            "properties": {
                "product_id": {"type": "string"}
            },
            "required": ["product_id"]
        }
    }
]

def execute_tool(name: str, inputs: dict) -> str:
    """Execute the tool and return result as string."""
    if name == "web_search":
        # Real implementation would call a search API
        return f"Search results for '{inputs['query']}': [placeholder results]"
    elif name == "get_database_record":
        return json.dumps({"id": inputs["product_id"], "name": "Widget", "price": 29.99})
    # Report unknown tools to the model instead of silently returning None
    return f"Error: unknown tool '{name}'"

def run_agent(user_message: str, max_turns: int = 10) -> str:
    messages = [{"role": "user", "content": user_message}]
    # Bound the loop so the agent cannot keep calling tools forever
    for _ in range(max_turns):
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        # Anything other than a tool request (end_turn, max_tokens, ...) ends the loop
        if response.stop_reason != "tool_use":
            return "".join(b.text for b in response.content if b.type == "text")
        # Process tool calls
        tool_results = []
        for block in response.content:
            if block.type == "tool_use":
                result = execute_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": result,
                })
        # Add agent response + tool results to conversation
        messages.append({"role": "assistant", "content": response.content})
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError(f"Agent did not finish within {max_turns} turns")

result = run_agent("What are the current reviews for product ID P123?")
print(result)
Start with a single agent, not multi-agent
Production Considerations
| Challenge | What goes wrong | Solution |
|---|---|---|
| Mid-pipeline failures | Any agent can fail, losing all upstream work | Checkpoint state after each step. LangGraph supports resumable workflows. |
| Infinite loops | Agents can get stuck in retry cycles | Cap iterations on every loop (in LangGraph, the recursion_limit config value). Set a timeout on each agent step. |
| Cost runaway | Costs multiply with every agent in the pipeline | Use Claude Haiku for simple steps, Sonnet for complex reasoning. Cache prompts. |
| Observability | Hard to debug what went wrong in a 5-agent pipeline | Use LangSmith, Langfuse, or Weave. Log all intermediate state. |
| Prompt injection | External content can inject instructions into agents | Sanitize all external inputs. Use system prompt separation. See Claude safety docs. |
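To make the checkpointing row concrete, here is a minimal sketch of a pipeline runner that saves state to a JSON file after every step and skips completed steps on restart. It is a framework-free illustration, not LangGraph's checkpointer; `run_pipeline`, the `_done` key, and the stub `research` and `write` steps are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def run_pipeline(steps, state: dict, checkpoint: Path) -> dict:
    """Run (name, fn) steps in order, saving state after each one.

    On restart, steps already recorded in the checkpoint are skipped.
    """
    state = dict(state)  # Do not mutate the caller's dict.
    if checkpoint.exists():
        state = json.loads(checkpoint.read_text())
    done = state.setdefault("_done", [])
    for name, fn in steps:
        if name in done:
            continue  # Completed in an earlier run; reuse the saved output.
        state.update(fn(state))
        done.append(name)
        checkpoint.write_text(json.dumps(state))
    return state


calls = {"research": 0, "write": 0}


def research(state):
    calls["research"] += 1
    return {"research": "facts"}


def write(state):
    calls["write"] += 1
    if calls["write"] == 1:
        raise RuntimeError("transient model error")  # Simulated failure.
    return {"draft": f"draft using {state['research']}"}


steps = [("research", research), ("write", write)]
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.json"
    try:
        run_pipeline(steps, {"task": "pricing"}, path)
    except RuntimeError:
        pass  # First run dies mid-pipeline; research is already saved.
    result = run_pipeline(steps, {"task": "pricing"}, path)

print(result["draft"], calls)
```

The second run resumes from the saved state, so the research step (and its cost) is not repeated after the writer fails.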