AG-UI Protocol Explained: The New Standard for Human-Agent Interfaces

AG-UI is the open streaming event protocol that standardizes how AI agents communicate with frontends and human operators. Learn how it works, how it compares to MCP, and how to implement it.

by AnyCap

As AI agents become capable enough to handle real workflows, a new infrastructure challenge has emerged: how do humans and agents communicate during execution—not just at the start and end, but throughout?

The AG-UI protocol is an open specification designed to solve exactly this. It defines a standard for how AI agents stream events, request input, and surface state to frontend applications and human operators in real time. If MCP (Model Context Protocol) standardized how agents access tools, AG-UI standardizes how agents talk to users.

This guide explains what AG-UI is, why it matters, how it works, and how to start using it in your agent stack.


The Problem AG-UI Solves

Before AG-UI, every team building a human-facing AI agent application had to invent its own communication protocol. How does the agent tell the frontend it's thinking? How does it request a human decision? How does the user send a correction mid-task? How is progress displayed?

The answers were different for every team—often ad hoc, poorly documented, and hard to reuse. This created a fragmented ecosystem where:

  • Agent frameworks couldn't share frontend components
  • Developers had to rebuild streaming UI infrastructure from scratch for every project
  • Users got inconsistent experiences across agent-powered products
  • Debugging agent behavior required custom logging in every implementation

AG-UI establishes a shared vocabulary and event structure so that any agent framework can produce events that any AG-UI-compatible frontend can render—without custom integration code.


What Is AG-UI?

AG-UI is an open, streaming event protocol that defines the format and semantics of messages exchanged between AI agents and user-facing interfaces.

It is:

  • Transport-agnostic: works over HTTP (Server-Sent Events), WebSockets, or any streaming transport (see the wire sketch below)
  • Framework-agnostic: can be implemented in any language or agent framework
  • Bidirectional: agents send events to the frontend; users send messages and interrupts to the agent
  • Stateful: the protocol includes state snapshots so frontends can reconstruct the full agent context at any point

It is not:

  • A tool protocol (that's MCP's job)
  • An agent framework itself
  • A UI component library (though reference implementations exist)
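
Because the protocol is transport-agnostic, the same event payload travels unchanged over any of these channels. As a rough illustration (the exact encoding is SDK-dependent; the field names here are assumptions based on the common camelCase JSON form, not normative), a single event delivered over Server-Sent Events might look like:

data: {"type": "TEXT_MESSAGE_CONTENT", "messageId": "msg_123", "delta": "Hello"}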

AG-UI vs. MCP: Understanding the Distinction

A common source of confusion is how AG-UI relates to Anthropic's Model Context Protocol (MCP).

  • Purpose: MCP covers agent ↔ tool communication; AG-UI covers agent ↔ human/frontend communication
  • Direction: with MCP, the agent calls tools and receives results; with AG-UI, the agent streams events and the human sends messages
  • Audience: MCP targets tool/server developers; AG-UI targets frontend and agent framework developers
  • Focus: MCP defines what capabilities the agent can use; AG-UI defines how the agent communicates its state and progress
  • Relationship: MCP handles the "tool" side of the agent; AG-UI handles the "user interface" side

They're complementary. An agent running in production typically uses MCP to access tools (web search, image generation, code execution) and AG-UI to communicate its progress and request human input.


Core Concepts in AG-UI

Event Types

AG-UI defines a standard set of event types that agents emit (a small consumer sketch follows the list):

Lifecycle events:

  • RUN_STARTED / RUN_FINISHED — the agent has begun or completed execution
  • STEP_STARTED / STEP_FINISHED — a discrete step within the workflow has started or ended
  • RUN_ERROR — the agent encountered an unrecoverable error

Message events:

  • TEXT_MESSAGE_START / TEXT_MESSAGE_CONTENT / TEXT_MESSAGE_END — streaming text output from the agent
  • TOOL_CALL_START / TOOL_CALL_ARGS / TOOL_CALL_END — the agent is invoking a tool

State events:

  • STATE_SNAPSHOT — a full snapshot of the current agent state
  • STATE_DELTA — an incremental update to the state
  • MESSAGES_SNAPSHOT — the full conversation history at a given point

Custom events:

  • CUSTOM — for application-specific events not covered by the standard set
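
To make the taxonomy concrete, here is a minimal consumer that dispatches on these event types. It is a sketch, not a reference implementation: the print calls stand in for real UI updates, and the field names (delta, tool_call_name, message) follow the Python SDK's event classes, so check them against your SDK version:

from ag_ui.core import EventType

async def handle_events(event_stream):
    # Render each incoming event; a real frontend would update UI state instead.
    async for event in event_stream:
        if event.type == EventType.TEXT_MESSAGE_CONTENT:
            print(event.delta, end="", flush=True)      # stream text into the chat view
        elif event.type == EventType.TOOL_CALL_START:
            print(f"\n[tool] {event.tool_call_name}")   # surface the tool being invoked
        elif event.type == EventType.STATE_DELTA:
            print(f"\n[state patch] {event.delta}")     # incremental state update
        elif event.type == EventType.RUN_ERROR:
            print(f"\n[error] {event.message}")         # unrecoverable failure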

The Human Turn

AG-UI also standardizes how humans interact with running agents. The frontend sends an AgentInput to interrupt, redirect, or provide information to the agent mid-execution. This is distinct from a new conversation turn—the agent is running, and the human is influencing its current task.
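
The exact schema of that input is implementation-defined. As a purely hypothetical illustration of the shape it might take (none of these field names are normative):

# Hypothetical mid-run human input; the schema depends on your implementation.
agent_input = {
    "threadId": "thread_42",
    "runId": "run_7",
    "message": "Skip the pricing analysis; focus on the integration docs instead",
}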

Thread-Based Architecture

AG-UI organizes agent runs into threads—persistent conversation contexts that maintain state across multiple runs. A thread in AG-UI is roughly equivalent to a session or conversation in other frameworks, but with explicit protocol support for resuming, branching, and replaying.
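
Concretely, resuming a thread means the client reuses the same thread_id across runs while minting a fresh run_id for each execution. A schematic sketch (payloads abbreviated; real run inputs also carry messages, state, and tools):

# Same thread across two runs: the conversation context persists, the run ids differ.
first_run = {"threadId": "thread_42", "runId": "run_1"}
follow_up = {"threadId": "thread_42", "runId": "run_2"}  # resumes the same context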


How AG-UI Works: A Typical Flow

1. User submits a task via the frontend
2. Frontend sends InitialRun request with RunAgentInput to the agent backend
3. Agent begins execution and emits RUN_STARTED event
4. Agent emits STEP_STARTED for each planning step
5. Agent calls a tool → emits TOOL_CALL_START, TOOL_CALL_ARGS, TOOL_CALL_END
6. Agent generates text → emits TEXT_MESSAGE_START, TEXT_MESSAGE_CONTENT (streaming), TEXT_MESSAGE_END
7. Agent emits STATE_DELTA to update frontend state in real time
8. User decides to redirect the agent → sends AgentInput with correction
9. Agent incorporates the correction and continues
10. Agent emits RUN_FINISHED

The frontend receives these events as a stream and renders them progressively—showing tool calls as they happen, streaming text in real time, and updating a progress indicator based on step events.
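
A minimal consumer of that stream, assuming the backend serves events as JSON over Server-Sent Events at a /run endpoint (the URL, payload shape, and stepName field are illustrative and may differ in your setup):

import json
import httpx

def consume_run(payload: dict):
    # Stream the run and react to each event as it arrives.
    with httpx.stream("POST", "http://localhost:8000/run", json=payload, timeout=None) as response:
        for line in response.iter_lines():
            if not line.startswith("data: "):
                continue  # skip SSE comments and keep-alives
            event = json.loads(line[len("data: "):])
            if event["type"] == "TEXT_MESSAGE_CONTENT":
                print(event["delta"], end="", flush=True)       # render text progressively
            elif event["type"] == "STEP_STARTED":
                print(f"\n[step] {event.get('stepName', '')}")  # drive a progress indicator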


Implementing AG-UI

Framework Support

AG-UI is gaining support across major agent frameworks:

  • LangGraph: AG-UI events can be emitted from graph nodes using the AG-UI Python SDK
  • CopilotKit: this React frontend framework for AI ships native AG-UI support
  • Custom implementations: the AG-UI spec is open; any framework can implement it from the published event type definitions

Quick Start (Python)

from ag_ui.core import (
    RunAgentInput, EventType,
    RunStartedEvent, TextMessageStartEvent,
    TextMessageContentEvent, TextMessageEndEvent,
    RunFinishedEvent,
)
import uuid

async def run_agent(run_input: RunAgentInput):
    # One run_id per execution; the thread_id comes from the client so
    # runs can be grouped into a persistent thread.
    run_id = str(uuid.uuid4())

    yield RunStartedEvent(
        type=EventType.RUN_STARTED,
        thread_id=run_input.thread_id,
        run_id=run_id,
    )

    # Streamed text is bracketed by START and END events that share a
    # message_id, so the frontend can stitch the deltas together.
    msg_id = str(uuid.uuid4())
    yield TextMessageStartEvent(
        type=EventType.TEXT_MESSAGE_START,
        message_id=msg_id,
        role="assistant",
    )

    # `agent` stands in for your framework's streaming model call
    # (e.g., a LangGraph node or an LLM client yielding text chunks).
    for chunk in agent.stream(run_input.messages):
        yield TextMessageContentEvent(
            type=EventType.TEXT_MESSAGE_CONTENT,
            message_id=msg_id,
            delta=chunk,
        )

    yield TextMessageEndEvent(type=EventType.TEXT_MESSAGE_END, message_id=msg_id)
    yield RunFinishedEvent(
        type=EventType.RUN_FINISHED,
        thread_id=run_input.thread_id,
        run_id=run_id,
    )
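
To serve this generator over HTTP, one option is FastAPI plus the SDK's SSE encoder. A sketch reusing run_agent from the Quick Start above, assuming ag_ui.encoder.EventEncoder is available in your SDK version (the /run path is arbitrary):

from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from ag_ui.core import RunAgentInput
from ag_ui.encoder import EventEncoder

app = FastAPI()

@app.post("/run")
async def run_endpoint(run_input: RunAgentInput):
    encoder = EventEncoder()

    async def event_stream():
        # Encode each protocol event as an SSE frame as it is produced.
        async for event in run_agent(run_input):
            yield encoder.encode(event)

    return StreamingResponse(event_stream(), media_type="text/event-stream")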

Connecting to AnyCap

When your AG-UI-powered agent needs real-world capabilities—web search, image generation, file storage—AnyCap integrates as a tool layer beneath the orchestration. The agent calls AnyCap tools during its execution loop and emits the corresponding TOOL_CALL_* events so the frontend shows what's happening:

User: "Research the top 5 AI frameworks and create a summary image"

Agent emits: TOOL_CALL_START (tool: "anycap_search", args: {...})
Agent emits: TOOL_CALL_END (result: search results)
Agent emits: TOOL_CALL_START (tool: "anycap_image_generate", args: {...})
Agent emits: TOOL_CALL_END (result: image URL)
Agent emits: TEXT_MESSAGE_START / TEXT_MESSAGE_CONTENT / TEXT_MESSAGE_END (streaming summary with embedded image)

This full transparency—surfaced through AG-UI events—is what separates a trustworthy human-agent interface from a black box.
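
Inside the agent loop, that trace corresponds to bracketing each tool invocation with tool-call events. A sketch using the Python SDK's event classes (the tool name is illustrative, and the actual invocation, e.g. via MCP, is elided):

import json
import uuid
from ag_ui.core import (
    EventType, ToolCallStartEvent, ToolCallArgsEvent, ToolCallEndEvent,
)

async def emit_tool_call(tool_name: str, args: dict):
    # Bracket a tool invocation with START / ARGS / END events so the
    # frontend can show what the agent is doing in real time.
    call_id = str(uuid.uuid4())
    yield ToolCallStartEvent(
        type=EventType.TOOL_CALL_START,
        tool_call_id=call_id,
        tool_call_name=tool_name,  # e.g. "anycap_search"
    )
    yield ToolCallArgsEvent(
        type=EventType.TOOL_CALL_ARGS,
        tool_call_id=call_id,
        delta=json.dumps(args),    # arguments stream as JSON text deltas
    )
    # ... invoke the tool here and capture its result ...
    yield ToolCallEndEvent(type=EventType.TOOL_CALL_END, tool_call_id=call_id)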


Why AG-UI Matters for Production Agent Applications

If you're building agent-powered products, AG-UI provides:

Component reusability. Frontend components built to the AG-UI spec work with any compliant backend. Build a streaming chat UI once; use it with LangGraph, CrewAI, and AutoGen without changes.

Consistent user experience. Users see the same interaction patterns across different agent workflows because the event types are standardized.

Debugging. AG-UI's state snapshots and event stream give you a complete record of agent execution. Replaying an event stream shows exactly what the agent saw and did at each step.
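
One simple way to capture that record is to persist the stream as JSON Lines and read it back later. A minimal sketch (the file path and serialization details are illustrative):

import json

def replay_events(path: str = "run_events.jsonl"):
    # Walk a persisted event stream in order to reconstruct
    # exactly what the agent saw and did, without re-running it.
    with open(path) as f:
        for line in f:
            event = json.loads(line)
            print(event["type"], event.get("timestamp", ""))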

Human oversight. The AgentInput mechanism for mid-task human intervention is built into the protocol—not bolted on as an afterthought.


Conclusion

AG-UI fills a real gap in the agentic AI infrastructure stack. As agents become more capable and more user-facing, the protocol for how they communicate their state and receive human input becomes as important as the tools they can access.

For developers building agent-powered products in 2026, adopting AG-UI early means building on a foundation that the ecosystem is converging toward—rather than maintaining a bespoke communication layer that becomes a liability as your product grows.
