
AI agents have made remarkable progress. They can write production-quality code, conduct multi-source research, generate images and video, orchestrate complex workflows, and make decisions across long-running tasks. But there are real limits — and understanding them is essential for anyone building serious systems.
This isn't a pessimistic take. Understanding what agents can't do reliably is the foundation for designing systems that work.
1. Maintain State Across Sessions
The problem: Most AI agents don't have persistent memory between sessions. When a new conversation starts, the agent has no recollection of previous interactions unless memory is explicitly architected.
This is the single biggest practical limitation for agentic applications. Users expect the agent to remember them. The underlying models don't.
Workarounds:
- External memory storage: Use a database or vector store to persist and retrieve user context
- Session summaries: At session end, generate a structured summary and prepend it to the next session (see the second sketch below)
- Workspace context: Tools like AnyCap's workspace context inject persistent project information automatically
# Pattern: inject persistent context at session start
# Assumed here: memory_store is any lookup whose .get() returns an object
# with a .summary attribute, or None for an unknown user
def start_session(user_id: str) -> str:
    memory = memory_store.get(user_id)
    return f"""
Previous context for this user:
{memory.summary if memory else "New user — no previous context"}
---
"""
2. Perform Real-Time Actions Reliably
The problem: Agents can plan, reason, and generate content well. Executing real-world actions reliably — sending emails, making purchases, updating databases — remains error-prone at scale.
The failure modes are subtle: an agent might correctly identify the action to take, then mis-format the API call, omit a required field, or mishandle an error response.
Workarounds:
- Human-in-the-loop for high-stakes actions: Require confirmation before irreversible operations
- Structured output validation: Always validate agent-generated action payloads before execution (see the sketch after this list)
- Idempotent design: Build systems where repeated execution of the same action doesn't cause double effects
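To make the last two workarounds concrete, here is a minimal sketch. The email payload shape, the stubbed send_email call, and the in-memory executed_keys set are all illustrative assumptions; in production the key set would live in a database.
import hashlib
import json

REQUIRED_FIELDS = {"to", "subject", "body"}  # hypothetical action schema
executed_keys: set[str] = set()  # illustrative; persist this in production

def send_email(payload: dict) -> None:
    print(f"sending email to {payload['to']}")  # stand-in for the real side effect

def execute_email_action(payload: dict) -> str:
    # 1. Validate the agent-generated payload before touching the real world
    missing = REQUIRED_FIELDS - payload.keys()
    if missing:
        raise ValueError(f"payload missing required fields: {missing}")
    # 2. Derive an idempotency key so a retried action cannot double-send
    key = hashlib.sha256(json.dumps(payload, sort_keys=True).encode()).hexdigest()
    if key in executed_keys:
        return "skipped: already executed"
    executed_keys.add(key)
    send_email(payload)
    return "sent"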
3. Understand True Causality
The problem: AI models are correlation machines. They identify patterns that correlate with correct answers, but don't understand the underlying causal structure of the world.
In practice, this means agents can be confidently wrong about cause-and-effect relationships, particularly in novel situations that don't match training data patterns.
Workarounds:
- Grounded web search: Give agents access to current, factual information rather than relying on model knowledge
- Require reasoning traces: "Explain why this is true" before accepting conclusions
- Cross-check important outputs: Validate causal claims against authoritative sources
# Instead of trusting model knowledge, ground in current data
anycap web search "latest Kubernetes v1.32 breaking changes"
anycap web crawl https://kubernetes.io/docs/setup/release/notes/
4. Handle Ambiguity Well Without Guidance
The problem: Agents presented with ambiguous tasks often either hallucinate a resolution (pick an interpretation without flagging the ambiguity) or get paralyzed and fail to proceed.
Neither behavior is what you want in production.
Workarounds:
- Structured ambiguity handling: Instruct the agent to output a structured list of ambiguities before proceeding
- Default resolution rules: Define explicit defaults for common ambiguous cases in your system prompt
- Confidence thresholds: If the agent isn't confident in its interpretation, route to human review (a routing sketch follows the prompt example below)
SYSTEM_PROMPT = """
Before starting any task, identify any ambiguities.
If ambiguities exist:
1. List each ambiguity clearly
2. State your assumed resolution
3. Proceed with the assumption
4. Flag the assumption in your output
Example output format:
AMBIGUITY: "Latest version" — which package?
ASSUMPTION: Using the version in package.json (1.2.3)
"""
5. Generate Reliable Long-Form Structured Output
The problem: Agents can write excellent prose and decent code, but generating large, consistently structured data — hundreds of rows of JSON, complex nested schemas, or long tables — leads to format drift, missing fields, and inconsistency.
Workarounds:
- Structured output schemas: Use tool calls or JSON schema validation to constrain output format (see the sketch after this list)
- Batch with validation: Generate in smaller chunks, validate each chunk
- Schema-first design: Define the exact output schema before the task, not after
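The sketch below combines the first two workarounds: each generated chunk is validated against a schema before it's accepted. It assumes Pydantic, a hypothetical ProductRow schema, and an agent that emits one JSON object per line; any JSON Schema validator works the same way.
from pydantic import BaseModel, ValidationError

class ProductRow(BaseModel):  # hypothetical schema, defined before generation starts
    sku: str
    name: str
    price_usd: float

def validate_chunk(raw_json_lines: list[str]) -> list[ProductRow]:
    rows = []
    for line in raw_json_lines:
        try:
            rows.append(ProductRow.model_validate_json(line))
        except ValidationError as err:
            # Reject the whole chunk and re-prompt, rather than silently
            # accepting drifted or incomplete rows
            raise ValueError(f"format drift detected: {err}") from err
    return rows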
6. Know What They Don't Know
The problem: AI models are overconfident in areas where they have training data gaps. The model doesn't have a reliable internal sense of "I'm not confident about this."
This is particularly problematic for recent events, domain-specific knowledge, and edge cases.
Workarounds:
- Grounded search for time-sensitive facts: Don't trust model knowledge for anything that could have changed
- Explicit uncertainty instructions: "If you're not certain, say so and provide a confidence level" (see the prompt sketch after this list)
- Source citation requirements: Require the agent to cite sources for factual claims
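These instructions can live in the system prompt, in the same spirit as the ambiguity prompt in section 4. The wording below is an assumption to adapt, not a recipe:
UNCERTAINTY_PROMPT = """
For every factual claim in your answer:
1. State your confidence: HIGH, MEDIUM, or LOW
2. Use web search for anything that may have changed since your training data
3. Cite a source for each claim, or mark it UNSOURCED
If you are not certain, say so explicitly rather than guessing.
"""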
7. Handle Complex Multi-Modal Reasoning
The problem: While vision capabilities have improved dramatically, agents still struggle with complex spatial reasoning, reading dense visual data (charts, diagrams), and integrating information from multiple images coherently.
Workarounds:
- Pre-process visual data: Extract text, numbers, or structured data from images before sending to the agent (see the sketch after this list)
- Specialized vision models: Route vision tasks to models optimized for specific visual tasks
- Stepwise decomposition: Break multi-image tasks into single-image steps
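For the pre-processing workaround, one option is to run OCR before the agent ever sees the image. A minimal sketch assuming the pytesseract package (a wrapper around the Tesseract OCR engine); the file name is hypothetical:
from PIL import Image
import pytesseract  # requires the Tesseract engine installed locally

def extract_visual_text(image_path: str) -> str:
    # Pull raw text out of a dense visual (chart, table screenshot) so the
    # agent reasons over text instead of pixels
    return pytesseract.image_to_string(Image.open(image_path))

chart_text = extract_visual_text("quarterly_revenue_chart.png")  # hypothetical file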
Building Reliable Systems Despite These Limits
The pattern across all these limitations is the same: agents work best when you design systems that acknowledge the limits rather than ignore them.
A reliable agentic system:
- Grounds agents in external facts (web search, databases, APIs)
- Validates outputs before acting on them
- Maintains explicit state rather than relying on context memory
- Routes to humans for genuinely ambiguous or high-stakes decisions
- Uses specialized capabilities (image gen, video gen, search) via purpose-built tools rather than expecting the model to do everything
AnyCap is designed around this philosophy: each capability (image generation, video creation, web search, publishing) is handled by a purpose-built system, orchestrated by the agent's reasoning — not crammed into the context window.