Every major AI company now ships a deep research feature. But if you're building an agent — not a chat experience — the question isn't "which produces the best report." It's "which one can my agent actually call."
That question eliminates most of the field. The tools that produce the most impressive demos — ChatGPT Deep Research, Perplexity Deep Research — are locked inside chat interfaces. No API. No CLI. No way for your agent to use them.
Here's what's actually available at the API/CLI level, how they compare on the criteria that matter for agent workflows, and which one fits each use case.
The evaluation criteria (agent-specific)
Consumer deep research is evaluated on report quality. Agent deep research needs to be evaluated on:
| Criterion | Why it matters |
|---|---|
| Programmatic access | Can your agent call it? CLI, API, or SDK? If it's UI-only, it doesn't exist for your workflow. |
| Structured output | Can your agent parse the results? Sections, citations, confidence scores? Or is it a wall of text? |
| Controllable depth | Can your agent trade depth for speed? Deep research isn't one-size-fits-all: a quick overview costs less and returns faster than a comprehensive analysis. |
| Citation density | Does every claim link to a source? An agent that passes unverifiable findings downstream is worse than an agent that admits uncertainty. |
| Latency | How long does it take? Agent workflows are latency-sensitive — a 15-minute research step dominates total time. |
| Composability | Can the agent chain research with other capabilities? Search → research → generate → publish in one workflow? |
| Cost predictability | Does the agent know the cost before running? Surprise $5 research runs that auto-fire 20 times become expensive fast. |
The APIs that actually exist
AnyCap Deep Research
Access: CLI (`anycap research --query "..."`)
How it works: Your agent invokes a shell command. AnyCap decomposes the query, runs multi-round web search, crawls top sources, synthesizes findings into structured Markdown with citations, and returns the output — all through the same CLI the agent already uses for everything else.
Output format: Structured Markdown with H2 sections, inline citations with source URLs, and a reference list at the bottom. Parseable by the agent for downstream processing.
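That parseability is testable: H2-delimited sections split apart with a single regex. A minimal sketch, assuming only the H2-plus-reference-list shape described above (the section titles themselves vary per report):

```python
import re

def split_sections(report_md: str) -> dict[str, str]:
    """Split an H2-sectioned Markdown report into {heading: body}."""
    # re.split with a capture group alternates:
    # [preamble, heading1, body1, heading2, body2, ...]
    parts = re.split(r"^## +(.+)$", report_md, flags=re.MULTILINE)
    return {h.strip(): b.strip() for h, b in zip(parts[1::2], parts[2::2])}
```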
Depth control: `--depth standard` (5-10 sources, 1-3 min) or `--depth comprehensive` (20-50+ sources, 5-10 min). Agent chooses based on task requirements.
Composability: Full. Research is one tool alongside `anycap search`, `anycap image generate`, and `anycap page publish`. One CLI. One authentication. The agent chains capabilities without middleware.
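In practice the chain is just sequential shell invocations. A minimal sketch in Python; the only flags assumed are the ones named above, and how downstream commands like `anycap page publish` accept content is deployment-specific:

```python
import subprocess

def run(cmd: list[str]) -> str:
    """Invoke a CLI tool, return its stdout, raise on non-zero exit."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Depth is a per-task decision: standard for routing/overview work,
# comprehensive when the output will be published or acted on.
report_md = run([
    "anycap", "research",
    "--query", "agent-callable deep research APIs in 2026",
    "--depth", "standard",  # or "comprehensive" (20-50+ sources, 5-10 min)
])
sections = split_sections(report_md)  # from the parsing sketch above
# report_md / sections now feed the next pipeline step,
# e.g. an anycap page publish invocation.
```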
Cost: Included in AnyCap subscription. No per-query pricing. Credit-based with cost preview before comprehensive runs.
Best for: Agent-first workflows. Any scenario where research feeds into the next step of a pipeline. Developers who want deep research as a capability, not a destination.
Google Gemini Deep Research (via AI Studio / Vertex AI)
Access: API via Google AI Studio (free tier) or Vertex AI (paid). Limited deep research endpoints available.
How it works: Google's Gemini models power multi-round search and synthesis, leveraging Google's search index for retrieval quality.
Output format: Text report — formatted for human reading, not structured for agent parsing. Citations are inline text references, not structured arrays. The agent can technically read the output, but parsing sections and citations programmatically is fragile.
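To make "fragile" concrete: pulling citations out of a prose report means pattern-matching free text. A sketch of what that parsing usually looks like; the pattern is illustrative, not a documented Gemini format:

```python
import re

def extract_citations(report_text: str) -> list[str]:
    """Best-effort URL extraction from a prose report.

    Fragile by construction: footnote-style references ("[3]"),
    named sources ("according to Reuters"), and URLs wrapped in
    punctuation slip through or come out mangled.
    """
    urls = re.findall(r"https?://[^\s)\],;]+", report_text)
    return list(dict.fromkeys(urls))  # dedupe, preserve order
```

Compare this with structured output, where citations arrive as fields the agent reads, not patterns it guesses at.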
Depth control: Limited. Gemini Deep Research runs at a single depth level. No explicit "standard vs comprehensive" toggle for the API.
Composability: Moderate. The API exists, so your agent can call it — but the output requires custom parsing, and combining it with other capabilities means managing separate authentication for each service.
Cost: AI Studio: free tier available with rate limits. Vertex AI: pay-per-use, roughly $35/1,000 requests for grounded search (deep research pricing less transparent).
Best for: Teams already on Google Cloud who can tolerate parsing text output. Workflows where Google's search index quality is the dominant concern.
OpenAI Deep Research (via API — limited)
Access: ChatGPT Pro subscription ($200/month) required. Limited API availability through OpenAI's platform. Primarily a consumer product — API access is restricted and expensive.
How it works: o3-based reasoning model performs multi-step research across 20-100+ sources. Produces narrative reports with inline citations.
Output format: Conversational text. No structured sections, no JSON output, no machine-parseable citation format. The agent would need to parse natural language reports to extract data.
Depth control: None from the API. Research depth is determined by the model, not controllable by the caller.
Composability: Poor. Even with API access, the text output format makes chaining with other tools impractical. Separate authentication and billing from any other capabilities.
Cost: $200/month flat (Pro subscription) plus API usage at premium rates. No per-query cost visibility before running.
Best for: Individual knowledge workers who need the highest synthesis quality and aren't constrained by cost or pipeline requirements. Not recommended for agent workflows.
GPT Researcher (open-source)
Access: Self-hosted Python application. REST API available for programmatic access.
How it works: Open-source autonomous research agent. Generates search queries, scrapes results, extracts content, and synthesizes findings. Runs as a local service your agent calls via HTTP.
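If your agent itself runs in Python, the library can also be embedded in-process instead of called over HTTP. A minimal sketch following the project's documented usage (check the README for current signatures; it expects an OPENAI_API_KEY and a search-provider key in the environment):

```python
import asyncio
from gpt_researcher import GPTResearcher  # pip install gpt-researcher

async def research(query: str) -> str:
    # report_type controls the report's depth/shape;
    # "research_report" is the standard style
    researcher = GPTResearcher(query=query, report_type="research_report")
    await researcher.conduct_research()     # query generation, scraping, extraction
    return await researcher.write_report()  # synthesis into a sectioned report

report = asyncio.run(research("agent-callable deep research APIs"))
print(report)
```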
Output format: Structured report with sections and sources. Better parseability than ChatGPT/Gemini text output, but format depends on your configuration.
Depth control: Configurable — number of search queries, sources per query, and synthesis depth can all be tuned.
Composability: Moderate. Self-hosted, so you control the full stack. But integration requires running a separate service, and combining with image generation or publishing means yet more integrations.
Cost: Free (open-source). Infrastructure costs: server hosting and web-crawling bandwidth. No per-query pricing, but crawl quality from your own IPs is meaningfully worse than what Google- or Bing-backed tools achieve.
Best for: Teams with the infrastructure to self-host who need full control and zero per-query costs. High-volume use cases where the infrastructure investment amortizes.
Comparison matrix
| | AnyCap Deep Research | Gemini Deep Research | OpenAI Deep Research | GPT Researcher |
|---|---|---|---|---|
| Access | CLI | API (limited) | API (limited) | Self-hosted REST |
| Structured output | ✅ Markdown + citations | ⚠️ Text report | ❌ Conversational | ✅ Configurable |
| Depth control | ✅ Standard/Comp | ❌ Fixed | ❌ Fixed | ✅ Configurable |
| Citation quality | ✅ Inline + list | ⚠️ Inline text | ⚠️ Inline text | ✅ Structured |
| Latency (quick) | 1-3 min | ~5 min | 5-30 min | 3-10 min |
| Composability | ✅ Full CLI chain | ⚠️ Separate auth | ❌ Standalone | ⚠️ Separate service |
| Cost model | Subscription (credits) | Pay-per-use | $200/mo + API | Infrastructure cost |
| Search quality | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐ |
| Setup complexity | 1 CLI command | GCP project setup | API application | Server deployment |
| Agent-native | ✅ Built for agents | ⚠️ Retrofitted | ❌ Consumer-first | ⚠️ Technical setup |
What to choose based on your use case
Your agent needs research as one step in a multi-capability pipeline: → AnyCap Deep Research. Research, search, generate, publish — all through one CLI.
Research quality is the only criterion; cost and pipeline integration don't matter: → ChatGPT Deep Research. Best synthesis quality, no question. Just don't expect your agent to use it.
You're on Google Cloud and need Google's search index: → Gemini Deep Research. Best retrieval quality. Accept the text parsing overhead.
You have infrastructure and high volume; per-query pricing is a dealbreaker: → GPT Researcher. Self-hosted, zero per-query cost. Accept the crawler quality tradeoff.
The framework: evaluate based on agent needs, not human demos
Consumer deep research tools are evaluated on report quality because the evaluator is a human reading the report. Agent deep research tools need to be evaluated on:
1. Can the agent call it? (CLI or API, not UI)
2. Can the agent parse the output? (Structured, not conversational)
3. Can the agent control depth and cost? (Predictable, not opaque)
4. Can the agent chain it with other tools? (Composable, not standalone)
Most consumer tools fail on criteria 1-4. That's not because they're bad products. It's because they were built for humans, not agents. The tools that pass all four are the ones your agent can actually use.
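If you want that checklist as an actual gate in tool selection rather than prose, it reduces to four booleans per candidate. A sketch with illustrative names:

```python
from dataclasses import dataclass

@dataclass
class ResearchTool:
    name: str
    callable_programmatically: bool  # 1: CLI or API, not UI
    structured_output: bool          # 2: parseable, not conversational
    controllable_depth_cost: bool    # 3: predictable, not opaque
    composable: bool                 # 4: chains with other tools

def agent_usable(tool: ResearchTool) -> bool:
    """A tool passes only if it clears all four criteria."""
    return all((tool.callable_programmatically, tool.structured_output,
                tool.controllable_depth_cost, tool.composable))
```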
Further reading:
- ChatGPT Deep Research vs AnyCap: Head-to-Head — Detailed comparison of the two approaches
- Best Deep Research Tools for AI Agents in 2026 — Full landscape including consumer tools
- AI Workflow Automation: Build an Agentic Pipeline — How research fits into multi-step pipelines