How to Add Web Crawling to Claude Code: Full-Page Access for Your Agent

Web search returns snippets. Web crawl returns the full page. Here's how to give Claude Code full-page web access — for research, competitor analysis, and content extraction — through one CLI.

You ask Claude Code to research a competitor's pricing page. It searches the web and returns a snippet: "Starts at $29/month." That's not enough. You need the full pricing table, the feature comparison, the enterprise tier — the actual page content.

Web search gives you summaries. Web crawl gives you the page.

Here's how to add web crawling to Claude Code — so your agent can read full web pages, extract structured data, and feed that research directly into its workflow.

Web Search vs Web Crawl: What's the Difference?

They're related but do different jobs:

| | Web Search | Web Crawl |
|---|---|---|
| What it returns | Snippets, links, citations | Full page content as clean Markdown |
| Best for | Quick answers, discovery, fact-checking | Deep research, content extraction, competitor analysis |
| Speed | Seconds | Seconds to a minute (full page fetch) |
| Data depth | Surface-level | Complete: every heading, paragraph, table |
| Use case | "What's the pricing for X?" | "Extract the entire pricing page and compare it to our pricing" |

Your agent needs both. Search to find the right pages. Crawl to read them properly.

Why Claude Code Needs Web Crawl

Claude Code reasons about your codebase. It can refactor functions, write tests, and debug issues across files. But when it needs to research something — a competitor's API docs, a library's changelog, a product's feature list — it hits a wall.

Web search helps, but snippets only go so far. A pricing page might have 12 tiers. A docs page might have 40 sections. A changelog might span 3 years of releases. A 150-character snippet tells you one thing. The full page tells you everything.

Web crawl gives your agent the complete page. It can then:

Extract structured data (pricing tiers, feature lists, API endpoints)
Compare competitor offerings point by point
Feed documentation into code generation ("implement the authentication exactly as described in the docs")
Monitor changes over time (crawl the same page weekly, diff the results)

For the broader picture of how search and crawl fit into your agent's tool stack, read What Is a Capability Runtime?.

Method 1: Manual Web Scraping (The Fragile Way)

You can configure Claude Code to call a scraping service directly. Pick a provider (Firecrawl, Jina, ScrapingBee), sign up, get an API key, and wire it into your agent.

The manual approach:

Sign up for a scraping service
Get an API key
Write a shell script or MCP config that Claude Code can call
Handle rate limits, retries, and failed fetches
Parse the response and feed it back into the agent context

This works for occasional use. It breaks when you scale — different websites block different scrapers, rate limits vary by provider, and maintaining the integration consumes time you wanted to spend building.
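The retry handling in that list can be sketched as a small shell wrapper. This is an illustration only: `retry` and the `RETRY_BASE_DELAY` variable are hypothetical names, and the `curl` call in the comment stands in for whichever scraping API you picked.

```shell
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached,
# doubling the wait between attempts (exponential backoff).
retry() {
  local max_attempts="$1" attempt=1
  local delay="${RETRY_BASE_DELAY:-2}"   # seconds before the first retry (assumed default)
  shift
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Example (not run here): fetch a page, retrying up to 4 times on failure.
#   retry 4 curl -fsS "https://competitor.com/pricing" -o pricing.html
```

Even this small wrapper leaves per-provider rate limits, blocked requests, and response parsing for you to handle, which is the maintenance cost described above.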

Method 2: MCP Server for Crawling

MCP servers for web crawling bundle the scraping logic into a reusable integration. Firecrawl's MCP server is a common choice: Claude Code calls it with a URL, and it returns the page as clean Markdown.

The setup is lighter than manual API wiring, but you're still managing:

One MCP server per capability (crawl is separate from search)
Provider-specific rate limits and authentication
Format inconsistencies when switching between scraping providers

Method 3: One CLI for Search + Crawl (The AnyCap Way)

This approach bundles search and crawl into one command surface. Your agent searches to find pages, then crawls to read them fully — all through the same CLI.

# Step 1: Search for relevant pages
anycap search --prompt "competitor pricing pages SaaS 2026" --citations

# Step 2: Crawl the most relevant result for full content
anycap crawl --url "https://competitor.com/pricing" -o pricing.md

The runtime handles:

Structured output. Pages convert to clean Markdown — headings, paragraphs, tables, and code blocks preserved.
JavaScript rendering. Dynamic pages (SPAs, React apps) render before extraction.
Clean content. Navigation, ads, and boilerplate are stripped. What's left is the article body.
Consistent format. Every crawled page returns the same Markdown structure, regardless of the source.

Install:

npm i -g anycap
anycap login
anycap skill install --target ~/.claude/skills/anycap-cli/
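Before wiring the CLI into a scripted workflow, it can help to fail early if the install did not land on your PATH. A minimal sketch; `require_cli` is a hypothetical helper, not part of AnyCap.

```shell
# require_cli NAME
# Returns non-zero with a readable message if NAME is not on PATH.
require_cli() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; install it before running the workflow" >&2
    return 1
  fi
}

# Example (not run here):
#   require_cli anycap && anycap crawl --url "https://competitor.com/pricing" -o pricing.md
```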

→ Install AnyCap free — 250 credits for new users

Real Use Case: Competitor Research Pipeline

Your agent needs to compare your product's pricing against three competitors. Here's the full workflow:

# 1. Search for competitor pricing pages
anycap search --prompt "competitor A pricing plans 2026" --citations
anycap search --prompt "competitor B pricing plans 2026" --citations
anycap search --prompt "competitor C pricing plans 2026" --citations

# 2. Crawl each pricing page for full content
anycap crawl --url "https://competitor-a.com/pricing" -o competitor-a.md
anycap crawl --url "https://competitor-b.com/pricing" -o competitor-b.md
anycap crawl --url "https://competitor-c.com/pricing" -o competitor-c.md

# 3. Feed the crawled content to Claude Code for analysis
# Claude Code now has the full pricing data and can produce:
# - A comparison table
# - Pricing positioning recommendations
# - Feature gap analysis

Your agent researched, crawled, analyzed, and recommended — all in one session. No manual browser tabs. No copy-pasting.
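Step 3 above is left as comments. One way to hand the crawled files to the agent is to bundle them into a single context file first. `bundle_pages` is a hypothetical helper and the section markers are an arbitrary choice; the `claude -p` line in the comment assumes Claude Code's non-interactive print mode is available in your install.

```shell
# bundle_pages OUTPUT FILE...
# Concatenates crawled Markdown files into OUTPUT, with a marker line
# before each one so the agent can tell the sources apart.
bundle_pages() {
  local out="$1" f
  shift
  : > "$out"                      # start from an empty output file
  for f in "$@"; do
    if [ ! -r "$f" ]; then
      echo "skipping unreadable file: $f" >&2
      continue
    fi
    printf '===== %s =====\n\n' "$f" >> "$out"
    cat "$f" >> "$out"
    printf '\n' >> "$out"
  done
}

# Example (not run here):
#   bundle_pages pricing-bundle.md competitor-a.md competitor-b.md competitor-c.md
#   claude -p "Build a pricing comparison table from these pages" < pricing-bundle.md
```

Keeping the source file name next to each page makes it easier for the agent to attribute a pricing tier to the right competitor.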

Real Use Case: Documentation-Driven Development

Your agent needs to implement an API integration. Instead of guessing the auth flow, it crawls the official docs:

# Crawl the API authentication docs
anycap crawl --url "https://api.provider.com/docs/auth" -o auth-docs.md

# Crawl the endpoint reference
anycap crawl --url "https://api.provider.com/docs/endpoints" -o endpoints.md

# Claude Code now implements the integration from the actual docs,
# not from its training data which may be outdated

This is the difference between "Claude Code, implement the Stripe integration" (works from training data, which may be outdated) and "Claude Code, crawl the latest Stripe docs and implement the integration exactly as described" (works from the docs as they stand today).

Real Use Case: Competitive Monitoring

Set up a recurring research workflow. Your agent crawls competitor pages on a schedule and diffs the results:

# Crawl competitor changelog
anycap crawl --url "https://competitor.com/changelog" -o competitor-changelog-$(date +%Y%m%d).md

# Crawl competitor feature page
anycap crawl --url "https://competitor.com/features" -o competitor-features-$(date +%Y%m%d).md

# Compare against last week's crawl
diff competitor-features-20260511.md competitor-features-20260518.md

Run this weekly. Your agent flags new features, changed pricing, updated messaging — before your product team hears about it from a customer.
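The `diff` line above hard-codes two dates. A sketch that picks the two most recent snapshots instead, assuming the `-YYYYMMDD.md` naming used in the crawl commands; `diff_latest` is a hypothetical helper name.

```shell
# diff_latest PREFIX
# Diffs the two newest PREFIX-YYYYMMDD.md snapshots.
# Exit status follows diff: 0 = no changes, 1 = changes found,
# 2 = trouble (here also used when fewer than two snapshots exist).
diff_latest() {
  local prefix="$1" prev="" curr="" f
  # Date-stamped names sort chronologically, so the last two
  # glob matches are the two most recent crawls.
  for f in "$prefix"-????????.md; do
    [ -e "$f" ] || continue
    prev="$curr"
    curr="$f"
  done
  if [ -z "$prev" ]; then
    echo "need at least two snapshots for $prefix" >&2
    return 2
  fi
  diff -u "$prev" "$curr"
}

# Example (not run here):
#   diff_latest competitor-features
```

Because a status of 1 means "changes found", a scheduled job can use the exit code to decide whether to notify anyone.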

Search + Crawl: The Complete Research Stack

Web search finds. Web crawl reads. Together, they form a complete research capability for your agent:

| Step | Command | What it does |
|---|---|---|
| 1. Discover | `anycap search` | Finds relevant pages with grounded citations |
| 2. Extract | `anycap crawl` | Pulls full page content as clean Markdown |
| 3. Analyze | Claude Code | Reasons about the extracted content |
| 4. Act | Claude Code | Implements, compares, or reports based on findings |

This is grounded research — your agent doesn't rely on training data or partial snippets. It works from the actual, current content of the pages that matter.

When to Crawl vs When to Search

| Use search when... | Use crawl when... |
|---|---|
| You need a quick answer | You need the complete page |
| You're discovering which pages exist | You know which page you need and want all of it |
| You need cited, grounded summaries | You need structured data extraction |
| Speed is the priority | Depth is the priority |
| The answer fits in a snippet | The answer is a table, a list, or spans multiple sections |

Most research workflows use both: search to discover, crawl to extract.

FAQ

Does web crawl work on JavaScript-rendered pages?

Yes. The runtime renders dynamic content (React, Vue, SPAs) before extracting, so your agent gets the content a browser would display, minus the navigation, ads, and boilerplate that are stripped out.

How is web crawl different from Claude Code's built-in web search?

Claude Code's built-in web search returns snippets and summaries. Web crawl returns the full page content as Markdown — every heading, paragraph, table, and code block. Use search for quick answers. Use crawl when you need depth.

Can I crawl multiple pages in one session?

Yes. Run anycap crawl once per URL. Your agent can loop through a list of URLs and crawl them sequentially. All results are saved as local Markdown files.
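The loop described above might look like the following. It uses only the `anycap crawl --url ... -o ...` form shown earlier in this guide; `url_to_filename`, `crawl_all`, and the `urls.txt` list are hypothetical, and the failure check assumes the CLI exits with a non-zero status when a crawl fails, which this guide does not confirm.

```shell
# url_to_filename URL
# Turns a URL into a flat, filesystem-safe Markdown file name.
url_to_filename() {
  local name
  name=$(printf '%s\n' "$1" | sed -e 's|^[A-Za-z]*://||' \
                                  -e 's|[^A-Za-z0-9._-]|-|g' \
                                  -e 's|-*$||')
  printf '%s.md\n' "$name"
}

# crawl_all LIST_FILE [OUTPUT_DIR]
# Crawls every URL in LIST_FILE (one per line; blank lines and
# lines starting with # are skipped) and reports any failures.
crawl_all() {
  local list="$1" outdir="${2:-.}" url failed=0
  mkdir -p "$outdir"
  while IFS= read -r url || [ -n "$url" ]; do
    case $url in ''|'#'*) continue ;; esac
    # stdin is redirected so the CLI cannot consume the URL list.
    if ! anycap crawl --url "$url" -o "$outdir/$(url_to_filename "$url")" < /dev/null; then
      echo "crawl failed: $url" >&2
      failed=$((failed + 1))
    fi
  done < "$list"
  [ "$failed" -eq 0 ]   # non-zero status if any crawl failed (assumed exit-code behavior)
}

# Example (not run here):
#   crawl_all urls.txt crawled/
```

A failed URL is logged and skipped rather than aborting the run, so one blocked page does not cost you the rest of the batch.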

What if a page blocks crawlers?

Some pages block automated access. The runtime respects robots.txt and handles access restrictions gracefully. If a page can't be crawled, your agent gets a clear error message — it doesn't silently fail.

Does this work across Cursor and Codex too?

Yes. The same anycap CLI, including anycap crawl, works across Claude Code, Cursor, and Codex. One install, all agents.

The Bottom Line

Web search tells your agent what exists. Web crawl lets your agent read it. For competitive research, documentation-driven development, and content extraction, search alone isn't enough.

Give your agent both. Search to discover. Crawl to understand.

→ Give Claude Code full web access — search + crawl through one CLI

📖 What to Read Next

How to Give Your AI Agent Web Search Capability — One CLI Command — The web search companion to this crawl guide.
How to Generate Video with Claude Code: The Complete 2026 Guide — Research, then create. The capabilities keep stacking.
How to Deploy a Website from Claude Code — Crawl content, build a page, deploy it. Complete pipeline.

What Is a Capability Runtime? — The infrastructure that bundles search, crawl, image, video, and storage into one CLI.
What Is an AI Agent? The Complete Developer Guide — The fundamentals: what agents are and what tools they need.
Agentic AI vs Traditional AI: 5 Key Differences — Why the bottleneck isn't the model — it's whether the agent has tools like web access.

Written by the AnyCap team. We build the capability runtime that gives your agent web search with citations, full-page crawling, and everything it needs to research without you.