How to Add Web Crawling to Claude Code: Full-Page Access for Your Agent

Web search returns snippets. Web crawl returns the full page. Here's how to give Claude Code full-page web access — for research, competitor analysis, and content extraction — through one CLI.

by AnyCap

You ask Claude Code to research a competitor's pricing page. It searches the web and returns a snippet: "Starts at $29/month." That's not enough. You need the full pricing table, the feature comparison, the enterprise tier — the actual page content.

Web search gives you summaries. Web crawl gives you the page.

Here's how to add web crawling to Claude Code — so your agent can read full web pages, extract structured data, and feed that research directly into its workflow.


Web Search vs Web Crawl: What's the Difference?

They're related but do different jobs:

Web Search Web Crawl
What it returns Snippets, links, citations Full page content as clean Markdown
Best for Quick answers, discovery, fact-checking Deep research, content extraction, competitor analysis
Speed Seconds Seconds to a minute (full page fetch)
Data depth Surface-level Complete — every heading, paragraph, table
Use case "What's the pricing for X?" "Extract the entire pricing page and compare it to our pricing"

Your agent needs both. Search to find the right pages. Crawl to read them properly.


Why Claude Code Needs Web Crawl

Claude Code reasons about your codebase. It can refactor functions, write tests, and debug issues across files. But when it needs to research something — a competitor's API docs, a library's changelog, a product's feature list — it hits a wall.

Web search helps, but snippets only go so far. A pricing page might have 12 tiers. A docs page might have 40 sections. A changelog might span 3 years of releases. A 150-character snippet tells you one thing. The full page tells you everything.

Web crawl gives your agent the complete page. It can then:

  • Extract structured data (pricing tiers, feature lists, API endpoints)
  • Compare competitor offerings point by point
  • Feed documentation into code generation ("implement the authentication exactly as described in the docs")
  • Monitor changes over time (crawl the same page weekly, diff the results)

Method 1: Manual Web Scraping (The Fragile Way)

You can configure Claude Code to call a scraping service directly. Pick a provider (Firecrawl, Jina, ScrapingBee), sign up, get an API key, and wire it into your agent.

The manual approach:

  1. Sign up for a scraping service
  2. Get an API key
  3. Write a shell script or MCP config that Claude Code can call
  4. Handle rate limits, retries, and failed fetches
  5. Parse the response and feed it back into the agent context

This works for occasional use. It breaks when you scale — different websites block different scrapers, rate limits vary by provider, and maintaining the integration consumes time you wanted to spend building.


Method 2: MCP Server for Crawling

MCP servers for web crawling bundle the scraping logic into a reusable integration. Firecrawl's MCP server is the most common — Claude Code calls it, and it returns clean Markdown from any URL.

The setup is lighter than manual API wiring, but you're still managing:

  • One MCP server per capability (crawl is separate from search)
  • Provider-specific rate limits and authentication
  • Format inconsistencies when switching between scraping providers

Method 3: One CLI for Search + Crawl (The AnyCap Way)

This approach bundles search and crawl into one command surface. Your agent searches to find pages, then crawls to read them fully — all through the same CLI.

# Step 1: Search for relevant pages
anycap search --prompt "competitor pricing pages SaaS 2026" --citations

# Step 2: Crawl the most relevant result for full content
anycap crawl --url "https://competitor.com/pricing" -o pricing.md

The runtime handles:

  • Structured output. Pages convert to clean Markdown — headings, paragraphs, tables, and code blocks preserved.
  • JavaScript rendering. Dynamic pages (SPAs, React apps) render before extraction.
  • Clean content. Navigation, ads, and boilerplate are stripped. What's left is the article body.
  • Consistent format. Every crawled page returns the same Markdown structure, regardless of the source.

Install:

npm i -g anycap
anycap login
anycap skill install --target ~/.claude/skills/anycap-cli/

Install AnyCap free — 250 credits for new users


Real Use Case: Competitor Research Pipeline

Your agent needs to compare your product's pricing against three competitors. Here's the full workflow:

# 1. Search for competitor pricing pages
anycap search --prompt "competitor A pricing plans 2026" --citations
anycap search --prompt "competitor B pricing plans 2026" --citations
anycap search --prompt "competitor C pricing plans 2026" --citations

# 2. Crawl each pricing page for full content
anycap crawl --url "https://competitor-a.com/pricing" -o competitor-a.md
anycap crawl --url "https://competitor-b.com/pricing" -o competitor-b.md
anycap crawl --url "https://competitor-c.com/pricing" -o competitor-c.md

# 3. Feed the crawled content to Claude Code for analysis
# Claude Code now has the full pricing data and can produce:
# - A comparison table
# - Pricing positioning recommendations
# - Feature gap analysis

Your agent researched, crawled, analyzed, and recommended — all in one session. No manual browser tabs. No copy-pasting.


Real Use Case: Documentation-Driven Development

Your agent needs to implement an API integration. Instead of guessing the auth flow, it crawls the official docs:

# Crawl the API authentication docs
anycap crawl --url "https://api.provider.com/docs/auth" -o auth-docs.md

# Crawl the endpoint reference
anycap crawl --url "https://api.provider.com/docs/endpoints" -o endpoints.md

# Claude Code now implements the integration from the actual docs,
# not from its training data which may be outdated

This is the difference between "Claude Code, implement the Stripe integration" (works from training data, may be outdated) and "Claude Code, crawl the latest Stripe docs and implement the integration exactly as described" (accurate, current, reliable).


Real Use Case: Competitive Monitoring

Set up a recurring research workflow. Your agent crawls competitor pages on a schedule and diffs the results:

# Crawl competitor changelog
anycap crawl --url "https://competitor.com/changelog" -o competitor-changelog-$(date +%Y%m%d).md

# Crawl competitor feature page
anycap crawl --url "https://competitor.com/features" -o competitor-features-$(date +%Y%m%d).md

# Compare against last week's crawl
diff competitor-features-20260511.md competitor-features-20260518.md

Run this weekly. Your agent flags new features, changed pricing, updated messaging — before your product team hears about it from a customer.


Search + Crawl: The Complete Research Stack

Web search finds. Web crawl reads. Together, they form a complete research capability for your agent:

Step Command What it does
1. Discover anycap search Finds relevant pages with grounded citations
2. Extract anycap crawl Pulls full page content as clean Markdown
3. Analyze Claude Code Reasons about the extracted content
4. Act Claude Code Implements, compares, or reports based on findings

This is grounded research — your agent doesn't rely on training data or partial snippets. It works from the actual, current content of the pages that matter.


Use search when... Use crawl when...
You need a quick answer You need the complete page
You're discovering which pages exist You know which page you need and want all of it
You need cited, grounded summaries You need structured data extraction
Speed is the priority Depth is the priority
The answer fits in a snippet The answer is a table, a list, or spans multiple sections

Most research workflows use both: search to discover, crawl to extract.


FAQ

Does web crawl work on JavaScript-rendered pages?

Yes. The runtime renders dynamic content (React, Vue, SPAs) before extracting. What you see in a browser is what your agent gets.

How is web crawl different from Claude Code's built-in web search?

Claude Code's built-in web search returns snippets and summaries. Web crawl returns the full page content as Markdown — every heading, paragraph, table, and code block. Use search for quick answers. Use crawl when you need depth.

Can I crawl multiple pages in one session?

Yes. Run anycap crawl once per URL. Your agent can loop through a list of URLs and crawl them sequentially. All results are saved as local Markdown files.

What if a page blocks crawlers?

Some pages block automated access. The runtime respects robots.txt and handles access restrictions gracefully. If a page can't be crawled, your agent gets a clear error message — it doesn't silently fail.

Does this work across Cursor and Codex too?

Yes. anycap crawl uses the same CLI and works across Claude Code, Cursor, and Codex. One install, all agents.


The Bottom Line

Web search tells your agent what exists. Web crawl lets your agent read it. For competitive research, documentation-driven development, and content extraction, search alone isn't enough.

Give your agent both. Search to discover. Crawl to understand.


Give Claude Code full web access — search + crawl through one CLI




Written by the AnyCap team. We build the capability runtime that gives your agent web search with citations, full-page crawling, and everything it needs to research without you.