
Yes, AI agents can call Veo 3.1, though not without a wrapper. Google's flagship video model is available through the AnyCap API and CLI, which lets Claude Code, Cursor, Codex, and other agent runtimes generate video at up to 4K in a single tool call.
What Is Veo 3.1?
Veo 3.1 is Google DeepMind's highest-quality text-to-video model as of 2026. It produces native clips of 4, 6, or 8 seconds (extendable to 148 seconds by chaining Scene Extension calls) at up to 4K resolution, with cinematic lighting, coherent motion, and native 48kHz stereo audio. It is the go-to choice when you want a premium first-pass render with minimal post-processing.
Key specs:
- Resolution: 720p, 1080p, or 4K (3840×2160 on premium tiers)
- Duration: 4, 6, or 8 seconds native; chainable up to 148 seconds via Scene Extension
- Audio: native 48kHz stereo AAC, synchronized dialogue and soundscapes
- Output: MP4 with optional audio track
- Strength: photorealistic first-pass quality, product demos, brand announcements
- Credit cost: ~20 credits/second of output
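To make the Scene Extension limit concrete, here is a minimal sketch that estimates how many extension hops a target duration would need. The 8-second base and 148-second ceiling come from the specs above; the 7-second step per hop is an assumption (chosen because 8 + 20 × 7 = 148 matches the stated ceiling), and `extensions_needed` is a hypothetical helper name, so check the step size against the provider documentation before relying on it.

```shell
# Estimate how many Scene Extension hops a target duration needs.
# ASSUMPTION: each hop adds a fixed EXT_SECONDS (default 7). The specs above
# only state the 8-second native maximum and the 148-second ceiling.
BASE_SECONDS=8
EXT_SECONDS=${EXT_SECONDS:-7}
MAX_SECONDS=148

extensions_needed() {
  target=$1
  if [ "$target" -gt "$MAX_SECONDS" ]; then
    echo "error: ${target}s exceeds the ${MAX_SECONDS}s ceiling" >&2
    return 1
  fi
  if [ "$target" -le "$BASE_SECONDS" ]; then
    echo 0
    return 0
  fi
  # Ceiling division: (remaining + step - 1) / step
  echo $(( (target - BASE_SECONDS + EXT_SECONDS - 1) / EXT_SECONDS ))
}
```

Under that assumption, `extensions_needed 30` prints `4`, and any request above 148 seconds is rejected before a job is queued.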
Why Agents Can't Call Veo 3.1 Directly (and How to Fix It)
Veo 3.1 lives behind Google's Vertex AI platform, which requires OAuth 2.0 service-account auth, project IAM setup, and polling of asynchronous long-running jobs, none of which a standard agent tool call can handle out of the box.
AnyCap wraps the entire auth and delivery layer behind a single REST endpoint and CLI binary, so your agent calls one tool, gets back a video URL, and moves on.
What Veo 3.1 Unlocks for Agents
- Product demo videos — turn a spec sheet into a 30-second MP4 automatically
- Announcement clips — generate a launch video the moment a PR draft is approved
- Visual documentation — convert how-to instructions into narrated walkthroughs
- A/B creative testing — generate multiple prompt variants at scale and compare
Method 1: Direct Vertex AI API (manual setup)
If you need raw access without AnyCap, call the Vertex AI REST endpoint directly. Replace {MODEL_ID} with the Veo 3.1 model ID listed in the Vertex AI documentation:
POST https://us-central1-aiplatform.googleapis.com/v1/projects/{PROJECT}/locations/us-central1/publishers/google/models/{MODEL_ID}:predictLongRunning
Authorization: Bearer $(gcloud auth print-access-token)
Content-Type: application/json
{
  "instances": [{
    "prompt": "A golden retriever running on a beach at sunset, slow motion"
  }],
  "parameters": {
    "durationSeconds": 8,
    "resolution": "1080p"
  }
}
Downsides: this route requires a GCP project, a service-account JSON key, IAM roles, and polling logic for async jobs, which makes it impractical inside a sandboxed agent.
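To give a sense of what that polling logic involves, here is a minimal, provider-agnostic sketch. `poll_until_done` is a hypothetical helper, not part of any Google or AnyCap tooling; it repeatedly runs a check command you supply until that command exits 0 or the attempts run out. The simulated job below stands in for a real status check, which you would write against your provider's operation-status endpoint.

```shell
# poll_until_done CHECK_CMD [INTERVAL_SECONDS] [MAX_ATTEMPTS]
# Runs CHECK_CMD until it exits 0 (job finished) or attempts run out.
poll_until_done() {
  check_cmd=$1
  interval=${2:-10}
  max_attempts=${3:-30}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    if "$check_cmd"; then
      return 0
    fi
    # Only wait if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "error: job still pending after $max_attempts checks" >&2
  return 1
}

# Demo with a simulated job that reports completion on the third check.
checks=0
fake_job_done() {
  checks=$((checks + 1))
  [ "$checks" -ge 3 ]
}
poll_until_done fake_job_done 0 5 && echo "done after $checks checks"
```

The demo prints `done after 3 checks`. In a real setup the check function would fetch the job status with an authenticated request and succeed only when the job reports completion.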
Method 2: MCP Server
AnyCap ships an MCP server that exposes video_generate as a tool. Add it to your mcp.json:
{
  "mcpServers": {
    "anycap": {
      "command": "anycap",
      "args": ["mcp", "serve"],
      "env": { "ANYCAP_API_KEY": "your_key_here" }
    }
  }
}
Once connected, instruct your agent:
Generate a 10-second product demo video of a sleek laptop opening on a white desk.
Use Veo 3.1. Save it to Drive and return the shareable link.
The agent calls video_generate → drive_upload → returns URL. No boilerplate.
Method 3: AnyCap CLI (Recommended for Agent Runtimes)
The CLI is the fastest path for Claude Code, Cursor terminals, and Codex sandbox shells.
Install
curl -fsSL https://anycap.ai/install.sh | sh
anycap login # paste your API key once
Generate a video with Veo 3.1
anycap video generate \
  --model veo-3-1 \
  --prompt "A drone flyover of a modern city at golden hour, cinematic" \
  --duration 15 \
  --output /workspace/city-flyover.mp4
Output:
✓ Queued veo-3-1 [job: v3x-7891]
✓ Rendering 15s @ 1080p …
✓ Complete /workspace/city-flyover.mp4 (284 MB)
Credits used: 300
Upload to Drive and get a shareable link
anycap drive upload /workspace/city-flyover.mp4 --share
# → https://drive.anycap.ai/f/abc123 (public link, no login required)
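The generate and upload commands above can be chained into one function that an agent calls as a single step. This is a sketch, not an official AnyCap script: `generate_and_share` and the `ANYCAP_BIN` override are hypothetical names, it uses only the flags shown above, and it assumes the CLI exits non-zero on failure and that the upload command prints the share link to standard output.

```shell
# ANYCAP_BIN lets you point at a different binary (or a stub in tests).
ANYCAP_BIN=${ANYCAP_BIN:-anycap}

# generate_and_share PROMPT DURATION_SECONDS OUTPUT_PATH
# Stops at the first failing step; progress output goes to stderr so that
# stdout carries only what the upload command prints.
generate_and_share() {
  prompt=$1
  duration=$2
  output=$3

  "$ANYCAP_BIN" video generate \
    --model veo-3-1 \
    --prompt "$prompt" \
    --duration "$duration" \
    --output "$output" >&2 || {
    echo "error: video generation failed" >&2
    return 1
  }

  # Guard against a zero-exit run that still produced no file.
  if [ ! -s "$output" ]; then
    echo "error: expected a non-empty file at $output" >&2
    return 1
  fi

  "$ANYCAP_BIN" drive upload "$output" --share
}
```

A call such as `generate_and_share "A drone flyover of a modern city" 8 /workspace/city.mp4` would then render the clip and pass through whatever the upload step reports.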
Image-to-video (reference frame)
anycap video generate \
  --model veo-3-1 \
  --image /workspace/product-shot.png \
  --prompt "Slowly rotate the product 360 degrees, studio lighting" \
  --duration 8 \
  --output /workspace/product-360.mp4
Veo 3.1 vs Other Video Models — Which Should Agents Use?
| Model | Best For | Quality | Speed | Credits/sec |
|---|---|---|---|---|
| Veo 3.1 | Premium first-pass, photorealism | ★★★★★ | Moderate | ~20 |
| Veo 3.1 Fast | Quick drafts, iteration | ★★★★☆ | Fast | ~10 |
| Kling 3.0 | Cinematic camera motion, drama | ★★★★★ | Moderate | ~18 |
| Kling O1 | Consistent style, batch | ★★★★☆ | Fast | ~12 |
| Seedance 2.0 | Character consistency, series | ★★★★☆ | Moderate | ~15 |
| Seedance 1.5 Pro | High-volume batch generation | ★★★★☆ | Fast | ~10 |
| Sora 2 Pro | Realistic physics, long duration | ★★★★★ | Slow | ~25 |
| Hailuo 2.3 | Fast drafts, stylized | ★★★☆☆ | Very Fast | ~8 |
Decision guide:
- Need the best possible quality on the first try → Veo 3.1
- Need dramatic camera sweeps and motion → Kling 3.0
- Need consistent characters across multiple clips → Seedance 2.0
- Need realistic gravity/fluid physics → Sora 2 Pro
- Need a fast cheap draft → Veo 3.1 Fast or Hailuo 2.3
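If an agent needs to apply this guide programmatically, it can be encoded as a simple lookup. The sketch below is illustrative only: `pick_model` and its keyword arguments are made up for this example, and it prints display names from the table above, not CLI model IDs.

```shell
# pick_model NEED -> prints the model the decision guide recommends.
# The NEED keywords are hypothetical; outputs are display names only.
pick_model() {
  case "$1" in
    quality)    echo "Veo 3.1" ;;
    motion)     echo "Kling 3.0" ;;
    characters) echo "Seedance 2.0" ;;
    physics)    echo "Sora 2 Pro" ;;
    draft)      echo "Veo 3.1 Fast" ;;  # Hailuo 2.3 is the listed alternative
    *)          echo "error: unknown need '$1'" >&2; return 1 ;;
  esac
}
```

For example, `pick_model physics` prints `Sora 2 Pro`, and an unrecognized keyword returns a non-zero status so the agent can fall back to asking the user.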
Pricing
AnyCap credits are shared across all models. New accounts start with 250 free credits.
| Clip Length | Veo 3.1 Cost | Veo 3.1 Fast Cost |
|---|---|---|
| 5 seconds | 100 credits | 50 credits |
| 10 seconds | 200 credits | 100 credits |
| 30 seconds | 600 credits | 300 credits |
| 60 seconds | 1,200 credits | 600 credits |
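The table rows are just clip length multiplied by the per-second rate, so an agent can check a job against its credit balance before queuing it. A minimal sketch, assuming the approximate rates above hold exactly; `estimate_credits` and `can_afford` are hypothetical helper names.

```shell
# Approximate credits per second of output, taken from the pricing above.
VEO_RATE=20
VEO_FAST_RATE=10

# estimate_credits SECONDS RATE -> prints the approximate credit cost.
estimate_credits() {
  echo $(( $1 * $2 ))
}

# can_afford SECONDS RATE BALANCE -> exit status 0 if the balance covers it.
can_afford() {
  [ "$(estimate_credits "$1" "$2")" -le "$3" ]
}
```

With these rates, `estimate_credits 30 "$VEO_RATE"` prints `600`. The 250 free credits cover a 10-second Veo 3.1 clip (200 credits) but not a 15-second one (300 credits), so `can_afford 15 "$VEO_RATE" 250` fails.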
FAQ
Q: Does Veo 3.1 support audio?
A: Yes. Pass --audio to request a synthesized audio track. You can also specify --no-audio for a silent clip and add music separately with anycap music generate.
Q: What's the difference between Veo 3.1 and Veo 3.1 Fast?
A: Veo 3.1 Fast uses a distilled model that renders in roughly half the time at half the credit cost. Quality is slightly lower but acceptable for drafts and iteration. Switch to full Veo 3.1 for final renders.
Q: Can I run Veo 3.1 inside a Claude Code session?
A: Yes. Install the AnyCap CLI in your Claude Code project shell and call it as a bash tool. The output file path is returned synchronously, so subsequent tool calls can reference it immediately.
Q: How do I use Veo 3.1 inside Cursor?
A: Open the Cursor terminal, install the CLI, and run anycap video generate commands. Or add the AnyCap MCP server to .cursor/mcp.json and ask Cursor's agent to generate video through natural language.
Q: Is there a resolution limit?
A: Veo 3.1 supports 720p, 1080p, and 4K (3840×2160). 4K is available on premium API tiers via Vertex AI. Aspect ratio defaults to 16:9; pass --aspect 9:16 for vertical/mobile format.
Q: How long does a 10-second Veo 3.1 clip take to render?
A: Typical queue-to-download time is 60–90 seconds for a 10-second clip, and render time scales with duration. Veo 3.1 Fast takes roughly 30–45 seconds for a clip of the same length.
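These figures can be turned into a timeout budget for an agent's tool call. The sketch below rests on two assumptions that the figures above do not guarantee: render time grows linearly with clip length, and doubling the upper end of the quoted range leaves enough headroom. `render_timeout` and its `veo`/`fast` keywords are hypothetical.

```shell
# render_timeout CLIP_SECONDS [MODEL] -> prints a timeout budget in seconds.
# ASSUMPTIONS: linear scaling with clip length, and 2x the upper end of the
# quoted range as headroom. Rates are kept in tenths of a second because
# shell arithmetic is integer-only.
render_timeout() {
  clip_seconds=$1
  case "${2:-veo}" in
    veo)  tenths_per_second=90 ;;  # 90 s per 10 s clip = 9.0 s per second
    fast) tenths_per_second=45 ;;  # 45 s per 10 s clip = 4.5 s per second
    *)    echo "error: unknown model '$2'" >&2; return 1 ;;
  esac
  echo $(( clip_seconds * tenths_per_second * 2 / 10 ))
}
```

Under those assumptions, `render_timeout 10` prints `180` and `render_timeout 10 fast` prints `90`. On systems with GNU coreutils, the result could be passed to `timeout` to bound a generate call.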
Q: Do my generated videos belong to me?
A: Yes. Videos generated through AnyCap are fully owned by you. AnyCap does not use your prompts or outputs to train models.
What to Read Next
- Best AI Video Models for Agents 2026 — full video model comparison with benchmark results
- Sora 2 Pro API Guide (2026) — OpenAI's cinematic video model for agents
- Kling 3.0 API Guide (2026) — 15-second clips with realistic motion and native audio
- Seedance 1.5 Pro Complete Guide — ByteDance's video model for consistent human motion
- OpenAI Codex Has No Audio Tools — Add Them in 30 Seconds — pair Veo 3.1 videos with generated soundtracks
- Best Image Models for AI Agents 2026 — complete the multimedia stack