DeepSeek V4 is a 1.6-trillion-parameter Mixture-of-Experts language model that matches GPT-5.5 on agentic coding benchmarks at 1/18th the cost. It has a 1M-token context window — the longest of any frontier model. It is Apache 2.0 licensed, which means you can self-host it, fine-tune it, and deploy it commercially with only minimal attribution requirements. And it is text-only: no native image generation, no video, no audio, no web search, no storage, no publishing.
This guide covers everything DeepSeek V4 can do, everything it cannot do, and how to close the gaps so your agents can actually ship complete work. For a full technical walkthrough of the architecture, benchmarks, and API, start with our DeepSeek V4 developer guide.
What DeepSeek V4 can do
Frontier reasoning at 1/18th the cost
DeepSeek V4 Pro scores 81% on SWE-bench Verified, 85.2% on MMLU-Pro, and 96.8% on MATH-500 — all within striking distance of GPT-5.5 and Claude Opus 4.7. The difference: DeepSeek V4 Pro costs $0.28/1M input tokens and $1.12/1M output tokens. GPT-5.5 costs $5/1M input and $30/1M output.
For a typical agent coding session — 10K tokens in, 2K out — DeepSeek V4 Pro costs about $0.005 and GPT-5.5 about $0.11. Across the hundreds of sessions a month of heavy agent use generates, that gap compounds into hundreds of dollars. For a head-to-head comparison of benchmarks, pricing, and features, see DeepSeek V4 vs GPT-5.5.
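To make the arithmetic concrete, here is a minimal Python sketch of the per-session math, using only the prices quoted above (the function is illustrative, not an official calculator):

```python
# Per-session cost from per-million-token prices quoted in this article.
def session_cost(in_tokens, out_tokens, in_price, out_price):
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

deepseek = session_cost(10_000, 2_000, 0.28, 1.12)   # ~$0.005
gpt55    = session_cost(10_000, 2_000, 5.00, 30.00)  # ~$0.11
print(f"DeepSeek V4 Pro: ${deepseek:.4f}  GPT-5.5: ${gpt55:.2f}")
```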
1M-token context window
DeepSeek V4 can ingest 1 million tokens in a single pass — roughly 750,000 words, or several full-length novels. For developers, this means you can feed an entire codebase into the model without chunking, summarization, or retrieval. Claude Code, when routed through DeepSeek V4, can index and understand a large monorepo in one session.
This is enabled by DeepSeek's Multi-head Latent Attention (MLA) architecture, which compresses the key-value cache to reduce memory usage during long-context inference. The result is practical: 1M-token context at a cost that does not break your API budget.
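A rough sketch of a single-pass ingest, assuming DeepSeek V4 keeps the OpenAI-compatible API of earlier DeepSeek releases (the repo path and model name below are illustrative):

```python
import os
from pathlib import Path

from openai import OpenAI  # DeepSeek's API is OpenAI-compatible

# Concatenate every Python file in the repo into one prompt; with a
# 1M-token window there is no need to chunk or summarize first.
repo = Path("./my-project")  # illustrative path
code = "\n\n".join(f"# file: {p}\n{p.read_text()}" for p in repo.rglob("*.py"))

client = OpenAI(base_url="https://api.deepseek.com",
                api_key=os.environ["DEEPSEEK_API_KEY"])
resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # illustrative model name
    messages=[{"role": "user",
               "content": f"{code}\n\nMap the architecture of this codebase."}],
)
print(resp.choices[0].message.content)
```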
Agentic coding — open-source SOTA
DeepSeek V4 Pro achieves state-of-the-art results among open-source models on agentic coding benchmarks. It was specifically post-trained for agent tasks: tool calling, multi-step planning, error recovery, and code execution. CNBC reported on launch day that V4 has been optimized for use with Claude Code and OpenClaw.
In practice, this means a DeepSeek V4-powered agent can:
- Read a full repository and build an internal map of the codebase
- Plan multi-step changes across dozens of files
- Execute those changes, run tests, and iterate on failures
- Call external tools through function calling or MCP (a minimal sketch follows this list)
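As a sketch of the first step of that tool-calling loop, assuming the standard OpenAI function-calling schema (the `run_tests` tool and model name are hypothetical):

```python
import json
import os

from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com",
                api_key=os.environ["DEEPSEEK_API_KEY"])

# One illustrative tool in the OpenAI function-calling schema.
tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, not a real API
        "description": "Run the project's test suite and return the output.",
        "parameters": {
            "type": "object",
            "properties": {"path": {"type": "string"}},
            "required": ["path"],
        },
    },
}]

resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # illustrative model name
    messages=[{"role": "user", "content": "Fix the failing tests in ./src"}],
    tools=tools,
)
# If the model chose to call the tool, inspect the structured call.
if resp.choices[0].message.tool_calls:
    call = resp.choices[0].message.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
```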
For a complete setup walkthrough, see DeepSeek V4 with Claude Code: Agent Integration Guide.
Self-hosting and data sovereignty
DeepSeek V4 is released under the Apache 2.0 license. You can download the weights, run the model on your own hardware, and deploy it in air-gapped environments. V4 Flash quantized to 4-bit runs on a single consumer GPU. V4 Pro requires more VRAM but is viable on workstation-class hardware.
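Once the weights are running, querying a self-hosted deployment looks the same as calling the hosted API. A minimal sketch, assuming the model is served behind an OpenAI-compatible server such as vLLM (the port and model name are illustrative):

```python
from openai import OpenAI

# Point the client at a local OpenAI-compatible server (e.g. vLLM's
# default port). No data leaves the machine; most local servers
# accept any placeholder API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")

resp = client.chat.completions.create(
    model="deepseek-v4-flash",  # illustrative local model name
    messages=[{"role": "user", "content": "Summarize this repo's README."}],
)
print(resp.choices[0].message.content)
```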
For teams with compliance requirements, data sovereignty constraints, or a preference for infrastructure ownership, this is a decisive advantage over API-only models like GPT-5.5 or Claude.
Multi-model routing
DeepSeek V4 can be used alongside other models through routing layers like OpenRouter. A common pattern: use DeepSeek V4 Flash ($0.14/1M input tokens) for simple tasks, DeepSeek V4 Pro for complex reasoning, and a multimodal model for tasks that need native image understanding. Multi-model routing is becoming standard practice, and DeepSeek V4's price point makes it the default choice for cost-sensitive routing tiers.
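A minimal sketch of that routing tier against OpenRouter's OpenAI-compatible endpoint (the model slugs are illustrative assumptions):

```python
import os

from openai import OpenAI

# OpenRouter exposes an OpenAI-compatible endpoint.
client = OpenAI(base_url="https://openrouter.ai/api/v1",
                api_key=os.environ["OPENROUTER_API_KEY"])

def route(prompt: str, complex_task: bool) -> str:
    # Cheap Flash tier for simple edits, Pro tier for hard reasoning.
    model = ("deepseek/deepseek-v4-pro" if complex_task
             else "deepseek/deepseek-v4-flash")  # illustrative slugs
    resp = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}])
    return resp.choices[0].message.content

print(route("Rename this variable across the file.", complex_task=False))
```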
What DeepSeek V4 cannot do
No native multimodal support
This is the single biggest limitation. DeepSeek V4 is text-only. The official documentation states: "No native image, audio, or video input or output in the preview."
Specifically, a DeepSeek V4-powered agent cannot, out of the box:
- Generate images or edit photos
- Create videos or analyze video content
- Process audio — transcription, voice synthesis, music generation
- Understand images — describe a photo, extract text from a screenshot, answer questions about a diagram
- Search the live web for current information
- Store files in cloud storage or generate share links
- Publish content to the web
No voice or audio processing
GPT-5.5 and Gemini 3.1 support voice mode and audio understanding. DeepSeek V4 does not. If your workflow involves transcribing meetings, building voice agents, or processing audio files, DeepSeek V4 alone is not the right tool.
Knowledge cutoff
Like all large language models, DeepSeek V4 has a training data cutoff. It does not know about events after its training date. The 1M-token context window helps — you can feed it recent documentation or search results — but the model itself has no live awareness.
API ecosystem maturity
DeepSeek's API ecosystem is newer and smaller than OpenAI's or Anthropic's. The Assistants API, structured outputs, fine-tuning API, and managed deployment options are less mature. For teams that rely heavily on managed AI infrastructure, this is a consideration — though the Apache 2.0 license means you can build whatever infrastructure you need on top of the model.
How to close the capability gaps
Every limitation listed above has a solution. The architecture is straightforward: DeepSeek V4 handles reasoning and code generation. Other tools handle everything else.
Image generation, video, search, storage, and publishing
These capabilities can be added through MCP (Model Context Protocol), the open standard for connecting AI agents to external tools. Claude Code, Cursor, and OpenClaw all support MCP natively. The fastest path: install AnyCap with one command. One runtime adds all five capabilities to any MCP-compatible agent:
npx -y skills add anycap-ai/anycap -a claude-code
After installation, your DeepSeek V4-powered agent can:
| Capability | Command |
|---|---|
| Generate images | anycap image generate "description" |
| Create videos | anycap video generate "description" |
| Search the web | anycap search "query" |
| Store files | anycap drive upload ./path |
| Publish content | anycap page publish ./file.md |
Full guide: How to Add Multimodal Capabilities to DeepSeek V4 Agents
Claude Code and OpenClaw integration
DeepSeek V4 has been optimized for agent tools, as CNBC reported at launch. To route Claude Code through DeepSeek V4 via OpenRouter:
export OPENROUTER_API_KEY=sk-or-your-key
claude --model openrouter/deepseek/deepseek-v4-pro
Your agent uses DeepSeek V4 for reasoning and code generation, Claude Code for agent execution (reading files, running commands, managing git), and AnyCap for multimodal capabilities.
Full guide: DeepSeek V4 with Claude Code: Agent Integration Guide
Web search and live information
DeepSeek V4's 1M-token context window is uniquely suited for search-augmented workflows. Feed it search results from AnyCap's web search, and the model can ingest and synthesize the full output in one pass — no chunking, no retrieval-augmented generation pipeline, just raw context.
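A sketch of that workflow, assuming the anycap search CLI from the table above writes its results to stdout (the query and model name are illustrative):

```python
import os
import subprocess

from openai import OpenAI

# Assumption: `anycap search` prints its results to stdout.
results = subprocess.run(
    ["anycap", "search", "DeepSeek V4 release notes"],
    capture_output=True, text=True, check=True,
).stdout

client = OpenAI(base_url="https://api.deepseek.com",
                api_key=os.environ["DEEPSEEK_API_KEY"])
resp = client.chat.completions.create(
    model="deepseek-v4-pro",  # illustrative model name
    messages=[{"role": "user",
               "content": f"Search results:\n{results}\n\nSummarize what changed."}],
)
print(resp.choices[0].message.content)
```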
Model comparison: DeepSeek V4 vs GPT-5.5
If you are evaluating DeepSeek V4 against GPT-5.5 specifically — benchmarks, pricing, multimodal gap, deployment flexibility — see the full comparison.
Full comparison: DeepSeek V4 vs GPT-5.5: Capability Comparison
Recommended stacks for different use cases
Budget-conscious agent development
DeepSeek V4 Flash ($0.14/1M tokens)
+ Claude Code (agent execution)
+ AnyCap (multimodal capabilities)
= Full agent stack at ~$5-10/month for daily use
Maximum performance, best cost
DeepSeek V4 Pro ($0.28/1M tokens) for complex reasoning
DeepSeek V4 Flash ($0.14/1M tokens) for simple tasks
+ Claude Code or OpenClaw (agent execution)
+ AnyCap (multimodal capabilities)
+ Multi-model router (OpenRouter)
= Frontier agentic coding at ~$15-30/month
Self-hosted, air-gapped
DeepSeek V4 Pro (self-hosted on workstation GPU)
+ Claude Code (agent execution)
+ AnyCap (multimodal capabilities)
+ Local network only
= No data leaves your infrastructure
Enterprise OpenAI ecosystem
GPT-5.5 for native multimodal tasks
DeepSeek V4 Flash for cost-sensitive code generation
+ Multi-model router
+ AnyCap (unified capability layer across both models)
= Best of both ecosystems
FAQ
Is DeepSeek V4 actually free?
The model weights are free and open-source under Apache 2.0. Running it yourself costs compute — electricity and hardware. Using the DeepSeek API costs $0.28/1M input tokens for V4 Pro, $0.14/1M for V4 Flash. Using it through OpenRouter or other providers may have different pricing.
Can DeepSeek V4 generate images?
Not natively. It is a text-only model. You can add image generation through MCP servers or a capability runtime like AnyCap. The model handles reasoning and code; the capability layer handles multimodal outputs. See our guide to adding multimodal capabilities to DeepSeek V4.
What is the difference between V4 Pro and V4 Flash?
V4 Pro is the full model: 1.6T total parameters, 49B active per token, strongest reasoning performance. V4 Flash is a smaller, faster variant: lower latency, lower cost ($0.14 vs $0.28/1M tokens), slightly lower benchmark scores. Use Flash for rapid iteration and simple tasks. Use Pro for complex multi-file refactoring and architectural reasoning.
Does DeepSeek V4 work with Cursor?
Yes. Add DeepSeek V4 as a model provider in Cursor settings. AnyCap installs the same way as an MCP skill. The same stack works across Claude Code, Cursor, and OpenClaw — you are not locked into one agent shell.
How does DeepSeek V4 compare to Claude Opus 4.7?
They are competitive on benchmarks. The main differences: Claude Opus 4.7 is more expensive (subscription or API pricing), has tighter integration with Claude Code (native, not routed), and benefits from Anthropic's extended thinking capability. DeepSeek V4 is 1/35th the cost, open-source, and self-hostable. The choice depends on whether you value integration smoothness or cost and deployment flexibility.
Related Articles
- DeepSeek V4: Complete Developer Guide — Architecture, benchmarks, API integration, self-hosting, and everything you need to integrate DeepSeek V4.
- DeepSeek V4 vs GPT-5.5: Full Capability Comparison — Benchmarks, pricing, multimodal gap, and deployment flexibility compared side-by-side.
- DeepSeek V4 with Claude Code: Agent Integration Guide — Route Claude Code through DeepSeek V4 for agentic coding at 1/35th the cost.
- How to Add Multimodal Capabilities to DeepSeek V4 Agents — Add image generation, video, web search, and cloud storage to your DeepSeek V4 agent in under 2 minutes.
Get started with DeepSeek V4:
# Route Claude Code through DeepSeek V4
export OPENROUTER_API_KEY=sk-or-your-key
claude --model openrouter/deepseek/deepseek-v4-pro
# Add multimodal capabilities
npx -y skills add anycap-ai/anycap -a claude-code