You're building with Codex CLI. It plans the implementation, writes the code, runs tests. Then you ask it to generate a product hero image or a UI mockup.
Codex stops. Image generation isn't in its native toolkit, the same limitation you hit with Claude Code, Cursor, and most other coding agents.
Here's how to add image generation to Codex. Three approaches, from manual integration to a single command.
Why Codex Doesn't Ship With Image Generation
Codex is OpenAI's agentic coding tool. It executes tasks in cloud sandboxes, plans across files, runs terminal commands, and handles the full development loop. Image generation is a separate model family — GPT Image 2, Seedream 5, FLUX.1, DALL-E — that runs on different infrastructure, updates independently, and requires its own API surface.
The gap is intentional. Codex stays focused on code; the capability layer is external. The question is how cleanly that capability plugs in.
What Codex + Image Generation Unlocks
When you add image generation to Codex, visuals become part of the build pipeline, not an afterthought:
- Hero images for landing pages. Codex builds the page, generates the hero image, embeds the URL — same session.
- UI mockups and design references. Describe a design direction, get a visual reference without leaving the terminal.
- Launch assets on demand. Social graphics, announcement visuals, OG images — generated by your agent when it's building the thing they promote.
- Image-to-video pipelines. Generate the still, then animate it. The same CLI handles both steps. See our complete image-to-video pipeline guide.
Method 1: Direct API Integration
Codex can execute shell commands. You can wire it directly to image generation APIs.
Step 1: Choose a provider. GPT Image 2 (OpenAI), Seedream 5 (ByteDance), FLUX.1 Kontext Max (Black Forest Labs), DALL-E 3 (OpenAI). Each has its own API format.
Step 2: Get API credentials. Separate developer console per provider. Separate API keys. Separate billing accounts.
Step 3: Write integration scripts. Codex calls your scripts with prompts. Your scripts handle auth, POST requests, async polling for generation jobs, file downloads, and output handling.
Step 4: Handle format differences. Different providers return different response formats. Base64, URLs, signed CDN links — you handle the normalization.
This works. But you end up maintaining integration code instead of generating images.
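Steps 3 and 4 are where most of that maintenance lands. As a minimal sketch, assuming a generic provider rather than any specific API: `poll_until` retries a status check with a fixed delay, and `save_image` writes either a URL or a base64 payload to disk. Both function names are illustrative, and the status-check command you pass in depends entirely on your provider.

```shell
#!/usr/bin/env bash

# poll_until MAX_TRIES DELAY_SECONDS CMD [ARGS...]
# Re-runs CMD until it exits 0 or MAX_TRIES attempts are used up.
poll_until() {
  local max_tries="$1" delay="$2" try=1
  shift 2
  while [ "$try" -le "$max_tries" ]; do
    if "$@"; then
      return 0
    fi
    try=$((try + 1))
    # Only wait if another attempt is coming.
    [ "$try" -le "$max_tries" ] && sleep "$delay"
  done
  return 1
}

# save_image PAYLOAD OUTFILE
# PAYLOAD is either an http(s) URL or a base64-encoded image (assumed formats).
save_image() {
  local payload="$1" out="$2"
  case "$payload" in
    http://*|https://*) curl -fsSL "$payload" -o "$out" ;;
    *) printf '%s' "$payload" | base64 --decode > "$out" ;;
  esac
}
```

Even this small amount of glue has to be repeated or adapted for each provider you add.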
Method 2: MCP Server for Image Generation
MCP servers let Codex invoke external capabilities through a standard protocol:
- Replicate MCP — access to hundreds of image models
- FAL.ai MCP — fast inference for Flux models
- Stability MCP — Stable Diffusion variants
Configure once per server. Codex calls them like any tool. Lighter than direct API wiring.
The limitation: a single-provider MCP server locks you to that provider's model selection. When you want to compare GPT Image 2 output against Seedream 5, you're adding a second server.
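For a sense of what "configure once per server" involves, here is a hedged sketch that prints one server stanza. It assumes the `[mcp_servers.<name>]` table layout used by Codex's `~/.codex/config.toml`, so check the documentation for your Codex version. The server name `images` and the package `example-image-mcp` are placeholders, not real packages.

```shell
#!/usr/bin/env bash

# mcp_stanza NAME COMMAND [ARGS...]
# Prints a TOML stanza for one MCP server to stdout (layout is an assumption).
mcp_stanza() {
  local name="$1" cmd="$2" args="" a
  shift 2
  for a in "$@"; do
    # Build a comma-separated list of quoted strings.
    args="${args:+$args, }\"$a\""
  done
  printf '[mcp_servers.%s]\ncommand = "%s"\nargs = [%s]\n' "$name" "$cmd" "$args"
}

# Placeholder values: review the output before adding it to your config.
mcp_stanza images npx -y example-image-mcp
```

Each additional provider means another stanza like this one, plus that server's own credentials.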
Method 3: One CLI Across Codex, Claude Code, and Cursor
With this approach, your agent calls one command regardless of which image model you want:

```shell
anycap image generate \
  --prompt "a modern SaaS dashboard on a MacBook, floating UI elements, soft studio lighting, product photography style" \
  --model seedream-5 \
  -o hero.jpg
```
Change --model seedream-5 to --model gpt-image-2, --model flux-kontext-max, or --model nano-banana-2 — same command, different model. Codex, Claude Code, and Cursor all call the same CLI.
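Because only the `--model` value changes, comparing models can be a simple loop. The sketch below uses only the flags shown above. The `compare_models` name, the `compare-<model>.jpg` naming, and the `ANYCAP` override variable are illustrative choices, not part of the CLI.

```shell
#!/usr/bin/env bash

# Set ANYCAP=echo to preview the commands without running them.
ANYCAP="${ANYCAP:-anycap}"

# compare_models PROMPT MODEL [MODEL...]
# Runs the same prompt through each model, writing compare-<model>.jpg per run.
compare_models() {
  local prompt="$1" model
  shift
  for model in "$@"; do
    "$ANYCAP" image generate \
      --prompt "$prompt" \
      --model "$model" \
      -o "compare-${model}.jpg" || return 1
  done
}

# Example:
#   compare_models "a modern SaaS dashboard on a MacBook" seedream-5 gpt-image-2
```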
Install for Codex:
```shell
npx -y skills add anycap-ai/anycap -a codex -y
anycap login && anycap status
```
After install, Codex recognizes anycap image generate as an available command in its shell environment.
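If you script around the install, a small preflight check avoids confusing failures later. This is a generic sketch: `require_cmd` is a hypothetical helper, and the only AnyCap command it refers to is the `anycap status` call shown above.

```shell
#!/usr/bin/env bash

# require_cmd NAME
# Succeeds if NAME is on PATH; otherwise prints a message to stderr and fails.
require_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    return 0
  fi
  echo "missing command: $1" >&2
  return 1
}

# Typical use before a generation step:
#   require_cmd anycap && anycap status
```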
→ Install AnyCap free — 250 credits for new users
Image Models Available Through AnyCap
| Model | Provider | Best for |
|---|---|---|
| Seedream 5 | ByteDance | Highest quality first-pass. Product photography, hero images, detailed scenes. |
| GPT Image 2 | OpenAI | Native OpenAI ecosystem fit. Strong for UI screenshots and clean product shots. |
| FLUX.1 Kontext Max | Black Forest Labs | Design-heavy work, typography, graphic elements. |
| Nano Banana Pro | Google | Best for revision loops: generates quickly and holds edits well. |
| Nano Banana 2 | Google | Fast exploration. Use for volume and direction-testing before committing a final model. |
Text-to-Image in Codex: Generate from a Prompt
The simplest case — describe what you need, get the image back:
```shell
anycap image generate \
  --prompt "a developer dashboard interface, dark theme, neon blue accent color, floating data cards, clean modern UI, product screenshot style" \
  --model seedream-5 \
  -o dashboard-hero.jpg
```
Model picker for Codex users:
| Your Codex task | Best model | Why |
|---|---|---|
| Product screenshot, hero image | Seedream 5 | Best first-pass quality, so the visual matches the polish of the code Codex wrote |
| UI mockup, design reference | Nano Banana Pro | Fast generation for iteration before committing the final visual |
| Social graphic, announcement | GPT Image 2 | OpenAI ecosystem fit — Codex + GPT Image 2 stays end-to-end in the OpenAI stack |
| Design-heavy, typographic | FLUX.1 Kontext Max | Handles graphic design elements better than photography-tuned models |
| Volume, fast exploration | Nano Banana 2 | When you need 5 directions fast before picking one |
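
The table above can be encoded as a small lookup so a script picks the model from the task type. This is a sketch: the task keywords (`hero`, `mockup`, and so on) are made up for illustration, while the model identifiers are the ones used elsewhere in this guide.

```shell
#!/usr/bin/env bash

# pick_model TASK
# Prints the suggested model id for a task keyword; fails on unknown tasks.
pick_model() {
  case "$1" in
    hero|screenshot) echo "seedream-5" ;;
    mockup)          echo "nano-banana-pro" ;;
    social)          echo "gpt-image-2" ;;
    typographic)     echo "flux-kontext-max" ;;
    explore)         echo "nano-banana-2" ;;
    *) echo "unknown task: $1" >&2; return 1 ;;
  esac
}

# Example:
#   anycap image generate --prompt "..." --model "$(pick_model hero)" -o hero.jpg
```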
Image Editing in Codex: Modify an Existing Image
When you have an approved product screenshot or design asset and need to modify it — change the background, update text, adjust colors — without regenerating from scratch:
```shell
anycap image generate \
  --prompt "replace the background with a clean white studio background, keep the product interface exactly as-is" \
  --model nano-banana-pro \
  --mode edit \
  --param images=./dashboard-screenshot.jpg \
  -o dashboard-clean.jpg
```
When editing beats regeneration:
- You have an approved product screenshot but need different backgrounds for different markets
- You want to update text or labels in an existing graphic
- You need multiple color variants of a finalized asset
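
The first case, one approved screenshot with several backgrounds, can be sketched as a loop over the edit command shown earlier. The `edit_backgrounds` helper, its output naming, and the `ANYCAP` override are illustrative assumptions. The flags are the same ones used in the edit example.

```shell
#!/usr/bin/env bash

# Set ANYCAP=echo to preview the commands without running them.
ANYCAP="${ANYCAP:-anycap}"

# edit_backgrounds SOURCE BACKGROUND [BACKGROUND...]
# Produces one edited copy of SOURCE per background description.
edit_backgrounds() {
  local src="$1" base bg slug
  shift
  base="${src##*/}"   # strip the directory
  base="${base%.*}"   # strip the extension
  for bg in "$@"; do
    slug="$(printf '%s' "$bg" | tr ' ' '-')"
    "$ANYCAP" image generate \
      --prompt "replace the background with $bg, keep the product interface exactly as-is" \
      --model nano-banana-pro \
      --mode edit \
      --param "images=$src" \
      -o "${base}-${slug}.jpg" || return 1
  done
}

# Example:
#   edit_backgrounds ./dashboard-screenshot.jpg "a clean white studio background" "a warm wood desk"
```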
The Full Codex Pipeline: Code → Image → Video → Publish
Codex chains shell commands naturally. AnyCap's CLI fits that pattern:
```shell
# 1. Codex builds the landing page
# ... (Codex's own work)

# 2. Generate the hero image (OpenAI-native: GPT Image 2)
anycap image generate \
  --prompt "product hero shot for a developer tool, dark background, code editor interface, neon accents" \
  --model gpt-image-2 \
  -o hero.jpg

# 3. Animate the hero into a motion teaser (OpenAI-native: Sora 2 Pro)
anycap video generate \
  --prompt "slow camera push-in, code highlights animate, subtle parallax background" \
  --model sora-2-pro \
  --mode image-to-video \
  --param images=./hero.jpg \
  -o teaser.mp4

# 4. Store and share
anycap drive upload hero.jpg teaser.mp4
```
Codex generated, animated, and stored the assets in one session. Keep the models OpenAI-native if you want, or mix providers by changing one flag.
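When the pipeline runs unattended, it helps to stop at the first failing step so a broken image is never animated or uploaded. A minimal sketch, assuming the same commands as above: `step` and `run_pipeline` are hypothetical helpers, the prompts are shortened, and `ANYCAP` is an illustrative override for dry runs.

```shell
#!/usr/bin/env bash

# Set ANYCAP=echo to preview the commands without running them.
ANYCAP="${ANYCAP:-anycap}"

# step LABEL CMD [ARGS...]
# Prints the label, runs the command, and reports which step failed.
step() {
  local label="$1"
  shift
  echo "==> $label"
  if ! "$@"; then
    echo "step failed: $label" >&2
    return 1
  fi
}

# The && chain stops the pipeline at the first step that fails.
run_pipeline() {
  step "hero image" "$ANYCAP" image generate \
      --prompt "product hero shot for a developer tool, dark background" \
      --model gpt-image-2 -o hero.jpg &&
  step "teaser video" "$ANYCAP" video generate \
      --prompt "slow camera push-in, subtle parallax background" \
      --model sora-2-pro --mode image-to-video \
      --param images=./hero.jpg -o teaser.mp4 &&
  step "upload" "$ANYCAP" drive upload hero.jpg teaser.mp4
}
```

Call `run_pipeline` after the build step. A non-zero exit status tells Codex, or a CI job, which stage to retry.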
Why Codex + AnyCap Is a Natural Fit
Three things make the AnyCap integration especially clean for Codex workflows:
1. CLI-native design. Codex executes shell commands. anycap image generate is just another shell command. No new paradigm. No API client to initialize. Codex chains it with && the same way it chains npm test or git push.
2. OpenAI ecosystem alignment. If your team is already OpenAI-first — Codex for code, GPT Image 2 for images, Sora 2 Pro for video — AnyCap routes all three through one CLI. But you can also mix: --model seedream-5 or --model flux-kontext-max when you want different output without adding a new API key.
3. Same command across agents. The install target changes (~/.codex/skills/ vs ~/.claude/skills/), but the command is identical:
```shell
anycap image generate --prompt "..." --model seedream-5 -o output.jpg
```
Same CLI. Same auth. Same models. Switch between Codex, Claude Code, and Cursor without reconfiguring.
Cross-Agent: Same Command, Different Agents
| Agent | Skill directory | Unique advantage for image gen |
|---|---|---|
| Codex | ~/.codex/skills/ | CLI-native, OpenAI ecosystem alignment, seamless shell chaining |
| Claude Code | ~/.claude/skills/ | Subagent parallelism: compare multiple models simultaneously |
| Cursor | ~/.cursor/skills/ | In-IDE: generate, embed, and view images in one agent action |
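
To see which agents on a machine have a skill directory at all, you can check the paths from the table. This sketch only tests whether each directory exists. It does not verify that AnyCap is installed in it, and `installed_agents` is a hypothetical helper name.

```shell
#!/usr/bin/env bash

# installed_agents
# Prints each agent whose skill directory exists under $HOME.
installed_agents() {
  local agent
  for agent in codex claude cursor; do
    if [ -d "$HOME/.${agent}/skills" ]; then
      echo "$agent"
    fi
  done
}

# Example:
#   installed_agents
```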
FAQ
Does Codex support image generation natively?
No. Codex is an agentic coding tool from OpenAI — it plans, implements, and ships code. Image generation requires external models. AnyCap bundles GPT Image 2, Seedream 5, FLUX.1, and Nano Banana behind one CLI.
Which image model should Codex users start with?
Seedream 5 for the highest quality first-pass on product images. GPT Image 2 if you want to stay fully in the OpenAI ecosystem (Codex → GPT Image 2 → Sora 2 Pro is a clean OpenAI-native pipeline). Nano Banana 2 for fast exploration when you need volume over perfection.
Can I use the same AnyCap install for image and video generation?
Yes. The same CLI handles both. anycap image generate and anycap video generate share the same auth, same credits, same output handling. The image-to-video pipeline is one workflow, not two separate tool setups.
Do I need separate API keys for different image models?
Not with AnyCap. One key covers GPT Image 2 (OpenAI), Seedream 5 (ByteDance), FLUX.1 (Black Forest Labs), and Nano Banana (Google). The runtime manages provider credentials internally.
Can Codex chain image generation with other shell commands?
Yes, Codex is built for this. For example: npm run build && anycap image generate --prompt "..." -o hero.jpg && git add . && git commit -m "add hero". Codex already works in chained shell commands, so image generation is just another step, and the && chain stops if any step fails.
Can I use image generation in a Codex automation or CI pipeline?
Yes. AnyCap is headless — no UI required. Set your ANYCAP_API_KEY environment variable and call anycap image generate in any shell context where Codex runs automated tasks.
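In CI, a missing key is easier to debug if the job fails before any generation call. A minimal sketch: `require_env` is a hypothetical helper, and `ANYCAP_API_KEY` is the variable named above. The helper uses `eval`, so pass it only trusted variable names.

```shell
#!/usr/bin/env bash

# require_env NAME
# Fails with a message if the variable NAME is unset or empty.
require_env() {
  local value
  eval "value=\${$1:-}"
  if [ -z "$value" ]; then
    echo "environment variable $1 is not set" >&2
    return 1
  fi
}

# In a CI job:
#   require_env ANYCAP_API_KEY && anycap image generate --prompt "..." -o hero.jpg
```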
The Bottom Line
Codex plans features, writes code, runs tests, and ships. It can't make images — and that's by design.
The question is how you connect the two. A separate API key per provider and an integration script per model, or one CLI command that chains naturally into your existing Codex shell workflow.
→ Give Codex image generation — one install, all models
📖 What to Read Next
- How to Generate Video with Codex: The Complete 2026 Guide — The next step: take your generated image and animate it into a video clip.
- AI Image-to-Video: The Complete Pipeline for Coding Agents — Model pairing matrix for image-to-video workflows.
- Best AI Video Models for Coding Agents Compared — Which video model to animate your images with.
- How to Generate Images with Claude Code (2026) — The Claude Code variant of this guide.
- What Is a Capability Runtime? — The infrastructure that bundles image, video, search, and storage into one CLI.
Related Articles
- Terminal Agent Showdown: Claude Code vs Codex CLI vs Windsurf — How Codex compares to other terminal agents.
- What Is an AI Agent? The Complete Developer Guide — Agent fundamentals: why tools make the agent.
- How to Give Claude Code Cloud Storage — Store your generated images and share them from your agent.
Written by the AnyCap team. We build the capability runtime that gives Codex image generation through one CLI — so your agent doesn't stop at "I can't create visuals."