Collect structured visual feedback from humans directly inside your agent session
anycap-human-interaction is an installable skill that teaches Claude Code, Cursor, Codex, and similar coding agents how to bring humans into a running workflow for visual review and feedback. Without this skill, agents must rely on text descriptions alone when a task requires a human to annotate an image, review a screen recording, examine a video output, or sketch a diagram. With the skill installed, the agent knows how to invoke AnyCap's capture and annotation capabilities at the right moment and how to incorporate the resulting structured feedback into the next step.

The skill covers image annotation workflows, screen and audio recording, video review checkpoints, and interactive Excalidraw diagramming sessions. It is particularly useful for design review loops, UX testing pipelines, and any workflow where visual sign-off is a required gate before the agent continues.

Installation works through skills.sh or the AnyCap CLI and is compatible with Claude Code, Cursor, and Codex.
Install time
< 5 min
Supported agents
Claude Code · Cursor · Codex
Platform
macOS · Linux · Windows
Install this skill
npx -y skills add anycap-ai/anycap -s 'anycap-human-interaction' -g -y
How the workflow changes after install
BEFORE
Agent can only ask for feedback in text. Human must describe what they see and hope the agent interprets it correctly.
AFTER
Agent pauses at a defined checkpoint, captures the visual, and presents it for structured annotation. Human marks up the image directly. Agent reads the annotation as structured data and continues.
1. Reach review checkpoint: the agent identifies a step that requires human visual judgment, such as a design, screenshot, or generated output.
2. Capture or upload visual: the agent uses AnyCap to capture the current screen or upload the relevant image or video for review.
3. Human annotates: the human opens the AnyCap review interface, marks up the visual, adds comments, or sketches revisions in Excalidraw.
4. Agent reads feedback: the agent uses AnyCap image read or video analysis to parse the annotated output as structured data.
5. Continue workflow: the agent incorporates the feedback into the next step, revising the design, adjusting code, or updating the brief.
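The five-step loop above can be sketched in code. This is purely illustrative: `capture_visual` and `read_annotations` are hypothetical stand-ins for whatever the agent actually invokes, not real AnyCap APIs, and the annotation data is hard-coded to keep the sketch self-contained.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    point: int    # which marked-up point on the visual
    comment: str  # the human's note for that point

def capture_visual() -> str:
    # Step 2 (hypothetical): capture the screen, return a review URL.
    return "https://anycap.ai/review/r_example"

def read_annotations(review_url: str) -> list[Annotation]:
    # Steps 3-4 (hypothetical): after the human marks up the visual,
    # the agent reads the markup back as structured data.
    return [
        Annotation(1, "Move CTA above fold"),
        Annotation(2, "Reduce padding on mobile"),
    ]

def review_checkpoint() -> list[Annotation]:
    # Step 1: the agent has reached a point that needs human judgment.
    url = capture_visual()
    # Step 5: return structured feedback for the agent's next action.
    return read_annotations(url)

if __name__ == "__main__":
    for a in review_checkpoint():
        print(f"{a.point}: {a.comment}")
```

The key property the skill relies on is the return type of the last step: feedback arrives as structured records, not free-form prose the agent has to re-interpret.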
SAMPLE OUTPUT
[Agent] Capturing screen for human review...
→ Review URL: https://anycap.ai/review/r_abc123
[Human] Annotated 3 points on screenshot:
  1. "Move CTA above fold"
  2. "Reduce padding on mobile"
  3. "Change button color to #6b7d3a"
[Agent] Parsed 3 structured annotations. Applying changes...
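Step 4 is where annotations become machine-readable. As a sketch of what "parsed 3 structured annotations" could look like, assuming the review returns a JSON payload shaped like the sample above (the payload shape and field names are assumptions, not AnyCap's actual schema):

```python
import json
import re

# Hypothetical payload mirroring the sample output above.
SAMPLE_PAYLOAD = """
{"annotations": [
  {"id": 1, "comment": "Move CTA above fold"},
  {"id": 2, "comment": "Reduce padding on mobile"},
  {"id": 3, "comment": "Change button color to #6b7d3a"}
]}
"""

def parse_feedback(payload: str) -> list[dict]:
    """Turn the raw review payload into records the agent can act on."""
    items = json.loads(payload)["annotations"]
    for item in items:
        # Pull out any hex color so a later step can apply it directly.
        match = re.search(r"#[0-9a-fA-F]{6}", item["comment"])
        item["color"] = match.group(0) if match else None
    return items

annotations = parse_feedback(SAMPLE_PAYLOAD)
print(len(annotations), annotations[2]["color"])  # → 3 #6b7d3a
```

Because the feedback is data rather than prose, the agent can branch on it deterministically, for example applying the extracted color without asking the human to restate it.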
Frequently asked questions
- What does this skill do that direct prompting doesn't?
- Direct prompting can't pause a workflow for a human to annotate a specific image and return that as structured data. This skill teaches the agent exactly when and how to invoke visual capture and how to parse the annotation result.
- Does this skill require a separate annotation tool?
- No. The annotation interface is built into AnyCap. The agent opens the session; the human reviews directly in the AnyCap interface.
- Do I need an AnyCap account?
- Yes. Install the AnyCap CLI and run `anycap login` before your first session.
- How is this different from MCP?
- MCP is a communication protocol. This skill is an instruction file that tells the agent when to trigger human review checkpoints and how to read the structured output. They serve different layers of the stack.