Collect structured visual feedback from humans directly inside your agent session
anycap-human-interaction is an installable skill that teaches Claude Code, Cursor, Codex, and similar coding agents how to bring humans into a running workflow for visual review and feedback. Without this skill, agents must rely on text descriptions alone when a task requires a human to annotate an image, review a screen recording, examine a video output, or sketch a diagram. With the skill installed, the agent knows how to invoke AnyCap's capture and annotation capabilities at the right moment and how to incorporate the resulting structured feedback into the next step.

The skill covers image annotation workflows, screen and audio recording, video review checkpoints, and interactive Excalidraw diagramming sessions. It is particularly useful for design review loops, UX testing pipelines, and any workflow where visual sign-off is a required gate before the agent continues.

Installation works through skills.sh or the AnyCap CLI and is compatible with Claude Code, Cursor, and Codex.
Install time
< 5 min
Supported agents
Claude Code · Cursor · Codex
Platform
macOS · Linux · Windows
Install this skill
npx -y skills add anycap-ai/anycap -s 'anycap-human-interaction' -g -y
How the workflow changes after install
BEFORE
Agent can only ask for feedback in text. Human must describe what they see and hope the agent interprets it correctly.
AFTER
Agent pauses at a defined checkpoint, captures the visual, and presents it for structured annotation. Human marks up the image directly. Agent reads the annotation as structured data and continues.
1. Reach review checkpoint: the agent identifies a step that requires human visual judgment, such as a design, screenshot, or generated output.
2. Capture or upload visual: the agent uses AnyCap to capture the current screen or upload the relevant image or video for review.
3. Human annotates: the human opens the AnyCap review interface, marks up the visual, adds comments, or sketches revisions in Excalidraw.
4. Agent reads feedback: the agent uses AnyCap image read or video analysis to parse the annotated output as structured data.
5. Continue workflow: the agent incorporates the feedback into the next step, revising the design, adjusting code, or updating the brief.
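The five-step loop above can be sketched in code. This is purely illustrative: `capture_visual` and `read_annotations` are hypothetical stand-ins for whatever the agent actually invokes, not real AnyCap APIs, and the annotation data is hard-coded to keep the sketch self-contained.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    point: int    # which marked-up point on the visual
    comment: str  # the human's note for that point

def capture_visual() -> str:
    # Step 2 (hypothetical): capture the screen, return a review URL.
    return "https://anycap.ai/review/r_example"

def read_annotations(review_url: str) -> list[Annotation]:
    # Steps 3-4 (hypothetical): after the human marks up the visual,
    # the agent reads the markup back as structured data.
    return [
        Annotation(1, "Move CTA above fold"),
        Annotation(2, "Reduce padding on mobile"),
    ]

def review_checkpoint() -> list[Annotation]:
    # Step 1: the agent has reached a point that needs human judgment.
    url = capture_visual()
    # Step 5: return structured feedback for the agent's next action.
    return read_annotations(url)

if __name__ == "__main__":
    for a in review_checkpoint():
        print(f"{a.point}: {a.comment}")
```

The key property the skill relies on is the return type of the last step: feedback arrives as structured records, not free-form prose the agent has to re-interpret.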
SAMPLE OUTPUT
[Agent] Capturing screen for human review...
→ Review URL: https://anycap.ai/review/r_abc123
[Human] Annotated 3 points on screenshot:
  1. "Move CTA above fold"
  2. "Reduce padding on mobile"
  3. "Change button color to #6b7d3a"
[Agent] Parsed 3 structured annotations. Applying changes...
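Step 4 is where annotations become machine-readable. As a sketch of what "parsed 3 structured annotations" could look like, assuming the review returns a JSON payload shaped like the sample above (the payload shape and field names are assumptions, not AnyCap's actual schema):

```python
import json
import re

# Hypothetical payload mirroring the sample output above.
SAMPLE_PAYLOAD = """
{"annotations": [
  {"id": 1, "comment": "Move CTA above fold"},
  {"id": 2, "comment": "Reduce padding on mobile"},
  {"id": 3, "comment": "Change button color to #6b7d3a"}
]}
"""

def parse_feedback(payload: str) -> list[dict]:
    """Turn the raw review payload into records the agent can act on."""
    items = json.loads(payload)["annotations"]
    for item in items:
        # Pull out any hex color so a later step can apply it directly.
        match = re.search(r"#[0-9a-fA-F]{6}", item["comment"])
        item["color"] = match.group(0) if match else None
    return items

annotations = parse_feedback(SAMPLE_PAYLOAD)
print(len(annotations), annotations[2]["color"])  # → 3 #6b7d3a
```

Because the feedback is data rather than prose, the agent can branch on it deterministically, for example applying the extracted color without asking the human to restate it.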
Frequently asked questions
- What does this skill do that direct prompting doesn't?
- Direct prompting can't pause a workflow for a human to annotate a specific image and return that as structured data. This skill teaches the agent exactly when and how to invoke visual capture and how to parse the annotation result.
- Does this skill require a separate annotation tool?
- No. The annotation interface is built into AnyCap. The agent opens the session; the human reviews directly in the AnyCap interface.
- Do I need an AnyCap account?
- Yes. Install the AnyCap CLI and run `anycap login` before your first session.
- How is this different from MCP?
- MCP is a communication protocol. This skill is an instruction file that tells the agent when to trigger human review checkpoints and how to read the structured output. They serve different layers of the stack.