Rosply

Documentation

Everything you need to install, configure, and use Rosply.

Overview

Rosply is an AI desktop agent for Windows. You describe a task in natural language and Rosply executes it by seeing your screen and controlling the mouse and keyboard.

Under the hood, Rosply takes a screenshot, sends it along with your task to a vision AI model, receives a list of actions, and executes them. This loop repeats until the task is done or you press Ctrl+H.
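The loop described above can be sketched in a few lines of Python. This is only an illustration, not Rosply's actual code: the Action type, the four callables, and the convention that an empty action list means "task complete" are all assumptions made for the example. The two limits use the defaults listed under Configuration.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Action:
    """One step returned by the model, e.g. kind="click", args={"x": 10, "y": 20}."""
    kind: str
    args: dict


def run_task(
    task: str,
    capture: Callable[[], bytes],                         # hypothetical: returns an encoded screenshot
    ask_model: Callable[[str, bytes], Sequence[Action]],  # hypothetical: task + image -> actions
    execute: Callable[[Action], None],                    # hypothetical: performs one action
    stop_requested: Callable[[], bool],                   # hypothetical: True once Ctrl+H was pressed
    max_iterations: int = 30,
    max_actions: int = 200,
) -> str:
    """Observe, ask, act; repeat until done, stopped, or a limit is reached."""
    actions_run = 0
    for _ in range(max_iterations):
        if stop_requested():
            return "stopped"
        actions = ask_model(task, capture())
        if not actions:  # assumption: an empty list means the task is complete
            return "done"
        for action in actions:
            if stop_requested():
                return "stopped"
            if actions_run >= max_actions:
                return "action limit reached"
            execute(action)
            actions_run += 1
    return "iteration limit reached"
```

Passing the screen capture, model call, and executor in as functions keeps the loop itself easy to test with fakes, without a real screen or API key.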

Rosply does not require any browser extension, accessibility APIs, or app-specific plugins. It works with any Windows application by treating the screen as its only input.

What It Can Do

  • Open apps, navigate websites, search the web
  • Read content from the screen and save it to files
  • Control any Windows application by clicking, typing, scrolling, and dragging
  • Generate complete code projects in VS Code via the built-in rosply-code extension
  • Listen for a wake word ("Hey Rosply") and respond to voice commands
  • Remember values across steps (e.g. read a price on one page, use it in another)
  • Emergency stop at any moment with Ctrl+H
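As an illustration of remembering values across steps, here is a minimal sketch of a JSON-backed key-value store. The class name, method names, and file layout are hypothetical; how agent/memory.py actually stores values is not described on this page.

```python
import json
from pathlib import Path
from typing import Optional


class KeyValueMemory:
    """Tiny JSON-backed store; values survive across steps and restarts."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        # Load earlier values if the file already exists.
        if self._path.exists():
            self._data = json.loads(self._path.read_text(encoding="utf-8"))
        else:
            self._data = {}

    def remember(self, key: str, value: str) -> None:
        """Store a value and write the whole store back to disk."""
        self._data[key] = value
        self._path.write_text(json.dumps(self._data, indent=2), encoding="utf-8")

    def recall(self, key: str, default: Optional[str] = None) -> Optional[str]:
        """Return a stored value, or the default if the key is unknown."""
        return self._data.get(key, default)
```

A task could call remember("price", "19.99") after reading one page and recall("price") when filling in a form on another.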

Requirements

Requirement                 Notes
Windows 10 or 11 (64-bit)   Required. Only the primary monitor is controlled.
Python 3.11 or newer        Download from python.org. Must be in PATH.
OpenRouter API key          Free tier available at openrouter.ai.
VS Code                     Optional. Only required for code generation tasks.
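If you want to confirm the Python requirement by hand before running setup, a quick check along these lines should work. It is a sketch, not part of Rosply; the function name meets_minimum is made up for the example.

```python
import shutil
import sys

MINIMUM = (3, 11)  # minimum version from the Requirements table


def meets_minimum(version: tuple, minimum: tuple = MINIMUM) -> bool:
    """Compare (major, minor) against the minimum supported version."""
    return tuple(version[:2]) >= minimum


if __name__ == "__main__":
    print("Interpreter:", sys.version.split()[0])
    print("Version OK:", meets_minimum(sys.version_info))
    # shutil.which returns None when no "python" executable is found on PATH.
    print("python on PATH:", shutil.which("python") is not None)
```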

Setup

1. Get an API key

Create a free account at openrouter.ai and copy your API key. The free tier gives access to several vision-capable models including Gemini Flash.

2. Clone or download

git clone https://github.com/harkixsha/rosply.git
cd rosply

3. Run setup

Double-click setup.bat or run it from a terminal. The setup script will:

  • Check your Python version
  • Create a virtual environment in .venv/
  • Install all Python dependencies
  • Download the Whisper speech recognition model (~230 MB, first run only)
  • Create a .env file from the template and open it in Notepad
  • Install the rosply-code VS Code extension automatically

4. Add your API key

When Notepad opens, replace the placeholder with your key:

OPENROUTER_API_KEY=sk-or-...your-key-here...

5. Launch

start.bat

Configuration

All settings live in .env. The only required field is OPENROUTER_API_KEY.

Variable               Default                            Description
OPENROUTER_API_KEY     required                           Your OpenRouter API key
OPENROUTER_MODEL       google/gemini-2.0-flash-exp:free   Vision model. Must support image input.
MAX_ACTIONS_PER_TASK   200                                Max actions before auto-stop
MAX_TASK_ITERATIONS    30                                 Max screen observations per task
SCREENSHOT_MAX_WIDTH   1440                               Screenshot width (px). Higher = more detail, slower.
SCREENSHOT_QUALITY     88                                 JPEG quality (1-95). Lower = faster uploads.
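To make the defaults concrete, the sketch below merges the contents of a .env file with the values from the table and validates them. It uses only the standard library and is an assumption about how such loading could work; config.py may well use a different parser, and the function names here are hypothetical.

```python
# Defaults copied from the table above (kept as strings, like raw .env values).
DEFAULTS = {
    "OPENROUTER_MODEL": "google/gemini-2.0-flash-exp:free",
    "MAX_ACTIONS_PER_TASK": "200",
    "MAX_TASK_ITERATIONS": "30",
    "SCREENSHOT_MAX_WIDTH": "1440",
    "SCREENSHOT_QUALITY": "88",
}
INT_KEYS = {"MAX_ACTIONS_PER_TASK", "MAX_TASK_ITERATIONS",
            "SCREENSHOT_MAX_WIDTH", "SCREENSHOT_QUALITY"}


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(env_text: str) -> dict:
    """Apply defaults, require the API key, and convert numeric settings."""
    merged = {**DEFAULTS, **parse_env(env_text)}
    if not merged.get("OPENROUTER_API_KEY"):
        raise ValueError("OPENROUTER_API_KEY is required")
    settings = {k: (int(v) if k in INT_KEYS else v) for k, v in merged.items()}
    if not 1 <= settings["SCREENSHOT_QUALITY"] <= 95:
        raise ValueError("SCREENSHOT_QUALITY must be between 1 and 95")
    return settings
```

For example, a .env that sets only the key and SCREENSHOT_QUALITY=70 yields quality 70 and the table defaults for everything else.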

Recommended vision models

Model                                           Speed    Cost
google/gemini-2.0-flash-exp:free                Fast     Free
google/gemini-flash-1.5-8b                      Fast     Very cheap
qwen/qwen2.5-vl-7b-instruct:free                Medium   Free
meta-llama/llama-3.2-11b-vision-instruct:free   Medium   Free

Usage

Type your task in the input box and press Enter or click the send button.

Example tasks

Open Chrome and search for the current Bitcoin price
Open Notepad and write a short poem about space
Go to youtube.com and play the first video about GSAP animations
Create a portfolio website with dark theme and GSAP animations in VS Code
Open the Downloads folder and tell me what files are there

Voice input

  • Click the microphone button or press Space to start listening
  • Say "Hey Rosply" at any time to wake the agent without touching the keyboard

Emergency stop

Press Ctrl+H at any time to immediately stop the current task.

VS Code Code Generation

When you ask Rosply to create a project or write code, it uses the rosply-code VS Code extension.

  1. The extension receives a description of what to build from the agent
  2. It sends the description to a coding-focused AI model (DeepSeek V3 via OpenRouter)
  3. The model generates all project files at once
  4. The extension writes them directly into the current VS Code workspace (or creates a new folder on the Desktop)

You do not need to configure anything for this to work. setup.bat installs the extension automatically. VS Code must be installed.
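The last step of that flow, writing a set of generated files into a workspace folder, might look roughly like the sketch below. The real extension runs inside VS Code and is not written in Python; this only illustrates the logic, and write_project and its path check are assumptions made for the example.

```python
from pathlib import Path


def write_project(root: str, files: dict) -> list:
    """Write {relative_path: content} under root; refuse paths that escape it."""
    base = Path(root).resolve()
    written = []
    for rel_path, content in files.items():
        target = (base / rel_path).resolve()
        # Reject anything that does not end up strictly inside the workspace.
        if base not in target.parents:
            raise ValueError(f"path escapes the workspace: {rel_path}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content, encoding="utf-8")
        written.append(target)
    return written
```

Checking each resolved path against the workspace root is a cheap guard against a model response that contains something like ../ in a file name.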

To update the extension after pulling new code, run install_extension.bat and restart VS Code completely.

Claude Code Integration

Rosply can be registered as a global MCP server so you can trigger PC automation directly from within Claude Code. After setup, you can type a task inside Claude Code and Rosply will execute it on your desktop.

Step 1 - Run claude-setup.bat

The setup script verifies that Claude Code is installed natively, then registers Rosply as a global MCP server using the claude CLI:

claude-setup.bat

The script registers the MCP server at user scope so it is available in every Claude Code project.

Step 2 - Install the plugin

After restarting Claude Code, run these two commands inside it:

/plugin marketplace add harkixsha/rosply-agent-plugin
/plugin install rosply-agent@harkixsha-plugins

Step 3 - Give a command

Type any task and Claude Code sends it to Rosply via the MCP tool. Rosply executes the task on your desktop and reports back.

use the rosply skill to open Chrome and navigate to github.com

Claude Code must be installed natively (not via npx or npm). Run "where claude" in a terminal to verify it is available globally.

How it works

The claude-setup.bat script calls claude mcp add to register agent/mcp_server.py as a global MCP server. The Python server exposes a run_task tool that receives a task string and runs the Rosply agent loop on your desktop.
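To show what a run_task call looks like on the wire, here is a standard-library sketch that answers a single MCP "tools/call" request (MCP messages are JSON-RPC 2.0). It is not the code in agent/mcp_server.py, which may use an MCP SDK instead. The argument name "task" and the run_agent callback are assumptions.

```python
import json
from typing import Callable


def handle_request(raw: str, run_agent: Callable[[str], str]) -> str:
    """Answer one JSON-RPC "tools/call" request for the run_task tool."""
    request = json.loads(raw)
    req_id = request.get("id")
    params = request.get("params", {})

    def error(code: int, message: str) -> str:
        body = {"jsonrpc": "2.0", "id": req_id, "error": {"code": code, "message": message}}
        return json.dumps(body)

    if request.get("method") != "tools/call":
        return error(-32601, "Method not found")
    if params.get("name") != "run_task":
        return error(-32602, "Unknown tool")

    task = params.get("arguments", {}).get("task", "")  # "task" is an assumed argument name
    summary = run_agent(task)                            # hypothetical hook into the agent loop
    result = {"content": [{"type": "text", "text": summary}], "isError": False}
    return json.dumps({"jsonrpc": "2.0", "id": req_id, "result": result})
```

The text placed in the content list is what Claude Code would see as the tool's report once the task finishes.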

Project Structure

rosply/
  agent/
    brain.py         AI reasoning loop
    actions.py       Action implementations (click, type, scroll...)
    memory.py        Persistent key-value memory
    security.py      Blocks dangerous input and protected system paths
  inputs/
    vision.py        Screen capture and encoding
    stt.py           Speech-to-text (faster-whisper)
    tts.py           Text-to-speech (edge-tts, fallback pyttsx3)
    wake_word.py     Always-on wake word detection
  ui/
    main_window.py   Main pywebview window
    bubble.py        Floating overlay during tasks
    vignette.py      Fullscreen dark border during capture
    chats/           Saved chat history (JSON)
  extensions/
    rosply-code/     VS Code extension for code generation
  config.py          Loads .env, exposes settings
  main.py            Entry point
  setup.bat          First-time setup
  start.bat          Launch script
  claude-setup.bat   Claude Code MCP integration
  requirements.txt   Python dependencies
  logs/
    rosply.log       Runtime log (auto-rotated at 2 MB)
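The 2 MB auto-rotation noted for rosply.log can be reproduced with Python's standard RotatingFileHandler. The sketch below is one plausible setup, not necessarily the one Rosply uses; the backup count and log format are assumptions.

```python
import logging
from logging.handlers import RotatingFileHandler


def build_log_handler(path: str = "logs/rosply.log",
                      max_bytes: int = 2 * 1024 * 1024,   # rotate at 2 MB
                      backup_count: int = 3) -> RotatingFileHandler:
    """Create a handler that rolls the log over once it reaches max_bytes."""
    handler = RotatingFileHandler(
        path,
        maxBytes=max_bytes,
        backupCount=backup_count,  # must be at least 1, or rollover never happens
        encoding="utf-8",
        delay=True,                # do not open the file until the first record
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    return handler
```

Attach it with logging.getLogger("rosply").addHandler(build_log_handler()); the logs/ folder must exist before the first message is written.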

Troubleshooting

No OpenRouter API key found

Open .env and make sure OPENROUTER_API_KEY is set to your actual key, not the placeholder.
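A check like the following could tell an unset key apart from a leftover placeholder. It is a hypothetical helper, not Rosply's own validation, and it assumes the placeholder resembles the sk-or-...your-key-here... line shown in Setup.

```python
from typing import Optional


def key_problem(value: Optional[str]) -> Optional[str]:
    """Return a description of what is wrong with the key, or None if it looks usable."""
    key = (value or "").strip()
    if not key:
        return "OPENROUTER_API_KEY is empty or missing"
    # Assumption: the template placeholder contains one of these fragments.
    if "your-key-here" in key or "..." in key:
        return "OPENROUTER_API_KEY still contains the placeholder"
    return None
```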

Agent takes screenshots but does nothing

The model may not support image input. Check that OPENROUTER_MODEL in .env is a vision model.

Voice input not working

Make sure a microphone is connected and allowed in Windows privacy settings. The faster-whisper model downloads on first use.

VS Code extension not responding

Run install_extension.bat and restart VS Code completely (not just reload window). The extension activates on VS Code startup.

Task runs in a loop

The model cannot find a way to complete the task. Press Ctrl+H to stop it, then rephrase with more specific instructions.

Claude Code MCP server not found

Make sure Claude Code is installed natively. Run claude-setup.bat and restart Claude Code.