Overview
Rosply is an AI desktop agent for Windows. You describe a task in natural language and Rosply executes it by seeing your screen and controlling the mouse and keyboard.
Under the hood Rosply takes a screenshot, sends it along with your task to a vision AI model, receives a list of actions, and executes them. This loop repeats until the task is done or you press Ctrl+H.
Rosply does not require any browser extension, accessibility APIs, or app-specific plugins. It works with any Windows application by treating the screen as its only input.
What It Can Do
- Open apps, navigate websites, search the web
- Read content from the screen and save it to files
- Control any Windows application by clicking, typing, scrolling, and dragging
- Generate complete code projects in VS Code via the built-in rosply-code extension
- Listen for a wake word ("Hey Rosply") and respond to voice commands
- Remember values across steps (e.g. read a price on one page, use it in another)
- Emergency stop at any moment with Ctrl+H
Requirements
| Requirement | Notes |
|---|---|
| Windows 10 or 11 (64-bit) | Required. Only the primary monitor is controlled. |
| Python 3.11 or newer | Download from python.org. Must be in PATH. |
| OpenRouter API key | Free tier available at openrouter.ai. |
| VS Code | Optional. Only required for code generation tasks. |
Setup
1. Get an API key
Create a free account at openrouter.ai and copy your API key. The free tier gives access to several vision-capable models including Gemini Flash.
2. Clone or download
git clone https://github.com/harkixsha/rosply.git
cd rosply3. Run setup
Double-click setup.bat or run it from a terminal. The setup script will:
- Check your Python version
- Create a virtual environment in .venv/
- Install all Python dependencies
- Download the Whisper speech recognition model (~230 MB, first run only)
- Create a .env file from the template and open it in Notepad
- Install the rosply-code VS Code extension automatically
4. Add your API key
When Notepad opens, replace the placeholder with your key:
OPENROUTER_API_KEY=sk-or-...your-key-here...5. Launch
start.batConfiguration
All settings live in .env. The only required field is OPENROUTER_API_KEY.
| Variable | Default | Description |
|---|---|---|
| OPENROUTER_API_KEY | required | Your OpenRouter API key |
| OPENROUTER_MODEL | google/gemini-2.0-flash-exp:free | Vision model. Must support image input. |
| MAX_ACTIONS_PER_TASK | 200 | Max actions before auto-stop |
| MAX_TASK_ITERATIONS | 30 | Max screen observations per task |
| SCREENSHOT_MAX_WIDTH | 1440 | Screenshot width (px). Higher = more detail, slower. |
| SCREENSHOT_QUALITY | 88 | JPEG quality (1-95). Lower = faster uploads. |
Recommended vision models
| Model | Speed | Cost |
|---|---|---|
| google/gemini-2.0-flash-exp:free | Fast | Free |
| google/gemini-flash-1.5-8b | Fast | Very cheap |
| qwen/qwen2.5-vl-7b-instruct:free | Medium | Free |
| meta-llama/llama-3.2-11b-vision-instruct:free | Medium | Free |
Usage
Type your task in the input box and press Enter or click the send button.
Example tasks
Open Chrome and search for the current Bitcoin priceOpen Notepad and write a short poem about spaceGo to youtube.com and play the first video about GSAP animationsCreate a portfolio website with dark theme and GSAP animations in VS CodeOpen the Downloads folder and tell me what files are thereVoice input
- Click the microphone button or press Space to start listening
- Say "Hey Rosply" at any time to wake the agent without touching the keyboard
Emergency stop
Press Ctrl+H at any time to immediately stop the current task.
VS Code Code Generation
When you ask Rosply to create a project or write code, it uses the rosply-code VS Code extension.
- 1The extension receives a description of what to build from the agent
- 2It sends the description to a coding-focused AI model (DeepSeek V3 via OpenRouter)
- 3The model generates all project files at once
- 4The extension writes them directly into the current VS Code workspace (or creates a new folder on the Desktop)
To update the extension after pulling new code, run install_extension.bat and restart VS Code completely.
Claude Code Integration
Rosply can be registered as a global MCP server so you can trigger PC automation directly from within Claude Code. After setup, you can type a task inside Claude Code and Rosply will execute it on your desktop.
Step 1 - Run claude-setup.bat
The setup script verifies that Claude Code is installed natively, then registers Rosply as a global MCP server using the claude CLI:
claude-setup.batThe script registers the MCP server at user scope so it is available in every Claude Code project.
Step 2 - Install the plugin
After restarting Claude Code, run these two commands inside it:
/plugin marketplace add harkixsha/rosply-agent-plugin
/plugin install rosply-agent@harkixsha-pluginsStep 3 - Give a command
Type any task and Claude Code sends it to Rosply via the MCP tool. Rosply executes the task on your desktop and reports back.
use the rosply skill to open Chrome and navigate to github.comHow it works
The claude-setup.bat script calls claude mcp add to register agent/mcp_server.py as a global MCP server. The Python server exposes a run_task tool that receives a task string and runs the Rosply agent loop on your desktop.
Project Structure
rosply/
agent/
brain.py AI reasoning loop
actions.py Action implementations (click, type, scroll...)
memory.py Persistent key-value memory
security.py Blocks dangerous input and protected system paths
inputs/
vision.py Screen capture and encoding
stt.py Speech-to-text (faster-whisper)
tts.py Text-to-speech (edge-tts, fallback pyttsx3)
wake_word.py Always-on wake word detection
ui/
main_window.py Main pywebview window
bubble.py Floating overlay during tasks
vignette.py Fullscreen dark border during capture
chats/ Saved chat history (JSON)
extensions/
rosply-code/ VS Code extension for code generation
config.py Loads .env, exposes settings
main.py Entry point
setup.bat First-time setup
start.bat Launch script
claude-setup.bat Claude Code MCP integration
requirements.txt Python dependencies
logs/
rosply.log Runtime log (auto-rotated at 2 MB)Troubleshooting
No OpenRouter API key found
Open .env and make sure OPENROUTER_API_KEY is set to your actual key, not the placeholder.
Agent takes screenshots but does nothing
The model may not support image input. Check that OPENROUTER_MODEL in .env is a vision model.
Voice input not working
Make sure a microphone is connected and allowed in Windows privacy settings. The faster-whisper model downloads on first use.
VS Code extension not responding
Run install_extension.bat and restart VS Code completely (not just reload window). The extension activates on VS Code startup.
Task runs in a loop
The model cannot find a way to complete the task. Press Ctrl+H to stop it, then rephrase with more specific instructions.
Claude Code MCP server not found
Make sure Claude Code is installed natively. Run claude-setup.bat and restart Claude Code.