An out-of-the-box computer use agent powered by Gemini CLI that lets you control your PC from another device via a web interface.
GACUA is an open-source computer use agent that lets you control your PC remotely from another device through a web browser. It uses Google's Gemini models to interpret screen content and carry out tasks such as software installation, gameplay assistance, and general desktop automation. It addresses the need for a transparent, controllable desktop automation agent that does not compete with the user for mouse and keyboard control.
It is aimed at developers and tech enthusiasts who want to experiment with AI-powered desktop automation, remote computer control, or building on agentic AI systems, as well as users looking for a free, open-source alternative to proprietary automation tools.
Developers choose GACUA because it works out of the box with a single command, offers step-by-step observability and control over every AI action, and uses a decoupled architecture that allows flexible deployment. Because it is open source and built on Gemini CLI, it is easy to extend for customization and experimentation.
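To make the step-by-step review idea concrete, here is a minimal Python sketch of an approve-before-execute loop. It is not GACUA's code or API: `ProposedAction`, `run_with_review`, and the `approve`/`execute` callbacks are hypothetical stand-ins for the planning ("brain") side that proposes actions and the execution ("body") side that performs them.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class ProposedAction:
    """Hypothetical action proposed by the planner ("brain")."""
    kind: str    # e.g. "click" or "type"
    detail: str  # human-readable description shown to the reviewer


def run_with_review(
    actions: Iterable[ProposedAction],
    approve: Callable[[ProposedAction], bool],
    execute: Callable[[ProposedAction], None],
) -> List[ProposedAction]:
    """Execute each proposed action only if the reviewer approves it.

    Returns the actions that were actually executed, in order.
    """
    executed: List[ProposedAction] = []
    for action in actions:
        if approve(action):      # reviewer accepts this step
            execute(action)      # the "body" performs it
            executed.append(action)
        # Rejected actions are simply skipped here; a real agent
        # would more likely re-plan after a rejection.
    return executed


if __name__ == "__main__":
    plan = [ProposedAction("click", "Open menu"), ProposedAction("type", "password")]
    # Auto-reject typing for this demo; an interactive UI would ask the user instead.
    done = run_with_review(plan, lambda a: a.kind != "type", lambda a: print("executing", a.detail))
    print([a.detail for a in done])
```

Passing `approve` and `execute` as callbacks keeps the review policy separate from the code that acts on the machine, which loosely mirrors the idea of letting the planner and the executor live in different places.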
The World's First Out-of-the-Box Computer Use Agent Powered by Gemini-CLI @openmule
Starts with a single npx command and requires no complex installation, as the README's Quick Start section emphasizes.
Uses "Image Slicing + Two-Step Grounding" to strengthen Gemini 2.5 Pro's visual grounding, which the README credits with improving task success rates.
Provides a step-by-step execution flow in which users can review, accept, or reject each action, in line with the project's goal of moving away from black-box automation.
Allows control from a separate device on the same network and supports running the "brain" and "body" components on different machines, enabling more advanced network setups.
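The README names the grounding technique but this entry does not describe its internals, so the following is only a generic Python sketch of how image slicing plus two-step grounding could work, assuming a simple grid split: step 1 picks the tile that likely contains the target, step 2 locates a point inside that tile, and the tile-local point is mapped back to full-screen coordinates. `Tile`, `slice_screen`, `two_step_ground`, and the `pick_tile`/`locate_in_tile` callbacks (which stand in for model calls) are hypothetical names, not GACUA's API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Tile:
    index: int
    left: int    # x offset of the tile in the full screenshot
    top: int     # y offset of the tile in the full screenshot
    width: int
    height: int


def slice_screen(width: int, height: int, rows: int, cols: int) -> List[Tile]:
    """Split a width x height screenshot into a rows x cols grid of tiles.

    Integer division spreads any remainder so tiles cover every pixel exactly once.
    """
    tiles: List[Tile] = []
    for r in range(rows):
        for c in range(cols):
            left, right = c * width // cols, (c + 1) * width // cols
            top, bottom = r * height // rows, (r + 1) * height // rows
            tiles.append(Tile(len(tiles), left, top, right - left, bottom - top))
    return tiles


def two_step_ground(
    tiles: List[Tile],
    pick_tile: Callable[[List[Tile]], int],            # step 1: coarse tile choice
    locate_in_tile: Callable[[Tile], Tuple[int, int]],  # step 2: fine point in tile
) -> Tuple[int, int]:
    """Return the target's full-screen (x, y) from a tile choice plus a local point."""
    tile = tiles[pick_tile(tiles)]
    x, y = locate_in_tile(tile)  # coordinates relative to the tile's top-left corner
    if not (0 <= x < tile.width and 0 <= y < tile.height):
        raise ValueError("point lies outside the selected tile")
    return tile.left + x, tile.top + y


if __name__ == "__main__":
    grid = slice_screen(1920, 1080, 2, 2)
    # Pretend the model chose the bottom-right tile and a point 100 px right, 50 px down.
    print(two_step_ground(grid, lambda ts: 3, lambda t: (100, 50)))
```

The intuition behind this kind of approach is that each tile shows a smaller region at full resolution, so the model localizes within a narrower area, while the final offset addition keeps the click coordinates valid for the whole screen.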
Relies on Google's Gemini API for vision processing, which can incur usage costs and requires a stable internet connection, limiting its use in offline or cost-sensitive scenarios.
Setup requires both devices to be on the same network and involves firewall adjustments, as the README's network configuration warnings note, which adds complexity for novice users.
Key enhancements such as a pluggable architecture and a CLI mode are still on the roadmap, indicating that the project is in early development and some functionality is not yet available.