How does GACUA compare to AutoHotkey for desktop automation?

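GACUA uses a vision model (Gemini) to interpret what is on screen and decide actions at run time, and you drive it remotely from a browser. AutoHotkey runs pre-written scripts locally. That makes it fast and deterministic, but it is only as flexible as the script. GACUA is the better fit when the UI or the exact steps are not known in advance, but it requires API access and network setup.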

How do I set up GACUA on a headless server without a display?

The Troubleshooting guide notes that screenshots can come back black when GACUA is launched over SSH. A virtual display such as Xvfb may help, but the README gives no detailed instructions, so headless operation is not a documented or supported configuration.
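If you want to try Xvfb anyway, the usual pattern is to start a virtual X server and point the DISPLAY environment variable at it before launching GUI programs. The TypeScript sketch below only assembles that command line and does not run anything. The helper name, display number, and resolution are arbitrary assumptions, and it has not been tested whether GACUA's screenshot and input libraries work against Xvfb.

```typescript
// Describes how to start a virtual X display; nothing is executed here.
interface VirtualDisplayPlan {
  command: string;
  args: string[];
  env: { DISPLAY: string };
}

// Hypothetical helper: display :99 and 1920x1080x24 are illustrative defaults.
function planVirtualDisplay(
  displayNumber = 99,
  width = 1920,
  height = 1080,
  depth = 24,
): VirtualDisplayPlan {
  const display = `:${displayNumber}`;
  return {
    command: "Xvfb",
    // "-screen 0 WxHxD" creates screen 0 with the given resolution and color depth.
    args: [display, "-screen", "0", `${width}x${height}x${depth}`],
    // GUI processes started with this DISPLAY render into the virtual framebuffer.
    env: { DISPLAY: display },
  };
}

const plan = planVirtualDisplay();
// Print the equivalent shell commands to run before starting GACUA.
console.log(`${plan.command} ${plan.args.join(" ")} &`);
console.log(`export DISPLAY=${plan.env.DISPLAY}`);
```

With the defaults this prints `Xvfb :99 -screen 0 1920x1080x24 &` followed by `export DISPLAY=:99`. GACUA would then be started from a shell where DISPLAY is set to that value.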

Is GACUA free to use with the Gemini API?

GACUA itself is open source and free, but it needs a Gemini API key from Google, and API usage may be billed depending on your usage volume and tier. Check Google's Gemini API pricing page for current rates.
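Because Gemini API billing is per token, a rough budget is the token counts multiplied by per-million-token rates. The sketch below assumes you can measure or estimate input and output tokens for each agent step. The rates in the example are placeholders, not Google's prices, so substitute the current figures from the pricing page. The function and type names are hypothetical.

```typescript
// Per-million-token prices in USD. Placeholder values must be replaced with real rates.
interface TokenRates {
  inputPerMillionUSD: number;
  outputPerMillionUSD: number;
}

// Token usage for one agent step (prompt plus screenshot in, model reply out).
interface StepUsage {
  inputTokens: number;
  outputTokens: number;
}

// Sums the cost of every step: tokens / 1,000,000 * rate, for input and output separately.
function estimateCostUSD(steps: StepUsage[], rates: TokenRates): number {
  return steps.reduce(
    (total, s) =>
      total +
      (s.inputTokens / 1_000_000) * rates.inputPerMillionUSD +
      (s.outputTokens / 1_000_000) * rates.outputPerMillionUSD,
    0,
  );
}

// Example: 20 steps of 5,000 input and 500 output tokens at PLACEHOLDER rates.
const placeholderRates: TokenRates = { inputPerMillionUSD: 1, outputPerMillionUSD: 10 };
const steps: StepUsage[] = Array.from({ length: 20 }, () => ({ inputTokens: 5_000, outputTokens: 500 }));
console.log(estimateCostUSD(steps, placeholderRates).toFixed(2));
```

At these made-up rates, each step costs 0.005 USD for input plus 0.005 USD for output, so the 20 steps print `0.20`. Tasks that take many steps multiply quickly, which is why per-step token counts are worth measuring.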

Can GACUA control multiple computers simultaneously?

Decoupled mode lets the brain and body components run on separate machines. The README does not say whether a single interface can control several computers at once, and it appears to be designed for one-to-one connections.

What operating systems does GACUA support?

The README does not list supported operating systems explicitly. Because GACUA builds on cross-platform dependencies such as nut.js, it likely runs on Windows, macOS, and Linux. Check the requirements of those underlying tools for specifics.

How do I troubleshoot GACUA if it fails to connect over the network?

Follow the network configuration section. Make sure your firewall allows incoming connections to Node.js, confirm that both devices are on the same Wi-Fi network, and rerun the npx command in verbose mode to see the installation logs.
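In practice, "the same Wi-Fi" means both devices have addresses in the same IP subnet. The sketch below checks this for two IPv4 addresses and a prefix length, such as /24, which is a common home-router default. It is a hypothetical diagnostic helper, not part of GACUA. Passing the check does not rule out router client isolation or firewall blocks.

```typescript
// Converts a dotted-quad IPv4 address such as "192.168.1.10" to an unsigned 32-bit number.
function ipv4ToInt(addr: string): number {
  const parts = addr.split(".");
  if (parts.length !== 4) throw new Error(`not an IPv4 address: ${addr}`);
  let value = 0;
  for (const part of parts) {
    if (!/^\d{1,3}$/.test(part) || Number(part) > 255) {
      throw new Error(`invalid octet "${part}" in ${addr}`);
    }
    value = value * 256 + Number(part);
  }
  return value;
}

// True if both addresses share the first `prefixLength` bits (e.g. 24 for 255.255.255.0).
function sameSubnet(a: string, b: string, prefixLength: number): boolean {
  if (!Number.isInteger(prefixLength) || prefixLength < 0 || prefixLength > 32) {
    throw new Error(`invalid prefix length: ${prefixLength}`);
  }
  // JavaScript shifts use only the low 5 bits of the count, so /0 needs a special case.
  const mask = prefixLength === 0 ? 0 : (0xffffffff << (32 - prefixLength)) >>> 0;
  return ((ipv4ToInt(a) & mask) >>> 0) === ((ipv4ToInt(b) & mask) >>> 0);
}

console.log(sameSubnet("192.168.1.10", "192.168.1.42", 24)); // true
console.log(sameSubnet("192.168.1.10", "192.168.0.42", 24)); // false
```

If the two devices fail this check, they are probably on different networks, for example a guest SSID and the main SSID. In that case, fixing the firewall alone will not help.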

GACUA

Apache-2.0 · TypeScript

An out-of-the-box computer use agent powered by Gemini CLI that lets you control your PC from another device via a web interface.

What is GACUA?

GACUA is an open-source computer use agent that lets you control your PC remotely from another device through a web browser. It uses Google's Gemini vision model to understand screen content and carry out tasks such as installing software, assisting with gameplay, and general desktop automation. It is aimed at users who want a transparent, controllable agent for desktop automation. Because you direct the agent from a separate device, you do not have to compete with it for the mouse and keyboard.

Target Audience

Developers and tech enthusiasts looking to experiment with AI-powered desktop automation, remote computer control, or building upon agentic AI systems. It's also for users who want a free, open-source alternative to proprietary automation tools.

Value Proposition

Developers choose GACUA because it offers an immediate, out-of-the-box experience with a single command, provides unique step-by-step observability and control over AI actions, and supports a decoupled architecture for flexible deployment. Its open-source nature and integration with Gemini CLI make it highly extensible for customization and experimentation.

Overview

The World's First Out-of-the-Box Computer Use Agent Powered by Gemini-CLI @openmule

Use Cases

Best For

Remotely controlling a desktop PC from a mobile device or another computer

Automating repetitive software installation and setup tasks

Experimenting with vision-based AI agents for desktop interaction

Building custom AI automation workflows on top of MCP (Model Context Protocol)

Testing and benchmarking different vision models for computer use scenarios

Creating a transparent, debuggable agentic system where each action can be reviewed

GitHub: 138 stars · 20 forks · 0 contributors

Not Ideal For

Projects requiring fully offline operation without internet connectivity
Teams needing fast, script-based automation without AI model latency
Environments with strict security policies that prohibit local network servers or cloud API access
Users who prefer graphical interfaces over command-line setup and configuration

Pros & Cons

Pros

Out-of-the-Box Setup

Starts with a single npx command and needs no complex installation, as the Quick Start section emphasizes.

High Accuracy Execution

Uses a technique the README calls "Image Slicing + Two-Step Grounding" to improve Gemini 2.5 Pro's visual grounding, meaning its ability to locate the on-screen element an action should target. The README presents this as raising task success rates.
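This summary names the technique but does not describe its internals. The sketch below shows one plausible reading under stated assumptions. First, the screenshot is cut into a grid of tiles. Next, a first model call picks the tile that contains the target, and a second call locates the target within that tile. Finally, the tile-local coordinates are mapped back to screen coordinates. The grid size, the clamping, and the two model hooks (`PickTile`, `LocateInTile`) are hypothetical stand-ins, not GACUA's code.

```typescript
interface Tile {
  index: number;
  x: number; // left edge in screen pixels
  y: number; // top edge in screen pixels
  width: number;
  height: number;
}

// Cuts a width x height screen into a cols x rows grid; the last row/column absorbs remainders.
function sliceScreen(width: number, height: number, cols: number, rows: number): Tile[] {
  const tileW = Math.floor(width / cols);
  const tileH = Math.floor(height / rows);
  const tiles: Tile[] = [];
  for (let r = 0; r < rows; r++) {
    for (let c = 0; c < cols; c++) {
      const x = c * tileW;
      const y = r * tileH;
      tiles.push({
        index: r * cols + c,
        x,
        y,
        width: c === cols - 1 ? width - x : tileW,
        height: r === rows - 1 ? height - y : tileH,
      });
    }
  }
  return tiles;
}

// Hypothetical model hooks: step 1 chooses a tile, step 2 returns a point in tile-local pixels.
type PickTile = (tiles: Tile[]) => number;
type LocateInTile = (tile: Tile) => { x: number; y: number };

function groundTarget(
  screenW: number,
  screenH: number,
  pickTile: PickTile,
  locateInTile: LocateInTile,
  cols = 2,
  rows = 2,
): { x: number; y: number } {
  const tiles = sliceScreen(screenW, screenH, cols, rows);
  const tile = tiles[pickTile(tiles)];
  if (tile === undefined) throw new Error("step 1 returned a tile index that does not exist");
  const local = locateInTile(tile);
  // Clamp into the tile so a slightly-off answer cannot land in a neighboring tile.
  const lx = Math.min(Math.max(local.x, 0), tile.width - 1);
  const ly = Math.min(Math.max(local.y, 0), tile.height - 1);
  return { x: tile.x + lx, y: tile.y + ly };
}

// Stubbed "model": picks the bottom-right tile, then a point 100 px right and 50 px down inside it.
const point = groundTarget(1920, 1080, () => 3, () => ({ x: 100, y: 50 }));
console.log(point); // { x: 1060, y: 590 }
```

The usual motivation for this kind of coarse-to-fine search is that the second call sees a smaller image, so small targets occupy a larger share of what the model inspects. Whether GACUA uses a fixed grid, overlapping slices, or something else is not stated here.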

Transparent Action Control

Runs tasks step by step and lets users review, accept, or reject each action, in line with the project's goal of moving away from black-box automation.
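The review cycle can be pictured as a loop in which nothing touches the desktop until a person approves it. The sketch below is an assumption about the control flow, not GACUA's implementation. A hypothetical planner proposes one action at a time, and a reviewer accepts or rejects it. A rejected action is not executed; the rejection is fed back to the planner instead. All names are illustrative.

```typescript
interface Action {
  kind: "click" | "type" | "scroll";
  detail: string;
}

type Verdict = { accepted: true } | { accepted: false; reason: string };

// Hypothetical hooks: plan() proposes the next action (null when done), review() asks the user.
function runReviewedLoop(
  plan: (feedback: string | null) => Action | null,
  review: (action: Action) => Verdict,
  execute: (action: Action) => void,
  maxSteps = 20,
): Action[] {
  const executed: Action[] = [];
  let feedback: string | null = null;
  for (let step = 0; step < maxSteps; step++) {
    const action = plan(feedback);
    if (action === null) break; // planner reports the task is finished
    const verdict = review(action);
    if (verdict.accepted) {
      execute(action); // only approved actions reach the mouse and keyboard
      executed.push(action);
      feedback = null;
    } else {
      feedback = `rejected: ${verdict.reason}`; // planner must propose something else
    }
  }
  return executed;
}

// Scripted demo: the reviewer vetoes a destructive step, the rest run.
const proposals: Action[] = [
  { kind: "click", detail: "open installer" },
  { kind: "click", detail: "delete old files" },
  { kind: "click", detail: "press Next" },
];
const seenFeedback: (string | null)[] = [];
const ran = runReviewedLoop(
  (fb) => {
    seenFeedback.push(fb);
    return proposals.shift() ?? null;
  },
  (a) => (a.detail.startsWith("delete") ? { accepted: false, reason: "destructive" } : { accepted: true }),
  (a) => console.log(`executing: ${a.detail}`),
);
```

The `maxSteps` cap is a defensive choice in this sketch. It keeps a planner that keeps proposing rejected actions from looping forever.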

Remote and Decoupled Architecture

Lets you control the PC from another device on the same network. It also supports running the brain and body components on different machines, which enables more advanced network setups.
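This summary does not specify how the brain and body communicate. The use-case list mentions MCP, which may be the actual channel. To illustrate the design idea only, the sketch below defines a hypothetical JSON command format. The brain encodes commands, and the body validates every incoming message before acting on it, because that input arrives over the network.

```typescript
// Hypothetical wire format from the brain (decides) to the body (acts on the PC).
type BodyCommand =
  | { type: "screenshot" }
  | { type: "click"; x: number; y: number }
  | { type: "type"; text: string };

// Brain side: serialize a command for transport.
function encodeCommand(cmd: BodyCommand): string {
  return JSON.stringify(cmd);
}

// Body side: parse and validate untrusted network input before executing anything.
function decodeCommand(raw: string): BodyCommand {
  const data: unknown = JSON.parse(raw);
  if (typeof data !== "object" || data === null) throw new Error("command must be a JSON object");
  const obj = data as Record<string, unknown>;
  const { x, y, text } = obj;
  switch (obj.type) {
    case "screenshot":
      return { type: "screenshot" };
    case "click":
      if (typeof x === "number" && typeof y === "number" && Number.isFinite(x) && Number.isFinite(y)) {
        return { type: "click", x, y };
      }
      throw new Error("click requires finite numeric x and y");
    case "type":
      if (typeof text === "string") return { type: "type", text };
      throw new Error("type requires a text string");
    default:
      throw new Error(`unknown command type: ${String(obj.type)}`);
  }
}

const wire = encodeCommand({ type: "click", x: 640, y: 360 });
console.log(wire); // {"type":"click","x":640,"y":360}
console.log(decodeCommand(wire));
```

Keeping validation on the body side means a malformed or malicious message is rejected before any mouse or keyboard action happens. This matters most when the two halves run on different machines.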

Cons

Cloud API Dependence

Relies on Google's Gemini API for vision processing, which incurs costs and requires stable internet access, limiting use in offline or cost-sensitive scenarios.

Network Configuration Overhead

Setup requires devices on the same network and firewall adjustments, as noted in the network configuration warnings, adding complexity for novice users.

Incomplete Feature Set

Key features such as a pluggable architecture and a CLI mode are still on the roadmap, which indicates that the project is in early development and some functionality is missing.
