A fast, local-first web scraper and content extractor optimized for AI agents, with CLI, REST API, and MCP server.
webclaw is a high-performance web content extraction tool designed specifically for AI agents and LLMs. It scrapes, crawls, and extracts structured data from websites with sub-millisecond speed, producing clean, token-efficient output while avoiding bot detection through Chrome-level TLS fingerprinting.
webclaw targets developers building AI agents and LLM applications that require real-time web access, such as those using Claude, Cursor, or other MCP-compatible clients. It also suits researchers, data engineers, and anyone needing efficient, local web scraping for tasks like price monitoring or training-data collection.
Developers choose webclaw for its combination of extreme speed, local-first operation, and native integration with AI agent ecosystems via MCP. Its unique selling points include sub-millisecond extraction without browser overhead, a 67% reduction in token usage compared to raw HTML, and the ability to bypass bot protections through TLS fingerprinting.
Benchmarks show sub-millisecond processing for small pages (e.g., 0.8ms for 10KB), outperforming alternatives like readability and trafilatura by a significant margin.
Reduces token usage by 67% compared to raw HTML, with LLM-friendly output formats that preserve metadata, links, and images for cost-effective AI agent feeding.
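The savings come from discarding markup that carries no meaning for a model. The sketch below is illustrative only, not webclaw's actual pipeline: it strips tags and attributes from a small HTML snippet with Python's standard-library parser and compares sizes as a rough proxy for token counts.

```python
# Illustrative sketch (not webclaw's implementation): stripping markup
# is why markdown-style output costs far fewer tokens than raw HTML.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only visible text, discarding tags and attributes."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        if data.strip():
            self.parts.append(data.strip())

raw_html = (
    '<div class="article" data-id="42"><h1>Title</h1>'
    '<p style="color:red">Hello <b>world</b>, this is the body.</p></div>'
)
extractor = TextExtractor()
extractor.feed(raw_html)
clean = " ".join(extractor.parts)

# Character count is a crude stand-in for tokens; tags and attributes
# dominate the raw payload but add nothing an LLM needs.
savings = 1 - len(clean) / len(raw_html)
print(clean)
print(f"~{savings:.0%} smaller than the raw HTML")
```

Real pages, with their scripts, stylesheets, and navigation chrome, shed proportionally more than this toy snippet.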
Uses Chrome-level TLS fingerprinting to bypass protections like Cloudflare, demonstrated in the README where it avoids 403 errors that standard fetch calls encounter.
Provides an MCP server with 10 tools for native use in Claude, Cursor, and other clients, auto-configured via `npx create-webclaw` for immediate web access.
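For clients configured by hand rather than via `npx create-webclaw`, an MCP server entry generally follows the standard client-config shape sketched below. The `command` and `args` values here are assumptions for illustration, not webclaw's documented invocation; consult the project's README for the actual server command.

```json
{
  "mcpServers": {
    "webclaw": {
      "command": "npx",
      "args": ["webclaw-mcp"]
    }
  }
}
```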
Includes crawling, brand extraction, LLM-powered summarization, and content diffing in a modular Rust architecture, supporting local-first operation with optional cloud enhancements.
The local engine cannot handle JavaScript-heavy single-page applications; it requires the optional cloud API for rendering, adding cost and external dependency.
Installing from source requires a Rust toolchain and familiarity with Cargo, a higher barrier than drop-in Python or Node.js libraries with simpler setup.
As a newer project, it lacks the extensive third-party plugins, community support, and battle-tested documentation of established scrapers like Scrapy or Puppeteer.
The AGPL-3.0 license may deter commercial adoption: unlike permissively licensed alternatives, it requires source disclosure not only for distributed modifications but also for modified versions offered to users over a network.
webclaw is an open-source alternative to the following products: