An open-source web crawler and scraper that converts web content into clean, LLM-ready Markdown for RAG, agents, and data pipelines.
A fast web crawler designed for OSINT (Open Source Intelligence) data extraction.
A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.
A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.
An open-source Java web crawler that provides a simple interface for multi-threaded web crawling.
An open-source intelligence (OSINT) tool for crawling and analyzing websites on the dark web and beyond.
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
A Node.js tool for crawling websites to find unused and duplicate CSS selectors.
A preconfigured web crawler for backing up websites, producing WARC files with a live dashboard and dynamic ignore patterns.
A lightweight Ruby web crawler and scraper with an elegant DSL for extracting structured data from web pages.
A configurable and extensible PHP web spider for crawling and scraping websites with support for breadth-first/depth-first traversal, caching, and custom filters.
A fast, local-first web scraper and content extractor optimized for AI agents, with CLI, REST API, and MCP server.
An advanced Cross-Site Request Forgery (CSRF) audit and exploitation toolkit for security testing.
A standalone Docker container for high-fidelity, browser-based web archiving crawls using Puppeteer and Brave.
A scalable, mature, and versatile web crawler built on Apache Storm for building low-latency, distributed crawling systems.
A high-performance web crawler and scraper built in Elixir with worker pooling and rate limiting.
A versatile Ruby web spidering library for crawling sites, domains, or specific links with extensive filtering and callback support.
A Java API for searching and downloading Android applications from Google Play, with device emulation capabilities.
A research-driven web crawler for building and analyzing curated web corpora as networks of web entities.
A fast, Unix-style command-line web crawler that extracts links, resources, and API endpoints from web pages.
A fast, powerful, and extensible web crawling and scraping framework for Go, inspired by Scrapy.
A high-fidelity, user-scriptable archival web crawler using Chrome/Chromium to preserve JavaScript-rendered content.
A reliable, flexible, and fast Rust framework for web crawling and request-response services.
An offline-first web browser that archives, searches, and crawls websites for personal use.