An open-source web crawler and scraper that converts web content into clean, LLM-ready Markdown for RAG, agents, and data pipelines.
A fast web crawler designed for OSINT (open-source intelligence) data extraction.
A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.
A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.
An open-source Java web crawler that provides a simple interface for multi-threaded web crawling.
An open-source intelligence (OSINT) tool for crawling and analyzing websites on the dark web and beyond.
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
A Node.js tool for crawling websites to find unused and duplicate CSS selectors.
A preconfigured web crawler for backing up websites, producing WARC files with a live dashboard and dynamic ignore patterns.