Showing 15 of 15 projects
A fast and elegant scraping and crawling framework for Go, designed for extracting structured data from websites.
An incredibly fast web crawler designed for OSINT (open-source intelligence) data extraction.
A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.
A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A Slack bot that reads and summarizes webpages, documents, and videos using ChatGPT, with voice chat capabilities.
A lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.
An open-source intelligence (OSINT) tool for crawling and analyzing websites on the dark web and beyond.
A .NET port of the official Node.js Puppeteer API for headless browser automation.
A .NET port of the official Node.js Puppeteer API for headless browser automation.
A self-hosted web application that indexes torrent sites and saves magnet links to a local database.
A PHP class for detecting bots, crawlers, and spiders via user agent and HTTP headers.
An async Python web scraping micro-framework built on asyncio and aiohttp for fast, extensible crawling.
A batteries-included Ruby framework for easy web scraping, with a built-in debug mode and rate limiting.
A preconfigured web crawler for backing up websites, producing WARC files with a live dashboard and dynamic ignore patterns.