A fast and elegant scraping and crawling framework for Go, designed for extracting structured data from websites.
An incredibly fast web crawler designed for OSINT (Open Source Intelligence) data extraction.
A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.
A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A Slack bot that reads and summarizes webpages, documents, and videos using ChatGPT, with voice chat capabilities.
A lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.
An open-source intelligence (OSINT) tool for crawling and analyzing websites on the dark web and beyond.
A .NET port of the official Node.js Puppeteer API for headless browser automation.
A .NET port of the official Node.js Puppeteer API for headless browser automation.
A self-hosted web application that indexes torrent sites and saves magnet links to a local database.
A PHP class for detecting bots, crawlers, and spiders via user agent and HTTP headers.
An async Python web scraping micro-framework built on asyncio and aiohttp for fast, extensible crawling.
A batteries-included Ruby framework for web scraping, with a built-in debug mode and rate limiting.
A preconfigured web crawler for backing up websites, producing WARC files with a live dashboard and dynamic ignore patterns.
A Swift library for generating link previews (title, description, images) from URLs on Apple platforms.
A lightweight Ruby web crawler and scraper with an elegant DSL for extracting structured data from web pages.
A fast, local-first web scraper and content extractor optimized for AI agents, with CLI, REST API, and MCP server.
An advanced Cross-Site Request Forgery (CSRF) audit and exploitation toolkit for security testing.
Write web scrapers in Ruby using a clean, AI-assisted DSL that caches selectors for fast, LLM-free extraction.
A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.
A standalone Docker container for high-fidelity, browser-based web archiving crawls using Puppeteer and Brave.
A scalable, mature, and versatile web crawler built on Apache Storm for building low-latency, distributed crawling systems.
A high-performance web crawler and scraper built in Elixir with worker pooling and rate limiting.
A versatile Ruby web spidering library for crawling sites, domains, or specific links with extensive filtering and callback support.
A Java API for controlling Chrome and Firefox browsers via the DevTools and WebDriver BiDi protocols.
A cross-platform website crawler and analyzer for SEO, security, accessibility, and performance optimization, built in Rust.
A Go web scraping framework that extracts structured data from websites using CSS selectors, including JavaScript-rendered pages.
A Node.js library to automatically scrape and extract readable article content from any web page, supporting both English and Chinese.
A fast, Unix-style command-line web crawler that extracts links, resources, and API endpoints from web pages.
A fast, powerful, and extensible web crawling and scraping framework for Go, inspired by Scrapy.
A tool for creating a local or public mirror of Packagist metadata, for faster Composer package downloads in regions with slow internet connections.
A Go tool and library for downloading URLs and files from Common Crawl and Wayback Machine web archives.
A high-fidelity, user-scriptable archival web crawler using Chrome/Chromium to preserve JavaScript-rendered content.
An open-source price tracker that automatically monitors product prices from e-commerce sites and sends alerts when prices drop.
A reliable, flexible, and fast Rust framework for web crawling and request-response services.