A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.
A framework-agnostic Ruby gem for generating XML sitemaps with Rails integration and support for multiple sitemap extensions.
A high-level Ruby API for controlling Chrome/Chromium browsers directly via the Chrome DevTools Protocol.
A Python security analysis tool that automatically discovers and reports comprehensive information about a given domain.
A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.
A versatile Ruby web spidering library for crawling sites, domains, or specific links with extensive filtering and callback support.
A fast, powerful, and extensible web crawling and scraping framework for Go, inspired by Scrapy.
A self-hosted URL-to-PNG generator with parallel rendering via Playwright and configurable storage caching.
A Go library for generating various types of XML sitemaps with support for search engine pinging and cloud storage.
Python command-line tools and libraries for handling, validating, and converting WARC and ARC web archive files.
A serverless data pipeline for crawling PDFs from the web and extracting structured data using AWS Textract.
A reliable, flexible, and fast Rust framework for web crawling and request-response services.
An Elixir library for generating sitemap.xml files with support for news, image, video, and mobile sitemaps.
A native Rust port of Google's robots.txt parser and matcher library, preserving all original behavior.