A robust HTML to Markdown converter with plugin support, usable as a Go library, CLI tool, or via hosted API.
A Python module to bypass Cloudflare's anti-bot page by solving JavaScript challenges using Node.js.
A Go library to automate Chromium, Firefox, and WebKit browsers with a single API for cross-browser web automation.
A Go library for cross-browser automation, controlling Chromium, Firefox, and WebKit with a single API.
A PHP and Symfony library for browser testing and web scraping using real browsers via the WebDriver protocol.
A scriptable browser based on Firefox's Gecko engine, compatible with PhantomJS API for web automation and testing.
Official .NET library for cross-browser web automation and testing with Chromium, Firefox, and WebKit.
A TensorFlow-based CNN solution for recognizing character-based CAPTCHAs, providing training, validation, and API modules.
A robust Go library for parsing RSS, Atom, and JSON feeds with support for extensions and invalid feed handling.
A PHP library to control headless Chrome/Chromium instances for browser automation, screenshots, and PDF generation.
A Rust library for parsing HTML and querying elements using CSS selectors.
A simple and fast HTML and XML parser for PHP with CSS selector and XPath support.
A Python library and CLI tool that converts HTML into clean, readable Markdown-formatted plain text.
A simple HTML parser for Elixir that enables search for nodes using CSS selectors.
A PHP library to extract metadata, embed codes, and structured data from any web page using multiple protocols.
A high-level Ruby API for controlling Chrome/Chromium via the Chrome DevTools Protocol without Selenium dependencies.
A high-level Ruby API for controlling Chrome/Chromium browsers directly via the Chrome DevTools Protocol.
An async Python web scraping micro-framework built on asyncio and aiohttp for fast, extensible crawling.
An advanced Go HTTP client with browser impersonation, TLS fingerprinting, HTTP/3 support, and anti-bot bypass for web automation.
A batteries-included Ruby framework for easy web scraping with built-in debug mode and rate limiting.
A tidyverse package for web scraping in R, inspired by Beautiful Soup and designed for data extraction workflows.
A lightweight Ruby web crawler and scraper with an elegant DSL for extracting structured data from web pages.
A configurable and extensible PHP web spider for crawling and scraping websites with support for breadth-first/depth-first traversal, caching, and custom filters.
A PHP bridge to Puppeteer that provides full API support for browser automation from PHP applications.
A fast, local-first web scraper and content extractor optimized for AI agents, with CLI, REST API, and MCP server.
A Docker container that provides a rotating proxy service using multiple Tor circuits for IP rotation.
A Swift headless browser framework for iOS/OSX to automate website navigation, data collection, and testing without a UI.
A pure Python HTML5 parser with spec-perfect parsing, built-in sanitization, CSS selectors, and zero dependencies.
A curated collection of Selenium resources including tools, drivers, containers, cloud services, and testing frameworks.
A clean, AI-assisted Ruby DSL for writing web scrapers that caches selectors for fast, LLM-free extraction.
A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.
A Ruby gem for web scraping that extracts titles, meta tags, links, images, and structured data from URLs.
A Rust library for extracting structured data from HTML documents, designed for web scraping tasks.
A scalable, mature, and versatile web crawler built on Apache Storm for building low-latency, distributed crawling systems.
A bulletproof, fast, and reliable headless browser API for Chrome automation and testing.
A high-performance web crawler and scraper built in Elixir with worker pooling and rate limiting.