A command-line tool to extract data from HTML/XML pages and JSON APIs using CSS, XPath, XQuery, JSONiq, and pattern matching.
A Swift library for extracting article previews including title, description, images, and metadata from web pages.
A versatile Ruby web spidering library for crawling sites, domains, or specific links with extensive filtering and callback support.
A Java API for controlling Chrome and Firefox browsers via the DevTools and WebDriver BiDi protocols.
A Rust-based command-line tool for recursively downloading entire websites for offline browsing.
Type-safe Go bindings for the Chrome DevTools Protocol, enabling browser automation and debugging.
A Go package for querying HTML documents using XPath expressions with built-in caching for performance.
A high-performance, multithreaded command-line tool for downloading images from webpages.
A PHP library for extracting media information from web pages like YouTube videos, Twitter statuses, and blog articles.
A Go package for querying XML, HTML, and JSON documents using XPath expressions.
A script to find the fattest cat currently available for adoption at the San Francisco SPCA.
A Go web scraping framework that extracts structured data from websites using CSS selectors, including JavaScript-rendered pages.
A Clojure/ClojureScript library that parses HTML into Clojure data structures for analysis, transformation, and serialization.
A Python package for controlling Google Chrome/Chromium via the Chrome DevTools Protocol with a threading-based API.
A Python library for automating Tor Browser with Selenium WebDriver for privacy-focused web scraping and testing.
A Node.js library for interacting with Steam Community's website interfaces, including login, trading, and inventory management.
A Ruby gem that fetches images and metadata from URLs to generate link previews, similar to those shown on social media.
A Ruby client library for browser automation and testing using Microsoft Playwright.
An unofficial Node.js API for programmatically accessing HLTV's Counter-Strike esports data, including matches, teams, players, and live scores.
A port of the Puppeteer browser automation library to run natively on Deno.
A Go HTTP client that spoofs TLS/JA3, HTTP/2, and HTTP/3 fingerprints to emulate real browsers by default.
An Android library that generates link previews by extracting titles, descriptions, and images from URLs.
A high-performance, Nokogiri-compatible HTML5 parser for Ruby with CSS selector and XPath support.
A functional HTML scraping and manipulation library for OCaml with CSS selector support.
A Go client library for remotely controlling Chrome/Chromium browsers via the Chrome DevTools Protocol.
A comprehensive cheat sheet and reference for web scraping in R using rvest, httr, and RSelenium.
A versatile Rust tool for generating and mutating wordlists using patterns, web scraping, and password formats.
A CLI and API tool that converts HTML into plain text, Markdown, or filtered HTML for terminal viewing.
A cross-platform library to load and decrypt cookies from any web browser, built with Rust for speed and safety.
A Node.js library to automatically scrape and extract readable article content from any web page, supporting both English and Chinese.
A fast, Unix-style command-line web crawler that extracts links, resources, and API endpoints from web pages.
An Elixir library for structured data extraction from websites, articles, and RSS/Atom feeds using information-retrieval techniques.
An Elixir library for parsing and extracting data from HTML and XML using CSS or XPath selectors.
A Ruby client library for interacting with the Wikipedia API, providing easy access to articles, summaries, images, and metadata.
An Elixir library for extracting and curating the primary readable content from webpages.