Web Scraping

190 projects

browser-use (Python)

An open-source Python library and cloud service that enables AI agents to automate web browsing and task completion.

#playwright #task-automation #web-interaction

A JavaScript library providing a high-level API to control Chrome or Firefox for browser automation, testing, and web scraping.

#developer-tools #chrome #headless-chrome

A framework for web testing and automation that drives Chromium, Firefox, and WebKit with a single API.

#playwright #chrome #test-runner

Stars: 93.3k

Forks: 6.1k

Last commit: 19 hours ago

crawl4ai (Python)

An open-source web crawler and scraper that converts web content into clean, LLM-ready Markdown for RAG, agents, and data pipelines.

#playwright #ai-agents #markdown-generation

Stars: 74.7k

Forks: 7.7k

Last commit: 2 days ago

scrapy (Python)

A fast, high-level web crawling and scraping framework for Python.

#hacktoberfest #crawler #web-scraping-python

Huginn (Ruby)

A system for building agents that monitor the web and automate tasks, giving you full control over your data.

#event-driven #feedgenerator #ifttt-alternative

Stars: 49.7k

Forks: 4.3k

Last commit: 21 hours ago

Maigret (Python)

Collects a dossier on a person by checking for accounts on 3000+ websites using only a username.

#digital-footprint #social-media-analysis #namechecker

Stars: 35.7k

Forks: 2.7k

Last commit: 1 day ago

playwright-mcp (TypeScript)

A Model Context Protocol server that enables LLMs to automate web browsers using Playwright's accessibility tree.

#playwright #ai-agents #accessibility

Stars: 35.5k

Forks: 3.0k

Last commit: 22 hours ago

colly (Go)

A fast and elegant scraping and crawling framework for Go, designed for extracting structured data from websites.

#spider #crawler #scraper

Stars: 25.4k

Forks: 1.9k

Last commit: 1 month ago

jsdom (JavaScript)

A pure-JavaScript implementation of web standards like DOM and HTML for Node.js, enabling browser-like environments for testing and scraping.

#dom-apis #jsdom #dom

Stars: 21.6k

Forks: 1.8k

Last commit: 4 days ago

FreshRSS (PHP)

A free, self-hostable RSS feed aggregator that is lightweight, customizable, and supports multi-user access with instant push notifications.

#open-source #feed-reader #api

CLI tool and library for saving complete web pages as a single, self-contained HTML file.

#make-the-internet-great-again #e-hoarding #cli-tool

Stars: 15.4k

Forks: 468

Last commit: 2 months ago

GoQuery (Go)

A jQuery-like HTML manipulation and traversal library for Go, built on net/html and cascadia CSS selectors.

#jquery #css-selectors #net-html

Stars: 15.0k

Forks: 936

Last commit: 7 days ago

playwright-python (Python)

Python library to automate Chromium, Firefox, and WebKit browsers with a single API for testing and automation.

#playwright #python-testing #chromium

A Node.js library for automating Chrome locally or headless on AWS Lambda with a simple API.

#screenshot-automation #chrome #integration-testing

Stars: 13.2k

Forks: 569

Last commit: 7 years ago

chromedp (Go)

A Go library for driving browsers via the Chrome DevTools Protocol without external dependencies.

#chrome #unit-testing #cdp

Stars: 13.2k

Forks: 882

Last commit: 10 days ago

Pattern (Python)

A Python web mining module with tools for scraping, NLP, machine learning, network analysis, and visualization.

#text-analysis #natural-language-processing #python

Stars: 8.9k

Forks: 1.6k

Last commit: 2 years ago

Helium (Python)

A high-level Python wrapper for Selenium that simplifies web automation with a more intuitive API.

#selenium-python #chrome #helium

Stars: 8.3k

Forks: 513

Last commit: 17 days ago

casperjs (JavaScript)

A navigation scripting and testing utility for PhantomJS and SlimerJS, easing web automation and functional testing.

#javascript-testing #slimerjs #headless-browsers

Stars: 7.2k

Forks: 963

Last commit: 6 years ago

Rod (Go)

A high-level Go driver for Chrome DevTools Protocol, designed for web automation and scraping.

#cdp #devtools-protocol #concurrent-safe

Stars: 7.0k

Forks: 476

Last commit: 9 days ago

Node-Crawler (TypeScript)

A Node.js web crawler with server-side jQuery, rate limiting, and proxy support for efficient scraping.

#proxy-support #jquery #spider

Stars: 6.8k

Forks: 866

Last commit: 1 month ago

trafilatura (Python)

A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.

#text-extraction #readability #article-extractor

Stars: 6.3k

Forks: 397

Last commit: 6 days ago

AngleSharp (C#)

A .NET library for parsing HTML5, SVG, MathML, and CSS with a standards-compliant DOM.

#dom-manipulation #svg-parser #angle-bracket

Stars: 5.5k

Forks: 593

Last commit: 2 days ago

gumbo-parser (HTML)

A pure-C HTML5 parsing library implementing the HTML5 parsing algorithm.

#c-library #html5 #portable

Stars: 5.2k

Forks: 665

Last commit: 6 months ago

SwiftSoup (Swift)

A pure Swift HTML parser with DOM, CSS, and jQuery-like methods for parsing, manipulating, and cleaning HTML across Apple platforms and Linux.

#dom-manipulation #parse #css-selectors

A simple yet powerful Go HTTP client with automatic decoding, debugging, retry, and HTTP fingerprinting support.

#retry-logic #http #http-fingerprinting

Stars: 4.8k

Forks: 406

Last commit: 8 days ago

Crawler4j (Java)

An open-source Java web crawler that provides a simple interface for multi-threaded web crawling.

#java-library #open-source #crawling-framework

Stars: 4.6k

Forks: 1.9k

Last commit: 4 years ago

Mechanize (Ruby)

A Ruby library for automating web interaction, handling cookies, redirects, forms, and navigation.

#form-submission #cookies #ruby-gem

Stars: 4.4k

Forks: 477

Last commit: 2 months ago

myGPTReader (Python)

A Slack bot that reads and summarizes webpages, documents, and videos using ChatGPT, with voice chat capabilities.

#ai #content-summarization #embedding

Stars: 4.4k

Forks: 441

Last commit: 5 months ago

DotnetSpider (C#)

A lightweight, efficient, and fast high-level web crawling and scraping framework for .NET.

#web-crawling #distributed #redis

Stars: 4.1k

Forks: 1.1k

Last commit: 3 months ago

Typhoeus (Ruby)

A Ruby HTTP client library that wraps libcurl to make fast and reliable requests with parallel execution support.

#caching #parallel-requests #testing

Stars: 4.1k

Forks: 441

Last commit: 4 months ago

pyppeteer (Python)

Unofficial Python port of Puppeteer for headless Chrome/Chromium browser automation.

#puppeteer #headless-chrome #async

Stars: 3.9k

Forks: 346

Last commit: 2 years ago

Puppeteer Sharp (C#)

A .NET port of the official Node.js Puppeteer API for headless browser automation.

#aot-compilation #chrome #puppeteer

Stars: 3.9k

Forks: 486

Last commit: 2 days ago
