Scraping

20 projects


scrapy (Python)

A fast, high-level web crawling and scraping framework for Python.

#hacktoberfest #crawler #web-scraping-python

Stars: 63.4k · Forks: 11.8k · Last commit: 19 hours ago

colly (Go)

A fast and elegant scraping and crawling framework for Go, designed for extracting structured data from websites.

#spider #crawler #scraper

Stars: 25.4k · Forks: 1.9k · Last commit: 1 month ago

webmagic (Java)

A scalable Java framework for building web crawlers, covering downloading, URL management, content extraction, and persistence.

#distributed-systems #crawler #html-parsing

Stars: 11.7k · Forks: 4.1k · Last commit: 7 months ago
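The four parts named in the webmagic description can be wired together as shown in the following minimal Python sketch. It is not webmagic's Java API. `download` and `persist` are hypothetical callables supplied by the caller. URL management is assumed to be a breadth-first queue plus a `seen` set for de-duplication.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Content extraction: collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, download, persist, max_pages=100):
    """Breadth-first crawl: download -> extract -> persist -> schedule new URLs."""
    frontier = deque([seed])  # URL management: URLs waiting to be fetched
    seen = {seed}             # URL management: never schedule a URL twice
    while frontier and max_pages > 0:
        url = frontier.popleft()
        html = download(url)  # downloading (hypothetical callable; None = failed)
        if html is None:
            continue
        max_pages -= 1        # only successfully downloaded pages count
        parser = LinkExtractor()
        parser.feed(html)
        persist(url, parser.links)  # persistence (hypothetical callable)
        for href in parser.links:
            # Resolve relative links and drop #fragments before de-duplicating.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute not in seen:
                seen.add(absolute)
                frontier.append(absolute)
    return seen


# Usage with an in-memory "site" instead of real HTTP requests.
site = {
    "http://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
    "http://example.com/a": '<a href="/">home</a>',
    "http://example.com/b": "",
}
store = {}
crawl("http://example.com/", site.get, store.__setitem__)
```

Injecting `download` and `persist` keeps the scheduling logic testable without a network. A real crawler would also add politeness delays and robots.txt checks, which are omitted here.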

Mechanize (Ruby)

A Ruby library for automating web interaction, handling cookies, redirects, forms, and navigation.

#form-submission #cookies #ruby-gem

Stars: 4.4k · Forks: 477 · Last commit: 2 months ago

Symfony Panther (PHP)

A PHP and Symfony library for browser testing and web scraping using real browsers via the WebDriver protocol.

#hacktoberfest #selenium-webdriver #chromedriver

Stars: 3.1k · Forks: 232 · Last commit: 1 month ago

Embed (PHP)

A PHP library to extract metadata, embed codes, and structured data from any web page using multiple protocols.

#embeds #social-media #metadata-extraction

Stars: 2.1k · Forks: 324 · Last commit: 15 days ago
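The Embed description does not list its "multiple protocols". Open Graph `<meta property="og:...">` tags are one common source of page metadata. The following minimal Python sketch covers only that one source, plus an HTML `<title>` fallback. It is an assumed illustration, not Embed's PHP API. `extract_metadata` and the returned keys are hypothetical names.

```python
from html.parser import HTMLParser


class OpenGraphParser(HTMLParser):
    """Collects <meta property="og:..."> values and the <title> text."""

    def __init__(self):
        super().__init__()
        self.og = {}
        self.title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta":
            prop = attrs.get("property") or ""
            if prop.startswith("og:") and attrs.get("content") is not None:
                # Keep the first value seen for each Open Graph property.
                self.og.setdefault(prop[3:], attrs["content"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text may arrive in several chunks, so concatenate them.
        if self._in_title:
            self.title = (self.title or "") + data


def extract_metadata(html):
    """Prefer Open Graph values; fall back to <title> for the title."""
    parser = OpenGraphParser()
    parser.feed(html)
    parser.close()
    return {
        "title": parser.og.get("title") or (parser.title or "").strip() or None,
        "image": parser.og.get("image"),
        "url": parser.og.get("url"),
    }
```

A library that supports several protocols would run one extractor per source and merge the results by priority. This sketch shows just one extractor.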

Websurfx (Rust)

A modern, fast, privacy-respecting meta search engine written in Rust, offering a secure and ad-free search experience.

#search-aggregator #meta-search-engine #actix-web

Stars: 1.2k · Forks: 127 · Last commit: 22 days ago
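A meta search engine forwards a query to several upstream engines and merges their ranked result lists. The following Python sketch shows one way to do the merge. It is not Websurfx's Rust code. The ranking rule is a hypothetical choice: results returned by more engines come first, and ties go to the best rank any engine gave them.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    url: str
    title: str
    engines: list = field(default_factory=list)
    best_rank: int = 0


def aggregate(results_by_engine):
    """Merge {engine: [(url, title), ...]} lists, de-duplicating by URL."""
    merged = {}
    for engine, results in results_by_engine.items():
        for rank, (url, title) in enumerate(results, start=1):
            key = url.rstrip("/")  # crude normalization (assumption)
            if key not in merged:
                merged[key] = Result(url=url, title=title, best_rank=rank)
            entry = merged[key]
            entry.engines.append(engine)
            entry.best_rank = min(entry.best_rank, rank)
    # More agreeing engines first, then the best position any engine gave it.
    return sorted(merged.values(), key=lambda r: (-len(r.engines), r.best_rank))
```

Real aggregators usually normalize URLs more carefully and may weight engines differently. This scoring is only meant to illustrate the merge step.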

Crawly (Elixir)

A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.

#scraping-websites #elixir #web-crawling

Stars: 1.1k · Forks: 122 · Last commit: 1 year ago

rama (Rust)

A modular Rust service framework for building programmable network proxies, clients, and servers with fine-grained control over packet flow.

#service-architecture #http-server #proxy

A Python tool to automatically archive web content (videos, images, social media) from Google Sheets and other sources in a secure, verifiable way.

#service #image-archiving #python

A high-performance, multithreaded command-line tool for downloading images from webpages.

#pypi #commandline-tool #terminal

Stars: 776 · Forks: 104 · Last commit: 8 years ago
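The multithreaded image downloader above works in two steps: collect the `<img src>` URLs from a page, then fetch them in parallel. The following minimal Python sketch illustrates both steps under stated assumptions. It is not that tool's code. `fetch` is a hypothetical caller-supplied function, and the worker count is arbitrary.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageSrcParser(HTMLParser):
    """Collects the src attribute of every <img> tag that has one."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


def image_urls(page_url, html):
    """Resolve relative src values and drop duplicates, keeping page order."""
    parser = ImageSrcParser()
    parser.feed(html)
    return list(dict.fromkeys(urljoin(page_url, s) for s in parser.srcs))


def download_all(urls, fetch, max_workers=8):
    """Run fetch(url) on a thread pool; returns {url: result} in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

For real downloads, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read()`. Threads suit this workload because it is I/O-bound. Note that `pool.map` re-raises the first exception from `fetch` when results are collected.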

dataflowkit (Go)

A Go web scraping framework that extracts structured data from websites using CSS selectors, including JavaScript-rendered pages.

#chrome-fetcher #scraping-websites #javascript-rendering

Stars: 715 · Forks: 83 · Last commit: 3 years ago

Lambda Soup (OCaml)

A functional HTML scraping and manipulation library for OCaml with CSS selector support.

#ocaml-library #functional-programming #css-selectors

Stars: 409 · Forks: 35 · Last commit: 1 year ago

scrape (Elixir)

An Elixir library for structured data extraction from websites, articles, and RSS/Atom feeds using information-retrieval techniques.

#readability #elixir #information-retrieval

Stars: 337 · Forks: 41 · Last commit: 6 years ago

antch (Go)

A fast, powerful, and extensible web crawling and scraping framework for Go, inspired by Scrapy.

#web-crawling #concurrent #crawler

Stars: 266 · Forks: 40 · Last commit: 6 years ago

dyer (Rust)

A reliable, flexible, and fast Rust framework for web crawling and request-response services.

#event-driven #web-crawling #spider

Stars: 126 · Forks: 7 · Last commit: 11 months ago

Startup Job (Elixir)

A sample project for searching startup jobs scraped from various websites, built with an Elixir/Phoenix backend and a React/Redux frontend.

#elixir #startup-jobs #phoenix

Stars: 103 · Forks: 16 · Last commit: 9 years ago

npm-user (JavaScript)

Fetches npm user profile information by scraping the npm website, since no official API exists.

#developer-tools #npm #nodejs

Stars: 58 · Forks: 14 · Last commit: 2 years ago

golyrics (Go)

A Go package for fetching song lyrics from the Wikia (Lyrics.wikia.com) website.

#music #api #golang-package

Stars: 40 · Forks: 2 · Last commit: 8 years ago

node-kebab (JavaScript)

A Node.js API wrapper for accessing kebab-frites.info data programmatically.

#nodejs #food-data #javascript-library

Stars: 1 · Forks: 0 · Last commit: 7 years ago

Community-curated · Updated weekly · 100% open source

Found a gem we're missing?

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
