An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
An open-source, extensible, web-scale, archival-quality web crawler from the Internet Archive.
A privacy-focused web archiving tool with an IM-style interface that captures pages to multiple archival services.
A Python package and CLI tool for interacting with the Wayback Machine's Save, CDX, and Availability APIs.
Legacy web archive replay engine for accessing historical web content from WARC files.
Archive mirror of the users section from the historical rootkit.com security research website.
Python command-line tools and libraries for handling, validating, and converting WARC and ARC web archive files.
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.