A Go tool and library for downloading URLs and files from Common Crawl and Wayback Machine web archives.
An Apache Spark framework for efficient data processing, extraction, and derivation from web archives and archival collections.
A collection of robust and fast Python tools for parsing, extracting, and analyzing web archive data, including a high-performance WARC parser.