WARC deduplication tool (and WARC library) written in Rust. (In Development)
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2026, WikiTeam has preserved more than 600,000 wikis.
WarcDB: Web crawl data as SQLite databases.
Extract web archive data using the Wayback Machine and Common Crawl.
Warchaeology is a collection of tools for inspecting, manipulating, deduplicating and validating WARC files. (Stable)