MapReduce tools for bulk indexing of web archive WARC/ARC files into ZipNum sharded CDX clusters on Hadoop, EMR, or local systems.
Tools for bulk indexing of WARC/ARC files on Hadoop, EMR, or a local file system.
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2026, WikiTeam has preserved more than 600,000 wikis.
WarcDB: Web crawl data as SQLite databases.
Extract web archive data using the Wayback Machine and Common Crawl.
A Memento Aggregator CLI and Server in Go