Extracts hyperlinks from files using Apache Tika for batch processing and web archiving workflows.
Tika-based link (URL) extractor for httpreserve
Tools for downloading and preserving wikis. We archive wikis, from Wikipedia to the tiniest wikis. As of 2026, WikiTeam has preserved more than 600,000 wikis.
WarcDB: Web crawl data as SQLite databases.
Extract web archive data using the Wayback Machine and Common Crawl
A Memento Aggregator CLI and Server in Go