A web application for searching, browsing, and analyzing archived web content (ARC/WARC files) with a Solr backend.
SolrWayback is an open-source web application that functions as a search interface and playback system for archived web content stored in ARC/WARC files. It allows institutions and researchers to index, search, visualize, and export historical web data using a Solr backend, solving the problem of accessing and analyzing large-scale web archives.
Digital archivists, librarians, researchers, and institutions managing web archives who need a self-hosted, searchable interface for their ARC/WARC collections.
Developers choose SolrWayback for its comprehensive feature set, including full-text search, advanced visualizations, and flexible exports, together with the ability to self-host and scale to large collections. It offers an open-source alternative to proprietary web archive access tools.
A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
Enables full-text search across HTML, PDFs, images, and metadata, plus reverse image search and geo-location search based on EXIF data, as highlighted in the features list.
Provides interactive tools like link graphs, word clouds, n-gram timelines, and domain statistics for in-depth archive exploration, supported by multiple visualization screenshots.
Allows streaming WARC downloads, ZIP exports of native formats, and CSV with custom fields, making data extraction versatile for preservation and analysis.
Supports the Memento API and configurable playback engines such as OpenWayback or pywb, ensuring interoperability with existing web archive ecosystems.
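Two of the features above boil down to plain HTTP requests, which a short sketch can make concrete. The Python below is an illustration, not SolrWayback's own client code. It builds a standard Solr `/select` URL for a full-text query and the `Accept-Datetime` header that a Memento TimeGate expects under RFC 7089. The core URL `http://localhost:8983/solr/netarchivebuilder` and the filter field `crawl_year` are assumptions about a local install and its index schema, so check both against your own setup.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from urllib.parse import urlencode

# Assumption: URL of the Solr core that holds the archive index.
SOLR_CORE_URL = "http://localhost:8983/solr/netarchivebuilder"


def build_search_url(text, filters=(), rows=10, core_url=SOLR_CORE_URL):
    """Return a Solr /select URL for a full-text query with optional filters."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each filter is sent as its own fq parameter (standard Solr filter query).
    params.extend(("fq", f) for f in filters)
    return f"{core_url}/select?{urlencode(params)}"


def memento_headers(when):
    """Return the Accept-Datetime header used for Memento datetime negotiation."""
    # RFC 7089 expects an HTTP-date in GMT, so convert to UTC before formatting.
    utc_time = when.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_time, usegmt=True)}


if __name__ == "__main__":
    # "crawl_year" is an assumed field name; replace it with one from your schema.
    print(build_search_url("climate report", filters=["crawl_year:2011"]))
    print(memento_headers(datetime(2011, 3, 1, 12, 0, tzinfo=timezone.utc)))
```

Both helpers only build request data, so they can be tried without a running server. Sending the requests is left to whichever HTTP client you prefer.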
Requires installing and configuring Java, Solr, Tomcat, and property files, with platform-specific steps that can be daunting for newcomers, as detailed in the installation guide.
Indexing large collections is slow: 20,000 WARC files can take weeks. It also demands plenty of RAM and SSD storage for acceptable performance, and scaling limits mean that bigger archives need Solr Cloud.
Indexed files may not appear in search for up to 5 minutes, and live updates aren't supported, requiring manual commits or re-indexing for new content.
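A manual commit, as mentioned above, can be sketched with Solr's standard update handler. The example assumes the delay comes from documents waiting for Solr's next commit. It also assumes the core lives at `http://localhost:8983/solr/netarchivebuilder`; that URL and the helper names are illustrative, not taken from the SolrWayback documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: URL of the Solr core that holds the archive index.
SOLR_CORE_URL = "http://localhost:8983/solr/netarchivebuilder"


def build_commit_request(core_url=SOLR_CORE_URL, soft=False):
    """Build (but do not send) a request asking Solr to commit pending documents."""
    # commit=true forces a hard commit; softCommit=true only makes documents searchable.
    params = {"softCommit": "true"} if soft else {"commit": "true"}
    return Request(f"{core_url}/update?{urlencode(params)}", method="GET")


def commit(core_url=SOLR_CORE_URL, soft=False, timeout=30):
    """Send the commit request and return the HTTP status code."""
    with urlopen(build_commit_request(core_url, soft), timeout=timeout) as response:
        return response.status


if __name__ == "__main__":
    # Printing the URL is safe; calling commit() needs a running Solr instance.
    print(build_commit_request().full_url)
```

Frequent hard commits are expensive on a large index, so a script like this fits better at the end of an indexing batch than after every file.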
SolrWayback is an open-source alternative to the following products: