A customizable Scala crawler for creating personal web archives in WARC/CDX format.
A Scala/Spark library for efficient processing, extraction, and derivation of web archive data (CDX/WARC).
A framework for profiling web archives to summarize their holdings using compact SURT-based maps.