Example notebooks for analyzing web archives using the Archives Unleashed Toolkit.
Example notebooks for working with web archives using the Archives Unleashed Toolkit, as well as with derivatives generated by the toolkit.
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
Various Jupyter notebooks about Common Crawl data.
An SQL-queryable index with CDX information plus language classification. (Stable)