A host- or domain-level graph of the web, with ranking information. (Stable)
An Apache Spark framework for easy data processing, extraction, and derivation for web archives and archival collections, developed at the Internet Archive.
The Archives Unleashed Toolkit is an open-source toolkit for analyzing web archives.
SQL-queryable index, with CDX info plus language classification. (Stable)
Structured data extracted from Common Crawl. (Stable)