A Node.js library for parsing and creating Web ARChive (WARC) files with support for Chrome, Puppeteer, and Electron.
A collection of Jupyter notebooks for analyzing Common Crawl web archive data using columnar indexes and webgraph datasets.
A dockerized, queue-based web archiver that uses headless Chrome to create high-fidelity WARC files from URLs.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.