Heritrix is an open-source, extensible, web-scale, archival-quality web crawler developed by the Internet Archive. There is currently 1 open-source alternative to Heritrix, with a combined total of 174 GitHub stars. The most common language among these projects is JavaScript.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.