A fast, Unix-style command-line web crawler that extracts links, resources, and API endpoints from web pages.
Crawley is a command-line web crawler written in Go that extracts links, resources, and API endpoints from web pages. It parses HTML, JavaScript, and CSS to discover URLs for images, videos, forms, and other embedded content, outputting unique results to stdout for easy processing with other Unix tools.
Crawley is aimed at developers, security researchers, and DevOps engineers who need a lightweight, scriptable tool for web crawling, link discovery, or API endpoint extraction in automated workflows.
Crawley stands out for its speed, simplicity, and adherence to the Unix philosophy, offering a composable command-line tool that integrates seamlessly into pipelines without the overhead of larger crawling frameworks.
The unix-way web crawler
Uses Go's golang.org/x/net/html package for fast HTML parsing, extracting links without building a full DOM tree in memory, in line with the README's emphasis on performance.
Incorporates lexical parsers to extract URLs from JavaScript code and CSS url() values, which helps discover API endpoints and embedded resources, a key feature mentioned in the description.
Outputs unique results to stdout for easy piping into tools like grep or wget, aligning with its design for composable command-line workflows, as emphasized in the philosophy section.
Supports depth limits, robots.txt policies, subdomain crawling, and tag filtering via flags, allowing fine-grained control over the crawling process, as detailed in the usage examples.
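The lexical extraction and unique-output behavior described above can be illustrated with a minimal Go sketch. It is not Crawley's implementation: regular expressions stand in for a real lexer, and the names `cssURL`, `jsURL`, and `extract`, along with the sample inputs, are assumptions made for this example.

```go
package main

import (
	"fmt"
	"regexp"
)

// cssURL matches url(...) with optional single or double quotes.
// Assumption: a regexp stands in for a proper CSS lexer here.
var cssURL = regexp.MustCompile(`url\(\s*['"]?([^'")\s]+)['"]?\s*\)`)

// jsURL matches quoted string literals that look like absolute
// http(s) URLs or root-relative paths. It only sees literals that
// appear verbatim in the source text.
var jsURL = regexp.MustCompile(`["'](https?://[^"'\s]+|/[A-Za-z0-9_./-]+)["']`)

// extract returns the unique first-capture-group matches of the given
// patterns in src, in first-seen order.
func extract(src string, patterns ...*regexp.Regexp) []string {
	seen := make(map[string]struct{})
	var out []string
	for _, re := range patterns {
		for _, m := range re.FindAllStringSubmatch(src, -1) {
			if _, dup := seen[m[1]]; dup {
				continue // already emitted once
			}
			seen[m[1]] = struct{}{}
			out = append(out, m[1])
		}
	}
	return out
}

func main() {
	css := `.hero { background: url("/img/hero.png"); } .logo { background: url(/img/hero.png); }`
	js := `fetch("/api/v1/users"); const cdn = 'https://cdn.example.com/app.js';`

	// One result per line on stdout, so the output can be piped onward.
	for _, u := range extract(css, cssURL) {
		fmt.Println(u)
	}
	for _, u := range extract(js, jsURL) {
		fmt.Println(u)
	}
}
```

Because the matching is purely textual, a URL assembled at runtime (for example by string concatenation) would not be found, which is the trade-off of any lexical approach.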
Relies on lexical analysis rather than executing JavaScript, so it may miss URLs generated dynamically at runtime, which limits coverage on JavaScript-heavy web applications.
As a Unix-style tool, it outputs only to stdout, requiring additional tools for persistent storage, data transformation, or advanced analytics, which adds complexity to workflows.
Designed primarily as a CLI tool, with no native API or library for direct integration into other programming languages or applications, which reduces flexibility for embedded or GUI-based projects.