A high-performance web crawler and scraper built in Elixir with worker pooling and rate limiting.
Crawler is an Elixir library for building high-performance web crawlers and scrapers. It efficiently extracts data from websites with configurable concurrency, rate limiting, and extensible parsing. The library solves the problem of programmatically gathering web content at scale while respecting site policies.
Crawler is aimed at Elixir developers building data pipelines, research tools, or monitoring systems that require automated web content extraction. It's ideal for those who need fine-grained control over crawling behavior.
Developers choose Crawler for its Elixir-native design that leverages BEAM concurrency, its modular architecture that allows custom components, and built-in features such as rate limiting and asset crawling that need no separate tooling.
Leverages OPQ for worker pooling on top of Elixir's BEAM VM, enabling efficient parallel crawling with a configurable number of workers.
Offers extensive options such as maximum crawl depth, rate limiting via :interval, and retry strategies, allowing precise tuning to respect site policies and avoid bans.
Supports custom implementations for scrapers, parsers, and other components through defined behaviours, making it adaptable to specific use cases beyond default functionality.
Can crawl JavaScript, CSS, and image files alongside HTML, which is useful for full-site analysis or archiving.
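The options mentioned above might be combined roughly as follows. This is a minimal sketch, not taken from the library's documentation: it assumes the `crawler` package is installed as a Mix dependency, that `Crawler.crawl/2` accepts a keyword list, and that the option names `:workers`, `:max_depths`, `:retries`, `:timeout`, and `:assets` match the installed version (only `:interval` is named in the text above). The URL and all values are placeholders.

```elixir
# Hypothetical configuration; check option names and units against the
# documentation of the Crawler version you have installed.
Crawler.crawl("https://example.com",
  workers: 5,                       # assumed option: size of the worker pool
  interval: 1_000,                  # rate limit between requests (assumed to be milliseconds)
  max_depths: 2,                    # assumed option: how many link levels to follow
  retries: 2,                       # assumed option: retry attempts for failed fetches
  timeout: 5_000,                   # assumed option: per-request timeout
  assets: ["css", "js", "images"]   # assumed option: asset types to fetch alongside HTML
)
```

A lower worker count combined with a non-zero interval is the conservative choice when the goal is to stay within a site's crawling policy.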
While it fetches JS assets, it doesn't execute JavaScript to render dynamic content, requiring custom parsers or additional libraries for modern websites.
With numerous options, the need for custom modules in some scenarios, and caveats such as pausing requiring a large timeout, setup can be overwhelming for straightforward tasks.
Tied to Elixir and its tooling, which may not integrate easily into non-Elixir projects, limiting its appeal outside that community.