A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.
Crawly is an application framework for crawling websites and extracting structured data, built with Elixir. It provides a robust, configurable system for building scalable web scrapers to handle data mining, information processing, and historical archival. The framework uses a spider-based architecture with middleware and pipelines for customization.
Crawly is aimed at Elixir developers who need to build scalable, maintainable web scrapers for data extraction, such as data engineers or backend developers working on data aggregation projects.
Developers choose Crawly for its high-level abstraction that balances power with ease of use, offering features like browser rendering for JavaScript-heavy sites, a management UI for monitoring, and standalone Docker deployment. Its extensible middleware and pipeline system allows fine-tuned control over crawling behavior.
Uses a familiar callback model similar to Scrapy, making it intuitive for Elixir developers to define crawls with URL generation and parsing logic, as shown in the quickstart example.
Configurable to fetch pages with JavaScript rendering via tools like Splash or Chrome, essential for scraping dynamic content from modern websites, as documented in the browser rendering guide.
Offers pluggable components for customizing request handling and item processing, demonstrated in config examples with DomainFilter, UniqueRequest, and WriteToFile pipelines.
Enables running spiders via Docker with YAML or module definitions, simplifying deployment without full Elixir project setup, as covered in the standalone documentation.
Provides a built-in web interface on localhost:4001 for starting/stopping spiders and viewing items, with options to disable or integrate as a plug in existing apps.
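The Scrapy-like callback model mentioned above can be sketched roughly as follows. This is a minimal illustration rather than the project's own quickstart: the module name `BooksSpider`, the target site, and the CSS selectors are assumptions, and the code presumes that `:crawly` and `:floki` are listed as dependencies.

```elixir
# Hypothetical spider: module name, site, and selectors are illustrative only.
defmodule BooksSpider do
  use Crawly.Spider

  # The domain this spider is meant to stay on.
  @impl Crawly.Spider
  def base_url, do: "https://books.toscrape.com/"

  # URL generation: the crawl starts from these pages.
  @impl Crawly.Spider
  def init do
    [start_urls: ["https://books.toscrape.com/"]]
  end

  # Parsing: turn one fetched page into items plus follow-up requests.
  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    items =
      document
      |> Floki.find(".product_pod")
      |> Enum.map(fn product ->
        %{
          title: product |> Floki.find("h3 a") |> Floki.attribute("title") |> List.first(),
          price: product |> Floki.find(".price_color") |> Floki.text(),
          url: response.request_url
        }
      end)

    # Follow the pagination link, if the page has one.
    requests =
      document
      |> Floki.find(".next a")
      |> Floki.attribute("href")
      |> Enum.map(fn href ->
        href
        |> Crawly.Utils.build_absolute_url(response.request_url)
        |> Crawly.Utils.request_from_url()
      end)

    %Crawly.ParsedItem{items: items, requests: requests}
  end
end
```

With the application running, a spider like this would typically be started from IEx with `Crawly.Engine.start_spider(BooksSpider)`.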
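As a rough illustration of the pluggable middlewares and pipelines, a configuration fragment might look like the one below. The concurrency value, output folder, and file extension are assumptions chosen for the example, and the `JSONEncoder` pipeline is included on the assumption that items should be serialized before being written to disk.

```elixir
# config/config.exs (illustrative values, not recommended defaults)
import Config

config :crawly,
  concurrent_requests_per_domain: 4,
  # Middlewares process each request before it is fetched.
  middlewares: [
    Crawly.Middlewares.DomainFilter,
    Crawly.Middlewares.UniqueRequest
  ],
  # Pipelines process each extracted item, in order.
  pipelines: [
    Crawly.Pipelines.JSONEncoder,
    {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "jl"}
  ]
```

Order matters in both lists: each request or item passes through the modules from top to bottom, so encoding has to come before the file writer.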
The default UI is minimalistic, and the more advanced Phoenix-based UI (CrawlyUI) is deprecated, limiting out-of-the-box monitoring and development features for complex workflows.
Enabling JavaScript rendering requires external services like Splash or Chrome, adding deployment and maintenance overhead beyond the core framework.
Tightly coupled to Elixir and BEAM, making it less suitable for teams not already using this stack or needing interoperability with other language ecosystems.
Crawly is still a 0.x project, so releases such as 0.15.0 may introduce breaking changes, and production deployments need ongoing maintenance to keep up.
Yet Another HTTP client for Elixir powered by hackney
The flexible HTTP client library for Elixir, with support for middleware and multiple adapters.
MochiWeb is an Erlang library for building lightweight HTTP servers.
simple HTTP client in Erlang