Open-Awesome

© 2026 Open-Awesome. Curated for the developer elite.

Crawly

License: Apache-2.0 · Language: Elixir · Version: 0.15.0

A high-level web crawling and scraping framework for Elixir, designed for data extraction and processing.

Website · GitHub
1.1k stars · 121 forks

What is Crawly?

Crawly is an application framework for crawling websites and extracting structured data, built with Elixir. It provides a robust, configurable system for building scalable web scrapers to handle data mining, information processing, and historical archival. The framework uses a spider-based architecture with middleware and pipelines for customization.
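That spider-based model can be sketched as a single module implementing Crawly's documented callbacks. This is an illustrative sketch, not code from the project: `BooksSpider`, the target site, and the CSS selectors are assumptions, and it presumes Floki is available for HTML parsing.

```elixir
# A hypothetical Crawly spider; module name, URLs, and selectors are
# illustrative (assumes the Floki HTML parser is a dependency).
defmodule BooksSpider do
  use Crawly.Spider

  # Root of the site; used e.g. by the DomainFilter middleware.
  @impl Crawly.Spider
  def base_url(), do: "https://books.toscrape.com"

  # Seed URLs for the crawl.
  @impl Crawly.Spider
  def init(), do: [start_urls: ["https://books.toscrape.com/"]]

  # Called once per fetched page: return extracted items plus
  # follow-up requests for the engine to schedule.
  @impl Crawly.Spider
  def parse_item(response) do
    {:ok, document} = Floki.parse_document(response.body)

    items =
      document
      |> Floki.find("article.product_pod h3 a")
      |> Enum.map(fn link ->
        %{title: link |> Floki.attribute("title") |> List.first()}
      end)

    requests =
      document
      |> Floki.find("li.next a")
      |> Floki.attribute("href")
      |> Enum.map(fn href -> to_string(URI.merge(response.request_url, href)) end)
      |> Enum.map(&Crawly.Utils.request_from_url/1)

    %{items: items, requests: requests}
  end
end
```

Items flow into the configured pipelines; requests go back to the engine, which applies middlewares before fetching.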

Target Audience

Elixir developers who need to build scalable, maintainable web scrapers for data extraction tasks, such as data engineers or backend developers working on data aggregation projects.

Value Proposition

Developers choose Crawly for its high-level abstraction that balances power with ease of use, offering features like browser rendering for JavaScript-heavy sites, a management UI for monitoring, and standalone Docker deployment. Its extensible middleware and pipeline system allows fine-tuned control over crawling behavior.

Overview

Crawly is a high-level web crawling and scraping framework for Elixir.

Use Cases

Best For

  • Building scalable web scrapers in Elixir for data mining and information processing.
  • Extracting structured data from dynamic websites that require JavaScript rendering.
  • Creating standalone crawling applications deployable via Docker with spiders defined in YAML or modules.
  • Monitoring and managing web scraping jobs through a built-in web interface for starting, stopping, and viewing items.
  • Implementing concurrent, rate-limited crawls with configurable request handling per domain.
  • Rapid spider development using mix tasks for code generation and configuration templates.

Not Ideal For

  • Projects requiring integration with non-Elixir ecosystems or languages for downstream data processing pipelines.
  • Quick, one-off scraping tasks where setting up an Elixir project and framework overhead is unjustified.
  • Teams needing visual, point-and-click scraping tools for non-developers to configure crawls without code.
  • High-scale distributed scraping across multiple geolocations without built-in proxy rotation or advanced anti-bot bypass features.

Pros & Cons

Pros

Spider-based Architecture

Uses a familiar callback model similar to Scrapy, making it intuitive for Elixir developers to define crawls with URL generation and parsing logic, as shown in the quickstart example.

Browser Rendering Support

Configurable to fetch pages with JavaScript rendering via tools like Splash or Chrome, essential for scraping dynamic content from modern websites, as documented in the browser rendering guide.
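Enabling this is a fetcher configuration switch. A minimal sketch, assuming a Splash instance is already running locally on port 8050 (the URL is an assumption, not a Crawly default):

```elixir
# config/config.exs — a sketch: route page fetching through Splash so
# pages are JavaScript-rendered before parsing. The local URL assumes
# Splash was started with, e.g.:
#   docker run -p 8050:8050 scrapinghub/splash
import Config

config :crawly,
  fetcher: {Crawly.Fetchers.Splash, [base_url: "http://localhost:8050/render.html"]}
```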

Extensible Middleware and Pipelines

Offers pluggable components for customizing request handling and item processing, demonstrated in config examples with DomainFilter, UniqueRequest, and WriteToFile pipelines.
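A config sketch wiring those components together; the `Validate` fields, output folder, and extension are illustrative values, not project defaults:

```elixir
# config/config.exs — a sketch combining the middlewares and pipelines
# named above; field names and file paths are illustrative.
import Config

config :crawly,
  middlewares: [
    Crawly.Middlewares.DomainFilter,   # drop requests outside the spider's base_url
    Crawly.Middlewares.UniqueRequest,  # deduplicate already-seen requests
    Crawly.Middlewares.UserAgent       # set a User-Agent header on requests
  ],
  pipelines: [
    {Crawly.Pipelines.Validate, fields: [:title]},  # drop items missing fields
    Crawly.Pipelines.JSONEncoder,                   # encode items as JSON
    {Crawly.Pipelines.WriteToFile, folder: "/tmp", extension: "jl"}
  ]
```

Middlewares run on outgoing requests; pipelines run on scraped items, in list order.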

Standalone Docker Deployment

Enables running spiders via Docker with YAML or module definitions, simplifying deployment without full Elixir project setup, as covered in the standalone documentation.

Simple Management UI

Provides a built-in web interface on localhost:4001 for starting/stopping spiders and viewing items, with options to disable or integrate as a plug in existing apps.
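The same start/stop operations are also available programmatically through the engine API (a sketch; `MySpider` is a hypothetical spider module):

```elixir
# Start a spider under the Crawly engine; scraped items flow through
# the configured pipelines while it runs.
Crawly.Engine.start_spider(MySpider)

# Inspect which spiders are currently running.
Crawly.Engine.running_spiders()

# Stop the spider when done.
Crawly.Engine.stop_spider(MySpider)
```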

Cons

Basic Management UI

The default UI is minimalistic, and the more advanced Phoenix-based UI (CrawlyUI) is deprecated, limiting out-of-the-box monitoring and development features for complex workflows.

Complex Browser Rendering Setup

Enabling JavaScript rendering requires external services like Splash or Chrome, adding deployment and maintenance overhead beyond the core framework.

Elixir Ecosystem Dependency

Tightly coupled to Elixir and BEAM, making it less suitable for teams not already using this stack or needing interoperability with other language ecosystems.

Evolving API with Breaking Changes

As a version 0.x project, frequent updates like those in 0.15.0 may introduce breaking changes, requiring ongoing maintenance for production deployments.


Quick Stats

  • Stars: 1,082
  • Forks: 121
  • Contributors: 0
  • Open issues: 8
  • Last commit: 9 months ago
  • Created: 2019

Tags

#elixir #web-crawling #spider #crawler #scraper #crawling #docker #erlang #extract-data #web-scraping #data-extraction #scraping #middleware #data-mining

Built With

  • Elixir
  • Docker

Links & Resources

Website

Included in

Elixir (13.1k)

Related Projects

httpoison

Yet Another HTTP client for Elixir powered by hackney

2,313 stars · 340 forks · last commit 1 month ago
tesla

The flexible HTTP client library for Elixir, with support for middleware and multiple adapters.

2,071 stars · 363 forks · last commit 2 days ago
mochiweb

MochiWeb is an Erlang library for building lightweight HTTP servers.

1,888 stars · 464 forks · last commit 7 months ago
hackney

Simple HTTP client in Erlang.

1,405 stars · 443 forks · last commit 8 days ago
Community-curated · Updated weekly · 100% open source

Found a gem we're missing?

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.

Submit a project · Star on GitHub