An Elixir library for structured data extraction from websites, articles, and RSS/Atom feeds using information-retrieval techniques.
Scrape is an Elixir library that provides structured data extraction from websites, articles, and RSS/Atom feeds. It solves the problem of programmatically accessing and parsing web content by offering simple functions to retrieve clean, organized data from various web resources.
Elixir developers who need to integrate web scraping, feed parsing, or content aggregation into their applications, such as those building news aggregators, data pipelines, or research tools.
Developers choose Scrape for its straightforward API focused on common web resources, its use of information-retrieval techniques for accurate extraction, and its LGPLv3 license, a weak-copyleft license that permits commercial use and encourages community contributions.
Scrape any website, article or RSS/Atom Feed with ease!
Provides straightforward functions like Scrape.domain! and Scrape.article! for common web resources, as demonstrated in the usage section of the README.
Specifically handles RSS and Atom feeds with Scrape.feed!, making content aggregation easier for news or monitoring tools.
LGPLv3 allows commercial use and encourages community contributions for bugfixes and improvements, as stated in the license section.
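As a rough illustration of the three entry points named above, the sketch below calls each one with a URL. It assumes that `scrape` is already a dependency of the project, that each function takes a single URL string, and that the URLs shown are placeholders rather than real endpoints. The shape of the returned data is not assumed, so the results are only inspected.

```elixir
# Run inside a project that depends on :scrape, e.g. with `iex -S mix`.
# Assumption: each function accepts one URL string. Check the README for
# the exact signatures of the version you install.

# Placeholder URLs, chosen for illustration only.
domain_url = "https://example.com"
article_url = "https://example.com/news/some-article"
feed_url = "https://example.com/feed.xml"

# The bang variants are expected to raise on failure instead of
# returning an error tuple, following the usual Elixir convention.
domain_url |> Scrape.domain!() |> IO.inspect(label: "domain")
article_url |> Scrape.article!() |> IO.inspect(label: "article")
feed_url |> Scrape.feed!() |> IO.inspect(label: "feed")
```

Inspecting the results first is a cheap way to learn which fields each call returns before writing code that pattern matches on them.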
Relies on an outdated version of HTTPoison because of one of its dependencies, so you have to override that version manually in your app, which complicates setup and updates.
Version 3.X is a complete rewrite with possible new bugs and breaking API changes, as noted in the known issues section of the README.
Focuses on domain, article, and feed scraping; may not handle complex, custom, or non-standard web scraping scenarios effectively.
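The manual HTTPoison override mentioned above is typically done with Mix's `override: true` dependency option. The fragment below is a minimal sketch of what that might look like in `mix.exs`; both version requirements are placeholder assumptions, not values taken from the README, and should be replaced with the versions your project actually needs.

```elixir
# mix.exs (fragment)
defp deps do
  [
    # Assumption: placeholder version requirement for the 3.X series.
    {:scrape, "~> 3.0"},
    # Assumption: placeholder version. `override: true` tells Mix to use
    # this requirement even if another dependency asks for an older one.
    {:httpoison, "~> 1.8", override: true}
  ]
end
```

After changing the dependency list, run `mix deps.get` and re-test the scraping calls, since an overridden version is not guaranteed to be compatible with what the library was written against.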