A lightweight Ruby web crawler and scraper with an elegant DSL for extracting structured data from web pages.
Wombat is a lightweight Ruby library for web crawling and scraping. It provides an elegant domain-specific language for defining scraping rules in simple, declarative code, and returns the extracted data as organized Ruby hashes, solving the problem of programmatically gathering structured information from websites.
Ruby developers who need to scrape websites for data extraction, automation, or integration tasks, particularly those looking for a clean DSL alternative to more complex scraping frameworks.
Developers choose Wombat for its minimal, expressive DSL: it cuts scraping boilerplate, parses HTML with either CSS or XPath selectors, and keeps extracted data flexible to transform through inline blocks.
Uses a clean, readable syntax to define scraping rules, cutting boilerplate; the README's example crawl block declares each field inline, with optional blocks for transformation.
Allows element selection with both CSS and XPath selectors, giving flexibility to match whichever fits the page structure; the README example mixes the two.
Returns scraped data as plain Ruby hashes that are easy to process and integrate, as the hash output in the README example shows.
Supports custom formatting through blocks, enabling on-the-fly data cleaning such as the gsub call in the README's links example.
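Because Wombat returns plain Ruby hashes, results compose with ordinary Hash and Enumerable code. The hash below is a made-up stand-in for a crawl result (its keys and values are illustrative), and the gsub mirrors the kind of inline cleaning Wombat's transformation blocks allow:

```ruby
# Hypothetical crawl result; a real one comes back from Wombat.crawl.
result = {
  "headline" => "  How people build software  ",
  "links"    => { "explore" => "Explore GitHub" },
  "topics"   => ["Ruby", "Scraping", "DSL"]
}

# Ordinary Ruby post-processing on the returned hash:
headline = result["headline"].strip
explore  = result["links"]["explore"].gsub(/Explore/, "Love")
topics   = result["topics"].map(&:downcase)

puts headline  # => "How people build software"
puts explore   # => "Love GitHub"
```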
Cannot scrape content rendered by JavaScript, limiting its usefulness on modern sites unless paired with an external tool such as a headless browser; the README does not address this common need.
Key documentation is split between a sparse README and an external Wiki, adding friction to onboarding.
Lacks built-in support for pagination, session management, and error handling, so these common scraping concerns must be implemented manually.
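Since pagination is not built in, callers loop over pages themselves. A hedged sketch of that pattern: here `scrape_page` is a stub standing in for a per-page Wombat.crawl call (the page limit and item names are invented for illustration), so the loop structure can be shown without network access:

```ruby
# Stub for one Wombat.crawl invocation per page; a real implementation
# would set path "/articles?page=#{page}" and return the crawl's hash.
scrape_page = lambda do |page|
  page <= 3 ? { "items" => ["item-#{page}a", "item-#{page}b"] } : { "items" => [] }
end

# Manual pagination: keep fetching until a page comes back empty.
all_items = []
page = 1
loop do
  items = scrape_page.call(page)["items"]
  break if items.empty?
  all_items.concat(items)
  page += 1
end

puts all_items.size  # => 6
```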