Write web scrapers in Ruby using a clean, AI-assisted DSL that caches selectors for fast, LLM-free extraction.
Kimurai is a Ruby web scraping framework that uses AI to automatically generate and cache selectors for data extraction. It combines traditional scraping capabilities with LLM-powered intelligence, allowing developers to describe what data they want rather than writing complex XPath/CSS selectors manually. The framework supports multiple browsers and provides a clean DSL for building robust, maintainable scrapers.
Ruby developers who need to build web scrapers for data collection, particularly those working with JavaScript-rendered websites or seeking to reduce selector maintenance overhead. It's ideal for data engineers, researchers, and developers building data pipelines.
Kimurai uniquely combines AI-powered selector generation with traditional scraping tools, offering the intelligence of LLMs without the per-request costs. Its caching mechanism means you get AI accuracy during development but pure Ruby performance in production, making it both powerful and cost-effective.
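The cache-then-extract flow described above can be sketched in plain Ruby. Everything below, class and method names included, illustrates the general pattern, not Kimurai's actual API:

```ruby
# "AI once, pure Ruby afterwards": an expensive selector-generation
# step (the block, standing in for an LLM call) runs only on a cache
# miss; every later extraction reuses the cached selector for free.
class SelectorCache
  attr_reader :llm_calls

  def initialize(&generator)
    @generator = generator
    @cache = {}
    @llm_calls = 0
  end

  def selector_for(field)
    @cache[field] ||= begin
      @llm_calls += 1
      @generator.call(field)   # pretend this call costs tokens
    end
  end
end

cache = SelectorCache.new { |field| "//span[@class='#{field}']" }

3.times { cache.selector_for("price") }
puts cache.selector_for("price")  # => //span[@class='price']
puts cache.llm_calls              # => 1
```

In a real scraper the cache would be persisted to disk, so redeployed production runs never touch the LLM at all.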
Automatically generates and caches XPath/CSS selectors with an LLM based on your data schema, eliminating manual selector writing and maintenance.
Supports headless Chrome, Firefox, and Mechanize engines, allowing adaptation to both JavaScript-heavy and static websites without code changes.
Integrates Capybara for full browser control, enabling complex interactions like form submissions, clicks, and scrolling for dynamic content.
Includes thread-safe parallel crawling with the in_parallel method for high-performance data extraction from multiple pages simultaneously.
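Engine swapping of the kind described above typically comes down to a config-driven lookup, so spider code never names a concrete browser. A minimal plain-Ruby sketch (the engine symbols mirror Kimurai's engine names; the lambdas are stand-ins for real drivers):

```ruby
# Engine selection as a config-driven lookup: callers use fetch(url)
# and switching from a static fetcher to a headless browser is a
# one-symbol change. The lambdas stand in for real driver sessions.
ENGINES = {
  selenium_chrome:  ->(url) { "chrome rendered #{url}" },
  selenium_firefox: ->(url) { "firefox rendered #{url}" },
  mechanize:        ->(url) { "mechanize fetched #{url}" }
}.freeze

def fetch(url, engine: :mechanize)
  ENGINES.fetch(engine).call(url)
end

puts fetch("https://example.com")
# => mechanize fetched https://example.com
puts fetch("https://example.com", engine: :selenium_chrome)
# => chrome rendered https://example.com
```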
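The parallel-crawling pattern behind a method like in_parallel can be sketched with plain Ruby threads, a work queue, and a Mutex-guarded result list. This is a generic illustration under those assumptions, not Kimurai's implementation (which also gives each thread its own browser session):

```ruby
# Thread-per-worker parallel crawl: workers pop URLs from a shared
# Queue until it is empty, and append results under a Mutex so the
# list stays consistent. The string build stands in for fetch/parse.
def crawl(urls, threads: 3)
  queue = Queue.new
  urls.each { |u| queue << u }
  results = []
  mutex = Mutex.new

  workers = threads.times.map do
    Thread.new do
      loop do
        url = begin
          queue.pop(true)       # non-blocking pop; raises when empty
        rescue ThreadError
          break                 # queue drained, worker exits
        end
        page = "scraped:#{url}" # stand-in for a real fetch and parse
        mutex.synchronize { results << page }
      end
    end
  end
  workers.each(&:join)
  results
end

pages = crawl(%w[/a /b /c /d], threads: 2)
puts pages.sort.inspect
# => ["scraped:/a", "scraped:/b", "scraped:/c", "scraped:/d"]
```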
Requires Ruby >= 3.2.0, browser installations, and driver dependencies such as Selenium, making onboarding more involved than with lightweight scrapers.
Initial AI extraction depends on external LLM APIs (e.g., OpenAI, Gemini) with token costs and key management, adding complexity and potential expenses.
As a Ruby framework, it may not integrate well with projects in other languages, and the scraping ecosystem is smaller compared to Python alternatives like Scrapy.
Mechanize is a Ruby library that makes automated web interaction easy.
A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)
Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.
Ruby gem for web scraping purposes. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...