Open-Awesome
Kimurai

MIT · Ruby

Write web scrapers in Ruby using a clean, AI-assisted DSL that caches selectors for fast, LLM-free extraction.

Visit Website · GitHub
1.1k stars · 161 forks · 0 contributors

What is Kimurai?

Kimurai is a Ruby web scraping framework that uses AI to automatically generate and cache selectors for data extraction. It combines traditional scraping capabilities with LLM-powered intelligence, allowing developers to describe what data they want rather than writing complex XPath/CSS selectors manually. The framework supports multiple browsers and provides a clean DSL for building robust, maintainable scrapers.

Target Audience

Ruby developers who need to build web scrapers for data collection, particularly those working with JavaScript-rendered websites or seeking to reduce selector maintenance overhead. It's ideal for data engineers, researchers, and developers building data pipelines.

Value Proposition

Kimurai uniquely combines AI-powered selector generation with traditional scraping tools, offering the intelligence of LLMs without the per-request costs. Its caching mechanism means you get AI accuracy during development but pure Ruby performance in production, making it both powerful and cost-effective.

Overview

Write web scrapers in Ruby using a clean, AI-assisted DSL. Kimurai uses AI to figure out where the data lives, then caches the selectors and scrapes with pure Ruby. Get the intelligence of an LLM without the per-request latency or token costs.
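The develop-once, scrape-cheap caching idea can be sketched in plain Ruby. This is a hypothetical stand-in, not Kimurai's actual API: `SelectorCache` and the `llm` callable are illustrative names. The first lookup for a field consults the (stubbed) LLM; every later lookup, including in a fresh process, reuses the persisted selector with no LLM involved.

```ruby
require "json"
require "tmpdir"

# Illustrative sketch only -- SelectorCache is not part of Kimurai.
class SelectorCache
  def initialize(path, llm)
    @path  = path
    @llm   = llm  # callable: field name -> CSS/XPath selector
    @cache = File.exist?(path) ? JSON.parse(File.read(path)) : {}
  end

  # Returns the selector for a field, calling the LLM only on a cache
  # miss and persisting the answer for later runs.
  def selector_for(field)
    return @cache[field] if @cache.key?(field)
    @cache[field] = @llm.call(field)
    File.write(@path, JSON.generate(@cache))
    @cache[field]
  end
end

# Demo: the stub "LLM" counts how often it is actually invoked.
calls = 0
llm   = ->(field) { calls += 1; "div.product > .#{field}" }
path  = File.join(Dir.mktmpdir, "selectors.json")

cache     = SelectorCache.new(path, llm)
title_sel = cache.selector_for("title")
cache.selector_for("title")                                     # from memory
reloaded  = SelectorCache.new(path, llm).selector_for("title")  # from disk
```

After the first run, the JSON file plays the role of Kimurai's selector cache: extraction becomes a pure string lookup, with no token cost or LLM latency.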

Use Cases

Best For

  • Scraping JavaScript-rendered websites with dynamic content
  • Building data collection pipelines without writing complex selectors
  • Extracting structured data from websites with frequently changing layouts
  • Parallel processing of large numbers of web pages
  • Interactive scraping that requires form submission and click simulation
  • Projects needing both traditional and AI-assisted extraction approaches

Not Ideal For

  • Projects built in Python or other non-Ruby ecosystems where scraping libraries like Scrapy are already integrated
  • Teams needing quick, minimal-configuration scraping without browser dependencies or AI setup
  • High-frequency scraping tasks where website layouts change daily, making cached AI selectors obsolete quickly

Pros & Cons

Pros

AI-Powered Selector Generation

Automatically generates and caches XPath/CSS selectors using LLMs based on your data schema, eliminating manual selector writing and maintenance.

Multi-Engine Flexibility

Supports headless Chrome, Firefox, and Mechanize engines, allowing adaptation to both JavaScript-heavy and static websites without code changes.

Capybara-Based Interactions

Integrates Capybara for full browser control, enabling complex interactions like form submissions, clicks, and scrolling for dynamic content.

Built-in Parallel Processing

Includes thread-safe parallel crawling with the in_parallel method for high-performance data extraction from multiple pages simultaneously.
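The pattern behind thread-safe parallel crawling can be shown with Ruby's stdlib. This is a minimal sketch of the idea, not Kimurai's `in_parallel` implementation or signature: a fixed pool of worker threads drains a shared, thread-safe queue of URLs.

```ruby
# Hypothetical helper illustrating pooled crawling; not Kimurai's API.
def crawl_in_parallel(urls, threads: 3)
  queue = Queue.new                 # Queue is thread-safe; Array is not
  urls.each { |u| queue << u }
  results = Queue.new

  workers = Array.new(threads) do
    Thread.new do
      loop do
        url = begin
          queue.pop(true)           # non-blocking pop; raises when empty
        rescue ThreadError
          break                     # queue drained: worker exits
        end
        results << yield(url)       # the per-page fetch-and-parse step
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

# Demo with a stand-in "fetch" block instead of real HTTP.
pages = crawl_in_parallel(%w[/a /b /c /d], threads: 2) { |u| "parsed #{u}" }
```

Because workers pull from a shared queue rather than being assigned fixed slices, fast pages don't leave threads idle while a slow page blocks one worker.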

Cons

Complex Initial Setup

Requires Ruby >=3.2.0, specific browser installations, and system dependencies like Selenium, making onboarding more involved than lightweight scrapers.

AI Configuration Overhead

Initial AI extraction depends on external LLM APIs (e.g., OpenAI, Gemini) with token costs and key management, adding complexity and potential expenses.

Ruby-Centric Limitation

As a Ruby framework, it may not integrate well with projects in other languages, and the scraping ecosystem is smaller compared to Python alternatives like Scrapy.


Quick Stats

  • Stars: 1,099
  • Forks: 161
  • Contributors: 0
  • Open Issues: 11
  • Last commit: 3 months ago
  • Created: 2018

Tags

#headless-chrome #crawler #llm-integration #selenium #scraper #headless-browser #web-scraping #ruby #data-extraction #automation

Built With

Ruby · Nokogiri · Selenium

Links & Resources

Website

Included in

Ruby (14.1k)

Related Projects

Mechanize

Mechanize is a Ruby library that makes automated web interaction easy.

Stars: 4,442 · Forks: 478 · Last commit: 2 months ago
Upton

A batteries-included framework for easy web-scraping. Just add CSS! (Or do more.)

Stars: 1,598 · Forks: 109 · Last commit: 7 years ago
Wombat

Lightweight Ruby web crawler/scraper with an elegant DSL which extracts structured data from pages.

Stars: 1,360 · Forks: 128 · Last commit: 25 days ago
MetaInspector

Ruby gem for web scraping. It scrapes a given URL and returns its title, meta description, meta keywords, links, images...

Stars: 1,046 · Forks: 165 · Last commit: 23 days ago