A line-oriented search tool that extends ripgrep to search inside PDFs, Office documents, archives, and many other file types.
rga (ripgrep-all) is a Rust-based command-line search tool that extends ripgrep to search for regex patterns inside binary files, archives, and documents. It solves the problem of searching through heterogeneous file collections—like PDFs, Office documents, SQLite databases, and compressed archives—without manually extracting them first.
It is aimed at developers, system administrators, and data analysts who need to search through mixed-format file repositories, logs, or document stores from the command line.
It combines ripgrep's search speed with broad format support, recursive archive handling, and optional caching, offering a unified search experience across dozens of file types. Text extraction adds overhead on the first run, which the cache offsets on repeated searches.
rga: ripgrep, but also search in PDFs, E-Books, Office documents, zip, tar.gz, etc.
Uses adapters for PDFs, Office documents, SQLite, archives, and more, enabling regex searches across dozens of file types without manual extraction, as listed in the README's adapter section.
Recursively descends into nested archives like ZIP within TAR.GZ up to a configurable depth, processing them as streams to handle large files efficiently, demonstrated in the example directory.
Caches extracted text in a local database by default, speeding up repeated searches on the same files, with configurable compression and size limits as noted in the options.
Supports user-defined adapters to extend search capabilities to new formats, allowing flexibility beyond built-in options, as mentioned in the wiki and adapter list.
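The adapter selection, archive recursion depth, and caching behaviour described above are controlled through command-line flags. The following is a minimal sketch of assembling such an invocation from Python. The helper `build_rga_command` is hypothetical and not part of rga. The flag names (`--rga-max-archive-recursion`, `--rga-no-cache`, `--rga-adapters`) and the adapter names in the usage note are assumptions based on the rga README and should be checked against `rga --help` for the installed version.

```python
from typing import List, Optional, Sequence


def build_rga_command(
    pattern: str,
    paths: Sequence[str],
    max_archive_recursion: Optional[int] = None,
    use_cache: bool = True,
    adapters: Optional[str] = None,
) -> List[str]:
    """Build an argv list for rga (hypothetical helper; it runs nothing).

    The flag spellings below are assumptions; verify them with `rga --help`.
    """
    cmd = ["rga"]
    if max_archive_recursion is not None:
        # Limit how deep rga descends into nested archives.
        cmd.append(f"--rga-max-archive-recursion={max_archive_recursion}")
    if not use_cache:
        # Skip the on-disk cache of extracted text.
        cmd.append("--rga-no-cache")
    if adapters:
        # Comma-separated adapter list, passed through unchanged.
        cmd.append(f"--rga-adapters={adapters}")
    # The search pattern and the paths to search come last.
    cmd.append(pattern)
    cmd.extend(paths)
    return cmd
```

The resulting list can be passed to `subprocess.run`, for example `subprocess.run(build_rga_command("invoice", ["./docs"], max_archive_recursion=2, adapters="poppler,pandoc"))`. Building a list instead of a shell string avoids quoting problems when the pattern contains spaces or shell metacharacters.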
Requires external tools like poppler, pandoc, and ffmpeg for full functionality; installation is cumbersome without package managers, and missing dependencies can cause failures, as warned in the installation notes.
Text extraction from binary formats adds significant delay compared to ripgrep on plain text, making searches slower for initial runs or without caching, a trade-off admitted in the caching rationale.
The cache can grow large and may need manual cleanup in its default location (e.g., ~/.cache/ripgrep-all). Disabling it avoids the disk usage but makes repeated searches slower, so either choice adds some operational overhead.
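Two of the drawbacks above, missing external tools and an unbounded cache directory, can be checked before running a large search. The sketch below is one possible preflight check, not something rga provides. The function names are hypothetical. The executable names are assumptions: `pdftotext` is taken to be the poppler binary in use, alongside `pandoc` and `ffmpeg`. The cache path in the usage note is the example location mentioned above.

```python
import os
import shutil
from typing import Callable, List, Optional, Sequence

# Executables assumed to back the optional adapters mentioned above.
# pdftotext is distributed with poppler; adjust the list for your setup.
DEFAULT_TOOLS = ("pdftotext", "pandoc", "ffmpeg")


def missing_tools(
    tools: Sequence[str] = DEFAULT_TOOLS,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the tools from `tools` that are not found on PATH."""
    return [tool for tool in tools if which(tool) is None]


def dir_size_bytes(path: str) -> int:
    """Sum the sizes of regular files under `path`; 0 if it does not exist."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            # Skip symlinks so linked files are not counted.
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total
```

A typical call would be `missing_tools()` followed by `dir_size_bytes(os.path.expanduser("~/.cache/ripgrep-all"))`. An empty list from the first call means every assumed tool was found on `PATH`, and the second value gives a rough figure for deciding whether the cache needs cleaning.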