A Ruby gem for extracting pages from PDFs as images and text strings using Ghostscript, ImageMagick, and pdftotext.
Grim is a Ruby gem that extracts pages from PDF files, converting them to images and extracting text content. It serves as a wrapper for Ghostscript, ImageMagick, and pdftotext, providing a simple API for developers to programmatically access PDF data without dealing with complex command-line tools directly.
Grim is aimed at Ruby developers who need to convert PDF pages to images or extract text from PDFs within their applications, such as those building document processing pipelines or content management systems.
Developers choose Grim for its clean Ruby API that abstracts the complexity of PDF processing tools, offering flexibility with configurable processors and image output options while maintaining ease of use.
Tool for extracting pages from a PDF as images and text as strings.
Provides a straightforward interface like `Grim.reap("/path/to/pdf")` and `pdf[3].save('image.png')`, abstracting the complexity of Ghostscript and ImageMagick command-line tools.
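A minimal sketch of that interface, assuming the grim gem and its external tools are installed. `export_pages` is a hypothetical helper written for this illustration, not part of Grim; it only relies on the PDF object being enumerable and on each page responding to `save` and `text`.

```ruby
begin
  require "grim"
rescue LoadError
  # grim is not installed; the helper below still works with any enumerable
  # whose items respond to #save and #text.
end

# Hypothetical helper (not part of Grim): saves every page as a PNG and
# returns a hash mapping each output path to that page's extracted text.
def export_pages(pdf, out_dir)
  results = {}
  pdf.each_with_index do |page, index|
    path = File.join(out_dir, format("page-%03d.png", index + 1))
    page.save(path)
    results[path] = page.text
  end
  results
end

# Usage with a real PDF (the paths are placeholders):
#   pdf = Grim.reap("/path/to/pdf")
#   pdf[3].save("image.png")
#   export_pages(pdf, "/path/to/output")
```

Keeping the loop in a small helper means it can be exercised with stand-in page objects, without invoking Ghostscript or ImageMagick.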
Supports multiple ImageMagick and Ghostscript versions through `Grim::MultiProcessor`, which provides fallback options across environments; the README demonstrates it with explicitly configured paths to each tool.
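The fallback idea can be sketched as follows. `FallbackChain` is a hypothetical class that illustrates the pattern and is not Grim's source code. The guarded configuration at the end assumes that `Grim::ImageMagickProcessor` accepts `:imagemagick_path` and `:ghostscript_path` options, as in the README's example; all paths are placeholders.

```ruby
begin
  require "grim"
rescue LoadError
  # grim is not installed; only the illustrative class below is defined.
end

# Illustrative fallback chain (hypothetical, not Grim's implementation).
# Each processor must respond to #save and return a truthy value on success.
class FallbackChain
  def initialize(processors)
    @processors = processors
  end

  # Tries each processor in order and stops at the first one that succeeds.
  def save(*args)
    @processors.each do |processor|
      return true if processor.save(*args)
    end
    raise "no processor could render the page"
  end
end

if defined?(Grim::MultiProcessor)
  # Placeholder paths for two separately installed tool versions.
  Grim.processor = Grim::MultiProcessor.new([
    Grim::ImageMagickProcessor.new({ imagemagick_path: "/path/to/6.7/convert", ghostscript_path: "/path/to/9.04/gs" }),
    Grim::ImageMagickProcessor.new({ imagemagick_path: "/path/to/6.6/convert", ghostscript_path: "/path/to/9.02/gs" })
  ])
end
```

Ordering the processors from most to least preferred keeps the common case fast, since later entries are only tried after an earlier one fails.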
Offers adjustable width, density, quality, colorspace, and alpha settings in the save method, with defaults like width 1024 and density 300, enabling fine-tuned image generation.
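As a sketch of how these options might be organized in an application, the presets below bundle common combinations. The preset names and values, and the `save_with_preset` helper, are hypothetical; only the option keys (`:width`, `:density`, `:quality`, `:colorspace`) come from the settings listed above.

```ruby
# Hypothetical presets; the values are chosen for illustration only.
THUMBNAIL = { width: 200, density: 72, quality: 60 }.freeze
PRINT     = { width: 2048, density: 300, colorspace: "CMYK" }.freeze

# Hypothetical helper: saves a page with a preset, letting the caller
# override individual options. `page` is any object whose #save accepts
# a path and an options hash, as a Grim page does.
def save_with_preset(page, path, preset, overrides = {})
  page.save(path, preset.merge(overrides))
end

# Usage with a real PDF (the paths are placeholders):
#   pdf = Grim.reap("/path/to/pdf")
#   save_with_preset(pdf[0], "/path/to/thumb.png", THUMBNAIL)
#   save_with_preset(pdf[0], "/path/to/wide.png", THUMBNAIL, width: 400)
```

Options left out of a preset fall back to Grim's defaults, such as width 1024 and density 300.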
Includes logging for debugging command execution; the README shows how to set a custom logger such as `Logger.new($stdout)` to trace processor activity.
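A minimal sketch of such a logger setup, using Ruby's standard `Logger`. The `build_grim_logger` helper is hypothetical, and the final assignment assumes Grim exposes a `Grim.logger=` setter, so it is guarded and only runs when the gem is loaded.

```ruby
require "logger"

begin
  require "grim"
rescue LoadError
  # grim is not installed; the logger can still be built and inspected.
end

# Hypothetical helper: builds a logger whose entries are tagged "Grim",
# so they stand out when mixed with other application output.
def build_grim_logger(io = $stdout)
  Logger.new(io).tap { |logger| logger.progname = "Grim" }
end

# Assumption: Grim exposes a logger= setter.
Grim.logger = build_grim_logger if defined?(Grim) && Grim.respond_to?(:logger=)
```

Passing a file or `StringIO` instead of `$stdout` keeps the trace out of the console, which can help when debugging background jobs.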
Requires separate installation of Ghostscript, ImageMagick, and xpdf (which supplies pdftotext), adding setup complexity and potential environment-specific issues, as noted in the README's prerequisites section.
Focuses solely on extraction to images and text, lacking support for PDF editing, form handling, or other common operations, which restricts its use to basic tasks.
Relies on spawning external processes for each operation, which can be slow and memory-intensive for large PDFs or in high-concurrency scenarios, with no built-in optimizations.
The README references blog posts from 2011 and logging examples from 2016, suggesting limited recent updates or maintenance, which could lead to compatibility issues with newer tool versions.