A Ruby wrapper around the pHash library for detecting duplicate and near-duplicate images using perceptual hashing.
Phashion is a Ruby wrapper around the pHash library for detecting duplicate and near-duplicate images. It computes a 64-bit perceptual hash from each image's frequency data and compares hashes by Hamming distance (the number of bits that differ), so it can identify files that show the same content despite differences in format, size, or compression.
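To make the Hamming distance comparison concrete, here is a minimal pure-Ruby sketch. It does not call Phashion or pHash; the `hamming_distance` helper and the two sample hash values are hypothetical and stand in for the 64-bit hashes the library would produce.

```ruby
# Count the bits that differ between two 64-bit hash values.
# XOR leaves a 1 in every position where the inputs disagree.
def hamming_distance(hash_a, hash_b)
  (hash_a ^ hash_b).to_s(2).count("1")
end

# Made-up sample hashes, not output from pHash.
a = 0xF0F0_F0F0_F0F0_F0F0
b = 0xF0F0_F0F0_F0F0_F0FF # differs from `a` only in the lowest 4 bits

puts hamming_distance(a, b) # prints 4
```

A distance of 0 means the hashes are identical; larger distances mean the images are less alike.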
Phashion suits Ruby developers who work with image libraries, media management systems, or applications that need automated duplicate image detection, such as content management systems or digital asset platforms.
Developers choose Phashion for its straightforward Ruby API that abstracts the complexities of the pHash library, offering reliable duplicate detection with configurable thresholds and support for near-duplicates like cropped or rotated images.
Ruby wrapper around pHash, the perceptual hash library for detecting duplicate multimedia files
Provides an intuitive interface with methods like `Image.new` and `duplicate?`, abstracting the complexities of the underlying pHash C library for easy integration into Ruby apps.
Allows setting custom Hamming distance thresholds in the `duplicate?` method, enabling fine-tuned control over sensitivity to balance false positives and misses.
Effectively identifies images with variations like cropping, rotation, or format changes, as demonstrated in the README's Hamming distance table for transformations such as thumbnails or color correction.
Uses pHash's 64-bit frequency-based hashes, which are resilient to compression artifacts, dimension changes, and color adjustments, so visually similar images yield small Hamming distances.
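The effect of a configurable threshold can be sketched without the gem installed. The `Fingerprint` class below is a hypothetical stand-in, not Phashion's own class, and the default of 15 is an illustrative assumption; check the gem's README for its actual default and method signatures.

```ruby
# Hypothetical stand-in for an image fingerprint; NOT the gem's own class.
class Fingerprint
  DEFAULT_THRESHOLD = 15 # assumed starting value, tune against your own images

  attr_reader :bits

  def initialize(bits)
    @bits = bits
  end

  # Number of differing bits between the two fingerprints.
  def distance_from(other)
    (bits ^ other.bits).to_s(2).count("1")
  end

  # Treat two fingerprints as duplicates when their distance is within the threshold.
  def duplicate?(other, threshold: DEFAULT_THRESHOLD)
    distance_from(other) <= threshold
  end
end

# Made-up sample values, not output from pHash.
original = Fingerprint.new(0xFFFF_0000_FFFF_0000)
variant  = Fingerprint.new(0xFFFF_0000_FFFF_00FF) # 8 bits differ

puts original.duplicate?(variant)               # prints true (8 <= 15)
puts original.duplicate?(variant, threshold: 5) # prints false (8 > 5)
```

Lowering the threshold reduces false positives at the cost of missing more heavily altered copies; raising it does the opposite.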
Requires several system dependencies, such as libjpeg-dev and imagemagick, and the gem build can fail on untested platforms; the Compatibility section notes specific OS version limitations.
Only wraps image functionality from pHash, excluding audio and video comparison, which restricts its use for full multimedia deduplication despite the library's broader scope.
Relies on a custom fork of pHash 0.9.6, dating from the 2010s, to handle alpha PNGs; the fork may lack updates and fixes from the upstream project, although the project mentions plans to migrate.