A collection of utilities that help customize Windows and streamline everyday tasks.
A Python utility for converting PDFs, Office documents, images, audio, and more into structured Markdown for LLM consumption.
A command-line tool that adds an OCR text layer to scanned PDF files, making them searchable and copy-pasteable.
A pure-Python PDF library for splitting, merging, cropping, transforming, and extracting data from PDF files.
A line-oriented search tool that extends ripgrep to search inside PDFs, Office documents, archives, and many other file types.
A polyglot document intelligence framework with a Rust core for extracting text, metadata, and structured data from 91+ file formats.
A Python library for extracting and analyzing text, images, and metadata from PDF documents.
A Python library and CLI tool for web crawling, scraping, and extracting main text, metadata, and comments from web pages.
A Python library and CLI tool for automatic text summarization using extractive methods such as LexRank, LSA, Luhn, and Edmundson.
A pure JavaScript OCR engine compiled from Ocrad via Emscripten for client-side text recognition in the browser.
A curated list of awesome open-source OCR software, libraries, datasets, and literature.
A Go package for Optical Character Recognition (OCR) using the Tesseract C++ library.
A macOS menu bar app that uses OCR to copy any text visible on your screen directly to your clipboard.
A Java JNA wrapper for the Tesseract OCR API, enabling OCR functionality in Java applications.