A collection of example skills for Claude that demonstrate how to create reusable instruction sets for specialized AI tasks.
An opinionated RAG framework for integrating generative AI into applications, supporting any LLM, vector store, and file type.
A command-line tool that adds an OCR text layer to scanned PDF files, making them searchable and copy-pasteable.
A pure-Python PDF library for splitting, merging, cropping, transforming, and extracting data from PDF files.
A pure-C HTML5 parsing library implementing the HTML5 parsing algorithm.
A Slack bot that reads and summarizes webpages, documents, and videos using ChatGPT, with voice chat capabilities.
A Python library and CLI tool for automatic text summarization using extractive methods like LexRank, LSA, Luhn, and Edmundson.
Give ChatGPT long-term memory by uploading custom knowledge base files (PDF, txt, epub) and asking questions via a React frontend.
A Go package for Optical Character Recognition (OCR) using the Tesseract C++ library.
A curated list of awesome open-source OCR software, libraries, datasets, and literature.
A Python library for reading, writing, repairing, and transforming PDFs, powered by the qpdf C++ library.
A Rust library for creating, merging, modifying, and decrypting PDF documents with support for modern object streams.
A high-performance .NET library for creating, manipulating, inspecting, and maintaining PDF documents.
A Java JNA wrapper for the Tesseract OCR API, enabling OCR functionality in Java applications.
A pure Ruby library for creating, manipulating, merging, securing, and optimizing PDF files with a Ruby-esque API.
A pure PHP library for reading and writing presentation files in PowerPoint (PPTX) and OpenDocument (ODP) formats.
A Rust library for creating, reading, writing, and rendering PDF documents with support for graphics, fonts, and experimental HTML layout.
A high-performance C++ library for creating, parsing, and manipulating PDF files and streams.
A high-performance PDF toolkit for text/image extraction, markdown conversion, and PDF editing, built in Rust with Python, WASM, CLI, and MCP server bindings.
A pure Ruby library for merging PDF files, adding page numbers, watermarks, and stamps.
A Ruby wrapper library that provides Ruby bindings and a Ruby-esque interface to the Tesseract OCR API.
A task-oriented Java SDK for PDF manipulation with ready-to-use operations and extensible architecture.
A .NET library for reading, modifying, and generating PowerPoint (PPTX) presentations without requiring Microsoft Office.
A Julia package providing standard tools and models for text analysis and natural language processing.
Extract and index knowledge from websites, PDFs, docs, and YouTube to power Q&A sessions using GPT and other language models.
A generic EPUB parser and generator library for Ruby that supports EPUB 2 and EPUB 3 specifications.
A PowerShell module for creating, editing, splitting, merging, and converting PDF files across Windows, Linux, and macOS.
A Ruby API for document creation and conversion using Ghostscript, supporting PDF, PS, GIF, TIF, PNG, and JPG formats.
A .NET library for reading and writing Office formats (Excel, Word) without requiring a Microsoft Office installation.
A Docker image providing a full TeX Live distribution with additional tools like Pandoc, Inkscape, and GraphViz for LaTeX workflows.
A friendly macOS desktop app to combine multiple PDF files into a single PDF with a simple drag-and-drop interface.
An Elixir library that converts PDF documents to HTML while preserving text and formatting.
A Go implementation of the Open Packaging Conventions (OPC) for reading and writing formats like .docx and .xlsx.
An AI-powered OSINT platform that extracts entities, visualizes relationships, and uses multi-agent reasoning to analyze documents for intelligence purposes.