A Python library for parsing diverse document formats into structured data, optimized for integration with generative AI applications.
A customizable AI chatbot agent that ingests PDF documents, stores embeddings in a vector database, and answers user queries using LangChain and LangGraph.
A comprehensive PDF processing library and CLI written in Go, supporting encryption, validation, and batch operations.
A Python library for extracting and analyzing text, images, and metadata from PDF documents.
A high-performance GraphRAG framework in Rust that transforms documents into knowledge graphs for superior retrieval and generation.
A curated list of resources for Document Understanding (DU), covering research, datasets, tools, and applications in Intelligent Document Processing.
A high-performance PDF toolkit for text/image extraction, markdown conversion, and PDF editing, built in Rust with Python, WASM, CLI, and MCP server bindings.
A Python tool that uses GPT-3.5 to read, summarize, and answer questions about academic PDF papers locally.
A task-oriented Java SDK for PDF manipulation with ready-to-use operations and extensible architecture.
A Chrome extension that enhances ChatGPT with PDF support, markdown conversion, prompt hints, and cross-page selection.
A Ruby gem for extracting pages from PDFs as images and text strings using Ghostscript, ImageMagick, and pdftotext.
An OCaml library for reading, writing, and modifying PDF files, serving as the foundation for the CPDF toolchain.
A serverless data pipeline for crawling PDFs from the web and extracting structured data using AWS Textract.