PDF Processing

14 projects

A Python library for parsing diverse document formats into structured data, optimized for integration with generative AI applications.

#ai #tables #documents

Stars: 63.4k

Forks: 4.5k

Last commit: 2 days ago

gpt4-pdf-chatbot-langchain (TypeScript)

A customizable AI chatbot agent that ingests PDF documents, stores embeddings in a vector database, and answers user queries using LangChain and LangGraph.

#ai #langgraph #document-qa

A comprehensive PDF processing library and CLI written in Go, supporting encryption, validation, and batch operations.

#pdf-utilities #pdf-tools #pdf-lib

Stars: 8.7k

Forks: 619

Last commit: 6 days ago

pdfminer.six (Python)

A Python library for extracting and analyzing text, images, and metadata from PDF documents.

#text-extraction #pdf-tools #open-source

Stars: 7.0k

Forks: 1.0k

Last commit: 4 months ago

edgequake (Rust)

A high-performance GraphRAG framework in Rust that transforms documents into knowledge graphs for retrieval and generation.

#high-performance #lightrag #llm-integration

Stars: 2.0k

Forks: 235

Last commit: 2 days ago

Awesome Document Understanding

A curated list of resources for Document Understanding (DU), covering research, datasets, tools, and applications in Intelligent Document Processing.

#key-information-extraction #document-understanding #document-analysis

Stars: 1.5k

Forks: 178

Last commit: 3 years ago

pdf_oxide (Rust)

A high-performance PDF toolkit for text/image extraction, markdown conversion, and PDF editing, built in Rust with Python, WASM, CLI, and MCP server bindings.

#text-extraction #open-source #pdf-parser

Stars: 892

Forks: 105

Last commit: 1 day ago