Document Analysis

10 projects

Vane (TypeScript)

A privacy-focused AI answering engine that runs on your own hardware, combining web search with local and cloud LLMs.

#perplexica #document-analysis #open-source-ai-search-engine

Stars: 35.7k

Forks: 3.9k

Last commit: 3 months ago

DocsGPT (Python)

Open-source AI platform for building private agents, assistants, and enterprise search with document analysis and multi-model support.

#ai #information-retrieval #multi-model-support

Stars: 18.0k

Forks: 2.1k

Last commit: 5 days ago

pdfminer.six (Python)

A Python library for extracting and analyzing text, images, and metadata from PDF documents.

#text-extraction #pdf-tools #open-source

Stars: 7.0k

Forks: 1.0k

Last commit: 4 months ago

FOCA (C#)

A Windows tool for extracting metadata and hidden information from documents found on web pages and local files.

#document-analysis #information-gathering #metadata-extraction

Stars: 3.6k

Forks: 624

Last commit: 3 years ago

Leptonica (C)

A C library for efficient image processing and analysis, widely used in OCR and computer vision applications.

#c-library #image-analysis #document-analysis

Stars: 2.1k

Forks: 433

Last commit: 9 days ago

Awesome Document Understanding

A curated list of resources for Document Understanding (DU), covering research, datasets, tools, and applications in Intelligent Document Processing.

#key-information-extraction #document-understanding #document-analysis

Stars: 1.5k

Forks: 178

Last commit: 3 years ago

Topic Models Resources (R)

A curated collection of learning resources, R packages, and practical examples for understanding and applying topic modeling techniques.

#document-analysis #text-analysis #data-science

Stars: 232

Forks: 54

Last commit

aws-pdf-textract-pipeline (TypeScript)

Serverless data pipeline for crawling PDFs from the web and extracting structured data using AWS Textract.

#lambda #web-crawling #aws-textract

Stars: 165

Forks: 20

Last commit: 2 years ago

Deep Belief Nets for Topic Modeling (Python)

A Python toolbox using deep belief networks for topic modeling on document data, producing latent representations for content-based recommendation.

#deep-belief-networks #research-tool #document-analysis

A V programming language wrapper for Tesseract-OCR, enabling text extraction and OCR operations from images.

#text-extraction #document-analysis #wrapper-library

Stars: 17

Forks: 3

Last commit: 4 years ago
