An open-source AI orchestration framework for building context-engineered, production-ready LLM applications in Python.
A Python framework for computing and training state-of-the-art text embeddings, rerankers, and sparse encoders.
An open-source AI platform for building private agents, assistants, and enterprise search with document analysis and multi-model support.
A Python library for topic modeling, document indexing, and similarity retrieval with large corpora.
A Python library for topic modeling, document indexing, and similarity retrieval with large text corpora.
An open-source, cloud-native vector database that combines semantic search with structured filtering for AI applications.
A full-text search engine library written in Rust, inspired by Apache Lucene.
A modern indexing and search library for Go supporting text, numeric, geo-spatial, and vector data.
A JavaScript library for parsing text to extract dates, times, phone numbers, emails, places, and other structured information.
A PyTorch system for open-domain question answering by retrieving and reading documents, originally applied to Wikipedia.
A high-performance string library leveraging SIMD and SWAR to accelerate search, hashing, sorting, and edit distances across C, C++, Python, Rust, and more.
A TensorFlow library for Learning-to-Rank (LTR) techniques, providing loss functions, metrics, and models for ranking tasks.
A portable mixed-precision math library with 2,000+ SIMD kernels for 15+ numeric types across x86, Arm, RISC-V, and WebAssembly.
An open-source AI troubleshooting atlas and avatar runtime for diagnosing and fixing RAG, agent, and real-world AI workflow failures.
A curated list of awesome information retrieval resources including books, courses, datasets, software, and conferences.
A curated list of awesome resources for information retrieval and web search, including books, courses, datasets, and software.
A comparative Python framework for building, evaluating, and deploying multimodal recommender systems with auxiliary data.
A PyTorch framework for training neural learning-to-rank models with flexible loss functions and scoring architectures.
A Ruby gem for calculating text similarity using tf*idf and BM25 vector space models.
A curated list of resources for Question Answering (QA), covering machine learning, deep learning, datasets, and research.
A modern C++ toolkit for text retrieval and analysis, featuring indexing, ranking, topic modeling, classification, and language models.
A vector space search engine, vector database, and key/value store for efficient string processing and vector operations.
A high-performance Java library for compressing arrays of integers, optimized for databases and information retrieval.
An Elixir library for structured data extraction from websites, articles, and RSS/Atom feeds using information-retrieval techniques.
An enterprise-grade Graph RAG framework combining hierarchical tree navigation with knowledge graph reasoning for verifiable, on-premise AI.
An extensible information retrieval library for Ruby, similar to Apache Lucene.
A curated list of free tools, datasets, models, and resources for Hungarian Natural Language Processing.
An English (Porter2) stemming implementation in Elixir for reducing words to their base forms.
A Go implementation of the Rapid Automatic Keyword Extraction (RAKE) algorithm for extracting keywords from text.
A Node.js implementation of Martin Porter's stemming algorithm for removing morphological endings from English words.
A Julia package providing high-performance, configurable tokenizers and sentence splitters for natural language processing.
An efficient and ergonomic document search engine library built on top of perlin-core.
Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.