A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks.
Connects ChatGPT with visual foundation models to enable sending and receiving images during chat interactions.
An open-source AI orchestration framework for building context-engineered, production-ready LLM applications in Python.
A collection of hands-on tutorials and practical examples for using Google's Gemini API across text, image, video, audio, and robotics applications.
A Python library for language-vision intelligence research, providing unified access to state-of-the-art models, datasets, and tasks.
An open-source embedded retrieval library for multimodal AI, offering fast vector search, SQL, and full-text search.
An automated machine learning library that trains and deploys high-accuracy models for tabular, text, image, and time series data with minimal code.
A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.
A terminal-based AI assistant that analyzes code, automates workflows, and executes tasks using natural language commands.
A JAX library for rapid prototyping of large-scale attention-based vision models across images, video, audio, and multimodal data.
An open-source framework for building multimodal AI systems that enable large language models to understand and chat about videos and images.
A curated repository of famous Vision-Language Models (VLMs) detailing their architectures, training procedures, and datasets.
A curated list of recent research papers and resources on Vision and Language Pre-trained Models (VL-PTMs).
A comparative Python framework for building, evaluating, and deploying multimodal recommender systems with auxiliary data.
A curated list of deep learning resources for video-text retrieval, including papers, implementations, and datasets.
A .NET provider-agnostic SDK for building, orchestrating, and deploying AI agents and workflows with 30+ built-in API connectors.
A FastAPI proxy that transforms Google's Gemini CLI into OpenAI-compatible and native Gemini API endpoints for easy integration.
A video-language understanding framework that treats video narration as vocabulary and videos as long documents for efficient analysis.
A vision-language foundation model for computational pathology, pretrained on 1.17M histopathology image-caption pairs for diverse AI tasks.
A Torch implementation of a VIS+LSTM model for answering questions about images using deep learning.
A Python toolkit for visual analysis and evaluation of text generation tasks like translation, summarization, and captioning.
A joint audio tagging and speech recognition model that adds audio event detection to OpenAI Whisper with minimal computational overhead.
A Java HTTP client library for interacting with the OpenAI API and compatible providers in a simple, consistent manner.
A JAX-based framework for streamlined training, fine-tuning, and high-performance serving of large language and multimodal models.
An MCP server wrapper for Google's Gemini CLI that enables AI assistants to access Gemini's search, chat, and file analysis capabilities.
A shell wrapper for interacting with multiple AI service providers including OpenAI, LocalAI, Ollama, Gemini, and Anthropic via chat, text, and speech endpoints.
A desktop AI assistant and universal MCP client that works with any LLM provider, offering chat, image/video generation, and system-wide productivity tools.