Multimodal AI

13 projects


JAX, Flax & Transformers · Python

A model-definition framework for state-of-the-art machine learning models across text, vision, audio, and multimodal tasks.

#transformer #hacktoberfest #model-training

Stars 159.8k

Forks 33.0k

Last commit 1 day ago

Connects ChatGPT with visual foundation models to enable sending and receiving images during chat interactions.

#task-automation #image-editing #chatgpt-integration

Stars 34.2k

Forks 3.2k

Last commit 2 years ago

Haystack · MDX

Open-source AI orchestration framework for building context-engineered, production-ready LLM applications in Python.

#semantic-search #ai #information-retrieval

Stars 25.0k

Forks 2.7k

Last commit 2 days ago

Gemini CLI cookbook · Jupyter Notebook

A collection of hands-on tutorials and practical examples for using Google's Gemini API across text, image, video, audio, and robotics applications.

#google-ai #colab-notebooks #gemini

Stars 17.1k

Forks 2.6k

Last commit

LAVIS · Jupyter Notebook

A Python library for language-vision intelligence research, providing unified access to state-of-the-art models, datasets, and tasks.

#vision-language-pretraining #multimodal-datasets #salesforce

Stars 11.2k

Forks 1.1k

Last commit 1 year ago

AutoGluon · Python

An automated machine learning library that trains and deploys high-accuracy models for tabular, text, image, and time series data with minimal code.

#ensemble-learning #python-library #data-science

Stars 10.3k

Forks 1.1k

Last commit 3 days ago

lancedb · HTML

An open-source embedded retrieval library for multimodal AI, offering fast vector search, SQL, and full-text search.

#semantic-search #open-source #approximate-nearest-neighbor-search

Stars 10.0k

Forks 857

Last commit 3 days ago

mistral.rs · Rust

A fast, flexible, and hardware-aware LLM inference engine with zero-config support for any Hugging Face model.

#agentic-ai #quantization #llm

Stars 7.0k

Forks 581

Last commit 9 days ago

iFlow CLI · Shell

A terminal-based AI assistant that analyzes code, automates workflows, and executes tasks using natural language commands.

#ai-assistant #workflow-automation #command-line-tool

Stars 5.1k

Forks 411

Last commit 1 month ago

Scenic · Python

A JAX library for rapid prototyping of large-scale attention-based vision models across images, video, audio, and multimodal data.

#attention #model-training #jax

Stars 3.8k

Forks 478

Last commit 3 days ago

Ask-Anything · Python

An open-source framework for building multimodal AI systems that enable large language models to understand and chat about videos and images.

#chat #big-model #gradio

Stars 3.3k

Forks 270

Last commit 1 year ago

Witsy

A desktop AI assistant and universal MCP client that works with any LLM provider, offering chat, image/video generation, and system-wide productivity tools.

#desktop-application #electronjs #ollama-gui

Stars 0

Forks 0

Last commit 10 days ago


Community-curated · Updated weekly · 100% open source

Found a gem we're missing?

Open-Awesome is built by the community, for the community. Submit a project, suggest an awesome list, or help improve the catalog on GitHub.
