A PyTorch-based framework for training and validating models that produce high-quality embeddings for metric learning and retrieval tasks.
Open Metric Learning (OML) is a PyTorch-based framework for training and validating deep learning models that produce high-quality embeddings for metric learning and retrieval tasks. It targets distance-based search scenarios, where standard classification training does not directly optimize how well items are ranked by cosine or L2 distance between their embeddings.
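To make the distance-based search scenario concrete, here is a minimal sketch that ranks a toy gallery against a query by cosine distance and by L2 distance. It is plain Python rather than OML code; the function names (cosine_distance, l2_distance, retrieve) and the toy vectors are assumptions made up for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity: 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def l2_distance(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query, gallery, distance, top_k=2):
    """Return indices of the top_k gallery items closest to the query."""
    order = sorted(range(len(gallery)), key=lambda i: distance(query, gallery[i]))
    return order[:top_k]


# Toy 2-D embeddings (hypothetical values, not produced by any model).
gallery = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [3.0, 0.2]]
query = [1.0, 0.05]

print(retrieve(query, gallery, cosine_distance))  # [3, 0]
print(retrieve(query, gallery, l2_distance))      # [0, 1]
```

The two distances disagree on this data because cosine distance ignores vector length while L2 distance does not, so the choice of distance is part of what a retrieval model is trained and validated against.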
OML is aimed at machine learning engineers and researchers working on retrieval systems, person re-identification, face recognition, product search, or any other application that requires similarity search over embeddings, especially when there are many classes but few samples per class.
Developers choose OML for its practical, pipeline-oriented approach: config-based training, a zoo of pretrained models, and comprehensive tooling for metric learning. This reduces implementation overhead compared to building a custom solution or assembling one from libraries that provide only individual tools.
Metric learning and retrieval pipelines, models and zoo.
OML allows training models by preparing data in a required format and modifying a YAML config file, similar to frameworks like mmdetection, which reduces code boilerplate and standardizes experiments.
Provides easy access to pretrained models for images, texts, and audio in a torchvision-like manner, including ViT, CLIP, and DINO architectures, facilitating quick starts without training from scratch.
Includes specialized losses like TripletLoss and ArcFace, miners like HardTripletsMiner, and retrieval metrics, offering end-to-end support for embedding optimization and validation.
Handles image, text, and audio data with dedicated extractors and datasets, making it versatile for various retrieval applications beyond just vision tasks.
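The interplay between a triplet loss and a hard-triplet miner mentioned above can be sketched in a few lines. This is a plain-Python illustration of the general batch-hard technique (farthest positive and closest negative per anchor), not OML's TripletLoss or HardTripletsMiner implementation; the function names, the margin value, and the toy batch are assumptions.

```python
import math


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mine_hard_triplets(embeddings, labels):
    """For each anchor, pick its farthest positive and its closest negative."""
    triplets = []
    for a, label in enumerate(labels):
        positives = [i for i, l in enumerate(labels) if l == label and i != a]
        negatives = [i for i, l in enumerate(labels) if l != label]
        if not positives or not negatives:
            continue  # no valid triplet for this anchor
        p = max(positives, key=lambda i: l2(embeddings[a], embeddings[i]))
        n = min(negatives, key=lambda i: l2(embeddings[a], embeddings[i]))
        triplets.append((a, p, n))
    return triplets


def triplet_loss(embeddings, triplets, margin=0.2):
    """Mean of max(0, d(anchor, positive) - d(anchor, negative) + margin)."""
    losses = [
        max(0.0, l2(embeddings[a], embeddings[p]) - l2(embeddings[a], embeddings[n]) + margin)
        for a, p, n in triplets
    ]
    return sum(losses) / len(losses)


# Toy batch: two classes with two samples each (hypothetical values).
embeddings = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0], [3.0, 0.0]]
labels = ["a", "a", "b", "b"]

triplets = mine_hard_triplets(embeddings, labels)
print(triplets)
print(triplet_loss(embeddings, triplets))
```

In a real training loop the distances would be computed on tensors so that the loss can be back-propagated; the list-based version here only shows which samples the miner selects and what the loss penalizes.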
OML does not provide its own ONNX export; users must rely on PyTorch's built-in export capabilities, which can complicate deployment in production environments that require optimized inference.
Pipelines require data to be formatted in specific .csv tables, which may necessitate additional preprocessing and limit flexibility for teams with custom or unstructured datasets.
Compared to more established libraries, OML has a smaller community and fewer third-party integrations, which might affect long-term support and troubleshooting for niche use cases.
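Regarding the .csv requirement noted above, a small pre-flight check can catch formatting problems before a pipeline run. The sketch below uses only the Python standard library. The column names (label, path, split, is_query, is_gallery) are an assumption about the expected table layout and should be verified against the OML documentation; the helper name count_rows_per_split and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter

# ASSUMPTION: column names believed to match OML's dataset format; verify in the docs.
REQUIRED_COLUMNS = {"label", "path", "split", "is_query", "is_gallery"}


def count_rows_per_split(csv_text):
    """Check that all required columns exist, then count rows in each split."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return dict(Counter(row["split"] for row in reader))


# Hypothetical table with two training rows and two validation rows.
SAMPLE = """label,path,split,is_query,is_gallery
0,images/a1.jpg,train,,
0,images/a2.jpg,train,,
1,images/b1.jpg,validation,True,True
1,images/b2.jpg,validation,True,True
"""

print(count_rows_per_split(SAMPLE))  # {'train': 2, 'validation': 2}
```

Teams with custom or unstructured datasets could run a check like this at the end of their own conversion script, so that format errors surface before training starts.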