An open-source Python framework to evaluate, test, and monitor ML and LLM systems with 100+ built-in metrics.
Evidently is an open-source Python framework for ML and LLM observability. It helps data scientists and ML engineers evaluate, test, and monitor the performance and reliability of AI systems and data pipelines using a comprehensive suite of over 100 built-in metrics. It addresses the challenge of maintaining model quality and detecting issues like data drift in production.
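To make "data drift" concrete, here is a minimal sketch of one common drift statistic, the Population Stability Index (PSI), in plain Python. It is an illustration of the idea only and does not use or reproduce Evidently's API; the function name `psi`, the bin count, and the `1e-6` floor are assumptions chosen for this example.

```python
import math
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a current sample."""
    lo, hi = min(reference), max(reference)
    # Bin edges come from the reference sample; fall back to width 1.0
    # when all reference values are identical.
    width = (hi - lo) / bins or 1.0

    def shares(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range values into the first or last bin.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # A small floor keeps log() defined when a bin is empty.
        return [max(c / len(values), 1e-6) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical data: a training-time sample and a shifted production sample.
reference_sample = [i / 100 for i in range(100)]
shifted_sample = [0.5 + i / 100 for i in range(100)]
drift_score = psi(reference_sample, shifted_sample)
```

Identical samples give a score of zero, and the score grows as the two distributions diverge. Any cut-off for "significant" drift is a rule of thumb that should be tuned per feature.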
Evidently is aimed at machine learning engineers, data scientists, and MLOps practitioners who need to validate, monitor, and debug ML models and LLM applications in development and production environments.
Developers choose Evidently for its extensive built-in metrics, flexibility to handle both tabular data and LLM evals, and its open architecture that supports everything from ad-hoc reports to continuous monitoring without vendor lock-in.
Evidently is an open-source ML and LLM observability framework. Evaluate, test, and monitor any AI-powered system or data pipeline. From tabular data to Gen AI. 100+ metrics.
Includes over 100 built-in metrics covering data quality, data drift, classification, regression, and LLM evals; the project's README lists them in a feature table.
Generates interactive Reports for analysis and Test Suites with pass/fail conditions. Results can be exported as JSON, HTML, or Python dictionaries, which makes them easy to integrate into CI/CD pipelines, as the README examples show.
Offers a dashboard for visualizing metrics over time. It can be deployed locally, and a demo is available, but setup is manual, using commands like `evidently ui --demo-projects all`.
Works with both tabular and text data, supporting predictive ML and generative LLM tasks including RAG pipelines, as the README's key features and getting started sections emphasize.
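The pass/fail pattern that makes test results useful in a CI/CD pipeline can be sketched in plain Python. This is a generic illustration, not Evidently's API: `CheckResult`, `run_checks`, `to_json`, `exit_code`, the metric names, and the thresholds are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable, Dict, List


@dataclass
class CheckResult:
    name: str
    value: float
    passed: bool


def run_checks(metrics: Dict[str, float],
               conditions: Dict[str, Callable[[float], bool]]) -> List[CheckResult]:
    """Apply one pass/fail condition to each named metric."""
    return [CheckResult(name, metrics[name], bool(cond(metrics[name])))
            for name, cond in conditions.items()]


def to_json(results: List[CheckResult]) -> str:
    """Serialize results so a CI job can archive or parse them."""
    payload = {
        "all_passed": all(r.passed for r in results),
        "checks": [asdict(r) for r in results],
    }
    return json.dumps(payload, indent=2)


def exit_code(results: List[CheckResult]) -> int:
    """Return 0 when every check passed and 1 otherwise (the usual CI convention)."""
    return 0 if all(r.passed for r in results) else 1


# Hypothetical metric values, e.g. produced by an earlier evaluation step.
metrics = {"share_missing_values": 0.02, "accuracy": 0.91}
conditions = {
    "share_missing_values": lambda v: v <= 0.05,
    "accuracy": lambda v: v >= 0.90,
}
results = run_checks(metrics, conditions)
report_json = to_json(results)
```

In a pipeline, a script would typically end with `sys.exit(exit_code(results))` so that a failed condition fails the build, while the JSON output is stored as a build artifact.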
Deploying the Monitoring UI locally requires manual environment management with uv or virtualenv (see the README's installation steps), which can be cumbersome compared to turnkey SaaS solutions.
Advanced features like built-in alerting, no-code evals, and user management are only available in Evidently Cloud; the README notes this and links to a comparison of OSS vs Cloud.
As a Python library, it may not integrate seamlessly with systems built in other languages without additional development effort, which limits its use in heterogeneous tech stacks.