A scikit-learn compatible Python library for probabilistic regression, survival analysis, and probability distributions.
skpro is a Python library for supervised probabilistic prediction, offering scikit-learn-compatible tools for regression, survival analysis, and probability distributions. It addresses uncertainty quantification for tabular data by providing interfaces for interval, quantile, and full distribution predictions, so that a model reports how confident it is in addition to a point estimate.
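To make the three prediction modes concrete, here is a minimal sketch using only the Python standard library. It is not skpro code: the predictive distribution `pred` and the helper `interval_from_dist` are hypothetical names chosen for illustration, and a Normal distribution is assumed purely for simplicity.

```python
from statistics import NormalDist


def interval_from_dist(dist, coverage):
    """Central prediction interval with the given nominal coverage."""
    tail = (1.0 - coverage) / 2.0
    return dist.inv_cdf(tail), dist.inv_cdf(1.0 - tail)


# Distribution prediction: a full predictive distribution for one test row.
# Assumed here to be Normal(mean=10, sd=2) for illustration.
pred = NormalDist(mu=10.0, sigma=2.0)

# Quantile prediction: values the target falls below with probability alpha.
quantiles = {alpha: pred.inv_cdf(alpha) for alpha in (0.05, 0.5, 0.95)}

# Interval prediction: the 90% central interval is bounded by the
# 5% and 95% quantiles of the same distribution.
lower, upper = interval_from_dist(pred, coverage=0.9)
```

The point of the sketch is that quantile and interval predictions can both be read off a distribution prediction, which is why a single interface can serve all three.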
Data scientists and machine learning engineers working on regression, survival analysis, or any predictive modeling task where quantifying uncertainty is critical, especially those already familiar with the scikit-learn ecosystem.
Developers choose skpro for its seamless integration with scikit-learn and sktime, its comprehensive toolkit for probabilistic evaluation, and its ability to turn existing regressors into probabilistic models through reductions, all within a unified and familiar API.
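The idea of a reduction, wrapping a point regressor so that it yields prediction intervals, can be sketched with split conformal prediction in plain Python. This is an illustrative assumption about how such a wrapper can work, not skpro's implementation; `fit_line` and `conformal_radius` are hypothetical helpers, and the data is synthetic.

```python
import math
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; stands in for any point regressor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def conformal_radius(predict, xs_cal, ys_cal, coverage):
    """Interval half-width from absolute residuals on held-out calibration data."""
    scores = sorted(abs(y - predict(x)) for x, y in zip(xs_cal, ys_cal))
    n = len(scores)
    k = math.ceil((n + 1) * coverage)  # finite-sample corrected rank
    # Simplification: clamp to the largest score if k exceeds n.
    return scores[min(k, n) - 1]


# Synthetic data (an assumption for this sketch): y = 2x + 1 plus unit noise.
random.seed(0)
xs = [random.uniform(0, 10) for _ in range(400)]
ys = [2.0 * x + 1.0 + random.gauss(0, 1) for x in xs]

# Fit on one split, calibrate on a second, evaluate on a third.
predict = fit_line(xs[:200], ys[:200])
radius = conformal_radius(predict, xs[200:300], ys[200:300], coverage=0.9)

held_out = list(zip(xs[300:], ys[300:]))
covered = sum(predict(x) - radius <= y <= predict(x) + radius for x, y in held_out)
empirical_coverage = covered / len(held_out)
```

The point regressor is never modified; the wrapper only adds a calibrated radius around its predictions, which is what makes the approach applicable to existing models.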
A unified framework for tabular probabilistic regression, time-to-event prediction, and probability distributions in Python.
Follows scikit-learn's fit/predict interface, so it slots into existing ML workflows with little extra learning for teams that already know scikit-learn.
Combines tabular regression, survival analysis, and symbolic distributions in one library, reducing tool fragmentation for uncertainty quantification.
Provides reductions, such as bootstrap and conformal methods, that convert standard scikit-learn regressors into probabilistic models; the ResidualDouble example shows one such reduction.
Includes specialized metrics such as CRPS and pinball loss for assessing probabilistic forecasts, detailed in the API reference.
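For readers unfamiliar with the two metrics named above, the following sketch shows what they compute for a single observation, using only the standard library. It is a hedged illustration rather than skpro's code: the function names are hypothetical, and the CRPS version assumes a Normal predictive distribution so that the well-known closed form applies.

```python
import math


def pinball_loss(y_true, q_pred, alpha):
    """Pinball (quantile) loss of a predicted alpha-quantile for one observation."""
    diff = y_true - q_pred
    # Under-prediction is weighted by alpha, over-prediction by (1 - alpha).
    return alpha * diff if diff >= 0 else (alpha - 1.0) * diff


def crps_normal(y_true, mu, sigma):
    """Closed-form CRPS of a Normal(mu, sigma) predictive distribution."""
    z = (y_true - mu) / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return sigma * (z * (2.0 * cdf - 1.0) + 2.0 * pdf - 1.0 / math.sqrt(math.pi))


# A 90% quantile that is too low costs more than one that is too high by the same amount.
too_low = pinball_loss(10.0, 8.0, alpha=0.9)
too_high = pinball_loss(10.0, 12.0, alpha=0.9)

# With the mean exactly right, a sharper distribution scores better (lower CRPS).
sharp = crps_normal(10.0, mu=10.0, sigma=1.0)
wide = crps_normal(10.0, mu=10.0, sigma=5.0)
```

Both are losses, so lower is better. Pinball loss evaluates a single quantile, while CRPS evaluates the whole predictive distribution and rewards forecasts that are both well centred and sharp.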
Heavily tied to scikit-learn and sktime, limiting flexibility for projects using alternative ML frameworks or requiring lightweight dependencies.
Key features like survival prediction are labeled as 'maturing' in the documentation, indicating potential instability or incomplete functionality.
Probabilistic predictions, especially full distribution outputs, can take noticeably longer to compute than point estimates, which may matter in performance-sensitive applications.