A scikit-learn compatible classifier that produces human-interpretable decision rules instead of black box models.
sklearn-expertsys is a scikit-learn compatible classifier that implements Bayesian Rule Lists, with extensions for discretizing continuous data and for optimizing on large datasets. It addresses the opacity of black-box machine learning models by producing transparent, human-readable decision rules that explain each prediction.
It is aimed at data scientists and machine learning practitioners who need interpretable models in domains that require transparency, such as healthcare, finance, or regulatory compliance, where understanding model decisions is critical.
Developers choose this library because it offers competitive accuracy while keeping the model fully interpretable: the decision rules it produces can be validated and trusted by domain experts.
Produces decision lists, such as the one in the project's Titanic example, that make model logic transparent and easy to explain to non-experts, directly addressing the need for interpretability in regulated domains.
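To make the idea of a decision list concrete, here is a minimal sketch of how one is evaluated: rules are checked in order and the first match decides the prediction. The rules, probabilities, and function names below are hypothetical placeholders chosen for illustration. They are not output from sklearn-expertsys and are not taken from its Titanic example.

```python
from typing import Callable, Dict, List, Tuple

Passenger = Dict[str, object]
# Each rule pairs a readable description and a condition with the
# probability of the positive class when that rule fires.
Rule = Tuple[str, Callable[[Passenger], bool], float]

# Hypothetical rules and made-up probabilities, for illustration only.
RULES: List[Rule] = [
    ("male and adult", lambda p: p["sex"] == "male" and p["age"] >= 18, 0.20),
    ("third class", lambda p: p["pclass"] == 3, 0.40),
]
DEFAULT_PROBABILITY = 0.90  # used when no rule matches


def predict_proba(passenger: Passenger) -> Tuple[float, str]:
    """Return (probability, description) from the first rule that matches."""
    for description, condition, probability in RULES:
        if condition(passenger):
            return probability, description
    return DEFAULT_PROBABILITY, "default"


def predict(passenger: Passenger, threshold: float = 0.5) -> int:
    """Turn the matched probability into a 0/1 label."""
    probability, _ = predict_proba(passenger)
    return int(probability >= threshold)
```

Because each prediction comes with the description of the rule that fired, the reason for a decision can be read off directly, which is the property that makes this model family easy to explain.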
Functions as a standard estimator with fit, predict, and score methods, allowing easy integration into existing sklearn workflows without major code changes.
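The practical benefit of the fit, predict, and score methods is that training code does not need to know which classifier it is given. The sketch below shows that contract using a deliberately trivial stand-in. MajorityClassifier and train_and_evaluate are hypothetical names invented for this example and are not part of sklearn-expertsys or scikit-learn.

```python
from collections import Counter


class MajorityClassifier:
    """Hypothetical stand-in exposing fit, predict, and score; not a library class."""

    def fit(self, X, y):
        # Remember the most frequent training label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        # Predict the remembered label for every row.
        return [self.majority_ for _ in X]

    def score(self, X, y):
        # Mean accuracy, the convention scikit-learn classifiers follow.
        predictions = self.predict(X)
        return sum(p == t for p, t in zip(predictions, y)) / len(y)


def train_and_evaluate(estimator, X_train, y_train, X_test, y_test):
    """Works with any object that offers fit, predict, and score."""
    estimator.fit(X_train, y_train)
    return estimator.predict(X_test), estimator.score(X_test, y_test)
```

Any estimator that follows the same three-method interface, including a rule list classifier, could in principle be passed to train_and_evaluate in place of the stand-in.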
Includes a discretizer for continuous data and allows protection of categorical columns via undiscretized_features, supporting diverse datasets without extensive preprocessing.
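As a rough illustration of what discretizing while protecting some columns means, the sketch below bins continuous values into labelled intervals and passes protected columns through unchanged. The discretize function and its cut_points argument are assumptions made for this example: the thresholds are supplied by hand here rather than learned, and only the undiscretized_features name comes from the library's description.

```python
from bisect import bisect_right


def discretize(rows, feature_labels, cut_points, undiscretized_features=()):
    """Replace continuous values with interval labels, leaving protected columns alone.

    cut_points maps a feature name to a sorted list of thresholds
    (assumed to be given; this sketch does not learn them).
    """
    protected = set(undiscretized_features)
    result = []
    for row in rows:
        new_row = []
        for name, value in zip(feature_labels, row):
            if name in protected or name not in cut_points:
                new_row.append(value)  # protected or no thresholds: keep as-is
                continue
            cuts = cut_points[name]
            index = bisect_right(cuts, value)  # interval the value falls in
            low = cuts[index - 1] if index > 0 else "-inf"
            high = cuts[index] if index < len(cuts) else "inf"
            new_row.append(f"{name} in [{low}, {high})")
        result.append(new_row)
    return result
```

For example, with thresholds of 30 and 50 for an age column, a value of 30 lands in the interval [30, 50), while a 0/1 smoker column listed in undiscretized_features keeps its original values even if thresholds exist for it.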
BigDataRuleListClassifier subsamples critical points to improve performance on large datasets, as demonstrated in the diabetes example with a training_subset parameter.
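The description does not say how critical points are chosen, so the following is only one plausible reading: keep the fraction of training points that an auxiliary model is least certain about, meaning those whose predicted probability is closest to 0.5. The select_uncertain_subset function is hypothetical, and only the training_subset name is borrowed from the description.

```python
def select_uncertain_subset(scores, training_subset=0.1):
    """Return sorted indices of the points whose score is closest to 0.5.

    scores: predicted positive-class probabilities from some auxiliary
    model (an assumption of this sketch, not a documented library step).
    training_subset: fraction of points to keep, in (0, 1].
    """
    if not 0 < training_subset <= 1:
        raise ValueError("training_subset must be in (0, 1]")
    # Keep at least one point so the subset is never empty for non-empty input.
    keep = max(1, round(len(scores) * training_subset))
    # Rank every index by how far its score is from maximal uncertainty.
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return sorted(ranked[:keep])
```

Training the slower rule list inference on only these indices reduces the amount of data it has to process, at the cost of depending on the quality of the auxiliary model's scores.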
Rule list inference can be slow on large datasets without subsampling, potentially hindering real-time applications or large-scale deployments.
Requires pyFIM, an additional dependency that is not part of scikit-learn, which complicates setup and increases deployment overhead compared to lightweight alternatives.
The documentation focuses on binary classification tasks, with no explicit support for multi-class classification or regression, which limits the range of problems it can be applied to.
Automatic discretization of continuous features using MDL might not always capture optimal splits, potentially affecting model accuracy and requiring careful parameter tuning.