A scikit-learn-compatible Python implementation of Multifactor Dimensionality Reduction (MDR) for feature construction.
scikit-mdr is a Python library that implements Multifactor Dimensionality Reduction (MDR), a feature construction algorithm for machine learning. It creates new features by modeling higher-order interactions between variables, which is particularly useful for capturing complex patterns in datasets with categorical features. The library provides scikit-learn-compatible estimators for both classification and regression problems.
scikit-mdr is aimed at data scientists and researchers who work with categorical data and need to model feature interactions, particularly in domains like bioinformatics and genetics, where detecting epistasis (gene-gene interactions) is important.
Developers choose scikit-mdr because it provides a well-tested, scikit-learn-compatible implementation of the MDR algorithm that fits into existing machine learning workflows. Unlike generic feature engineering approaches, it specifically targets the detection of higher-order interactions in categorical data.
Implements standard methods like fit and transform, allowing seamless integration into existing scikit-learn pipelines, as demonstrated in the README examples with MDRClassifier and ContinuousMDR.
Specializes in constructing features that capture complex, higher-order interactions between variables, making it effective for detecting epistasis in genetic data, as highlighted in the key features.
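The idea behind MDR feature construction can be illustrated with a minimal pure-Python sketch. It is not scikit-mdr's code: the function names `fit_mdr_cells` and `transform_mdr` and the parameters `tie_break` and `default_label` are hypothetical and chosen only for this illustration. Each observed combination of feature values (a "cell") is labelled 1 when its case-to-control ratio exceeds the overall ratio in the training data, and 0 otherwise. The toy data uses an XOR pattern, where neither feature alone predicts the class but the constructed feature does.

```python
from collections import Counter


def fit_mdr_cells(rows, labels, tie_break=1):
    """Label each observed combination of feature values as 1 (high risk) or 0 (low risk)."""
    n_cases = sum(labels)
    n_controls = len(labels) - n_cases
    cases, controls = Counter(), Counter()
    for row, label in zip(rows, labels):
        if label == 1:
            cases[tuple(row)] += 1
        else:
            controls[tuple(row)] += 1

    cell_labels = {}
    for cell in set(cases) | set(controls):
        # Compare the cell's case:control ratio with the overall ratio.
        # Cross-multiplying avoids division by zero for empty counts.
        lhs = cases[cell] * n_controls
        rhs = controls[cell] * n_cases
        if lhs > rhs:
            cell_labels[cell] = 1
        elif lhs < rhs:
            cell_labels[cell] = 0
        else:
            cell_labels[cell] = tie_break
    return cell_labels


def transform_mdr(rows, cell_labels, default_label=0):
    """Collapse each row into one constructed feature; unseen cells get default_label."""
    return [cell_labels.get(tuple(row), default_label) for row in rows]


# Toy XOR-style interaction: the class is 1 only when the two features differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 5
labels = [a ^ b for a, b in rows]

cell_labels = fit_mdr_cells(rows, labels)
constructed = transform_mdr(rows, cell_labels)
```

In this toy case the single constructed feature reproduces the class labels exactly, which is the kind of interaction effect that neither input feature carries on its own.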
Explicitly designed for categorical features, providing targeted functionality for domains like bioinformatics where such data is common, as stated in the README's feature support.
Offers both MDRClassifier for binary classification and ContinuousMDR for regression, covering two major machine learning tasks with dedicated estimators, shown in the code snippets.
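For a continuous target, one common way to adapt the same idea is to label a cell by comparing the mean target value inside the cell with the overall mean. The sketch below illustrates that rule in plain Python under that assumption. It is not taken from the library, and `fit_continuous_cells` and `tie_break` are hypothetical names used only here.

```python
from collections import defaultdict
from statistics import mean


def fit_continuous_cells(rows, targets, tie_break=1):
    """Label each cell 1 if its mean target exceeds the overall mean, else 0."""
    overall_mean = mean(targets)
    by_cell = defaultdict(list)
    for row, target in zip(rows, targets):
        by_cell[tuple(row)].append(target)

    cell_labels = {}
    for cell, values in by_cell.items():
        cell_mean = mean(values)
        if cell_mean > overall_mean:
            cell_labels[cell] = 1
        elif cell_mean < overall_mean:
            cell_labels[cell] = 0
        else:
            cell_labels[cell] = tie_break
    return cell_labels


# Toy data: the target is high only when the two features differ.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)] * 3
targets = [1.0, 5.0, 5.0, 1.0] * 3

cell_labels = fit_continuous_cells(rows, targets)
```

The output has the same shape as in the binary case, a mapping from feature-value combinations to a single 0/1 constructed feature, so downstream steps can treat both variants alike.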
Works only with categorical features, so continuous data must be discretized first. It also supports only binary classification and regression, not multi-class or unsupervised learning, as noted in the README's limitations.
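Discretizing a continuous column before applying MDR could look like the following equal-width binning sketch. The helper `equal_width_bins` and the example `ages` values are made up for illustration. The right binning strategy depends on the data, and scikit-learn's `KBinsDiscretizer` is a fuller alternative.

```python
def equal_width_bins(values, n_bins=3):
    """Map continuous values to integer bin indices 0..n_bins-1 of equal width."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so everything falls into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would otherwise land in bin n_bins, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


ages = [21.0, 35.5, 48.0, 62.3, 79.0]  # hypothetical continuous feature
age_bins = equal_width_bins(ages, n_bins=3)
```

The resulting integer codes can then be treated as a categorical feature.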
The project is under active development, which may lead to instability, breaking changes, or incomplete features, as noted in the README's warning about regular updates and planned expansions.
MDR algorithms can be computationally expensive, especially for high-dimensional data or higher-order interactions, which may limit scalability on large datasets. The README does not state this explicitly, but it is a known trade-off of the method.
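A rough back-of-the-envelope calculation shows where the cost comes from, assuming an exhaustive search over every k-way feature combination (an assumption about how MDR is typically applied, not a description of what the library does internally). The function name `mdr_search_cost` and the choice of 1,000 features with 3 levels each are illustrative only.

```python
from math import comb  # available in Python 3.8+


def mdr_search_cost(n_features, order, n_levels=3):
    """Return (number of feature combinations, cells per combination)."""
    n_combinations = comb(n_features, order)
    cells_per_model = n_levels ** order
    return n_combinations, cells_per_model


for order in (2, 3, 4):
    combos, cells = mdr_search_cost(1000, order)
    print(f"order={order}: {combos:,} combinations, {cells} cells each")
```

The number of combinations grows combinatorially with the interaction order, while the number of cells per combination grows exponentially, so each cell is also estimated from fewer samples.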