A scikit-learn-compatible Python implementation of ReBATE, a suite of Relief-based feature selection algorithms for machine learning.
scikit-rebate is a Python library that implements ReBATE, a suite of Relief-based feature selection algorithms compatible with scikit-learn. It provides efficient methods for identifying relevant features in supervised learning datasets, with special attention to detecting feature interactions without exhaustive search. The package addresses the computational cost of feature selection in machine learning pipelines, particularly in domains such as genetics, where epistasis (interaction between features) is common.
Intended for data scientists, machine learning practitioners, and researchers who use scikit-learn and need efficient feature selection methods, especially those working with genetic data or other datasets where feature interactions matter.
Developers choose scikit-rebate because it offers scikit-learn-compatible implementations of advanced Relief algorithms that efficiently detect feature interactions, supports diverse data types and endpoints, and integrates seamlessly into existing machine learning workflows without requiring extensive configuration.
Integrates directly into scikit-learn workflows: the README example places it in a pipeline built with make_pipeline and evaluates that pipeline with cross_val_score.
Automatically handles mixed data types (discrete/continuous), missing values, and various endpoints (binary, multi-class, regression) without manual preprocessing, as highlighted in the feature support section.
Efficiently identifies feature interactions without exhaustive pairwise searches, saving computation time, which is particularly beneficial for genetic data with epistasis.
Includes multiple Relief-based algorithms like ReliefF, SURF, and MultiSURF* for different feature selection needs, offering flexibility for various supervised learning tasks.
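To make the interaction-detection claim above concrete, here is a minimal pure-Python sketch of a ReliefF-style scoring loop on a toy XOR dataset. It is an illustration under stated assumptions, not scikit-rebate's implementation: the function name relieff_scores is hypothetical, features are assumed to be discrete, the label is assumed to be binary, and distance is plain Hamming distance. In the toy data, neither relevant feature is associated with the label on its own, yet both receive the highest scores without any pairwise search.

```python
from itertools import product


def relieff_scores(X, y, n_neighbors=2):
    """Bare-bones ReliefF-style scores for discrete features and a binary label.

    Hypothetical helper for illustration only; not part of scikit-rebate.
    """
    n_samples, n_features = len(X), len(X[0])
    weights = [0.0] * n_features

    def hamming(a, b):
        # Number of features on which two instances disagree.
        return sum(u != v for u, v in zip(a, b))

    for i in range(n_samples):
        # Rank every other instance by its distance to instance i.
        others = sorted((j for j in range(n_samples) if j != i),
                        key=lambda j: hamming(X[i], X[j]))
        hits = [j for j in others if y[j] == y[i]][:n_neighbors]
        misses = [j for j in others if y[j] != y[i]][:n_neighbors]
        for f in range(n_features):
            # A feature that differs among same-class neighbors loses weight.
            for j in hits:
                weights[f] -= (X[i][f] != X[j][f]) / (n_samples * len(hits))
            # A feature that differs among other-class neighbors gains weight.
            for j in misses:
                weights[f] += (X[i][f] != X[j][f]) / (n_samples * len(misses))
    return weights


# Toy data: every combination of four binary features.
# The label is f0 XOR f1; f2 and f3 are irrelevant.
X = [list(bits) for bits in product([0, 1], repeat=4)]
y = [row[0] ^ row[1] for row in X]

scores = relieff_scores(X, y, n_neighbors=2)
ranking = sorted(range(len(scores)), key=lambda f: scores[f], reverse=True)
print(scores)   # [0.5, 0.5, -0.5, -0.5]
print(ranking)  # the two interacting features come first
```

The cost grows with the number of instance pairs rather than with the number of feature pairs, which is why this family of methods scales to datasets with many candidate features. The n_neighbors argument here plays the same role as the user-specified neighbor count mentioned for ReliefF; on real data its best value depends on the dataset.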
scikit-rebate may run slower than the standalone Cython-optimized ReBATE version; the README notes that this alternative focuses on improved performance.
Algorithms like ReliefF require user-specified parameters (e.g., n_neighbors), which can be non-trivial to optimize without domain expertise, as mentioned in the README.
The package is under active development, which might lead to breaking changes or bugs, as indicated by the development status badges and note in the README.