An open-source Python repository providing around 40 feature selection algorithms for machine learning applications.
scikit-feature is an open-source Python repository developed by the Data Mining and Machine Learning Lab at Arizona State University that provides around 40 feature selection algorithms for machine learning. It serves as a comprehensive platform for feature selection application, research, and comparative study, built on top of scikit-learn, NumPy, and SciPy. The library addresses the need for a standardized collection of feature selection methods to facilitate algorithm development and empirical evaluation.
It is intended for machine learning researchers and practitioners who need to implement, compare, or develop feature selection algorithms for data mining and predictive modeling tasks.
Developers choose scikit-feature because it gathers one of the most comprehensive collections of feature selection algorithms behind a single Python interface, designed to support both research and practical applications alongside the scikit-learn ecosystem.
Open-source feature selection repository in Python
Includes around 40 feature selection algorithms spanning traditional, structural, and streaming methods, as highlighted in the README, providing one of the widest selections available in Python.
Built on top of scikit-learn, so it fits alongside existing machine learning workflows and estimators and is easy to adopt for users already working in that ecosystem.
Designed to facilitate algorithm sharing and empirical evaluation, making it ideal for comparative studies and developing new feature selection methods, as stated in its philosophy.
Uses NumPy and SciPy for numerical computation, which keeps the algorithms efficient on scientific and larger-scale datasets.
Requires installation via setup.py instead of standard pip, which is less convenient, lacks modern dependency management, and may pose challenges on some systems.
The accompanying publication dates from 2018, and the library may not receive frequent updates, so it may lack newer algorithms and compatibility with the latest Python or scikit-learn versions.
Instructions are hosted on a separate project website, which may be unavailable at times and makes help harder to find from within the repository itself.
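For readers new to the topic, the following is a minimal sketch of what a filter-style feature selection method does: it scores each feature against the class labels and ranks the features by that score. It uses a Fisher-score-style criterion (between-class variance divided by within-class variance) written with the Python standard library only. The function names `fisher_scores` and `rank_features` are hypothetical and chosen for illustration; this is not scikit-feature's implementation or API.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fisher_scores(X, y):
    """Return one Fisher-style score per column of X.

    X is a list of rows (each a list of floats); y is a list of class labels.
    Illustrative sketch only, not scikit-feature's own code.
    """
    # Group the rows by class label.
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)

    scores = []
    for j in range(len(X[0])):
        overall_mean = fmean(row[j] for row in X)
        between = 0.0  # spread of the class means around the overall mean
        within = 0.0   # spread of the samples inside each class
        for rows in groups.values():
            values = [r[j] for r in rows]
            between += len(values) * (fmean(values) - overall_mean) ** 2
            within += len(values) * pvariance(values)
        if within > 0:
            scores.append(between / within)
        else:
            # No within-class spread: perfectly separating or constant feature.
            scores.append(float("inf") if between > 0 else 0.0)
    return scores


def rank_features(scores):
    """Return feature indices ordered from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


if __name__ == "__main__":
    # Toy data: feature 0 separates the two classes, feature 1 does not.
    X = [[1.0, 5.0], [1.2, 1.0], [3.0, 4.9], [3.2, 1.1]]
    y = [0, 0, 1, 1]
    print(rank_features(fisher_scores(X, y)))
```

In the toy data, the first feature differs sharply between the two classes while the second does not, so the first feature is ranked ahead of the second. A library such as scikit-feature collects many criteria of this kind, which is what makes side-by-side comparison practical.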