A Python library for feature engineering and selection with scikit-learn-compatible transformers.
Feature-engine is an open-source Python library that provides a wide array of transformers for feature engineering and feature selection in machine learning workflows. It solves the problem of fragmented feature preprocessing by offering a unified, scikit-learn-compatible toolkit that includes imputation, encoding, discretization, outlier handling, transformation, and selection methods. The library is designed to integrate seamlessly into existing scikit-learn pipelines, making feature engineering more efficient and reproducible.
It is aimed at data scientists, machine learning engineers, and researchers who build machine learning models in Python and need robust, scalable tools for feature preprocessing and selection. It is particularly useful for professionals working with tabular data, time series, or text data within the scikit-learn ecosystem.
Developers choose Feature-engine because it consolidates numerous feature engineering techniques into a single, well-documented library with a consistent scikit-learn API, reducing the need for custom code and third-party dependencies. Its comprehensive coverage of methods, active maintenance, and integration with popular ML workflows provide a reliable, production-ready solution for feature preprocessing.
Follows scikit-learn's API with fit() and transform() methods, so its transformers drop easily into existing pipelines, in line with the practicality-focused philosophy described in the README.
Offers a wide array of transformers for imputation, encoding, selection, and more, including time series and text features, which reduces the need for multiple third-party libraries.
Backed by detailed documentation on Read the Docs, YouTube tutorials, and active maintenance from Train in Data, which supports reliability and ease of adoption in real-world projects.
Includes advanced methods such as MRMR (Minimum Redundancy Maximum Relevance) feature selection and datetime subtraction, catering to complex ML workflows beyond basic tabular preprocessing, as shown in its feature list.
For basic preprocessing that scikit-learn already handles, Feature-engine can add unnecessary complexity and some execution overhead from its additional abstraction layers.
While comprehensive, it may lack some cutting-edge or domain-specific transformers found in specialized libraries (e.g., for graph data), requiring custom code extensions.
Heavily reliant on scikit-learn's API, making it less suitable for projects using alternative ML frameworks, which limits flexibility in heterogeneous tech stacks.