Open-Awesome
© 2026 Open-Awesome. Curated for the developer elite.

OpenFE

MIT · Python

An automated feature generation framework for tabular data that discovers expert-level features to boost machine learning model performance.

869 stars · 112 forks · 0 contributors

What is OpenFE?

OpenFE is an automated feature generation framework for tabular data that systematically generates new candidate features to improve machine learning model performance. It supports classification and regression tasks and is designed to be efficient and easy to use; the project reports that the features it generates often outperform those engineered by human experts.
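The generate-then-select idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not OpenFE's implementation or API: the two binary operators, the helper names (generate_candidates, select_top_k, pearson) and the scoring rule are hypothetical, and absolute Pearson correlation with the label is swapped in as a cheap stand-in for the more elaborate candidate evaluation OpenFE performs.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List

Column = List[float]

# Hypothetical operator set for illustration; not OpenFE's operator list.
BINARY_OPS: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}


def pearson(x: Column, y: Column) -> float:
    """Pearson correlation; returns 0.0 if either column is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def generate_candidates(data: Dict[str, Column]) -> Dict[str, Column]:
    """Apply every binary operator to every unordered pair of base columns."""
    candidates: Dict[str, Column] = {}
    for (name_a, col_a), (name_b, col_b) in combinations(data.items(), 2):
        for op_name, op in BINARY_OPS.items():
            candidates[f"{op_name}({name_a},{name_b})"] = [
                op(a, b) for a, b in zip(col_a, col_b)
            ]
    return candidates


def select_top_k(candidates: Dict[str, Column], label: Column, k: int) -> List[str]:
    """Rank candidates by |correlation with the label| and keep the k best."""
    ranked = sorted(
        candidates,
        key=lambda name: abs(pearson(candidates[name], label)),
        reverse=True,
    )
    return ranked[:k]


# Usage: the label is exactly x1 * x3, so mul(x1,x3) should rank first.
data = {"x1": [1.0, 2.0, 3.0, 4.0], "x2": [4.0, 3.0, 2.0, 1.0], "x3": [2.0, 1.0, 2.0, 1.0]}
label = [2.0, 2.0, 6.0, 4.0]
print(select_top_k(generate_candidates(data), label, k=2))  # ['mul(x1,x3)', 'add(x1,x3)']
```

Exhaustive pairing is what makes the candidate set grow quickly with the number of columns, which is why running a cheap first-pass filter before any expensive, model-based evaluation is a common design choice.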

Target Audience

Data scientists, machine learning engineers, and researchers working with tabular data who need to enhance model performance through automated feature engineering, particularly in competitive settings like Kaggle.

Value Proposition

Developers choose OpenFE for its ability to generate expert-level features automatically, its efficient use of parallel computing, and its reported results outperforming existing feature generation methods and human experts in real-world competitions.

Overview

OpenFE: automated feature generation with expert-level performance

Use Cases

Best For

  • Automating feature engineering for tabular datasets in machine learning projects
  • Improving performance of GBDT models like XGBoost through generated features
  • Enhancing neural network models on structured data with new features
  • Competing in data science competitions like Kaggle with limited feature engineering time
  • Handling missing values and categorical features automatically during feature generation
  • Scaling feature generation efficiently with parallel computing support

Not Ideal For

  • Projects involving non-tabular data such as images, text, or time series with complex temporal dependencies
  • Real-time prediction systems where low-latency feature generation is critical
  • Resource-constrained environments with limited CPU or memory for parallel processing
  • Teams relying heavily on domain-specific feature engineering that requires manual intuition and control

Pros & Cons

Pros

Broad Operator Support

According to the README, OpenFE covers 23 operators for generating diverse candidate features, allowing a broad exploration of possible feature combinations.
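Operators in such a catalog typically fall into unary and binary families, and binary ones differ in whether argument order matters. The sketch below is hypothetical: the operator names, the NaN-returning safe_log and safe_div helpers, and the (name, args) spec format are assumptions for illustration, not OpenFE's 23-operator list or its internal representation.

```python
import math
from itertools import combinations, permutations
from typing import Callable, Dict, List, Tuple

Spec = Tuple[str, Tuple[str, ...]]  # (operator name, input column names)


def safe_log(x: float) -> float:
    """Natural log that yields NaN instead of raising for x <= 0."""
    return math.log(x) if x > 0 else float("nan")


def safe_div(a: float, b: float) -> float:
    """Division that yields NaN instead of raising on a zero divisor."""
    return a / b if b != 0 else float("nan")


# Hypothetical operator families (illustrative, not OpenFE's catalog).
UNARY: Dict[str, Callable[[float], float]] = {"log": safe_log, "abs": abs}
COMMUTATIVE: Dict[str, Callable[[float, float], float]] = {
    "add": lambda a, b: a + b,
    "mul": lambda a, b: a * b,
}
NON_COMMUTATIVE: Dict[str, Callable[[float, float], float]] = {
    "sub": lambda a, b: a - b,
    "div": safe_div,
}
BINARY = {**COMMUTATIVE, **NON_COMMUTATIVE}


def candidate_specs(columns: List[str]) -> List[Spec]:
    """Enumerate first-order candidates over the given base columns."""
    specs: List[Spec] = [(op, (c,)) for op in UNARY for c in columns]
    # Order does not matter for add/mul, so unordered pairs suffice.
    specs += [(op, pair) for op in COMMUTATIVE for pair in combinations(columns, 2)]
    # Order matters for sub/div, so both (a, b) and (b, a) are candidates.
    specs += [(op, pair) for op in NON_COMMUTATIVE for pair in permutations(columns, 2)]
    return specs


def evaluate(spec: Spec, data: Dict[str, List[float]]) -> List[float]:
    """Compute one candidate column row by row."""
    op, args = spec
    if op in UNARY:
        return [UNARY[op](x) for x in data[args[0]]]
    fn = BINARY[op]
    return [fn(a, b) for a, b in zip(data[args[0]], data[args[1]])]


print(len(candidate_specs(["a", "b"])))  # 10
print(evaluate(("div", ("a", "b")), {"a": [1.0, 2.0], "b": [2.0, 0.0]}))  # [0.5, nan]
```

Returning NaN rather than raising keeps one bad row (a zero divisor, a non-positive log input) from aborting the whole candidate column.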

Automated Data Handling

Automatically processes missing values and categorical features during feature generation, reducing manual preprocessing effort for tabular data.
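One common way to let arithmetic operators cope with missing entries and categorical columns is to map missing values to NaN, which then propagates through arithmetic, and to turn categories into numbers such as occurrence counts. The helpers below (to_float, frequency_encode, add_columns) are hypothetical and illustrate that general approach; they do not describe how OpenFE handles these cases internally.

```python
from collections import Counter
from typing import List, Optional


def to_float(values: List[Optional[float]]) -> List[float]:
    """Map missing entries (None) to NaN so they propagate through arithmetic."""
    return [float("nan") if v is None else float(v) for v in values]


def frequency_encode(values: List[Optional[str]]) -> List[float]:
    """Replace each category by its occurrence count; missing stays NaN."""
    counts = Counter(v for v in values if v is not None)
    return [float("nan") if v is None else float(counts[v]) for v in values]


def add_columns(x: List[float], y: List[float]) -> List[float]:
    """Row-wise sum; a NaN in either input yields NaN in that row."""
    return [a + b for a, b in zip(x, y)]


# Usage: a numeric column with a gap combined with an encoded category.
income = to_float([10.0, None, 30.0, 40.0])
city = frequency_encode(["paris", "rome", "paris", None])
print(add_columns(income, city))  # [12.0, nan, 32.0, nan]
```

Frequency encoding yields one numeric column per categorical feature, so it composes with the same binary operators used for numeric columns; other encodings would work equally well in this sketch.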

Proven Competition Performance

Validated in Kaggle competitions: for example, a model built on OpenFE-generated features beat 99.3% of teams in the IEEE-CIS Fraud Detection competition, demonstrating expert-level feature engineering capabilities.

Parallel Computing Efficiency

Supports parallel processing through the n_jobs parameter, speeding up execution on large datasets and improving scalability.
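An n_jobs setting helps because candidate features can be scored independently of one another, so the work maps cleanly onto a pool of workers. The sketch below is an illustration under stated assumptions, not OpenFE's code: score_candidates_parallel and its arguments are hypothetical. It uses a thread pool to stay self-contained; threads only speed things up when the scoring function releases the GIL (for example inside native library code), and CPU-bound pure-Python scoring would need a process pool instead.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

Column = List[float]


def score_candidates_parallel(
    candidates: Dict[str, Column],
    score: Callable[[Column], float],
    n_jobs: int = 1,
) -> Dict[str, float]:
    """Score each candidate column independently using up to n_jobs workers."""
    names = list(candidates)
    with ThreadPoolExecutor(max_workers=max(1, n_jobs)) as pool:
        # pool.map preserves input order, so scores line up with names.
        scores = list(pool.map(lambda name: score(candidates[name]), names))
    return dict(zip(names, scores))


def variance(col: Column) -> float:
    """Toy scoring function (hypothetical): population variance of the column."""
    mean = sum(col) / len(col)
    return sum((x - mean) ** 2 for x in col) / len(col)


candidates = {"f1": [1.0, 2.0, 3.0], "f2": [5.0, 5.0, 5.0]}
print(score_candidates_parallel(candidates, variance, n_jobs=4))
```

Because each score depends only on its own column, the result is identical for any n_jobs value; only wall-clock time changes.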

Cons

Installation Pitfalls

The README warns that installing with conda may install a different package, so OpenFE must be installed with pip; this requirement can confuse users.
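After installing with pip, a quick import check can catch the situation the README warns about, where a different package ends up importable under the same module name. This is a heuristic sketch: the function name is hypothetical, and it assumes the feature-generation package exposes the top-level names OpenFE and transform; verify those names against the README's import line for the version you install.

```python
import importlib
import importlib.util


def looks_like_feature_openfe(module_name: str = "openfe") -> bool:
    """Heuristic: True only if the module imports and exposes the expected names.

    Assumption: the feature-generation OpenFE provides top-level OpenFE and
    transform; an unrelated package installed under the same name likely won't.
    """
    if importlib.util.find_spec(module_name) is None:
        return False  # nothing importable under this name
    try:
        module = importlib.import_module(module_name)
    except Exception:
        return False  # broken or incompatible install
    return all(hasattr(module, attr) for attr in ("OpenFE", "transform"))


if __name__ == "__main__":
    print("openfe import check passed:", looks_like_feature_openfe())
```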

Tabular-Only Focus

Designed specifically for tabular data, limiting applicability to other data types like images or text without significant preprocessing.

Computational Intensity

Despite parallel support, feature generation can be resource-heavy, demanding substantial CPU and memory for large datasets, which may not suit all environments.
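A back-of-the-envelope count shows why generation gets expensive: with n base columns, each unary operator adds n candidates, each commutative binary operator adds n(n-1)/2, and each order-sensitive binary operator adds n(n-1), so first-order candidates grow roughly quadratically before any of them is evaluated over the rows. The operator counts passed in below are hypothetical placeholders, not OpenFE's actual split of its 23 operators, and the function name is illustrative.

```python
from math import comb


def first_order_candidates(
    n_features: int, n_unary: int, n_commutative: int, n_ordered: int
) -> int:
    """Count candidates produced by one round of operator application."""
    return (
        n_unary * n_features                         # op(x) for each column
        + n_commutative * comb(n_features, 2)        # op(x, y) == op(y, x)
        + n_ordered * n_features * (n_features - 1)  # op(x, y) != op(y, x)
    )


# Doubling the column count roughly quadruples the candidate count.
for n in (100, 200):
    print(n, first_order_candidates(n, n_unary=2, n_commutative=2, n_ordered=2))
# prints:
# 100 29900
# 200 119800
```

Each candidate must then be computed over every row and scored, so total work scales with candidates times rows, which is where the CPU and memory pressure comes from.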

Quick Stats

Stars: 869
Forks: 112
Contributors: 0
Open issues: 21
Last commit: 1 year ago
Created: since 2022

Tags

#parallel-computing #python-library #data-science #kaggle #neural-networks #automated-feature-engineering #feature-generation #xgboost #tabular-data #machine-learning

Built With

Python

Included in

Data Science (3.4k)
Auto-fetched 5 hours ago

Related Projects

tsfresh

Automatic extraction of relevant features from time series.

Stars: 9,183
Forks: 1,264
Last commit: 5 months ago

Featuretools

An open source Python library for automated feature engineering.

Stars: 7,631
Forks: 906
Last commit: 2 months ago

Feature Engine

Feature engineering and selection open-source Python library compatible with sklearn.

Stars: 2,233
Forks: 342
Last commit: 1 month ago