Python implementation of the Boruta all-relevant feature selection method with scikit-learn compatibility.
BorutaPy is a Python implementation of the Boruta all-relevant feature selection algorithm that identifies all features carrying information usable for prediction. It provides a scikit-learn compatible interface for selecting features that contribute to understanding data-generating phenomena, rather than just finding a minimal optimal subset for a specific classifier.
Data scientists, machine learning engineers, and researchers working with high-dimensional datasets who need comprehensive feature selection for model interpretation and understanding underlying data patterns.
Developers choose BorutaPy because it offers a more comprehensive approach to feature selection than minimal-optimal methods, provides a familiar scikit-learn interface, includes performance improvements over the original R implementation, and offers flexible statistical corrections suitable for various data types including biological data.
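The core idea behind Boruta can be sketched without the library itself: duplicate every feature, shuffle the copies to destroy their relationship with the target ("shadow features"), fit a random forest on the augmented data, and count a real feature as a "hit" when its importance beats the strongest shadow. The sketch below is a single Boruta-style iteration using only numpy and scikit-learn; it illustrates the mechanism and is not BorutaPy's actual implementation:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Toy data: the first 5 features are informative, the last 5 are pure noise.
X, y = make_classification(n_samples=300, n_features=10, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)

# One Boruta-style iteration: append a shuffled "shadow" copy of every column.
shadows = rng.permuted(X, axis=0)  # permute each column independently
X_aug = np.hstack([X, shadows])

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_aug, y)
real_imp = rf.feature_importances_[:X.shape[1]]
shadow_imp = rf.feature_importances_[X.shape[1]:]

# A real feature scores a "hit" if it outperforms the strongest shadow.
hits = real_imp > shadow_imp.max()
print(hits)
```

BorutaPy repeats this over many iterations and decides each feature's fate with a statistical test on its accumulated hit count, which is what the alpha parameter controls.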
Identifies all features that carry predictive information, not just a minimal set, which aligns with its philosophy of understanding data-generating phenomena as stated in the README.
Provides a fit/transform interface identical to scikit-learn methods, making it easy to incorporate into existing pipelines, as demonstrated in the example code with RandomForestClassifier.
Offers adjustable percentile thresholds (perc parameter) and a two-step correction process, allowing users to control false discovery rates more appropriately for various data types like biological datasets.
Leverages scikit-learn's efficient ensemble implementations, resulting in faster run times compared to the original R package, as highlighted in the key features.
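The role of the perc parameter can be illustrated in isolation: rather than requiring a real feature to beat the maximum shadow importance, the comparison threshold is a chosen percentile of the shadow importances, so lowering perc makes the test more lenient and admits more features. A toy numpy illustration of that thresholding (the importance values below are made up for demonstration):

```python
import numpy as np

# Hypothetical importances from one iteration: 10 real features, 10 shadows.
real_imp = np.array([0.20, 0.10, 0.065, 0.055, 0.045,
                     0.03, 0.02, 0.015, 0.01, 0.005])
shadow_imp = np.array([0.07, 0.06, 0.05, 0.05, 0.04,
                       0.04, 0.03, 0.03, 0.02, 0.01])

for perc in (100, 90, 70):
    # perc=100 reproduces the strict "beat the best shadow" rule.
    threshold = np.percentile(shadow_imp, perc)
    n_hits = int((real_imp > threshold).sum())
    print(f"perc={perc}: threshold={threshold:.3f}, hits={n_hits}")
```

Here perc=100 yields 2 hits while perc=70 yields 4, showing how a lower percentile trades a stricter false-discovery guarantee for higher sensitivity, which can suit noisy domains such as biological data.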
Only compatible with scikit-learn estimators that have a feature_importances_ attribute, primarily tree-based models, which restricts model choice and may not work with all algorithms.
The iterative process requires fitting a random forest many times, which can be prohibitively slow on large or high-dimensional datasets and limits scalability.
Parameters like perc and alpha require careful tuning and some statistical knowledge to balance sensitivity against false discoveries, adding complexity for non-expert users.
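The feature_importances_ constraint noted above is easy to check before committing to a model: fit the candidate estimator on a small sample and test for the attribute. A quick sketch with two standard scikit-learn estimators (tree ensembles expose the attribute after fitting; linear models do not):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, n_features=5, random_state=0)

for est in (RandomForestClassifier(n_estimators=10, random_state=0),
            LogisticRegression(max_iter=1000)):
    est.fit(X, y)
    # Only estimators exposing feature_importances_ are usable with BorutaPy.
    ok = hasattr(est, "feature_importances_")
    print(f"{type(est).__name__}: feature_importances_ present: {ok}")
```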