A tree ensemble machine learning method that delivers better results than gradient boosted decision trees on many datasets.
Regularized Greedy Forest (RGF) is a tree ensemble machine learning algorithm that often outperforms traditional gradient boosted decision trees (GBDT) on a range of datasets. Gradient boosting only appends new trees fitted to the residuals of the previous ones. RGF instead searches greedily over the whole forest: each step may split a leaf of an existing tree or start a new tree, and all leaf weights are periodically re-optimized under a tree-structured regularizer that is built into the objective. The method has been used successfully in Kaggle competitions and academic research.
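The greedy forest search can be illustrated with a toy pure-Python sketch. It is a hypothetical simplification, not the RGF library's algorithm or API. It assumes one numeric feature, squared loss, a plain L2 penalty `lam` on leaf weights in place of RGF's tree-structured regularizer, and an exhaustive split search. The names `fit_toy_rgf`, `predict`, `max_leaves`, `lam` and `correct_every` are illustrative only. What it does show is the structural difference from boosting: each step may split a leaf of any existing tree or start a new tree, and every `correct_every` steps all leaf weights are re-optimized together.

```python
# Toy sketch of the RGF idea on a single numeric feature.
# Hypothetical simplifications (NOT the RGF library's algorithm or API):
# squared loss, plain L2 penalty `lam` on leaf weights, exhaustive split search.

INF = float("inf")


def predict(forest, x):
    """Sum the weight of the leaf containing x in every tree."""
    return sum(w for tree in forest for lo, hi, w in tree if lo <= x < hi)


def fit_toy_rgf(xs, ys, max_leaves=8, lam=1.0, correct_every=2):
    n = len(xs)
    forest = []          # each tree is a list of leaves [lo, hi, weight]
    preds = [0.0] * n    # current forest prediction for each training point

    def members(lo, hi):
        return [i for i in range(n) if lo <= xs[i] < hi]

    def split_gain(resid, idx_l, idx_r, w):
        # Objective decrease when a leaf of weight w becomes two children
        # with weights w + d_l and w + d_r, each d chosen in closed form.
        before = sum(resid[i] ** 2 for i in idx_l + idx_r) + lam * w * w
        after, deltas = 0.0, []
        for idx in (idx_l, idx_r):
            d = (sum(resid[i] for i in idx) - lam * w) / (len(idx) + lam)
            after += sum((resid[i] - d) ** 2 for i in idx) + lam * (w + d) ** 2
            deltas.append(d)
        return before - after, deltas

    def fully_correct(passes=3):
        # Re-optimize every leaf weight (exact coordinate descent),
        # keeping the tree structure fixed.
        for _ in range(passes):
            for tree in forest:
                for leaf in tree:
                    lo, hi, w = leaf
                    idx = members(lo, hi)
                    new_w = sum(ys[i] - preds[i] + w for i in idx) / (len(idx) + lam)
                    for i in idx:
                        preds[i] += new_w - w
                    leaf[2] = new_w

    step = 0
    while sum(len(tree) for tree in forest) < max_leaves:
        resid = [ys[i] - preds[i] for i in range(n)]
        # Candidates: any leaf of any existing tree, or the root of a new tree.
        candidates = [(t, k, leaf) for t, tree in enumerate(forest)
                      for k, leaf in enumerate(tree)]
        candidates.append((None, None, [-INF, INF, 0.0]))
        best = None
        for t, k, (lo, hi, w) in candidates:
            idx = members(lo, hi)
            for thr in sorted({xs[i] for i in idx}):
                idx_l = [i for i in idx if xs[i] < thr]
                idx_r = [i for i in idx if xs[i] >= thr]
                if not idx_l:
                    continue  # threshold at the leaf minimum: empty left child
                gain, (d_l, d_r) = split_gain(resid, idx_l, idx_r, w)
                if best is None or gain > best[0]:
                    best = (gain, t, k, lo, hi, w, thr, d_l, d_r, idx_l, idx_r)
        if best is None or best[0] <= 1e-12:
            break  # no split improves the regularized objective
        _, t, k, lo, hi, w, thr, d_l, d_r, idx_l, idx_r = best
        children = [[lo, thr, w + d_l], [thr, hi, w + d_r]]
        if t is None:
            forest.append(children)          # grow a new tree
        else:
            forest[t][k:k + 1] = children    # split a leaf of an existing tree
        for i in idx_l:
            preds[i] += d_l
        for i in idx_r:
            preds[i] += d_r
        step += 1
        if step % correct_every == 0:
            fully_correct()
    fully_correct()
    return forest
```

Fitting it to a step function such as `ys = [0.0] * 5 + [4.0] * 5` yields predictions that rise across the step. The real implementations are far more efficient and support other loss functions; this sketch only mirrors the search structure.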
Data scientists, machine learning engineers, and researchers who need high-performance tree ensemble models for tabular data, particularly those competing in Kaggle or working on problems where GBDT methods are commonly applied.
RGF provides better predictive performance than standard gradient boosting on many datasets due to its direct forest optimization and built-in regularization. The multi-core FastRGF implementation offers speed advantages, and the availability of Python and R wrappers makes it accessible for practical data science workflows.
Home repository for the Regularized Greedy Forest (RGF) library. It includes the original implementation from the paper and a multithreaded implementation (FastRGF) written in C++, along with various language-specific wrappers.
Unlike standard gradient boosting, which only appends new trees, RGF can revise the whole forest during training; the README credits this with better generalization and accuracy on many datasets.
Integrates tree-structured regularization into the learning objective itself to control overfitting, so no separate regularization step is needed.
FastRGF provides a multi-threaded C++ implementation for faster training on multi-core systems, enhancing efficiency for large datasets.
Offers Python and R wrappers, making it accessible for integration into common data science workflows as highlighted in the features.
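Building regularization into the objective means leaf weights are penalized while they are fitted. A minimal sketch, assuming squared loss and a plain L2 penalty (simpler than RGF's tree-structured regularizer), shows the effect on a single leaf. Minimizing the sum of `(r - w)**2` plus `lam * w**2` gives the weight `sum(r) / (n + lam)`, so leaves supported by few points are shrunk much more than well-populated ones. The function `regularized_leaf_value` and the parameter `lam` are hypothetical names.

```python
# Hypothetical illustration: optimal weight of one leaf under squared loss
# with an L2 penalty lam * w**2 (not RGF's actual tree-structured regularizer).

def regularized_leaf_value(residuals, lam):
    """Minimize sum((r - w)**2) + lam * w**2 over w: closed form sum(r) / (n + lam)."""
    return sum(residuals) / (len(residuals) + lam)


small_leaf = [3.0, 3.0]   # 2 points, mean residual 3
large_leaf = [3.0] * 50   # 50 points, same mean residual

for lam in (0.0, 1.0, 10.0):
    print(lam, regularized_leaf_value(small_leaf, lam),
          regularized_leaf_value(large_leaf, lam))
```

With `lam = 10.0`, the two-point leaf drops from 3.0 to 0.5 while the fifty-point leaf only drops to 2.5, which is how such a penalty discourages fitting noise in small leaves.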
FastRGF has simplifications compared to the original RGF, which might reduce accuracy in complex scenarios, as mentioned in the README.
Smaller community than XGBoost or LightGBM, which means fewer tutorials, examples, and third-party integrations to draw on when troubleshooting.
Lacks GPU support, which can be a bottleneck on very large datasets, where competitors like XGBoost offer GPU-accelerated training.