A Scala framework for distributed supervised learning of decision tree ensemble models, inspired by Google's PLANET.
Brushfire is a Scala framework for distributed supervised learning of decision tree ensemble models. It trains classifiers on large datasets by running on Hadoop through Scalding, and supports numeric and categorical inputs, cross-validation, and random forests.
Brushfire is aimed at data engineers and machine learning practitioners who work with large-scale datasets in the Scala ecosystem, particularly those using Hadoop-based distributed processing.
Developers choose Brushfire for its extensible, type-safe design, which offers pluggable components and support for mixed feature types, and for its training algorithm, inspired by Google's PLANET, which runs on distributed systems.
Distributed decision tree ensemble learning in Scala
Runs on Scalding/Hadoop for large-scale training, implementing an algorithm inspired by Google's PLANET so that datasets too large for a single machine can be processed.
Supports mixed numeric and categorical features via a dispatched type system, including high-cardinality cases with Ordinal, Nominal, Continuous, or Sparse wrappers.
Offers pluggable components for samplers, evaluators, and splitters, allowing deep customization of the learning pipeline.
Built-in support for k-fold cross-validation and random forests covers model validation and ensemble training for classification tasks.
Integrates only with Scalding/Hadoop; planned support for Spark or single-node execution has not been implemented, which restricts deployment options.
The README explicitly states that regression trees are not yet available, limiting the framework to classification use cases.
Using Continuous or Sparse feature types slows learning significantly, as the documentation acknowledges, which increases training time on datasets that need them.
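To make the idea of a dispatched feature type more concrete, here is a minimal, self-contained Scala sketch. It does not use Brushfire's API: the `FeatureValue` trait, the case classes (named after the wrappers mentioned above), and the `goesLeft` helper are all hypothetical. It also assumes that numeric kinds split by threshold and categorical kinds by equality. It only illustrates how one tree can treat numeric and categorical inputs differently while remaining type-safe.

```scala
// Hypothetical sketch: these types borrow the wrapper names mentioned above
// but are NOT Brushfire's actual classes or signatures.
sealed trait FeatureValue
final case class Ordinal(value: Int) extends FeatureValue       // assumed: numeric
final case class Nominal(value: String) extends FeatureValue    // assumed: categorical
final case class Continuous(value: Double) extends FeatureValue // assumed: numeric
final case class Sparse(value: String) extends FeatureValue     // assumed: categorical

object DispatchSketch {
  // Decide whether an observed value follows the left branch of a split.
  // Numeric kinds compare against a threshold; categorical kinds test equality.
  def goesLeft(split: FeatureValue, observed: FeatureValue): Boolean =
    (split, observed) match {
      case (Ordinal(t), Ordinal(v))       => v <= t
      case (Continuous(t), Continuous(v)) => v <= t
      case (Nominal(c), Nominal(v))       => v == c
      case (Sparse(c), Sparse(v))         => v == c
      case _                              => false // mismatched kinds never go left
    }

  def main(args: Array[String]): Unit = {
    println(goesLeft(Continuous(3.5), Continuous(2.0))) // true: 2.0 <= 3.5
    println(goesLeft(Nominal("red"), Nominal("blue")))  // false: categories differ
  }
}
```

Because the trait is sealed, every feature kind is known at compile time, so the pattern match dispatches on the kind without runtime casts.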