Question 1

How to handle high-cardinality categorical features in Brushfire?

Accepted Answer

Use the Sparse type in the Dispatched system for categorical features with many unique values, but note this can slow performance. The framework is designed to manage this, but optimization is required.

Question 2

Brushfire vs XGBoost for distributed tree learning?

Accepted Answer

Brushfire is Scala-based and tightly integrated with Hadoop/Scalding, offering extensibility for custom components. XGBoost is more performant and widely supported in Python but less flexible for deep customization in Scala ecosystems.

Question 3

Does Brushfire support model export for production deployment?

Accepted Answer

Models are serialized to JSON or Kryo formats in HDFS directories, but there's no built-in deployment mechanism for real-time inference, requiring manual integration with serving systems.

Question 4

How to add custom split criteria in Brushfire?

Accepted Answer

Define a new Splitter by extending the provided interface; the pluggable architecture allows tailoring feature binning, such as for log-binning continuous values, as described in the README.

Question 5

What are the hardware requirements for running Brushfire at scale?

Accepted Answer

It requires a Hadoop cluster with Scalding, as it's designed for distributed computing; local mode is possible but limited, and performance scales with cluster resources for large datasets.

Question 6

Can Brushfire be used for time-series classification?

Accepted Answer

Yes, through the timestamp field in Instances, but it lacks specialized time-series features; you'd need custom evaluators or splitters to handle temporal aspects effectively.

brushfire

What is brushfire?

Overview

Use Cases

Best For

Related Projects

Found a gem we're missing?

Not Ideal For

Pros & Cons

Pros

Cons

Frequently Asked Questions