A minimal benchmark comparing scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification.
benchm-ml is a benchmarking project that compares the scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification. It evaluates implementations like scikit-learn, H2O, xgboost, and Spark MLlib on datasets ranging from 10,000 to 10 million rows, helping users identify the most efficient tools for tasks like fraud detection or credit scoring.
Data scientists and machine learning engineers who need to select optimal ML libraries for binary classification on medium-to-large structured datasets, particularly in business applications.
It provides empirical, side-by-side comparisons of leading ML tools, highlighting trade-offs in speed, memory usage, and accuracy, which helps practitioners make informed decisions without relying on hype or marketing claims.
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Compares implementations across R, Python, H2O, xgboost, Spark, and more, providing side-by-side results for training time, RAM usage, and AUC on datasets up to 10M rows.
Focuses on how algorithms scale from 10K to 10M rows on commodity hardware, with detailed metrics for linear models, random forests, and boosting, helping users gauge performance limits.
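The core measurement the project repeats at each dataset size, training time plus test-set AUC, can be sketched as follows. The synthetic data, model choice, and row counts here are illustrative stand-ins, not the project's actual airline dataset or configurations.

```python
# Minimal sketch of a per-size benchmark loop: time the fit, then score AUC.
# Synthetic data and hyperparameters are illustrative, not the project's setup.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

for n_rows in (10_000, 100_000):  # the project scales this up to 10M
    X, y = make_classification(n_samples=n_rows, n_features=20, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
    t0 = time.time()
    model.fit(X_tr, y_tr)
    train_time = time.time() - t0
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{n_rows:>9} rows: {train_time:6.1f}s train, AUC {auc:.3f}")
```

Swapping the model line for an xgboost or H2O estimator and recording peak RAM alongside wall time yields the kind of side-by-side table the repo publishes.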
Highlights learning-curve effects, such as a random forest trained on 1% of the data outperforming a linear model trained on the full dataset, offering actionable insights for model selection in business applications.
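That subset-versus-full-data comparison can be reproduced in miniature. The example below uses a deliberately nonlinear synthetic dataset (concentric circles) rather than the benchmark's real tabular data, so the gap it shows is illustrative only.

```python
# Sketch of the learning-curve comparison: a random forest on a small
# fraction of the training data vs. a linear model on all of it.
# make_circles is a stand-in for the benchmark's real dataset.
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=100_000, noise=0.2, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Linear model fit on the full training set
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_lin = roc_auc_score(y_te, lin.predict_proba(X_te)[:, 1])

# Random forest fit on a 5% subset (the repo cites 1% on much larger data)
n_sub = len(X_tr) // 20
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(X_tr[:n_sub], y_tr[:n_sub])
auc_rf = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

print(f"logistic regression, 100% of data: AUC {auc_lin:.3f}")
print(f"random forest,         5% of data: AUC {auc_rf:.3f}")
```

On data with nonlinear structure like this, the forest on a small subset beats the linear model on everything, which is the pattern the benchmark reports.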
Later updates added LightGBM results and a successor repository for GBM benchmarks with dockerized tests, showing adaptation to newer tools such as GPU implementations, though parts of this project remain outdated.
Much of the data was collected in 2015, with only partial later updates; the author points to a newer GBM-specific benchmark as more current, so results may not reflect the latest library versions.
Limited to binary classification with dense, non-sparse tabular data and no missing values, excluding regression, multiclass, or sparse feature scenarios common in real-world data.
Relies on specific EC2 instance types and manual installation without full dockerization for all tests, making reproduction cumbersome compared to modern automated benchmarks.
Criticizes tools such as Spark MLlib for poor performance yet includes only limited distributed benchmarking; some tests crash or show accuracy issues, leaving incomplete guidance for cluster-based ML.