A minimal benchmark comparing scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification.
benchm-ml is a benchmarking project that compares the scalability, speed, and accuracy of popular open-source machine learning libraries for binary classification. It evaluates implementations like scikit-learn, H2O, xgboost, and Spark MLlib on datasets ranging from 10,000 to 10 million rows, helping users identify the most efficient tools for tasks like fraud detection or credit scoring.
Data scientists and machine learning engineers who need to select optimal ML libraries for binary classification on medium-to-large structured datasets, particularly in business applications.
It provides empirical, side-by-side comparisons of leading ML tools, highlighting trade-offs in speed, memory usage, and accuracy, which helps practitioners make informed decisions without relying on hype or marketing claims.
A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.).
Compares implementations across R, Python, H2O, xgboost, Spark, and more, providing side-by-side results for training time, RAM usage, and AUC on datasets up to 10M rows.
Focuses on how algorithms scale from 10K to 10M rows on commodity hardware, with detailed metrics for linear models, random forests, and boosting, helping users gauge performance limits.
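The core measurement the project repeats at each dataset size, training time plus test-set AUC, can be sketched as follows. The synthetic data, model choice, and row counts here are illustrative stand-ins, not the project's actual airline dataset or configurations.

```python
# Minimal sketch of a per-size benchmark loop: time the fit, then score AUC.
# Synthetic data and hyperparameters are illustrative, not the project's setup.
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

for n_rows in (10_000, 100_000):  # the project scales this up to 10M
    X, y = make_classification(n_samples=n_rows, n_features=20, random_state=42)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, random_state=42
    )
    model = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=42)
    t0 = time.time()
    model.fit(X_tr, y_tr)
    train_time = time.time() - t0
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(f"{n_rows:>9} rows: {train_time:6.1f}s train, AUC {auc:.3f}")
```

Swapping the model line for an xgboost or H2O estimator and recording peak RAM alongside wall time yields the kind of side-by-side table the repo publishes.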
Highlights learning-curve effects, such as a random forest trained on 1% of the data outperforming a linear model trained on the full dataset, offering actionable insights for model selection in business applications.
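That subset-versus-full-data comparison can be reproduced in miniature. The example below uses a deliberately nonlinear synthetic dataset (concentric circles) rather than the benchmark's real tabular data, so the gap it shows is illustrative only.

```python
# Sketch of the learning-curve comparison: a random forest on a small
# fraction of the training data vs. a linear model on all of it.
# make_circles is a stand-in for the benchmark's real dataset.
from sklearn.datasets import make_circles
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_circles(n_samples=100_000, noise=0.2, factor=0.5, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# Linear model fit on the full training set
lin = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
auc_lin = roc_auc_score(y_te, lin.predict_proba(X_te)[:, 1])

# Random forest fit on a 5% subset (the repo cites 1% on much larger data)
n_sub = len(X_tr) // 20
rf = RandomForestClassifier(n_estimators=100, n_jobs=-1, random_state=0)
rf.fit(X_tr[:n_sub], y_tr[:n_sub])
auc_rf = roc_auc_score(y_te, rf.predict_proba(X_te)[:, 1])

print(f"logistic regression, 100% of data: AUC {auc_lin:.3f}")
print(f"random forest,         5% of data: AUC {auc_rf:.3f}")
```

On data with nonlinear structure like this, the forest on a small subset beats the linear model on everything, which is the pattern the benchmark reports.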
Later updates added LightGBM results and a successor repository for GBM benchmarks with dockerized tests, showing adaptation to newer tools such as GPU implementations, though parts of this project remain outdated.
Much of the data was collected in 2015, with only partial later updates; the author points to a newer GBM-specific benchmark as more current, so results may not reflect the latest library versions.
Limited to binary classification with dense, non-sparse tabular data and no missing values, excluding regression, multiclass, or sparse feature scenarios common in real-world data.
Relies on specific EC2 instance types and manual installation without full dockerization for all tests, making reproduction cumbersome compared to modern automated benchmarks.
Criticizes tools such as Spark MLlib for poor performance yet includes only limited distributed benchmarking; some tests crash or show accuracy issues, leaving incomplete guidance for cluster-based ML.