A suite of GPU-accelerated machine learning algorithms with scikit-learn-compatible APIs, delivering 10-50x speedups on large datasets.
cuML is a GPU-accelerated machine learning library that provides scikit-learn-compatible implementations of algorithms for clustering, regression, classification, dimensionality reduction, and more. It solves the problem of slow CPU-based ML training and inference on large datasets by leveraging NVIDIA GPUs for massive parallelism, delivering order-of-magnitude speed improvements.
Data scientists, ML researchers, and software engineers working with large tabular datasets who want to accelerate traditional ML workflows without rewriting code or learning low-level CUDA programming.
Developers choose cuML because it offers a frictionless transition from scikit-learn with matching APIs, provides 10-50x faster execution on GPUs, and scales to multi-node, multi-GPU clusters through Dask integration while maintaining full compatibility with the Python ML ecosystem.
cuML - RAPIDS Machine Learning Library
The Python API closely matches scikit-learn, allowing minimal code changes to port existing workflows to GPU acceleration, as shown in the DBSCAN example.
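The API match means the estimator pattern is identical: construct, `fit`, read attributes. A minimal sketch of the DBSCAN workflow, shown here with the scikit-learn import so it runs on CPU; per the cuML README, swapping the import for `cuml.cluster.DBSCAN` (and passing GPU-resident data such as a cuDF DataFrame) is the only change needed for GPU execution:

```python
import numpy as np
from sklearn.cluster import DBSCAN  # GPU variant: from cuml.cluster import DBSCAN

# Two well-separated blobs; DBSCAN should recover two clusters.
X = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1],
              [8.0, 8.0], [8.1, 7.9], [7.9, 8.1]], dtype=np.float32)

# Same constructor arguments and fitted attributes in both libraries.
db = DBSCAN(eps=0.5, min_samples=2).fit(X)
print(db.labels_)  # [0 0 0 1 1 1]
```

Because the fitted attributes (`labels_`, `core_sample_indices_`) carry the same names, downstream code that consumes the results ports over unchanged as well.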
Integrates with Dask for distributed computing, enabling scaling to multi-node, multi-GPU clusters for large datasets, demonstrated with NearestNeighbors queries.
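In the distributed setting, the README pairs a Dask client with `cuml.dask.neighbors.NearestNeighbors`, and the query API itself mirrors scikit-learn's `kneighbors` call. A single-node CPU sketch of the equivalent query (the commented cuML import paths follow the README and assume a running dask-cuda cluster):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
# Distributed GPU variant (sketch, requires dask-cuda and a GPU):
#   from dask.distributed import Client
#   from dask_cuda import LocalCUDACluster
#   from cuml.dask.neighbors import NearestNeighbors
#   client = Client(LocalCUDACluster())

X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0]],
             dtype=np.float32)

# Fit an index, then query the two nearest points to a probe.
nn = NearestNeighbors(n_neighbors=2).fit(X)
distances, indices = nn.kneighbors(np.array([[0.2, 0.1]], dtype=np.float32))
print(indices)  # [[0 2]] -- points (0,0) and (1,0) are closest
```

In the Dask version the same `fit`/`kneighbors` calls operate on Dask collections partitioned across workers, so the query code is unchanged while the index is sharded across GPUs.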
Covers a wide range of traditional ML tasks including clustering, regression, and dimensionality reduction, with a detailed table of supported algorithms.
Delivers 10-50x speedups over CPU equivalents for large datasets, backed by benchmarks in the provided notebooks.
Requires specific NVIDIA GPUs and CUDA toolkit versions, making it unusable on non-NVIDIA hardware or in cloud environments that offer only alternative accelerators.
Some algorithms, like Random Forest, are marked experimental in multi-GPU mode, which may lead to instability or lack of full support, as noted in the README table.
Optimal multi-node performance requires configuring Dask with UCXX for fast transport, adding deployment complexity compared to standalone scikit-learn, as seen in the initialization snippets.
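That extra configuration typically amounts to a few cluster-setup lines before any cuML code runs. A sketch based on the dask-cuda API (the specific flags are illustrative assumptions and depend on your hardware and UCX/UCXX installation):

```python
# Sketch: a dask-cuda cluster using UCX-based transport instead of TCP.
# Requires NVIDIA GPUs plus the dask-cuda and UCX/UCXX packages; the
# flag values below are illustrative and must be tuned per deployment.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    protocol="ucx",        # fast UCX transport between workers
    enable_nvlink=True,    # NVLink GPU-to-GPU transfers (hardware-dependent)
    rmm_pool_size="24GB",  # RMM memory pool per worker (tune to your GPU)
)
client = Client(cluster)   # cuml.dask estimators then use this client
```

None of this exists in a standalone scikit-learn workflow, which is the deployment-complexity trade-off the comparison refers to.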