A NumPy/SciPy-compatible array library for GPU-accelerated computing with Python, supporting NVIDIA CUDA and AMD ROCm.
CuPy is a GPU-accelerated array library for Python that provides a NumPy/SciPy-compatible API. It enables users to run existing numerical Python code on NVIDIA CUDA or AMD ROCm GPUs with minimal modifications, dramatically accelerating scientific computing, machine learning, and data processing workflows. The library serves as a bridge between Python's ease of use and the raw parallel processing power of modern GPUs.
Data scientists, researchers, and engineers working with numerical computing in Python who need to accelerate NumPy/SciPy workflows on GPU hardware, particularly those in machine learning, scientific computing, and signal processing domains.
Developers choose CuPy because its API is close to a drop-in replacement for NumPy, so existing workflows gain GPU acceleration with minimal code changes. Its combination of high-level API compatibility and low-level CUDA access offers both easy adoption and fine-grained performance tuning.
NumPy & SciPy for GPU
CuPy's API closely mirrors NumPy's: functions such as cp.arange behave like their NumPy counterparts, so existing code can run on the GPU with minimal changes, often just by swapping the import, as the README example demonstrates.
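A minimal sketch of that drop-in pattern (the try/except fallback import is a common community convention, not part of CuPy itself): code written against the shared API runs on the GPU when CuPy and a device are available, and on the CPU otherwise.

```python
try:
    import cupy as xp   # arrays are allocated on the GPU
except ImportError:
    import numpy as xp  # identical API, arrays stay on the CPU

a = xp.arange(6, dtype=xp.float32).reshape(2, 3)
result = (a * 2).sum(axis=0)  # the same call works in NumPy and CuPy
print(result.tolist())        # [6.0, 10.0, 14.0]
```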
Supports both NVIDIA CUDA and AMD ROCm, offering flexibility across GPU vendors, though the installation guide notes that ROCm support is experimental.
Provides direct access to CUDA features such as RawKernel and Streams, letting advanced users fine-tune performance for critical paths, as described in the low-level access section.
Includes cuSignal functionality, merged into CuPy as of v13.0.0, for accelerated signal processing workflows without a separate library.
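A hedged sketch of GPU-side filtering through cupyx.scipy.signal (the module that absorbed cuSignal); the filter parameters are arbitrary, and the block falls through cleanly without a GPU.

```python
try:
    import cupy as cp
    import cupyx.scipy.signal as signal  # home of the merged cuSignal routines

    taps = signal.firwin(64, 0.2)                 # design a low-pass FIR filter on the GPU
    noisy = cp.random.standard_normal(1_000_000)  # large signal resident in GPU memory
    filtered = signal.lfilter(taps, 1.0, noisy)   # filtering runs entirely on the device
    ran_on_gpu = True
except Exception:
    ran_on_gpu = False  # CuPy missing, or no CUDA/ROCm device visible
```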
Installation requires choosing the wheel that matches your CUDA toolkit (e.g., cupy-cuda12x) and keeping GPU drivers in sync; with ROCm support still experimental, setup can be error-prone and confusing.
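A sketch of the wheel-selection step; the package names are examples from past releases, so check the installation guide for the current support matrix before installing.

```shell
# CuPy ships one wheel per GPU toolkit; install exactly one of them.
#   pip install cupy-cuda11x    # NVIDIA CUDA 11.x
#   pip install cupy-cuda12x    # NVIDIA CUDA 12.x
#   pip install cupy-rocm-5-0   # AMD ROCm 5.0 (experimental)

# After installing, verify the wheel matches the local driver/toolkit:
python -c "import cupy; cupy.show_config()" \
  || echo "CuPy not importable: check wheel name, driver, and CUDA version"
READY=yes
```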
GPU memory is far more limited than host RAM, so large datasets can trigger out-of-memory errors unless allocations are managed carefully, for example through CuPy's memory pools.
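A sketch of the memory-pool controls that help avoid out-of-memory errors (the 2 GiB cap is an arbitrary example); the block is guarded so it runs to completion without a GPU.

```python
try:
    import cupy as cp

    pool = cp.get_default_memory_pool()

    x = cp.arange(10_000_000, dtype=cp.float32)  # allocated from the pool
    del x                                        # returns memory to the pool, not the driver
    pool.free_all_blocks()                       # hand cached blocks back to the GPU

    # Optionally cap the pool so one process cannot starve others on the device.
    pool.set_limit(size=2 * 1024**3)  # 2 GiB, an illustrative value
    pool_available = True
except Exception:
    pool_available = False  # CuPy missing, or no CUDA device visible
```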
While NumPy-compatible, not every NumPy/SciPy function is implemented or fully optimized, which can leave functionality gaps or unexpected performance cliffs compared to native NumPy.
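One way to cope with such gaps is to write array-agnostic functions with cupy.get_array_module, which returns the namespace (numpy or cupy) matching its arguments; the fallback shim and the log_softmax example below are illustrative, not CuPy APIs.

```python
import numpy as np

try:
    import cupy as cp
    get_array_module = cp.get_array_module
except ImportError:
    def get_array_module(*args):
        return np  # CPU-only fallback shim when CuPy is absent

def log_softmax(x):
    # Works on both NumPy and CuPy arrays without an explicit branch.
    xp = get_array_module(x)
    shifted = x - x.max()
    return shifted - xp.log(xp.exp(shifted).sum())

result = log_softmax(np.array([1.0, 2.0, 3.0]))
# The probabilities exp(result) sum to 1 by construction.
```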