A CUDA backend for Torch7 that enables GPU-accelerated tensor operations with a familiar Torch API.
Cutorch is a CUDA backend for Torch7 that provides GPU-accelerated tensor operations. It enables developers to run numerical computations on NVIDIA GPUs by introducing CUDA tensor types that mirror Torch's CPU tensor API, significantly speeding up machine learning and scientific computing workloads.
Machine learning researchers and developers using Torch7 who need GPU acceleration for tensor operations, particularly those working with deep learning models or large-scale numerical computations.
Developers choose Cutorch because its CUDA tensors mirror the API of Torch7's CPU tensors, so existing code gains GPU acceleration with few changes. It also offers multi-GPU support and CUDA features such as stream management and peer-to-peer access without requiring low-level CUDA programming.
Maintains API consistency with Torch CPU tensors, so existing code ports easily: torch.CudaTensor behaves like torch.FloatTensor and needs only minimal changes.
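Provides device management (e.g., cutorch.setDevice) for working across multiple GPUs, plus an optional caching allocator, enabled with the THC_CACHING_ALLOCATOR environment variable, that reuses memory allocations to avoid the synchronization that freeing CUDA memory would otherwise trigger.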
Exposes low-level controls like stream management and peer-to-peer access for fine-tuned parallel execution, essential for high-performance computing workloads.
Offers random number generator control per GPU (e.g., cutorch.manualSeed for one device, cutorch.manualSeedAll for all devices), which supports reproducibility in machine learning experiments.
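As a rough illustration of the points above, here is a minimal Lua sketch of porting a CPU computation to the GPU. It assumes Torch7 and cutorch are installed and that at least one CUDA-capable GPU is visible. The tensor size and seed value are arbitrary choices for illustration, not something the project prescribes.

```lua
-- Sketch only: assumes Torch7 and cutorch are installed and a CUDA GPU is available.
require 'torch'
require 'cutorch'

-- Seed the random number generator of every GPU with a fixed (arbitrary) value.
cutorch.manualSeedAll(1234)

-- Select the first GPU; cutorch device indices are 1-based.
cutorch.setDevice(1)

-- CPU version: a FloatTensor filled with uniform random values, then a matrix product.
local a_cpu = torch.FloatTensor(4, 4):uniform()
local c_cpu = torch.mm(a_cpu, a_cpu)

-- GPU version: copy the data into a CudaTensor and call the same function.
local a_gpu = a_cpu:cuda()
local c_gpu = torch.mm(a_gpu, a_gpu)

-- Copy the GPU result back to the CPU and measure how far it is from the CPU result.
local diff = (c_gpu:float() - c_cpu):abs():max()
print(('device %d of %d, max abs difference: %g'):format(
  cutorch.getDevice(), cutorch.getDeviceCount(), diff))
```

The only changes between the CPU and GPU paths are the `:cuda()` and `:float()` conversions, which is the porting pattern the first point describes. The printed difference should be small but is not guaranteed to be exactly zero, since CPU and GPU floating-point results can differ slightly.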
Non-float CUDA tensor types (e.g., CudaDoubleTensor) have limited functionality, primarily restricted to copying and basic operations, as the README notes.
Stream management functions are marked as dangerous for users and require careful synchronization to avoid errors, increasing development complexity.
Tied exclusively to CUDA and NVIDIA GPUs, limiting portability to other hardware like AMD or integrated graphics.
API changes between versions (e.g., return types for operators) can break compatibility, requiring manual updates as noted in the README's versioning table.