A massively parallel library for training self-organizing maps on multicore CPUs, GPUs, and clusters with support for dense and sparse data.
Somoclu is a massively parallel library for training self-organizing maps (SOMs), which are unsupervised neural networks used for clustering, visualization, and dimensionality reduction of high-dimensional data. It solves the problem of slow SOM training by parallelizing computations across multicore CPUs, GPUs, and distributed clusters, supporting both dense and sparse data formats.
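To make the source of the parallelism concrete, here is a minimal pure-Python sketch of one common batch formulation of SOM training. It is a generic illustration and not Somoclu's code: the function names (`best_matching_unit`, `train_epoch`, `train_som`), the Gaussian neighborhood, and the linear radius schedule are all assumptions chosen for brevity. The best-matching-unit search is independent for every sample, which is the kind of work that can be spread across cores, GPU threads, or cluster nodes.

```python
import math
import random

def best_matching_unit(codebook, x):
    """Return the index of the neuron whose weight vector is closest to x."""
    def sq_dist(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(codebook)), key=lambda j: sq_dist(codebook[j]))

def train_epoch(codebook, coords, data, radius):
    """One batch epoch: each neuron becomes a neighborhood-weighted mean of the data."""
    dim = len(data[0])
    numer = [[0.0] * dim for _ in codebook]
    denom = [0.0] * len(codebook)
    for x in data:
        # Independent per sample, so this loop is the natural unit of parallel work.
        b = best_matching_unit(codebook, x)
        for j, (r, c) in enumerate(coords):
            grid_d2 = (r - coords[b][0]) ** 2 + (c - coords[b][1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighborhood weight
            denom[j] += h
            for k in range(dim):
                numer[j][k] += h * x[k]
    return [[n / denom[j] for n in numer[j]] for j in range(len(codebook))]

def train_som(data, n_rows, n_cols, epochs=10, seed=0):
    """Train a small rectangular map; returns (codebook, grid coordinates)."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(r, c) for r in range(n_rows) for c in range(n_cols)]
    codebook = [[rng.random() for _ in range(dim)] for _ in coords]
    start_radius = max(n_rows, n_cols) / 2.0
    for epoch in range(epochs):
        # Move the neighborhood radius linearly from start_radius toward one grid cell.
        radius = start_radius + (1.0 - start_radius) * epoch / max(epochs - 1, 1)
        codebook = train_epoch(codebook, coords, data, radius)
    return codebook, coords

if __name__ == "__main__":
    samples = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]]
    weights, grid = train_som(samples, n_rows=2, n_cols=2, epochs=5)
    for position, w in zip(grid, weights):
        print(position, [round(v, 3) for v in w])
```

Because every updated weight is a weighted mean of the samples, the codebook always stays inside the bounding box of the data; the cost per epoch grows with samples × neurons × dimensions, which is why large maps benefit from parallel hardware.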
Somoclu is aimed at data scientists, researchers, and machine learning practitioners who work with large datasets and need efficient SOM training for tasks such as exploratory data analysis, feature reduction, or pattern discovery in text mining and other domains.
Developers choose Somoclu for its speed and scalability: it uses OpenMP, CUDA, and MPI to train large maps on large datasets that would be impractical with a sequential implementation. Its multi-language interfaces and support for sparse data suit a range of research and production workflows.
Uses OpenMP on multicore CPUs, CUDA for GPU acceleration, and MPI for cluster computing, which reduces training time on large-scale datasets.
Runs on Linux, macOS, and Windows, with interfaces for Python, R, Julia, and MATLAB that make it easy to integrate into existing data science workflows.
Includes a specialized sparse kernel for text mining and other high-dimensional sparse vector spaces, so training stays efficient when the data is mostly zeros.
Can train maps with hundreds of thousands of neurons, allowing detailed representations of complex datasets.
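The benefit of a sparse kernel mentioned above can be sketched with a small distance routine. This is an illustrative assumption about how sparse input can be exploited, not a description of Somoclu's kernel: the names `sparse_sq_distance` and `dense_sq_distance` and the dictionary-based sparse format are hypothetical. The sketch relies on the identity ||x - w||² = ||x||² - 2·x·w + ||w||², so only the nonzero entries of the sample are visited, and ||w||² can be computed once per neuron and reused for every sample.

```python
def dense_sq_distance(x, weights):
    """Reference implementation: visits every dimension."""
    return sum((xi - wi) ** 2 for xi, wi in zip(x, weights))

def sparse_sq_distance(sparse_x, weights, weights_sq_norm):
    """Squared Euclidean distance between a sparse sample and a dense weight vector.

    sparse_x maps feature index -> nonzero value; weights_sq_norm is the
    precomputed sum of squared weights for this neuron.
    """
    x_sq = 0.0
    dot = 0.0
    for idx, val in sparse_x.items():  # touches nonzero features only
        x_sq += val * val
        dot += val * weights[idx]
    return x_sq - 2.0 * dot + weights_sq_norm

if __name__ == "__main__":
    dense_x = [0.0, 0.0, 3.0, 0.0, 0.0, 4.0]
    sparse_x = {2: 3.0, 5: 4.0}  # same sample, nonzeros only
    w = [1.0, 0.0, 1.0, 0.0, 2.0, 2.0]
    w_sq = sum(v * v for v in w)  # computed once per neuron
    print(dense_sq_distance(dense_x, w))          # 13.0
    print(sparse_sq_distance(sparse_x, w, w_sq))  # 13.0
```

For a bag-of-words vector with tens of thousands of dimensions and only a few hundred nonzeros, the per-sample loop shrinks accordingly, while the weight vectors themselves remain dense.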
The GPU and CPU kernels can produce different maps because of single-precision floats and the non-sequential reduction on the GPU; the README acknowledges this as a known issue to be aware of.
MPI and the sparse kernel are not available through the Python, R, Julia, and MATLAB interfaces, so cluster-level parallelism and sparse data are restricted to the command-line tool.
On macOS, GPU support requires specific compilers or conda-forge; on Windows, missing DLLs such as vcomp90.dll can cause errors. Both add setup hurdles, as detailed in the installation notes.
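The CPU/GPU discrepancy listed among the limitations above can be illustrated with a toy example. It assumes one plausible mechanism: single precision can collapse two nearly equal distances into an exact tie, and a sequential scan and an unordered reduction may then break the tie differently. The tie-breaking rule in `argmin_tree` is an arbitrary stand-in for a parallel reduction, not the behavior of Somoclu's GPU kernel, and all function names here are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (double precision) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def argmin_sequential(dists):
    """Left-to-right scan; on ties the lowest index wins."""
    best = 0
    for i in range(1, len(dists)):
        if dists[i] < dists[best]:
            best = i
    return best

def argmin_tree(dists):
    """Pairwise reduction; on ties the right-hand candidate wins (illustrative rule)."""
    candidates = list(range(len(dists)))
    while len(candidates) > 1:
        survivors = []
        for k in range(0, len(candidates) - 1, 2):
            a, b = candidates[k], candidates[k + 1]
            survivors.append(a if dists[a] < dists[b] else b)
        if len(candidates) % 2:
            survivors.append(candidates[-1])  # odd element advances unchallenged
        candidates = survivors
    return candidates[0]

if __name__ == "__main__":
    # Neurons 1 and 2 are almost equally close to the sample.
    d64 = [0.5, 0.1, 0.1 + 1e-12, 0.7]
    d32 = [to_float32(d) for d in d64]
    print(argmin_sequential(d64), argmin_tree(d64))  # 1 1: distinct in double precision
    print(d32[1] == d32[2])                          # True: a tie in single precision
    print(argmin_sequential(d32), argmin_tree(d32))  # 1 2: the winners now differ
```

Once two kernels pick different best matching units for even one sample, the subsequent weight updates diverge, so the final maps need not be identical even though both are valid.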