Run MPI programs on Hadoop YARN clusters using MPICH-3.1.2 and SSH for distributed computing.
MPICH-yarn is an application that enables running MPI (Message Passing Interface) programs on Hadoop YARN clusters. It lets high-performance computing applications use YARN's resource management and scheduling, bringing MPI-based distributed computing onto existing big data infrastructure.
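Conceptually, running MPI on YARN means requesting containers from YARN and then starting MPI processes on the hosts those containers land on. The Python sketch below illustrates that mapping. It is a sketch under stated assumptions, not MPICH-yarn's actual code. It assumes one MPI process per container, turns the list of allocated container hostnames into an MPICH Hydra host file (one `host:count` line per host), and builds an `mpiexec` command that uses SSH as the launcher. The helper names `build_hostfile` and `build_mpiexec_argv`, the file name `hosts.txt`, and the program `./my_mpi_app` are hypothetical.

```python
from collections import Counter


def build_hostfile(container_hosts):
    """Hypothetical helper: map YARN container hostnames (one entry per
    allocated container, assuming one MPI process per container) to
    MPICH Hydra host-file text with one 'host:count' line per host."""
    if not container_hosts:
        raise ValueError("no containers allocated")
    counts = Counter(container_hosts)
    # dict.fromkeys keeps first-seen order, so rank placement is deterministic.
    hosts_in_order = dict.fromkeys(container_hosts)
    return "".join(f"{host}:{counts[host]}\n" for host in hosts_in_order)


def build_mpiexec_argv(hostfile_path, nprocs, program, *program_args):
    """Hypothetical helper: an mpiexec command line that asks Hydra to start
    processes over SSH on the hosts listed in hostfile_path."""
    return ["mpiexec", "-launcher", "ssh", "-f", hostfile_path,
            "-n", str(nprocs), program, *program_args]


if __name__ == "__main__":
    containers = ["node1", "node2", "node1"]  # illustrative allocation
    print(build_hostfile(containers), end="")
    print(" ".join(build_mpiexec_argv("hosts.txt", len(containers), "./my_mpi_app")))
```

For example, containers on `node1`, `node2`, `node1` yield the host file lines `node1:2` and `node2:1`, for three processes in total.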
Researchers, data scientists, and engineers who need to run MPI applications on existing Hadoop YARN clusters for distributed computing tasks.
Developers choose MPICH-yarn because it provides a bridge between MPI and YARN ecosystems, allowing reuse of Hadoop infrastructure for MPI workloads without requiring separate cluster setups.
Running MPICH2 on Yarn
Enables MPI applications to run as YARN jobs that use YARN's scheduling and resource allocation, as detailed in the yarn-site.xml configuration examples.
Uses MPICH-3.1.2, an implementation of the MPI-3.0 standard, so existing MPI codebases can generally be reused without modification.
Supports storing temporary files and I/O data in HDFS, facilitating big data workflows, as shown in job submission examples like PLDA.
Creates and configures RSA key pairs for password-less SSH login across nodes, simplifying inter-node communication setup without manual key management.
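Passwordless SSH between nodes generally works by giving each node an RSA key pair (for example via `ssh-keygen -t rsa`) and listing every node's public key in `~/.ssh/authorized_keys` on the nodes it must log in to. The Python sketch below shows only the merge step, deduplicating collected public keys before they are written out. It is an illustration under assumptions, not how MPICH-yarn implements its key setup. `merge_authorized_keys` is a hypothetical name, and the key strings in the demo are placeholders rather than real keys.

```python
def merge_authorized_keys(existing, public_keys):
    """Hypothetical helper: return authorized_keys text with every entry of
    `existing` plus any new keys from `public_keys`, skipping duplicates.

    Assumes plain 'type base64 [comment]' lines without option prefixes
    such as from="..."; duplicates are detected by key type and key blob,
    so a repeated key with a different comment is still skipped."""
    merged = []
    seen = set()
    for raw in existing.splitlines() + list(public_keys):
        line = raw.strip()
        if not line:
            continue  # drop blank lines
        if line.startswith("#"):
            merged.append(line)  # keep comment lines unchanged
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"malformed public key line: {line!r}")
        identity = (fields[0], fields[1])  # key type + base64 blob
        if identity not in seen:
            seen.add(identity)
            merged.append(line)
    return "".join(line + "\n" for line in merged)


if __name__ == "__main__":
    existing = "ssh-rsa AAAAkeyA hadoop@node1\n"  # placeholder keys
    collected = ["ssh-rsa AAAAkeyA hadoop@node1-copy", "ssh-rsa AAAAkeyB hadoop@node2"]
    print(merge_authorized_keys(existing, collected), end="")
```

The merged text would then be written to `~/.ssh/authorized_keys` on each node. With OpenSSH's default `StrictModes`, that file and `~/.ssh` must not be writable by other users, so mode 600 for the file is the usual choice.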
Requires careful YARN and MPI configuration; the README itself warns of 'many troubles' and provides sample configs only for older versions such as Hadoop 2.4.1.
Relies on MPICH-3.1.2 and recommends Ubuntu 12.04, both of which are outdated (Ubuntu 12.04 is past end of life) and may lack modern features, security updates, or compatibility with newer systems.
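Uses SSH as the daemon for launching MPI processes on remote nodes, which adds startup latency and setup overhead. In MPICH, SSH bootstraps processes rather than carrying MPI messages, so message latency depends on the underlying network (for example, TCP over Ethernet versus a high-performance interconnect such as InfiniBand).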
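Much of the configuration effort described above centers on `yarn-site.xml`. The Python sketch below renders a Hadoop-style `<configuration>` file from a dict and applies a simple memory sanity check. The property names are standard YARN settings, but the values (including the `master` ResourceManager hostname) are illustrative assumptions, not MPICH-yarn's recommended configuration, and `yarn_site_xml` and `containers_per_node` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def yarn_site_xml(properties):
    """Hypothetical helper: render {name: value} pairs as yarn-site.xml text."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


def containers_per_node(properties):
    """Hypothetical check: upper bound on containers one NodeManager can host
    by memory alone, since each container gets at least the minimum allocation."""
    node_mb = int(properties["yarn.nodemanager.resource.memory-mb"])
    min_mb = int(properties["yarn.scheduler.minimum-allocation-mb"])
    max_mb = int(properties["yarn.scheduler.maximum-allocation-mb"])
    if not 0 < min_mb <= max_mb <= node_mb:
        # Policy assumed by this sketch: a maximum above node memory could
        # never be satisfied, because a container runs on a single node.
        raise ValueError("expected 0 < minimum <= maximum <= NodeManager memory")
    return node_mb // min_mb


# Illustrative values only; not taken from the MPICH-yarn README.
EXAMPLE = {
    "yarn.resourcemanager.hostname": "master",
    "yarn.nodemanager.resource.memory-mb": 8192,
    "yarn.nodemanager.resource.cpu-vcores": 4,
    "yarn.scheduler.minimum-allocation-mb": 1024,
    "yarn.scheduler.maximum-allocation-mb": 8192,
}

if __name__ == "__main__":
    print(yarn_site_xml(EXAMPLE))
    print("max containers per node by memory:", containers_per_node(EXAMPLE))
```

With these values, each NodeManager can host at most 8192 // 1024 = 8 containers by memory, which also bounds how many single-container MPI processes land on one node.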