
MPI (Message Passing Interface)

Introduction

The Message Passing Interface (MPI) is a communication protocol for parallel programming. It is a standardized and portable message-passing specification designed by the MPI Forum to function on a wide variety of parallel computing architectures. MPI allows different processes running simultaneously on distributed memory systems to communicate with each other.

The basic philosophy behind MPI is that of distributed memory: the default assumption is that every process has its own private memory. Processes communicate with each other exclusively through MPI's communication operations.

MPI is not a language, but rather a library that can be used from multiple programming languages, such as C, C++, and Fortran. The library provides point-to-point and collective communication routines, in both blocking and non-blocking variants, for a wide range of data types.
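
As a minimal sketch of blocking point-to-point communication (this fragment is not part of the example program developed later in this guide), the following C code sends an integer from rank 0 to rank 1 with MPI_Send and receives it with MPI_Recv. It assumes the program is launched with at least two processes.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int value = 0;
    if (rank == 0) {
        value = 42;
        // Blocking send: returns once the send buffer can safely be reused.
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        // Blocking receive: waits until a matching message has arrived.
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}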

Preparing your Environment

On our platforms, software is accessible via environment modules using Lmod. This system helps manage different versions of software and keeps the environment clean and uncluttered. This is especially useful when different programs require different versions of libraries.

Before compiling and running an MPI program, you need to load the appropriate MPI module. We have both the FOSS (Free and Open Source Software) toolchain (foss) and the Intel toolchain (intel) installed.

To load the appropriate module, use the module load command. For instance:

module load foss

Or for the Intel MPI:

module load intel

To check which modules are currently loaded, use module list. To see all available modules, use module avail.
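
For example, to see which versions of a toolchain are installed and load a specific one (the version string below is only illustrative; check module avail for the versions available on our platforms):

module avail foss
module load foss/2023a
module list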

For more detailed information on how to use environment modules, refer to our Environment Modules Guide.

Compiling MPI Programs

MPI provides compiler wrappers that automatically link the necessary MPI libraries. The table below shows the MPI compiler wrappers for C, C++, and Fortran for both Intel and OpenMPI.

           C         C++        Fortran
Intel      mpiicc    mpiicpc    mpiifort
OpenMPI    mpicc     mpicxx     mpif90

To compile your MPI program, use the corresponding compiler wrapper. For example, to compile a C, C++, or Fortran program using the Intel toolchain:

mpiicc -o myprog myprog.c
mpiicpc -o myprog myprog.cxx
mpiifort -o myprog myprog.f90

These commands use the compiler wrappers to compile the source file (myprog.c, myprog.cxx, or myprog.f90) and create an executable named myprog. The -o option specifies the name of the output file.
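
The equivalent commands with the OpenMPI wrappers from the foss toolchain, as listed in the table above, are:

mpicc -o myprog myprog.c
mpicxx -o myprog myprog.cxx
mpif90 -o myprog myprog.f90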

Example MPI Program

Here is a simple MPI program that prints the rank of each process and the total number of processes. The same program is shown below in C, C++, and Fortran:

#include <stdio.h>  // Include the standard input/output library
#include <mpi.h>    // Include the MPI library

int main(int argc, char** argv) {
    // Initialize the MPI environment. The two arguments to MPI_Init are not
    // currently used by MPI implementations, but are there in case future
    // implementations might need the arguments.
    MPI_Init(NULL, NULL);

    // Get the number of processes. MPI_COMM_WORLD is the default communicator
    // that includes all the processes that were started. The number of
    // processes is saved in the 'world_size' variable.
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process. The rank of a process in a communicator is
    // its ID within that communicator. Ranks are assigned starting from zero.
    // The rank is saved in the 'world_rank' variable.
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Print a hello message from the current process. The message includes
    // the rank of the process and the total number of processes.
    printf("Hello world from processor with rank %d out of %d processors\n", world_rank, world_size);

    // Finalize the MPI environment. No further MPI routines should be called
    // after this.
    MPI_Finalize();
    return 0;
}

The same program in C++:

#include <mpi.h>    // Include the MPI library
#include <iostream> // Include the I/O stream library

int main(int argc, char** argv) {
    // Initialize the MPI environment. The MPI C++ bindings (MPI::Init and
    // friends) were deprecated and later removed from the MPI standard, so
    // the C API is used here as well.
    MPI_Init(&argc, &argv);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Print a hello message from the current process
    std::cout << "Hello world from processor with rank "
              << world_rank
              << " out of "
              << world_size
              << " processors\n";

    // Finalize the MPI environment. No further MPI routines should be called
    // after this.
    MPI_Finalize();

    return 0;
}

And the same program in Fortran:

program hello
    use mpi   ! Use the MPI module.

    ! Declare variables.
    integer ( kind = 4 ) error   ! Variable to capture any error code.
    integer ( kind = 4 ) rank    ! Variable to capture the rank of each processor.
    integer ( kind = 4 ) size    ! Variable to capture the total number of processors.

    ! Initialize the MPI environment. The 'error' variable will capture any error code.
    call MPI_Init(error)

    ! Get the rank of the current processor in the communicator (MPI_COMM_WORLD).
    ! The rank will be stored in the 'rank' variable. The 'error' variable will capture any error code.
    call MPI_Comm_rank(MPI_COMM_WORLD, rank, error)

    ! Get the total number of processors in the communicator (MPI_COMM_WORLD).
    ! The total number will be stored in the 'size' variable. The 'error' variable will capture any error code.
    call MPI_Comm_size(MPI_COMM_WORLD, size, error)

    ! Print a hello message from the current processor.
    ! The message includes the rank of the current processor and the total number of processors.
    print *, 'Hello world from processor with rank ', rank, ' out of ', size, ' processors'

    ! Finalize the MPI environment. No further MPI routines should be called after this.
    call MPI_Finalize(error)
end program hello

To compile this program, follow the steps described in the previous sections. Once compiled, you can run it using mpirun or mpiexec:

mpirun -np 4 ./myprog

Here, -np 4 specifies the number of processes to be created. The output will be something like:

Hello world from processor with rank 0 out of 4 processors
Hello world from processor with rank 1 out of 4 processors
Hello world from processor with rank 2 out of 4 processors
Hello world from processor with rank 3 out of 4 processors

Each line is printed by a different process. The order of the output might differ each time you run the program because the order in which the processes run is not deterministic.

MPI Options

Different MPI implementations offer different options that can be used to control the execution of MPI programs. The most common ones are listed below, each followed by an illustrative command line.

Intel MPI

  • -np N: Specifies the total number of processes to run.
  • -ppn N: Specifies the number of processes per node.
  • -hosts H1,H2,...: Specifies the hosts on which to run.
  • -genvall: Passes all environment variables from the current shell to the launched application.
  • -genv name value: Sets an environment variable for the launched application.
  • -perhost N: Places N consecutive processes on each host (round-robin placement).
  • -f hostfile: Specifies a hostfile that contains a list of hosts on which to run.
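
As an illustration of how these options combine (the hostnames and values below are placeholders, not site-specific settings), the following line launches 8 processes, 4 per node, on two hosts, and sets OMP_NUM_THREADS in the environment of every process:

mpirun -np 8 -ppn 4 -hosts node01,node02 -genv OMP_NUM_THREADS 4 ./myprog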

OpenMPI

  • -np N: Specifies the total number of processes to run.
  • -hostfile HOSTFILE: Specifies the hostfile that contains the list of hosts on which to run.
  • -host H1,H2,...: Specifies the hosts on which to run.
  • -map-by ppr:N:node: Launches N processes per node.
  • -bind-to core: Binds each MPI process to a core.
  • -x VAR: Exports the environment variable VAR to the launched application.
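
A roughly equivalent illustrative Open MPI command (again with placeholder values, and assuming OMP_NUM_THREADS is already set in the submitting shell) would be:

mpirun -np 8 -map-by ppr:4:node -bind-to core -x OMP_NUM_THREADS ./myprog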

These options give you granular control over the execution of your MPI program, including the distribution of processes across nodes and the setting of environment variables. The exact list of options and their meanings may vary slightly depending on the specific version of Intel MPI or Open MPI you are using, so it is always a good idea to check the official documentation or the man pages (man mpirun or man mpiexec) for the most accurate and up-to-date information.

Integration with SLURM

SLURM (Simple Linux Utility for Resource Management) is a powerful tool for managing resources on a cluster. It can seamlessly integrate with MPI to manage and schedule jobs, making it easier to utilize a cluster's resources effectively.

A simple SLURM script to run an MPI program could look like this:

#!/bin/bash
#SBATCH --partition=regular       # Partition
#SBATCH --job-name=my_mpi_job     # Job name
#SBATCH --output=my_mpi_job.o%j   # Name of stdout output file
#SBATCH --error=my_mpi_job.e%j    # Name of stderr error file
#SBATCH --nodes=1                 # Total number of nodes requested
#SBATCH --ntasks-per-node=4       # Total number of mpi tasks per node requested
#SBATCH --mem=40gb                # Total amount of RAM memory per node
#SBATCH --time=01:30:00           # Run time (hh:mm:ss)

module load intel
mpirun -np ${SLURM_NTASKS} ./myprog

In this script, #SBATCH options are used to specify job parameters, such as the job name, the number of nodes, and the total run time. After loading the Intel module, the script runs the MPI program using mpirun. The number of processes launched (-np) is set to match the number of tasks specified in the SLURM job (${SLURM_NTASKS}).
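
Assuming the script above is saved in a file such as my_mpi_job.sl (the filename is only an example), you submit it to the queue with sbatch:

sbatch my_mpi_job.sl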

As an alternative, you can use srun instead of mpirun or mpiexec to launch MPI programs. srun is SLURM's own job launcher and integrates more tightly with the SLURM environment. Here's how to use srun in a SLURM script:

#!/bin/bash
#SBATCH --partition=regular       # Partition
#SBATCH --job-name=my_mpi_job     # Job name
#SBATCH --output=my_mpi_job.o%j   # Name of stdout output file
#SBATCH --error=my_mpi_job.e%j    # Name of stderr error file
#SBATCH --nodes=1                 # Total number of nodes requested
#SBATCH --ntasks-per-node=4       # Total number of mpi tasks per node requested
#SBATCH --mem=40gb                # Total amount of RAM memory per node
#SBATCH --time=01:30:00           # Run time (hh:mm:ss)

module load intel
srun ./myprog

In this version of the script, we simply call srun followed by the name of the program. SLURM already knows the number of tasks to launch from its parameters, so we don't need to specify -np.

MPI for Python

MPI for Python (mpi4py) is a Python package that provides bindings to the MPI standard, allowing you to write parallel and distributed programs using Python. It fully supports the MPI-3.1 standard and provides an object-oriented interface which closely follows MPI-2 C++ bindings.

Installing mpi4py

Before you can use mpi4py, you need to install it. If you're using a Python package manager like pip or conda, you can do this as follows:

pip install mpi4py

or

conda install -c anaconda mpi4py

Note that mpi4py requires a working MPI installation to function. Make sure to load the appropriate MPI module before installing mpi4py.
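
For example, a typical sequence (the toolchain and environment names below are only illustrative) is to load the MPI module first and then install mpi4py into a virtual environment, so that it is built against the cluster's MPI library:

module load foss
python -m venv mpi4py-env
source mpi4py-env/bin/activate
pip install mpi4py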

Using mpi4py

Here's an example of a simple mpi4py program that does the same thing as the previous MPI examples: printing out the rank of each process and the total number of processes.

from mpi4py import MPI

# Initialize the MPI environment
comm = MPI.COMM_WORLD

# Get the rank of the process
rank = comm.Get_rank()

# Get the number of processes
size = comm.Get_size()

# Print a hello message from the current process
print(f"Hello world from processor with rank {rank} out of {size} processors")

You can run this program in a similar way to the previous examples, but using the mpirun command with the Python interpreter:

mpirun -np 4 python myprog.py

This command will start four Python processes and execute myprog.py on each one.

Just like with C, C++, and Fortran, you can use mpi4py with SLURM. Here's an example of a SLURM script that runs a Python program using mpi4py:

#!/bin/bash
#SBATCH --partition=regular       # Partition
#SBATCH --job-name=my_mpi_job     # Job name
#SBATCH --output=my_mpi_job.o%j   # Name of stdout output file
#SBATCH --error=my_mpi_job.e%j    # Name of stderr error file
#SBATCH --nodes=1                 # Total number of nodes requested
#SBATCH --ntasks-per-node=4       # Total number of mpi tasks per node requested
#SBATCH --mem=40gb                # Total amount of RAM memory per node
#SBATCH --time=01:30:00           # Run time (hh:mm:ss)

module load intel
mpirun -np ${SLURM_NTASKS} python myprog.py

This script is very similar to the previous SLURM scripts, with the main difference being that it uses the Python interpreter to execute the MPI program.

Further Reading and Resources

For more information on MPI, its applications, and its integration with other tools, the following resources may be useful:

  • MPI Forum: The official website of the MPI Forum, which develops and maintains the MPI standard.
  • Open MPI Project: The website of the Open MPI project, which provides an open-source implementation of MPI.
  • Intel MPI Library: The website of the Intel MPI Library, which provides a high-performance implementation of MPI.
  • SLURM Workload Manager: The official documentation of the SLURM Workload Manager.
  • Lmod: The official documentation of Lmod, a Lua-based environment modules system.
  • mpi4py documentation: The official documentation of mpi4py.
  • MPI for Python: A tutorial on MPI for Python from the official documentation.