OpenMP (Open Multi-Processing)¶
Introduction¶
OpenMP is an Application Programming Interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
OpenMP uses a fork-join model: the bulk of the code runs on a single master thread. When the master thread reaches a parallel region, it forks a team of additional threads and the work is divided among them. Once the region completes, the worker threads join the master thread, which continues alone until the next parallel region.
Preparing your Environment¶
Before you can compile and run an OpenMP program, you need to load the appropriate environment module. Depending on whether you want to use GNU or Intel compilers, you may need to load different modules:
For GNU compilers:
module load GCC
For Intel compilers (for Intel OneAPI):
module load intel-compilers
For Intel compilers (classic compilers):
module load iccifort
Use module list to check which modules are currently loaded, and module avail to see all available modules. More information on environment modules can be found here.
Compiling OpenMP Programs¶
Compiling an OpenMP program is similar to compiling any other program, except that you must add a flag to enable OpenMP directives. The flag differs by compiler. Here are examples for each language, first with the GNU compilers and then with the classic Intel compilers:
# For C
gcc -fopenmp -o myprog myprog.c
# For C++
g++ -fopenmp -o myprog myprog.cxx
# For Fortran
gfortran -fopenmp -o myprog myprog.f90
# For C
icc -qopenmp -o myprog myprog.c
# For C++
icpc -qopenmp -o myprog myprog.cxx
# For Fortran
ifort -qopenmp -o myprog myprog.f90
These commands compile the source file (myprog.c, myprog.cxx, or myprog.f90) and create an executable named myprog. The -fopenmp (GNU) or -qopenmp (Intel) option enables the compiler to process the OpenMP directives; without it, the directives are simply ignored and the program runs sequentially.
Example OpenMP Program¶
Here is a simple OpenMP program that creates a parallel region and prints out the thread number of each process and the total number of threads:
#include <omp.h> // Include the OpenMP library
#include <stdio.h> // Include the standard I/O library
int main() {
// Begin the parallel region with a pragma directive
#pragma omp parallel
{
// Get the thread number
int tid = omp_get_thread_num();
// Print a message from this thread
printf("Hello from thread %d\n", tid);
// Only master thread does this
if (tid == 0) {
int num_threads = omp_get_num_threads();
printf("Number of threads = %d\n", num_threads);
}
} // All threads join master thread and terminate
return 0;
}
#include <omp.h> // Include the OpenMP library
#include <iostream> // Include the standard I/O library
int main() {
// Begin the parallel region with pragma directive
#pragma omp parallel
{
// Get the thread number
int tid = omp_get_thread_num();
// Print a message from this thread
std::cout << "Hello from thread " << tid << std::endl;
// Only master thread does this
if (tid == 0) {
int num_threads = omp_get_num_threads();
std::cout << "Number of threads = " << num_threads << std::endl;
}
} // All threads join master thread and terminate
return 0;
}
program hello
use omp_lib
implicit none
integer :: tid, num_threads
! Begin the parallel region with a directive
!$OMP PARALLEL PRIVATE(tid)
! Get the thread number
tid = omp_get_thread_num()
! Print a message from this thread
print*, 'Hello from thread ', tid
! Only master thread does this
if (tid == 0) then
num_threads = omp_get_num_threads()
print*, 'Number of threads = ', num_threads
end if
!$OMP END PARALLEL
end program hello
Running OpenMP Programs¶
Running an OpenMP program is straightforward. Unlike MPI programs, it does not require a special launcher such as mpirun; you can run it like any other program:
./myprog
However, you can control the number of threads used in parallel regions by setting the OMP_NUM_THREADS environment variable before running the program:
export OMP_NUM_THREADS=4
./myprog
This will run the program with 4 threads. The number can be adjusted to suit the specific requirements of the program and the capabilities of the machine.
Debugging OpenMP Programs¶
Debugging parallel programs can be challenging, but there are tools and techniques to help. For instance, many debugging tools that work with regular programs can also be used with OpenMP programs. The GNU Debugger (GDB) is one such tool. It can be used to debug issues related to threading and synchronization.
You can also use print statements to trace the execution of the program, though this can be difficult to interpret in a parallel program because output from different threads can be intermingled.
Performance Considerations¶
When developing with OpenMP, it's important to consider the performance implications of your decisions. Here are a few things to keep in mind:
- Overhead: The creation and synchronization of threads have an associated cost. If the work being done inside a parallel region is too small, this overhead can dominate the actual computation time.
- Load Balance: Ideally, all threads should finish their work at the same time to make full use of the system's resources. If one thread has much more work than the others, it will keep the CPU busy while the other threads sit idle.
- False Sharing: Even though OpenMP manages the memory model, it's essential to be aware of false sharing, a situation where multiple threads on different cores modify variables that reside on the same cache line. This can lead to significant slowdowns.
- Nested Parallelism: OpenMP supports nested parallel regions, where a thread in a parallel region can, in turn, create its own parallel region. However, this can sometimes lead to performance degradation if not managed correctly.
Parallel programming is not a magic bullet that automatically makes programs faster. It requires careful design and consideration of the underlying hardware, the problem you're solving, and the characteristics of your code. Often, it might be beneficial to start with a sequential program, identify the bottlenecks using profiling tools, and then apply parallel programming techniques to those parts of the code that will benefit the most.
Integration with SLURM¶
If you're running your OpenMP program in a cluster managed by SLURM, you need to consider how SLURM manages resources. By default, SLURM assigns only one CPU to each job. If you're using OpenMP, your job can likely benefit from more CPUs.
Here's an example of a SLURM script that requests multiple CPUs for an OpenMP job:
#!/bin/bash
#SBATCH --job-name=openmp_job # Job name
#SBATCH --output=openmp_job.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH --error=openmp_job.%j.err # Name of stderr output file (%j expands to jobId)
#SBATCH --cpus-per-task=4 # Number of CPUs per task
#SBATCH --nodes=1
#SBATCH --mem=10gb # Total memory
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
# Load the appropriate module
module load GCC
# Set the number of threads to the number of CPUs per task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Run the OpenMP program
./myprog
In this script, #SBATCH --cpus-per-task=4 requests four CPUs for the job. The line export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} sets the number of OpenMP threads to match the number of CPUs allocated to the job. This is a common best practice that ensures you make full use of the resources allocated to your job.
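On multi-core nodes it can also help to pin threads to cores. The OMP_PROC_BIND and OMP_PLACES variables below are standard OpenMP 4.0 controls, though the best settings depend on the node layout and should be benchmarked for your application:

```shell
# Pin each OpenMP thread to its own core, close to the master thread
export OMP_PROC_BIND=close   # threads stay on their assigned places
export OMP_PLACES=cores      # one place per physical core
```

These exports would go in the SLURM script alongside the OMP_NUM_THREADS line, before the program is launched.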
Further Reading¶
To learn more about OpenMP, check out the official OpenMP website, which contains the full specification, resources, and a forum for asking questions. There are also many good tutorials and textbooks available.
- OpenMP Official Website: Contains the latest information, resources, and specifications about OpenMP.
- OpenMP API for C and C++: The official API specifications for using OpenMP with C and C++.
- Intel Developer Guide - OpenMP: Detailed documentation from Intel on OpenMP, includes tutorials and best practices.
- SLURM Workload Manager: Detailed documentation for the SLURM Workload Manager. This is essential if you're running your parallel programs on a cluster.
- GCC OpenMP Documentation: The GCC documentation for OpenMP. It provides useful details on the directives and functions provided by OpenMP.