OpenMP (Open Multi-Processing)¶
Introduction¶
OpenMP is an Application Programming Interface (API) that supports multi-platform shared memory multiprocessing programming in C, C++, and Fortran. It is a portable, scalable model that gives programmers a simple and flexible interface for developing parallel applications for platforms ranging from the standard desktop computer to the supercomputer.
OpenMP uses a fork-join model: the bulk of the code runs on a single master thread. When the master thread reaches a parallel region, it forks a team of additional threads and the work is divided among them. Once the region completes, the worker threads join the master thread, which continues alone until the next parallel region.
Preparing your Environment¶
Before you can compile and run an OpenMP program, you need to load the appropriate environment module. Depending on whether you want to use GNU or Intel compilers, you may need to load different modules:
For GNU compilers:
module load GCC
For Intel compilers (for Intel OneAPI):
module load intel-compilers
For Intel compilers (classic compilers):
module load iccifort
Use module list to check which modules are currently loaded, and module avail to see all available modules. More information on environment modules can be found here.
Compiling OpenMP Programs¶
Compiling an OpenMP program is similar to compiling any other program, except that you must add a flag to enable OpenMP directives. The flag differs by compiler. Here are examples for each language, first with the GNU compilers and then with the classic Intel compilers:
# For C
gcc -fopenmp -o myprog myprog.c
# For C++
g++ -fopenmp -o myprog myprog.cxx
# For Fortran
gfortran -fopenmp -o myprog myprog.f90
# For C
icc -qopenmp -o myprog myprog.c
# For C++
icpc -qopenmp -o myprog myprog.cxx
# For Fortran
ifort -qopenmp -o myprog myprog.f90
These commands compile the source file (myprog.c, myprog.cxx, or myprog.f90) and create an executable named myprog. The -fopenmp (GNU) or -qopenmp (Intel) option enables the compiler to process the OpenMP directives; without it, the directives are simply ignored and the program runs sequentially.
Example OpenMP Program¶
Here is a simple OpenMP program that creates a parallel region and prints out the thread number of each process and the total number of threads:
#include <omp.h> // Include the OpenMP library
#include <stdio.h> // Include the standard I/O library
int main() {
// Begin the parallel region with a pragma directive
#pragma omp parallel
{
// Get the thread number
int tid = omp_get_thread_num();
// Print a message from this thread
printf("Hello from thread %d\n", tid);
// Only master thread does this
if (tid == 0) {
int num_threads = omp_get_num_threads();
printf("Number of threads = %d\n", num_threads);
}
} // All threads join master thread and terminate
return 0;
}
#include <omp.h> // Include the OpenMP library
#include <iostream> // Include the standard I/O library
int main() {
// Begin the parallel region with pragma directive
#pragma omp parallel
{
// Get the thread number
int tid = omp_get_thread_num();
// Print a message from this thread
std::cout << "Hello from thread " << tid << std::endl;
// Only master thread does this
if (tid == 0) {
int num_threads = omp_get_num_threads();
std::cout << "Number of threads = " << num_threads << std::endl;
}
} // All threads join master thread and terminate
return 0;
}
program hello
use omp_lib
implicit none
integer :: tid, num_threads
! Begin the parallel region with a directive
!$OMP PARALLEL PRIVATE(tid)
! Get the thread number
tid = omp_get_thread_num()
! Print a message from this thread
print*, 'Hello from thread ', tid
! Only master thread does this
if (tid == 0) then
num_threads = omp_get_num_threads()
print*, 'Number of threads = ', num_threads
end if
!$OMP END PARALLEL
end program hello
Running OpenMP Programs¶
Running an OpenMP program is straightforward. Unlike MPI programs, it does not require a special launcher such as mpirun; you can run it like any other program:
./myprog
However, you can control the number of threads used in parallel regions by setting the OMP_NUM_THREADS environment variable before running the program:
export OMP_NUM_THREADS=4
./myprog
This will run the program with 4 threads. The number can be adjusted to suit the specific requirements of the program and the capabilities of the machine.
Debugging OpenMP Programs¶
Debugging parallel programs can be challenging, but there are tools and techniques to help. For instance, many debugging tools that work with regular programs can also be used with OpenMP programs. The GNU Debugger (GDB) is one such tool. It can be used to debug issues related to threading and synchronization.
You can also use print statements to trace the execution of the program, though this can be difficult to interpret in a parallel program because output from different threads can be intermingled.
Performance Considerations¶
When developing with OpenMP, it's important to consider the performance implications of your decisions. Here are a few things to keep in mind:
- Overhead: The creation and synchronization of threads have an associated cost. If the work being done inside a parallel region is too small, this overhead can dominate the actual computation time.
- Load Balance: Ideally, all threads should finish their work at the same time to make full use of the system's resources. If one thread has much more work than the others, it will keep the CPU busy while the other threads sit idle.
- False Sharing: Even though OpenMP manages the memory model, it's essential to be aware of false sharing, a situation where multiple threads on different cores modify variables that reside on the same cache line. This can lead to significant slowdowns.
- Nested Parallelism: OpenMP supports nested parallel regions, where a thread in a parallel region can, in turn, create its own parallel region. However, this can sometimes lead to performance degradation if not managed correctly.
Parallel programming is not a magic bullet that automatically makes programs faster. It requires careful design and consideration of the underlying hardware, the problem you're solving, and the characteristics of your code. Often, it might be beneficial to start with a sequential program, identify the bottlenecks using profiling tools, and then apply parallel programming techniques to those parts of the code that will benefit the most.
Integration with SLURM¶
If you're running your OpenMP program in a cluster managed by SLURM, you need to consider how SLURM manages resources. By default, SLURM assigns only one CPU to each job. If you're using OpenMP, your job can likely benefit from more CPUs.
Here's an example of a SLURM script that requests multiple CPUs for an OpenMP job:
#!/bin/bash
#SBATCH --job-name=openmp_job # Job name
#SBATCH --output=openmp_job.%j.out # Name of stdout output file (%j expands to jobId)
#SBATCH --error=openmp_job.%j.err # Name of stderr output file (%j expands to jobId)
#SBATCH --cpus-per-task=4 # Number of CPUs per task
#SBATCH --nodes=1
#SBATCH --mem=10gb # Total memory
#SBATCH --time=01:00:00 # Time limit hrs:min:sec
# Load the appropriate module
module load GCC
# Set the number of threads to the number of CPUs per task
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
# Run the OpenMP program
./myprog
In this script, #SBATCH --cpus-per-task=4 requests four CPUs for the job. The line export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK} sets the number of OpenMP threads to match the number of CPUs allocated to the job. This is a common best practice that ensures you make full use of the resources allocated to your job.
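On multi-core nodes it can also help to pin threads to cores. The OMP_PROC_BIND and OMP_PLACES variables below are standard OpenMP 4.0 controls, though the best settings depend on the node layout and should be benchmarked for your application:

```shell
# Pin each OpenMP thread to its own core, close to the master thread
export OMP_PROC_BIND=close   # threads stay on their assigned places
export OMP_PLACES=cores      # one place per physical core
```

These exports would go in the SLURM script alongside the OMP_NUM_THREADS line, before the program is launched.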
Further Reading¶
To learn more about OpenMP, check out the official OpenMP website, which contains the full specification, resources, and a forum for asking questions. There are also many good tutorials and textbooks available.
- OpenMP Official Website: Contains the latest information, resources, and specifications about OpenMP.
- OpenMP API for C and C++: The official API specifications for using OpenMP with C and C++.
- Intel Developer Guide - OpenMP: Detailed documentation from Intel on OpenMP, includes tutorials and best practices.
- SLURM Workload Manager: Detailed documentation for the SLURM Workload Manager. This is essential if you're running your parallel programs on a cluster.
- GCC OpenMP Documentation: The GCC documentation for OpenMP. It provides useful details on the directives and functions provided by OpenMP.