Hybrid MPI/OpenMP jobs under MOAB/Torque


MPI and OpenMP can be combined to write programs that exploit both the shared memory of multi-core SMP machines and Message Passing Interface based communication between the nodes of the cluster.

Special care must be taken when writing the job script for this kind of job.

Typically, one would want to run a single MPI process on each node, with each process spawning a certain number of OpenMP threads on that node.

An example job script is the following:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name --
#PBS -N Hybrid_MPI_openMP
# -- email me at the beginning (b) and end (e) of the execution --
#PBS -m be
# -- My email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time): hh:mm:ss --
#PBS -l walltime=04:00:00
# -- parallel environment requests --
#PBS -l nodes=2:ppn=3
# -- specify that you want DIFFERENT NODES --
#PBS -W x=nmatchpolicy=exactnode
# -- end of PBS options --

# -- change to working directory --
cd $PBS_O_WORKDIR
# -- load compiler (if needed) and the OpenMPI module --
#module load gcc 
module load mpi/gcc

# -- OpenMP environment variables --
OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

# -- program invocation here -- 
mpirun -npernode 1 your_hybrid_program

This script asks for a total of 6 cores, three on each of two nodes. On each of these nodes it will run 1 MPI process, which will in turn use 3 OpenMP threads. We want the processes to run on two different nodes, and not all on a single node (this may not be important in general, but in some cases it can be crucial), and we want only 1 MPI process to run on each node.
This is achieved as follows.

  • First, reserve the cores:

#PBS -l nodes=2:ppn=3

  • Force the resource manager to use DIFFERENT NODES:

#PBS -W x=nmatchpolicy=exactnode

  • Set the number of threads to be used in each of the OpenMP blocks:

OMP_NUM_THREADS=$PBS_NUM_PPN
export OMP_NUM_THREADS

Note: the number of threads can also be set through the OpenMP runtime library routines (e.g. omp_set_num_threads), and it can differ between the MPI processes. In that case, be sure to have enough cores reserved on each node.

  • Load the OpenMPI module:

module load mpi/gcc

  • Run the program through the MPI wrapper, specifying that only 1 MPI process per node should be started:

mpirun -npernode 1 your_hybrid_program
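If you are unsure which settings the OpenMP runtime actually picks up, one possible check (assuming a runtime implementing OpenMP 4.0 or later, such as recent GCC) is to have each process print its effective OpenMP environment at startup:

```shell
# Optional: ask the OpenMP runtime (4.0+) to print its effective settings
# (including OMP_NUM_THREADS) when the program starts.
# Accepted values: true, false, verbose.
OMP_DISPLAY_ENV=true
export OMP_DISPLAY_ENV
```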


Notes:

  1. This has been tested with OpenMPI; it may not work with other MPI implementations.
  2. The number of OpenMP threads is set using the environment variable $PBS_NUM_PPN, which is automatically set to the value of ppn. This may not be ideal if, for example, you want to run more than one MPI process per node.
  3. It is possible to run more than one MPI process per node, but you have to modify the script accordingly.
  4. If you need to set other OpenMP environment variables, follow the same syntax as for OMP_NUM_THREADS.
  5. You can of course add other MPI options.