COMSOL jobs under MOAB/Torque


Running comsol from the command line

Once a comsol model is prepared (meshing included) and saved as a .mph file, you can run it from the command line, i.e. without the program trying to open the comsol Graphical User Interface. This requires the batch flag on the command line. For example

comsol batch -h

shows the available command line options.
To run a job non-interactively, the most basic complete command line is:

comsol batch -inputfile wrench.mph -outputfile wrench_out.mph

The flags -inputfile and -outputfile, followed by the respective file names, specify the model to run and the desired output file.

It is also possible to specify the name of the file where the Comsol log will be saved, with the flag -batchlog followed by a filename.
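
For example, extending the command above with a log file (the file name wrench.log is just an example) gives:

comsol batch -inputfile wrench.mph -outputfile wrench_out.mph -batchlog wrench.log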

Note 1: by default, comsol tries to use all the processors and cores on the machine. This is almost never what one wants on a cluster node shared by several users, so it is always better to specify the number of cores/processors the program should use, as shown in the example below.
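
For example, to restrict comsol to 4 cores (the -np flag is used in the same way in the batch scripts below):

comsol -np 4 batch -inputfile wrench.mph -outputfile wrench_out.mph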

Note 2: Comsol writes some temporary files, which are removed after the run if it completes successfully. If it does not, these files are likely not removed at all and can fill up the disk. The default location for these files is the /tmp directory on Linux machines. On the cluster this is a local directory that is not directly accessible by the user, so it is recommended to specify a different location, using the flag -tmpdir followed by the name of a directory, e.g.

-tmpdir $HOME/comsoltmp

Comsol creates the directory if it does not exist. In this way, the user can check if the directory is empty after the run.

Comsol also writes some recovery files to a temporary directory. These files can be large and consume a lot of space in your home directory. You can specify the path to a different directory explicitly with the option -recoverydir:

-recoverydir $HOME/comsolrecovery

Note 3: if you already have a scratch directory, use it for both your tmpdir and recoverydir.
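
For example, assuming your scratch space is located at a path like /scratch/$USER (replace it with the actual path of your scratch directory), the two flags can be combined as:

-tmpdir /scratch/$USER/comsoltmp -recoverydir /scratch/$USER/comsolrecovery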

Note 4: the command comsol points to the latest version installed on the cluster. If you need an older version, look for the corresponding command. For example, to use comsol Version 4.4, the command is comsol44.
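
The version-specific command is used in exactly the same way as the generic one, e.g. for the wrench example:

comsol44 batch -inputfile wrench.mph -outputfile wrench_out.mph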

A batch job file for the cluster

To run a comsol job on the cluster non-interactively, you need to write a batch script that calls the program as shown in the previous section. In the following examples we will use a tutorial case, the simulation of a wrench, and comsol 4.4. We discuss here only the batch job options relevant for comsol; for all the others, have a look at this page.

Running a serial comsol job

To run a serial job (single processor), a basic script is:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name ---
#PBS -N Com_wrench_serial
# -- email me at the beginning (b) and end (e) of the execution --
#PBS -m be
# -- My email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time) --
#PBS -l walltime=4:00:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=1
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- program invocation here --
comsol -np 1 batch -inputfile wrench.mph -outputfile wrench_out.mph -tmpdir $HOME/comsoltmp

Notice that we request a single core

#PBS -l nodes=1:ppn=1

and we explicitly specify that we need a single core in the comsol command line

comsol -np 1 batch ...

Without this option, comsol would try to use more than one core anyway; the scheduler restricts the number of cores available to the job (using Linux cpusets), but comsol is not aware of that.
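
Assuming the script above is saved in a file called, for example, comsol_serial.sh (the file name is arbitrary), the job is submitted to the queue with qsub:

qsub comsol_serial.sh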

Running a shared-memory comsol job

Comsol by default uses shared-memory parallelism. The correct way to specify it is as follows:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name ---
#PBS -N Com_wrench_par_shared
# -- email me at the end of execution (e) and on abort (a) --
#PBS -m ea
# -- My email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time) --
#PBS -l walltime=4:00:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=4
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- program invocation here --
comsol -np $PBS_NUM_PPN batch -inputfile wrench.mph -outputfile wrench_out.mph -tmpdir $HOME/comsoltmp

The script asks for 4 processors on a single node

#PBS -l nodes=1:ppn=4

and then uses the $PBS_NUM_PPN environment variable, which is automatically set to the number of cores per node (ppn) reserved by the scheduler. In this way there is no risk of a mismatch between the resources requested and the ones comsol tries to use.
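
For example, to run the same job on 8 cores of a single node (assuming the nodes have at least 8 cores), only the resource request has to change, while the comsol command line stays the same thanks to the environment variable:

#PBS -l nodes=1:ppn=8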

Note 5: the comsol flag -np specifies only the number of processes/cores that run on a single node. If you need to run comsol on more than one node, have a look at the following examples. If in doubt, ask us by writing to support@hpc.dtu.dk.


Running an MPI comsol job

Comsol can also make use of distributed-memory parallelism, using the Intel MPI library. The script therefore has to load the necessary libraries, apply a special “fix” needed to run comsol across multiple nodes on the DTU HPC cluster, and specify the number of nodes correctly.

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name ---
#PBS -N Com_wrench_par_mpi
# -- email me at the end of execution (e) and on abort (a) --
#PBS -m ea
# -- My email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time) --
#PBS -l walltime=4:00:00
# -- parallel environment requests --
#PBS -l nodes=2:ppn=2
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- load module(s) --
module load intelmpifix
source IMPI_PBS_support

# -- program invocation here --
comsol -nn $PBS_NP batch -inputfile wrench.mph -outputfile wrench_out.mph -tmpdir $HOME/comsoltmp

The script reserves 4 cores (2 on each of 2 nodes)

#PBS -l nodes=2:ppn=2

loads the necessary module and sources a support file

module load intelmpifix
source IMPI_PBS_support

and then comsol is invoked with the correct option (-nn). In this case too, to enforce consistency, the number of processes is specified using an environment variable ($PBS_NP), which is set to the total number of cores requested.

Note 6: you can explicitly ask the scheduler to assign the job to the exact number of nodes (2, in this case) by adding the line

#PBS -W x=nmatchpolicy=exactnode

(see our FAQ page).
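
In the MPI script above, this directive is simply added next to the other resource requests, e.g.:

#PBS -l nodes=2:ppn=2
#PBS -W x=nmatchpolicy=exactnode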

Running a hybrid MPI-shared-memory comsol job

In this case you have to combine the two options (-nn and -np).
A simple script is:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name ---
#PBS -N Com_wrench_par_mpi_hyb_2x2
# -- email me at the end of execution (e) and on abort (a) --
#PBS -m ea
# -- My email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time) --
#PBS -l walltime=4:00:00
# -- parallel environment requests --
#PBS -l nodes=2:ppn=2
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR
# load modules
module load intelmpifix
source IMPI_PBS_support

# -- program invocation here --
comsol -nn $PBS_NUM_NODES -np $PBS_NUM_PPN batch -inputfile wrench.mph -outputfile wrench_out.mph \
-tmpdir $HOME/comsoltmp

In this case, we use both the MPI (-nn) and the shared-memory (-np) processor flags, and set them to the values of two environment variables:
$PBS_NUM_NODES, which is set to the number of nodes requested, and
$PBS_NUM_PPN, which is set to the number of processors per node requested. This will start one MPI process on each node, and each of them will use two cores via shared-memory parallelism.
More complex parallelization patterns can of course be specified, along these lines.
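
For example, a run on 2 nodes using 8 cores on each node (assuming nodes with at least 8 cores are available) only requires changing the resource request, since the comsol command line picks up the new values from the environment variables:

#PBS -l nodes=2:ppn=8

With this request, comsol is started with -nn 2 -np 8, i.e. one MPI process per node, each using 8 cores via shared-memory parallelism.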

General Comment: The choice between a serial, shared-memory, MPI or hybrid run depends on the specific characteristics of the model, and so does the performance that can be expected.