Matlab Jobs under MOAB / Torque


Matlab jobs that run under the batch system cannot use graphics: there is no desktop GUI, no plot windows, etc.

This means that you should first make sure that your Matlab code runs from the command line. Assuming that the Matlab script is called my_matlab.m, you should be able to run it by typing in a terminal:

matlab -nodisplay -r my_matlab

NOTE:

  • There is no .m extension on the command line!
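
As an illustration of the note above, the following sketch derives the name to pass to -r from a script file name by stripping the .m extension. The helper name matlab_cmd is hypothetical and not part of Matlab or the batch system; it only prints the command line and does not start Matlab.

```shell
#!/bin/sh
# Hypothetical helper (not part of Matlab or Torque): print the command
# line for a given script file, dropping the .m extension that must not
# appear after -r.
matlab_cmd() {
    # basename removes any leading directory and the trailing ".m" suffix
    script_name=$(basename "$1" .m)
    printf 'matlab -nodisplay -r %s\n' "$script_name"
}

# Example: print the command for my_matlab.m
matlab_cmd my_matlab.m
```

Because the directory part is dropped as well, the script still has to be in the current working directory or on the Matlab path.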

Once this works, you can run Matlab on the cluster, either in serial or in parallel.

Serial Run

In a serial run you have to reserve one single core. Assuming that your Matlab script is called my_serial_matlab.m, a basic job script could look like the following:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name --
#PBS -N MySerialMatlab
# -- choose queue --
#PBS -q hpc
# -- Notify me by email when execution begins (b) and ends (e) --
#PBS -m be
# -- email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time): hh:mm:ss --
#PBS -l walltime=00:10:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=1
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- commands you want to execute --
#	

matlab -nodisplay -r my_serial_matlab -logfile MySerialMatlabOut

The option #PBS -l nodes=1:ppn=1 specifies that you reserve one single core on one node.
If you omit the -logfile option (MySerialMatlabOut in the example), the output will be put into the standard output log file, which is named after the job name and the job id, e.g. for job id 12345: MySerialMatlab.o12345. Error messages will go to MySerialMatlab.e12345. Note: due to a bug in Matlab, standard output may be scrambled and may miss lines you would expect. To catch all information, the -logfile option is preferable.
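
One possible way to keep the log files of different runs apart is to append the job id to the -logfile name. The sketch below assumes the PBS_JOBID environment variable, which Torque sets inside a running job; the helper name logfile_name and the fallback value "interactive" are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: build a per-job log file name.
# PBS_JOBID is set by Torque inside a running job; outside a job
# (e.g. when testing on a login node) we fall back to "interactive".
logfile_name() {
    printf '%s.%s\n' "$1" "${PBS_JOBID:-interactive}"
}

# In a job script this could replace the fixed log file name, e.g.:
#   matlab -nodisplay -r my_serial_matlab -logfile "$(logfile_name MySerialMatlabOut)"
logfile_name MySerialMatlabOut
```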

For more information on batch jobs, see the page on MOAB/Torque jobs.

 

Parallel Run

Matlab can also run in parallel, both in a shared memory and in a distributed memory environment. If you want to run Matlab on a single node, as you would on your personal computer, follow the instructions for the shared memory script. This limits the number of workers that you can define in your Matlab session to the number of cores available on a single node. The general HPC cluster currently has 8-core and 20-core nodes. If you need more than 20 cores, or need more than one node for any other reason, follow the instructions for the distributed memory script. In this case, you have to use the MATLAB Distributed Computing Server (MDCS).

IMPORTANT:

DTU has only a limited number of MDCS licenses, and each core uses a separate license. So:

  • If you decide to run a parallel program, please benchmark your Matlab program to check whether there is really a substantial advantage in using more cores. Doubling the number of cores for a speedup of 10% is a waste of resources.
  • If you need 20 workers or fewer, DO NOT use the MDCS profile; use the default Matlab local profile instead. That way you do not tie up precious licenses that someone else may need.
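
To make the benchmark advice concrete, the following sketch computes the speedup and the parallel efficiency (speedup divided by the number of cores) from two measured run times. The function names and the timings are made up for illustration; measure your own program to obtain real numbers.

```shell
#!/bin/sh
# speedup T1 TN: ratio of the run time on 1 core to the run time on N cores
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# efficiency T1 TN N: speedup divided by the number of cores N
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f\n", t1 / (tn * n) }'
}

# Illustrative (made-up) timings in seconds: 100 s on 1 core, 50 s on 4 cores
echo "speedup:    $(speedup 100 50)"
echo "efficiency: $(efficiency 100 50 4)"
```

An efficiency well below 1 means that most of the additional cores, and with MDCS the additional licenses, are not doing useful work.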

 

 

Shared Memory Script

Assuming that your Matlab script is called my_shared_matlab.m, the job script could look like this:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name --
#PBS -N MySharedMatlab
# -- choose queue --
#PBS -q hpc
# -- Notify me by email when execution begins (b) and ends (e) --
#PBS -m be
# -- email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time): hh:mm:ss --
#PBS -l walltime=00:10:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=4
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- commands you want to execute --
#	

matlab -nodisplay -r my_shared_matlab -logfile MySharedMatlabOut

The option #PBS -l nodes=1:ppn=4 specifies that you reserve 4 cores on one node, so your Matlab job should not use more than 4 workers!

NOTE:

  • If you use more workers inside Matlab than the cores you have reserved, Matlab will run slowly! So use at most as many workers as the cores you asked for.
  • You can get the number of cores reserved from inside matlab, and assign them to a variable:
    nw=str2num(getenv('PBS_NUM_PPN'));
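
The same variable can also be read in the job script itself. The following sketch caps a requested number of workers at the number of reserved cores; it assumes that PBS_NUM_PPN is set as described above, and the helper name worker_count and the fallback to 1 core outside a job are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: never ask for more workers than reserved cores.
worker_count() {
    requested=$1
    # PBS_NUM_PPN holds the ppn value of the job; assume 1 core outside a job
    reserved=${PBS_NUM_PPN:-1}
    if [ "$requested" -gt "$reserved" ]; then
        echo "$reserved"
    else
        echo "$requested"
    fi
}

# The result could be handed to Matlab as a variable, e.g.:
#   matlab -nodisplay -r "nw=$(worker_count 8); my_shared_matlab" -logfile MySharedMatlabOut
worker_count 8
```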

Distributed Memory Script

Matlab by default does not know anything about the topology of the cluster, so it cannot run processes across multiple nodes. If you need more cores than are available on a single node, you have to use the MATLAB Distributed Computing Server (MDCS), for which DTU has some licenses. First load the corresponding profile in Matlab (the instructions are here), then prepare a job script like the following. Here we assume that your Matlab script is called my_distributed_matlab.m.

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- job name --
#PBS -N MyDistributedMatlab
# -- choose queue --
#PBS -q hpc
# -- Notify me by email when execution begins (b) and ends (e) --
#PBS -m be
# -- email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time): hh:mm:ss --
#PBS -l walltime=10:00:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=1
# -- end of PBS options --

# -- change to working directory
cd $PBS_O_WORKDIR

# -- commands you want to execute --
#	

matlab -nodisplay -r my_distributed_matlab -logfile MyDistributedMatlabOut

The option #PBS -l nodes=1:ppn=1 specifies that you reserve 1 core only. When you use the MDCS profile and open a parallel pool, Matlab automatically creates and submits a new job for you, asking for a number of cores equal to the number of workers. In your Matlab script, you have to open a pool using the MDCS profile. For example, you can put lines like the following at the beginning of your my_distributed_matlab.m file:

clust=parcluster('DTUcluster');    % load the MDCS cluster profile
clust.SubmitArguments = '-q hpc -l nodes=4:ppn=8,walltime=08:00:00';    % Options for the job scheduler
numW=32;    % Exactly the number of nodes times the number of cores per node requested (4 x 8)
parpool(clust, numW);

% here is the rest of your matlab script
%
%

In this way, you help the scheduler to distribute the work across the cluster in a smarter way. You can add other submit arguments in the same way; see the batch job page for more information.
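
Since the worker count has to match the resources given in SubmitArguments, it may help to derive both from the same numbers. The sketch below does this in the shell; the helper names total_workers and submit_args are hypothetical, and the values are the ones used in the Matlab example above.

```shell
#!/bin/sh
# total_workers NODES PPN: number of workers matching the request
total_workers() {
    echo $(( $1 * $2 ))
}

# submit_args QUEUE NODES PPN WALLTIME: option string in the format used
# for clust.SubmitArguments in the Matlab example
submit_args() {
    printf '%s %s -l nodes=%s:ppn=%s,walltime=%s\n' "-q" "$1" "$2" "$3" "$4"
}

# Values from the example: 4 nodes with 8 cores each, 8 hours on queue hpc
total_workers 4 8
submit_args hpc 4 8 08:00:00
```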

NOTE:

  • Even in this case, if you use more workers inside Matlab than the cores you reserve, Matlab will run slowly! So use at most as many workers as the cores you asked for.
  • If your original job script, the one that launches Matlab, is killed, the child job submitted by Matlab will be killed as well. So please select a walltime for your original job that is larger than the one specified inside Matlab. Remember that if the cluster is busy, it could take some time for the job submitted by Matlab to start.
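
A quick sanity check of the two walltimes can be scripted before submitting. The sketch below is only an illustration; the helper names to_seconds and walltime_ok are hypothetical, and the values are taken from the examples on this page.

```shell
#!/bin/sh
# to_seconds HH:MM:SS: convert a walltime string to seconds
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# walltime_ok OUTER INNER: succeed only if OUTER is strictly larger than INNER
walltime_ok() {
    [ "$(to_seconds "$1")" -gt "$(to_seconds "$2")" ]
}

# Values from the examples: 10:00:00 for the launching job,
# 08:00:00 for the job submitted by Matlab
if walltime_ok 10:00:00 08:00:00; then
    echo "walltimes are consistent"
else
    echo "the launching job must have the larger walltime" >&2
fi
```

The check does not account for the time the job submitted by Matlab spends waiting in the queue, so on a busy cluster a generous margin is still advisable.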