MATLAB Jobs under LSF


To run MATLAB jobs under the batch system, keep in mind that you cannot use graphics, i.e. the desktop GUI, plots in a GUI window, etc.

This means that you should first make sure that your MATLAB code runs from the command-line prompt. Assuming that the MATLAB script is called my_matlab.m, you should be able to run it successfully by typing in a terminal

matlab -batch my_matlab

NOTE:

  • Do not add the .m extension on the command line!
  • The old syntax ‘matlab -nodisplay -r my_matlab’ is deprecated: use the modern and more robust ‘-batch …’ instead!
  • Do not use the syntax ‘matlab -nodisplay … < my_matlab.m’: this prevents runtime optimizations from being applied!
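Before writing the job script, you may want to wrap the command-line test in a small shell function. The sketch below is hypothetical: the name run_matlab_batch and the MATLAB_CMD override variable are our own, not part of MATLAB or LSF. It drops an accidental .m extension, runs the script with -batch, and reports the exit status; with -batch, MATLAB exits with a non-zero status when the script throws an error, so the status is a quick success check.

```shell
#!/bin/sh
# Hypothetical helper for testing a MATLAB script from a terminal.
# MATLAB_CMD is an assumed override (default: "matlab"); set it to
# "echo matlab" to print the command instead of running it.
run_matlab_batch() {
  # -batch expects the script name without the .m extension, so strip it
  script=${1%.m}
  # MATLAB_CMD is left unquoted on purpose so that "echo matlab" splits into two words
  ${MATLAB_CMD:-matlab} -batch "$script"
  status=$?
  if [ "$status" -ne 0 ]; then
    echo "MATLAB script '$script' failed with exit status $status" >&2
  fi
  return "$status"
}

# Usage:
# run_matlab_batch my_matlab.m    # runs: matlab -batch my_matlab
```

For a dry run, MATLAB_CMD='echo matlab' run_matlab_batch my_matlab.m only prints the command line that would be executed.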

Once this works, you can run MATLAB on the cluster, either in serial or in parallel.

NOTE:

  • There are several MATLAB versions installed on the cluster, and the most recent ones are available as modules.
    The default version is usually not the latest one. If you want to load a different version, first check which versions are available with the command

    module avail matlab

    Then load the version you want, for example

    module load matlab/R2021a

    After that, the matlab command will point to this version.
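In a job script it can be useful to stop right away if the matlab command is not on the PATH, for instance because the module load line was forgotten or the requested version does not exist. A minimal sketch, assuming a POSIX shell; require_cmd is a hypothetical helper name, not a cluster tool.

```shell
#!/bin/sh
# Hypothetical helper: fail early if a command is not available on the PATH.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "error: '$1' not found on PATH; did you run 'module load matlab/<version>'?" >&2
    return 1
  fi
}

# Example use in a job script (module name taken from the example above):
# module load matlab/R2021a
# require_cmd matlab || exit 1
```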

Serial Run

For a serial run you have to reserve a single core. Assuming that your MATLAB script is called my_serial_matlab.m, a basic job script could look like the following:

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- job name --
#BSUB -J MySerialMatlab
# -- choose queue --
#BSUB -q hpc
# -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- Notify me by email when execution begins --
#BSUB -B
# -- Notify me by email when execution ends   --
#BSUB -N
# -- email address -- 
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- Output File --
#BSUB -o Output_%J.txt
# -- Error File --
#BSUB -e Error_%J.txt
# -- estimated wall clock time (execution time): hh:mm -- 
#BSUB -W 02:10 
# -- Number of cores requested -- 
#BSUB -n 1 
# -- end of LSF options -- 

# -- commands you want to execute -- 
# 
# If you want a specific matlab module remember to load it
# Example:
# module load matlab/R2021a
matlab -batch my_serial_matlab > MySerialMatlabOut

The option #BSUB -n 1 specifies that you reserve a single core on one node.
If you omit the redirection to the output file (MySerialMatlabOut), the output will be written to the standard output log file, Output_jobid.txt, since the string %J is expanded at runtime to the job id; e.g. for job id 12345 the file is Output_12345.txt. Error messages will go to Error_12345.txt. Note: with some older versions of MATLAB, standard output may be scrambled and some expected lines may be missing. To capture all output, use the ‘-logfile’ option instead.
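As a sketch of the ‘-logfile’ alternative, the lines below name the log file after the LSF job id. They assume the LSB_JOBID environment variable, which LSF sets inside a running job; the function name matlab_logfile, the file name pattern and the fallback value "interactive" for runs outside LSF are our own choices.

```shell
#!/bin/sh
# Hypothetical helper: build a per-job log file name.
# LSB_JOBID is set by LSF inside a job; outside LSF fall back to "interactive".
matlab_logfile() {
  printf 'MySerialMatlab_%s.log\n' "${LSB_JOBID:-interactive}"
}

# In the job script, use -logfile instead of redirecting standard output:
# matlab -logfile "$(matlab_logfile)" -batch my_serial_matlab
```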

For more information on batch jobs, see the page on LSF jobs.

Parallel Run

MATLAB can also run in parallel, both in shared-memory and in distributed-memory environments. If you want to run MATLAB on a single node, as on your personal computer, follow the instructions for the shared-memory script. In this case the number of workers that you can define in your MATLAB session is limited to the number of cores available on a single node. The general HPC cluster currently has 8-core and 20-core nodes.
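To decide between the shared-memory and the distributed-memory setup, compare the number of workers you need with the size of the largest node. A hypothetical check, assuming the node sizes quoted above (at most 20 cores per node); the constant MAX_CORES_PER_NODE is our own and must be updated if the hardware changes.

```shell
#!/bin/sh
# Largest node type mentioned above; adjust if the cluster hardware changes.
MAX_CORES_PER_NODE=20

# Succeeds if a request of $1 cores fits on a single node.
fits_on_one_node() {
  [ "$1" -ge 1 ] && [ "$1" -le "$MAX_CORES_PER_NODE" ]
}

if fits_on_one_node 32; then
  echo "32 workers: shared-memory job possible"
else
  echo "32 workers: more cores than one node has, use MATLAB Parallel Server"
fi
```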

Shared Memory Script

Assuming that your MATLAB script is called my_shared_matlab.m, your job script could look like this:

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- our name ---
#BSUB -J MySharedMatlab
# -- choose queue --
#BSUB -q hpc
# -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- Notify me by email when execution begins --
#BSUB -B
# -- Notify me by email when execution ends   --
#BSUB -N
# -- email address -- 
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- Output File --
#BSUB -o Output_%J.txt
# -- Error File --
#BSUB -e Error_%J.txt
# -- estimated wall clock time (execution time): hh:mm -- 
#BSUB -W 04:00 
# -- Number of cores requested -- 
#BSUB -n 4
# -- Specify the distribution of the cores: on a single node --
#BSUB -R "span[hosts=1]"
# -- end of LSF options -- 

# -- commands you want to execute -- 
# 
# If you want a specific matlab module remember to load it
# Example:
# module load matlab/R2021a
matlab -batch my_shared_matlab > MySharedMatlabOut

The option #BSUB -n 4 specifies that you reserve 4 cores, and the line #BSUB -R "span[hosts=1]" specifies that these cores need to be on the same node. This means that you are reserving 4 cores for your MATLAB job, and you should not use more than 4 workers!
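Inside the job you can check that LSF really placed all reserved cores on one node. This sketch assumes the LSB_HOSTS variable, which LSF sets to a space-separated list with one host name per allocated slot; count_hosts is a hypothetical helper name.

```shell
#!/bin/sh
# Count the distinct host names in a space-separated list such as "$LSB_HOSTS".
# $1 is deliberately unquoted in the loop so that it splits on spaces.
count_hosts() {
  for h in $1; do
    echo "$h"
  done | sort -u | wc -l | tr -d ' '
}

# In the job script, before starting MATLAB:
# echo "Cores: $LSB_DJOB_NUMPROC on $(count_hosts "$LSB_HOSTS") host(s)"
```

With #BSUB -R "span[hosts=1]" the reported number of hosts should be 1.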

NOTE:

  • If you use more workers inside MATLAB than the cores you have reserved, MATLAB will not run faster, only more slowly! So use at most as many workers as the cores you asked for.
  • You can get the number of reserved cores from inside MATLAB and assign it to a variable:
    nw=str2double(getenv('LSB_DJOB_NUMPROC'));
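The same value can also be read in the job script and handed to MATLAB. A minimal sketch, assuming a POSIX shell; the helper num_workers, the fallback of 1 worker for runs outside LSF, and the MATLAB variable name nw are our own choices. Because a MATLAB script runs in the base workspace, a variable set in the -batch statement is visible inside my_shared_matlab.m.

```shell
#!/bin/sh
# Number of reserved cores: LSF sets LSB_DJOB_NUMPROC inside the job.
# Fall back to 1 when the variable is unset (e.g. a test run in a terminal).
num_workers() {
  printf '%s\n' "${LSB_DJOB_NUMPROC:-1}"
}

NW=$(num_workers)
echo "Starting MATLAB with at most $NW workers"
# matlab -batch "nw=$NW; my_shared_matlab"
```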

Distributed Memory Script

By default, MATLAB does not know anything about the topology of the cluster, so it cannot run processes across multiple nodes. If you need more cores than are available on a single node, you have to use MATLAB Parallel Server (formerly MDCS), for which DTU has some licenses. First load the corresponding cluster profile in MATLAB (the instructions are on the MATLAB Parallel Server configuration page), and then prepare a job script like the following. Here we assume that your MATLAB script is called my_distributed_matlab.m.

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- our name ---
#BSUB -J MyDistributedMatlab
# -- choose queue --
#BSUB -q hpc
# -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- Notify me by email when execution begins --
#BSUB -B
# -- Notify me by email when execution ends   --
#BSUB -N
# -- email address -- 
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- Output File --
#BSUB -o Output_%J.txt
# -- Error File --
#BSUB -e Error_%J.txt
# -- estimated wall clock time (execution time): hh:mm -- 
#BSUB -W 10:00 
# -- Number of cores requested -- 
#BSUB -n 1
# -- end of LSF options --
 
# -- commands you want to execute -- 
# 
# If you want a specific matlab module remember to load it
# Example:
# module load matlab/R2021a
matlab -batch my_distributed_matlab > MyDistributedMatlabOut

The option #BSUB -n 1 specifies that you reserve 1 core only. When you use the cluster profile and open a parallel pool, MATLAB will automatically create and submit a new job for you, requesting a number of cores equal to the number of workers. In your script, you therefore have to open a pool using the cluster profile. For example, the beginning of your my_distributed_matlab.m file could contain lines like the following.

clust=parcluster(dccClusterProfile());    % load the default cluster profile
numw=32;    % number of workers: exactly the number of nodes times the number of cores per node requested
parpool(clust, numw);    % open the pool on the cluster with numw workers

% here is the rest of your matlab script
%
%

MATLAB will then submit a job to the scheduler using the settings that you last saved in your default cluster profile for that specific MATLAB version (for more details, see the MATLAB Parallel Server configuration page).
You can then change some of the submission parameters to better match your needs, as explained on the same page. For more information, see the LSF batch job page.

NOTE:

  • Even in this case, if you use more workers inside MATLAB than the cores you reserve, MATLAB will not run faster, only more slowly! So use at most as many workers as the cores you asked for.
  • If your original job script, the one that launches MATLAB, is killed, the child job submitted automatically by MATLAB will be killed as well. So please choose a walltime for your original job that is larger than the one specified inside MATLAB. Remember that if the cluster is busy, it can take some time before the job submitted via MATLAB Parallel Server starts.
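To follow the walltime recommendation above, you can compare the two limits before submitting. A minimal sketch, assuming both limits are written as hh:mm (or plain minutes), as accepted by #BSUB -W; to_minutes and outer_walltime_ok are hypothetical helpers, and the MATLAB-side limit of 08:00 is only an example value. Leave some extra margin on top of this check to cover the time the child job may wait in the queue.

```shell
#!/bin/sh
# Convert an LSF-style wall-clock limit ("hh:mm" or plain minutes) to minutes.
to_minutes() {
  case $1 in
    *:*) h=${1%%:*}; m=${1#*:} ;;
    *)   h=0;        m=$1 ;;
  esac
  # strip one leading zero so that "08" is not read as an invalid octal number
  h=${h#0}; m=${m#0}
  echo $(( ${h:-0} * 60 + ${m:-0} ))
}

# Succeeds only if the outer job's limit ($1) is larger than MATLAB's ($2).
outer_walltime_ok() {
  [ "$(to_minutes "$1")" -gt "$(to_minutes "$2")" ]
}

# Example: outer job "#BSUB -W 10:00", example MATLAB-side limit 08:00
if outer_walltime_ok 10:00 08:00; then
  echo "ok: the outer job outlives the MATLAB Parallel Server job"
else
  echo "warning: increase #BSUB -W in the outer job script" >&2
fi
```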