Molpro jobs under LSF


Molpro is a quantum chemistry software package designed by H.-J. Werner and P. J. Knowles, and it is regularly maintained. Both the serial and the parallel versions are installed on the cluster.

NOTE:

  • Molpro is subject to license restrictions; to use it, you must be authorized by the license owner.
  • Molpro creates a considerable number of temporary files. If you do not have a scratch directory already, please write to support@hpc.dtu.dk and ask for one.

Serial Run

Molpro is run from the command line, and therefore it is quite easy to write a job script for it. An example of a serial script is the following:

### General options
### -- set the job Name --
#BSUB -J Molpro_Serial
### -- ask for number of cores (default: 1) --
#BSUB -n 1
### -- set walltime limit: hh:mm --
#BSUB -W 16:00 
### -- specify that we need 16GB of memory per core/slot -- 
#BSUB -R "rusage[mem=16GB]"
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion--
#BSUB -N
### -- Specify the output and error file. %J is the job-id -- 
### -- -o and -e mean append, -oo and -eo mean overwrite -- 
#BSUB -o Output_%J.out
#BSUB -e Error_%J.err 

# Load the molpro module
module load molpro/2012.1.30-openmpi-3.1.5-gcc-7.5.0-i8

# Set environment variables
export MOLPRO_KEEPTEMP=1

#If you have a scratch directory, use it
#Set TMPDIR to point to your directory modifying the following line
#and uncomment it
#export TMPDIR=/work1/$USER/Molpro_TEMP_$LSB_JOBID/
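
# Note: Molpro may not create this directory for you. If you uncomment
# the TMPDIR line above, to be safe also uncomment the following line,
# so that the directory exists before the run starts
#mkdir -p $TMPDIR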

# Tell Molpro to use TMPDIR as temporary directory. Uncomment if you have 
# correctly set TMPDIR
#export MOLPRO_OPTIONS="-d $TMPDIR"

# -- commands you want to execute --
molpro -m 2g my_molpro.inp
ret=$?

# -- remove the temporary files if the run was successful: Uncomment to use it
#if [ $ret -eq 0 ] ; then
# rm -rf $TMPDIR
#fi
exit $ret
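
Assuming the script above is saved as, for example, molpro_serial.sh (the file name is arbitrary), it is submitted to LSF in the usual way:

bsub < molpro_serial.sh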

You can find the explanation of the most common LSF options here.
Only a few lines are specific to Molpro.
module load molpro/2012.1.30-openmpi-3.1.5-gcc-7.5.0-i8
Molpro is installed as a module, so it is necessary to load the corresponding module first.
Then a couple of environment variables are set.
export MOLPRO_KEEPTEMP=1
Molpro creates temporary files. Some of them are by default completely invisible to the user, and to some extent also to the operating system. As a result, the user never knows exactly how much disk space the job needs, which prevents a correct estimate of the resources necessary to run the job.
Setting this environment variable makes these files visible. The only drawback is that these files are then never removed automatically. Therefore we added a command to remove them if the run was successful.

# -- remove the temporary files if the run was successful
#if [ $ret -eq 0 ] ; then
#      rm -rf $TMPDIR
#fi
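
With MOLPRO_KEEPTEMP set, you can also check how much scratch space a running job actually uses, which helps when estimating the resources for future runs. A minimal check, assuming $TMPDIR points to the scratch directory of the job (adapt the path to your own setup):

# Show the total size of the Molpro temporary directory
du -sh $TMPDIR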

By default, Molpro uses /tmp as its temporary directory. On the cluster, this directory is local to the node where the job runs, and there is not much space on it. Please ask for a directory on one of the scratch filesystems by writing to support@hpc.dtu.dk, and use that instead.

The actual molpro call is
molpro -m 2g my_molpro.inp
-m 2g means that Molpro will use 2 gigawords. In "Molpro language", a word is an 8-byte unit, so to get the right amount of memory that the scheduler has to reserve, this number must be multiplied by 8: -m 2g means 16 GB.
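
If you want to keep the Molpro option and the LSF request consistent, the conversion is easy to script. A minimal sketch in bash; the variable names are illustrative and not part of Molpro or LSF:

# Gigawords passed to "molpro -m"; one word is 8 bytes
MOLPRO_GW=2
# Memory in GB that LSF has to reserve per core/slot
MEM_GB=$((MOLPRO_GW * 8))   # 2 gigawords -> 16 GB
echo "#BSUB -R \"rusage[mem=${MEM_GB}GB]\""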

Parallel Run

Molpro can in principle run in parallel across different nodes. Parallelization is implemented (and controlled) in two different ways. Have a look at the Molpro manual, or start, for example, from the online documentation.
To run Molpro in parallel you need to specify the keyword mpp or mppx.
From the Molpro documentation:

  • mpp means that different copies of the program execute a single task. The work is split across the cores, to get a reduced run time.
  • mppx means that different copies of the program run identical independent tasks. This is only implemented for numerical gradients and Hessians.

It is up to the user to decide which kind of parallelism to select, depending on the specific task. A job script for a parallel run is shown here:

### General options
### -- set the job Name --
#BSUB -J Molpro_Parallel
### -- ask for number of cores (default: 1) --
#BSUB -n 4
### -- specify that the cores MUST BE on a single host! It's an SMP job! --
#BSUB -R "span[hosts=1]"
### -- set walltime limit: hh:mm --
#BSUB -W 16:00 
### -- specify that we need 16GB of memory per core/slot --
#BSUB -R "rusage[mem=16GB]"
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion--
#BSUB -N
### -- Specify the output and error file. %J is the job-id -- 
### -- -o and -e mean append, -oo and -eo mean overwrite -- 
#BSUB -o Output_%J.out
#BSUB -e Error_%J.err 
# Load the molpro module
module load molpro/2012.1.30-openmpi-3.1.5-gcc-7.5.0-i8 

# Set environment variables
export MOLPRO_KEEPTEMP=1

#If you have a scratch directory, use it
#Set TMPDIR to point to your directory modifying the following line
#and uncomment it
#export TMPDIR=/work1/$USER/Molpro_TEMP_$LSB_JOBID/

# Tell Molpro to use TMPDIR as temporary directory. Uncomment if you have
# correctly set TMPDIR
#export MOLPRO_OPTIONS="-d $TMPDIR"

# -- commands you want to execute --
molpro -m 2g --mpp my_molpro.inp
ret=$?

# -- remove the temporary files if the run was successful: Uncomment to use it
#if [ $ret -eq 0 ] ; then
# rm -rf $TMPDIR
#fi
exit $ret

The only differences with respect to the serial script are:

  • The selection of the number of cores.
    #BSUB -n 4
    #BSUB -R "span[hosts=1]"
    

    that reserves 4 cores on a single node for you. If you need to run on more than one node, for example because you need more memory than is available on a single node, the lines could be

    #BSUB -n 8
    #BSUB -R "span[ptile=4]"
    

    to have 8 cores distributed across 2 different nodes. Refer to the LSF batch job page for more details, and please have a look at the Important Remarks section at the bottom of this page.

  • The memory specification.
    #BSUB -R "rusage[mem=16GB]"
    that reserves 16GB per core for you. This is because, when running in parallel, the Molpro command-line option -m 2g
    refers to each individual process (explanation here). So in this example you are asking for 2 gigawords = 16 GB of memory for each of the cores that you request, for a total of 64 GB.
  • The Molpro call.
    molpro -m 2g --mpp my_molpro.inp
    the only difference is the presence of the --mpp option. You do not need to specify the number of processes on the molpro command line; the program gets this information directly from the scheduler.
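
Should you ever need to set the process count explicitly, the Molpro launcher also accepts a -n option; check the Molpro manual for your installed version, since the exact command-line options can differ between releases. A minimal sketch:

# Explicitly start 4 parallel processes instead of relying on the scheduler.
# Combined with -m 2g, this means 4 x 16 GB = 64 GB of memory in total.
molpro -n 4 -m 2g --mpp my_molpro.inp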

NOTE:

  • If you need to run the mppx version, just replace --mpp with --mppx.
  • If you do not specify anything, the default parallelism is mpp.
  • By default, when running in parallel, Molpro dedicates one process per node to communication only. This behaviour can be modified with suitable command-line options. Instructions can be found in the man pages, or for example here.

Important Remarks

    1. In general, the performance of a program is determined by the hardware (CPU architecture, frequency, memory), the I/O to disk, and, for parallel runs, by the cost of communication among the different processes. Molpro can be particularly I/O-heavy: the program creates a lot of temporary files and constantly writes to and reads from them. This is especially true in parallel runs, where all the processes read and write simultaneously to different files. This can easily saturate the bandwidth and/or put a lot of load on the storage filesystem, which then becomes very slow. The effect can be so serious that there is no advantage at all in using more than one core, because the cost of using 2 cores is larger than the gain in speed. For this reason, you are advised to estimate the performance of your runs before deciding to run in parallel. Try a test case on 1 core, 2 cores, and 4 cores, and see if you gain anything from parallelism (see the sketch after this list). You are welcome to write to us if you encounter any problem.
    2. By default Molpro puts the temporary files in /tmp. This directory is on the local disk of the machine where the job runs. There are two main issues with that:
      1. This directory is usually only a few hundred GB in size, and not much space may be available at runtime.
      2. According to our tests, having more than 4 Molpro processes writing simultaneously to the local disk can saturate the bandwidth, and the program becomes extremely slow.

      For this reason, we strongly advise you to use the scratch filesystem and to set the
      MOLPRO_KEEPTEMP environment variable.

    3. Setting MOLPRO_KEEPTEMP=1 makes most of the temporary files visible, but as a side effect, they are not removed automatically when the job ends or is interrupted. It is your responsibility to clean up the temporary directory after the run. Please uncomment the corresponding lines in the script, and if the job aborts, clean up the directory manually.
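
As mentioned in remark 1, a simple scaling test tells you whether parallelism pays off. A minimal sketch, assuming the parallel job script above is saved as molpro_parallel.sh (the file names are illustrative): each iteration rewrites the #BSUB -n line in a copy of the script and submits it.

# Submit the same Molpro input on 1, 2 and 4 cores and compare the
# wall times reported in the corresponding Output_<jobid>.out files
for NCORES in 1 2 4 ; do
    sed "s/^#BSUB -n .*/#BSUB -n ${NCORES}/" molpro_parallel.sh > job_${NCORES}.sh
    bsub < job_${NCORES}.sh
done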