Gaussian jobs under LSF


Basic instructions

Different versions of the Gaussian-family programs from Gaussian Inc. are installed on the HPC cluster, but due to license restrictions you have to become a member of the Gaussian group to be able to run them. In short:

  • ask to become a member of the *gaussian* group;
  • ask for the creation of your own directory in one of the scratch filesystems.

Both are done by mailing a request containing your username to Sonia Coriani: soco@kemi.dtu.dk

Once your request has been processed, you can run Gaussian on the HPC.

The latest version installed is Gaussian16. If you are going to use it, you will probably need to run it in parallel. Gaussian16 comes in two versions: one that runs on shared-memory nodes, and one for network-parallel execution. The latter relies on a proprietary parallel execution environment (Linda) that is not available on the HPC, so only the shared-memory version can be run. You can therefore ask for up to 20, 24 or 32 cores on a single HPC node.

Gaussian16 has its own syntax for specifying the use of parallel resources and memory, and you have to make sure that these specifications do not conflict with the resources requested through the scheduler, i.e. in the job script. Conflicts can cause problems both for your own job and for other HPC users.

Gaussian input files have the .com extension. A sample Gaussian input file (let’s call it MyGaussian.com for future reference) is shown here:

%NProcShared=4

# B3LYP/6-31++G(d,p)

water energy

0   1
O  -0.464   0.177   0.0
H  -0.464   1.137   0.0
H   0.441  -0.143   0.0

It is important to leave a blank line at the end of a Gaussian input file!

The correct Gaussian directive for specifying the number of cores is %NProcShared. With %NProcShared=4 you are requesting 4 cores.
A simple script for running this Gaussian job under LSF could be the following:

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- job name --
#BSUB -J MyGaussian
# -- choose queue --
#BSUB -q hpc
# -- Notify me by email when execution begins  --
#BSUB -B
# -- Notify me by email when execution ends    --
#BSUB -N
# -- email address -- 
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- estimated wall clock time (execution time): hh:mm -- 
#BSUB -W 00:10
### -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- parallel environment requests -- 
#BSUB -n 4 
### -- specify that the cores MUST BE on a single host! It's a SMP job! --
#BSUB -R "span[hosts=1]"
### -- Specify the output and error file. %J is the job-id -- 
### -- -o and -e mean append, -oo and -eo mean overwrite -- 
#BSUB -o Output_%J.out
#BSUB -e Error_%J.err 
# -- end of LSF options -- 

# -- setup of the gaussian 16 environment --
export g16root=/appl/hgauss/Gaussian_16_A03
source $g16root/g16/bsd/g16.profile
#export GAUSS_SCRDIR=path-to-scratch

# -- commands you want to execute -- 
# 

g16 MyGaussian

The important options here are

#BSUB -n 4 
#BSUB -R "span[hosts=1]"

Combined, they reserve 4 cores on 1 node. The number of cores (here 4) must match the value set with %NProcShared in the Gaussian input file.
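
The consistency requirement above could be checked automatically before Gaussian starts. The following is only a minimal sketch: the helper names nprocshared_of and check_cores are hypothetical, and it assumes that LSF exports the number of allocated slots in the LSB_DJOB_NUMPROC environment variable and that the %NProcShared line is written without spaces around the "=" sign.

```shell
# Hypothetical helper: print the %NProcShared value found in a Gaussian input file.
# The match ignores case, assuming Link 0 commands may be written in any case.
nprocshared_of() {
    sed -n 's/^%[Nn][Pp][Rr][Oo][Cc][Ss][Hh][Aa][Rr][Ee][Dd]=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# Hypothetical helper: succeed only if the input file asks for exactly
# the given number of cores; otherwise print a message and fail.
check_cores() {
    input=$1
    slots=$2
    wanted=$(nprocshared_of "$input")
    if [ "$wanted" != "$slots" ]; then
        echo "Mismatch: %NProcShared=${wanted:-unset}, but $slots slots requested" >&2
        return 1
    fi
}

# Possible use in the job script, just before the g16 line
# (LSB_DJOB_NUMPROC is assumed to be set by LSF):
# check_cores MyGaussian.com "$LSB_DJOB_NUMPROC" || exit 1
```

With such a guard the job stops immediately instead of running with more or fewer threads than the scheduler has reserved.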

NOTE:

To be able to run Gaussian you have been assigned a scratch directory. Remember to replace path-to-scratch with the actual path to your own scratch directory in the line
#export GAUSS_SCRDIR=path-to-scratch
and to remove the comment sign “#” at the beginning.
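
Since it is easy to forget to uncomment or adapt that line, the job script could verify the scratch setting before launching Gaussian. This is a sketch only; check_scratch is a hypothetical helper name, not part of Gaussian or LSF.

```shell
# Hypothetical helper: succeed only if the given path is non-empty
# and points to an existing, writable directory.
check_scratch() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "GAUSS_SCRDIR is not set" >&2
        return 1
    fi
    if [ ! -d "$dir" ] || [ ! -w "$dir" ]; then
        echo "Scratch path '$dir' is not a writable directory" >&2
        return 1
    fi
}

# Possible use in the job script, after the export line:
# check_scratch "$GAUSS_SCRDIR" || exit 1
```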

Scratch usage and other tips

Other directives in the Gaussian input file are also important for execution on the HPC:

%NProcShared=4
%Mem=2GB
%RWF=/path-to-scratch/MyGaussian.rwf
%NoSave
%Chk=MyGaussian.chk

#n B3LYP/6-31++G(d,p)  SCF=(XQC,MaxCycle=512) pop=Regular Opt(MaxCyc=300) Freq

...
.................

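%Mem specifies the total amount of memory to be used by Gaussian. Increasing this number is often needed for direct SCF calculations, but requesting more memory than the amount available on the compute node will lead to poor performance. The value should also stay within the memory reserved through the scheduler, which in the job script is given per core.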
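
As a worked example, the job script shown earlier reserves 4 cores with 2GB each, i.e. 8GB in total, so %Mem=2GB fits comfortably. A comparison of this kind could be sketched as follows. The helper names mem_to_mb and mem_fits are hypothetical, only the MB and GB suffixes are handled, and 1 GB is taken as 1024 MB for the purpose of the comparison.

```shell
# Hypothetical helper: convert a %Mem value such as 2GB or 500MB to whole megabytes.
# Only integer values with an MB or GB suffix are handled in this sketch.
mem_to_mb() {
    value=$(printf '%s' "$1" | tr 'a-z' 'A-Z')
    case $value in
        *GB) echo $(( ${value%GB} * 1024 )) ;;
        *MB) echo "${value%MB}" ;;
        *)   echo "unsupported unit in '$1'" >&2; return 1 ;;
    esac
}

# Hypothetical helper: succeed if the %Mem value fits within
# cores times memory-per-core (the latter given in MB).
mem_fits() {
    gaussian_mb=$(mem_to_mb "$1") || return 1
    cores=$2
    per_core_mb=$3
    [ "$gaussian_mb" -le $(( cores * per_core_mb )) ]
}

# Example matching the job script above: %Mem=2GB, 4 cores, 2GB per core.
# mem_fits 2GB 4 2048 || echo "%Mem exceeds the memory reserved in the job script" >&2
```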

Two other directives are worth mentioning: %Chk and %RWF. They specify the name and the location of two of the important files that Gaussian can create, and that can be used to restart the calculation.

%Chk specifies the checkpoint file, which is usually saved in your home directory.
%RWF specifies the read-write file, which stores a lot of information about the state of the computation. It can be a huge file and has to be put in the scratch directory. Remember to replace path-to-scratch with the actual path to your scratch directory.

Even if you do not specify a name, Gaussian will write a .rwf file in the scratch directory you specified, with a name based on the process id (such files are considered unnamed, because the user does not explicitly provide a name).
Gaussian's internal policy is to always keep the .rwf file when a job does not complete successfully, so that one can restart from this file. On successful completion Gaussian deletes unnamed files but always keeps named .rwf files, filling the scratch filesystem unnecessarily. To avoid that, the directive %NoSave specifies that the named scratch files that appear before %NoSave in the input have to be deleted when the computation ends normally.

Please clean up your scratch directory regularly: it is a common working area, and performance degradation can occur when it is full.
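
To find candidates for cleanup, leftover .rwf files can be listed by age. The sketch below only lists files and deletes nothing; list_old_rwf is a hypothetical helper name, and the age threshold is an arbitrary illustration.

```shell
# Hypothetical helper: list .rwf files under a directory that were last
# modified more than the given number of days ago. Nothing is deleted.
list_old_rwf() {
    find "$1" -type f -name '*.rwf' -mtime +"$2"
}

# Possible use, with path-to-scratch replaced by your own scratch directory:
# list_old_rwf /path-to-scratch 7
```

Review the list before removing anything, since a .rwf file from a failed job may still be needed for a restart.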

NOTE:

If your model produces HUGE .rwf files, PLEASE use the Gaussian option to split the file into several chunks. For example, if you expect your file to be 100GB, you can split it into chunks of 15GB maximum. You will need 7 directories under your scratch directory, for example named rwf1, …, rwf7. Then you can add the following line to your Gaussian input file:

%RWF=/path-to-scratch/rwf1/,15GB,/path-to-scratch/rwf2/,15GB,/path-to-scratch/rwf3/,15GB,/path-to-scratch/rwf4/,15GB,/path-to-scratch/rwf5/,15GB,/path-to-scratch/rwf6/,15GB,/path-to-scratch/rwf7/,-1

Here “path-to-scratch” must be replaced with the actual path to your own scratch directory.
In this way your .rwf file will be split into 6 files of maximum size 15GB in the first six directories, and the rest of the data will be stored in the seventh directory, this time with unlimited size.
Please make sure that the chunk sizes are never bigger than 50GB, if possible. You can also specify a filename; otherwise Gaussian will create files with unique filenames.
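
Typing such a long line by hand is error-prone, so it could be generated instead. The sketch below uses two hypothetical helpers: chunks_needed computes how many directories are required (100GB in 15GB chunks gives 7, as in the example above), and rwf_directive creates the rwf1, rwf2, … directories and prints the corresponding %RWF line.

```shell
# Hypothetical helper: number of chunks needed for a file of the given total
# size when each chunk holds at most the given size (same unit, rounded up).
chunks_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Hypothetical helper: create the chunk directories and print the %RWF line.
# Arguments: scratch path, number of chunks, size limit for all but the last chunk.
rwf_directive() {
    base=$1
    chunks=$2
    size=$3
    line="%RWF="
    i=1
    while [ "$i" -le "$chunks" ]; do
        mkdir -p "$base/rwf$i" || return 1
        if [ "$i" -lt "$chunks" ]; then
            line="$line$base/rwf$i/,$size,"
        else
            # -1 leaves the last chunk without a size limit
            line="$line$base/rwf$i/,-1"
        fi
        i=$((i + 1))
    done
    printf '%s\n' "$line"
}

# Possible use, with path-to-scratch replaced by your own scratch directory:
# rwf_directive /path-to-scratch "$(chunks_needed 100 15)" 15GB
```

The printed line can then be pasted into the Gaussian input file.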

There are of course other options, but this is the basic setup required for using Gaussian effectively on the HPC.

Please consult the “Running Gaussian” page of the Gaussian documentation for details.