Gaussian jobs under MOAB/Torque


 

Basic instructions

Gaussian09 from Gaussian Inc. is installed on the HPC cluster, but due to license restrictions you have to become a member of the Gaussian group to be able to run it. In short:

  • ask to become a member of the *gaussian* group;
  • ask for the creation of your own directory in the SCRATCH filesystem.

Both are requested by mailing your username to Irene Shim: shim@kemi.dtu.dk

Once the request has been granted, you can run Gaussian on the HPC.
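
Before submitting a first job it can be useful to confirm that both requests have taken effect. The following is a minimal sketch, not an official tool: the helper name in_group is made up for this example, and the scratch directory is assumed to be /SCRATCH/<username>, as in the examples further down this page.

```shell
#!/bin/sh
# in_group GROUP [LIST]: succeed if GROUP appears in the space-separated LIST.
# LIST defaults to the groups of the current user, as reported by `id -Gn`.
# (Hypothetical helper, written for this example.)
in_group() {
    group=$1
    list=${2-$(id -Gn)}
    for g in $list; do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

if in_group gaussian; then
    echo "You are a member of the gaussian group."
else
    echo "You are not (yet) a member of the gaussian group."
fi

# Assumption: the personal scratch directory is /SCRATCH/<username>.
scratch="/SCRATCH/${USER:-$(id -un)}"
if [ -d "$scratch" ]; then
    echo "Scratch directory $scratch exists."
else
    echo "Scratch directory $scratch does not exist (yet)."
fi
```

Note that a newly granted group membership usually becomes visible only after logging out and in again.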

If you are going to use Gaussian09 you will probably need to run it in parallel. Gaussian09 comes in two versions: one that runs on shared-memory nodes, and one for network-parallel execution. The latter relies on a proprietary parallel execution environment (Linda) that is not available on the HPC, so only the shared-memory version can be run. You can therefore ask for up to 20 cores on a single HPC node.

Gaussian09 has its own syntax for specifying parallel resources and memory, and you must make sure that these specifications do not conflict with the resources requested through MOAB/Torque. Conflicts could cause problems for your own job and for other HPC users.

Gaussian input files have the .com extension. A sample Gaussian input file (called MyGaussian.com from here on) is shown here:

%NProcShared=4

# B3LYP/6-31++G(d,p)

water energy

0   1
O  -0.464   0.177   0.0
H  -0.464   1.137   0.0
H   0.441  -0.143   0.0

It is important to leave a blank line at the end of a Gaussian input file!
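
A missing final blank line is easy to overlook, so it may be worth checking for it before submitting. The sketch below is one possible way to do this in plain sh; the function names ends_with_blank_line and ensure_blank_line are invented for this example.

```shell
#!/bin/sh
# ends_with_blank_line FILE: succeed if FILE is non-empty and its last line is empty.
ends_with_blank_line() {
    [ -s "$1" ] && [ -z "$(tail -n 1 "$1")" ]
}

# ensure_blank_line FILE: append newlines until FILE ends with a blank line.
# At most two newlines are added (one to terminate an unterminated last
# line, one for the blank line itself).
ensure_blank_line() {
    until ends_with_blank_line "$1"; do
        printf '\n' >> "$1"
    done
}

# Usage (MyGaussian.com is the sample input file from the text):
#   ensure_blank_line MyGaussian.com
```

Calling ensure_blank_line on a file that already ends with a blank line leaves it unchanged.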

The correct Gaussian directive for specifying the number of cores is %NProcShared. With %NProcShared=4 you are requesting 4 cores.
A simple script for running this Gaussian job under MOAB/Torque could be the following:

#!/bin/sh
# embedded options to qsub - start with #PBS
# -- our name ---
#PBS -N MyGaussian
# -- choose queue --
#PBS -q hpc
# -- Notify me by email when execution begins (b) and ends (e) --
#PBS -m be
# -- email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##PBS -M your_email_address
# -- estimated wall clock time (execution time): hh:mm:ss --
#PBS -l walltime=00:10:00
# -- parallel environment requests --
#PBS -l nodes=1:ppn=4
# -- end of PBS options --

# -- change to working directory
if test "X$PBS_ENVIRONMENT" = "XPBS_BATCH"; then cd "$PBS_O_WORKDIR"; fi

# -- commands you want to execute --
#	

g09 MyGaussian

The important option here is

#PBS -l nodes=1:ppn=4

It reserves 4 cores on 1 node. The number of cores (ppn) must be the same as the value set with %NProcShared in the Gaussian input file.
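
Since the two values live in different files, they can drift apart when one of them is edited. The following sketch compares them before submission. The function names (get_nproc, get_ppn, check_cores) and the job script name submit.sh are made up for this example, and the parsing assumes the simple one-directive-per-line layout shown above.

```shell
#!/bin/sh
# get_nproc FILE: print the value of the first %NProcShared line of a
# Gaussian input file (the directive name is matched case-insensitively).
get_nproc() {
    grep -i '^%nprocshared=' "$1" | head -n 1 | sed 's/^[^=]*=\([0-9]*\).*/\1/'
}

# get_ppn FILE: print the ppn value of the first active
# "#PBS -l nodes=...:ppn=N" line of a job script.
get_ppn() {
    sed -n 's/^#PBS[[:space:]]*-l[[:space:]]*nodes=.*ppn=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# check_cores INPUT JOBSCRIPT: succeed only if both files request the
# same number of cores.
check_cores() {
    nproc=$(get_nproc "$1")
    ppn=$(get_ppn "$2")
    if [ -n "$nproc" ] && [ "$nproc" = "$ppn" ]; then
        echo "OK: both files request $nproc cores"
    else
        echo "MISMATCH: %NProcShared=${nproc:-unset}, ppn=${ppn:-unset}" >&2
        return 1
    fi
}

# Usage (submit.sh is a hypothetical name for the job script):
#   check_cores MyGaussian.com submit.sh && qsub submit.sh
```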

Scratch usage and other tips

In the Gaussian input file there are other keywords that are important for the execution on the HPC.

%NProcShared=4
%Mem=2GB
%RWF=/SCRATCH/YourUserName/MyGaussian.rwf
%NoSave
%Chk=MyGaussian.chk

#n B3LYP/6-31++G(d,p)  SCF=(XQC,MaxCycle=512) pop=Regular Opt(MaxCyc=300) Freq

...
.................

%Mem specifies the total amount of memory to be used by Gaussian. Increasing this number is often needed for direct SCF calculations, but requesting more memory than the amount available on the compute node will lead to poor performance.

Two other directives are worth mentioning: %Chk and %RWF. They specify the name and the location of two of the important files that Gaussian can create, and that can be used to restart the calculation.

%Chk specifies the checkpoint file, which is usually saved in your home directory.
%RWF (read-write file) stores a lot of information about the state of the computation. It can be a huge file and has to be put in the SCRATCH directory.

Even if you do not specify a name, Gaussian will write a .rwf file in the SCRATCH directory you specified, with a name based on the process id (these are considered unnamed files because the user does not explicitly provide a name).
Gaussian's internal policy is to always save the .rwf file when a job does not complete successfully, so that one can restart from this file. On successful completion Gaussian deletes unnamed files, but it always keeps named .rwf files, which fill the SCRATCH filesystem unnecessarily. To avoid that, the directive %NoSave specifies that the named scratch files that appear before %NoSave are to be deleted when the computation ends normally.

Please take care to clean up your SCRATCH directory, because it is a common working area, and performance degradation can occur when it is full.
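
One way to keep an eye on leftover files is to list the .rwf files that have not been modified for some time. The sketch below only lists them and deletes nothing. The function name list_stale_rwf and the 7-day threshold are arbitrary choices for this example, and the scratch directory is assumed to be /SCRATCH/<username>.

```shell
#!/bin/sh
# list_stale_rwf DIR DAYS: print the .rwf files under DIR that were last
# modified more than DAYS days ago. Nothing is deleted.
list_stale_rwf() {
    find "$1" -type f -name '*.rwf' -mtime +"$2" -print
}

# Assumption: the personal scratch directory is /SCRATCH/<username>.
scratch="/SCRATCH/${USER:-$(id -un)}"
if [ -d "$scratch" ]; then
    list_stale_rwf "$scratch" 7
fi
```

After checking the list (and making sure no running job still uses these files), the same find command can be rerun with -exec rm {} + in place of -print to remove them.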

NOTE:

    If your model produces HUGE .rwf files, PLEASE use the Gaussian option to split the file into several chunks. For example, if you expect your file to be 100GB, you can split it into chunks of at most 15GB. You will need 7 directories under your /SCRATCH/username/ directory, for example named rwf1, …, rwf7. Then you can add the following line to your Gaussian input file:
%RWF=/SCRATCH/username/rwf1/,15GB,/SCRATCH/username/rwf2/,15GB,/SCRATCH/username/rwf3/,15GB,/SCRATCH/username/rwf4/,15GB,/SCRATCH/username/rwf5/,15GB,/SCRATCH/username/rwf6/,15GB,/SCRATCH/username/rwf7/,-1
    In this way your .rwf file will be split into 6 files of at most 15GB each in the first six directories, and the rest of the data will be stored in the seventh directory, this time with unlimited size.
    Please make sure, where possible, that your files never grow larger than 50GB. You can also specify a filename; otherwise Gaussian will create files with unique filenames anyway.
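
Typing the long %RWF line by hand is error-prone, so it can be generated instead. The sketch below follows the rwf1, …, rwfN directory naming used in the note. The function names rwf_directive and make_rwf_dirs are made up for this example, and /SCRATCH/username is the same placeholder as above.

```shell
#!/bin/sh
# rwf_directive BASE N SIZE: print a %RWF line that spreads the read-write
# file over the directories BASE/rwf1 ... BASE/rwfN. The first N-1 chunks
# are limited to SIZE; the last one is unlimited (-1).
rwf_directive() {
    base=$1
    n=$2
    size=$3
    line="%RWF="
    i=1
    while [ "$i" -lt "$n" ]; do
        line="${line}${base}/rwf${i}/,${size},"
        i=$((i + 1))
    done
    printf '%s%s/rwf%s/,-1\n' "$line" "$base" "$n"
}

# make_rwf_dirs BASE N: create the directories BASE/rwf1 ... BASE/rwfN.
make_rwf_dirs() {
    i=1
    while [ "$i" -le "$2" ]; do
        mkdir -p "$1/rwf$i"
        i=$((i + 1))
    done
}

# Prints the same line as in the note above (7 directories, 15GB chunks).
rwf_directive /SCRATCH/username 7 15GB
```

To create the matching directories, call make_rwf_dirs with your own scratch path, for example make_rwf_dirs "/SCRATCH/$USER" 7.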

There are of course other options, but this is the basic setup required for using Gaussian effectively on the HPC.

Please consult Running Gaussian for details.

#!/bin/sh
#PBS -l walltime=4:00:00


cd $PBS_O_WORKDIR

# http://www.hpc.dtu.dk/?page_id=511
export GAUSS_SCRDIR=/SCRATCH/$USER
export g09root=/appl/hgauss/Gaussian_09_D01
. $g09root/g09/bsd/g09.profile

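# water example from http://www.gaussian.com/g_tech/g_ur/m_input.htm
# $PBS_NODEFILE has one line per allocated core, so its line count is
# prepended to the input as %NProcShared before piping it to g09
( echo %NProcShared=$(wc -l < "$PBS_NODEFILE"); cat H2O.com ) | g09 > H2O_scratch.log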