Running ANSYS under LSF


WARNING:

On March 29, 2021, the ANSYS license server was changed. If you get an error like

*** IMPORTANT LICENSING MESSAGE ***

ANSYS LICENSE MANAGER ERROR:
Capability XXXXX YYYYYY ZZZZZZ does not exist in the ANSYS licensing pool.
No specified user license preferences match available products in the specified license path:
    ANSYSLI_SERVERS: ...........
    FLEXlm Servers:  ...........

*** ERROR - ANSYS license not available.

you need to reset the license settings to the default.

NOTE about licensing:

In the old licensing scheme, a single session of any ANSYS product was allowed to use up to 16 cores. After the change, this number has been reduced to 4 cores. Each extra core uses one token from the anshpc license pool.
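
As a rough illustration of the rule above, the number of anshpc tokens a job would draw can be estimated from its core count. The helper name anshpc_tokens is hypothetical, and the sketch assumes exactly one token per core beyond the 4 included ones, as stated in the note:

```shell
#!/bin/sh
# Hypothetical helper: estimate how many anshpc tokens a session needs.
# Assumption: 4 cores are included, each extra core costs one token.
anshpc_tokens() {
    cores=$1
    included=4
    if [ "$cores" -gt "$included" ]; then
        echo $((cores - included))
    else
        echo 0
    fi
}

anshpc_tokens 4     # prints 0
anshpc_tokens 16    # prints 12
```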


General Info

ANSYS is a suite of software for engineering simulation in many different fields. DTU provides the installation files, and the software is also installed on the HPC cluster. However, the use of ANSYS at DTU is subject to a rather strict license agreement, so please make sure that you are allowed to run the software before submitting jobs to the HPC. More details and links to useful information can be found on the ANSYS page on the gbar website.
You can find some general information about all ANSYS products on the cluster here.

CFX job scripts

ANSYS packages are usually run through a Graphical User Interface (GUI) that lets you prepare the model, set up the simulation, run it, and check the results at runtime. This usage model is not ideal on the cluster, where job placement and scheduling are managed by the resource manager and scheduler programs. The preparation of the model and the actual simulation therefore have to be done separately.

You can open an interactive session on the HPC (instructions can be found here), and launch the program from a terminal:

$ /appl/ansys/2020R1/v201/CFX/bin/cfx5solve &

In this way one can set up the simulation and check the results of any simulation run previously or still running.
To run a simulation, you have to prepare a proper job script and submit it.

The following script is a template for the submission of a parallel shared memory job, running on one single node.

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- Name of the job --
#BSUB -J ansys_CFX_example
# -- specify queue --
#BSUB -q hpc
# -- estimated wall clock time (execution time): hh:mm --
#BSUB -W 04:00
### -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- number of processors
#BSUB -n 4
# --specify that the cores MUST BE on a single host! --
#BSUB -R "span[hosts=1]" 
# -- user email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- mail notification --
# -- at start --
#BSUB -B
# -- at completion --
#BSUB -N
# --Specify the output and error file. %J is the job-id --
# --  -o and -e mean append, -oo and -eo mean overwrite --
#BSUB -oo cfx_18_IMPIC_%J.out
#BSUB -eo cfx_18_IMPIC_%J.err

#example of ansys command line call
/appl/ansys/2020R1/v201/CFX/bin/cfx5solve -def your_solver_input_file.def -start-method "Intel MPI Local Parallel" -size 1.5 -part $LSB_DJOB_NUMPROC -pri 1 -double -batch

Save this file under a name of your choice, for this example my_ansys_cfx.sh, and then submit it:

bsub < my_ansys_cfx.sh

You can find the meaning of all the #BSUB options here.

This job script asks for 4 cores (#BSUB -n 4) on a single node (#BSUB -R "span[hosts=1]"). LSF automatically sets the environment variable $LSB_DJOB_NUMPROC to the (total) number of cores requested, and this variable is used later in the ANSYS command line.

In the CFX command line, there are some important switches:
-batch: start the solver in batch mode, that is, without the GUI;
-start-method "Intel MPI Local Parallel": run the simulation in parallel on a single node;
-part $LSB_DJOB_NUMPROC: start the solver in partitioning mode, using as many partitions as the number of cores requested;
-def your_solver_input_file.def: use your_solver_input_file.def as the input file.

The other options are specific to the solver or model used. It is up to you to look up the command line options available for the specific ANSYS package you use.
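
To see how the switches above fit together, the command can be assembled and printed before it is actually run. This is only a sketch: build_cfx_cmd is a hypothetical helper, the fallback to 1 core outside of an LSF job is an assumption, and only the switches discussed above are included.

```shell
#!/bin/sh
# Sketch: assemble the cfx5solve call so it can be inspected before it is run.
CFX_SOLVER=/appl/ansys/2020R1/v201/CFX/bin/cfx5solve

# build_cfx_cmd is a hypothetical helper, not part of ANSYS or LSF.
build_cfx_cmd() {
    def_file=$1
    # LSF sets LSB_DJOB_NUMPROC inside a job; default to 1 elsewhere (assumption).
    nproc=${LSB_DJOB_NUMPROC:-1}
    printf '%s -def %s -start-method "Intel MPI Local Parallel" -part %s -batch\n' \
        "$CFX_SOLVER" "$def_file" "$nproc"
}

# Print the command; in a real job script, run it instead of printing it.
build_cfx_cmd your_solver_input_file.def
```

Printing the command first makes it easy to spot a wrong input file name or an unexpected core count in the job output.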

You can always check the status of your job with the available commands. While the simulation is running, you can also open an interactive session on the HPC, launch the suitable CFX GUI, browse your home directory to find the output of the current simulation, and check the status as usual. However, this session and the simulation are two independent processes, so it is not possible to stop the running simulation from the GUI.

Note: This script is only for running the CFX simulation on a SINGLE NODE. This means that the maximum number of cores is 24 on the current HPC cluster.

It is also possible to run CFX across multiple nodes. This requires a Distributed Parallel start method, and the job script is a bit more complex. If you need to run CFX using more than one node, please write to us at support@hpc.dtu.dk asking for detailed instructions.

FLUENT job scripts

The FLUENT package is also usually run through a Graphical User Interface (GUI) that lets you prepare the model, set up the simulation, run it, and check the results at runtime. This usage model is not ideal on the cluster, where job placement and scheduling are managed by the resource manager and scheduler programs. The preparation of the model and the actual simulation therefore have to be done separately.

You can open an interactive session on the HPC (instructions can be found here), and launch the program from a terminal:

$ /appl/ansys/2020R1/v201/fluent/bin/fluent &

In this way you can set up the simulation and run small models with a short runtime. Unlike CFX, FLUENT does not have built-in monitoring functionality to check the results of a simulation running on the cluster. It is still possible with a workaround, described later in this section.

NOTE: there is a problem with the way FLUENT starts: the program is not killed when the terminal from which it was opened is closed. Always close the FLUENT session properly from the GUI. If FLUENT keeps running in the background, it blocks the license, and you could end up unable to start any new FLUENT run on any machine, even your own laptop.

To run a simulation, you have to prepare a proper job script and submit it.

The following script is a template for the submission of a parallel shared memory job, running on one single node.

#!/bin/sh
# embedded options to bsub - start with #BSUB
# -- Name of the job ---
#BSUB -J ansys_FLUENT_example
# -- specify queue --
#BSUB -q hpc
# -- estimated wall clock time (execution time): hh:mm --
#BSUB -W 04:00
### -- specify that we need 2GB of memory per core/slot -- 
#BSUB -R "rusage[mem=2GB]"
# -- number of processors/cores/nodes --
#BSUB -n 4
### specify that the cores MUST BE on a single host! 
#BSUB -R "span[hosts=1]"
# -- user email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
# -- mail notification --
# -- at start --
#BSUB -B
# -- at completion --
#BSUB -N
# --Specify the output and error file. %J is the job-id --
# --  -o and -e mean append, -oo and -eo mean overwrite -- 
#BSUB -oo fluent_local_intel_%J.out
#BSUB -eo fluent_local_intel_%J.err

# Set the environment variable to fix ssh issue
export SSH_SPAWN=0
# Set the environment variable to fix IntelMPI issue (fluent v18+)
export I_MPI_SHM_LMT=shm
#example of ansys command line call
/appl/ansys/2020R1/v201/fluent/bin/fluent 3ddp -g -t$LSB_DJOB_NUMPROC -i instruction.journal -mpi=intel > fluent_run.out

Save this file under a name of your choice, for this example my_ansys_fluent.sh, and then submit it:

bsub < my_ansys_fluent.sh

The #BSUB options are the same as in the previous CFX example, so we do not discuss them here. The difference is in the command line options. Remember that this is just an example! You must adapt the options to your specific case.

In the FLUENT command line, there are some important switches:
3ddp: placed just after the program name, starts the solver in 3D mode, double precision;
-g: start the program without graphics (no GUI);
-t$LSB_DJOB_NUMPROC: start the program in parallel, using $LSB_DJOB_NUMPROC cores;
-i instruction.journal: FLUENT reads and executes the instructions in the text file instruction.journal (more on this later);
-mpi=intel: use the Intel MPI library (shipped with FLUENT) for parallel communication;
> fluent_run.out: redirect the output of FLUENT (the information that in the GUI would appear in the “terminal” sub-window) to a file named fluent_run.out.

To see the options available on the command line, and a brief explanation, just type in a terminal:

/appl/ansys/2020R1/v201/fluent/bin/fluent -help

or look them up in the FLUENT manual.

In this way, you can submit more than one job and follow the behaviour of each simulation by looking at its output file, which is constantly updated during the run.
FLUENT goes through the instructions found in the instruction.journal file and executes them one after the other. This is a plain text file, written according to the so-called FLUENT GUI syntax, or according to the simpler TUI syntax (Text User Interface).
A basic example could be:

; READ CASE FILE
/file/read-case-data example.cas
; SIMULATION INSTRUCTION
/solve/iterate 24000

; WRITE case and data file
/file/write-case-data example_out.cas
exit

The lines starting with “;” are comments and are not executed. This simple script loads example.cas and example.dat from the current directory, runs the solver for 24000 iterations (/solve/iterate), and then writes the resulting case and data files to example_out.cas and example_out.dat respectively.
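
If the journal differs only slightly between runs, it can also be generated from the job script itself. The following is a minimal sketch using a shell here-document; CASE_NAME and ITERATIONS are illustrative shell variables, not FLUENT settings, and the journal commands are the ones from the example above:

```shell
#!/bin/sh
# Sketch: write the journal file from the job script.
# CASE_NAME and ITERATIONS are illustrative variables, not FLUENT settings.
CASE_NAME=example
ITERATIONS=24000

cat > instruction.journal <<EOF
; READ CASE FILE
/file/read-case-data ${CASE_NAME}.cas
; SIMULATION INSTRUCTION
/solve/iterate ${ITERATIONS}
; WRITE case and data file
/file/write-case-data ${CASE_NAME}_out.cas
exit
EOF
```

Placed before the fluent call in the job script, this keeps the case name and iteration count in one place.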

NOTE: It is always better to test the commands in a short interactive run in a GUI session, first!

However, unlike CFX, FLUENT (by default) does not save any output until the simulation ends. This also means that if the job gets killed, either by the user (with the command bkill, see here for the commands for managing jobs) or by the system (e.g. when it reaches the walltime limit), no output is saved and the simulation has to be run again.
To avoid this there are two main options:

  • Activate autosave. You can tell FLUENT to save the .cas and .dat files periodically during the execution, specifying the frequency. In this way, if something happens and the run is interrupted, you have at least an intermediate restart configuration. To do this, add the following lines to your .journal file:
; AUTO-SAVE/BATCH OPTIONS
/file/auto-save/case-frequency if-case-is-modified
;
/file/auto-save/data-frequency 25
/file/auto-save/retain-most-recent-files yes
;
; STRING USED TO BUILD THE FILE NAME : "string"
; THE AUTO-SAVE FILE will be: "string""iteration count".cas
;
/file/auto-save/root-name "Auto_"

In this way, the case file is saved only if modified, the data file is saved every 25 iterations, and only the 5 most recent saved files are kept. The intermediate files are named “Auto_-x-yyy.cas” and “Auto_-x-yyy.dat”, so the names are unique. You can of course change the name of the files, and look for more options in the FLUENT manual.

  • Activate checkpointing. You can force FLUENT to save the .cas and .dat files at any time during the execution by specifying the name of a “trigger file”. At each iteration, FLUENT checks whether a file with the name you chose exists in a user-specified directory. If it finds it, FLUENT first dumps the current case and data files, and then removes the trigger file. You can also trigger the exit of the program with a dedicated exit-trigger file. In this case, FLUENT dumps the results and then exits. To set this up, add the following lines to the .journal file:
; CHECKPOINT/EXIT settings
(set! checkpoint/check-filename "/path-to-your-dir/my-check-fluent")
(set! checkpoint/exit-filename "/path-to-your-dir/my-exit-fluent")

You have to specify the correct path to the directory where you want FLUENT to look for the trigger files. The checkpoint-trigger file will be my-check-fluent, and the exit-trigger file will be my-exit-fluent. To create the file, just open a terminal, navigate to the specified directory, and type

touch my-check-fluent

This will create an empty file with that name.
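
Since FLUENT removes the trigger file once it has dumped the case and data files, the removal can be used as a signal that the checkpoint is complete. The sketch below wraps this in a small function; request_checkpoint is a hypothetical helper, the default timeout of 600 seconds is arbitrary, and the actual delay depends on how long one iteration takes:

```shell
#!/bin/sh
# Hypothetical helper: create the trigger file, then wait until it disappears,
# which (per the description above) means FLUENT has written the files.
request_checkpoint() {
    trigger=$1
    timeout=${2:-600}     # seconds to wait; 600 is an arbitrary default
    touch "$trigger" || return 1
    waited=0
    while [ -e "$trigger" ]; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "trigger still present after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (the path is the placeholder from the journal example):
# request_checkpoint /path-to-your-dir/my-check-fluent 900
```

A non-zero return value means the trigger file was still there when the timeout expired, for example because the job is not running or looks in a different directory.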

When you have the .cas and .dat files created in one of these ways, you can also check the status of the simulation from the GUI. Just open a fluent GUI on one of the interactive nodes, and load the corresponding case and data files. In this way, you can in principle monitor (with a short delay) all your fluent jobs running on the cluster.
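
When several intermediate files have accumulated, it can help to pick the newest one before loading it in the GUI. This is a minimal sketch: latest_autosave is a hypothetical helper, and the "Auto_" prefix and the dat extension are taken from the autosave example above, so adjust them if your files are named differently.

```shell
#!/bin/sh
# Hypothetical helper: print the most recently modified file matching
# <prefix>*.<ext> in a directory (e.g. the newest "Auto_" data file).
latest_autosave() {
    dir=$1
    prefix=$2
    ext=$3
    # ls -t sorts by modification time, newest first.
    ls -t "$dir"/"$prefix"*."$ext" 2>/dev/null | head -n 1
}

# Usage: latest_autosave . Auto_ dat
```

The function prints nothing if no matching file exists. It relies on file modification times, so it assumes the files have not been touched or copied since FLUENT wrote them.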

Note: This script is only for running the FLUENT simulation on a SINGLE NODE. This means that the maximum number of cores is 24 on the current HPC cluster.

It is also possible to run FLUENT across multiple nodes. This requires a different command line syntax and a few tricks, so the job script is a bit more complex. If you need to run FLUENT using more than one node, please write to us at support@hpc.dtu.dk asking for detailed instructions.