steno usage tips


The steno-queues are managed by the LSF scheduler, so the general instructions for LSF are also valid. When submitting to the steno-queues, users need to specify the name of the queue: stenocpu for the CPU-only queue, stenogpu or stenopippi for jobs that require GPU cards. Not all users with access to the stenocpu queue are automatically granted access to the other steno-clusters.
An example script to request GPUs can be found here.

The maximum walltime for all jobs in the steno-queues is currently 7 days.
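As a minimal sketch, a submission script for the CPU-only queue could look like the following (the job name, core count, and walltime are placeholder values; see the general LSF instructions for the full set of options):

```shell
#!/bin/sh
# Minimal LSF job script for the steno CPU-only queue (illustrative values).
#BSUB -q stenocpu              # the steno CPU-only queue
#BSUB -J my_steno_job          # job name (placeholder)
#BSUB -n 4                     # number of cores (placeholder)
#BSUB -W 24:00                 # walltime hh:mm, must stay below the 7-day limit
#BSUB -o my_steno_job_%J.out   # stdout file; %J expands to the job id

# ... load modules and run your program here ...
```

Submit it with bsub < jobscript.sh, and replace stenocpu with stenogpu or stenopippi (plus a GPU request) for the GPU queues.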

steno-specific software stack

All the nodes are fully integrated into the DCC infrastructure, which means that all the applications installed for the general DCC clusters are also available on the steno-machines, either immediately or after loading the respective module.

Additionally, users of the steno-clusters also have access to a dedicated software stack. To get access to those applications, users first need to source a special file. In a terminal:

source /appl/steno/steno.bashrc

This makes some additional modules available, as can be seen using the command

module avail steno

----------------------- /appl/steno/SL73/modules/generic -----------------------
steno-apbs/1.5               steno-pdb2pqr/2.1.1-py2.7.15
steno-namd/2.13              steno-turbomole/7.0
steno-namd-cuda/2.13

-------------------- /appl/steno/SL73/XeonE5-2660v3/modules --------------------
steno-amber/16.15           steno-amber/19.17
steno-amber/17.15           steno-amber/19.17-openblas
steno-amber/18.17           steno-gromacs/4.62-20180301

Those modules can be loaded just like all the other cluster modules.

Note: every time a job is dispatched to the cluster, the working environment is cleared and modifications made in the interactive shell are not inherited. It can therefore be necessary to source the /appl/steno/steno.bashrc file and load the modules in the job-script, before the call to the program.
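A sketch of such a job-script (the module version is taken from the listing above; the program invocation and input files are placeholders):

```shell
#!/bin/sh
#BSUB -q stenocpu
#BSUB -n 4
#BSUB -W 24:00
#BSUB -o namd_%J.out

# The dispatched job starts with a clean environment, so repeat the setup here:
source /appl/steno/steno.bashrc   # make the steno-specific modules visible
module load steno-namd/2.13       # example module from the listing above

# Placeholder invocation; replace with your actual program and input files.
namd2 +p4 input.namd > output.log
```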

Important remarks

The landscape of software packages used by the chemistry community is especially heterogeneous, and there is no “one size fits all” approach to using the cluster. There are codes that require a lot of CPU/GPU power, others that require a lot of RAM, others that require a lot of disk space (for storage and/or for temporary files used at runtime), and finally others that require long execution times.

In the following we provide some hints for each of those categories.

Programs that require a lot of CPU/GPU power

Those are the typical programs for an HPC cluster. Multi-threaded and/or MPI programs are usually well supported on the cluster, unless they also need a lot of disk space (see below).

A special note for programs that support GPUs: all the stenogpu and stenopippi machines have multiple GPUs. Please do not block GPUs by requesting too many cores. For example, if a machine has 12 cores and 3 GPUs, do not ask for more than 4 cores per GPU.
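As an illustration of that rule of thumb, a GPU job on the 12-core/3-GPU example machine should pair 1 GPU with at most 4 cores. The exact GPU-request syntax depends on the LSF version installed, so treat the -gpu line below as an assumption to verify against the GPU example script:

```shell
#!/bin/sh
#BSUB -q stenogpu
#BSUB -n 4            # 4 cores ...
#BSUB -gpu "num=1"    # ... paired with 1 GPU keeps the ratio at 4 cores/GPU
#BSUB -W 24:00

# ... source /appl/steno/steno.bashrc, load modules, run the program ...
```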

Programs that require a lot of RAM

The memory available on the steno-nodes is quite limited. Those programs should probably be run on the general HPC cluster instead.

Programs that require a lot of disk space

Disk space and disk access patterns are critical for some of the chemistry codes. For some kinds of codes, disk access can be the actual bottleneck that kills performance and limits the possibility of running multiple jobs. There are codes that require a lot of storage space, others that require a lot of temporary space, and some that need both.

Programs that require a lot of storage space

For example, molecular dynamics simulations, where the trajectory file needs to be saved during long runs.

Those programs need to be run on one of our scratch filesystems. However, remember that the scratch filesystems should only be used for temporary files, because when they are “too full” they become slower and affect all the cluster users.

So if you need to run those kinds of simulations, you MUST have a plan to regularly remove the data from the scratch filesystem and keep the usage to a minimum. The scratch filesystem is a precious (and expensive) resource; it should not be wasted.
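One possible pattern is to move the results off scratch at the end of each job, and only delete the scratch data once the copy has succeeded. All the paths below are hypothetical; adapt them to your actual scratch and permanent storage locations:

```shell
#!/bin/sh
# End-of-job cleanup sketch; all paths are hypothetical placeholders.
SCRATCH_DIR=/scratch/$USER/myrun    # hypothetical scratch work directory
ARCHIVE_DIR=$HOME/results/myrun     # hypothetical permanent location

mkdir -p "$ARCHIVE_DIR"
# Copy the files worth keeping, then free the scratch space --
# the rm only runs if the copy succeeded.
cp "$SCRATCH_DIR"/trajectory.dcd "$ARCHIVE_DIR"/ \
  && rm -rf "$SCRATCH_DIR"
```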

Please contact the department IT to get feasible solutions.

Programs that require a lot of temporary disk space

For example Gaussian, Molpro, Q-Chem…
Those programs typically create large temporary files that are accessed very frequently, putting the filesystem under stress.
The I/O can become so massive that the program spends most of its time just waiting to read from and write to the filesystem. For those kinds of programs it can be necessary to adopt specifically tailored solutions.

As a general remark, do not expect to be able to run many of those programs simultaneously.

Programs that require long runtime

To be clear, long runtimes must be avoided as much as possible. There is a maximum walltime on all the queues (7 days for the steno-queues); exceptions to this limit can be granted, but should be avoided.

If a program can be checkpointed, make regular checkpoints and split a long run into smaller chunks.
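Splitting a run into chunks can be automated with LSF job dependencies: each chunk restarts from the previous checkpoint and only starts once the previous chunk has finished successfully. The queue, job names, and script name below are placeholders:

```shell
# Chain three shorter restart jobs; each starts only after the previous
# chunk has completed successfully ("done"). run_chunk.sh is a placeholder
# script that restarts the program from its latest checkpoint.
bsub -q stenocpu -J chunk1 < run_chunk.sh
bsub -q stenocpu -J chunk2 -w "done(chunk1)" < run_chunk.sh
bsub -q stenocpu -J chunk3 -w "done(chunk2)" < run_chunk.sh
```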

The problems with long runs are:

  • A resource that is blocked for a long time reduces the efficiency of the scheduling of jobs
  • The longer the run, the more likely it is that something happens to the machine the program is running on, or to the shared filesystem. Without regular checkpointing, the job will then be killed and the results lost.
  • We have regular service windows (potentially one per month). Jobs which are still running at the start of the service window will be killed. Jobs that are waiting in a queue and are not expected to finish before the start of the service window will not be allowed to start.