Using GPUs under LSF10


We currently have 38 nodes with GPUs in our generally available LSF10 setup.
The walltime is limited to 24 hours per job at the moment.

4 nodes with 2 x Tesla A100 PCIE 40 GB (owned by DTU Compute) – queuename: gpua100
6 nodes with 2 x Tesla A100 PCIE 80 GB (owned by DTU Compute) – queuename: gpua100
6 nodes with 2 x Tesla V100 16 GB (owned by DTU Compute & DTU Elektro) – queuename: gpuv100
8 nodes with 2 x Tesla V100 32 GB (owned by DTU Compute & DTU Environment & DTU MEK) – queuename: gpuv100
1 node with 2 x Tesla A10 PCIE 24 GB (owned by DTU Compute) – queuename: gpua10
1 node with 2 x Tesla A40 48 GB with NVLink (owned by DTU Compute) – queuename: gpua40
3 nodes with 4 x Tesla V100 32 GB with NVLink (owned by DTU Compute) – queuename: gpuv100
2 nodes with 4 x TitanX (Pascal) – queuename: gputitanxpascal (retired)
1 node with 4 x Tesla K80 – queuename: gpuk80 (retired)
1 node with 2 x Tesla K40 – queuename: gpuk40 (retired)

1 node with 2 x AMD Radeon Instinct MI50 16 GB GPUs – not on a queue
1 node with 2 x AMD Radeon Instinct MI25 16 GB GPUs – queuename: gpuamd
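To check the current limits and status of these queues from a login node, the standard LSF bqueues command can be used; a small sketch using the queue names listed above:

bqueues gpua100 gpuv100 gpua10 gpua40
bqueues -l gpuv100    # long listing with the limits of a single queue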

To run code on the NVIDIA A100, please make sure to compile your code with
CUDA 11.0 or newer.
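As a small sketch (the source file name my_kernel.cu is just a placeholder), a build targeting the A100's compute capability 8.0 could look like this:

# load a CUDA 11.x toolchain (module name taken from the jobscript example below)
module load cuda/11.6
# sm_80 targets the A100; my_kernel.cu is a placeholder source file
nvcc -arch=sm_80 -o my_kernel my_kernel.cu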

1 interactive V100 node reachable via voltash
1 interactive V100 node with NVLink reachable via sxm2sh
1 interactive A100 node with NVLink reachable via a100sh
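These are reached by typing the corresponding command on a login node; a minimal sketch of such a session (the commands inside it are only examples):

a100sh          # or voltash / sxm2sh for the V100 nodes
nvidia-smi      # inspect the GPUs from inside the interactive session
exit            # leave the node again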

Here is an example jobscript:

#!/bin/sh
### General options
### -- specify queue --
#BSUB -q gpuv100
### -- set the job Name --
#BSUB -J testjob
### -- ask for number of cores (default: 1) --
#BSUB -n 4
### -- Select the resources: 1 gpu in exclusive process mode --
#BSUB -gpu "num=1:mode=exclusive_process"
### -- set walltime limit: hh:mm -- maximum 24 hours for GPU queues right now
#BSUB -W 1:00
# request 5GB of system memory
#BSUB -R "rusage[mem=5GB]"
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion--
#BSUB -N
### -- Specify the output and error file. %J is the job-id --
### -- -o and -e mean append, -oo and -eo mean overwrite --
#BSUB -o gpu_%J.out
#BSUB -e gpu_%J.err
# -- end of LSF options --

nvidia-smi
# Load the cuda module
module load cuda/11.6

/appl/cuda/11.6.0/samples/bin/x86_64/linux/release/deviceQuery

Then submit with

bsub < jobscript.sh
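After submission, the usual LSF commands can be used to follow the job; a short sketch (the job ID 12345678 is a placeholder):

bjobs               # list your pending and running jobs
bpeek 12345678      # peek at the output of a running job (placeholder job ID)
bkill 12345678      # kill the job if needed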

To request a GPU with 32 GB of memory (only available in the gpuv100 queue), please add

#BSUB -R "select[gpu32gb]"

to your jobscript. In the gpua100 queue we have GPUs with both 40 GB and 80 GB of memory.

To request GPUs with NVLink (only available in the gpuv100 queue), please add

#BSUB -R "select[sxm2]"

to your jobscript.

To request more GPUs, please modify the jobscript accordingly, e.g., for two GPUs use

#BSUB -n 8
#BSUB -gpu "num=2:mode=exclusive_process"