We currently have 38 nodes with GPUs in our generally available LSF 10 setup.
The walltime is limited to 24 hours per job at the moment.
4 nodes with 2 x Tesla A100 PCIe 40 GB (owned by DTU Compute) – queuename: gpua100
6 nodes with 2 x Tesla A100 PCIe 80 GB (owned by DTU Compute) – queuename: gpua100
6 nodes with 2 x Tesla V100 16 GB (owned by DTU Compute & DTU Elektro) – queuename: gpuv100
8 nodes with 2 x Tesla V100 32 GB (owned by DTU Compute & DTU Environment & DTU MEK) – queuename: gpuv100
1 node with 2 x Tesla A10 PCIe 24 GB (owned by DTU Compute) – queuename: gpua10
1 node with 2 x Tesla A40 48 GB with NVlink (owned by DTU Compute) – queuename: gpua40
3 nodes with 4 x Tesla V100 32 GB with NVlink (owned by DTU Compute) – queuename: gpuv100
2 nodes with 4 x TitanX (Pascal) – queuename: gputitanxpascal (retired)
1 node with 4 x Tesla K80 – queuename: gpuk80 (retired)
1 node with 2 x Tesla K40 – queuename: gpuk40 (retired)
1 node with 2 x AMD Radeon Instinct MI50 16 GB GPUs – not on queue
1 node with 2 x AMD Radeon Instinct MI25 16 GB GPUs – queuename: gpuamd
To run code on the NVIDIA A100, please make sure to compile your code with CUDA 11.0 or newer.
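As a minimal sketch, assuming a CUDA module such as the cuda/11.6 module used in the jobscript below is available, you can load the toolkit and compile for the A100's compute capability (sm_80). The source file my_kernel.cu is a hypothetical placeholder:

module load cuda/11.6                         # assumed module name, matching the jobscript below
nvcc --version                                # verify that the loaded toolkit is CUDA 11.0 or newer
nvcc -arch=sm_80 my_kernel.cu -o my_kernel    # sm_80 targets the A100; my_kernel.cu is a placeholder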
1 interactive V100-node reachable via voltash
1 interactive V100-node with NVlink reachable via sxm2sh
1 interactive A100-node with NVlink reachable via a100sh
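For example, after logging in to a front-end node, an interactive session on the A100 node could look like this (a100sh is the command from the list above; nvidia-smi simply verifies that the GPUs are visible):

a100sh          # open an interactive shell on the interactive A100 node
nvidia-smi      # check which GPUs are visible in the session
exit            # leave the interactive node again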
Here is an example jobscript:
#!/bin/sh
### General options
### -- specify queue --
#BSUB -q gpuv100
### -- set the job Name --
#BSUB -J testjob
### -- ask for number of cores (default: 1) --
#BSUB -n 4
### -- Select the resources: 1 gpu in exclusive process mode --
#BSUB -gpu "num=1:mode=exclusive_process"
### -- set walltime limit: hh:mm -- maximum 24 hours for GPU-queues right now
#BSUB -W 1:00
# request 5GB of system-memory
#BSUB -R "rusage[mem=5GB]"
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion --
#BSUB -N
### -- Specify the output and error file. %J is the job-id --
### -- -o and -e mean append, -oo and -eo mean overwrite --
#BSUB -o gpu_%J.out
#BSUB -e gpu_%J.err
# -- end of LSF options --

nvidia-smi
# Load the cuda module
module load cuda/11.6

/appl/cuda/11.6.0/samples/bin/x86_64/linux/release/deviceQuery
Then submit with
bsub < jobscript.sh
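To follow the job afterwards, the standard LSF commands can be used, for example:

bjobs               # list your pending and running jobs
bjobs -l <jobid>    # detailed information for one job (replace <jobid> with the id printed by bsub)
bkill <jobid>       # cancel a job that is no longer needed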
To request a GPU with 32 GB of memory (only available in the gpuv100 queue), please add
#BSUB -R "select[gpu32gb]"
to your jobscript. In the gpua100 queue we have GPUs with 40 GB and with 80 GB of memory.
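If you need to distinguish between the two A100 memory sizes, the same select mechanism can in principle be used; the resource name below (gpu80gb) is an assumption by analogy with the gpu32gb selector above and should be checked against the cluster documentation:

#BSUB -R "select[gpu80gb]"    # assumed resource name, analogous to gpu32gb above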
To request GPUs with NVLINK (only available in the gpuv100 queue), please add
#BSUB -R "select[sxm2]"
to your jobscript.
To request more GPUs, please modify the jobscript accordingly, e.g., for two GPUs use
#BSUB -n 8
#BSUB -gpu "num=2:mode=exclusive_process"
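Putting the pieces together, the resource-related lines of a jobscript asking for two NVLINK-connected V100s (which, per the list above, all have 32 GB) would look like this; all options are taken directly from the examples above:

#BSUB -q gpuv100
#BSUB -n 8
#BSUB -gpu "num=2:mode=exclusive_process"
#BSUB -R "select[sxm2]"

Note that the core count requested with -n is scaled along with the number of GPUs, following the pattern of the examples above (4 cores per GPU).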