GPU nodes


Available GPUs

The following NVIDIA GPUs are currently available as part of the DCC managed HPC clusters:

# GPUs  Name                 Year  Architecture         CUDA cap.  CUDA cores  Clock MHz    Mem GiB  SP peak GFlops  DP peak GFlops  Peak GB/s
8       Tesla M2050          2012  GF100 (Fermi)        2.0        448         575          2.62     1030            515             148.4
6       Tesla M2070Q         2012  GF100 (Fermi)        2.0        448         575          5.25     1030            515             150.3
2*      GeForce GTX 680      2012  GK104-400 (Kepler)   3.0        1536        1058         1.95     3090            128             192.2
3       Tesla K20c           2013  GK110 (Kepler)       3.5        2496        745          4.63     3524            1175            208
5       Tesla K40c           2013  GK110B (Kepler)      3.5        2880        745 / 875    11.17    4291 / 5040     1430 / 1680     288
8       Tesla K80c (dual)    2014  GK210 (Kepler)       3.7        2496        562 / 875    11.17    2796 / 4368     932 / 1456      240
1*      GeForce GTX TITAN X  2015  GM200-400 (Maxwell)  5.2        3072        1076         11.92    6144            192             336
8*      TITAN X              2016  GP102 (Pascal)       6.1        3584        1417 / 1531  11.90    10157 / 10974   317.4 / 342.9   480

*Please note that the NVIDIA consumer GPUs (GeForce GTX 680, GeForce GTX TITAN X, and TITAN X) do not support ECC.

In addition, we have 1 Xeon-Phi node with 2×Intel Xeon Phi 5110P accelerators (60 cores, 8 GB memory), which can be used for testing purposes.


Running interactively on GPUs

There are currently two nodes available for running interactive jobs on NVIDIA GPUs.

Node n-62-17-44 is installed with 2×NVIDIA Tesla M2070Q, which are based on the Fermi architecture (same as NVIDIA Tesla M2050).

To run interactively on this node, you can use the following command:

hpclogin1: $ gpush

This command executes a bash script that submits an interactive job to the gpushqueue queue.
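As an illustration, such a wrapper can be little more than a one-line submission command. This is a hypothetical sketch, not the actual contents of gpush; in particular the resource list is an assumption:

```shell
# Hypothetical sketch of a gpush-style wrapper (the resource list used
# by the real script may differ): submit an interactive (-I) job to the
# gpushqueue queue, asking for one node, one core and one GPU.
qsub -I -q gpushqueue -l nodes=1:ppn=1:gpus=1
```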

Node n-62-18-47 is installed with 1×NVIDIA GeForce GTX TITAN X, 2×NVIDIA Tesla K20c, and 1×NVIDIA Tesla K40c, all based on the Kepler architecture (same as NVIDIA Tesla K80c and NVIDIA GeForce GTX 680).

To run interactively on this node, you can use the following command:

hpclogin1: $ k40sh

This command executes a bash script that submits an interactive job to the k40_interactive queue.

Please note that multiple users are allowed on these nodes, and all users are able to access all the GPUs on the node. We have set the GPUs to the “Exclusive process” runtime mode, which means that you will encounter a “device not available” (or similar) error if someone else is already using the GPU you are trying to access.

In order to avoid too many conflicts we ask you to follow this code-of-conduct:

  • Please monitor which GPUs are currently occupied using the command nvidia-smi and predominantly select unoccupied GPUs (e.g., using cudaSetDevice()) for your application.
  • If you need to run on all CPU cores, e.g., for performance profiling, please make sure that you are not disturbing other users.
  • We kindly ask you to use the interactive nodes mainly for development, profiling, and short test jobs.
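As a sketch of the first point, the occupancy check can be scripted. On a real node the list of all device IDs could come from `nvidia-smi --query-gpu=index --format=csv,noheader`, and the list of occupied devices from inspecting the per-GPU process list in the nvidia-smi output; exporting CUDA_VISIBLE_DEVICES is one alternative to calling cudaSetDevice() in the application. The selection logic below is factored into a function so it can be tried with canned data; it is one possible approach, not an official tool:

```shell
#!/bin/sh
# Sketch: pick the first GPU that is not in the "busy" list.
# On a real node the two lists would be derived from nvidia-smi output;
# here they are passed in explicitly for demonstration.

first_free_gpu() {
    all="$1"    # whitespace-separated list of all device IDs
    busy="$2"   # whitespace-separated list of occupied device IDs
    for id in $all; do
        case " $busy " in
            *" $id "*) ;;              # occupied, skip it
            *) echo "$id"; return 0 ;; # first free device found
        esac
    done
    return 1                           # no free device available
}

# Example: devices 0 and 2 are busy, so device 1 is selected
CUDA_VISIBLE_DEVICES=$(first_free_gpu "0 1 2 3" "0 2")
export CUDA_VISIBLE_DEVICES
echo "$CUDA_VISIBLE_DEVICES"    # prints 1
```

With CUDA_VISIBLE_DEVICES set this way, a CUDA application sees only the chosen GPU as device 0, which avoids hard-coding device IDs in the program itself.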

If you have further questions or issues using the GPUs please write to


Requesting GPUs under MOAB / Torque

The current hpc queue has 4 GPU nodes, each installed with 2×NVIDIA Tesla M2050, giving a total of 8 GPUs. The machines are n-62-16-17, n-62-16-18, n-62-16-29 and n-62-16-32.

Similarly the visual queue has 2 nodes (n-62-17-41 and n-62-17-42) installed with 2×NVIDIA Tesla M2070Q each.

To request access to these nodes using MOAB/Torque, you can use the following example:

hpclogin1: $ qsub -l nodes=1:ppn=2:gpus=1

The above example requests 1 node with 2 processors (cores) and 1 GPU per node (1 GPU in total).

You can also use the msub command, which accepts the same resource syntax:

hpclogin1: $ msub -l nodes=4:gpus=1

In the above example, MOAB will allocate 4 nodes with 1 GPU each (4 GPUs in total).

Using the gpus flag makes scheduling easier for both the user and MOAB/Torque: the scheduler knows which GPUs a job needs and can place jobs on nodes where those GPUs are actually available.

Since there might be more than 1 GPU per node, you should make sure to only access the GPUs MOAB has reserved for you. This information is stored in a file whose path is held in the $PBS_GPUFILE variable. The file contains one line per assigned GPU, of the form

n-62-16-17-gpu0
n-62-16-17-gpu1

i.e. the hostname followed by the device ID of the GPU you have been assigned. Please make sure that your application uses the right device IDs, e.g. devices 0 and 1 in the example above.
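The device IDs can be extracted from $PBS_GPUFILE with standard shell tools. Below is a small sketch that uses a hand-written file of the form `<hostname>-gpu<id>` (consistent with the parsing in the bandwidth script below) in place of the file the batch system actually writes:

```shell
#!/bin/sh
# Sketch: extract the numeric device IDs from a $PBS_GPUFILE-style file,
# whose lines look like "<hostname>-gpu<id>".  The temp file created here
# is a stand-in for the file MOAB/Torque actually provides.
PBS_GPUFILE=$(mktemp)
printf '%s\n' "n-62-16-17-gpu0" "n-62-16-17-gpu1" > "$PBS_GPUFILE"

# keep only the digits after the final "-gpu" of each line
gpu_ids=$(sed 's/.*-gpu//' "$PBS_GPUFILE")
echo $gpu_ids    # -> "0 1"

rm -f "$PBS_GPUFILE"
```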

The following script demonstrates how to measure memcopy bandwidth on the GPU assigned by MOAB:


#!/bin/sh

# -- run in the current working (submission) directory --
cd $PBS_O_WORKDIR

# The CUDA device reserved for you by the batch system
CUDADEV=`cat $PBS_GPUFILE | rev | cut -d"-" -f1 | rev | tr -cd [:digit:]`

# load the required modules
module load cuda/5.5

# copy the bandwidthTest sample, fix up its Makefile and build it
cp -rp /opt/cuda/5.5/samples/1_Utilities/bandwidthTest .
cd bandwidthTest
sed -i -e 's|INCLUDES.*=.*|INCLUDES=-I$(CUDA_PATH)/samples/common/inc|' Makefile
sed -i -e 's|../../bin/|./bin/|' Makefile
make

# run the test on the GPU assigned by MOAB
./bandwidthTest --device=${CUDADEV}


If you have further questions or issues using the GPUs please write to


Requesting GPUs under LSF

We currently have two nodes with Kepler GPUs available for computation, managed by the LSF scheduler. Node n-62-18-49 has four GPUs (4×NVIDIA Tesla K40c) and node n-62-24-17 has eight GPUs (4×NVIDIA Tesla K80c, each of which is a dual-GPU card). Each node has its own separate LSF queue, gpuqueuek40 and gpuqueuek80, respectively.

We also have two nodes (n-62-30-9 and n-62-30-10) with Pascal GPUs (both nodes with 4×NVIDIA TITAN X) available for computation on the queue gpuqueuetitanx. Please note that the Titan cards are not efficient for calculations in double precision and do not support ECC.

To use these nodes, you first need to access the LSF part of the cluster by logging into

An example script for using the K80 GPU follows

### General options
### –- specify queue --
#BSUB -q gpuqueuek80
### -- set the job Name --
#BSUB -J gpujob
### -- ask for number of cores (default: 1) --
#BSUB -n 2
### -- Select the resources: 2 gpus in exclusive process mode --
#BSUB -R "rusage[ngpus_excl_p=2]"
### -- set walltime limit: hh:mm --
#BSUB -W 16:00
### -- set the email address --
# please uncomment the following line and put in your e-mail address,
# if you want to receive e-mail notifications on a non-default address
##BSUB -u your_email_address
### -- send notification at start --
#BSUB -B
### -- send notification at completion --
#BSUB -N
### -- Specify the output and error file. %J is the job-id --
### -- -o and -e mean append, -oo and -eo mean overwrite --
#BSUB -o gpu_%J.out
#BSUB -e gpu_%J.err
### -- small workaround -- no comment ;)
#BSUB -L /bin/bash
# -- end of LSF options -- 

# Load the cuda module 
module load cuda/7.5 

# here follow the commands you want to execute 
/appl/cuda/7.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS --device=2

For an explanation of the general BSUB options, refer to this page. To submit an LSF job, you have to redirect the job script into bsub:

bsub < jobscript.sh

where jobscript.sh is the file containing the #BSUB options shown above.

The special options for the GPU usage are mainly two:

#BSUB -q gpuqueuek80
#BSUB -R "rusage[ngpus_excl_p=2]"

The first line selects the queue with the K80 accelerators. The second line requests the GPU resources, in this specific case 2 GPUs in exclusive-process mode.
Then you need to load the CUDA runtime environment

module load cuda/7.5

and finally you can add the command for your specific program. Just replace the line

/appl/cuda/7.5/samples/bin/x86_64/linux/release/matrixMulCUBLAS --device=2

with your command line.
If you want to be sure that your request was correct, you can check which CUDA devices are visible to your program, e.g. by running the deviceQuery sample from the CUDA samples directory before your own command.

If you have further questions or issues using the GPUs please write to