Requesting GPUs under MOAB / Torque


Accessing GPU nodes

The current hpc queue has 4 GPU nodes, each equipped with 2 GPUs, giving a total of 8 GPUs. The machines are n-62-16-17, n-62-16-18, n-62-16-29 and n-62-16-32. To request access to these nodes under MOAB/Torque, you can use the following example:

hpc-fe1: $ qsub -l nodes=1:ppn=2:gpus=1 my_script.sh

The above example requests 1 node with 2 processors (cores) and 1 GPU per node (1 GPU in total).
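
The same request can also be placed inside the job script itself as a PBS directive, so it does not have to be repeated on the command line. A minimal sketch, assuming the standard Torque directive syntax:

#!/bin/sh
# request 1 node with 2 cores and 1 GPU (same as the qsub example above)
#PBS -l nodes=1:ppn=2:gpus=1

# ... the commands of your job follow here ...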

You can also use the msub command to request the same resources:

hpc-fe1: $ msub -l nodes=4:gpus=1 my_script.sh

In the above example, MOAB will allocate 4 nodes with 1 GPU each (4 GPUs in total).
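
For testing and debugging it can be convenient to work on a GPU node interactively. A small sketch, assuming the standard Torque flag -I for interactive jobs:

hpc-fe1: $ qsub -I -l nodes=1:ppn=2:gpus=1

This drops you into a shell on the allocated node, where you can run your GPU application by hand.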

Using the gpus flag lets MOAB/Torque keep track of GPU availability itself, so you do not have to request specific hostnames, and GPU jobs are only placed on nodes that still have free GPUs.

Since there might be more than 1 GPU per node, you should make sure to only access the GPUs that MOAB has reserved for you. This information is stored in a file whose path is given by the $PBS_GPUFILE variable. The file contains one line per assigned GPU, of the form

hostname1-gpu0
hostname2-gpu1
...

i.e. the hostname and the device ID of the GPU you have been assigned. Please make sure that your application uses the right device ID, e.g. devices 0 and 1 in the example above.
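
A convenient way to honour the assignment is to hide all other devices from the CUDA runtime. The following is a minimal sketch, not part of the standard setup, which collects the device IDs reserved on the current host from $PBS_GPUFILE and exports them via the standard CUDA_VISIBLE_DEVICES environment variable (it assumes the hostname command returns the same name that appears in the file):

# sketch: restrict the CUDA runtime to the GPUs reserved on this host
HOST=`hostname`
# keep the lines for this host, strip everything up to "gpu",
# and join the remaining device IDs with commas
DEVS=`grep "^${HOST}-gpu" $PBS_GPUFILE | sed 's/.*-gpu//' | paste -s -d, -`
export CUDA_VISIBLE_DEVICES=$DEVS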

The following script demonstrates how to measure memcopy bandwidth on the GPU assigned by MOAB:


#!/bin/sh

# -- run in the current working (submission) directory --
if test X$PBS_ENVIRONMENT = XPBS_BATCH; then cd $PBS_O_WORKDIR; fi

# The CUDA device reserved for you by the batch system
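# (i.e. the last "-"-separated field of each line, with only the digits kept)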
CUDADEV=`cat $PBS_GPUFILE | rev | cut -d"-" -f1 | rev | tr -cd '[:digit:]'`

# load the required modules
module load cuda/5.5

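# copy the bandwidthTest CUDA sample, adapt its Makefile to build
# in the local copy, and compile it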
cp -rp /opt/cuda/5.5/samples/1_Utilities/bandwidthTest .
cd bandwidthTest
sed -i -e 's|INCLUDES.*=.*|INCLUDES=-I$(CUDA_PATH)/samples/common/inc|' Makefile
sed -i -e 's|../../bin/|./bin/|' Makefile
make
./bandwidthTest --device=${CUDADEV}
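
The script can be submitted like the earlier examples, e.g. (assuming it was saved as bandwidthtest.sh):

hpc-fe1: $ qsub -l nodes=1:ppn=2:gpus=1 bandwidthtest.sh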