Requesting GPUs under MOAB / Torque

Accessing GPU nodes

The hpc queue currently has 4 GPU nodes, each equipped with 2 GPUs, for a total of 8 GPUs. The machines are n-62-16-17, n-62-16-18, n-62-16-29 and n-62-16-32. To request access to these nodes under MOAB/Torque, you can use the following example:

hpc-fe1: $ qsub -l nodes=1:ppn=2:gpus=1

The above example requests 1 node with 2 processors (cores) and 1 GPU per node (1 GPU in total).
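
The same request can also be embedded directly in a job script using #PBS directives, so that no options need to be given on the qsub command line. A minimal sketch (the walltime value is only an example):

#!/bin/sh
# request 1 node with 2 cores and 1 GPU, as in the example above
#PBS -l nodes=1:ppn=2:gpus=1
#PBS -l walltime=00:10:00
# show the GPU(s) visible on the allocated node
nvidia-smi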

You can also use the msub command, which accepts the same resource syntax:

hpc-fe1: $ msub -l nodes=4:gpus=1

In the above example, MOAB will allocate 4 nodes with 1 GPU each (4 GPUs in total).
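
Inside a running job you can check which nodes and GPUs were actually allocated by inspecting the standard Torque node and GPU files, for example:

# one line per allocated node/core
cat $PBS_NODEFILE
# one line per allocated GPU (see below for the format)
cat $PBS_GPUFILE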

Using the gpus flag makes it easier for both the user and MOAB/Torque to schedule jobs, and ensures that GPU jobs are only placed on nodes with GPUs available.

Since there might be more than 1 GPU per node, you should make sure to only access the GPUs MOAB has reserved for you. This information is stored in a file whose path is held in the $PBS_GPUFILE environment variable. The file contains one line per assigned GPU, of the form

n-62-16-17-gpu0
n-62-16-17-gpu1

i.e. the hostname and the device ID of the GPU you have been assigned. Please make sure that your application uses the right device IDs, e.g. devices 0 and 1 in the example above.
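
One way to make sure an application only uses the reserved GPU is to export CUDA_VISIBLE_DEVICES before launching it. A minimal sketch, assuming a single assigned GPU (my_gpu_application is just a placeholder for your own program):

# extract the device ID from the file reserved by MOAB (single-GPU case)
export CUDA_VISIBLE_DEVICES=$(cat $PBS_GPUFILE | rev | cut -d"-" -f1 | rev | tr -cd [:digit:])
# the application now sees only the reserved GPU (as device 0)
./my_gpu_application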

The following script demonstrates how to measure memory-copy bandwidth on the GPU assigned by MOAB, using the bandwidthTest example from the CUDA samples:

#!/bin/sh
# -- request 1 node with 2 cores and 1 GPU (matches the qsub example above) --
#PBS -l nodes=1:ppn=2:gpus=1
# -- run in the current working (submission) directory --
cd $PBS_O_WORKDIR

# The CUDA device reserved for you by the batch system
CUDADEV=`cat $PBS_GPUFILE | rev | cut -d"-" -f1 | rev | tr -cd [:digit:]`

# load the required modules
module load cuda/5.5

# get a private copy of the bandwidthTest CUDA sample
cp -rp /opt/cuda/5.5/samples/1_Utilities/bandwidthTest .
cd bandwidthTest

# adjust the Makefile so the sample builds outside the CUDA installation tree
sed -i -e 's|INCLUDES.*=.*|INCLUDES=-I$(CUDA_PATH)/samples/common/inc|' Makefile
sed -i -e 's|../../bin/|./bin/|' Makefile
make

# run the test on the GPU reserved for this job
./bandwidthTest --device=${CUDADEV}
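
Assuming the script is saved as, for example, bandwidthtest.sh (the file name is arbitrary), it can be submitted directly, since the resource request is embedded in the #PBS directive:

hpc-fe1: $ qsub bandwidthtest.sh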