GPU Jobs

Submitting GPU Jobs

To see which GPUs are currently available in the cluster, see the GPU Node Specification section below. To request a GPU, add the ngpus resource to your select statement, set to the number of GPUs you need:

#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1

The value of ngpus is per node. You can select up to 8 GPUs per node. If you request 4 nodes (select=4) with 8 GPUs each (ngpus=8), the job receives 32 GPUs in total.
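The arithmetic above can be checked with a short shell sketch. The function name total_gpus is hypothetical and exists only for illustration; it multiplies the number of nodes (select) by the GPUs per node (ngpus).

```shell
#!/bin/bash
# Hypothetical helper: total GPUs for a job = select (nodes) x ngpus (GPUs per node).
total_gpus() {
    local nodes="$1"   # number of nodes requested (select=N)
    local ngpus="$2"   # GPUs requested per node (ngpus=M)
    echo $(( nodes * ngpus ))
}

# Example from the text: 4 nodes with 8 GPUs each.
total_gpus 4 8   # prints 32
```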

Warning

Only request more than 1 GPU if you know that your application supports multiple GPUs and you have made the changes needed to use them. Some applications need extra steps before they can use more than 1 GPU!

Within a running job, the shell environment variable CUDA_VISIBLE_DEVICES is set automatically to the UUIDs of the allocated GPUs. Our systems use cgroups, so your job can only see the resources allocated to it, including GPUs.
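As a minimal sketch of how a job might confirm its allocation, the script below prints CUDA_VISIBLE_DEVICES and counts the entries in it. The walltime value and the function name count_visible_gpus are assumptions made for illustration, and the script assumes the variable holds a comma-separated list. The nvidia-smi -L call is guarded so the script still runs where the tool is not installed.

```shell
#!/bin/bash
#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1
#PBS -l walltime=01:00:00
# The walltime above is an assumed value; set it to suit your job.

# Hypothetical helper: count the entries in a comma-separated device list.
count_visible_gpus() {
    local devices="${1:-}"
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # The number of entries is the number of commas plus one.
    local commas
    commas=$(printf '%s' "$devices" | tr -cd ',' | wc -c)
    echo $(( commas + 1 ))
}

echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
echo "GPUs visible to this job: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"

# List the GPUs the job can see, if nvidia-smi is on the PATH.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L
fi
```

If the count differs from the ngpus value you requested, check the select statement before assuming the application is at fault.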

Job Limits

All GPU jobs currently run via the gpu72 queue. You will find up-to-date job limits for this queue on the Job sizing guidance page.

Requesting Specific GPU types

To request a specific GPU type, add the gpu_type resource to your select statement.

Three GPU types are available in the batch queue (L40S, RTX6000, and A100):

#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1:gpu_type=L40S
or
#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1:gpu_type=RTX6000
or
#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1:gpu_type=A100

The default is the L40S. The RTX6000 can be a good choice if your work doesn't require much GPU compute, as these cards are normally in less demand. We only have a few A100 cards, and they have only 40 GB of VRAM; they are normally used by those who need the A100's double-precision performance but not a large amount of VRAM.
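The choice of GPU type can also be sketched as a small shell helper. The function gpu_select and the script name job.sh are hypothetical; the helper only accepts the three gpu_type values listed above and reuses the ncpus and mem values from the examples. The qsub command is printed rather than run, so the sketch also works away from the cluster.

```shell
#!/bin/bash
# Hypothetical helper: build a one-node select statement for a given GPU type.
# Only the three gpu_type values named in the text are accepted.
gpu_select() {
    local gpu_type="$1"
    local ngpus="${2:-1}"   # defaults to 1 GPU
    case "$gpu_type" in
        L40S|RTX6000|A100)
            echo "select=1:ncpus=4:mem=24gb:ngpus=${ngpus}:gpu_type=${gpu_type}"
            ;;
        *)
            echo "unknown gpu_type: ${gpu_type}" >&2
            return 1
            ;;
    esac
}

# Print the submission command for a light workload on an RTX6000.
# job.sh is a placeholder script name.
echo "qsub -l $(gpu_select RTX6000) job.sh"
```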

GPU Node Specification

| | L40S PCIe 48 GB | A100 PCIe 40 GB | A40 PCIe 48 GB | RTX6000 PCIe 24 GB |
| --- | --- | --- | --- | --- |
| FP64 Double Precision (Tensor) TFLOPS | N/A | 19.5 | N/A | N/A |
| TF32 Single Precision (Tensor) TFLOPS | 183 | 156 | 74.8 | N/A |
| FP64 Double Precision TFLOPS | N/A | 9.7 | N/A | <1 |
| FP32 Single Precision TFLOPS | 91.6 | 19.5 | 37.4 | 16.3 |
| Memory | 48 GB GDDR6 | 40 GB HBM2 | 48 GB GDDR6 | 24 GB GDDR6 |
| Memory Bandwidth GB/s | 864 | 1,555 | 696 | 672 |
| CUDA Compute Capability | 8.9 | 8.0 | 8.6 | 7.5 |
| GPU Architecture | Ada Lovelace | Ampere | Ampere | Turing |
| Available in batch queue | Yes | Yes* | No | No |
| Available in JupyterHub | No | No | Yes | Yes |

* Only a few A100s are available in the batch queue. Request them only if you are sure you need them; otherwise you may queue for a long time.
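One practical use of the CUDA Compute Capability row is choosing a build target when compiling CUDA code for a particular card. The sketch below copies the values from the specification table and converts them to the sm_XY form that nvcc accepts in its -arch option. The function names compute_capability and sm_arch are hypothetical, and this is an illustration rather than a recommended build setup.

```shell
#!/bin/bash
# Compute capability per GPU model, copied from the specification table.
compute_capability() {
    case "$1" in
        L40S)    echo "8.9" ;;
        A100)    echo "8.0" ;;
        A40)     echo "8.6" ;;
        RTX6000) echo "7.5" ;;
        *)       echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

# Convert a compute capability such as 8.9 into an nvcc target such as sm_89.
sm_arch() {
    echo "sm_$(printf '%s' "$1" | tr -d '.')"
}

# Example: the architecture flag for code built to run on the L40S.
echo "-arch=$(sm_arch "$(compute_capability L40S)")"   # prints -arch=sm_89
```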