GPU Jobs
Submitting GPU Jobs
The current GPU's we have available in the cluster are Nvidia RTX 6000's. To request a gpu, you need to use both the qsub flag ngpus
(to request the desired number of gpus) and the qsub flag gpu_type
(to select the type of GPU). e.g.;
#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1:gpu_type=RTX6000
The value of ngpus
is per node. You can select up to 8 GPU's per node. If you request 4 nodes (select=4
) and 8 GPU's (ngpus=8
) that will equate to 32 GPU's. Please only request more than 1 GPU if you know your application supports this and you have made the necessary changes to utilise this. Within the context of the running job, the shell environment variable CUDA_VISIBLE_DEVICES
will be set automatically with indices of the allocated GPUs. Jobs must respect this setting, or they will interfere with other jobs co-located on the execution node.
Job Limits
All GPU jobs currently run via the gpu72 queue. You will find up to date job limits for this queue on the Job sizing guidance page.
AMD Rome or Intel Skylake
We have GPU nodes with both AMD Rome and Intel Skylake processors; the above example submission options will run on either the Intel Skylake or AMD Rome compute nodes. The following examples demonstrate how to force your job to only run on either the Intel Skylake or AMD Rome GPU nodes (see below for the node specification):
Intel Skylake example resource request
#PBS -l select=1:ncpus=4:mem=24gb:ngpus=1:gpu_type=RTX6000:cpu_type=skylake
AMD Rome example resource request
#PBS -l select=1:ncpus=16:mem=96gb:ngpus=1:gpu_type=RTX6000:cpu_type=rome
GPU Node Specification
Node type | Type of GPU | No. of GPUS per node | No. of CPUs | RAM |
---|---|---|---|---|
Intel Skylake | RTX6000 | 8 | 32 | 192GB |
AMD Rome | RTX6000 | 8 | 256 | 960GB |
GPU Specification
The details of the different GPU types are:
GPU Type | Single Precision TFLOPS |
Double Precision TFLOPS |
Memory GB |
Memory Bandwidth GB/s |
CUDA Compute Capability | GPU Architecture |
---|---|---|---|---|---|---|
RTX6000* | 16.3 | <1 | 24 | 670 | 7.5 | Turing |
* Note that there are two versions of the RTX 6000, one based on the Turing architecture and one based on the Ada architecture, the RTX 6000s deployed at Imperial are the earlier Turing based RTX 6000s.