Example job scripts - GPU jobs
The tabs below have example job submission scripts for different types of GPU job. Note that not all software can make use of GPUs and you should consult an application's documentation to check. The Software pages have example GPU job scripts for certain applications, such as GROMACS, Amber and Matlab.
Small GPU job
To improve capacity, some GPUs have been divided into multiple smaller GPUs. Jobs can request one-eighth of a GPU for testing and for work that does not need an entire GPU. This unit is the default GPU type for the cuda queue.
#!/bin/bash
# Example job requesting 1/8 GPU.

# Slurm resource requests:
#SBATCH -p cuda       # submit the job to the CUDA queue
#SBATCH -t 01:00:00   # time limit for job

# Uncomment one of the following two #SBATCH lines to allocate GPU resources.
# Scripts with neither line uncommented will be
# allocated one unit of the default type, i.e. 1 * 1/8 GPU.
##SBATCH --gres=gpu:1                     # simpler notation
##SBATCH --gres=gpu:h200_nvl_1g.18gb:1    # full notation, recommended

# CPU cores, memory and temporary disk space will be allocated automatically.
# Do not request them in this script.

# Commands to be run:
module load my_module
./my_gpu_program
If saved in a file called my_gpu_job.sh, this could be submitted to the queue with the command:
sbatch my_gpu_job.sh
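Because CPU cores, memory and temporary disk space are allocated automatically, it can help to check a script for such requests before submitting it. The following is a minimal sketch rather than a site-provided tool: the function name check_gpu_script is hypothetical, and the list of sbatch options it looks for is an assumption about what counts as a CPU, memory or disk request. It only inspects the first option on each active #SBATCH line.

```shell
#!/bin/bash
# Hypothetical pre-submission check (not part of Slurm or the cluster tooling).
# Prints any active #SBATCH line whose first option requests tasks, CPU cores,
# memory or temporary disk space, and returns 1 if one is found.
# Lines commented out with a second '#' (##SBATCH ...) are ignored.
check_gpu_script() {
    local script="$1"
    if grep -nE '^#SBATCH[[:space:]]+(-n|-c|--ntasks|--cpus-per-task|--mem|--tmp)' "$script"; then
        echo "Remove the lines above before submitting to the cuda queue." >&2
        return 1
    fi
    return 0
}
```

Under these assumptions, a job could be submitted only if the check passes, for example with: check_gpu_script my_gpu_job.sh && sbatch my_gpu_job.sh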
Large GPU job
This script is suitable for jobs that can make effective use of one or more whole GPUs.
#!/bin/bash
# Example job requesting a whole GPU

# Slurm resource requests:
#SBATCH -p cuda                  # submit the job to the CUDA queue
#SBATCH -t 01:00:00              # time limit for job
#SBATCH --gres=gpu:h200_nvl:1    # request a number of whole GPUs

# CPU cores, memory and temporary disk space will be allocated automatically.
# Do not request them in this script.

# Commands to be run:
module load my_module
./my_cuda_program
If saved in a file called my_gpu_job.sh, this could be submitted to the queue with the command:
sbatch my_gpu_job.sh
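When a job requests more than one whole GPU, it can be useful to confirm inside the job how many devices are visible. The sketch below assumes that Slurm exports CUDA_VISIBLE_DEVICES as a comma-separated list for the job, which is common for GPU allocations but depends on site configuration. The function name count_visible_gpus is hypothetical.

```shell
#!/bin/bash
# Count the entries in CUDA_VISIBLE_DEVICES (assumed to be set by Slurm).
# Prints 0 if the variable is unset or empty.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local ids=()
    if [ -z "$devices" ]; then
        echo 0
        return 0
    fi
    # Split the comma-separated device list into an array
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(count_visible_gpus)"
```

Placing these lines before the main program in a job script records the count in the job's output file.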
Multiple processes on one GPU
Jobs have exclusive access to the GPU resources they are allocated. However, some programs or problems can only make partial use of a GPU's capability. In this case, you may wish to run multiple copies of a program within the job, each of which accesses the GPU.
By default, NVIDIA GPUs only allow a single process to use the GPU at any one time, so running (say) 2 copies of a program will mean that each program will only be able to use the GPU for half the time on average. As both copies cannot run at the same time, they cannot make use of the capacity unused by the other process, and performance will remain disappointing.
The NVIDIA CUDA Multi-Process Service (MPS) can be used to work around this and allow multiple programs to use the GPU at the same time. It can be enabled by adding the following lines to your job script, ahead of launching the processes that use the GPU:
# start MPS (multi-process sharing of a GPU)
nvidia-cuda-mps-control -d
An example script would be:
#!/bin/bash
# Example job requesting a whole GPU and starting NVIDIA MPS

# Slurm resource requests:
#SBATCH -p cuda                  # submit the job to the CUDA queue
#SBATCH -t 01:00:00              # time limit for job
#SBATCH --gres=gpu:h200_nvl:1    # request a number of whole GPUs

# CPU cores, memory and temporary disk space will be allocated automatically.
# Do not request them in this script.

# Commands to be run:
module load my_module
nvidia-cuda-mps-control -d
./my_cuda_program
If saved in a file called my_gpu_job.sh, this could be submitted to the queue with the command:
sbatch my_gpu_job.sh
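To run several copies of a program at once within such a job, each copy can be started in the background and the script can then wait for all of them to finish. The following is a minimal sketch under that assumption. The helper name run_copies is hypothetical, and ./my_cuda_program is the placeholder program from the example script.

```shell
#!/bin/bash
# Sketch: run several copies of a command concurrently and wait for them all.
# Usage: run_copies <count> <command> [args...]
# Returns the number of copies that exited with a non-zero status.
run_copies() {
    local count="$1"; shift
    local pids=() failed=0 i pid
    for (( i = 0; i < count; i++ )); do
        "$@" &               # launch one copy in the background
        pids+=("$!")         # remember its process ID
    done
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$(( failed + 1 ))
    done
    return "$failed"
}

# In a job script, after the line `nvidia-cuda-mps-control -d`:
#   run_copies 2 ./my_cuda_program
#   echo quit | nvidia-cuda-mps-control   # stop the MPS daemon when finished
```

Waiting on each process ID separately lets the script notice if any copy fails, which a bare `wait` would not report.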
MPI jobs and GPUs
Applications that use MPI (Message Passing Interface) for parallelisation typically use sbatch options to select the number of processes (#SBATCH -n) and number of CPU cores per process (#SBATCH -c).
The cuda queue rejects these options because it provides a fixed CPU resource allocation based on the GPU request. We are therefore trialling an updated version of the mpirun command, which includes a new option "--arc-par". This option specifies how many processes per available GPU or CPU core should be launched within the job's CPU allocation. The general form of the command to launch the MPI program my_mpi_program is:
mpirun --arc-par <par_option> ./my_mpi_program
Here <par_option> typically takes the form Nppg (e.g. 1ppg, 2ppg) to launch N MPI processes (ranks) per GPU.
For example, for a job submitted to the cuda queue and allocated two whole H200 GPUs, the following line in the job script would launch 8 MPI processes (4 for each of the 2 GPUs):
mpirun.test --arc-par 4ppg ./my_mpi_program
However, if several MPI processes attempt to use the same GPU, performance will be disappointing unless the job launches NVIDIA MPS first, as in the example below. The "Multiple processes on one GPU" tab has further information on MPS.
Example job script:
#!/bin/bash
# Example job requesting 2 whole GPUs
# and running 4 MPI processes/ranks per GPU

# Slurm resource requests:
#SBATCH -p cuda                  # submit the job to the CUDA queue
#SBATCH -t 01:00:00              # time limit for job
#SBATCH --gres=gpu:h200_nvl:2    # request a number of whole GPUs

# CPU cores, memory and temporary disk space will be allocated automatically.
# Do not request them in this script.

# Launch NVIDIA MPS so that multiple MPI processes/ranks can
# access each GPU at once
nvidia-cuda-mps-control -d

# Other commands to be run:
module load my_module
mpirun --arc-par 4ppg ./my_cuda_program
If saved in a file called my_gpu_job.sh, this could be submitted to the queue with the command:
sbatch my_gpu_job.sh
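The script above requests two GPUs with 4ppg, so 8 ranks are expected in total. The arithmetic can be sketched as below. This is for illustration only: the function total_ranks is hypothetical and is not part of mpirun, which decides the real process count itself.

```shell
#!/bin/bash
# Illustrative arithmetic only: total ranks = N (from Nppg) * number of GPUs.
# Usage: total_ranks <par_option> <gpus>, e.g. total_ranks 4ppg 2
total_ranks() {
    local par_option="$1" gpus="$2"
    # Accept only the Nppg form described above
    if [[ "$par_option" =~ ^([0-9]+)ppg$ ]]; then
        # 10# forces base 10 so a leading zero is not read as octal
        echo $(( 10#${BASH_REMATCH[1]} * gpus ))
    else
        echo "expected the form Nppg, got: $par_option" >&2
        return 1
    fi
}

total_ranks 4ppg 2   # prints 8
```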