# Hyperion system

## Job submission
Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.
### QoS and partitions
Users can request a partition for each job they submit. These are the available partitions:
| Partition | Nodelist |
|---|---|
| general (D) | hyperion-[001-005,011-175,177-179,182,206,208-224,234-254,257,262-263] |
| preemption | hyperion-[176,180-181,183-205,207,225-233,255-256,260-261] |
*(D) = Default partition
On Hyperion, a partition is conceived as a set of nodes to which a Quality of Service (QoS) can be associated. As such, there are only two partitions: the `general` partition, which encompasses all nodes for public or general use, and the `preemption` partition, which exhibits distinct behavior that is elaborated on below. The `preemption` partition includes all nodes exclusively designated for use by their respective owner groups.

With that in mind, users must select one of the following QoSs when submitting a job:
| QoS | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
|---|---|---|---|---|---|---|
| regular (D) | 200 | 1-00:00:00 | 100 | 300 | | |
| test | 1000 | 00:10:00 | 2 | 2 | 2 | |
| long | 200 | 2-00:00:00 | 24 | 20 | | |
| xlong | 200 | 8-00:00:00 | 20 | 10 | | |
| serial | 200 | 2-00:00:00 | | | 500 | cpu=1 gpu=1 node=1 |
*(D) = Default QoS
This is what each column means:

- **MaxWall**: Maximum amount of time the job is allowed to run. `1-00:00:00` reads as one day, i.e., 24 hours.
- **MaxNodesPU**: Maximum number of nodes a user's jobs can use at a given time.
- **MaxJobsPU**: Maximum number of running jobs per user.
- **MaxSubmitPU**: Maximum number of jobs a user can submit to the QoS/partition.
- **MaxTRES**: Maximum amount of trackable resources (TRES) per job.
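For instance, a short job can be sent to the `test` QoS by adding the corresponding directive to the batch script; a minimal sketch (job name and resources are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=test            # high-priority QoS, limited to 10 minutes
#SBATCH --job-name=qos_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

srun hostname
```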
Tip
If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoS/partitions can be created temporarily to match your needs.
### Special partition: the "preemption" partition
As mentioned before, this partition includes all the nodes belonging to specific groups, normally accessible only through partitions exclusive to those groups. Many of these nodes remain idle for extended periods. For this reason, these nodes are also made available to all cluster users through the `preemption` partition, which behaves as follows:
- When a job is submitted to the `preemption` partition:
    - If the requested resources are available, the job will start running immediately.
    - If the requested resources are in use, the job will remain in a pending state, just as on the `general` partition.
- When a job is submitted to a private partition accessible only to the nodes' owners:
    - If the requested resources are available, the job will start running immediately.
    - If the requested resources are in use by jobs submitted to the `preemption` partition, those jobs will be canceled and requeued to `preemption`.
    - If the requested resources are in use by jobs submitted to the private partition itself, the job will remain in a pending state.
The procedure to submit jobs to the `preemption` partition is as follows:

```bash
#SBATCH --partition=preemption
#SBATCH --qos=<regular|serial|test|long|xlong>
```
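Putting it together, a minimal sketch of a preemption job (job name and resources are placeholders; `--requeue` is the standard SLURM flag that lets a preempted job be requeued automatically, included here as a suggestion):

```bash
#!/bin/bash
#SBATCH --partition=preemption
#SBATCH --qos=regular
#SBATCH --job-name=preempt_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --requeue             # allow SLURM to requeue the job if preempted

module load program/program_version

srun binary < input
```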
Warning
We want to emphasize that jobs submitted to the `preemption` partition will be canceled and requeued whenever the nodes' owners require the resources, restarting their execution from the first step unless manual checkpoints are configured.
Basic submission script for MPI applications:
Generic batch script for MPI based applications
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load program/program_version

srun binary < input
```
Basic submission script for OpenMP applications:
For an OpenMP application, the number of threads can be controlled by defining the `OMP_NUM_THREADS` environment variable or SLURM's `--cpus-per-task` job directive. If `OMP_NUM_THREADS` is not defined, the number of threads created will equal the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.
Generic batch script for OpenMP applications
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input
```
Basic submission script for Hybrid (MPI+OpenMP) applications:
Generic batch script for Hybrid (MPI+OpenMP) jobs
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun binary < input
```
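As a quick sanity check of the hybrid layout above, the number of MPI ranks per node times the OpenMP threads per rank should not exceed the physical cores of a node (48, matching the `--ntasks-per-node=48` used in the MPI script). The values below mirror the script's directives:

```shell
# Values taken from the hybrid batch script above
ntasks_per_node=12   # --ntasks-per-node (MPI ranks per node)
cpus_per_task=4      # --cpus-per-task (OpenMP threads per rank)

# Cores used on each node: 12 x 4 = 48, a fully packed node
echo $(( ntasks_per_node * cpus_per_task ))
```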
### Using GPU accelerators
Basic submission script for GPGPU capable applications:
Generic batch script requesting 1 GPU
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:rtx3090:1
#SBATCH --mem=90gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input
```
You can request up to one NVIDIA RTX A6000, two NVIDIA RTX 3090, or eight NVIDIA A100 GPUs per node. To do so, adjust the corresponding line in the batch script:

```bash
#SBATCH --gres=gpu:rtx3090:2
```

or

```bash
#SBATCH --gres=gpu:a100:8
```
These are the GPU types available on Hyperion:
| Compute Node | GPU | How to request the GPU with SLURM |
|---|---|---|
| hyperion-[030,052,074,096,118,140,162,182] | 2x NVIDIA RTX 3090 24GB | `#SBATCH --gres=gpu:rtx3090:2` |
| hyperion-[253,255-256] | 8x NVIDIA A100 SXM4 80GB | `#SBATCH --gres=gpu:a100:8`<br>`#SBATCH --constraint=a100-sxm4` |
| hyperion-[252,254,257] | 8x NVIDIA A100 PCIe 80GB | `#SBATCH --gres=gpu:a100:8`<br>`#SBATCH --constraint=a100-pcie` |
| hyperion-263 | 1x NVIDIA RTX A6000 48GB | `#SBATCH --gres=gpu:a6000:1` |
As shown in the table above, a specific type of NVIDIA A100 GPU can be selected by adding the corresponding `--constraint` to the submission script. If no `--constraint` is specified, either type of A100 may be assigned to the job.
### Node Selection in Hyperion
By default, computations on Hyperion will be directed to either Icelake nodes or Cascadelake nodes, but never a mix of both. To ensure your calculations run on a specific type of node, you can specify a constraint in your batch script:
```bash
#SBATCH --constraint=icelake
```

or

```bash
#SBATCH --constraint=cascadelake
```
You can also specify the microarchitecture constraint as a command-line option:
```bash
$ sbatch --constraint=<microarchitecture>
```

This constraint is also applicable when using `srun` and `salloc`.
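For instance, an interactive session restricted to Icelake nodes could be requested as follows (the resource values are placeholders):

```bash
$ salloc --constraint=icelake --nodes=1 --ntasks=1
```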
### GPU and Node Allocation

When you request GPU resources through the `--gres` option in your SLURM script, you are allocated a node with the specified type of GPU. The current configuration is as follows:
- Nodes equipped with NVIDIA RTX 3090 GPUs have Cascadelake microarchitecture processors.
- Nodes equipped with NVIDIA RTX A6000 GPUs have Icelake microarchitecture processors.
- Nodes equipped with NVIDIA A100 GPUs have Icelake microarchitecture processors.
Specific features have been defined for the GPU nodes: `gpu-cascadelake` and `gpu-icelake`. However, specifying these features is not currently necessary, since the `--gres` and `--constraint` options already provide the information needed to allocate nodes by GPU type and microarchitecture.