Hyperion system¶
Job submission¶
Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.
QoS and partitions¶
Users can request a partition for each job they submit. These are the available partitions:
| Partition | Nodelist |
|---|---|
| general (D) | hyperion-[001-005,011-175,177-179,182,206,208-224,234-254,257,262-263,282-284] |
| preemption | hyperion-[176,180-181,183-205,207,225-233,255-256,260-261,264-281] |
| preemption-gpu | hyperion-[253,282-284] |

*(D) = Default partition
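To check the current state of these partitions and their nodes from a login node, you can use SLURM's sinfo command (standard SLURM client tools are assumed to be available in your shell):

$ sinfo --partition=general,preemption,preemption-gpu

This lists, for each partition, the node ranges shown above together with their current state (idle, mix, alloc, drain, ...).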
On Hyperion, we conceptualize a partition as a set of nodes to which we can associate a Quality of Service (QoS). The general partition encompasses all nodes for public or general use. The preemption partition includes all nodes exclusively designated for use by their respective ownership groups; it exhibits distinct behavior that is explained in the Special partition: the "preemption" partition section below. Finally, the preemption-gpu partition's behavior is explained in the Special partition: the "preemption-gpu" partition section.

With this in mind, users must select one of the following QoSs when submitting a job to the general and preemption partitions:
| QoS | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
|---|---|---|---|---|---|---|
| regular (D) | 200 | 1-00:00:00 | 60 | 180 | | |
| test | 1000 | 00:10:00 | 2 | 2 | 2 | |
| long | 200 | 2-00:00:00 | 25 | 40 | | |
| xlong | 200 | 8-00:00:00 | 20 | 20 | 200 | |
| serial | 200 | 2-00:00:00 | | 1000 | 2000 | CPUs=1 GPUs=0 nodes=1 |
The preemption-gpu partition contains its own QoS so it can be preempted by the QoSs on the general partition:

| QoS | Priority | MaxWall | MaxTRES |
|---|---|---|---|
| preemption-gpu (D) | 200 | 8-00:00:00 | CPUs=12, GPUs=3, nodes=1 |

*(D) = Default QoS
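As an illustration only, the header of a job on this partition could look like the following sketch, assuming GPUs are requested with the generic --gres=gpu:<count> syntax (the exact GRES name on Hyperion may differ):

#SBATCH --partition=preemption-gpu
#SBATCH --qos=preemption-gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4

Note that the preemption-gpu QoS caps each job at CPUs=12, GPUs=3 and a single node.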
Warning
A global MaxSubmitPU of 1000 jobs has been established: the sum of all jobs submitted through all QoSs cannot exceed 1000, with the exception of the serial QoS, where up to 2000 jobs can be submitted.
This is what each column means:

MaxWall
: Maximum amount of time the job is allowed to run. 1-00:00:00 reads as one day, i.e. 24 hours.

MaxNodesPU
: Maximum number of nodes a user's jobs can use at a given time.

MaxJobsPU
: Maximum number of running jobs per user.

MaxSubmitPU
: Maximum number of jobs that can be submitted to the QoS/partition.

MaxTRES
: Maximum amount of trackable resources.
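For example, a job that needs up to two days of walltime on the general partition would request the long QoS in its batch script header (the requested time here is only illustrative):

#SBATCH --partition=general
#SBATCH --qos=long
#SBATCH --time=2-00:00:00

Since general is the default partition and regular the default QoS, the --partition and --qos lines are only strictly needed when you deviate from those defaults.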
Tip
If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoSs/partitions can be temporarily created to match your purposes.
Special partition: the "preemption" partition¶
As mentioned before, this partition includes all the nodes belonging to specific groups, which are normally accessible only through partitions exclusive to those groups. Many of these nodes often remain in an idle state for extended periods. For this reason, these nodes are also made available to all cluster users through the preemption partition, with the following behavior:
- When a job is submitted to the preemption partition:
    - If the requested resources are available, the job will start running instantly.
    - If the requested resources are in use, the job will remain in a pending state, same as on the general partition.
- When a job is submitted to a private partition accessible only to the nodes' owners:
    - If the requested resources are available, the job will start running instantly.
    - If the requested resources are in use by jobs submitted to the preemption partition, those jobs will be canceled and requeued to preemption.
    - If the requested resources are in use by jobs submitted to this private partition, the job will remain in a pending state.
The procedure to submit jobs to the preemption partition is as follows:
#SBATCH --partition=preemption
#SBATCH --qos=<regular,serial,test,long or xlong>
Warning
We want to emphasize that jobs submitted to the preemption partition will be canceled and requeued if the nodes' owners require those nodes, restarting their execution from the first step if manual checkpoints are not configured.
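If your application can write restart files on demand, a common (application-dependent) pattern is to ask SLURM to deliver a signal to the batch shell shortly before the job ends and trap it in the script. The following is only a minimal sketch, assuming a hypothetical binary that checkpoints on SIGUSR1; --signal applies to the time limit, and whether a grace period is also granted on preemption depends on the cluster's configuration:

#!/bin/bash
#SBATCH --partition=preemption
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --requeue
#SBATCH --signal=B:USR1@300   # send SIGUSR1 to the batch shell 300 s before the time limit

module load program/program_version

# Run the application in the background so the batch shell can handle the signal
srun binary < input &
srun_pid=$!

# Relay the signal to the srun step; srun forwards it to the application tasks
trap 'kill -USR1 "$srun_pid"' USR1

# The first wait returns early if the trap fires; wait again so the
# application has time to finish writing its restart files
wait "$srun_pid"
wait "$srun_pid"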
Basic submission script for MPI applications:
Generic batch script for MPI-based applications
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48
module load program/program_version
srun binary < input
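As written, this example requests 8 nodes × 48 tasks per node = 384 MPI ranks with one core each, and 200 GB of memory per node. Assuming the script is saved as, say, job_mpi.sh (a name chosen here purely for illustration), it is submitted with:

$ sbatch job_mpi.sh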
Basic submission script for OpenMP applications:
For an OpenMP application the number of threads can be controlled by defining the OMP_NUM_THREADS environment variable or SLURM's --cpus-per-task job directive. If OMP_NUM_THREADS is not defined, the number of threads created will be equal to the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.
Generic batch script for OpenMP applications
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module load program/program_version
srun binary < input
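Since this script does not set OMP_NUM_THREADS, the application will spawn one thread per reserved core (48 in this example). If you prefer to pin the thread count to the reservation explicitly, you can export it before the srun line, as in the hybrid example below:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK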
Basic submission script for Hybrid (MPI+OpenMP) applications:
Generic batch script for Hybrid (MPI+OpenMP) jobs
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun binary < input
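As a quick sanity check of this layout: each node runs 12 MPI tasks × 4 OpenMP threads = 48 cores per node, and 2 nodes × 48 = 96 cores in total, so the task and thread geometry exactly fills the reserved cores.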
Node Selection in Hyperion¶
By default, computations on Hyperion will be directed to either Icelake, Cascadelake or Emerald Rapids nodes, but never a mix of them. To ensure your calculations run on a specific type of node, you can specify a constraint in your batch script:
#SBATCH --constraint=icelake
or
#SBATCH --constraint=cascadelake
or
#SBATCH --constraint=emerald
You can also specify the microarchitecture constraint as a command-line option:
$ sbatch --constraint=<microarchitecture>
This constraint is also applicable when using srun and salloc.
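For instance, an interactive allocation restricted to Icelake nodes could be requested with something like the following (the resources and QoS here are only illustrative):

$ salloc --constraint=icelake --qos=test --nodes=1 --ntasks=4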