# Hyperion system

## Job submission
Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.
### QoS and partitions
Users can request a partition for each job they submit. These are the available partitions:
| Partition | Nodelist |
|---|---|
| general (D) | hyperion-[001-005,011-175,177-179,182,206,208-224,234-254,257,262-263] |
| preemption | hyperion-[176,180-181,183-205,207,225-233,255-256,260-261] |
*(D) = Default partition
On Hyperion, a partition is conceived as a set of nodes to which a Quality of Service (QoS) can be associated. As such, there are only two partitions: the `general` partition, which encompasses all nodes for public or general use, and the `preemption` partition, which exhibits distinct behavior that is elaborated on below. The `preemption` partition includes all nodes exclusively designated for use by their respective owner groups.

With that in mind, users must select one of the following QoSs when submitting a job:
| QoS | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
|---|---|---|---|---|---|---|
| regular (D) | 200 | 1-00:00:00 | 100 | 300 | | |
| test | 1000 | 00:10:00 | 2 | 2 | 2 | |
| long | 200 | 2-00:00:00 | 24 | 20 | | |
| xlong | 200 | 8-00:00:00 | 20 | 10 | | |
| serial | 200 | 2-00:00:00 | | | 500 | cpu=1 gpu=1 node=1 |
*(D) = Default QoS
This is what each column means:

- **MaxWall**: Maximum amount of time the job is allowed to run. `1-00:00:00` reads as one day, i.e., 24 hours.
- **MaxNodesPU**: Maximum number of nodes a user's jobs can use at a given time.
- **MaxJobsPU**: Maximum number of running jobs per user.
- **MaxSubmitPU**: Maximum number of jobs a user can submit to the QoS/partition.
- **MaxTRES**: Maximum amount of trackable resources (TRES) per job.
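For instance, a short job can be sent to the `test` QoS by adding the corresponding directive to the batch script; a minimal sketch (job name and resources are placeholders):

```bash
#!/bin/bash
#SBATCH --qos=test            # high-priority QoS, limited to 10 minutes
#SBATCH --job-name=qos_test
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

srun hostname
```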
Tip
If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoS/partitions can be created temporarily to match your needs.
### Special partition: the "preemption" partition
As mentioned before, this partition includes all the nodes belonging to specific groups, normally accessible only through partitions exclusive to those groups. Many of these nodes remain idle for extended periods. For this reason, these nodes are also made available to all cluster users through the `preemption` partition, which behaves as follows:
- When a job is submitted to the `preemption` partition:
    - If the requested resources are available, the job will start running immediately.
    - If the requested resources are in use, the job will remain in a pending state, just as on the `general` partition.
- When a job is submitted to a private partition accessible only to the nodes' owners:
    - If the requested resources are available, the job will start running immediately.
    - If the requested resources are in use by jobs submitted to the `preemption` partition, those jobs will be canceled and requeued to `preemption`.
    - If the requested resources are in use by jobs submitted to the private partition itself, the job will remain in a pending state.
The procedure to submit jobs to the `preemption` partition is as follows:

```bash
#SBATCH --partition=preemption
#SBATCH --qos=<regular|serial|test|long|xlong>
```
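Putting it together, a minimal sketch of a preemption job (job name and resources are placeholders; `--requeue` is the standard SLURM flag that lets a preempted job be requeued automatically, included here as a suggestion):

```bash
#!/bin/bash
#SBATCH --partition=preemption
#SBATCH --qos=regular
#SBATCH --job-name=preempt_job
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --requeue             # allow SLURM to requeue the job if preempted

module load program/program_version

srun binary < input
```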
Warning
We want to emphasize that jobs submitted to the `preemption` partition will be canceled and requeued whenever the nodes' owners require the resources, restarting their execution from the first step unless manual checkpoints are configured.
Basic submission script for MPI applications:
Generic batch script for MPI based applications
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load program/program_version

srun binary < input
```
Basic submission script for OpenMP applications:
For an OpenMP application, the number of threads can be controlled by defining the `OMP_NUM_THREADS` environment variable or SLURM's `--cpus-per-task` job directive. If `OMP_NUM_THREADS` is not defined, the number of threads created will equal the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.
Generic batch script for OpenMP applications
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input
```
Basic submission script for Hybrid (MPI+OpenMP) applications:
Generic batch script for Hybrid (MPI+OpenMP) jobs
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

srun binary < input
```
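As a quick sanity check of the hybrid layout above, the number of MPI ranks per node times the OpenMP threads per rank should not exceed the physical cores of a node (48, matching the `--ntasks-per-node=48` used in the MPI script). The values below mirror the script's directives:

```shell
# Values taken from the hybrid batch script above
ntasks_per_node=12   # --ntasks-per-node (MPI ranks per node)
cpus_per_task=4      # --cpus-per-task (OpenMP threads per rank)

# Cores used on each node: 12 x 4 = 48, a fully packed node
echo $(( ntasks_per_node * cpus_per_task ))
```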
### Using GPU accelerators
Basic submission script for GPGPU capable applications:
Generic batch script requesting 1 GPU
```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:rtx3090:1
#SBATCH --mem=90gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input
```
You can request up to one NVIDIA RTX A6000, two NVIDIA RTX 3090, or eight NVIDIA A100 GPUs per node. To do so, adjust the corresponding line in the batch script:

```bash
#SBATCH --gres=gpu:rtx3090:2
```

or

```bash
#SBATCH --gres=gpu:a100:8
```
These are the GPU types available on Hyperion:
| Compute Node | GPU | How to request the GPU with SLURM |
|---|---|---|
| hyperion-[030,052,074,096,118,140,162,182] | 2x NVIDIA RTX 3090 24GB | `#SBATCH --gres=gpu:rtx3090:2` |
| hyperion-[253,255-256] | 8x NVIDIA A100 SXM4 80GB | `#SBATCH --gres=gpu:a100:8`<br>`#SBATCH --constraint=a100-sxm4` |
| hyperion-[252,254,257] | 8x NVIDIA A100 PCIe 80GB | `#SBATCH --gres=gpu:a100:8`<br>`#SBATCH --constraint=a100-pcie` |
| hyperion-263 | 1x NVIDIA RTX A6000 48GB | `#SBATCH --gres=gpu:a6000:1` |
As shown in the table above, a specific type of NVIDIA A100 GPU can be selected by adding the corresponding `--constraint` to the submission script. If no `--constraint` is specified, either type of A100 may be assigned to the job.
### Node Selection in Hyperion
By default, computations on Hyperion will be directed to either Icelake nodes or Cascadelake nodes, but never a mix of both. To ensure your calculations run on a specific type of node, you can specify a constraint in your batch script:
```bash
#SBATCH --constraint=icelake
```

or

```bash
#SBATCH --constraint=cascadelake
```
You can also specify the microarchitecture constraint as a command-line option:
```bash
$ sbatch --constraint=<microarchitecture>
```

This constraint is also applicable when using `srun` and `salloc`.
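For instance, an interactive session restricted to Icelake nodes could be requested as follows (the resource values are placeholders):

```bash
$ salloc --constraint=icelake --nodes=1 --ntasks=1
```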
### GPU and Node Allocation

When you request GPU resources through the `--gres` option in your SLURM script, you are allocated a node with the specified type of GPU. The current configuration is as follows:
- Nodes equipped with NVIDIA RTX 3090 GPUs have Cascadelake microarchitecture processors.
- Nodes equipped with NVIDIA RTX A6000 GPUs have Icelake microarchitecture processors.
- Nodes equipped with NVIDIA A100 GPUs have Icelake microarchitecture processors.
Specific features have been defined for the GPU nodes: `gpu-cascadelake` and `gpu-icelake`. However, specifying these features is not currently necessary, since the `--gres` and `--constraint` options already provide the information needed to allocate nodes by GPU type and microarchitecture.