Hyperion system

Job submission

Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.

QoS and partitions

Users can request a partition for each job they submit. These are the available partitions:

Partition Nodelist
general (D) hyperion-[001-005,011-175,177-179,182,206,208-224,234-254,257,262-263]
preemption hyperion-[176,180-181,183-205,207,225-233,255-256,260-261,264-281]

*(D) = Default partition

On Hyperion, we conceptualize a partition as a set of nodes to which we can associate a Quality of Service (QoS). As such, there are only two partitions: the general partition, which encompasses all nodes for public or general use, and the preemption partition, which behaves differently, as described later on this page. The preemption partition includes all nodes exclusively designated for use by their respective ownership groups.

Having explained this, users must select one of the following QoSs when submitting a job:

QoS Priority MaxWall MaxNodesPU MaxJobsPU MaxSubmitPU MaxTRES
regular (D) 200 1-00:00:00 60 180
test 1000 00:10:00 2 2 2
long 200 2-00:00:00 24 40
xlong 200 8-00:00:00 20 20 200
serial 200 2-00:00:00 500 2000 cpu=1,gpu=0,node=1

*(D) = Default QoS

Warning

A global MaxSubmitPU of 1000 jobs has been established, so the total number of jobs submitted across all QoSs can't exceed 1000, with the exception of the serial QoS, where up to 2000 jobs can be submitted.

This is what each column means:

  • MaxWall: Maximum amount of time the job is allowed to run. 1-00:00:00 reads as one day or 24 hours.
  • MaxNodesPU: Maximum number of nodes a user's jobs can use at a given time.
  • MaxJobsPU: Maximum number of running jobs per user.
  • MaxSubmitPU: Maximum number of jobs a user can have submitted to the QoS/partition at a given time.
  • MaxTRES: Maximum amount of trackable resources (TRES, for example CPUs, GPUs and nodes) each job can use.
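As a rough illustration of how the MaxWall column can be read, the sketch below converts a walltime written as [D-]HH:MM:SS into seconds and checks whether a requested time fits within a QoS limit. The function names are hypothetical helpers, not part of SLURM, and only the time format used in the table above is handled.

```bash
#!/bin/bash
# Convert a walltime written as [D-]HH:MM:SS (the form used in the QoS table)
# into seconds. Other time formats accepted by SLURM are not handled here.
walltime_to_seconds() {
    local spec="$1" days=0 h m s
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"   # part before the dash: days
        spec="${spec#*-}"    # part after the dash: HH:MM:SS
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so values such as "08" are not parsed as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Return success if the requested walltime fits within the QoS limit.
fits_in_qos() {
    local requested="$1" max_wall="$2"
    (( $(walltime_to_seconds "$requested") <= $(walltime_to_seconds "$max_wall") ))
}

# MaxWall values copied from the table above.
if fits_in_qos "1-12:00:00" "2-00:00:00"; then
    echo "36 hours fits in the long QoS"
fi
if ! fits_in_qos "1-12:00:00" "1-00:00:00"; then
    echo "36 hours does not fit in the regular QoS"
fi
```

A check like this can help choose the QoS with the shortest limit that still covers the job's expected runtime.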

Tip

If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoSs/partitions can be temporarily created to match your needs and then selected by specifying the appropriate partition in your job.

Special partition: the "preemption" partition

As mentioned before, this partition includes all the nodes belonging to specific groups, which are otherwise accessible only through partitions exclusive to these groups. Many of these nodes often remain idle for extended periods. For this reason, these nodes are also available to all cluster users through the preemption partition, with the following behavior:

  1. When a job is submitted to the preemption partition:

    • If the requested resources are available, the job will start running instantly.
    • If the requested resources are in use, the job will remain in a pending state, same as on the general partition.
  2. When a job is submitted to a private partition accessible only to the nodes' owners:

    • If the requested resources are available, the job will start running instantly.
    • If the requested resources are in use by jobs submitted to the "preemption" partition, those jobs will be canceled and requeued in the preemption partition.
    • If the requested resources are in use by jobs submitted to this private partition, the job will remain in a pending state.

The procedure to submit jobs to the preemption partition is as follows:

#SBATCH --partition=preemption
#SBATCH --qos=<regular,serial,test,long or xlong>

Warning

We want to emphasize that jobs submitted to the preemption partition will be canceled and requeued if the nodes' owners need those nodes. Unless manual checkpoints are configured, requeued jobs restart from the first step.
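One possible way to set up a manual checkpoint is sketched below. It assumes the workload can be split into numbered steps and that a small file (here the hypothetical checkpoint.txt) records the last completed step; the step count and the echo placeholder stand in for the real work. SLURM_RESTART_COUNT is an environment variable that SLURM sets when a job has been requeued.

```bash
#!/bin/bash
#SBATCH --partition=preemption
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME

# Hypothetical checkpoint file: it stores the number of the last completed step.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"
TOTAL_STEPS=5

# Print the step to start from: 1 on a fresh run, last completed step + 1 otherwise.
first_pending_step() {
    if [[ -s "$CHECKPOINT_FILE" ]]; then
        echo $(( $(<"$CHECKPOINT_FILE") + 1 ))
    else
        echo 1
    fi
}

run_steps() {
    local step start
    start="$(first_pending_step)"
    for (( step = start; step <= TOTAL_STEPS; step++ )); do
        # Placeholder for the real work, e.g. "srun binary < input_$step".
        echo "running step $step"
        # Record progress only after the step has finished.
        echo "$step" > "$CHECKPOINT_FILE"
    done
}

# Unset on the first run, so default to 0.
echo "restart count: ${SLURM_RESTART_COUNT:-0}"
run_steps
```

With this layout, a job that is preempted after step 3 resumes at step 4 when it is requeued, instead of starting again from step 1.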

Basic submission script for MPI applications:

Generic batch script for MPI based applications
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load program/program_version

srun binary < input 

Basic submission script for OpenMP applications:

For an OpenMP application, the number of threads can be controlled by defining the OMP_NUM_THREADS environment variable or through SLURM's --cpus-per-task job directive. If OMP_NUM_THREADS is not defined, the number of threads created will be equal to the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.

Generic batch script for OpenMP applications
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input
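If you prefer to set the thread count explicitly rather than rely on the cpuset, a minimal sketch is shown below. The set_omp_threads function is a hypothetical helper; SLURM_CPUS_PER_TASK is exported by SLURM when --cpus-per-task is given, and the fallback of 1 thread is an assumption chosen for runs outside a job.

```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# Use the value of --cpus-per-task when SLURM provides it; fall back to
# 1 thread so the script still behaves sensibly outside a job.
set_omp_threads() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_omp_threads
echo "OpenMP threads: $OMP_NUM_THREADS"

# The application itself would be launched here, as in the template above:
#   module load program/program_version
#   srun binary < input
```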

Basic submission script for Hybrid (MPI+OpenMP) applications:

Generic batch script for Hybrid (MPI+OpenMP) jobs
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun binary < input
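To see how the directives of the hybrid script combine, the following sketch computes the cores used per node and by the whole job. The function names are hypothetical; the SLURM_* variables are the ones SLURM exports inside a job when the corresponding options are given, and the defaults mirror the script above.

```bash
#!/bin/bash
# Cores used on one node: MPI tasks per node times OpenMP threads per task.
cores_per_node() {
    local ntasks_per_node="$1" cpus_per_task="$2"
    echo $(( ntasks_per_node * cpus_per_task ))
}

# Cores used by the whole job.
total_cores() {
    local nodes="$1" ntasks_per_node="$2" cpus_per_task="$3"
    echo $(( nodes * $(cores_per_node "$ntasks_per_node" "$cpus_per_task") ))
}

# Inside a job these variables come from SLURM; the defaults mirror the
# #SBATCH directives of the hybrid script above.
nodes="${SLURM_JOB_NUM_NODES:-2}"
ntasks_per_node="${SLURM_NTASKS_PER_NODE:-12}"
cpus_per_task="${SLURM_CPUS_PER_TASK:-4}"

echo "cores per node: $(cores_per_node "$ntasks_per_node" "$cpus_per_task")"
echo "total cores:    $(total_cores "$nodes" "$ntasks_per_node" "$cpus_per_task")"
```

For the hybrid template this gives 12 x 4 = 48 cores per node and 96 cores in total. The per-node figure should not exceed the number of cores available on the node.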

Using GPU accelerators

Basic submission script for GPGPU capable applications:

Generic batch script requesting 1 GPU
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:rtx3090:1
#SBATCH --mem=90gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input

You can request up to one NVIDIA RTX A6000, two NVIDIA RTX 3090 or eight NVIDIA A100 GPUs per node. To do so, adjust the corresponding line in the batch script:

#SBATCH --gres=gpu:rtx3090:2
or
#SBATCH --gres=gpu:a100:8

The following types of GPUs are available on Hyperion:

Compute Node | GPU | How to request the GPU with SLURM
hyperion-[030,052,074,096,118,140,162,182] | 2x NVIDIA RTX 3090 24GB | #SBATCH --gres=gpu:rtx3090:2
hyperion-[253,255-256] | 8x NVIDIA A100 SXM4 80GB | #SBATCH --gres=gpu:a100:8 and #SBATCH --constraint=a100-sxm4
hyperion-[252,254,257] | 8x NVIDIA A100 PCIe 80GB | #SBATCH --gres=gpu:a100:8 and #SBATCH --constraint=a100-pcie
hyperion-263 | 1x NVIDIA RTX A6000 48GB | #SBATCH --gres=gpu:a6000:1

As shown in the table above, a specific type of NVIDIA A100 GPU can be selected by adding the corresponding --constraint to the submission script.

If no --constraint is specified, either type of A100 can be assigned to the job.
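The sketch below combines the --gres and --constraint directives for the SXM4 variant and reports how many GPUs the job can see. It assumes that SLURM exports CUDA_VISIBLE_DEVICES for jobs that request GPUs, which is common but depends on the cluster configuration; count_visible_gpus is a hypothetical helper.

```bash
#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:a100:8
#SBATCH --constraint=a100-sxm4

# Count the GPU ids listed in CUDA_VISIBLE_DEVICES (comma-separated).
# Prints 0 when the variable is unset or empty.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local -a ids
    if [[ -z "$devices" ]]; then
        echo 0
        return
    fi
    IFS=, read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

echo "GPUs visible to this job: $(count_visible_gpus)"

# On a GPU node, the model names could additionally be listed with:
#   nvidia-smi --query-gpu=name --format=csv,noheader
```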

Node Selection in Hyperion

By default, computations on Hyperion will be directed to either Icelake nodes or Cascadelake nodes, but never a mix of both. To ensure your calculations run on a specific type of node, you can specify a constraint in your batch script:

#SBATCH --constraint=icelake 

or

#SBATCH --constraint=cascadelake

You can also specify the microarchitecture constraint as a command-line option:

$ sbatch --constraint=<microarchitecture> <batch_script>

This constraint is also applicable when using srun and salloc.
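As an illustration of how the same constraint applies to the three commands, the sketch below builds the command line and prints it instead of running it. The constrained_command function and the job.sh script name are hypothetical; only the two microarchitecture names mentioned above are accepted.

```bash
#!/bin/bash
# Print a SLURM command line with a microarchitecture constraint.
# Usage: constrained_command <sbatch|srun|salloc> <icelake|cascadelake> <args...>
constrained_command() {
    local tool="$1" arch="$2"
    shift 2
    case "$tool" in
        sbatch|srun|salloc) ;;
        *) echo "unsupported command: $tool" >&2; return 1 ;;
    esac
    case "$arch" in
        icelake|cascadelake) ;;
        *) echo "unknown microarchitecture: $arch" >&2; return 1 ;;
    esac
    echo "$tool --constraint=$arch $*"
}

# Hypothetical script name; the commands are printed, not executed.
constrained_command sbatch icelake job.sh
constrained_command srun cascadelake hostname
constrained_command salloc icelake --nodes=1
```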

GPU and Node Allocation

When you request GPU resources through the --gres option in your SLURM script, you are allocated a node with the specified type of GPU. The current configuration is as follows:

  • Nodes equipped with NVIDIA RTX 3090 GPUs have Cascadelake microarchitecture processors.
  • Nodes equipped with NVIDIA RTX A6000 GPUs have Icelake microarchitecture processors.
  • Nodes equipped with NVIDIA A100 GPUs have Icelake microarchitecture processors.

Specific features for GPU nodes have been defined: gpu-cascadelake and gpu-icelake. However, specifying these features is not necessary at the moment, since the --gres and --constraint options already provide the information needed to allocate nodes by GPU type and microarchitecture.
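The GPU-to-processor mapping listed above can be captured in a small lookup, for example to pick a build compiled for the right microarchitecture. This is only a sketch: gpu_microarch and the build-<arch> directory name are hypothetical, and the GPU type names are the ones used with --gres on this page.

```bash
#!/bin/bash
# Map a GPU type, as written in --gres=gpu:<type>:<count>, to the CPU
# microarchitecture of the nodes that host it (taken from the list above).
gpu_microarch() {
    case "$1" in
        rtx3090)    echo cascadelake ;;
        a6000|a100) echo icelake ;;
        *)          echo "unknown GPU type: $1" >&2; return 1 ;;
    esac
}

# Example: choose a hypothetical per-architecture build directory.
gpu_type="a100"
arch="$(gpu_microarch "$gpu_type")"
echo "GPU type $gpu_type runs on $arch nodes; using build directory build-$arch"
```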