# Hyperion system

## How to log into Hyperion

To establish a connection, an SSH client is necessary. Please refer to SSH for more information.

Establish a connection with Hyperion:

```bash
$ ssh username@hyperion.sw.ehu.es
```

You will need to bring your files and data over, compile your code or use precompiled binaries, and create a batch submission script. Then submit that script so that your application runs on the compute nodes. Pay attention to the various file systems available and the choices in programming environments.
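To bring files over, standard tools such as `scp` or `rsync` work from your local machine; a minimal sketch with placeholder file and directory names:

```bash
# Copy a single file to your home directory on Hyperion
$ scp input.dat username@hyperion.sw.ehu.es:~/

# Recursively sync a project directory (adjust the remote path to your layout)
$ rsync -av project/ username@hyperion.sw.ehu.es:~/project/
```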
## Specifications

| Compute Node Range | Processor | # of cores | Memory | Accelerator | Total Nodes |
|---|---|---|---|---|---|
| hyperion-[001-007], [023-029], [045-051], [067-073], [089-095], [111-117], [133-139], [155-161], [177-181], [206] | Intel Xeon Gold 6342 (Icelake) | 48 | 256 GB | - | 63 |
| hyperion-[008], [030], [052], [074], [096], [118], [140], [162], [182] | Intel Xeon Gold 6248R (Cascadelake) | 48 | 96 GB | 2x NVIDIA RTX 3090 | 9 |
| hyperion-[009-022], [031-044], [053-066], [075-088], [097-110], [119-132], [141-154], [163-176], [183-205], [207] | Intel Xeon Gold 6248R (Cascadelake) | 48 | 96 GB | - | 106 |
Hyperion employs InfiniBand HDR technology for the interconnection network.
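You can inspect the partitions and node resources yourself from a login node with SLURM's `sinfo`; the format string below uses standard `sinfo` fields:

```bash
# Partition, node count, CPUs per node, memory (MB), and generic resources
$ sinfo -o "%P %D %c %m %G"
```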
## Filesystems and IO

| Filesystem | Mount point | Quota | Size | Purpose | Backup |
|---|---|---|---|---|---|
| scratch | /scratch | 1.5 TB | 600 TB | running jobs | No |
| lscratch | /lscratch | None | - | running single node jobs | No |
| Home directories | /home | 50 GB | 44 TB | storage, dotfiles, config files | No |
| Data directories | /data | 3 TB | 600 TB | storage | No |
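For I/O-heavy single-node jobs, a common pattern is to stage data to the node-local `/lscratch` and copy results back before the job ends. A minimal sketch, where the directory layout and file names are assumptions:

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --nodes=1

# Stage input to node-local scratch (placeholder layout)
WORKDIR=/lscratch/$USER/$SLURM_JOB_ID
mkdir -p "$WORKDIR"
cp input.dat "$WORKDIR"
cd "$WORKDIR"

# ... run your application here ...

# Copy results back to permanent storage before the job ends
cp results.dat "$SLURM_SUBMIT_DIR"
```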
## Login Nodes

- Hyperion has 2 login nodes: `hyperion-login-01.sw.ehu.es` and `hyperion-login-02.sw.ehu.es`.
- Each node has two sockets, each populated with a 48-core Intel Xeon Platinum 8362.
- Each node has 256 GB of RAM.
Warning

Remember that login nodes should only be used for small tasks or compilation, not for running interactive jobs.
## Job submission

Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager page.
### QoS and partitions

Users can request a quality of service (QoS) or partition for each job they submit. These are the available QoS:

| QoS/Partition | Priority | MaxWall | MaxNodesPU | MaxJobsPU | MaxSubmitPU | MaxTRES |
|---|---|---|---|---|---|---|
| regular | 200 | 1-00:00:00 | 20 | | | |
| test | 500 | 00:10:00 | 2 | 2 | 2 | |
| long | 200 | 2-00:00:00 | 20 | | | |
| xlong | 200 | 8-00:00:00 | 20 | | | |
| gpu | 200 | 2-00:00:00 | 4 | | | |
| serial | 200 | 2-00:00:00 | 20 | 120 | | cpu=1 gpu=1 node=1 |
This is what each column means:

MaxWall
: Maximum amount of time the job is allowed to run. `1-00:00:00` reads as one day, i.e. 24 hours.

MaxNodesPU
: Maximum number of nodes a user's jobs can use at a given time.

MaxJobsPU
: Maximum number of running jobs per user.

MaxSubmitPU
: Maximum number of jobs a user can have submitted to the QoS/partition at a time.

MaxTRES
: Maximum amount of trackable resources (TRES) each job is allowed to use.
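To select one of these, set the partition in your batch script; for example, for a quick check in `test`:

```bash
#SBATCH --partition=test
```

It can also be set at submission time with `sbatch --partition=test job.sl` (the script name is a placeholder).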
Tip

If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoS/partitions can be temporarily created to match your needs.
### srun

Using `srun` in your batch scripts simplifies job execution. `srun` directly gathers the information provided in the resource specification and allocates the resources for the job automatically. Some examples of batch scripts using `srun` are provided below.
### Basic submission script for MPI applications

Batch script: SLURM with srun

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48
module load program/program_version
srun binary < input
```
Batch script: SLURM

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48
module load program/program_version
mpirun -np $SLURM_NTASKS binary < input
```
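Either variant is submitted the same way: save the script (as, say, `job.sl`, a placeholder name), submit it with `sbatch`, and monitor it with `squeue`:

```bash
$ sbatch job.sl
$ squeue -u $USER
```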
### Basic submission script for OpenMP applications

For an OpenMP application, the number of threads can be controlled by defining the `OMP_NUM_THREADS` environment variable or SLURM's `--cpus-per-task` job directive. If this variable is not defined, the number of threads created will be equal to the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.
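For instance, to run with fewer threads than the cores reserved (the value 24 is only illustrative):

```bash
export OMP_NUM_THREADS=24
```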
Batch script: OpenMP job with srun

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module load program/program_version
srun binary < input
```
Batch script: OpenMP job

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=20gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
binary < input
```
### Basic submission script for Hybrid (MPI+OpenMP) applications

Batch script: Hybrid (MPI+OpenMP) job

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS binary < input
```
Batch script: Hybrid (MPI+OpenMP) job with srun

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=20gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12
module load program/program_version
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS binary
```
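In these hybrid examples each node runs 12 MPI tasks with 4 OpenMP threads each, i.e. 12 × 4 = 48 cores, which fully populates a 48-core Hyperion node. If you want to verify how tasks and threads are pinned, one option is srun's `--cpu-bind=verbose` flag (a sketch, not required for normal runs):

```bash
srun -n $SLURM_NTASKS --cpu-bind=verbose binary
```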
### Basic submission script for GPGPU capable applications

Batch script: Requesting 1 GPU job with srun

```bash
#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:rtx3090:1
#SBATCH --mem=90gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
module load program/program_version
srun binary < input
```
In general, you can request up to 2 GPUs per node. To do that, adjust the corresponding line in the batch script:

```bash
#SBATCH --gres=gpu:rtx3090:2
```
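To confirm which GPUs a job actually received, a quick sanity check is to run `nvidia-smi` inside the allocation (assuming the NVIDIA driver tools are on the GPU nodes' default PATH, which is an assumption here):

```bash
srun nvidia-smi
```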
### Running jobs on GPUs

There are two different types of GPUs on Hyperion:

| Compute Node | GPU | How to request the GPU with SLURM |
|---|---|---|
| hyperion-[008, 030, 052, 074, 096, 118, 140, 162, 182] | 2x NVIDIA RTX 3090 | `#SBATCH --gres=gpu:rtx3090:2` (up to 2 GPUs per node) |
| - | 8x NVIDIA A100 | `#SBATCH --gres=gpu:a100:8` (up to 8 GPUs per node) |
## Software

### Cluster Architecture Considerations

Hyperion is a heterogeneous cluster, composed of nodes with various microarchitectures, including Cascadelake, Icelake, and potentially future Sapphire Rapids nodes.

When compiling your code, it is essential to target the specific microarchitecture of the node you intend to run on. For reliable results and performance, compile and execute programs on nodes of the same microarchitecture.

For instance, if you compile your code for a Cascadelake node, you should also run it on a Cascadelake node. Using binaries compiled for one microarchitecture on a different microarchitecture may lead to unpredictable behavior, performance issues, or even application crashes.
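If node features are defined for each microarchitecture, a job can be pinned to one of them with a SLURM constraint. The feature name below (`icelake`) is an assumption; check `sinfo -o "%N %f"` or ask support for the actual names:

```bash
#SBATCH --constraint=icelake
```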
### Compiling your code

Intel compilers are recommended for building your applications on Hyperion. There is no system default modulefile that takes care of this. Use the `module avail` command to see what versions are available, and load an Intel compiler module before compiling. For example:

```bash
$ module load intel/2022a
```

Notice that when a compiler module is loaded, some environment variables are set or modified to add the paths of certain commands, include files, or libraries to your environment. This helps to simplify the way you work.
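For example, a minimal compile line once an Intel module is loaded (the source file name is a placeholder, and whether the classic `icc` or the newer `icx` driver is provided depends on the toolchain version, so treat this as a sketch):

```bash
$ icc -O2 -o mycode mycode.c
```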
As an alternative, Hyperion also offers a collection of open source tools, such as compilers and scientific libraries. Use the `module avail` command to see the versions available. For example:

```bash
$ module avail intel
$ module avail FFTW
```

To learn more about compilers and scientific libraries, check out Environment Modules.