Atlas EDR system¶

How to log into Atlas EDR¶

An SSH client is required to connect. Please refer to the SSH page for more information.

To connect to Atlas-EDR:

$ ssh username@atlas-edr.sw.ehu.es

You can also connect directly to a specific login node:

$ ssh username@atlas-edr-login-01.sw.ehu.es
$ ssh username@atlas-edr-login-02.sw.ehu.es

You will need to transfer your files and data, compile your code or use an already compiled version, and create a batch submission script. Then submit that script so that your application runs on the compute nodes. Pay attention to the available file systems and the choice of programming environments.

Specifications¶

Compute Node	# nodes	Processor	# of cores	Memory	Accelerator
atlas-[249-256]	8	Intel Xeon Platinum 8168	48	384 GB	2x NVIDIA Tesla P40
atlas-[257-280]	24	Intel Xeon Platinum 8168	48	384 GB	1x NVIDIA Tesla P40
atlas-[281-283]	3	Intel Xeon Platinum 8168	48	64 GB	-
atlas-[284-285]	2	Intel Xeon Platinum 8168	48	384 GB	-
atlas-[286-293]	8	Intel Xeon Platinum 8280	56	192 GB	-
atlas-295	1	Intel Xeon Platinum 8268	48	384 GB	-
atlas-[296-298]	3	Intel Xeon Platinum 8268	48	96 GB	-
atlas-[301-303]	3	Intel Xeon Platinum 8268	48	96 GB	-
atlas-[304-327]	24	Intel Xeon Gold 6248R	48	192 GB	-
atlas-[328-334]	7	Intel Xeon Gold 6248R	48	96 GB	2x NVIDIA RTX 3090

Atlas EDR uses InfiniBand EDR technology for its interconnection network. The network topology is a fat tree with a 5:1 blocking factor.

Filesystems and IO¶

Filesystem	Mount point	Quota	Size	Purpose	Backup
scratch	/scratch	1.5TB	200 TB	running jobs	No
lscratch	/lscratch	None	750 GB	running single node jobs	No
Home directories	/dipc	None	880 TB	storage	Daily

Atlas-EDR has 2 login nodes: atlas-edr-login-01.sw.ehu.es and atlas-edr-login-02.sw.ehu.es.
Each node has two sockets populated with a 48 core Intel Xeon Platinum 8260 each.
Each node has 64 GB of RAM.

Warning

Remember that login nodes should only be used for small tasks or compilation, not for running interactive jobs.

Job submission¶

Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.
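Before submitting a job script, it can help to validate it first. The following is a minimal sketch rather than an official procedure: it writes a small job script to a temporary file and passes it to `sbatch --test-only`, which asks SLURM to check the script and estimate a start time without queuing the job. The `test` partition is taken from the QoS table on this page; the `--time` value and the temporary file name are illustrative assumptions, and `binary` and `input` are placeholders.

```shell
#!/bin/bash
# Write a minimal job script to a temporary file.
# "binary" and "input" are placeholders for your own program and input file.
job_script=$(mktemp /tmp/job.XXXXXX)
cat > "$job_script" <<'EOF'
#!/bin/bash
#SBATCH --partition=test
#SBATCH --job-name=JOB_NAME
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --time=00:05:00

srun binary < input
EOF

# --test-only validates the script and estimates a start time
# without actually submitting the job.
if command -v sbatch >/dev/null 2>&1; then
    sbatch --test-only "$job_script"
else
    echo "sbatch not found: run this on a login node"
fi
```

Once the script validates, submitting it for real is a matter of running `sbatch` on the same file without `--test-only`.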

QoS and partitions¶

Users can request a quality of service (QoS) or partition for each job they submit. These are the available QoS:

QoS/Partition	Priority	MaxWall	MaxNodesPU	MaxJobsPU	MaxSubmitPU	MaxTRES
regular	200	1-00:00:00	24	50
test	500	00:10:00	2	2	2
long	200	2-00:00:00	24	20
xlong	200	8-00:00:00	6	14
large	200	2-00:00:00	40	6
xlarge	200	2-00:00:00	80	6
serial	200	2-00:00:00	24	120		cpu=1 gpu=1 node=1

This is what each column means:

MaxWall: Maximum amount of time the job is allowed to run. 1-00:00:00 reads as one day or 24 hours.
MaxNodesPU: Maximum number of nodes a user's jobs can use at any given time.
MaxJobsPU: Maximum number of running jobs per user.
MaxSubmitPU: Maximum number of jobs a user can have submitted to the QoS/partition at any given time.
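To make the MaxWall notation concrete, here is a small sketch that converts a limit into minutes. The helper name `walltime_minutes` is hypothetical, and it only handles the two formats that appear in the table above (`D-HH:MM:SS` and `HH:MM:SS`), not every time format SLURM accepts.

```shell
#!/bin/bash
# Convert a time limit written as [D-]HH:MM:SS into minutes.
# Only the two formats used in the QoS table are handled.
walltime_minutes() {
    limit=$1
    case $limit in
        *-*) days=${limit%%-*}; hms=${limit#*-} ;;
        *)   days=0;            hms=$limit ;;
    esac
    hours=${hms%%:*}
    rest=${hms#*:}
    mins=${rest%%:*}
    # Strip one leading zero so values such as "08" are not read as octal.
    hours=${hours#0}
    mins=${mins#0}
    echo $(( days * 1440 + ${hours:-0} * 60 + ${mins:-0} ))
}

walltime_minutes 1-00:00:00   # regular: prints 1440
walltime_minutes 00:10:00     # test:    prints 10
walltime_minutes 8-00:00:00   # xlong:   prints 11520
```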

Tip

If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoS/partitions can be created temporarily to suit your needs; you then select them by specifying the appropriate Quality of Service (QoS) in your job.

`srun`¶

Using srun in your batch scripts simplifies job execution. srun reads the information provided in the resource specification list directly and allocates the resources for the job automatically. Some examples of batch scripts using srun are provided below.

Basic submission script for MPI applications¶

Atlas: SLURM with srun

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load program/program_version

srun binary < input

Atlas: SLURM

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --mem=200gb
#SBATCH --nodes=8
#SBATCH --ntasks-per-node=48

module load program/program_version

mpirun -np $SLURM_NTASKS binary < input
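The resource lines in the two MPI templates above determine how many ranks are started. As a rough sketch of the arithmetic (the variable names are illustrative, and the memory figure relies on `--mem` being a per-node limit, which is how SLURM documents that option):

```shell
#!/bin/bash
# Values copied from the MPI templates above.
nodes=8
ntasks_per_node=48
mem_per_node_gb=200

# Total number of MPI ranks; this is the value the mpirun template
# expects to find in $SLURM_NTASKS.
total_tasks=$(( nodes * ntasks_per_node ))

# --mem is a per-node limit, so each rank gets roughly this many MB.
mem_per_task_mb=$(( mem_per_node_gb * 1024 / ntasks_per_node ))

echo "MPI ranks: $total_tasks"
echo "Memory per rank: about $mem_per_task_mb MB"
```

With these values the job starts 384 ranks, each with a little over 4 GB of memory.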

Basic submission script for OpenMP applications¶

For an OpenMP application, the number of threads can be controlled by defining the OMP_NUM_THREADS environment variable or through SLURM's --cpus-per-task job directive. If OMP_NUM_THREADS is not defined, the number of threads created will be equal to the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.

Batch script: OpenMP job with srun

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input

Batch script: OpenMP job

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

binary < input
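If the same script is sometimes run outside a SLURM job, for example for a quick check, `SLURM_CPUS_PER_TASK` will not be set. One possible way to handle that, shown here as a sketch with a hypothetical helper `omp_threads` and an assumed fallback of one thread:

```shell
#!/bin/bash
# Pick the OpenMP thread count: the SLURM allocation when available,
# otherwise a single thread (an assumed, conservative default).
omp_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

export OMP_NUM_THREADS="$(omp_threads)"
echo "Using $OMP_NUM_THREADS OpenMP thread(s)"
```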

Basic submission script for Hybrid (MPI+OpenMP) applications¶

Batch script: Hybrid (MPI+OpenMP) job

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS binary < input

Batch script: Hybrid (MPI+OpenMP) job with srun

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS binary < input
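In a hybrid job, the number of MPI tasks per node multiplied by the threads per task gives the cores used on each node. The following sketch checks the values from the hybrid templates above against a 48-core node, the most common core count in the Specifications table; the variable names are illustrative.

```shell
#!/bin/bash
# Values copied from the hybrid templates above.
ntasks_per_node=12
cpus_per_task=4
# Core count of most compute nodes in the Specifications table.
cores_per_node=48

cores_used=$(( ntasks_per_node * cpus_per_task ))

if [ "$cores_used" -gt "$cores_per_node" ]; then
    echo "Oversubscribed: $cores_used cores requested, $cores_per_node available"
elif [ "$cores_used" -lt "$cores_per_node" ]; then
    echo "Underused: $cores_used of $cores_per_node cores requested"
else
    echo "Full node: $cores_used of $cores_per_node cores requested"
fi
```

With 12 tasks of 4 threads each, the templates fill a 48-core node exactly.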

Basic submission script for GPGPU capable applications¶

Batch script: Job requesting 1 GPU with srun

#!/bin/bash
#SBATCH --partition=regular
#SBATCH --job-name=JOB_NAME
#SBATCH --cpus-per-task=1
#SBATCH --gres=gpu:p40:1
#SBATCH --mem=200gb
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

module load program/program_version

srun binary < input

In general, you can request up to 2 GPUs per node. To do so, adjust the corresponding line in the batch script:

#SBATCH --gres=gpu:p40:2

Running jobs on GPUs¶

There are two different types of GPUs on Atlas EDR:

Compute Node	GPU	How to request the GPU with SLURM
atlas-249 - atlas-256	2x NVIDIA Tesla P40	`#SBATCH --gres=gpu:p40:2` (up to 2 GPUs per node)
atlas-257 - atlas-280	1x NVIDIA Tesla P40	`#SBATCH --gres=gpu:p40:1` (up to 1 GPU per node)
atlas-328 - atlas-331	2x NVIDIA RTX 3090	`#SBATCH --gres=gpu:rtx3090:2` (up to 2 GPUs per node)
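Inside a running job, it can be useful to confirm how many GPUs were actually granted. The sketch below assumes that SLURM on this system exports `CUDA_VISIBLE_DEVICES` for jobs that request GPUs with `--gres`, which is common but not confirmed here; `count_gpus` is a hypothetical helper name.

```shell
#!/bin/bash
# Count the GPUs visible to the job from CUDA_VISIBLE_DEVICES (e.g. "0,1").
# Assumption: SLURM sets this variable for jobs that request --gres=gpu:...
count_gpus() {
    devices=${CUDA_VISIBLE_DEVICES:-}
    if [ -z "$devices" ]; then
        echo 0
        return
    fi
    # Keep only the commas, then add one to get the number of entries.
    commas=$(printf '%s' "$devices" | tr -cd ',')
    echo $(( ${#commas} + 1 ))
}

echo "GPUs visible to this job: $(count_gpus)"
```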

Warning

To run on the NVIDIA RTX 3090 GPUs, your code, or any precompiled software you use from the platform, must be built with CUDA 11 or later, or compiled with a CUDA-capable toolchain of version 2020b or later.

This sums it up:

GPU	CUDA version	Toolchain
NVIDIA RTX 3090	>= 11.0	`>= fosscuda/2020b`, `>= intelcuda/2020b`
Tesla P40	>= 8.0	Any toolchain with CUDA support
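The CUDA requirement in the table can be checked before building. This is a sketch only: `cuda_ok_for_rtx3090` is a hypothetical helper, the version is read from `nvcc --version` when `nvcc` is on the PATH, and a sample value is used otherwise.

```shell
#!/bin/bash
# Succeeds when the given CUDA version (for example "11.2")
# has a major version of 11 or higher.
cuda_ok_for_rtx3090() {
    major=${1%%.*}
    major=${major:-0}
    [ "$major" -ge 11 ]
}

# If nvcc is available, read its version; otherwise use a sample value.
if command -v nvcc >/dev/null 2>&1; then
    version=$(nvcc --version | sed -n 's/.*release \([0-9.]*\).*/\1/p')
else
    version=11.0   # sample value for illustration
fi

if cuda_ok_for_rtx3090 "$version"; then
    echo "CUDA $version can target the RTX 3090 nodes"
else
    echo "CUDA $version is too old for the RTX 3090 nodes"
fi
```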

Software¶

Compiling your code¶

Intel compilers are recommended for building your applications on Atlas-EDR. There is no system default modulefile that takes care of this. Use the module avail command to see what versions are available and load an Intel compiler module before compiling. For example:

$ module load intel/2020a

Note that when a compiler module is loaded, some environment variables are set or modified to add the paths to certain commands, include files, or libraries to your environment. This simplifies your work.

As an alternative, Atlas-EDR also offers a collection of open source tools such as compilers and scientific libraries. Use the module avail command to see the available versions. For example:

$ module load intel/2020a
$ module load FFTW/3.3.8-intel-2020a

Limits and policies¶

Limit type	Max. processors per user
Soft	274
Hard	2224