Atlas FDR system

How to log into Atlas FDR

To establish a connection, an SSH client is required. Please refer to SSH for more information.

To connect to Atlas FDR:

$ ssh username@atlas-fdr.sw.ehu.es

As long as you are on our subnet or using the EHU/UPV VPN, you can also connect directly to the login nodes:

$ ssh username@atlas-fdr-login-01.sw.ehu.es
$ ssh username@atlas-fdr-login-02.sw.ehu.es
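
To avoid retyping the full host names, the connections above can be captured as SSH host aliases. The following is only a sketch: the function name `write_atlas_ssh_config` and the output file are assumptions made for illustration, and the aliases are written to a separate file so that an existing `~/.ssh/config` is left untouched.

```shell
#!/bin/bash
# Hypothetical helper: write SSH host aliases for Atlas FDR to a file.
# Arguments: <username> <output file>. Both are placeholders to adapt.
write_atlas_ssh_config() {
    local user="$1" out="$2"
    cat > "$out" <<EOF
# General entry point
Host atlas-fdr
    HostName atlas-fdr.sw.ehu.es
    User $user

# Individual login nodes (reachable from the subnet or through the VPN)
Host atlas-fdr-login-01 atlas-fdr-login-02
    HostName %h.sw.ehu.es
    User $user
EOF
}

# Example: write_atlas_ssh_config username ./atlas-fdr.sshconfig
# Then:    ssh -F ./atlas-fdr.sshconfig atlas-fdr-login-01
```

The `%h` token in `HostName` is expanded by OpenSSH to the alias typed on the command line, which is why one stanza can cover both login nodes.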

You will need to bring your files and data over, compile your code or use an already compiled binary, and create a batch submission script. Then submit that script so that your application runs on the compute nodes. Pay attention to the various file systems available and the choice of programming environments.
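
As an illustration of the first step (bringing files over), the sketch below wraps rsync in a small function. The function name, the local directory, and the remote destination are placeholders. The destination path in particular is an assumption and should be replaced with a directory you actually own on Atlas FDR.

```shell
#!/bin/bash
# Hypothetical helper: copy a local directory to Atlas FDR with rsync.
# Arguments: <username> <local dir> <remote dir>
push_to_atlas() {
    local user="$1" src="$2" dest="$3"
    # -a preserves permissions and timestamps, -v lists transferred files,
    # -z compresses data in transit. The trailing slash on the source copies
    # the directory contents rather than the directory itself.
    rsync -avz "${src%/}/" "${user}@atlas-fdr.sw.ehu.es:${dest}"
}

# Example (paths are placeholders, not documented locations):
#   push_to_atlas username ./my_project /scratch/username/my_project
```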

Specifications

| Compute node | # of nodes | Processor | # of cores | Memory (GB) |
|---|---|---|---|---|
| atlas-[001-018] | 18 | Intel Xeon E5-2680 v3 | 24 | 512 |
| atlas-[019-073] | 55 | Intel Xeon E5-2680 v3 | 24 | 128 |
| atlas-[074-106] | 33 | Intel Xeon E5-2680 v3 | 24 | 256 |
| atlas-[107-170] | 64 | Intel Xeon E5-2683 v4 | 32 | 256 |
| atlas-[171-194] | 24 | Intel Xeon Gold 6140 | 36 | 384 |
| atlas-[195-199] | 5 | Intel Xeon Gold 6140 | 36 | 1536 |
| atlas-[200-201] | 2 | Intel Xeon E5-2683 v4 | 32 | 256 |
| atlas-[202-209] | 8 | Intel Xeon Platinum 8164 | 52 | 128 |
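
When sizing a job, it can help to check which node types can satisfy a given per-node request. The sketch below copies the nominal core and memory figures from the specifications above into a small lookup. The helper name `nodes_fitting` is hypothetical, and the memory the scheduler actually makes available is typically somewhat lower than the nominal value, so treat the result as a first approximation.

```shell
#!/bin/bash
# Nominal figures copied from the specifications above:
# node range, cores per node, memory per node (GB).
ATLAS_FDR_NODES="
atlas-[001-018] 24 512
atlas-[019-073] 24 128
atlas-[074-106] 24 256
atlas-[107-170] 32 256
atlas-[171-194] 36 384
atlas-[195-199] 36 1536
atlas-[200-201] 32 256
atlas-[202-209] 52 128
"

# Hypothetical helper: list node ranges with at least <cores> cores
# and at least <mem_gb> GB of nominal memory.
nodes_fitting() {
    local want_cores="$1" want_mem="$2" range cores mem
    while read -r range cores mem; do
        [ -z "$range" ] && continue   # skip blank lines in the table
        if [ "$cores" -ge "$want_cores" ] && [ "$mem" -ge "$want_mem" ]; then
            echo "$range"
        fi
    done <<< "$ATLAS_FDR_NODES"
}

# Which node ranges offer at least 24 cores and 200 GB?
nodes_fitting 24 200
```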

Filesystems and IO

| Filesystem | Mount point | Quota | Size | Purpose | Backup |
|---|---|---|---|---|---|
| scratch | /scratch | 500 GB | 88 TB | Running jobs | No |
| lscratch | /lscratch | None | 700 GB | Running single-node jobs | No |
| Home directories | /dipc | None | 880 TB | Storage | Daily |
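
Because the scratch filesystems are not backed up, a common pattern is to run a job in a per-job scratch directory and copy the results back to permanent storage afterwards. The sketch below shows one way to do this. The helper name `run_in_scratch` and the per-user directory layout in the example are assumptions, not documented Atlas FDR conventions.

```shell
#!/bin/bash
# Hypothetical helper: run a command inside a per-job directory on a scratch
# filesystem and copy the results back to a results directory.
# Arguments: <scratch root> <results dir> <command...>
run_in_scratch() {
    local scratch_root="$1" results_dir="$2"
    shift 2
    # SLURM_JOB_ID is set inside a job; fall back to the shell PID otherwise.
    local workdir="${scratch_root}/job_${SLURM_JOB_ID:-manual_$$}"
    mkdir -p "$workdir" "$results_dir" || return 1
    # Run the command from the scratch directory so its output lands there.
    ( cd "$workdir" && "$@" )
    local status=$?
    # Scratch is not backed up: copy results back, then clean up.
    cp -r "$workdir"/. "$results_dir"/ && rm -rf "$workdir"
    return "$status"
}

# Example inside a batch script (the per-user path is an assumption):
#   run_in_scratch "/scratch/$USER" "$SLURM_SUBMIT_DIR/results" srun binary
```

For single-node jobs, the same function could be pointed at /lscratch instead of /scratch.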

Login Nodes

  • Atlas FDR has 2 login nodes (atlas-fdr-login-01 and atlas-fdr-login-02).
  • Each node has two sockets, each populated with a 6-core Intel Xeon E5-2609 v3.
  • Each node has 64 GB of RAM.

Warning

Remember that login nodes should only be used for small tasks or compilation, not to run interactive jobs.

If a multi-process execution or a memory-intensive process is detected on a login node, all of the user's processes will be terminated and the user will be banned from the cluster until they contact support-cc@dipc.org.

Job submission

Here you will find some batch scripts you can use as templates to submit your jobs. For more specific information about how to submit jobs, please visit the SLURM resource manager webpage.
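
Once a batch script exists, submitting and monitoring it usually comes down to a couple of SLURM commands. The sketch below wraps them in a hypothetical helper, `submit_and_show`. The script name `job.sh` is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper: submit a batch script and show its queue entry.
submit_and_show() {
    local script="$1" jobid
    # --parsable makes sbatch print only the job ID (optionally ";cluster").
    jobid="$(sbatch --parsable "$script")" || return 1
    jobid="${jobid%%;*}"
    echo "Submitted job $jobid"
    squeue -j "$jobid"
}

# Example: submit_and_show job.sh
# To cancel a job later: scancel <jobid>
```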

QoS and partitions

Users can request a partition for each job they submit. These are the available partitions:

| Partition | Description |
|---|---|
| general (D) | Partition that includes all the publicly available nodes. |
| preemption | Partition that includes nodes belonging to specific research groups and that are publicly available with restrictions. |

*(D) = Default partition

On Atlas FDR, a partition is a set of nodes with which a Quality of Service (QoS) can be associated. There are only two partitions: the general partition, which encompasses all nodes for public or general use, and the preemption partition, which behaves differently, as we will explain later. The preemption partition includes all nodes exclusively designated for use by their respective ownership groups.

With that in mind, users must select one of the following QoSs when submitting a job:

QoS Priority MaxWall MaxNodesPU MaxJobsPU MaxSubmitPU MaxTRES
regular (D) 200 1-00:00:00 24 50
test 1000 00:10:00 2 2 2
long 200 2-00:00:00 24 20
xlong 200 8-00:00:00 12 10
serial 200 2-00:00:00 500 cpu=1, gpu=1, node=1

*(D) = Default QoS

This is what each column means:

  • MaxWall: Maximum amount of time a job is allowed to run. 1-00:00:00 reads as one day, or 24 hours.
  • MaxNodesPU: Maximum number of nodes a user's jobs can use at any given time.
  • MaxJobsPU: Maximum number of running jobs per user.
  • MaxSubmitPU: Maximum number of jobs a user can have submitted to the QoS/partition at any given time.
  • MaxTRES: Maximum amount of trackable resources (TRES) a job can request.
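
To make the MaxWall notation concrete, the sketch below converts a limit written as D-HH:MM:SS (or HH:MM:SS) into seconds. The function name `walltime_to_seconds` is made up for this example, and only this one format is handled, although SLURM accepts others.

```shell
#!/bin/bash
# Convert a SLURM time limit written as [D-]HH:MM:SS into seconds.
walltime_to_seconds() {
    local spec="$1" days=0 h m s
    if [[ "$spec" == *-* ]]; then
        days="${spec%%-*}"   # part before the dash: days
        spec="${spec#*-}"    # part after the dash: HH:MM:SS
    fi
    IFS=: read -r h m s <<< "$spec"
    # 10# forces base 10 so that fields such as "08" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

walltime_to_seconds 1-00:00:00   # MaxWall of the regular QoS
walltime_to_seconds 00:10:00     # MaxWall of the test QoS
```

The two calls print 86400 and 600, that is, 24 hours and 10 minutes expressed in seconds.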

Tip

If your jobs require longer execution times or more nodes, contact us. Limits can be adjusted, and custom QoS/partitions can be created temporarily to match your needs; you would then select them by specifying the appropriate Quality of Service (QoS) when submitting.
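
As an illustration of how a partition and a QoS are requested together, the sketch below is a minimal job that stays within the limits listed for the test QoS (10 minutes, one node). The job name is a placeholder. The script only reports where it ran, so it can also be executed outside the scheduler, in which case the fallbacks are printed.

```shell
#!/bin/bash
#SBATCH --partition=general
#SBATCH --qos=test
#SBATCH --job-name=qos_check
#SBATCH --time=00:10:00
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

# SLURM_JOB_PARTITION and SLURM_JOB_QOS are set by SLURM inside a job;
# the fallbacks only matter when the script is run outside the scheduler.
summary="partition=${SLURM_JOB_PARTITION:-none} qos=${SLURM_JOB_QOS:-none} host=$(uname -n)"
echo "$summary"
```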

srun

Using srun in your batch scripts simplifies job execution. srun directly gathers the information provided in the resource specification list and allocates the resources for the job automatically. Some examples of batch scripts using srun are provided below.

Basic submission script for MPI applications

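Atlas: SLURM with srun

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# One core per MPI rank
#SBATCH --cpus-per-task=1
# Memory per node
#SBATCH --mem=200gb
#SBATCH --nodes=8
# MPI ranks per node; adjust to the core count of the nodes you target
#SBATCH --ntasks-per-node=48

module load program/program_version

# srun takes the number of ranks from the resource request above
srun binary < input

Atlas: SLURM

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# One core per MPI rank
#SBATCH --cpus-per-task=1
# Memory per node
#SBATCH --mem=200gb
#SBATCH --nodes=8
# MPI ranks per node; adjust to the core count of the nodes you target
#SBATCH --ntasks-per-node=48

module load program/program_version

# With mpirun the total number of ranks is passed explicitly
mpirun -np $SLURM_NTASKS binary < input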
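
To see what `$SLURM_NTASKS` evaluates to in the mpirun line, the short sketch below reproduces the arithmetic: the total number of MPI ranks is the number of nodes multiplied by the tasks per node. The fallback values mirror the requests in the scripts above and only apply when the snippet is run outside a job.

```shell
#!/bin/bash
# Inside a job SLURM sets these variables; the fallbacks reproduce the
# values requested in the scripts above so the arithmetic can be checked
# outside the scheduler.
nodes="${SLURM_JOB_NUM_NODES:-8}"
tasks_per_node="${SLURM_NTASKS_PER_NODE:-48}"
ntasks="${SLURM_NTASKS:-$(( nodes * tasks_per_node ))}"

echo "nodes=$nodes tasks_per_node=$tasks_per_node total_ranks=$ntasks"
```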

Basic submission script for OpenMP applications

For an OpenMP application, the number of threads can be controlled by setting the OMP_NUM_THREADS environment variable or through SLURM's --cpus-per-task job directive. If OMP_NUM_THREADS is not defined, the number of threads created will be equal to the number of cores reserved in your cpuset, that is, the number of cores requested in the batch script.

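Batch script: OpenMP job with srun

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# Cores reserved for the OpenMP threads
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
# A single process that spawns the threads
#SBATCH --ntasks-per-node=1

module load program/program_version

# OMP_NUM_THREADS is left unset, so the thread count follows the reserved cores
srun binary < input

Batch script: OpenMP job

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# Cores reserved for the OpenMP threads
#SBATCH --cpus-per-task=48
#SBATCH --mem=200gb
#SBATCH --nodes=1
# A single process that spawns the threads
#SBATCH --ntasks-per-node=1

module load program/program_version

# One OpenMP thread per reserved core
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

binary < input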
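
One detail worth guarding against: SLURM only sets `SLURM_CPUS_PER_TASK` when --cpus-per-task is actually requested, so exporting it unconditionally can leave OMP_NUM_THREADS empty. A minimal sketch of a defensive variant, assuming a fallback of one thread is acceptable for your application:

```shell
#!/bin/bash
# SLURM_CPUS_PER_TASK is only set when --cpus-per-task is requested,
# so fall back to a single thread when it is missing.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "OpenMP threads: $OMP_NUM_THREADS"
```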

Basic submission script for Hybrid (MPI+OpenMP) applications

Batch script: Hybrid (MPI+OpenMP) job

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# OpenMP threads per MPI rank
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=2
# MPI ranks per node
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
mpirun -np $SLURM_NTASKS binary < input

Batch script: Hybrid (MPI+OpenMP) job with srun

#!/bin/bash
#SBATCH --qos=regular
#SBATCH --job-name=JOB_NAME
# OpenMP threads per MPI rank
#SBATCH --cpus-per-task=4
#SBATCH --mem=200gb
#SBATCH --nodes=2
# MPI ranks per node
#SBATCH --ntasks-per-node=12

module load program/program_version

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun -n $SLURM_NTASKS binary < input
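
For hybrid jobs, the product of MPI ranks per node and threads per rank should not exceed the cores of the target node. The sketch below checks that arithmetic. The helper name `check_hybrid_layout` is hypothetical, and the core count passed in the example (52) is taken from the Platinum 8164 row of the specifications.

```shell
#!/bin/bash
# Hypothetical helper: check that ranks x threads fits on one node.
# Arguments: <ntasks-per-node> <cpus-per-task> <cores per node>
check_hybrid_layout() {
    local tasks="$1" threads="$2" cores="$3"
    local needed=$(( tasks * threads ))
    if [ "$needed" -gt "$cores" ]; then
        echo "oversubscribed: $needed cores needed, $cores available" >&2
        return 1
    fi
    echo "ok: $needed of $cores cores used per node"
}

# The hybrid scripts above request 12 ranks x 4 threads per node.
check_hybrid_layout 12 4 52
```

The call prints "ok: 48 of 52 cores used per node". With a 24-core node as the third argument, the function reports oversubscription and returns a non-zero status.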

Software

Compiling your code

Intel compilers are recommended for building your applications on Atlas FDR. There is no system default modulefile that takes care of this. Use the module avail command to see what versions are available and load an Intel compiler module before compiling. For example:

$ module load intel/2019b

Notice that when a compiler module is loaded, some environment variables are set or modified to add the paths to certain commands, include files, or libraries, to your environment. This helps to simplify the way you do your work.
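
As a sketch of what a build step might look like once the module is loaded, the hypothetical helper below loads a compiler module and compiles an MPI C program. It assumes that the loaded module provides the Intel MPI wrapper `mpiicc`. The source and binary names are placeholders.

```shell
#!/bin/bash
# Hypothetical helper: load a compiler module and build an MPI C program.
# Arguments: <module> <source file> <output binary>
build_with_intel() {
    local mod="$1" src="$2" out="$3"
    module load "$mod" || return 1
    # mpiicc is the Intel MPI wrapper around the Intel C compiler; it is
    # assumed here that the loaded module provides it.
    mpiicc -O2 -o "$out" "$src"
}

# Example on a login node (file names are placeholders):
#   build_with_intel intel/2019b main.c binary
```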

As an alternative, Atlas FDR also offers a collection of open source tools such as compilers and scientific libraries. Use the module avail command to see the available versions. For example:

$ module load foss/2017b
$ module load FFTW/3.3.6-foss-2017b
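
With these two modules loaded, a program that uses FFTW might be built roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, the file names are placeholders, and it presumes the FFTW module adds its include and library directories to the compiler search paths.

```shell
#!/bin/bash
# Hypothetical helper: build a C program against FFTW with the GNU-based
# toolchain. Arguments: <source file> <output binary>
build_with_fftw() {
    local src="$1" out="$2"
    module load foss/2017b FFTW/3.3.6-foss-2017b || return 1
    # Assumption: the FFTW module adds its include and library directories
    # to the compiler search paths, so -lfftw3 is enough to link.
    gcc -O2 -o "$out" "$src" -lfftw3 -lm
}

# Example on a login node (file names are placeholders):
#   build_with_fftw fft_demo.c fft_demo
```

If the link step cannot find the library, `module show FFTW/3.3.6-foss-2017b` lists the paths the module sets, which can then be passed explicitly with -I and -L.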