Data Transfer Documentation for DIPC Supercomputing Center

Overview

This guide explains how to transfer data to and from the DIPC Supercomputing Center's systems. It includes detailed explanations of SCP, SFTP, and rsync, suitable for users of varying technical backgrounds.

Understanding File Systems at DIPC

  • Home Directories on Atlas (/dipc): Shared between Atlas EDR and Atlas FDR; not accessible from Hyperion.
  • Home Directories on Hyperion (/home): Only accessible from Hyperion.
  • Scratch File System (/scratch): High-speed storage on the HPC systems. Note: Atlas EDR, Atlas FDR, and Hyperion each have their own /scratch directory, so data placed in one is not visible from the others.
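
The layout above can be captured in a small lookup when scripting transfers. This is only a sketch: the function name dipc_home_prefix and the short system labels are hypothetical and not part of any DIPC tooling.

```shell
# Hypothetical helper: map a system label to its home directory prefix,
# following the layout described above. The labels are illustrative only.
dipc_home_prefix() {
    case "$1" in
        atlas-edr|atlas-fdr) echo "/dipc" ;;   # shared between both Atlas systems
        hyperion)            echo "/home" ;;   # visible only from Hyperion
        *) echo "unknown system: $1" >&2; return 1 ;;
    esac
}

# Example: build a remote target path for a transfer to Atlas EDR.
target="$(dipc_home_prefix atlas-edr)/username/target_dir"
echo "$target"
```

The resulting path can then be used as the destination in the SCP, SFTP, or rsync commands described below.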

File Transfer Methods

SCP

  • SCP copies files between hosts over an encrypted SSH connection.
  • It is best suited to one-off transfers of small to medium-sized files.

SCP Command Reference

Command | Description
--- | ---
scp file user@host:path | Transfer a single file to a remote host.
scp user@host:file path | Transfer a single file from a remote host.
scp -r dir user@host:path | Recursively transfer a directory.
scp -P port file user@host:path | Transfer a file using a specific SSH port.
scp -C file user@host:path | Transfer a file with compression.

SCP Usage Examples

  • Transfer a Single File to Atlas EDR:

    scp /local/path/filename.ext username@atlas-edr.sw.ehu.es:/dipc/username/target_dir
    

  • Transfer a Single File to Atlas FDR:

    scp /local/path/filename.ext username@atlas-fdr.sw.ehu.es:/dipc/username/target_dir
    

  • Transfer Multiple Files to Atlas EDR:

    scp /local/path/{file1.ext,file2.ext} username@atlas-edr.sw.ehu.es:/dipc/username/target_dir
    

  • Transfer Multiple Files to Atlas FDR:

    scp /local/path/{file1.ext,file2.ext} username@atlas-fdr.sw.ehu.es:/dipc/username/target_dir
    

  • Transfer a Directory Recursively to Atlas EDR:

    scp -r /local/path/directory username@atlas-edr.sw.ehu.es:/dipc/username/target_dir
    

  • Transfer a Directory Recursively to Atlas FDR:

    scp -r /local/path/directory username@atlas-fdr.sw.ehu.es:/dipc/username/target_dir
    

In these examples, replace username with your username at the DIPC Supercomputing Center, and replace /local/path/filename.ext, /local/path/directory, and /local/path/{file1.ext,file2.ext} with the actual paths of the files or directories you want to transfer. The brace list must not contain spaces; otherwise the shell does not expand it into separate file names.
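
The effect of a space inside a brace list can be checked locally with echo before running scp. The sketch below assumes bash is installed; brace expansion is a bash/zsh feature rather than part of POSIX sh, so bash is invoked explicitly.

```shell
# Correct: no spaces inside the braces, so bash expands the word into two paths.
good=$(bash -c 'echo /local/path/{file1.ext,file2.ext}')
echo "$good"
# prints: /local/path/file1.ext /local/path/file2.ext

# Incorrect: the space splits the word, no expansion takes place, and scp
# would receive two literal names that do not exist.
bad=$(bash -c 'echo /local/path/{file1.ext, file2.ext}')
echo "$bad"
# prints: /local/path/{file1.ext, file2.ext}
```

Listing the files individually (scp file1.ext file2.ext user@host:path) avoids the issue entirely and works in any shell.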

Secure FTP (SFTP)

SFTP provides a secure way to transfer files using the encrypted SSH protocol, but unlike SCP, it allows for interactive file management.

Key Features of SFTP

  • Interactive interface for managing files.
  • Supports file transfer, directory management, and more.

SFTP Usage Examples

  • Start an SFTP Session (use the host you want to connect to):

    sftp username@atlas-edr.sw.ehu.es
    sftp username@atlas-fdr.sw.ehu.es

  • Transfer Files within the Session:

      • put localfile.ext – Uploads a file to the remote host.
      • get remotefile.ext – Downloads a file from the remote host.
      • mkdir directory – Creates a new directory on the remote host.
      • rmdir directory – Removes a directory on the remote host.
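
The same session commands can also be run non-interactively with the -b (batch file) option of the OpenSSH sftp client. The following is a minimal sketch: the file and directory names are placeholders, and the sftp command is only printed, since batch mode needs a real account and non-interactive authentication such as an SSH key.

```shell
# Write a batch file containing the commands used in an interactive session.
batch_file="$(mktemp)"
cat > "$batch_file" <<'EOF'
-mkdir target_dir
cd target_dir
put localfile.ext
get remotefile.ext
EOF

# Print the command instead of running it; this sketch contacts no server.
echo "sftp -b $batch_file username@atlas-edr.sw.ehu.es"
```

In batch mode sftp stops at the first failing command. The leading "-" on mkdir tells it to continue even if the directory already exists.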

rsync for Advanced File Synchronization

rsync is a versatile file synchronization tool that optimizes file transfers by only syncing the changes made, which significantly reduces data transfer volume.

Understanding rsync

  • Efficiency: Transfers only the differences between source and destination, saving bandwidth and time.
  • Versatility: Suitable for both backup and mirroring purposes.
  • Security: Can run over an SSH connection for secure transfers.

rsync Command Reference

Command | Description
--- | ---
rsync -av source destination | Basic synchronization of files/directories.
rsync -avz source destination | Sync with compression.
rsync -av --delete source destination | Sync and delete destination files that are not in source.
rsync -av --progress source destination | Sync with progress display.
rsync -avz --dry-run source destination | Perform a trial run with no changes.
rsync -av --exclude 'pattern' source destination | Sync while excluding certain files.
rsync -av --bwlimit=KBPS source destination | Limit the bandwidth used by rsync.

Basic rsync Commands with DIPC Hostnames and Mountpoints

  • Sync from Local to Atlas EDR:

    rsync -av /local/path/ username@atlas-edr.sw.ehu.es:/dipc/username/target_dir/

  • Sync from Local to Atlas FDR:

    rsync -av /local/path/ username@atlas-fdr.sw.ehu.es:/dipc/username/target_dir/

  • Sync from Atlas EDR to Local:

    rsync -av username@atlas-edr.sw.ehu.es:/dipc/username/source_dir/ /local/path/

  • Sync from Atlas FDR to Local:

    rsync -av username@atlas-fdr.sw.ehu.es:/dipc/username/source_dir/ /local/path/

  • Creating a Backup:

    rsync -avz --progress /local/path/ username@atlas-edr.sw.ehu.es:/dipc/username/backup/

This command creates a backup of the local directory on Atlas EDR, compressing files during transfer and showing progress.

Remember to replace username, /local/path/, and the /dipc/username/... paths with your actual username and paths. Note the trailing slash on the source: /local/path/ copies the contents of the directory, whereas /local/path creates a directory named path inside the destination.

When to Use rsync

  • Large Data Sets: When transferring large numbers of files or large-sized files.
  • Regular Backups: For incremental backups, as it only transfers changed files.
  • Bandwidth Limitations: When operating over limited or costly bandwidth.
  • Data Mirroring: To maintain an exact replica of directories in different locations.

Comparison Table of File Transfer Methods

Feature | SCP | SFTP | rsync
--- | --- | --- | ---
Protocol | SSH | SSH | SSH (optional)
Security | High (SSH encryption) | High (SSH encryption) | High (if used with SSH)
Use Case | Quick file transfer | Interactive file management | Efficient synchronization
Transfer Speed | Fast for small files | Moderate | Fast, optimized for large data sets
Compression | Available (-C option) | Available (-C option) | Available (-z option)
Resume Ability | No | Yes (reget/reput) | Yes (--partial)
File Sync | No | No | Yes
Bulk Transfer | Less efficient | Less efficient | Highly efficient
Directory Sync | Possible (with -r option) | Possible | Native capability
Ease of Use | Simple for basic use | Interactive, more complex | Complex, but powerful
Data Integrity | Basic | Basic | Advanced (checksums)
Customization | Limited | Extensive (interactive session) | Highly customizable (numerous flags)

Windows Users

GUI Clients for SCP/SFTP

  • GUI tools like WinSCP or FileZilla simplify file transfer with a user-friendly interface.

Usage:

  • Connect using SSH protocol (default port 22) with credentials.
  • Drag and drop files between the local machine and the server.

Using WinSCP

  1. Download and Install: Obtain WinSCP from its official website.
  2. Connect to Atlas:
       • Host name: atlas-edr.sw.ehu.es or atlas-fdr.sw.ehu.es.
       • User name: your DIPC username.
       • File protocol: SCP or SFTP, as needed.
  3. Transfer Files: Drag and drop files or directories to transfer.

Best Practices for File Transfer

Secure Handling of Data

  • Strong Passwords: Always use complex, unique passwords for your accounts.
  • SSH Key Management: Generate a secure SSH key pair for authentication and keep your private key in a safe, encrypted location.
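
As an illustration of key-based access, an SSH client configuration entry can bundle the hostname, username, and key file under a short alias. This is a sketch under assumptions: the alias atlas-edr and the key path ~/.ssh/id_ed25519 are hypothetical choices, and whether key authentication is enabled for your account should be confirmed with the center.

```shell
# Write the sketch to a scratch file; in practice these lines go in ~/.ssh/config.
config_file="$(mktemp)"
cat > "$config_file" <<'EOF'
Host atlas-edr
    HostName atlas-edr.sw.ehu.es
    User username
    IdentityFile ~/.ssh/id_ed25519
EOF
cat "$config_file"

# Typical follow-up commands (shown only, not executed here):
#   ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519     # prompts for a passphrase
#   ssh-copy-id -i ~/.ssh/id_ed25519.pub username@atlas-edr.sw.ehu.es
#   scp -F "$config_file" filename.ext atlas-edr:/dipc/username/target_dir
```

Protecting the private key with a passphrase keeps it encrypted on disk, in line with the recommendation above.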

Safe Transfer Protocols

  • Encrypted Transfers: SCP and SFTP always run over SSH, so data is encrypted in transit. rsync is encrypted only when it runs over SSH, as in the user@host:path examples in this guide.
  • Regular Updates: Keep your SSH clients and servers updated to the latest version for security patches and enhancements.

Handling Sensitive Information

  • Data Classification: Be aware of the type of data you are transferring and comply with relevant data protection policies.
  • Minimize Data Exposure: Transfer only necessary files and avoid transferring highly sensitive data unless absolutely necessary.

Performance Optimization Tips

Efficient Data Transfer

  • Off-Peak Hours: Schedule large transfers during off-peak hours to avoid network congestion.
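  • Compress Data: Use compression options (such as -z in rsync or -C in scp) to reduce the amount of data sent. This helps most on slow links and with compressible data such as text; files that are already compressed gain little.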

Network Considerations

  • Bandwidth Throttling: Use bandwidth limit options (e.g., --bwlimit in rsync) to prevent overuse of network resources.
  • Stable Connection: Ensure a stable and reliable network connection to avoid interruptions during large transfers.
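
When a connection is unreliable, a transfer can be wrapped in a simple retry loop so that rsync picks up where it left off. The helper below is a hypothetical sketch (the name retry, the RETRY_DELAY variable, and the limit of 5 attempts are all illustrative); it is demonstrated with a stand-in command so that it runs without network access.

```shell
# Hypothetical helper: run a command up to $1 times, pausing between attempts.
retry() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-30}"
    done
}

# Intended use (not executed here, since it needs a real account):
#   retry 5 rsync -av --partial --bwlimit=5000 /local/path/ \
#       username@atlas-edr.sw.ehu.es:/dipc/username/target_dir/

# Local demonstration with a stand-in command that fails twice, then succeeds.
counter_file="$(mktemp)"
echo 0 > "$counter_file"
flaky() {
    n=$(( $(cat "$counter_file") + 1 ))
    echo "$n" > "$counter_file"
    [ "$n" -ge 3 ]
}
RETRY_DELAY=0 retry 5 flaky && echo "succeeded after $(cat "$counter_file") attempts"
```

With --partial, rsync keeps partially transferred files so a later attempt can reuse them, and --bwlimit caps the transfer rate.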

File Management

  • Incremental Backups: For regular backups, use tools like rsync to transfer only changed files, reducing the total amount of data transferred.
  • Split Large Files: Consider splitting very large files into smaller chunks to improve transfer reliability and resume capability.
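
Splitting and reassembling can be done with the standard split and cat utilities. The sketch below uses a small random file as a stand-in for a large one; the file names and the 1024-byte chunk size are chosen only to keep the demonstration quick.

```shell
# Work in a throwaway directory.
work="$(mktemp -d)"

# Create a 3000-byte stand-in for a very large file.
head -c 3000 /dev/urandom > "$work/big.dat"

# Split into 1024-byte chunks named big.dat.part_aa, big.dat.part_ab, ...
split -b 1024 "$work/big.dat" "$work/big.dat.part_"

# (Transfer the chunks with scp or rsync, then on the receiving side:)
cat "$work"/big.dat.part_* > "$work/rebuilt.dat"

# Verify that the reassembled file is identical to the original.
cmp "$work/big.dat" "$work/rebuilt.dat" && echo "reassembled file matches"
```

For real data a much larger chunk size would be used, and comparing checksums of the original and the reassembled file on both ends serves the same purpose as cmp.
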

Additional Resources

  • SCP (Secure Copy Protocol): SCP is typically included in most SSH implementations. For more information on SCP and its usage, refer to the OpenSSH SCP Documentation.
  • SFTP (SSH File Transfer Protocol): Like SCP, SFTP comes with most SSH implementations. To learn more about SFTP, visit the OpenSSH SFTP Documentation.
  • rsync: rsync is a standalone utility often used for backup and file synchronization. For detailed information, tutorials, and updates, visit the rsync Official Website.
  • WinSCP for Windows: WinSCP is a popular SFTP and SCP client for Windows. You can download it and find more information on the WinSCP Official Website.
  • FileZilla for SFTP: FileZilla supports SFTP and is available for various platforms. Find more details at the FileZilla Official Website.