
AlphaFold3


AlphaFold3 Terms of Use and Restrictions

Important Legal Notice: AlphaFold3 model parameters and outputs are subject to strict terms of use:

  • Non-commercial use only: Available exclusively for non-commercial research by academic institutions, non-profit organizations, and educational, journalism, and government bodies
  • No commercial activities: Cannot be used for any commercial purposes, including research on behalf of commercial organizations
  • No model training: Outputs cannot be used to train other machine learning models for biomolecular structure prediction
  • Model parameters are confidential: Do not share or publish the model weights outside your organization
  • Clinical use prohibited: Predictions are for theoretical modeling only and must not be used for clinical purposes or medical advice

By using AlphaFold3 on this system, you agree to comply with the full Terms of Use. Violations may result in termination of access and legal consequences.

Contact your institution's legal department if you have questions about compliance, especially regarding collaborations with commercial entities.

Overview

AlphaFold3 represents a significant advancement in computational structural biology: it extends beyond protein folding to predict the joint structure of entire biomolecular complexes. This third generation of AlphaFold, developed by Google DeepMind and Isomorphic Labs, models proteins, nucleic acids (DNA and RNA), small molecules, ions, and the interactions between them.

On the Hyperion cluster, AlphaFold3 is deployed as an optimized Apptainer container, providing researchers with access to this technology while ensuring reproducibility and efficient resource utilization.

Key Capabilities

  • Multi-molecular complexes: Model proteins, DNA, RNA, and ligands together
  • Post-translational modifications: Include phosphorylation, methylation, and other PTMs
  • Covalent interactions: Specify bonds between proteins and ligands
  • High accuracy: Can reach near-atomic accuracy for well-folded domains, as indicated by high confidence (pLDDT) scores

Getting started

Environment modules

The AlphaFold3 installation on Hyperion uses the Lmod module system. Loading the module automatically configures all necessary paths and dependencies.

module load AlphaFold/3.0.1

This command sets up your environment with:

  • The AlphaFold3 container path
  • Access to model weights (neural network parameters)
  • Reference databases (UniRef90, PDB, and RNA sequences)
  • Optimized wrapper scripts for simplified execution
  • Automatic Apptainer module loading

Module environment variables

| Variable | Description |
|---|---|
| AF3_CONTAINER | Path to the AlphaFold3 Apptainer container |
| AF3_WEIGHTS | Location of the neural network model parameters |
| AF3_DB | Location of the reference sequence databases used for MSA generation |
| PATH | Updated to include the AlphaFold3 wrapper scripts |

Available commands

| Command | Description |
|---|---|
| alphafold3-run | Main prediction wrapper that handles container execution |
| alphafold3-check | Validates the installation and environment setup |
| alphafold3-help | Displays comprehensive usage information |

Verifying the environment

Before running your first prediction, verify that everything is properly configured:

module load AlphaFold/3.0.1
alphafold3-check
Expected output
======================================
AlphaFold3 Installation Check
======================================

Environment Variables:
----------------------
AF3_CONTAINER: <location of the container file>
AF3_WEIGHTS:   <location of neural network parameters>
AF3_DB:        <location of reference sequence database>

Component Status:
-----------------
✓ Container found (2.8G)
✓ Model weights found (1.1G)
✓ Database directory found (10 files)

Key Database Files:
  ✓ uniref90_2022_05.fa
  ✓ pdb_seqres_2022_09_28.fasta
  ✓ bfd-first_non_consensus_sequences.fasta

GPU Status:
-----------
✓ GPU detected: NVIDIA A100-SXM4-80GB, 81920 MiB
  CUDA_VISIBLE_DEVICES: 0

SLURM Context:
--------------
Job ID: 3209670
Partition: preemption
QOS: regular
CPUs: 1
Memory: 204800

Container Test:
---------------
Testing container execution...
Python OK
✓ Container execution successful

======================================
Check complete

Quick start guide

Interactive session

For development and testing, interactive sessions provide immediate feedback:

# Request one GPU for 2 hours
srun --gres=gpu:1 --mem=100G --time=02:00:00 --partition=general --qos=regular --pty bash

# Load the AlphaFold3 module
module load AlphaFold/3.0.1

# Run your prediction
alphafold3-run input.json output_dir/

Interactive mode is ideal for:

  • Testing input files before batch submission
  • Debugging failed predictions
  • Small proteins requiring quick results
  • Learning the system

Batch submission

For production runs, submit jobs through SLURM:

Basic batch script
#!/bin/bash
#SBATCH --job-name=af3_prediction
#SBATCH --partition=general
#SBATCH --qos=regular
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100-pcie
#SBATCH --cpus-per-task=48
#SBATCH --mem=100GB
#SBATCH --output=%x_%j.out
#SBATCH --error=%x_%j.err

module load AlphaFold/3.0.1

# Your input file and output directory
alphafold3-run complex_input.json results_${SLURM_JOB_ID}/

Input format specification

AlphaFold3 uses JSON files to define molecular systems for prediction. This section covers the essential input format requirements.

Basic structure

Every AlphaFold3 input requires five mandatory fields:

{
  "name": "my_prediction",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MAFSAEDVLK..."
      }
    }
  ],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}
| Field | Type | Description |
|---|---|---|
| name | string | Job identifier (alphanumeric, hyphens, underscores) |
| sequences | array | List of molecular entities |
| modelSeeds | array | Random seeds for generating multiple models (e.g., [1, 2, 3]) |
| dialect | string | Must be exactly "alphafold3" |
| version | integer | Format version: 1, 2, 3, or 4 |
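If you generate many inputs, building them in code avoids typos in the mandatory fields. The sketch below is a minimal, protein-only helper; the function name make_af3_input is hypothetical and not part of the Hyperion installation, and the sequence is a placeholder fragment.

```python
import json


def make_af3_input(name, protein_chains, seeds=(1, 2, 3)):
    """Build a minimal AlphaFold3 input dict for protein-only jobs.

    protein_chains maps a chain ID (e.g. "A") to a one-letter amino acid
    sequence; each chain becomes its own "protein" entry.
    """
    sequences = [
        {"protein": {"id": [chain_id], "sequence": sequence}}
        for chain_id, sequence in protein_chains.items()
    ]
    return {
        "name": name,
        "sequences": sequences,
        "modelSeeds": list(seeds),  # integers; each seed yields its own models
        "dialect": "alphafold3",    # must be exactly this string
        "version": 1,               # an integer, not the string "1"
    }


# Example with a placeholder sequence fragment (not a real protein)
job = make_af3_input("my_prediction", {"A": "MAFSAEDVLK"})
print(json.dumps(job, indent=2))
```

Redirect the printed JSON to a file (or use json.dump) to create input.json. For homodimers you can instead put both chain IDs in a single entry, as described under Proteins.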

Molecular entity types

Proteins

{
  "protein": {
    "id": ["A"],
    "sequence": "MAFSAEDVLKEYDRRRM..."
  }
}
  • id: Chain identifier(s). Use ["A", "B"] for homodimers
  • sequence: Standard single-letter amino acid codes

Nucleic acids

{
  "rna": {
    "id": ["R"],
    "sequence": "AUGCAUGC"
  }
}
{
  "dna": {
    "id": ["D"],
    "sequence": "ATGCATGC"
  }
}

Ligands

{
  "ligand": {
    "id": ["L"],
    "smiles": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
  }
}

SMILES strings are recommended for custom molecules. Generate them from MOL2/SDF files with OpenBabel or RDKit (see Format conversion below).

{
  "ligand": {
    "id": ["L"],
    "ccdCodes": ["ATP"]
  }
}

CCD codes are recommended for standard PDB ligands. Look up codes in the RCSB PDB Chemical Component Dictionary.

Array requirement for CCD codes

The ccdCodes field must always be an array: ["ATP"] not "ATP"

Ions

AlphaFold3 has no separate ion entity type: specify ions as ligands using their CCD code.

{
  "ligand": {
    "id": ["I"],
    "ccdCodes": ["MG"]
  }
}

Common ion codes: MG (magnesium), ZN (zinc), CA (calcium), FE (iron)

Complete examples

Protein-ligand complex (GLUT1 + glucose)
{
  "name": "GLUT1_glucose_complex",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MEPSDKDKKE..."
      }
    },
    {
      "ligand": {
        "id": ["B"],
        "smiles": "C([C@@H]1[C@H]([C@@H]([C@H](C(O1)O)O)O)O)O"
      }
    }
  ],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}
Protein-DNA complex with metal ion
{
  "name": "zinc_finger_DNA",
  "sequences": [
    {
      "protein": {
        "id": ["A"],
        "sequence": "MTCPECGE..."
      }
    },
    {
      "dna": {
        "id": ["D"],
        "sequence": "GCGTGGGCG"
      }
    },
    {
      "ligand": {
        "id": ["Z"],
        "ccdCodes": ["ZN"]
      }
    }
  ],
  "modelSeeds": [1, 2, 3, 4, 5],
  "dialect": "alphafold3",
  "version": 1
}

Format conversion

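MOL2 and SDF formats are not directly supported. Convert them to SMILES first.

With OpenBabel (command line):

obabel input.mol2 -O output.smi

With RDKit (Python):

from rdkit import Chem

# From MOL2 (MolFromMol2File returns None if the file cannot be parsed)
mol = Chem.MolFromMol2File('input.mol2')
if mol is not None:
    print(Chem.MolToSmiles(mol))

# From SDF (one SMILES per molecule; unparsable records come back as None)
suppl = Chem.SDMolSupplier('input.sdf')
for mol in suppl:
    if mol is not None:
        print(Chem.MolToSmiles(mol))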

Advanced features

For advanced use cases, consult the official documentation:

AlphaFold3 Input Format Documentation →

Advanced topics include:

  • Post-translational modifications (phosphorylation, glycosylation)
  • Covalent bond specifications between entities
  • Custom MSA and template provision
  • User-defined chemical components
  • Chain stoichiometry and multiple copies

Understanding resource requirements

GPU selection strategy

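AlphaFold3's memory requirements depend on the size of your molecular system, measured in tokens (see Token calculation below). The Hyperion cluster provides two GPU types, RTX 3090 and A100. The four configurations below run from the smallest to the largest systems; as a rule of thumb, systems under 2000 tokens fit on an RTX 3090 and larger systems need an A100: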

GPU: RTX 3090
System RAM: 32-48 GB
Expected runtime: 2-4 hours

#SBATCH --gres=gpu:1
#SBATCH --constraint=rtx3090
#SBATCH --mem=32G

GPU: RTX 3090
System RAM: 48 GB
Expected runtime: 4-8 hours

#SBATCH --gres=gpu:1
#SBATCH --constraint=rtx3090
#SBATCH --mem=48G

GPU: A100
System RAM: 64 GB
Expected runtime: 8-12 hours

#SBATCH --gres=gpu:1
#SBATCH --constraint=a100-pcie
#SBATCH --mem=64G

GPU: A100
System RAM: 128 GB
Expected runtime: 12-24 hours

#SBATCH --gres=gpu:1
#SBATCH --constraint=a100-pcie
#SBATCH --mem=128G

Token calculation

Understanding token counts is crucial for resource planning:

| Component | Tokens |
|---|---|
| Proteins | 1 token per amino acid residue |
| DNA/RNA | 1 token per nucleotide |
| Small molecules/ligands | 1 token per heavy (non-hydrogen) atom |
| Metal ions | 1 token each |
| Modified residues | Tokenized as individual atoms |

Token calculation example

A protein complex with:

  • 500 amino acid residues = 500 tokens
  • ATP ligand (31 heavy atoms) = 31 tokens
  • Mg²⁺ ion = 1 token

Total: 532 tokens → Use RTX 3090 with 32GB RAM
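The same arithmetic can be scripted for planning many jobs. This is a rough sketch: the helper names are hypothetical, ligand heavy-atom counts must be supplied by you (for example from RDKit), modified residues are not handled, and the 2000-token cut-off is the rule of thumb from the quick-reference card rather than a hard limit.

```python
def estimate_tokens(protein_residues=0, nucleotides=0, ligand_heavy_atoms=(), ions=0):
    """Estimate AlphaFold3 tokens using the per-component rules in the table above."""
    return protein_residues + nucleotides + sum(ligand_heavy_atoms) + ions


def suggest_gpu(tokens, threshold=2000):
    # Rule-of-thumb threshold from the quick-reference card, not a hard limit
    return "RTX 3090" if tokens < threshold else "A100"


# The worked example above: 500 residues + ATP (31 heavy atoms) + one Mg2+ ion
tokens = estimate_tokens(protein_residues=500, ligand_heavy_atoms=[31], ions=1)
print(tokens, suggest_gpu(tokens))  # 532 RTX 3090
```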

Memory considerations

AlphaFold3 runs in two phases with very different resource profiles:

  1. MSA Generation (CPU-intensive): Searches sequence databases for evolutionarily related sequences
  2. Structure Inference (GPU-intensive): Predicts 3D coordinates using the neural network

Quadratic scaling

GPU memory use scales roughly quadratically with the number of tokens, because the network's pair representation and attention operations involve every pair of tokens.
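A quick way to reason about quadratic scaling is to compare two system sizes directly. The sketch below assumes memory is purely proportional to the square of the token count, so it ignores fixed costs such as the model weights and will overestimate the ratio for small systems; treat it as an order-of-magnitude guide.

```python
def relative_gpu_memory(tokens, reference_tokens):
    """Rough ratio of GPU memory for `tokens` vs `reference_tokens`,
    assuming memory grows with the square of the token count."""
    return (tokens / reference_tokens) ** 2


# Doubling the system size roughly quadruples the memory needed
print(relative_gpu_memory(1000, 500))  # 4.0
```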

Job submission templates

Standard single job

Standard batch script
#!/bin/bash
#SBATCH --job-name=af3_protein
#SBATCH --partition=general
#SBATCH --qos=regular
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100-pcie
#SBATCH --cpus-per-task=48
#SBATCH --mem=100GB
#SBATCH --output=logs/%x_%j.out
#SBATCH --error=logs/%x_%j.err

# NOTE: logs/ must exist BEFORE you run sbatch, because Slurm opens the
# --output/--error files before this script starts. Create it once:
#   mkdir -p logs

# Load AlphaFold3
module load AlphaFold/3.0.1

# Define paths
INPUT_JSON="input.json"
OUTPUT_DIR="af3_output_${SLURM_JOB_ID}"

# Run prediction and keep its exit status
alphafold3-run "${INPUT_JSON}" "${OUTPUT_DIR}"
status=$?

# Report completion
if [ ${status} -eq 0 ]; then
    echo "Successfully completed at $(date)"
    echo "Results saved to: ${OUTPUT_DIR}"
else
    echo "Prediction failed (exit code ${status}) - check error logs"
    exit ${status}
fi

Array jobs for high-throughput screening

Process multiple structures efficiently with array jobs:

Array job batch script
#!/bin/bash
#SBATCH --job-name=af3_array
#SBATCH --partition=general
#SBATCH --qos=regular
#SBATCH --gres=gpu:1
#SBATCH --constraint=a100-pcie
#SBATCH --cpus-per-task=48
#SBATCH --mem=100GB
#SBATCH --array=1-10%2    # Process 10 structures, max 2 concurrent
#SBATCH --output=logs/array_%A_%a.out
#SBATCH --error=logs/array_%A_%a.err

module load AlphaFold/3.0.1

# NOTE: create logs/ before submitting (mkdir -p logs); Slurm opens the
# --output/--error files before this script runs.

# Directory structure for array jobs
INPUT_DIR="inputs"
OUTPUT_DIR="outputs"

# Each array task processes one input file (inputs/input_1.json ... input_10.json)
INPUT_FILE="${INPUT_DIR}/input_${SLURM_ARRAY_TASK_ID}.json"
JOB_OUTPUT="${OUTPUT_DIR}/job_${SLURM_ARRAY_TASK_ID}"

mkdir -p "${JOB_OUTPUT}"

echo "Processing structure ${SLURM_ARRAY_TASK_ID} of ${SLURM_ARRAY_TASK_COUNT}"
alphafold3-run "${INPUT_FILE}" "${JOB_OUTPUT}"
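The array script expects one file per task, named inputs/input_<task id>.json with IDs starting at 1. One way to produce that layout is the sketch below; the helper name write_array_inputs is hypothetical, and each entry of `jobs` is assumed to be a complete AlphaFold3 input dict.

```python
import json
from pathlib import Path


def write_array_inputs(jobs, input_dir="inputs"):
    """Write one AlphaFold3 input per array task: inputs/input_1.json, ...

    `jobs` is a list of complete AlphaFold3 input dicts; task IDs start
    at 1 to match `#SBATCH --array=1-N`.
    """
    out = Path(input_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for task_id, job in enumerate(jobs, start=1):
        path = out / f"input_{task_id}.json"
        path.write_text(json.dumps(job, indent=2))
        paths.append(path)
    return paths
```

After writing N files, set the array range to match, e.g. `--array=1-N%2` to keep at most two tasks running at once.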

Understanding the wrapper script

How the wrapper works

The alphafold3-run wrapper script abstracts the complexity of container execution. The command

alphafold3-run input.json output/

runs roughly the following Apptainer command:
apptainer exec --nv \
    --bind ${AF3_WEIGHTS}:/root/models:ro \
    --bind ${AF3_DB}:/root/public_databases:ro \
    --bind $(pwd):/work \
    --pwd /work \
    ${AF3_CONTAINER} \
    python /app/alphafold/run_alphafold.py \
    --json_path=input.json \
    --output_dir=output \
    --db_dir=/root/public_databases \
    --model_dir=/root/models

The wrapper handles:

  • Container GPU enablement (--nv flag)
  • Read-only mounting of databases and weights
  • Working directory binding
  • Path translations between host and container

Direct container usage

For advanced users needing custom parameters:

Direct container execution with custom options
module load AlphaFold/3.0.1

# Direct execution with custom options
apptainer exec --nv \
    --bind ${AF3_WEIGHTS}:/root/models:ro \
    --bind ${AF3_DB}:/root/public_databases:ro \
    --bind $(pwd):/work \
    --pwd /work \
    ${AF3_CONTAINER} \
    python /app/alphafold/run_alphafold.py \
    --json_path=input.json \
    --output_dir=output \
    --db_dir=/root/public_databases \
    --model_dir=/root/models \
    --num_diffusion_samples=5 \
    --flash_attention_implementation=triton \
    --buckets=256,512,768,1024,1280,1536,2048

Performance optimization

Automatic GPU configuration

Automatic optimization

The AlphaFold/3.0.1 module automatically detects your GPU type and sets memory-related environment variables to suit the hardware in your allocation, so most jobs need no manual tuning. The variables below are only needed to override those defaults.

Manual environment variables

Container environment variables

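To override the automatic settings, export the variables with the APPTAINERENV_ prefix. Apptainer strips the prefix and sets the variable inside the container (for example, APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE becomes XLA_PYTHON_CLIENT_PREALLOCATE).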

# Conservative memory usage for 24GB VRAM
export APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE=false
export APPTAINERENV_XLA_PYTHON_CLIENT_MEM_FRACTION=0.85
export APPTAINERENV_TF_FORCE_GPU_ALLOW_GROWTH=true

These settings prevent memory exhaustion on smaller GPUs by:

  • Disabling memory preallocation
  • Limiting usage to 85% of VRAM
  • Allowing gradual memory growth
# Maximum performance for 40/80GB VRAM
export APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE=true
export APPTAINERENV_XLA_PYTHON_CLIENT_MEM_FRACTION=0.95

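# For very large systems: unified memory lets GPU allocations spill into
# host RAM (slower; use these instead of the preallocation settings above)
export APPTAINERENV_XLA_PYTHON_CLIENT_PREALLOCATE=false
export APPTAINERENV_TF_FORCE_UNIFIED_MEMORY=true
export APPTAINERENV_XLA_CLIENT_MEM_FRACTION=3.2

These settings maximize performance by:

  • Preallocating GPU memory
  • Using 95% of available VRAM
  • Optionally enabling unified memory for systems too large for GPU memory alone (request enough --mem to back the overflow)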

JAX compilation cache

Reduce startup time significantly by caching compiled kernels:

# Add to your SLURM script
export JAX_COMPILATION_CACHE_DIR=/scratch/$USER/.jax_cache
mkdir -p $JAX_COMPILATION_CACHE_DIR

Cache benefits

  • First run: ~10-15 minutes compilation time
  • Subsequent runs: Skip compilation, save time
  • Persistence: Cache persists across jobs
  • Sharing: Shared cache possible for research groups

Optimization strategies

Pre-compilation strategy

Run a minimal test case first to populate the JAX cache, then use the cached compilation for production runs. This saves 10-15 minutes on each subsequent run.

MSA reuse

Save MSA results from similar sequences to skip redundant database searches. This can reduce CPU time by 4-8 hours for related proteins.
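AlphaFold3 writes a <job_name>_data.json file that contains your input plus the MSAs and templates found by the data pipeline, and run_alphafold.py accepts such a file as input when run with --norun_data_pipeline. A minimal sketch for rerunning the same sequences with new seeds, without repeating the database search, is shown below; the helper names are hypothetical, and the stored MSAs are only valid for the exact sequences they were computed for.

```python
import json
from pathlib import Path


def reuse_msas(data_json, seeds, name=None):
    """Build an inference-only input from a previous run's <job_name>_data.json.

    The entities, including the MSA and template fields that the data
    pipeline added, are kept unchanged; only the seeds (and optionally
    the job name) are replaced.
    """
    job = json.loads(Path(data_json).read_text())
    job["modelSeeds"] = list(seeds)
    if name is not None:
        job["name"] = name  # a new name keeps the new outputs separate
    return job


def write_job(job, path):
    """Save an input dict as JSON and return its path."""
    Path(path).write_text(json.dumps(job, indent=2))
    return Path(path)
```

Pass the saved file to the direct container command (see Direct container usage) with --norun_data_pipeline added, so only GPU inference runs.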

Batch processing

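  • Group proteins of similar size together so they are padded to the same size bucket and reuse the same compiled model
  • Match resource requests to system size to keep GPU utilization high
  • Avoid over-requesting resources to reduce overall queue wait times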

Off-peak submission

Submit large jobs during night hours (22:00-06:00) for better resource availability and faster queue movement.

Output structure and files

Directory organization

AlphaFold3 creates a structured output directory with all prediction results:

output_dir/
└── <job_name>/                                # Sanitized job name
    ├── <job_name>_model.cif                   # Top-ranked structure
    ├── <job_name>_confidences.json            # Confidence metrics for the top model
    ├── <job_name>_summary_confidences.json    # Summary scores for the top model
    ├── <job_name>_ranking_scores.csv          # Ranking score of every seed/sample
    ├── <job_name>_data.json                   # Input plus MSAs/templates from the data pipeline
    └── seed-<seed>_sample-<n>/                # Per-sample structure and confidence files

File descriptions

| File | Contents | Typical use |
|---|---|---|
| .cif files | Predicted structures in mmCIF (Crystallographic Information File) format | 3D visualization in PyMOL or ChimeraX |
| _confidences.json | Per-atom pLDDT scores, the PAE matrix, and contact probabilities | Detailed quality assessment and confidence evaluation |
| _summary_confidences.json | Summary scores such as pTM, ipTM, and the ranking score | Quick comparison of models |
| _data.json | The input augmented with the MSAs and templates found by the data pipeline | Reusing MSAs to skip the database search in later runs |
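For a quick look at a finished run you can read the headline scores from a summary file. This sketch assumes the keys ptm, iptm, ranking_score, and has_clash, which AlphaFold3 writes to its summary confidences; any key that is missing comes back as None, and ipTM is only meaningful for multi-chain predictions. The helper name is hypothetical.

```python
import json
from pathlib import Path

# Keys assumed from AlphaFold3's summary file; missing keys come back as None.
HEADLINE_KEYS = ("ptm", "iptm", "ranking_score", "has_clash")


def headline_scores(summary_json):
    """Return the headline scores from a *_summary_confidences.json file."""
    data = json.loads(Path(summary_json).read_text())
    return {key: data.get(key) for key in HEADLINE_KEYS}
```

Pass it the path of any *_summary_confidences.json file from your output directory and print the result.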

Storage planning

| System type | Size | Output size | Storage recommendation |
|---|---|---|---|
| Small protein | 500 residues | ~50 MB | Standard quota sufficient |
| Medium complex | 2000 tokens | ~200 MB | Consider compression |
| Large assembly | 5000 tokens | ~500 MB | Use scratch, then archive |

Troubleshooting guide

Common issues and solutions

Out of Memory Error

Error Message:

RuntimeError: Resource exhausted: Out of memory

Solutions:

  1. Switch to larger GPU:

    #SBATCH --constraint=a100-pcie  # Instead of rtx3090
    

  2. Increase system memory:

    #SBATCH --mem=128GB  # Increase from default
    

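  3. Reduce prediction seeds (for example, one seed instead of five). Seeds are run one after another, so this mainly shortens runtime and may not lower peak GPU memory:

    "modelSeeds": [1]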
    

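  4. Split the run into a CPU-only MSA stage and a GPU inference stage: run run_alphafold.py with --norun_inference, then run it again with --norun_data_pipeline on the generated _data.json (see Direct container usage)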

Process Killed by SLURM

Error Message:

slurmstepd: error: Detected 1 oom-kill event(s)

Cause: System RAM exhaustion (not GPU memory)

Solution: Increase memory allocation:

#SBATCH --mem=200GB  # Generous allocation for large systems

GPU Not Found

Error Message:

RuntimeError: No CUDA GPUs are available

Checklist:

  1. Verify GPU request in SLURM:

    #SBATCH --gres=gpu:1
    

  2. Check module is loaded:

    module list  # Should show AlphaFold/3.0.1
    

  3. Confirm you're on GPU partition:

    #SBATCH --partition=gpu  # or 'general' with GPU request
    

Invalid JSON Format

Error Message:

ValueError: AlphaFold 3 input JSON must contain `dialect` and `version` fields

Solution: Ensure your JSON includes all required fields:

{
  "name": "my_protein",
  "sequences": [...],
  "modelSeeds": [1, 2, 3],
  "dialect": "alphafold3",
  "version": 1
}

The dialect value must be exactly "alphafold3", and version must be an integer (1, not "1").

Validation command:

python -m json.tool input.json  # Checks JSON syntax
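json.tool only checks syntax. The sketch below also checks the AlphaFold3-specific rules from this page: the five required fields, the exact dialect string, an integer version, a non-empty seed list, and ccdCodes given as an array. It assumes the entity keys accepted by AlphaFold3 are protein, rna, dna, and ligand (ions are ligands with a CCD code). The script and function names are hypothetical, and it is not a full schema validator.

```python
import json
import sys

REQUIRED_FIELDS = ("name", "sequences", "modelSeeds", "dialect", "version")
ENTITY_TYPES = ("protein", "rna", "dna", "ligand")


def check_af3_input(job):
    """Return a list of problems found in an AlphaFold3 input dict."""
    problems = [f"missing field: {key}" for key in REQUIRED_FIELDS if key not in job]
    if "dialect" in job and job["dialect"] != "alphafold3":
        problems.append('dialect must be exactly "alphafold3"')
    # bool is a subclass of int in Python, so exclude it explicitly
    if "version" in job and (not isinstance(job["version"], int)
                             or isinstance(job["version"], bool)):
        problems.append("version must be an integer, not a string")
    if "modelSeeds" in job and not (isinstance(job["modelSeeds"], list)
                                    and job["modelSeeds"]):
        problems.append("modelSeeds must be a non-empty array")
    for i, entry in enumerate(job.get("sequences", [])):
        kinds = [key for key in entry if key in ENTITY_TYPES]
        if len(kinds) != 1:
            problems.append(f"sequences[{i}]: expected exactly one of {ENTITY_TYPES}")
            continue
        body = entry[kinds[0]]
        if "ccdCodes" in body and not isinstance(body["ccdCodes"], list):
            problems.append(f"sequences[{i}]: ccdCodes must be an array")
    return problems


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        issues = check_af3_input(json.load(fh))
    print("\n".join(issues) or "OK")
    sys.exit(1 if issues else 0)
```

Saved as, for example, check_af3_input.py, it can be run as `python check_af3_input.py input.json` before submitting; it prints OK or one line per problem.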

Data management best practices

Efficient scratch usage

The /scratch filesystem provides fast temporary storage ideal for AlphaFold3 runs:

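# Recommended workflow (inside a SLURM job script)
SCRATCH_DIR="/scratch/$USER/af3_runs/${SLURM_JOB_ID}"
RESULTS_DIR="/home/$USER/af3_results/${SLURM_JOB_ID}"
mkdir -p "${SCRATCH_DIR}" "${RESULTS_DIR}"

# Copy the input into scratch and work there
cp "${SLURM_SUBMIT_DIR}/input.json" "${SCRATCH_DIR}/"
cd "${SCRATCH_DIR}"

# Run prediction in scratch
alphafold3-run input.json output/

# Copy only structures and confidence files (at any depth) to permanent
# storage, then clean up scratch only if the copy succeeded
if rsync -av --prune-empty-dirs \
       --include='*/' --include='*.cif' --include='*confidences.json' --exclude='*' \
       output/ "${RESULTS_DIR}/"; then
    cd "${SLURM_SUBMIT_DIR}" && rm -rf "${SCRATCH_DIR}"
fi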

Storage guidelines

Storage quotas

  • Home directory: Limited quota, avoid storing large outputs
  • Scratch: Fast but temporary, should be cleaned periodically

Always compress completed predictions and remove intermediate files.
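A minimal way to compress a finished prediction is a gzipped tar archive. The sketch below uses Python's standard tarfile module; the helper name archive_prediction is hypothetical, and it deliberately does not delete the original directory, so remove it yourself once you have checked the archive.

```python
import tarfile
from pathlib import Path


def archive_prediction(output_dir, archive_path=None):
    """Pack a finished AlphaFold3 output directory into a .tar.gz archive.

    By default the archive is written next to the directory as
    <directory name>.tar.gz; the original directory is left in place.
    """
    src = Path(output_dir)
    if archive_path is None:
        archive_path = src.with_name(src.name + ".tar.gz")
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(src, arcname=src.name)  # keep the top-level directory name
    return Path(archive_path)
```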

Quick reference

Essential information card

Quick Reference Card

Module Load:

module load AlphaFold/3.0.1

Basic Run:

alphafold3-run input.json output/

Token Calculation:

  • Protein: 1 token/residue
  • DNA/RNA: 1 token/nucleotide
  • Ligand: 1 token/heavy atom
  • Ion: 1 token

JSON Required Fields:

{
  "name": "job_name",
  "sequences": [...],
  "modelSeeds": [1,2,3],
  "dialect": "alphafold3",
  "version": 1
}

GPU Selection:

  • <2000 tokens: RTX 3090
  • 2000 tokens or more: A100

Common Fixes:

  • Out of memory → Use A100
  • Process killed → Increase --mem
  • No GPU → Add --gres=gpu:1
  • Invalid JSON → Check dialect and version fields

Confidence interpretation

Understanding AlphaFold3's confidence metrics is crucial for interpreting results:

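| pLDDT score | Confidence level | Interpretation |
|---|---|---|
| >90 | Very high | High accuracy expected, including most side chains |
| 70-90 | Confident | Backbone generally reliable, side chains approximate |
| 50-70 | Low | Treat with caution; may be flexible or poorly modeled |
| <50 | Very low | Likely disordered region; do not interpret the coordinates |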
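To see how much of a model falls into each band, you can bin the per-atom scores. This sketch assumes the per-atom values are stored under the atom_plddts key of a *_confidences.json file on a 0-100 scale; values exactly on a boundary (90, 70, 50) are placed in the lower band. The function names are hypothetical.

```python
import json
from pathlib import Path


def plddt_band(score):
    """Map one pLDDT value to a band; boundary values go to the lower band."""
    if score > 90:
        return "Very high"
    if score > 70:
        return "Confident"
    if score > 50:
        return "Low"
    return "Very low"


def band_fractions(confidences_json):
    """Fraction of atoms in each pLDDT band of a *_confidences.json file."""
    plddts = json.loads(Path(confidences_json).read_text())["atom_plddts"]
    if not plddts:
        raise ValueError("no pLDDT values found")
    counts = {"Very high": 0, "Confident": 0, "Low": 0, "Very low": 0}
    for score in plddts:
        counts[plddt_band(score)] += 1
    return {band: n / len(plddts) for band, n in counts.items()}
```

A large "Very low" fraction usually points to disordered regions rather than a failed run; check where those atoms sit before drawing conclusions.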

Additional resources