Deploying high-performance Lustre filesystems on AWS ParallelCluster traditionally requires extensive manual configuration and coordination across multiple components. This Ansible-based automation process provides an interactive deployment that handles everything from cluster sizing to post-installation configuration.

Important: This is a step-by-step Lustre deployment process that builds each Lustre component individually (MGS, MDS, OSS) and creates the filesystem from scratch. This approach does not use AWS built-in services like Amazon FSx for Lustre, but instead deploys a native Lustre filesystem directly on EC2 instances with full control over configuration, performance tuning, and customization.

GitHub Repository: https://github.com/veloduff/hpc/ansible-playbooks/pcluster-lustre

The complete automation scripts, Ansible playbooks, and supporting tools referenced in this post are available in the repository. This includes the Lustre deployment automation, cluster setup scripts, and storage management utilities.

What This Automation Does

The run-pcluster-lustre.sh script provides a complete end-to-end deployment solution:

  • Interactive configuration with intelligent defaults
  • Pre-configured cluster sizes optimized for different workloads
  • Automated Lustre setup with proper component distribution
  • Post-installation scripts for immediate usability
  • Comprehensive validation of prerequisites and credentials

Getting Started

Prerequisites

# Install required tools
pip install ansible aws-parallelcluster

# Configure AWS credentials
aws configure
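
Before running the automation, you can confirm that the required tools are on the PATH and that your credentials resolve correctly:

# Verify tool installation and AWS identity
pcluster version
ansible --version
aws sts get-caller-identity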

Custom AMI

This process depends on the Lustre, DKMS, and ZFS modules being installed on the AMI. The automation handles loading the modules, but it does not install them. For building a suitable ParallelCluster AMI, see my Building Custom ParallelCluster AMIs with Lustre Server Support blog post.
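
Before deploying, it is worth confirming that an instance launched from the custom AMI actually has the modules available. These checks are illustrative and assume the standard module names used by the Lustre and ZFS packages:

# Run on an instance launched from the custom AMI
dkms status                  # Lustre and ZFS DKMS modules should show as installed
modinfo lustre | head -n 3   # module metadata should resolve
modinfo zfs | head -n 3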

ParallelCluster VPC

This process also depends on ParallelCluster networking being in place, and letting ParallelCluster set up the VPC is the recommended approach. See my Creating an HPC Cluster with AWS ParallelCluster blog post for setting up a ParallelCluster VPC.
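
If the networking does not exist yet, the interactive pcluster configure wizard can create the VPC and subnets for you (the config file name below is arbitrary):

# Walks through region, key pair, and optional automated VPC/subnet creation
pcluster configure --config cluster-config.yaml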

Basic Deployment

# Clone the repository and navigate to the playbook directory
cd ansible-playbooks/pcluster-lustre

# Run the interactive setup
$ ./run-pcluster-lustre.sh
ParallelCluster Lustre Cluster Ansible Setup
============================================
Verifying AWS credentials... verified
Cluster name [lustre-cluster-Jul23-20250642]:
AWS region []: us-west-2
Custom AMI []: ami-111122223333
Operating System []: rhel8
SSH key file path []: /path/to/my-key.pem
EC2 key pair name [my-key]:
Head node subnet ID []: subnet-12121212
Compute subnet ID []: subnet-23232323
Placement group name []: my-placement-group-01
File system size (small/medium/large/xlarge/local) [small]: large
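
After the prompts are answered, the script builds the cluster. You can watch progress and connect to the head node with the standard ParallelCluster CLI, using the cluster name, region, and key entered above:

# Check cluster creation status
pcluster describe-cluster --cluster-name lustre-cluster-Jul23-20250642 --region us-west-2

# SSH to the head node once the cluster reaches CREATE_COMPLETE
pcluster ssh --cluster-name lustre-cluster-Jul23-20250642 --region us-west-2 -i /path/to/my-key.pem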

Example of a deployed file system, shown here for a 1PB configuration (output truncated):

$ lfs df -h

...
testfs-OST0032_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:50]
testfs-OST0033_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:51]
testfs-OST0034_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:52]
testfs-OST0035_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:53]
testfs-OST0036_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:54]
testfs-OST0037_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:55]
testfs-OST0038_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:56]
testfs-OST0039_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:57]
testfs-OST003a_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:58]
testfs-OST003b_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:59]
testfs-OST003c_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:60]
testfs-OST003d_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:61]
testfs-OST003e_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:62]
testfs-OST003f_UUID        15.7T        8.0M       15.7T   1% /mnt/lustre[OST:63]

filesystem_summary:      1007.1T       11.3G     1007.1T   1% /mnt/lustre
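
A quick way to sanity-check the new file system from the head node is to stripe a test directory across all OSTs and write to it. These are standard Lustre client commands, separate from the repository's scripts:

# Stripe a test directory across all OSTs and write 1 GiB
mkdir -p /mnt/lustre/stripe_test
lfs setstripe -c -1 /mnt/lustre/stripe_test
dd if=/dev/zero of=/mnt/lustre/stripe_test/testfile bs=1M count=1024
lfs getstripe /mnt/lustre/stripe_test/testfile   # shows which OSTs hold the file's objects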

Pre-Configured File System Sizes

The total size of the file system depends on the number of OSS servers and on the size and number of OSTs attached to each OSS.

Note: Each MDS will get one MDT, and the MGS has mirrored MGT volumes.

The OST size and the number of OSTs per OSS can be changed in the ansible-playbooks/pcluster-lustre/lustre_fs_settings.sh file. Here are the settings for the small file system; with these defaults it provides roughly 4.8TB of capacity at about 20K IOPS (a quick capacity calculation follows the listing):

"small")
    # Default performance: 20K IOPS, 4.8TB capacity
    MDT_USE_LOCAL=false
    OST_USE_LOCAL=false
    
    # MGT will be mirrored volumes
    MGT_SIZE=1                         # Size (GB) for MGT volumes
    MGT_VOLUME_TYPE="gp3"              # Volume type for MDT (io1, io2, gp3)
    MGT_THROUGHPUT=125                 # MDT Throughput in MiB/s
    MGT_IOPS=3000                      # MDT IOPS
    
    # Settings for MDTs when *NOT* using local disk (see MDT_USE_LOCAL)
    MDTS_PER_MDS=1                     # Number of MDTs to create per MDS server
    MDT_VOLUME_TYPE="io2"              # Volume type for MDT (io1, io2, gp3)
    MDT_THROUGHPUT=1000                # MDT Throughput in MiB/s
    MDT_SIZE=512                       # Size (GB) for MDT volumes
    MDT_IOPS=12000                     # MDT IOPS
    
    # Settings for OSTs when *NOT* using local disk (see OST_USE_LOCAL)
    OSTS_PER_OSS=1                     # Number of OSTs to create per OSS server
    OST_VOLUME_TYPE="io1"              # Volume type for OST (io1, io2, gp3) 
    OST_THROUGHPUT=250                 # Throughput in MiB/s
    OST_SIZE=1200                      # Size (GB) for OST volumes 
    OST_IOPS=3000                      # IOPS
    ;;
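
As a rough capacity check, multiply the number of OSS servers by the OSTs per OSS and the OST size. Using the minimum OSS count from the small cluster definition (4, shown in the next listing), this works out to the 4.8TB noted in the comment above:

# Back-of-the-envelope raw OST capacity for the "small" defaults
OSS_COUNT=4; OSTS_PER_OSS=1; OST_SIZE=1200    # OST_SIZE in GB
echo "$(( OSS_COUNT * OSTS_PER_OSS * OST_SIZE )) GB raw OST capacity"    # 4800 GB, ~4.8TB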

The run script ansible-playbooks/pcluster-lustre/run-pcluster-lustre.sh defines the cluster size and instance types, for example:

"small")
    HEADNODE_INSTANCE_TYPE="m6idn.xlarge"
    MGS_INSTANCE_TYPE="m6idn.large"
    MGS_MIN_COUNT=1
    MGS_MAX_COUNT=1
    MDS_INSTANCE_TYPE="m6idn.xlarge"
    MDS_MIN_COUNT=2
    MDS_MAX_COUNT=8
    OSS_INSTANCE_TYPE="m6idn.xlarge"
    OSS_MIN_COUNT=4
    OSS_MAX_COUNT=16
    BATCH_INSTANCE_TYPE="m6idn.large"
    BATCH_MIN_COUNT=4
    BATCH_MAX_COUNT=32
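
How these values are consumed is defined by the script itself; as a rough illustration of the general idea (not the repository's actual templating mechanism, and with placeholder file and variable names), shell variables like these can be rendered into a ParallelCluster configuration and used to create the cluster:

# Hypothetical sketch: substitute selected values into a config template
export OSS_INSTANCE_TYPE OSS_MIN_COUNT OSS_MAX_COUNT
envsubst < pcluster-config.template.yaml > pcluster-config.yaml    # template name is a placeholder
pcluster create-cluster --cluster-name "$CLUSTER_NAME" \
  --cluster-configuration pcluster-config.yaml --region "$AWS_REGION"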

Pre-Configured Cluster Sizes

The script includes five optimized configurations for different use cases:

Configuration  Use Case                                      Head Node      MGS              MDS                        OSS                    Compute
Small          Development, testing, small workloads         m6idn.xlarge   1x m6idn.large   2-8x m6idn.xlarge          4-16x m6idn.xlarge     4-32x m6idn.large
Medium         Production workloads, moderate scale          m6idn.xlarge   1x m6idn.xlarge  4-8x m6idn.xlarge          20-40x m6idn.xlarge    8-128x m6idn.large
Large          High-performance computing, large datasets    m6idn.2xlarge  1x m6idn.xlarge  8-16x m6idn.2xlarge        40-128x m6idn.2xlarge  16-256x m6idn.xlarge
XLarge         Extreme scale, mission-critical workloads     m6idn.2xlarge  1x m6idn.xlarge  16x m6idn.2xlarge (fixed)  40-128x m6idn.2xlarge  16-256x m6idn.xlarge
Local          Maximum performance with local NVMe storage   m6idn.2xlarge  1x m6idn.xlarge  16-32x m6idn.2xlarge       40-64x m6idn.2xlarge   16-256x m6idn.xlarge

Automated Post-Installation Pipeline

The script orchestrates the post-installation process:

1. Cluster Setup

  • Package installation via cluster_setup.sh
  • System configuration and optimization
  • Dependency management for Lustre components
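
The exact steps live in cluster_setup.sh in the repository; conceptually, part of this stage amounts to making sure the Lustre and ZFS modules baked into the custom AMI are loaded on every node, along these lines (illustrative only):

# Illustrative only -- see cluster_setup.sh for the actual steps
sudo modprobe zfs
sudo modprobe lustre
lsmod | grep -E 'lustre|zfs'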

2. Lustre Host Configuration

  • Host file management via fix_lustre_hosts_files.sh
  • Network configuration for Lustre communication
  • Service discovery setup
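
Lustre servers and clients communicate over LNet, so consistent host name resolution and a working LNet configuration are what this stage is about. The following standard commands verify that layer on any Lustre node, independent of the repository's scripts (the NID in the ping is a placeholder):

# Verify LNet is up and nodes can reach each other
sudo lnetctl lnet configure    # initialize LNet if it is not already loaded
sudo lnetctl net show          # list this node's NIDs
lctl list_nids                 # NIDs advertised by this node
lctl ping 10.0.1.25@tcp        # ping another Lustre node by NID (placeholder address)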

3. Lustre Filesystem Creation

  • Component creation via setup_lustre.sh
  • MGS/MDS/OSS deployment across designated nodes
  • Filesystem mounting and validation
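
setup_lustre.sh drives the actual formatting and mounting; for orientation, the canonical hand-built sequence looks like the following. Device paths and NIDs are placeholders (the fsname testfs matches the example output earlier), and the repository's script may use ZFS-backed targets and different options:

# On the MGS node
sudo mkfs.lustre --mgs /dev/nvme1n1
sudo mkdir -p /mnt/mgt && sudo mount -t lustre /dev/nvme1n1 /mnt/mgt

# On each MDS node (MGS NID and index are placeholders)
sudo mkfs.lustre --fsname=testfs --mgsnode=10.0.1.10@tcp --mdt --index=0 /dev/nvme1n1
sudo mkdir -p /mnt/mdt0 && sudo mount -t lustre /dev/nvme1n1 /mnt/mdt0

# On each OSS node
sudo mkfs.lustre --fsname=testfs --mgsnode=10.0.1.10@tcp --ost --index=0 /dev/nvme1n1
sudo mkdir -p /mnt/ost0 && sudo mount -t lustre /dev/nvme1n1 /mnt/ost0

# On clients
sudo mount -t lustre 10.0.1.10@tcp:/testfs /mnt/lustre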

4. Supporting Scripts

  • EBS volume management for storage provisioning
  • Lustre component configuration with proper settings
  • Performance tuning and optimization
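
The EBS provisioning in the supporting scripts corresponds to standard EC2 API calls. Here is a minimal example of creating and attaching a single OST volume with the small-configuration settings (the availability zone, instance ID, and device name are placeholders):

# Create a 1200 GB io1 volume with 3000 IOPS and attach it to an OSS instance
VOL_ID=$(aws ec2 create-volume --availability-zone us-west-2a \
  --volume-type io1 --iops 3000 --size 1200 \
  --query VolumeId --output text)
aws ec2 wait volume-available --volume-ids "$VOL_ID"
aws ec2 attach-volume --volume-id "$VOL_ID" --instance-id i-0123456789abcdef0 --device /dev/sdf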