Operating System Images for Ubiquity
On This Page
- Overview
- Scope
- Prerequisites
- Ubiquity Image Building
- Custom Image Creation
- Bare Metal Deployment
- Cloud Deployment
- Advanced Configurations
- Troubleshooting
Overview
This guide provides comprehensive instructions for creating, customizing, and deploying operating system images within the Ubiquity platform. Ubiquity supports multiple deployment scenarios including bare metal provisioning, cloud deployments, and hybrid environments, each with specific image requirements and customization options.
Scope
This guide covers:
- Base OS Images: Rocky Linux, Ubuntu, and Fedora support for Ubiquity clusters
- Bare Metal Images: PXE-bootable images for automated provisioning
- Cloud Images: Custom images for cloud providers (AWS, Azure, GCP, OpenStack)
- Container Images: Base images for HPC workloads and applications
- Specialized Images: HPC-optimized images with NVIDIA drivers, InfiniBand support, and performance tools
Prerequisites
Hardware Requirements
- Build Host: System capable of running diskimage-builder (minimum 4GB RAM, 20GB storage)
- Target Platform: Bare metal servers or cloud instances for deployment
Software Requirements
- Operating System: Rocky Linux 8/9, Ubuntu 20.04/22.04, or Fedora Server
- Tools: diskimage-builder, qemu-utils, ansible (for bare metal)
- Access: Administrative privileges on build host
Abbreviations and Acronyms
- BMO: Bare Metal Operator
- DIB: Disk Image Builder
- HPC: High Performance Computing
- IPA: Ironic Python Agent
- MLNX_OFED: NVIDIA Mellanox OpenFabrics Enterprise Distribution
- PXE: Preboot Execution Environment
Ubiquity Image Building
Built-in Image Builder
Ubiquity includes a comprehensive image building system located in tools/disk-image/mkimage/ that provides:
- Automated Building: Script-based image creation with minimal configuration
- Multiple Formats: Support for qcow2, raw, and other disk formats
- Custom Elements: Pre-built elements for Ubiquity-specific configurations
- Multi-Architecture: Support for x86_64 and ARM64 architectures
Supported Operating Systems
Rocky Linux (Recommended)
- Rocky Linux 8.x and 9.x
- Default choice for Ubiquity deployments
- Optimized kickstart configurations for bare metal
- Full HPC stack support
Ubuntu Server
- Ubuntu 20.04 LTS and 22.04 LTS
- Cloud-optimized images
- Extensive package ecosystem
Fedora Server
- Latest stable releases
- Cutting-edge kernel features
- Development and testing environments
Quick Start
# Navigate to image builder
cd tools/disk-image/mkimage
# Prepare build environment
./prep.sh
# Build all images
./build-images.sh
# Build specific images
image_filter="rocky" ./build-images.sh
# Specify output format
output_type="qcow2,raw" ./build-images.sh
Available Custom Elements
Ubiquity Element (custom-elements/ubiquity/)
- Core Ubiquity platform integration
- Kubernetes node preparation
- Longhorn storage optimization
MOFED Element (custom-elements/mofed/)
- NVIDIA MLNX_OFED network drivers
- InfiniBand support for HPC workloads
- High-speed interconnect optimization
Cloud-init Element (custom-elements/cloud-init-install/)
- Cloud environment compatibility
- Automated system configuration
- User and SSH key management
Custom Base Element (custom-elements/custom-base/)
- Common configurations across all images
- Package installations and system tuning
- Security hardening
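The custom elements above follow the standard diskimage-builder element layout. As a minimal, hypothetical sketch (the element and file names below are illustrative, not taken from the Ubiquity tree), a new element is a directory on ELEMENTS_PATH containing dependency declarations and build-phase scripts:
# custom-elements/my-element/element-deps — declare a dependency on the
# package-installs machinery
package-installs
# custom-elements/my-element/package-installs.yaml — packages baked into the image
chrony:
nfs-utils:
# custom-elements/my-element/install.d/75-my-element-setup — executable script
# run inside the image chroot during the build
#!/bin/bash
set -eux
systemctl enable chronyd
Adding my-element to the disk-image-create argument list then pulls all of this into the build.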
Custom Image Creation
Setting Up Build Environment
Rocky Linux Build Host:
# Install dependencies
sudo dnf install -y qemu-img python3-pip git
sudo pip3 install diskimage-builder
# Clone Ubiquity repository
git clone https://github.com/ubiquitycluster/ubiquity.git
cd ubiquity/tools/disk-image/mkimage
# Initialize build environment
./prep.sh
Ubuntu Build Host:
# Install dependencies
sudo apt update
sudo apt install -y qemu-utils python3-pip git
sudo pip3 install diskimage-builder
# Setup Ubiquity build environment
git clone https://github.com/ubiquitycluster/ubiquity.git
cd ubiquity/tools/disk-image/mkimage
./prep.sh
Creating Ubiquity-Optimized Images
Basic Ubiquity Node Image:
export ELEMENTS_PATH="custom-elements:elements"
export DIB_RELEASE="9" # Rocky 9
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
cloud-init-install \
-o ubiquity-node-rocky9
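Once the build finishes, it is worth sanity-checking the artifact before deploying it (the file name matches the -o value above):
# Inspect format and virtual size, then run a structural integrity check
qemu-img info ubiquity-node-rocky9.qcow2
qemu-img check ubiquity-node-rocky9.qcow2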
HPC-Optimized Image with InfiniBand:
export ELEMENTS_PATH="custom-elements:elements"
export DIB_RELEASE="9"
export DIB_MOFED_FILE="/tmp/MLNX_OFED_LINUX-5.8-1.0.1.1-rhel9.0-x86_64.iso"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
mofed \
cloud-init-install \
-o ubiquity-hpc-rocky9
GPU-Enabled Compute Image:
export ELEMENTS_PATH="custom-elements:elements"
export DIB_RELEASE="9"
export DIB_CUDA_URL="https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
nvidia-cuda \
cloud-init-install \
-o ubiquity-gpu-rocky9
Bare Metal Deployment
Kickstart-Based Provisioning
Ubiquity's bare metal deployment uses automated kickstart installations optimized for HPC environments:
Key Features:
- Automated Partitioning: Intelligent disk layout based on available storage
- LVM Configuration: Optimized for Kubernetes and Longhorn storage
- Network Configuration: Support for bonding, VLANs, and complex topologies
- Security: Automated SSH key deployment and system hardening
Partition Layout (Rocky Linux):
# System Volume Group (Smallest disk)
/boot/efi 512MB vfat
/boot 2GB ext4
/ 20% ext4 (System VG)
/tmp 5% ext4 (System VG)
/var/log 2% ext4 (System VG)
/var/crash 10% ext4 (System VG)
/var/lib/rancher 10% ext4 (System VG)
/home 53% ext4 (System VG)
# Data Volume Group (Larger disks)
/var/lib/kubelet 1% ext4 (Data VG)
/var/lib/longhorn 60% ext4 (Data VG)
/home 39% ext4 (Data VG)
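As a rough illustration of how the System VG portion of this layout translates into kickstart storage directives (a hedged sketch; the disk and volume group names are assumptions, not the generated Ubiquity kickstart):
# Hypothetical kickstart fragment for the System VG above
part /boot/efi --fstype=efi --size=512
part /boot --fstype=ext4 --size=2048
part pv.01 --grow --ondisk=sda
volgroup system pv.01
logvol / --vgname=system --name=root --fstype=ext4 --percent=20
logvol /tmp --vgname=system --name=tmp --fstype=ext4 --percent=5
logvol /var/log --vgname=system --name=log --fstype=ext4 --percent=2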
Network Configuration Examples:
Single Interface:
network_interfaces:
- name: eno1
device: eno1
ip: 192.168.1.100
netmask: 255.255.255.0
gateway: 192.168.1.1
nameserver: 192.168.1.1
Bonded Interface:
network_interfaces:
- name: bond0
device: bond0
slaves: eno1,eno2
bond_opts: "mode=802.3ad miimon=100"
ip: 192.168.1.100
netmask: 255.255.255.0
gateway: 192.168.1.1
VLAN Configuration:
network_interfaces:
- name: vlan100
device: eno1
vlanid: 100
ip: 10.0.100.100
netmask: 255.255.255.0
PXE Boot Process
Ubiquity implements a comprehensive PXE boot system for bare metal provisioning:
flowchart TD
A[Bare Metal Node] --> B[PXE Boot Request]
B --> C[DHCP Server Response]
C --> D[TFTP Bootloader Download]
D --> E[HTTP Kickstart Download]
E --> F[OS Installation]
F --> G[Automated Configuration]
G --> H[Kubernetes Join]
DHCP Configuration:
# Node receives IP, boot server, and bootloader info
subnet 192.168.1.0 netmask 255.255.255.0 {
range 192.168.1.100 192.168.1.200;
option domain-name-servers 192.168.1.1;
option domain-name "ubiquitycluster.local";
option routers 192.168.1.1;
filename "pxelinux.0";
next-server 192.168.1.10;
}
Boot Configuration:
# PXE menu entry for Rocky Linux installation
LABEL rocky9-install
MENU LABEL Install Rocky Linux 9 (Ubiquity)
KERNEL vmlinuz-rocky9
APPEND initrd=initrd-rocky9.img ks=http://192.168.1.10/kickstart/node01.ks
Cloud Deployment
Cloud Provider Images
AWS AMI Creation:
# Create AMI-compatible image
export AWS_DEFAULT_REGION="us-west-2"
export DIB_RELEASE="9"
disk-image-create \
vm \
cloud-init-datasources \
growroot \
install-static \
rocky-container \
ubiquity \
-t ami \
-o ubiquity-aws-rocky9
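The image still has to be registered with AWS; a typical route is S3 plus the VM import service. A hedged sketch, assuming a raw-format build and a bucket named my-image-bucket:
# Upload the image and import it as an EBS snapshot
aws s3 cp ubiquity-aws-rocky9.raw s3://my-image-bucket/ubiquity-aws-rocky9.raw
aws ec2 import-snapshot \
--description "Ubiquity Rocky 9" \
--disk-container Format=RAW,UserBucket={S3Bucket=my-image-bucket,S3Key=ubiquity-aws-rocky9.raw}
# After the import task completes, register the snapshot as an AMI
aws ec2 register-image \
--name "ubiquity-rocky9" \
--architecture x86_64 \
--virtualization-type hvm \
--root-device-name /dev/sda1 \
--block-device-mappings DeviceName=/dev/sda1,Ebs={SnapshotId=<snapshot_id>}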
Azure VHD Creation:
# Create Azure-compatible VHD
export DIB_RELEASE="9"
disk-image-create \
vm \
azure \
cloud-init-datasources \
growroot \
rocky-container \
ubiquity \
-t vhd \
-o ubiquity-azure-rocky9
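The VHD then needs to be uploaded to a storage account and turned into a managed image; the resource group, storage account, and container names below are placeholders:
# Upload the VHD (page blob) and create a managed image from it
az storage blob upload \
--account-name <storage_account> \
--container-name images \
--type page \
--name ubiquity-azure-rocky9.vhd \
--file ubiquity-azure-rocky9.vhd
az image create \
--resource-group <resource_group> \
--name ubiquity-rocky9 \
--os-type Linux \
--source https://<storage_account>.blob.core.windows.net/images/ubiquity-azure-rocky9.vhd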
GCP Image Creation:
# Create GCP-compatible image
export DIB_RELEASE="9"
disk-image-create \
vm \
google \
cloud-init-datasources \
growroot \
rocky-container \
ubiquity \
-o ubiquity-gcp-rocky9
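GCE image imports expect a tar.gz containing a file named disk.raw; a hedged sketch, assuming a raw-format build and a bucket named my-image-bucket:
# Package the raw image the way GCE expects, upload, and register it
mv ubiquity-gcp-rocky9.raw disk.raw
tar --format=oldgnu -Sczf ubiquity-gcp-rocky9.tar.gz disk.raw
gsutil cp ubiquity-gcp-rocky9.tar.gz gs://my-image-bucket/
gcloud compute images create ubiquity-rocky9 \
--source-uri gs://my-image-bucket/ubiquity-gcp-rocky9.tar.gz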
OpenStack Image Creation:
# Create OpenStack-compatible qcow2 image
export DIB_RELEASE="9"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
growroot \
rocky-container \
ubiquity \
-o ubiquity-openstack-rocky9
# Upload to OpenStack
openstack image create \
--public \
--disk-format qcow2 \
--container-format bare \
--file ubiquity-openstack-rocky9.qcow2 \
"Ubiquity Rocky 9"
Cloud-init Configuration
Ubiquity images include comprehensive cloud-init support:
User Configuration:
#cloud-config
users:
- name: admin
groups: wheel
shell: /bin/bash
sudo: ['ALL=(ALL) NOPASSWD:ALL']
ssh-authorized-keys:
- ssh-rsa AAAAB3NzaC1yc2E... admin@ubiquity
Package Installation:
packages:
- kubectl
- docker
- git
- htop
- iotop
- nfs-utils
Service Configuration:
runcmd:
- systemctl enable --now docker
- systemctl enable --now kubelet
- /opt/ubiquity/setup-node.sh
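Before rolling these snippets into a deployment, the merged user-data can be linted with cloud-init's schema checker (available in recent cloud-init releases):
# Validate a cloud-config document against the cloud-init schema
cloud-init schema --config-file user-data.yaml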
Advanced Configurations
HPC-Optimized Images
InfiniBand and RDMA Support:
# Enable RDMA networking for HPC workloads
export DIB_RELEASE="9"
export DIB_MOFED_FILE="/path/to/MLNX_OFED_LINUX-5.8-1.0.1.1-rhel9.0-x86_64.iso"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
mofed \
cloud-init-install \
-o ubiquity-hpc-ib-rocky9
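Once a node boots from this image, the MOFED stack can be verified with the standard InfiniBand diagnostics that ship with the drivers:
# Confirm the HCA is visible and its ports are active
ibstat
ibv_devinfo
# Confirm the MOFED kernel modules are loaded
lsmod | grep -E 'mlx5_core|ib_core'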
Multi-GPU Configurations:
# Support for multiple GPU configurations
export DIB_RELEASE="9"
export DIB_CUDA_URL="https://developer.download.nvidia.com/compute/cuda/12.2.0/local_installers/cuda_12.2.0_535.54.03_linux.run"
export DIB_GPU_DRIVER_VERSION="535.54.03"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
nvidia-cuda \
nvidia-fabric-manager \
cloud-init-install \
-o ubiquity-multi-gpu-rocky9
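On first boot of a multi-GPU node, driver and fabric health can be checked directly:
# Verify driver version and that all GPUs are visible
nvidia-smi
# Fabric Manager must be running on NVSwitch-based multi-GPU systems
systemctl status nvidia-fabricmanager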
Container Runtime Integration:
# Docker and containerd optimizations for HPC
export DIB_RELEASE="9"
export DIB_CONTAINER_RUNTIME="containerd"
export DIB_DOCKER_STORAGE_DRIVER="overlay2"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
container-runtime \
nvidia-container-toolkit \
cloud-init-install \
-o ubiquity-container-rocky9
Storage Optimizations
Longhorn Storage Preparation:
# Optimized for Longhorn distributed storage
export DIB_RELEASE="9"
export DIB_LONGHORN_VERSION="1.5.1"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
longhorn-prep \
cloud-init-install \
-o ubiquity-storage-rocky9
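Longhorn also expects a running iSCSI initiator on every node; assuming the longhorn-prep element installs open-iscsi, a quick post-boot check is:
# Longhorn requires the iSCSI initiator daemon on each node
systemctl status iscsid
# The Longhorn project's environment check script confirms remaining prerequisites
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.5.1/scripts/environment_check.sh | bash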
NFS and Distributed Storage:
# Support for NFS and distributed filesystems
export DIB_RELEASE="9"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
nfs-client \
cvmfs-client \
cloud-init-install \
-o ubiquity-nfs-rocky9
Networking Configurations
SR-IOV and Hardware Acceleration:
# Support for SR-IOV and hardware offloading
export DIB_RELEASE="9"
export DIB_SRIOV_DRIVERS="i40e,ixgbe,mlx5_core"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
sriov-support \
dpdk-support \
cloud-init-install \
-o ubiquity-sriov-rocky9
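On supported NICs, virtual functions are created at runtime through sysfs; the interface name below is a placeholder:
# Create 8 virtual functions on a physical port
echo 8 > /sys/class/net/<pf_interface>/device/sriov_numvfs
# Confirm the VFs appeared on the PCI bus
lspci | grep -i 'virtual function'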
Advanced Security Features:
# Security hardening and compliance
export DIB_RELEASE="9"
export DIB_SECURITY_PROFILE="hardened"
disk-image-create \
vm \
dhcp-all-interfaces \
cloud-init-datasources \
dracut-regenerate \
growroot \
rocky-container \
ubiquity \
security-hardening \
audit-logging \
cloud-init-install \
-o ubiquity-secure-rocky9
Performance Tuning
Kernel and System Optimizations:
# Custom kernel parameters for HPC workloads
kernel_parameters:
- "intel_iommu=on"
- "iommu=pt"
- "hugepagesz=1G"
- "hugepages=32"
- "default_hugepagesz=1G"
- "isolcpus=1-15,17-31"
- "rcu_nocbs=1-15,17-31"
- "nohz_full=1-15,17-31"
CPU Tuning:
# CPU governor and frequency scaling (a plain redirect cannot write to a
# glob of files; use tee to hit every CPU's policy file)
echo performance | sudo tee /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
# CPU isolation is a boot-time setting (the isolcpus= kernel parameter);
# verify the active isolated set after reboot:
cat /sys/devices/system/cpu/isolated
Memory Optimizations:
# Huge page configuration
echo 1024 > /proc/sys/vm/nr_hugepages
echo "vm.nr_hugepages = 1024" >> /etc/sysctl.conf
# NUMA policy: numa_balancing only accepts 0 or 1; disable automatic
# balancing and apply an explicit interleave policy per workload instead
echo 0 > /proc/sys/kernel/numa_balancing
numactl --interleave=all <command>
Troubleshooting
Build Issues
Common Build Failures:
Insufficient Disk Space:
# Check available space
df -h /tmp
# Clean up previous builds
rm -rf /tmp/image.* ~/.cache/diskimage-builder/
# Use alternative temp directory
export TMP_DIR="/var/tmp"
export DIB_IMAGE_CACHE="/var/cache/diskimage-builder"
Network Connectivity Issues:
# Verify repository access (mirror.centos.org is retired; use a live Rocky mirror)
curl -I https://download.rockylinux.org/pub/rocky/
# Configure proxy if needed
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="https://proxy.example.com:8080"
Element Dependencies:
# Verify custom elements path
ls -la custom-elements/
# Check element permissions
find custom-elements/ -name "*.sh" -exec chmod +x {} \;
# Debug element execution
export DIB_DEBUG_TRACE=1
Deployment Issues
PXE Boot Problems:
DHCP Configuration:
# Check DHCP server status
systemctl status dhcpd
# Verify lease file
tail -f /var/lib/dhcpd/dhcpd.leases
# Test DHCP response
dhcping -s 192.168.1.10 -c 192.168.1.100
TFTP Issues:
# Verify TFTP service
systemctl status tftp
# Test TFTP connectivity
tftp 192.168.1.10 -c get pxelinux.0
# Check file permissions
ls -la /var/lib/tftpboot/
Kickstart Problems:
# Validate kickstart syntax
ksvalidator /var/www/html/kickstart/node01.ks
# Check HTTP access
curl -I http://192.168.1.10/kickstart/node01.ks
# Monitor installation logs
tail -f /var/log/httpd/access_log
Performance Issues
Image Size Optimization:
# Remove unnecessary packages
export DIB_MINIMAL_IMAGE=1
# Clean package cache
export DIB_APT_CLEAN=1
export DIB_YUM_CLEAN=1
# Compress images
qemu-img convert -c -O qcow2 input.qcow2 output-compressed.qcow2
Boot Time Optimization:
# Reduce systemd timeout
sed -i 's/#DefaultTimeoutStartSec=90s/DefaultTimeoutStartSec=30s/' /etc/systemd/system.conf
# Disable unnecessary services
systemctl disable NetworkManager-wait-online
systemctl disable plymouth-start
Monitoring and Logging
Build Process Monitoring:
# Monitor build progress
tail -f /tmp/dib-build-*.log
# Check system resources during build
watch -n 5 'df -h && free -h && ps aux | grep disk-image'
Runtime Diagnostics:
# Check node health
kubectl get nodes -o wide
# Monitor system metrics
top
iostat -x 1
sar -u 1 5
# Network diagnostics
ss -tuln
netstat -i
Log Analysis:
# System logs
journalctl -f -u kubelet
journalctl -f -u docker
# Kubernetes logs
kubectl logs -n kube-system -l k8s-app=kubelet
# Storage logs
kubectl logs -n longhorn-system -l app=longhorn-manager
Recovery Procedures
Failed Deployments:
# Reset failed node
ansible-playbook -i inventories/production clean.yml --limit failed_node
# Reinstall from PXE
ipmitool -I lanplus -H <bmc_ip> -U admin -P password chassis bootdev pxe
ipmitool -I lanplus -H <bmc_ip> -U admin -P password power reset
Image Corruption:
# Verify image integrity
qemu-img check ubiquity-node-rocky9.qcow2
# Repair corrupted image
qemu-img convert -f qcow2 -O qcow2 corrupted.qcow2 repaired.qcow2
# Rebuild if necessary
rm -f corrupted.qcow2
./build-images.sh
Configuration Recovery:
# Backup configurations
cp -r custom-elements/ custom-elements.backup.$(date +%Y%m%d)
# Restore from version control
git checkout -- custom-elements/
# Reset to working configuration
git reset --hard <working_commit_hash>