DataScience Workbook / 06. High-Performance Computing (HPC) / 4. Software Available on HPC


Introduction

Software on HPC - overview

The software available on a high-performance computing (HPC) system can vary depending on the specific system and its intended use. However, some common types of software that are typically available on HPC systems include:

1. compilers
[ a special program that translates a programming language’s source code into machine code ]
HPC systems often have a variety of compilers installed, such as GCC ⤴ and Intel Fortran ⤴, that can be used to compile and optimize code for the system’s architecture.

2. programming languages
[ a system of syntax and semantics for writing instructions that can be executed by a computer ]
HPC systems often have a wide range of programming languages installed, such as C ⤴, C++ ⤴, Fortran ⤴, Python ⤴, and R ⤴, which can be used to write and run code on the system.

3. libraries and frameworks
[ ready-made code (functions and classes) to solve common tasks and boost development ]
HPC systems often have a variety of libraries and frameworks available, such as the MPI library ⤴ for parallel programming and the CUDA library ⤴ for GPU programming.

4. basic visualization software
[ remote visualization to display data and derive meaningful insights ]
HPC systems often have software for visualizing and analyzing data, such as ParaView ⤴, VisIt ⤴, gnuplot ⤴, VMD ⤴, and others ⤴.

5. job schedulers
[ a computer application for controlling resources and execution of jobs ]
HPC systems often have a job scheduler installed, such as Slurm ⤴, PBS ⤴ or LSF ⤴, which is responsible for managing the allocation of resources and scheduling jobs to run on the system.

6. data management
HPC systems have data management software, including:

  • distributed file systems (e.g., Lustre ⤴, GlusterFS ⤴, and GPFS ⤴) that enable efficient data sharing and collaboration among multiple users

  • backup and archiving software (e.g., Amanda ⤴, Bacula ⤴, and Tivoli Storage Manager ⤴) that protect data by creating regular backups and archiving older data to long-term storage

  • data transfer software (e.g., Globus ⤴, GridFTP ⤴, and Aspera ⤴) that transfer large amounts of data quickly and efficiently between HPC systems and other storage or computing machines

  • data cataloging and metadata management software (e.g., iRODS ⤴, Dataverse ⤴, and XNAT ⤴) that manage and organize large amounts of data, and provide search capabilities

  • database management software (e.g., MySQL ⤴ and MongoDB ⤴) that store, manage, and analyze large amounts of structured data

How to find available software?

There are several ways to find the software available on a high-performance computing (HPC) system. The following subsections provide hands-on mini tutorials for some of them.

WARNING:
Each HPC system can manage and organize its software differently, so consult the documentation or ask the system administrator for system-specific instructions.

Additionally, some software is not publicly available and may require a formal software request, so clarify its availability and your access rights before relying on it.

Software as built-in commands

There are many different types of software that may be available as built-in commands on a high-performance computing (HPC) system. Some examples include:

  1. System utilities, see more
  2. Text processing and manipulation tools, see more
  3. Compression and archiving tools, see more
  4. Job management tools, see more
  5. Remote access tools, see more
  6. File transfer tools, see more

How to check built-in commands?

There are a few ways to check the list of available built-in commands on a high-performance computing (HPC) system:

  • Using the help command
    In the Bash shell, help prints the list of shell built-in commands, and help <command> describes a single one.
    For standalone programs, man <command> opens the manual page; man needs a command name as its argument.
    For example:
    help
    man ls
    
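  • Using the compgen command
    In Bash, compgen -b prints the names of all shell built-ins, and compgen -c prints every command name the shell can resolve (aliases, functions, built-ins, and executables on PATH). The builtin command itself does not list anything; it only runs the named built-in.
    Try these commands on your HPC system:
    compgen -b
    compgen -c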
    
  • Using the alias command
    The alias command can be used to view a list of all currently defined aliases, which are often used to create custom command shortcuts. Learn more from subsection 3.4 Define aliases ⤴ in the practical Introduction to UNIX Shell ⤴ tutorial in this workbook.
    alias
    
  • Examining shell initialization files
    Some HPC systems define aliases and shell functions in shell initialization files such as .bashrc, .bash_profile, .bash_aliases or similar. You can check these files for custom commands. Hands-on tutorials on this topic are available in this workbook.
  • Trying to use the desired command
    If you can guess the name of the command that starts the program, try calling it in the terminal. If the command exists, calling it with the --help flag (or -h for some programs) usually displays the available options.
    For example:
    chmod --help

    If the command does not exist, the shell prints an error message such as "command not found".

    random_command
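
A quieter way to test for a command, without actually running it, is the shell's own lookup via command -v. The sketch below assumes a Bash-compatible shell; the helper name command_status is made up for this illustration and is not a standard tool.

```shell
#!/usr/bin/env bash
# command_status is a hypothetical helper name, not a standard tool.
# It prints "found" if the shell can resolve the name, "missing" otherwise.
command_status() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found"
    else
        echo "missing"
    fi
}

command_status ls               # a standard utility
command_status random_command   # the made-up name from the example above

# 'type' additionally reports what kind of command a name is
# (alias, function, shell built-in, or a file on PATH).
type cd
```

Because the lookup never executes the program, this check is safe to use in scripts before calling a tool that may not be installed.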


Explore example software typically available as built-in commands:

1. System utilities

Basic system utilities, such as ls, cd, and mkdir, are often available as built-in commands. These utilities allow users to:

  • navigate the file system,
  • manage files and directories,
  • and perform other basic tasks.

Learn more from the practical tutorial Basic Commands: Navigation, File Creation & Preview ⤴, available in section 02. Introduction to the Command Line ⤴ of this workbook.

2. Text processing and manipulation tools

Some common text processing and manipulation tools like sed, awk, grep and cut are often available, which allow users to manipulate and extract data from text files or command-line text streams. Learn more from the practical tutorial Useful Text Manipulation Programs ⤴, available in section 02. Introduction to the Command Line ⤴ of this workbook.

There are also command-line text editors with a basic text-based interface, such as nano or vim, which allow you to write a script, edit a configuration file, modify a data file, or create a quick note or documentation. Learn more from the practical tutorial Text Files Editors ⤴, available in section 02. Introduction to the Command Line ⤴ of this workbook.

3. Compression and archiving tools

Tools like gzip, tar and zip are often available, which allow users to compress and archive large files and directories.

Quick cheatsheet
Compress a single file:
gzip -c filename > filename.gz

Compress all files in a directory (each file is compressed individually):
gzip -r directory

Decompress a single file:
gzip -d filename.gz

Decompress all files in a directory:
gzip -dr directory

Compress an entire directory or a single file to `.tar.gz` archive:
tar -czvf archive_name.tar.gz /path/to/directory-or-file

Extract the `.tar.gz` archive:
tar -xzvf archive.tar.gz
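
To see the tar commands from the cheatsheet working together, the following sketch packs a small directory, lists the archive without extracting it, and restores it elsewhere. All file and directory names (results, run1.txt, restored) are illustrative assumptions, and everything happens in a temporary directory.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Create a small example directory (names are illustrative).
mkdir results
echo "sample data" > results/run1.txt

# Pack and compress the directory.
tar -czf results.tar.gz results

# List the archive contents without extracting, to verify it.
tar -tzf results.tar.gz

# Extract into a separate directory.
mkdir restored
tar -xzf results.tar.gz -C restored

# Compare the original with the restored copy.
diff -r results restored/results && echo "archive round trip OK"
```

Listing with -t before deleting the originals is a cheap safeguard when archiving large result sets.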


4. Job management tools

The PBS ⤴ or Slurm ⤴ tools are commonly used on HPC systems to submit and manage jobs on the system. These tools allow users to submit jobs, monitor the status of their jobs, and view the job queue.

Useful job management commands:

| SLURM tools         | PBS tools            | description                                    |
|---------------------|----------------------|------------------------------------------------|
| squeue -u {user}    | qstat -u {user}      | gives info about user's jobs                   |
| sbatch {job_script} | qsub {job_script}    | submits job to the queue                       |
| scancel {jobID}     | qdel {jobID}         | stops and removes job                          |
| sinfo -N -l         | pbsnodes -l          | gives info about queues, partitions, or nodes  |
| scontrol show       | pbsnodes -l          | provides details about jobs (scontrol show job {jobID}), partitions (scontrol show partition {pID}), or nodes (scontrol show nodes) |
| seff {jobID}        | qstat -fxw {jobID}   | provides resource usage report for a finished job |
| salloc              | qsub -I {job_script} | starts interactive session                     |

Learn more from the practical Introduction to Job Scheduling ⤴ tutorials (including SLURM ⤴ and PBS ⤴) in the section 06. High-Performance Computing (HPC) ⤴ of this workbook. For more, see also PBS to Slurm Conversion Cheat Sheet ⤴.
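
As a rough illustration of the submit step, the sketch below writes a minimal Slurm batch script and submits it only if sbatch is present. The file name hello_job.sh and all resource values are assumptions for this example; real limits, partitions, and accounts are site-specific.

```shell
#!/usr/bin/env bash
# Write a minimal Slurm batch script (hello_job.sh is a hypothetical name).
# The resource values are placeholders; adjust them to your cluster's limits.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello        # name shown in the queue
#SBATCH --ntasks=1              # number of tasks (processes)
#SBATCH --time=00:05:00         # wall-time limit (HH:MM:SS)
#SBATCH --output=hello_%j.out   # %j expands to the job ID

echo "Running on $(hostname)"
EOF

# Submit only when Slurm is actually present on this machine.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
    squeue -u "$USER"
else
    echo "sbatch not found: this machine does not appear to run Slurm"
fi
```

The #SBATCH lines are ordinary comments to the shell, so the script can also be run directly with bash for a quick local check before it is queued.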

5. Remote access tools

Tools like ssh, telnet and rlogin are often available, which allow users to remotely access and control other systems on the network. Of these, ssh is the encrypted option; telnet and rlogin are unencrypted legacy tools. Learn more about SSH connection from the practical tutorial Secure Shell Connection (SSH) ⤴, available in section 06. High-Performance Computing (HPC) ⤴ of this workbook.

6. File transfer tools

Tools such as scp and rsync are often available as built-in commands, which allow users to securely transfer files to and from the HPC system.

Copy data from local to remote machine (while being on a local machine):

# syntax:
scp <path_on_local>/<transferred_file_name> <user>@<hostname_to_remote>:<path_on_remote>

# example:
scp ~/.bashrc alex.badacz@atlas-dtn.hpc.msstate.edu:/project/90daydata/

Copy data from remote to local machine (while being on a local machine):

# syntax:
scp <user>@<hostname_to_remote>:<path_on_remote>/<transferred_file_name> <path_on_local>/

# example:
scp alex.badacz@atlas-dtn.hpc.msstate.edu:/project/90daydata/file.txt ~/DATA/

PRO TIP:
To copy directories use:
scp -r {path_to_the_source} {path_to_the_destination}

To update the destination with new or changed files from the source, recursively transfer the data using the rsync command:
rsync -avz --no-p --no-g {path_to_the_source} {path_to_the_destination}


Learn more from the practical tutorials about Remote Data Transfer ⤴, available in section 07. Data Acquisition and Wrangling ⤴ of this workbook.
If you are looking for a guide to transferring data to the SCINet HPC system ⤴, see the tutorials:
Command line data transfer to Atlas ⤴ and
Copy your data to Juno ⤴.

Software as built-in modules

Many HPC systems use software modules to manage and organize the available software. The Environment Modules ⤴ package lets users load and unload software packages on demand by adjusting environment variables such as PATH, so several versions of the same program can coexist on one system.

The module command can be used to list the available modules, and to see which modules are currently loaded.

module avail            # List available packages
module avail <name>     # List available variants of a given package
module list             # List currently loaded modules

Learn more from the practical tutorial Accessing pre-Installed Modules ⤴, available in section 06. High-Performance Computing (HPC) ⤴ of this workbook.
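
Listing modules is usually followed by loading one. The sketch below wraps that step in a guard so it also runs on machines without a module system. The helper name use_module is hypothetical, and gcc is only an assumed module name; check the real names on your system with module avail.

```shell
#!/usr/bin/env bash
# use_module is a hypothetical helper; "gcc" below is an assumed module name.
use_module() {
    # 'module' is usually a shell function, so test for it with 'type'.
    if ! type module >/dev/null 2>&1; then
        echo "module command not available in this shell"
        return 1
    fi
    module load "$1"    # add the package to the current environment
    module list         # confirm what is loaded now
}

use_module gcc || echo "skipped loading gcc"

# Later, to undo:
#   module unload gcc   # remove a single module
#   module purge        # remove all loaded modules
```

Loaded modules affect only the current shell session, so job scripts typically repeat the module load lines they depend on.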

Software via package manager

A package manager lets you search for and list the software packages available on an HPC system. Different operating systems use different package managers, so first check the operating system (OS) on your HPC infrastructure and identify the package manager in use. Then follow the cheatsheet below to search for the software needed. Learn more from the practical tutorial Accessing Software via Package Manager ⤴, available in section 06. High-Performance Computing (HPC) ⤴ of this workbook.

  • for Ubuntu / Debian: .deb packages managed by apt and dpkg
    # List installed and available packages:
    apt list
    
    # Search apt list for a given package:
    apt search <software_name>
    
  • for RHEL / Fedora / Rocky: .rpm packages managed by yum
    (on recent releases yum has been supplanted by dnf, which accepts the same subcommands shown here)
    # List installed and available packages:
    yum list all
    
    # List only available packages:
    yum list available
    
    # Search yum list for a given package:
    yum search <software_name>
    
  • for FreeBSD: .txz packages managed by pkg
    # Search pkg list for a given package:
    pkg search <software_name>
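
The first step above, identifying the OS and its package manager, can be sketched as follows. The helper name detect_pkg_manager is made up for this illustration, and the script assumes the OS provides the /etc/os-release file, which most current Linux distributions do.

```shell
#!/usr/bin/env bash
# Print the OS name, if the standard os-release file is present.
if [ -r /etc/os-release ]; then
    grep '^PRETTY_NAME=' /etc/os-release
fi

# detect_pkg_manager is a hypothetical helper name.
# It prints the first package manager found on PATH, or "none".
detect_pkg_manager() {
    local pm
    for pm in apt dnf yum pkg; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "none"
}

echo "Package manager: $(detect_pkg_manager)"
```

Searching and listing packages normally works for regular users, while installing through the package manager requires superuser privileges.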
    

Learn more from external resources:

Check the documentation

HPC systems often have extensive documentation available, including information on the software that is available. Users can consult the documentation to find a list of the software packages that are available. Primarily, such a list should contain:

  • the licensed software, which may be available only for selected users
  • the software with graphical interface (GUI) only, which may require the user to log on to the HPC via a web-based interface (instead of command line)

Software available on the SCINet HPC systems:

Software available on the ISU HPC systems:

Ask the system administrator

HPC administrators have access to information about all installed software, so if you have any doubts, it is best to reach out to them for assistance.

  • regarding SCINet HPC, contact VRSC team: scinet_vrsc@usda.gov
  • regarding ISU HPC, contact administrators: hpc-help@iastate.edu

How to get new software installed?

  1. Check that the software is not already installed (follow the guide in this tutorial)
  2. Consider the following questions:
    • Will the new software be useful to many other users?
    • Is the software licensed?
    • Does the installation require superuser privileges?

      If the answer to any of them is yes, contact the HPC administrator and submit a request for software installation.
      Otherwise, go to step 3.
  3. Go to the Installing Custom Programs in User Space ⤴ tutorial to learn how to install the necessary software yourself.
