4.3. Installing custom programs in user space

Introduction

Installing programs on a high-performance computing (HPC) system can be different from installing software on a personal computer due to the complex nature of HPC systems and limited privileges for regular users.

If you need a specific software package, first check whether this software is already pre-installed on HPC.

Typically, compilers, programming languages, libraries and frameworks, basic visualization software, text editors, and job schedulers are all available. What’s more, popular software for specialized analysis (such as blast for bioinformatics) is often not only available but also regularly upgraded to the latest release.
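A quick way to check for an existing installation is to look for the program on your PATH and, where the cluster provides environment modules, to search the module list. The sketch below is only an illustration: check_tool is a hypothetical helper name, ls and blast are example program names, and the module command is assumed to exist only on some systems (the block skips that step otherwise).

```shell
# check_tool is a hypothetical helper: reports whether a program is already on PATH
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
        return 1
    fi
}

check_tool ls    # replace 'ls' with the program you need

# Search the module system too, if this cluster has one (assumption: 'module' is defined)
if type module >/dev/null 2>&1; then
    module avail blast 2>&1 | head -n 20
fi
```

If neither check finds the program, a custom installation as described in the rest of this guide is the next step.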

Installing custom programs on HPC

Most HPC systems run a Linux-based operating system, so custom programs are installed from the command line.

If you would like to learn more about the command line interface and Linux-based operating systems, start with the tutorials:

What you can NOT do as a regular user on HPC:

  • install new packages using a system package manager, such as YUM, APT, DNF, ZYpp, or Pacman
  • install software for system-wide use
  • install software that requires superuser privileges

What you can DO as a regular user on HPC:

  • install custom software
    • in the user space
    • in a group-wide accessible location
  • add custom software to the module manager
  • create virtual environments and install custom software in them

This guide covers installing programs in a UNIX environment on HPC systems. Most of the steps assume that you are:

  1. installing a package in a user- or group-accessible location,
  2. working without root privileges, and
  3. using
    • an environment module system or
    • a virtual environment system for package management.

Install custom software

Where to install the software?

If you need to install software on a high-performance computing (HPC) system, there are several methods you can use, depending on the software and the HPC system.

Note that global installations are not possible when you are not the superuser (root, administrator), and personal directory installations are only available to one person (see user-only access).

If the software will be used by members of a particular group, it is a good idea to install one copy of the software available to all (see group-wide access).

Finally, if there is a chance that the software can serve a larger number of users from different groups, it is reasonable to ask the cluster administrators for system-wide installations (see How to get new software installed?).

Install for user-only access

Some HPC systems allow users to install software in their home directory. This is typically done by downloading the software, compiling it from source code, and installing it in a directory within the user’s home directory. This method is often used for small programs because of the limited storage space in the home directory.

Installing all your software in the home directory will quickly fill the available space, which can seriously disrupt the operation of your account.

The recommended solution is to install programs elsewhere (e.g., in the working directory) and soft-link the installation location to the home directory.

Explore section Home directory tutorial, to find out:

Follow the guide in the tutorial Setting up your home directory for data analysis to learn about the file system organization on the HPC, including the principles for home directory, working directory, and storage space.

Quick Guide

Avoid installing anything in your home directory. This is your default location when you log in, accessed with the shortcut cd ~.

On HPC systems, when you install or start using software like Python or R libraries, or Jupyter Lab, the files are typically saved in your home directory by default, which can quickly fill up; therefore, it’s crucial to move these files to a directory like /project that has a higher space quota to avoid running out of space.
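To see which hidden folders are currently taking up space, a short disk-usage summary can help. This is a minimal sketch: largest_hidden is a hypothetical helper name, and sizes are reported in KiB by du -sk.

```shell
# largest_hidden is a hypothetical helper: lists the biggest hidden entries
# (sizes in KiB) directly inside a directory, largest first
largest_hidden() {
    du -sk "$1"/.[!.]* 2>/dev/null | sort -rn | head -n 10
}

largest_hidden "$HOME"
```

Folders near the top of the list (often .cache, .conda, or .local) are the first candidates to move to the project space.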

1. The working directory (or workdir) is usually located directly under the root directory, /. Typically it is called /work or /project or similar. Below it, there are directories for particular groups or projects, and subdirectories for individual users or tasks.

You can list directories on a path using the ls command:

ls /project/<group-wide folder>

If you are a new user, your subdirectory may not yet exist. Create it in a group/project you have access to:

mkdir /project/<group-wide folder>/user_name

2. Export the path to your folder in the working space as an environment variable:

export PROJECTFOLDER=/project/<group-wide folder>/user_name

3. Create a subfolder to store all installation settings:

mkdir $PROJECTFOLDER/${USER}_software

Then, create subfolders for most common configuration types and soft-link them to the home directory:

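# Create each configuration folder in the working directory and link it into $HOME.
# If a folder of the same name already exists in $HOME, move it aside first.
for i in .config .nextflow .singularity .cache .spack .conda .local .lmod.d
do
    mkdir -p "$PROJECTFOLDER/${USER}_software/$i"
    ln -s "$PROJECTFOLDER/${USER}_software/$i" "$HOME/$i"
done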

From now on, all installation processes attempting to save files in the home directory will be redirected to the corresponding subdirectories in the working directory. At the same time, all processes looking for software in the home directory will find it via symbolic links.

4. If you want to download the source code to install it manually, you should also go to the working directory, create a subdirectory, and do the installation there.
To follow a practical guide, go to the section How to install regular packages?
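The redirection through symbolic links described in the steps above can be tried safely in a throwaway sandbox before touching your real home directory. In this sketch, DEMO_HOME and DEMO_PROJECT are hypothetical stand-ins for $HOME and $PROJECTFOLDER, and environments.txt is just an example file name.

```shell
# Throwaway sandbox: DEMO_HOME and DEMO_PROJECT stand in for $HOME and $PROJECTFOLDER
DEMO_ROOT=$(mktemp -d)
DEMO_HOME="$DEMO_ROOT/home"
DEMO_PROJECT="$DEMO_ROOT/project"
mkdir -p "$DEMO_HOME" "$DEMO_PROJECT/software/.conda"

# Link the folder from the project space into the (demo) home directory
ln -s "$DEMO_PROJECT/software/.conda" "$DEMO_HOME/.conda"

# A file written "to the home directory" through the link...
touch "$DEMO_HOME/.conda/environments.txt"

# ...physically lands in the project space
ls "$DEMO_PROJECT/software/.conda"
readlink "$DEMO_HOME/.conda"
```

The same readlink check can be run on your real links (for example, readlink ~/.conda) to confirm where each one points.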

Install for group-wide access

It is recommended to install programs commonly used in your group/lab only once. For this, the group’s working directory must be accessible to all lab members. Such a shared location is a good place for a group-wide installation, making software accessible to all qualifying users.

1. Create a SOFTWARE folder in your group’s working directory on the cluster:

mkdir /project/<group-wide folder>/SOFTWARE

2. For any future software, create a subdirectory there, where you download the source code and perform the installation.
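For other lab members to use and extend the shared folder, its permissions usually need to allow group access. The sketch below assumes that the folder already belongs to your lab's Unix group (cluster policies differ, so check with your administrators); SHARED_DIR is a placeholder that defaults to a throwaway demo directory.

```shell
# SHARED_DIR is a placeholder; the default is a throwaway demo directory.
# For real use, set it to /project/<group-wide folder>/SOFTWARE
SHARED_DIR="${SHARED_DIR:-$(mktemp -d)/SOFTWARE}"
mkdir -p "$SHARED_DIR"

# Give group members read/write/enter access; the setgid bit (g+s) makes
# new files and subdirectories inherit the directory's group
chmod g+rwx,g+s "$SHARED_DIR"

ls -ld "$SHARED_DIR"
```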

How to get the software?

Finding and acquiring software typically involves a few different methods depending on the type of software you’re looking for, its source (commercial, free, open-source) and the platform you’re using (Windows, macOS, Linux or mobile).

Here’s a general overview of how you can find and get software:

1. App Stores and Digital Distribution Platforms
Windows: Microsoft Store provides apps and games for Windows users.
macOS: The Mac App Store is the official source for macOS apps.
Linux: Linux distributions often come with a package manager and associated app store (like Ubuntu Software Center) that offers a vast array of software from official repositories.
2. Official Websites
For most commercial and many free applications, the safest and most straightforward method is to download software directly from the official website of the developer or company.
This ensures you're getting the genuine article and often the latest version.
3. Third-Party Download Sites
Websites like SourceForge, Tucows, SciCrunch, Bioinformatics.org and others offer a wide range of software.
However, you should be cautious with these as they may bundle software with unwanted extras or adware.
Always opt for custom installation to avoid installing unnecessary or potentially harmful add-ons.
4. Open Source Repositories

For open-source software, platforms like GitHub, GitLab, and Bitbucket are common. These sites host the source code, and often compiled versions, of the software. Downloading from these sources sometimes requires a bit more technical know-how, especially if you need to compile the software yourself.

5. Software-as-a-Service (SaaS)

Many applications, especially in business or institutional environments, are available directly through a web browser. Services like Adobe Creative Cloud, Microsoft Office 365, and Google Workspace operate on a subscription model and don’t require traditional downloads.

6. Specialty Software Stores or Services
Certain types of software, especially those catering to specific professions (like engineering, design, or geospatial analysis), might be available through specialized platforms or stores dedicated to those industries.

Explore the following tutorials in this workbook:

How to decompress the archive?

Packages are usually compressed in one of several ways for easy handling. Typically, they must be unpacked before proceeding to installation.
For most cases tar -xf will do the trick:

tar -xf package.tar.gz

Although tar can auto-detect the compression type and decompress the archive with the -xf options, you can also specify the type of compressed file you’re providing. Look at the archive extension to recognize the type of compression. Then use the corresponding command to unpack.

extension | command
.tar      | tar -xvf package.tar
.rar      | unrar x package.rar
.zip      | unzip package.zip
.tar.gz   | tar -xvzf package.tar.gz
.tgz      | tar -xvzf package.tgz
.gz       | gunzip package.gz
.tar.bz2  | tar -xvjf package.tar.bz2
.tbz2     | tar -xvjf package.tbz2
.bz2      | bunzip2 package.bz2
.Z        | uncompress package.Z
.7z       | 7z x package.7z
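The extension-to-command mapping above can be folded into a small helper that picks the command from the file name. This is a sketch covering only some of the listed formats; extract is a hypothetical function name, and the demo archive is created on the spot in a temporary directory.

```shell
# extract is a hypothetical helper: chooses the unpacking command by extension
extract() {
    case "$1" in
        *.tar.gz|*.tgz)   tar -xzf "$1" ;;
        *.tar.bz2|*.tbz2) tar -xjf "$1" ;;
        *.tar)            tar -xf "$1" ;;
        *.gz)             gunzip "$1" ;;
        *.bz2)            bunzip2 "$1" ;;
        *.zip)            unzip "$1" ;;
        *) echo "unknown archive type: $1" >&2; return 1 ;;
    esac
}

# Demo: build a tiny archive in a temporary directory, then unpack it
workdir=$(mktemp -d)
cd "$workdir"
mkdir package
echo "hello" > package/README
tar -czf package.tar.gz package
rm -r package
extract package.tar.gz
cat package/README
```

The more specific patterns (such as *.tar.gz) come before the general ones (*.gz) because case uses the first pattern that matches.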

How to install regular packages?

The http://pkgs.org website lists packages, including RPMs, available for many Linux distributions.

On Linux-based HPC systems, the most common format is .rpm, used by Red Hat-based systems such as CentOS and Fedora with the YUM package manager. The .tar.gz file format is also commonly used on HPC systems to install software from the source code. This process can be more time-consuming and complex than installing packages from a package manager, but it provides more control over the installation process, and can be the only option for installing software that is not available in the package manager’s repositories.

The table contains a list of the most common package file formats with corresponding package managers and operating systems.

package file format | package manager     | operating system            | notes
.rpm                | YUM                 | RHEL, Fedora, CentOS        | Unix-based, typically on HPC
.deb                | APT                 | Debian, Ubuntu, Linux Mint  | Unix-based, typically on personal machine
.pkg                | MacOS Installer     | MacOS                       | typically on personal machine
.msi                | Microsoft Installer | Microsoft Windows           | typically on personal machine
.tar.gz             | installed manually  | any                         | compressed archive files of the source code

use package file: .rpm

The .rpm package files are a software distribution format used by some Linux-based operating systems. Each is a single compressed archive that contains the software package, a list of its dependencies, and installation instructions for the package manager appropriate for the operating system.

These all-in-one .rpm files allow users to easily install, manage, and update software on their systems, without having to manually download, compile, and install software from source code.

The YUM package manager extracts the contents of the .rpm archive, verifies that all dependencies are satisfied, and installs the software in the appropriate location on the system. The package manager also keeps track of the installed packages, so that they can be easily updated or removed as needed.

Follow these steps to extract and install software from an .rpm file:

1. Find the correct RPM package for your system. The http://pkgs.org website lists available RPMs, and they are free to download.

CentOS and Fedora RPMs generally work on Red Hat (RHEL) as well.
Download the Source Package (not Binary) because, as a regular user, you can’t use the yum install command. Instead, you can extract the source package and use a custom installation.

2. Extract the package:

rpm2cpio package.rpm | cpio -idmv

If this completes successfully, you should see a *.tar.gz file or another type of compressed archive.

3. Change into the directory containing the extracted files:

cd path/to/extracted/files

In rare cases, when you have patches (extracted from RPM), you might have to apply them before you install.

patch -Np1 -i path/to/file.patch

4. Configure the package with a custom installation prefix:

./configure --prefix=$HOME/local

5. Build and install the package:

make
make install

This installs the package into the $HOME/local directory. This allows you to install the software without root access. However, keep in mind that the installed software will only be accessible from within your home directory and may not be visible to other users on the system.

You can also install software to another location in the file system (where you have write access) and then create symbolic links (sym-links) in your home directory, allowing you to easily access it.

4’-5’. Configure the package with a custom installation prefix, then build and install it:

./configure --prefix=/project/<group-wide folder>/SOFTWARE/package_name
make
make install

6. Create a sym-link to the installation location in your $HOME directory:

cd $HOME
ln -s /project/<group-wide folder>/SOFTWARE
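If you prefer to link individual executables rather than a whole folder, a short loop can collect them in one personal directory. The sketch below runs in a throwaway sandbox; INSTALL_BIN, LINK_DIR, and the tiny tool script are hypothetical stand-ins for a real installation's bin folder and a directory on your PATH (such as $HOME/bin).

```shell
# Demo sandbox; for real use set INSTALL_BIN to <install prefix>/bin
# and LINK_DIR to a directory on your PATH (both names are placeholders)
SANDBOX=$(mktemp -d)
INSTALL_BIN="$SANDBOX/SOFTWARE/package_name/bin"
LINK_DIR="$SANDBOX/home/bin"
mkdir -p "$INSTALL_BIN" "$LINK_DIR"

# Stand-in for an installed program
printf '#!/bin/sh\necho "hello from tool"\n' > "$INSTALL_BIN/tool"
chmod +x "$INSTALL_BIN/tool"

# Link every executable file from the installation into LINK_DIR
for exe in "$INSTALL_BIN"/*; do
    if [ -f "$exe" ] && [ -x "$exe" ]; then
        ln -sf "$exe" "$LINK_DIR/"
    fi
done

ls -l "$LINK_DIR"
```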

use configure script

Many programs are distributed as a compressed archive (.tar.gz) that contains the source code of the software package. To install software packaged in this form, you will typically need to extract the files and compile the source code manually on the HPC system. Such software distributions usually come with a standard set of files that lets you install programs with ease. After unpacking, if you see a configure script in the decompressed directory, use the following approach.

0. Enter the desired location on HPC (e.g., preferred subdirectory in the working directory).

1. Download the source code for “myprogram”, e.g., using the wget command followed by the link:

wget [download-link]

2. Extract the source code, if needed.

tar xvf myprogram-1.0.tar.gz

…about decompressing archive in section: How to decompress the archive?

3. Change to the directory containing the source code:

cd myprogram-1.0

4. Compile the program [configure the build, build the software, install the software]:

./configure --prefix=$HOME/myprogram
make
make install

5. Add the following line to your shell startup file (e.g., ~/.bashrc or ~/.bash_profile):

export PATH=$HOME/myprogram/bin:$PATH

6. Reload the .bashrc to update the changes in the environment:

source ~/.bashrc

7. Now you should be able to run myprogram from the terminal.

myprogram -help
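When adding lines to a startup file, it helps to avoid appending the same line every time you repeat an installation. The following sketch shows one way to do that; add_line_once is a hypothetical helper name, and RC_FILE defaults to a temporary demo file rather than your real ~/.bashrc.

```shell
# add_line_once is a hypothetical helper: appends a line to a file
# only if that exact line is not already present
add_line_once() {
    touch "$1"
    grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Demo file; point RC_FILE at ~/.bashrc for real use
RC_FILE="${RC_FILE:-$(mktemp)}"

# Calling it twice still leaves a single copy of the line
add_line_once "$RC_FILE" 'export PATH=$HOME/myprogram/bin:$PATH'
add_line_once "$RC_FILE" 'export PATH=$HOME/myprogram/bin:$PATH'
cat "$RC_FILE"
```

The single quotes keep $HOME and $PATH unexpanded in the file, so they are evaluated each time the startup file is read.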

Follow the alternative steps 5’-8’ (below) if you want to create a custom module for the new software.

…in section Create custom module

5’. Create a module file for “myprogram” in your environment modules directory (e.g., ~/custom_modules):

#%Module
# myprogram modulefile

set root $env(HOME)/myprogram
setenv MYPROGRAM_HOME $root
prepend-path PATH $root/bin

6’. Add (only once) the following line to your shell startup file (e.g., ~/.bashrc):

module use ~/custom_modules

7’. Reload the .bashrc to update the changes in the environment:

source ~/.bashrc

8’. Now you should be able to load the “myprogram” module using the following command:

module load myprogram

In case something goes wrong or you get an error saying that you need ‘package x’ before installing, then you can undo these steps before attempting installation again, using:

make clean

If the program doesn’t work as intended or something goes wrong after installation, many programs can be safely uninstalled:

make uninstall

It is a good idea to run all the above commands in a build directory inside the package directory, so that if something doesn’t work you can easily delete the build directory to start over.
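The build-directory workflow can be sketched as follows. Not every package supports building outside its source tree, so treat this as an illustration: the configure script here is a hypothetical stand-in that only records the chosen prefix, and myprogram-1.0 is created in a temporary location.

```shell
# Demo source tree with a stand-in configure script (hypothetical);
# a real configure script would generate Makefiles instead
src_root=$(mktemp -d)
src="$src_root/myprogram-1.0"
mkdir -p "$src"
cat > "$src/configure" <<'EOF'
#!/bin/sh
echo "prefix=${1#--prefix=}" > config.status
EOF
chmod +x "$src/configure"

# Run the configuration from a separate build directory
mkdir "$src/build"
cd "$src/build"
../configure --prefix="$HOME/myprogram"
recorded=$(cat config.status)
echo "$recorded"

# Start over: generated files live only in build/, so removing it is enough
cd "$src"
rm -rf build
```

With a real package, make and make install would be run inside the build directory after ../configure.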

use Makefile file

Some programs don’t have a configure script but already include a Makefile. These programs do not need the first step (i.e., running ./configure), and you can simply install them by typing:

make
make install

The executables are generally created either in the same directory or in the bin directory within the package directory. Sometimes these packages allow you to install to other locations as well.

Consult the README or INSTALL files that came with the program, or edit the Makefile to hard-code the installation directory. In some cases, setting the PREFIX variable to the desired installation location will also do the trick.

make PREFIX=/custom/installation/location
make PREFIX=/custom/installation/location install
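Before relying on a variable such as PREFIX, it is worth checking which installation variables the Makefile actually defines. The sketch below searches a small demo Makefile written on the spot; the variable names in the pattern are common conventions, not a guarantee for any particular package.

```shell
# Demo Makefile (hypothetical content) written to a temporary directory
demo=$(mktemp -d)
printf 'PREFIX ?= /usr/local\nBINDIR = $(PREFIX)/bin\n\ninstall:\n\tinstall -m 755 tool $(BINDIR)\n' > "$demo/Makefile"

# List the lines that define commonly used installation variables
found=$(grep -nE '^(PREFIX|prefix|DESTDIR|BINDIR)[[:space:]]*[?:+]?=' "$demo/Makefile")
echo "$found"
```

If the search returns nothing for your package, the README or INSTALL file is the place to look for the supported way to change the installation directory.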

use cmake command

If the README file says that you need to use the cmake command, then use these steps to install:

# after extraction, cd to the package
cd package
mkdir build
cd build
# configure the build with the default installation location
cmake ..
# or, if you want it in a different directory, configure with a custom prefix instead
cmake -DCMAKE_INSTALL_PREFIX:PATH=/location/for/installation ..
make
# if this completes successfully, the executables are typically in the build
# directory or in a bin folder of the package
make install  # copies them to the installation prefix

Create custom module

You can install software packages in your home directory or in the group’s working directories. Then you can create your own module file to set environment variables, update PATH and LD_LIBRARY_PATH, and perform any other necessary setup. A module file is a short script written in Tcl (Environment Modules, also read by Lmod) or Lua (Lmod).

1. Create directory for all custom modules:

mkdir /path/to/custom_modules

2. In the custom_modules directory, make a directory for each software:

mkdir /path/to/custom_modules/app_name

3. In the app_name directory, create a text file, i.e., the custom module file. The file name should indicate the version number and, if applicable, the compiler version:

#%Module
#-- Example custom module file --#

# Set environment variables:
set app_dir /absolute/path/to/your/software
setenv MY_APP_DIR $app_dir

# Update PATH:
prepend-path PATH $app_dir/bin

# Update LD_LIBRARY_PATH:
prepend-path LD_LIBRARY_PATH $app_dir/lib

# Other setup:
set-alias myapp $app_dir/bin/my_app.sh

This custom module file sets an environment variable MY_APP_DIR to the location of your personal installation of the software. It also updates PATH and LD_LIBRARY_PATH to include the bin and lib directories in MY_APP_DIR. Finally, it sets an alias for running the my_app.sh script in the bin directory.

4. Once you have your custom modules, you can add the directory containing the modules to your module search path using the module use command. Then use the module load command to load the module, which is the name of the directory for selected software in the custom_modules location.

module use /path/to/custom_modules
module load app_name

The module use command prepends one or more directories to the module search path for your current shell session only, without affecting the module search path of other users. The module avail and module load commands then search the newly added directories first. This can be useful if you have your own custom modules or if you want to use a different version of a module than the one available in the default module search path.

5. To set the modules path to be available on login, add the module use command to your ~/.bashrc file:

module use /path/to/custom_modules
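Creating the directory layout and the module file can also be scripted. The following sketch writes a minimal Tcl module file under a name/version layout; write_modulefile is a hypothetical helper, the paths are placeholders, and MODULES_DIR defaults to a throwaway demo directory.

```shell
# write_modulefile is a hypothetical helper.
# Arguments: name, version, install prefix, modules directory
write_modulefile() {
    name=$1; version=$2; prefix=$3; moddir=$4
    mkdir -p "$moddir/$name"
    cat > "$moddir/$name/$version" <<EOF
#%Module
# $name $version (generated modulefile)
set root $prefix
prepend-path PATH \$root/bin
prepend-path LD_LIBRARY_PATH \$root/lib
EOF
}

# Demo values; for real use pass your own paths, e.g. /path/to/custom_modules
MODULES_DIR="${MODULES_DIR:-$(mktemp -d)/custom_modules}"
write_modulefile app_name 1.0 /absolute/path/to/your/software "$MODULES_DIR"
cat "$MODULES_DIR/app_name/1.0"
```

On a system with a module command, the generated file would then be picked up with module use "$MODULES_DIR" followed by module load app_name/1.0.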

Install language-dedicated packages

Python packages

Using your own Python installation allows you to write to it and install modules as needed. After unpacking, cd to the package, and install it as follows:

module load python
python setup.py install # all executables will be stored in python/bin (not in package directory)

If you need to test something out without installing it as a module, you can install it in a personal location as well:

python setup.py install --prefix=/home/username/mydir

or simply as

python setup.py install --user # executables will be in the ~/.local/bin directory

Any package available at PyPI can be managed using these commands as well:

module load python
pip install SomePackage # installs a python package
pip show --files SomePackage # shows what files are installed for the particular package
pip list --outdated # lists what packages are outdated
pip install --upgrade SomePackage # upgrades a package
pip uninstall SomePackage # uninstalls a package
pip freeze # lists all the packages that are currently installed and their version
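A virtual environment keeps pip installations in a directory you control, which avoids writing into the system Python. The sketch below assumes that python3 with the standard venv module is available (on a cluster, typically after module load python); VENV_DIR is a placeholder that defaults to a throwaway demo location.

```shell
# VENV_DIR is a placeholder; the default is a throwaway demo location.
# For real use, choose a path in your project space rather than $HOME
VENV_DIR="${VENV_DIR:-$(mktemp -d)/venv}"

if command -v python3 >/dev/null 2>&1 && python3 -m venv "$VENV_DIR"; then
    # Activate for the current shell; pip now installs into VENV_DIR only
    . "$VENV_DIR/bin/activate"
    python -c 'import sys; print(sys.prefix)'
    deactivate
else
    echo "python3 with the venv module is not available here" >&2
fi
```

While the environment is active, the pip commands listed above operate on the environment instead of the system-wide Python.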

Conda

One of the easiest ways you can install custom Python software in your home or project directory is through the Conda package manager. Thousands of biological packages and their dependencies can be installed with a single command using the Bioconda repository for the Conda package manager.
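Conda environments and its package cache can grow large, so it is common to keep them outside the home directory. The sketch below writes a small configuration file using Conda's envs_dirs and pkgs_dirs settings; CONDA_ROOT and the file location are demo placeholders, and nothing here touches an existing ~/.condarc.

```shell
# Demo locations (placeholders); for real use, CONDA_ROOT would be a folder
# in your project space
CONDA_ROOT="$(mktemp -d)/conda"
CONDARC_FILE="$CONDA_ROOT/condarc"
mkdir -p "$CONDA_ROOT/envs" "$CONDA_ROOT/pkgs"

# envs_dirs: where new environments are created
# pkgs_dirs: where downloaded packages are cached
cat > "$CONDARC_FILE" <<EOF
envs_dirs:
  - $CONDA_ROOT/envs
pkgs_dirs:
  - $CONDA_ROOT/pkgs
EOF

cat "$CONDARC_FILE"
```

To use such a file, its content can be merged into ~/.condarc, or the CONDARC environment variable can be pointed at it.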

R packages

Installing R libraries for the group is easy, since you don’t have to do anything different from the way you install packages to your home directory. HPC infrastructure typically has its own R version installed as a module, configured so that it automatically installs R packages in the correct location when you are using this module.

module avail r
---------------------- /apps/modulefiles/core ----------------------
   r/4.0.2
   r/4.2.0
   r/4.3.2          (D)
   rstudio/1.3.1073

-------------------- /apps/licensed/modulefiles --------------------
Where:
 D:  Default Module

To load the selected R version, type:

module load r/4.3.2

To activate the interactive R session, type:

R

The R command prompt (>) will appear.

If you are a novice to R, start by exploring demo() and help.start():

demo()

Demos in package ‘base’:

error.catching          More examples on catching and handling errors
is.things               Explore some properties of R objects and
                        is.FOO() functions. Not for newbies!
recursion               Using recursion for adaptive integration
scoping                 An illustration of lexical scoping.


Use ‘demo(package = .packages(all.available = TRUE))’
to list the demos in all *available* packages.

Installing CRAN R Packages

CRAN packages are by far the easiest to install. Load the R module, start R, and run the installation from within the R prompt:

module load r
R
# R command prompt will appear

install.packages("some_package")

If it prompts you to select the closest mirror, choose IA (number 77 in the list; the numbering may change as mirrors are added).

Once installed, you will be back at the R prompt. Load the installed package to check that everything is fine:

library(some_package)

This should load the package and return without any error message.
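If the R module on your cluster does not already direct installations to a suitable location, the R_LIBS_USER environment variable can point R at a library folder of your choice. In this sketch, R_LIBS_DIR is a placeholder that defaults to a throwaway demo directory; for real use it would be a folder in your project space.

```shell
# R_LIBS_DIR is a placeholder; the default is a throwaway demo directory
R_LIBS_DIR="${R_LIBS_DIR:-$(mktemp -d)/R_libs}"
mkdir -p "$R_LIBS_DIR"

# R adds this directory to its library search path if it exists at startup
export R_LIBS_USER="$R_LIBS_DIR"

# Confirm from R itself, if R is available in this shell
if command -v Rscript >/dev/null 2>&1; then
    Rscript -e '.libPaths()'
fi
```

Adding the export line to ~/.bashrc makes the setting persistent across sessions.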

Install manually downloaded R Package

Some packages that aren’t in CRAN but are available directly from the author can be installed for the group as well. Download the package.tar.gz from the author’s website.

module load r
R CMD INSTALL package.tar.gz

This will install the package for the group. Check to see if it works:

R
# R command prompt will appear

library(package)

This should load the package and return without any error message.

Installing Bioconductor Modules

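For Bioconductor packages, follow these steps. Current Bioconductor releases are installed with the BiocManager package, which replaces the retired biocLite.R script:

module load r
R
# R command prompt will appear

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("package_name")
library(package_name)

This should load the package and return without any error message.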

Managing R packages

List available packages

To get a complete list of packages that are already installed, load the R module and enter the R prompt. From there, type the following command: library()

To get all packages installed along with their version number, type installed.packages()[,c("Package","Version")]

Upgrade R packages

The R packages that you installed from CRAN can all be upgraded with a single command:

  • upgrade all packages: update.packages()
  • check package status: packageStatus() Reports ‘ok’ (no updates), ‘upgrade’ (needs update) or ‘unavailable’ (package removed from repository).

Another useful option to check the status of all currently installed packages is:

inst <- packageStatus()$inst
inst[inst$Status != "ok", c("Package", "Version", "Status")]

Uninstall R package

Packages can be uninstalled easily using remove.packages command: remove.packages("package_name")

Perl modules

Once the module is loaded, use the following set of commands to install any Perl modules.

module load perl
  • If there is a Makefile.PL:
perl Makefile.PL PREFIX=/home/users/dag   # makes the system-specific makefile
make          # builds all the libraries
make test     # runs a short test
make install  # installs the package correctly
  • If there is a Build.PL:
perl Build.PL
./Build
./Build test
./Build install

The module will be installed in the group’s Perl folder (not in the package directory). So, as you did for Python, you need to set up a dummy module file that loads Perl.

Java programs

Precompiled Java programs that come as .jar files can be placed in any directory and called from there. To use one with an environment modulefile, follow these steps:

  1. First, create a directory (program name) and a sub-directory (version number).
  2. Place the .jar file in this sub-directory. Within it, create another directory and call it bin.
  3. For each .jar file in /programname/version/, create a text file in /programname/version/bin.
    This text file will just have a single line, something like:
    java -jar /programname/version/program_name.jar "$@"
    
  4. Change the permissions of these text files so that they can be executed.
    chmod -R +x /programname/version/bin
    
  5. In your module file, you need to add this line:
    prepend-path PATH /programname/version/bin
    
  6. Now the .jar files can be simply called as programname (once the module is loaded).
    No need to add java in front.
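The wrapper files from the steps above can be generated with a short loop. This sketch runs in a throwaway sandbox: APP_DIR stands in for /programname/version, and program_name.jar is an empty placeholder file rather than a real Java archive.

```shell
# Demo sandbox; APP_DIR stands in for /programname/version (placeholder)
APP_DIR="$(mktemp -d)/programname/1.0"
mkdir -p "$APP_DIR/bin"
: > "$APP_DIR/program_name.jar"   # empty stand-in for a real .jar file

# Write one executable wrapper per .jar; "$@" forwards all arguments
for jar in "$APP_DIR"/*.jar; do
    [ -e "$jar" ] || continue
    wrapper="$APP_DIR/bin/$(basename "$jar" .jar)"
    printf '#!/bin/sh\nexec java -jar "%s" "$@"\n' "$jar" > "$wrapper"
    chmod +x "$wrapper"
done

cat "$APP_DIR/bin/program_name"
```

Each wrapper is named after its .jar file, so adding the bin directory to PATH makes the program callable by that name.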