Introduction to Containers


What are Containers?

At its core, a container is a technology that lets you package and run applications in isolated environments. A container image is a file that contains the application code, libraries, and dependencies required to run an application. A container platform can run the application directly from the image, with no separate installation step. Containers are a form of operating system virtualization that allows multiple isolated applications to run on a single host system.

Containers are a powerful tool, widely used in fields including genomics, high-performance computing, and machine learning. For example, a researcher might use a container to run a single BLAST search locally, or to run an entire genome assembly pipeline. This page provides an introduction to container technology; to get started with a specific platform, refer to the Singularity and Docker tutorial pages in this workbook.
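To make "running an application from an image without installing it" concrete, here is a minimal Python sketch that only builds (and does not execute) the command lines for the BLAST example with Docker and Singularity. The image names `example/blast:latest` and `blast.sif`, the host directory, and the input and output file names are hypothetical placeholders, not images or files referenced by this workbook.

```python
import shlex


def docker_command(image, tool_args, host_dir, container_dir="/data"):
    """Build a `docker run` command that runs a tool from an image."""
    return [
        "docker", "run",
        "--rm",                                # remove the container when it exits
        "-v", f"{host_dir}:{container_dir}",   # make host files visible inside
        image,
        *tool_args,
    ]


def singularity_command(image_file, tool_args, host_dir, container_dir="/data"):
    """Build a `singularity exec` command that runs a tool from an image file."""
    return [
        "singularity", "exec",
        "--bind", f"{host_dir}:{container_dir}",  # make host files visible inside
        image_file,
        *tool_args,
    ]


# Hypothetical BLAST search; paths refer to locations inside the container.
BLAST_ARGS = [
    "blastn",
    "-query", "/data/query.fa",
    "-db", "/data/mydb",
    "-out", "/data/hits.txt",
]

if __name__ == "__main__":
    # Print the commands instead of running them, so no runtime is required.
    print(shlex.join(docker_command("example/blast:latest", BLAST_ARGS, "/home/user/project")))
    print(shlex.join(singularity_command("blast.sif", BLAST_ARGS, "/home/user/project")))
```

In both cases the tool itself lives inside the image; the host only needs the container runtime. Passing either list to `subprocess.run` would execute it on a system where that runtime and image are available.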

Benefits of Containers

  • Platform independent: The main benefit of containers is portability between systems. A container can run on any system that supports the container runtime, without changes to the container.
  • Consistency: Containers provide a consistent environment for an application, including its dependencies, so that it runs the same way on different systems.
  • Efficiency: Containers do not need a separate operating system (OS) for each application, which reduces hardware requirements and makes them efficient and fast.
  • Isolation: Containers isolate applications from one another, so that they do not interfere with each other or access each other’s resources. If one container fails, it does not affect others running on the same system.
  • Scalability: Containers are lightweight and start quickly, so more instances of an application can be launched as demand grows.

Container Platforms

A container platform is software that provides an environment for running and managing containerized applications. Some of the tools listed below are also workflow management tools, which use containers to automate tasks, improve efficiency, and keep processes consistent.

  • Singularity: An open-source container platform for high-performance computing clusters. It is designed to meet the specific needs of HPC, with a focus on performance, security, and compatibility with HPC environments.
  • Docker: An open-source platform that automates the deployment of applications inside containers. It provides a way to package, deploy, and run applications in isolated environments.
  • Nextflow: A workflow platform for automating data-driven pipelines. Pipelines are defined and executed in a simple, high-level language, with built-in support for containers and cloud computing.
  • Kubernetes: An open-source platform for automating the deployment, scaling, and management of containerized applications. It orchestrates containers across their entire lifecycle.
  • Other tools used for containers include LXD, rkt, and Podman.

Virtual Machines vs Containers

Virtual machines (VMs) and containers are both technologies for running and managing applications in isolated environments. Examples of container software, such as Docker, were discussed above; examples of VM software include VMware, VirtualBox, Parallels Desktop, Citrix, and Hyper-V. VMs and containers differ in several key ways:

  • Resource Utilization: Each virtual machine runs a complete guest OS, which can consume a large amount of system resources such as memory and storage. In contrast, containers share the host OS kernel, making them much more resource-efficient.
  • Isolation: VMs provide full isolation, as each machine has its own OS and file system. Containers provide process-level isolation, meaning they share the host OS kernel but each have their own isolated file system.
  • Portability: Containers are much more portable, as they can run on any system that supports the container runtime, whereas VMs require a compatible hypervisor, also called a Virtual Machine Monitor (VMM).
  • Start-up Time: Virtual machines take longer to start because they must boot a complete OS; containers do not need an OS boot.
  • Scalability: Containers are designed to scale easily, so it is simple to add more instances of an application as it grows. Virtual machines can be more challenging to scale.

In summary, containers offer a more resource-efficient and portable solution, with faster start-up times and easier scalability than virtual machines. However, virtual machines provide a higher level of isolation, as they run a complete operating system for each application. The choice between containers and virtual machines depends on the specific requirements of the application and the deployment environment.
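The point that containers share the host kernel while keeping their own file system can be observed directly. The following is a small sketch, assuming a Linux host and a Linux container image that ships the conventional `/etc/os-release` file; the function names are illustrative and not part of any container tool. It reports the running kernel version alongside the distribution named in the local file system.

```python
import platform
from pathlib import Path


def parse_os_release(text):
    """Parse KEY=value lines in the /etc/os-release format into a dict."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        info[key] = value.strip().strip('"')
    return info


def environment_report(os_release_path="/etc/os-release"):
    """Return the kernel release and the distribution name seen by this process."""
    path = Path(os_release_path)
    distro = parse_os_release(path.read_text()) if path.exists() else {}
    return {
        "kernel": platform.release(),                     # provided by the host kernel
        "userland": distro.get("PRETTY_NAME", "unknown"), # read from the local file system
    }


if __name__ == "__main__":
    print(environment_report())
```

Run once on the host and once inside a container on the same machine: under these assumptions the `kernel` value should match, while `userland` reflects whatever distribution the image was built from. Inside a VM, by contrast, the kernel is the guest's own. Note that Docker on macOS or Windows runs containers inside a Linux VM, so the kernel reported there belongs to that VM rather than to the desktop OS.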
