Remote Access to HPC Resources


Introduction

Remote access refers to a user's ability to access, view, and manipulate files, data, and software on another computer, such as a server, a database, or an HPC system. This can be done from anywhere over a network connection such as the Internet, provided that the user has the necessary permissions and credentials for the remote machine and its contents. In particular, there are several ways to remotely access the resources available on high-performance computing (HPC) clusters.

WARNING:
Accessing data on an HPC cluster remotely can be slower than accessing it locally because of the added latency of transmitting data over the network.

In addition, users may need to be granted access to the HPC cluster in order to use it remotely.
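To make the latency warning above concrete, here is a rough back-of-the-envelope sketch. It assumes a simplified model in which total time is the number of network round trips multiplied by the latency, plus the data size divided by the bandwidth. The function name and all numbers (a 1 GB dataset, a 100 MB/s link, 50 ms latency) are hypothetical and chosen only for illustration; real transfers are affected by many other factors.

```python
def estimate_transfer_seconds(total_bytes, bandwidth_bytes_per_s, latency_s, round_trips=1):
    """Simplified estimate: latency cost per round trip plus raw transfer time."""
    return round_trips * latency_s + total_bytes / bandwidth_bytes_per_s


# Hypothetical scenario: the same 1 GB moved over a 100 MB/s link with 50 ms latency.
one_large = estimate_transfer_seconds(1_000_000_000, 100_000_000, 0.05, round_trips=1)
many_small = estimate_transfer_seconds(1_000_000_000, 100_000_000, 0.05, round_trips=10_000)

print(f"one large file:     {one_large:.2f} s")
print(f"10,000 small files: {many_small:.2f} s")
```

Under these assumptions, the single large file takes about 10 seconds, while the same volume split into 10,000 separately requested files takes about 510 seconds, because the latency is paid once per request.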

1. VPN (Virtual Private Network)

A VPN is a technology that allows users to securely access a private network over the Internet. Once connected, users can reach files and data stored on computers within that private network. A VPN protects users' data from being intercepted or monitored by unauthorized parties when logging in from off-campus. Learn more from the hands-on tutorial Virtual Private Network (VPN) Connection ⤴ in this workbook.


2. SSH (Secure Shell connection)

With SSH, a cryptographic network protocol, users can connect to the cluster and then browse, manipulate, and execute files as if they were sitting at the terminal of a computer on the cluster. Learn more from the hands-on tutorials in the Secure Shell Connection (SSH) ⤴ section of this workbook.
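As a minimal sketch of what an SSH connection looks like in practice, the snippet below assembles the arguments for the standard `ssh` command-line client from Python. The helper name `build_ssh_command`, the account `jdoe`, and the hostname `cluster.example.edu` are hypothetical placeholders; the actual login node address and port are site-specific and come from the cluster's documentation.

```python
import shlex


def build_ssh_command(user, host, remote_command=None, port=22):
    """Return the argument list for an `ssh` invocation (nothing is executed here)."""
    command = ["ssh", "-p", str(port), f"{user}@{host}"]
    if remote_command:
        # A trailing command runs on the cluster instead of opening an interactive shell.
        command.append(remote_command)
    return command


# Hypothetical account and hostname, for illustration only.
cmd = build_ssh_command("jdoe", "cluster.example.edu", "ls -l ~/project")
print(shlex.join(cmd))

# To actually connect (requires valid credentials and network access):
# import subprocess
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems if it is later passed to `subprocess.run`.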


3. Remote web-based access

Some HPC clusters also provide web-based interfaces (e.g., Open OnDemand ⤴) for remotely accessing and managing data. Such interfaces can also allow users to submit computing jobs to the HPC queueing system through a web interface or API, without logging in to the underlying infrastructure from the command line. Learn more from the hands-on tutorial Open On Demand (OOD) Connection ⤴ in this workbook.


4. Remote desktop software

Remote desktop software such as VNC (Virtual Network Computing ⤴) or RDP (Remote Desktop Protocol ⤴ by Microsoft) allows users to remotely access and control a desktop (graphical user interface) on another computer, including on some clusters.

5. RFS (Remote File System)

An RFS protocol ⤴ is often used in computing clusters to share data among multiple nodes over a high-speed network. With an RFS protocol, nodes in a cluster can access data stored on other nodes as if it were stored locally, which simplifies data access and removes the need to copy large amounts of data between nodes. This can improve the performance and scalability of the cluster and allows the nodes to work together more efficiently.

An RFS protocol also allows users to access files stored on a remote computer without first transferring them to their local machine. Users can remotely read, write, and modify files as if they were stored on their own computer. Some HPC systems have a Remote File System (RFS) pre-installed and configured, while others do not.

Learn more from the hands-on tutorials in the 07. Data Acquisition and Wrangling: Remote Data Access: Remote Data Preview (without Downloading) ⤴ section of this workbook.

PRO TIP:
If the HPC system already has RFS pre-configured, the user may simply need to follow the appropriate steps to access the remote file system, such as mounting the file system and logging in with their credentials. The specific steps and commands required to access the RFS will vary depending on the operating system and RFS implementation being used.
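As one possible illustration of the mounting step, the sketch below assumes that SSHFS (a remote file system that works over SSH) is the implementation in use and that the `sshfs` client is installed on the local machine. This is an assumption, not a statement about any particular cluster. The helper name, account, hostname, and paths are hypothetical.

```python
def build_sshfs_mount_command(user, host, remote_dir, mount_point):
    """Return the argument list for mounting a remote directory with `sshfs`.

    Assumes the SSHFS client is installed locally; nothing is executed here.
    """
    return ["sshfs", f"{user}@{host}:{remote_dir}", mount_point]


# Hypothetical values, for illustration only.
mount_cmd = build_sshfs_mount_command(
    "jdoe", "cluster.example.edu", "/project/data", "/home/jdoe/cluster_data"
)
print(" ".join(mount_cmd))

# To actually mount (the local mount point must already exist as an empty directory):
# import subprocess
# subprocess.run(mount_cmd, check=True)
```

Once mounted, the remote directory can be browsed like a local folder. Unmounting is typically done with `fusermount -u` on Linux or `umount` on macOS, but the exact procedure depends on the local operating system.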



To learn more about remote access and data manipulation, go to Section 07. Data Acquisition and Wrangling ⤴ in this workbook.

