1.3 Remote Data Preview without Downloading
Remote data preview refers to the ability to view data that is stored in a remote location without having to download or transfer the entire dataset to a local machine. In general, there are two main approaches for remote data preview:
1. Using graphical interfaces, including:
   - Web-based interfaces: Many cloud-based platforms and data storage systems offer web-based interfaces that let users preview remotely stored data in a web browser. These interfaces may include visualizations, charts, or other tools for exploring the data in real time.
     To learn more, visit the tutorial Web-based Open OnDemand (OOD) Connection to HPC.
   - Remote desktop: Users can remotely access a desktop computer that is physically located where the data is stored, and work with the data as if it were stored locally. This requires a stable internet connection and may require additional software or configuration.
   - Virtual desktops: Some cloud-based platforms offer virtual machines that let users access and interact with remotely stored data through a virtual desktop. This is especially useful for users who work with large datasets or with complex analysis tools that require significant processing power.
2. Using command line interfaces, including:
   - Viewing Text Files using UNIX commands on a remote machine
   - Viewing PDF Files using X11 SSH connection
   - Viewing Graphic Files on a remote machine as ASCII art in the terminal
   - Mounting a Remote Folder on a local machine using an SSH connection
This section focuses on command line approaches for previewing remote data. They need little more than an SSH connection and standard tools, so they work reliably on virtually any High-Performance Computing (HPC) system. We encourage readers to explore these approaches in the hands-on tutorials listed in the Further Reading section.
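The core idea behind command line previews is to run a small command on the remote machine so that only its output, not the file, travels over the network. The sketch below scripts this from Python under several assumptions: the `ssh` client is installed, key-based (non-interactive) SSH login is configured, and the helper names `build_ssh_command` and `run_remote` are illustrative rather than part of any library. Because `ssh` joins the remote arguments into one string for the remote shell to parse, each argument is quoted with `shlex.join`.

```python
import shlex
import subprocess


def build_ssh_command(host: str, remote_args: list[str]) -> list[str]:
    """Build an argv list that runs `remote_args` on `host` over SSH.

    ssh concatenates remote arguments into a single string that the remote
    shell parses, so every argument is shell-quoted to survive spaces and
    special characters in file names.
    """
    return ["ssh", host, shlex.join(remote_args)]


def run_remote(host: str, remote_args: list[str]) -> str:
    """Run the command on `host` and return its standard output as text.

    Only the command's output crosses the network, not the remote file.
    Assumes key-based SSH authentication (no password prompt).
    """
    result = subprocess.run(
        build_ssh_command(host, remote_args),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if ssh or the remote command fails
    )
    return result.stdout
```

For example, `run_remote("user@hpc.example.org", ["head", "-n", "5", "/project/data/my results.csv"])` would return the first five lines of that file. The host and path are hypothetical placeholders for your own account and data.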
Why preview data remotely?
Command line approaches for previewing remote data are especially useful in scientific fields, where large datasets are common and access to HPC systems is often necessary. Previewing data where it is stored saves the time and bandwidth needed to transfer it, and it keeps the data on the secured remote system. Typical cases where command line previews of remote data are practical include:
Previewing PDF files
In scientific research, reports, publications, and other documents are commonly generated in PDF format. Command line approaches such as an SSH connection with X11 forwarding let users open a PDF file on the remote machine and view it on their local screen, without downloading or transferring the file to the local machine.
To learn more, visit the tutorial Viewing PDF Files using X11 SSH connection.
Previewing graphic files
Graphics such as charts, plots, and images are widely used in scientific research. Command line approaches, such as rendering an image as ASCII art directly in the terminal or mounting a remote folder with sshfs, let users view graphic files without first copying them to the local machine.
To learn more, visit the tutorials:
- Viewing Graphic Files on a remote machine as ASCII art in the terminal
Quickly finding the information you need
On remote HPC systems, finding a specific piece of information quickly can be challenging. Command line approaches let users preview data and code directly, and use text processing tools such as grep to search for and extract the relevant information.
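The usual text processing tool for this is the UNIX `grep` command. When the search is part of a Python workflow running on the remote machine, the same line-by-line search can be done with the standard `re` module. The minimal sketch below mimics `grep -n` (and `grep -i` via a flag). The function name `search_lines` is illustrative and not part of any library.

```python
import re
from typing import Iterable, Iterator


def search_lines(
    lines: Iterable[str], pattern: str, ignore_case: bool = False
) -> Iterator[tuple[int, str]]:
    """Yield (line_number, line) for every line matching a regular expression.

    Similar in spirit to `grep -n PATTERN FILE`; ignore_case mimics `grep -i`.
    Lines are consumed one at a time, so a large file is never loaded whole.
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):
            yield number, line.rstrip("\n")
```

Passing an open file object, for example `search_lines(fh, r"accuracy")` inside `with open("results.txt", encoding="utf-8") as fh:` (a hypothetical file name), streams through the file instead of reading it into memory.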
Collaborating with colleagues
When collaborating with colleagues in other locations, sharing and accessing data can be difficult. Command line approaches let everyone preview the data where it is stored and work on the same copy at the same time, instead of exchanging separate copies.
Exploring large datasets
When working with large datasets, downloading or transferring the entire dataset to a local machine for analysis can be impractical or impossible. Command line approaches allow users to quickly preview and explore the data without the need to download or transfer the entire dataset.
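The UNIX `head` command is the classic way to peek at the start of a large file. In a Python script on the remote machine, the same effect comes from `itertools.islice`, which stops after `n` items. The sketch below assumes the input is any iterable of lines, such as an open file. The name `preview_head` is illustrative.

```python
from itertools import islice
from typing import Iterable


def preview_head(lines: Iterable[str], n: int = 10) -> list[str]:
    """Return the first n lines, like `head -n N`.

    islice stops after n items, so for an open file only the first lines
    (plus a small read-ahead buffer) are read, however large the file is.
    """
    return [line.rstrip("\n") for line in islice(lines, n)]
```

Usage, with a hypothetical path: `with open("/project/data/large_table.tsv", encoding="utf-8") as fh: print("\n".join(preview_head(fh, 5)))`.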
Working with sensitive data
Some datasets are sensitive and must not be transferred or downloaded to a local machine. Command line approaches let users preview such data on the remote system, so no copy of it is stored locally.
Viewing remote logs or error messages
When running jobs on HPC systems, you often need to check job status, warnings, or error messages in log files on the remote system to troubleshoot problems before resubmitting the job.
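Job logs are usually inspected with `tail`, which shows the last lines, often combined with a keyword search. The minimal sketch below does both in Python. A `collections.deque` with `maxlen` keeps only the newest lines in memory. Unlike `tail`, which seeks from the end of the file, this sketch reads the log once from the start. The function names and the default keywords (`"error"`, `"warning"`) are assumptions chosen for illustration. Adjust them to the messages your jobs actually produce.

```python
from collections import deque
from typing import Iterable


def tail_lines(lines: Iterable[str], n: int = 20) -> list[str]:
    """Return the last n lines, similar to `tail -n N`.

    The bounded deque discards older lines as new ones arrive, so memory
    use stays proportional to n, not to the size of the log.
    """
    return [line.rstrip("\n") for line in deque(lines, maxlen=n)]


def find_problems(
    lines: Iterable[str], keywords: tuple[str, ...] = ("error", "warning")
) -> list[str]:
    """Return the lines containing any keyword, ignoring case.

    The default keywords are an illustrative assumption, not a standard.
    """
    lowered = tuple(k.lower() for k in keywords)
    return [line for line in lines if any(k in line.lower() for k in lowered)]
```

For a hypothetical log file `job.log`: `with open("job.log", encoding="utf-8") as fh: print(find_problems(tail_lines(fh, 50)))` lists the problem lines among the last 50.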
Further Reading
- 1.3.1 Viewing Text Files using UNIX commands
- 1.3.2 Viewing PDF Files using X11 SSH connection
- 1.3.3 Viewing Graphic Files using ASCII art in the terminal
- 2. Data Manipulation
- 3. Data Wrangling: ready-made apps