Introduction
Juno Archive Storage
Juno is a storage device located in Beltsville, MD, and is part of the SCINet HPC system funded by USDA-ARS. The SCINet initiative's mission is to grow USDA's research capacity by providing scientists with high-performance computing and professional training support.
Explore the resources to learn more:
- SCINet website: https://scinet.usda.gov
- USDA-ARS website: https://www.ars.usda.gov/
- Introduction to SCINet HPC in this workbook: What is SCINet?
What is Juno used for?
In addition to its powerful computing capabilities, the SCINet HPC system also offers data storage solutions to efficiently manage and store data and results:
- Tier 1 Storage: short-term, not backed-up storage on each computing cluster (Atlas, Ceres) for storing code, data, and intermediate results while performing a series of computational jobs
- Juno storage: large, multi-petabyte ARS long-term storage, periodically backed up to a tape device.
Learn more about SCINet data and storage recommended procedures from the guide provided by the SCINet VRSC.
Benefits of using Juno
There are a few reasons why it is good practice to move final results that would be difficult to recreate to the backed-up Juno archive storage:
- Data security: archiving final results in a backed-up storage system helps protect against data loss due to hardware failure or other unforeseen events.
- Data preservation: archiving final results ensures that the data will be preserved for long-term use and will not be lost due to changes in technology or file formats.
- Collaboration: archiving final results allows for easier sharing and collaboration with other researchers, as the data is stored in a centralized, easily accessible location.
- Reproducibility: archiving final results helps ensure the reproducibility of research findings, as other researchers will be able to access the original data and results.
Juno access points
Juno transfer node: nal-dtn.scinet.usda.gov
Juno endpoint via Globus: “NAL DTN 0” (recommended)
*A SCINet account is required for access.
To obtain a SCINet account, a SCINet Account Request must be submitted. To learn more, visit the official Sign up for a SCINet account guide.
Copy your data to Juno
Globus Online is the recommended method for transferring data to and from the SCINet clusters. It provides faster data transfer speeds compared to scp, has a graphical interface, and does not require a GA verification code for every file transfer.
• using Globus (preferred)
Follow the step-by-step guide: Globus Data Transfer to learn how to transfer data to and from Juno storage.
Juno endpoint via Globus: “NAL DTN 0”
• using command line
For small data transfers, it is acceptable to move data to Juno storage using command-line approaches such as scp and rsync.
- First, use a terminal window on your local machine to log in to the transfer node on one of the SCINet clusters:
- Atlas:
ssh <user.name>@atlas-login.hpc.msstate.edu
- Ceres:
ssh <user.name>@ceres-dtn.scinet.usda.gov
- Then, use the rsync command to synchronize (move new content or update changes in) your project_name directory:
rsync -avz --no-p --no-g ttt nal-dtn.scinet.usda.gov:/LTS/project/<project_name>/
Note that the organization of the file system differs slightly between the computing clusters (/project/<project_name>) and Juno storage (/LTS/project/<project_name>).
Further Reading
- 2. Remote Access to HPC Resources
- 3. Setting up Your Home Directory for Data Analysis
- 4. Software Available on HPC
- 5. Introduction to Job Scheduling
- 6. Introduction to GNU Parallel
- 7. Introduction to Containers