Introduction
Copying data over SSH (Secure Shell) provides a secure way to transfer data between two computers. The data is encrypted in transit, which protects it against eavesdropping and tampering. By establishing an encrypted connection and verifying the identity of the user, the SSH protocol ensures that the data is transmitted securely.
The data can be copied or synchronized between two computers using command line tools such as:
- scp (secure copy), recommended for transferring individual files
- rsync (remote synchronization), recommended for updating the differences between corresponding directories
What do you need to get started?
All you need is a terminal window providing the command line interface and your access credentials to the remote machine. Typically, these include:
- the hostname of the remote machine
- your username
- your access password
- a multifactor authentication code
A hostname is a label assigned to a computer on a network; it identifies the computer and its location on the network. The specific format of a hostname can vary, but it must be unique on the network in order to function correctly.
Here are some examples of hostnames:
- example.com: a domain name that is used to identify a website or a network of computers.
- www.example.com: the hostname of the web server that serves a website at the domain example.com.
- ftp.example.com: the hostname of the FTP server that serves files for the domain example.com.
- mail.example.com: the hostname of the mail server that handles email for the domain example.com.
- 192.168.1.100: an IP address that is used to identify a computer on a local network.
- my-computer: a hostname that is assigned to a computer on a local network.
Command SYNTAX
The command syntax for both command line tools, scp and rsync, uses similar components:
scp <source> <destination> or rsync <source> <destination>
e.g.,
scp /local/directory/file.txt username@remote-hostname:/remote/directory/
where:
- file.txt is the data file you want to transfer
- /local/directory/ is the relative or absolute path to the data location on your local machine
- username is the name of your user account on the remote machine
- @ is the separator in the username@hostname syntax
- remote-hostname is the label assigned to the remote computer
- /remote/directory/ is the relative or absolute path on the remote machine
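To make the components above concrete, here is a minimal sketch that splits a remote location into its parts using standard shell parameter expansion. The value of remote is the placeholder from the example above, not a real account; note that a colon separates the hostname from the remote path.

```shell
# Placeholder value taken from the example above; replace with your own.
remote="username@remote-hostname:/remote/directory/"

# Everything before the first "@" is the user name.
user="${remote%%@*}"

# Drop the user name, then keep what precedes the first ":" (the hostname).
host_and_path="${remote#*@}"
host="${host_and_path%%:*}"

# Everything after the first ":" is the path on the remote machine.
path="${host_and_path#*:}"

echo "user: $user"
echo "host: $host"
echo "path: $path"
```

Nothing is transferred here; the sketch only shows how scp and rsync read the username@hostname:path form.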
A file path is used to specify the location of a file or directory on the computer’s file system. There are two types of file paths: absolute paths and relative paths.
- absolute path:
An absolute path is a complete path to a file or directory that starts from the root directory of the file system. Absolute paths provide a complete and unambiguous reference to a file or directory, and they always start with a / character.
For example, /home/user/documents/file.txt is an absolute path to a file in a directory on the file system.
- relative path:
A relative path, on the other hand, is a path to a file or directory that is relative to the current working directory. Relative paths do NOT start with a / character, and they are interpreted relative to the current working directory.
For example, ./documents/file.txt is a relative path to a file in a directory that is located in the current working directory.
- current directory: ./
- one directory above: ../
- two directories above: ../../
…and so on
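The difference between absolute and relative paths can be tried out safely in a throwaway directory. The sketch below is only an illustration: the directory and file names are hypothetical and it assumes the mktemp utility is available, as it is on most Linux and macOS systems.

```shell
# Create a temporary directory tree to experiment in (names are hypothetical).
workdir="$(mktemp -d)"
mkdir -p "$workdir/documents"
echo "sample" > "$workdir/documents/file.txt"

# Absolute path: starts with "/" and works from any working directory.
absolute_content="$(cat "$workdir/documents/file.txt")"

# Relative path: resolved against the current working directory,
# so we first change into $workdir (inside a subshell).
relative_content="$(cd "$workdir" && cat ./documents/file.txt)"

# "../" climbs one level: from documents/ it points back at $workdir.
parent_listing="$(cd "$workdir/documents" && ls ../)"

echo "absolute: $absolute_content"
echo "relative: $relative_content"
echo "parent contains: $parent_listing"

# Clean up the temporary directory.
rm -rf "$workdir"
```

Both reads return the same file content; only the starting point of the path differs.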
SCP (secure copy)
scp (secure copy) is a command line tool for copying files between computers using the SSH (Secure Shell) protocol for data transfer. It works by establishing an encrypted ssh connection between two computers and copying the data over this connection.
scp is usually available in the terminal on Linux and macOS, and in Windows PowerShell on Windows 10.
Getting started:
Open a terminal window on your local machine and copy-paste the command example (provided below), adjusting paths and credentials to your needs (following the directions in the Command SYNTAX section).
Copy file: local to remote
scp /local/directory/file.txt username@remote-hostname:/remote/directory/
Copy file: remote to local
scp username@remote-hostname:/remote/directory/file.txt /local/directory/
Copy a directory
If you want to copy an entire directory, use the scp -r command, where the -r flag tells scp to copy the directory and its contents recursively.
- from local to remote
scp -r /local/directory username@remote-hostname:/remote/directory/
- from remote to local
scp -r username@remote-hostname:/remote/directory /local/directory/
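As a rough illustration of when -r is needed, the sketch below builds (but does not run) an scp command and adds -r only when the source is a directory. The helper build_scp_command and all paths are hypothetical and are not part of scp itself; the remote location is the placeholder used above.

```shell
# Hypothetical helper: print the scp command that would be used for a source.
# It adds -r only when the source is a directory. Nothing is transferred.
build_scp_command() {
    source_path="$1"
    destination="$2"
    if [ -d "$source_path" ]; then
        echo "scp -r $source_path $destination"
    else
        echo "scp $source_path $destination"
    fi
}

# Throwaway local directory with one subdirectory and one file.
workdir="$(mktemp -d)"
mkdir "$workdir/data"
touch "$workdir/file.txt"

dir_command="$(build_scp_command "$workdir/data" "username@remote-hostname:/remote/directory/")"
file_command="$(build_scp_command "$workdir/file.txt" "username@remote-hostname:/remote/directory/")"

echo "$dir_command"
echo "$file_command"

rm -rf "$workdir"
```

Printing the command first is a cautious way to check paths before starting a transfer. The printed line is for reading only: paths containing spaces would need quoting before being run.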
Admins of some HPC systems, e.g. SCINet Scientific Computing, recommend using scp to transfer single files only. So please be aware of this note:
"It is not advised to use scp -r command to transfer directories to Ceres, since the setgid bit on directories at destination is not inherited. This is not a problem if directories are copied to /home/$USER but is a problem when copying to /project area and usually results in quota exceeded errors."
If you decide to use scp to transfer directories to the Ceres cluster, follow the instructions provided on the SCINet website: Small Data Transfer Using scp.
…about scp command and all available options from the man scp command.
Here are some of the options most commonly used with the scp command:
option | description |
---|---|
-r | Recursively copy the entire contents of a directory, including subdirectories and files. |
-v | Verbose mode. Print debugging messages about the connection and the progress of the transfer. |
-P 8080 | Specify the port to use for the connection; 8080 is just an example. |
-C | Enable compression during transfer. |
-q | Quiet mode. Suppress the progress meter and warning and diagnostic messages. |
Example 1: Recursively copy a directory and its contents
scp -r ~/data user@example-hostname:~/backup
Example 2: Display verbose output during the transfer
scp -v ~/file.txt user@example-hostname:~/backup
Example 3: Specify the port to use for the connection
scp -P 8080 ~/file.txt user@example-hostname:~/backup
Example 4: Enable data compression during transfer
scp -C ~/file.txt user@example-hostname:~/backup
Example 5: Suppress the progress meter and warning messages
scp -q ~/file.txt user@example-hostname:~/backup
RSYNC (remote sync)
rsync (remote synchronization) is a command line tool for efficiently transferring and synchronizing files between computers, typically using the SSH (Secure Shell) protocol for data transfer. For remote transfers, it works by establishing an encrypted ssh connection between two computers and copying the data over this connection. This tool is commonly used for backup, data replication, and file distribution.
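rsync is usually available in the terminal on Linux and macOS. It is not included with Windows by default; on Windows it is typically run through the Windows Subsystem for Linux (WSL) or a similar Unix-like environment.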
rsync works by comparing the source and destination files and transferring only the differences, which makes it much more efficient than other file transfer tools, such as cp or scp, when the source and destination files are similar. This makes rsync particularly useful for transferring large files or large collections of files that change only slightly over time, as it can significantly reduce the amount of data that needs to be transferred.
In addition to its efficiency, rsync also provides a number of features that make it a versatile tool for file transfer and synchronization, such as:
- support for preserving file permissions and attributes,
- excluding files based on patterns,
- and transferring files over an encrypted ssh connection.
Getting started:
Open a terminal window on your local machine and copy-paste the command example (provided below), adjusting paths and credentials to your needs (following the directions in the Command SYNTAX section).
The general syntax for synchronization requires a source and a destination location. You can synchronize locations on a single machine or between different computers.
rsync <source> <destination>
It can be practical to use the rsync command with the -avz flags:
- -a: preserves file attributes such as permissions and ownership
- -v: provides verbose output
- -z: compresses the data during transfer
On the first transfer with rsync all data will be copied, while on subsequent runs only the differences will be transferred.
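Since rsync can also synchronize two locations on a single machine, this behavior can be observed locally without any remote account. The following is a minimal sketch under that assumption; the directory and file names are hypothetical, and the block does nothing if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    # Throwaway source and destination directories (hypothetical names).
    workdir="$(mktemp -d)"
    mkdir -p "$workdir/source" "$workdir/backup"
    echo "first version" > "$workdir/source/a.txt"
    echo "unchanged" > "$workdir/source/b.txt"

    # First run: the destination is empty, so both files are copied.
    # --itemize-changes lists each item that rsync updates.
    first_run="$(rsync -a --itemize-changes "$workdir/source/" "$workdir/backup/")"

    # Change one file only (its size changes too), then synchronize again.
    echo "second version, now longer" > "$workdir/source/a.txt"
    second_run="$(rsync -a --itemize-changes "$workdir/source/" "$workdir/backup/")"

    echo "first run:"
    echo "$first_run"
    echo "second run:"
    echo "$second_run"

    # The destination now holds the updated content of a.txt.
    synced_content="$(cat "$workdir/backup/a.txt")"
    rm -rf "$workdir"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The second listing should mention only a.txt, because b.txt has the same size and modification time on both sides and is therefore skipped.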
Synchronize local to remote
rsync -avz /local/directory username@remote-hostname:/remote/directory
Synchronize remote to local
rsync -avz username@remote-hostname:/remote/directory /local/directory
Synchronize File or Dir
If you wanted to synchronize the file file.txt stored in your home directory (~/) from your local computer to a remote computer with the hostname example-hostname and place it in the directory ~/backup, you could run the following command:
rsync ~/file.txt user@example-hostname:~/backup
If you wanted to synchronize the directory ~/data from your local computer to a remote computer with the hostname example-hostname and place it in the directory ~/backup, you could run the following command:
rsync -avz ~/data user@example-hostname:~/backup
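One detail worth knowing when synchronizing directories is that rsync treats a trailing slash on the source differently: without it the directory itself is placed inside the destination, with it only the contents are copied. The sketch below demonstrates this locally; the directory names are hypothetical and the block does nothing if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    # Throwaway directories (hypothetical names).
    workdir="$(mktemp -d)"
    mkdir -p "$workdir/data" "$workdir/backup_a" "$workdir/backup_b"
    echo "sample" > "$workdir/data/file.txt"

    # No trailing slash: the directory itself is copied into the destination,
    # giving backup_a/data/file.txt.
    rsync -a "$workdir/data" "$workdir/backup_a"

    # Trailing slash: only the contents are copied, giving backup_b/file.txt.
    rsync -a "$workdir/data/" "$workdir/backup_b"

    without_slash="$(ls "$workdir/backup_a")"
    with_slash="$(ls "$workdir/backup_b")"

    echo "without trailing slash, destination contains: $without_slash"
    echo "with trailing slash, destination contains: $with_slash"

    rm -rf "$workdir"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

The same rule applies when the destination is a remote location such as user@example-hostname:~/backup.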
Using the -avz flags will also:
1. preserve file attributes,
2. provide verbose output,
3. compress the data during transfer.
…about rsync command and all available options from the man rsync command.
Here are some of the options most commonly used with the rsync command:
option | description |
---|---|
-a | Archive mode. A shorthand for a set of options that preserve file attributes such as permissions, ownership, timestamps, and symbolic links. |
-v | Verbose output. Display the progress of the transfer and a list of the files being transferred. |
-z | Compress the data during transfer. |
-r | Recursively copy the entire contents of a directory, including subdirectories and files. |
-n | Dry run. Perform a test run without actually transferring any files. |
-u | Update only. Skip files that are newer on the destination than on the source. |
--exclude='*.log' | Exclude files or directories from the transfer based on a pattern; '*.log' is an example value for the option. |
Example 1: Transfer files in archive mode
rsync -a ~/data user@example-hostname:~/backup
Because ~/data is a directory, the remaining examples combine each option with -a (or use -r); without -a or -r, rsync skips directories.
Example 2: Display verbose output during the transfer
rsync -av ~/data user@example-hostname:~/backup
Example 3: Compress the data during transfer
rsync -az ~/data user@example-hostname:~/backup
Example 4: Recursively copy a directory and its contents
rsync -r ~/data user@example-hostname:~/backup
Example 5: Perform a dry run without transferring any files
rsync -avn ~/data user@example-hostname:~/backup
Example 6: Update only files that are not newer on the destination
rsync -au ~/data user@example-hostname:~/backup
Example 7: Exclude files or directories based on a pattern
rsync -a --exclude='*.log' ~/data user@example-hostname:~/backup
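The dry-run and exclude options are often combined to preview a transfer before committing to it. Here is a minimal local sketch of that workflow; the directory and file names are hypothetical, and the block does nothing if rsync is not installed.

```shell
if command -v rsync >/dev/null 2>&1; then
    # Throwaway source and destination (hypothetical names).
    workdir="$(mktemp -d)"
    mkdir -p "$workdir/data" "$workdir/backup"
    echo "results" > "$workdir/data/results.txt"
    echo "noise" > "$workdir/data/debug.log"

    # Dry run (-n): report what would be transferred, but copy nothing.
    rsync -avn --exclude='*.log' "$workdir/data" "$workdir/backup"
    after_dry_run="$(ls "$workdir/backup")"

    # Real run: same options without -n. Files matching *.log are left out.
    rsync -av --exclude='*.log' "$workdir/data" "$workdir/backup"
    after_real_run="$(ls "$workdir/backup/data")"

    echo "destination after dry run: '$after_dry_run'"
    echo "destination after real run: '$after_real_run'"

    rm -rf "$workdir"
else
    echo "rsync is not installed; skipping the demonstration"
fi
```

After the dry run the destination is still empty; after the real run it contains results.txt but not debug.log.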