Copying Data via SSH using Command Line: scp, rsync

Introduction

Copying data over SSH (Secure Shell) is a secure way to transfer data between two computers. The data is encrypted while it is transmitted, which protects it against eavesdropping and tampering. SSH establishes an encrypted connection and verifies the identities of both the remote host and the user, so the data is transmitted securely.

The data can be copied or synchronized between two computers using command line tools such as:

  • scp (secure copy), recommended for transferring individual files, or
  • rsync (remote sync), recommended for synchronizing directories by transferring only the differences between them

What do you need to get started?

All you need is a terminal window providing the command line interface and your access credentials to the remote machine. Typically, these include:

  • hostname of the remote machine
  • your username
  • your access password
  • multifactor authentication code (if the remote system requires it)

A hostname is a label that is assigned to a computer on a network, and it is used to identify the computer and its location on the network. The specific format of a hostname can vary, but it must be unique on the network in order to function correctly.
Here are some examples of hostnames:

  • example.com - a domain name that is used to identify a website or a network of computers.
  • www.example.com - the hostname of the web server that serves a website at the domain example.com.
  • ftp.example.com - the hostname of the FTP server that serves files for the domain example.com.
  • mail.example.com - the hostname of the mail server that handles email for the domain example.com.
  • 192.168.1.100 - an IP address that is used to identify a computer on a local network.
  • my-computer - a hostname that is assigned to a computer on a local network.
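To see the hostname of the computer you are currently working on, you can print it in a terminal. This is a minimal sketch using the standard `uname -n` command. The hostname of a remote machine is usually provided by its administrators rather than looked up this way.

```shell
# Print the network hostname of the local machine
my_host="$(uname -n)"
echo "This machine's hostname is: $my_host"
```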

Command SYNTAX

The command syntax for both command line tools, scp and rsync, uses similar components:

scp <source> <destination>  or  rsync <source> <destination>

e.g.,

scp /local/directory/file.txt username@remote-hostname:/remote/directory/

where:

  • file.txt - is the data file you want to transfer
  • /local/directory/ - is the relative or absolute path to the data location on your local machine
  • username - is the name of your user account on the remote machine
  • @ - separates the username from the hostname in the username@hostname syntax
  • remote-hostname - is the label assigned to the remote computer
  • : - separates the hostname from the remote path
  • /remote/directory/ - is the relative or absolute path on the remote machine (a relative path is interpreted relative to your home directory there)
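The parts listed above can be assembled from shell variables, which makes it easy to reuse the same credentials for several transfers. This is a sketch only. All variable names and values are hypothetical placeholders, and the command is printed with `echo` as a preview rather than executed.

```shell
# Hypothetical values: replace them with your own paths and credentials
local_file="/local/directory/file.txt"
username="username"
remote_host="remote-hostname"
remote_dir="/remote/directory/"

# The remote location joins the parts as username@hostname:path
# (the colon separates the hostname from the remote path)
remote_target="${username}@${remote_host}:${remote_dir}"

# Preview the full command; prints:
# scp /local/directory/file.txt username@remote-hostname:/remote/directory/
echo scp "$local_file" "$remote_target"
```

Once the printed command looks right, you can remove `echo` to run the actual transfer.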

A file path is used to specify the location of a file or directory on the computer’s file system. There are two types of file paths: absolute paths and relative paths.

  • absolute path:
    An absolute path is a complete path to a file or directory that starts from the root directory of the file system. Absolute paths provide a complete and unambiguous reference to a file or directory, and they always start with a / character.
    For example, /home/user/documents/file.txt is an absolute path to a file in a directory on the file system.
  • relative path:
    A relative path, on the other hand, is a path to a file or directory that is relative to the current working directory. Relative paths do NOT start with a / character, and they are interpreted relative to the current working directory.
    For example, ./documents/file.txt is a relative path to a file in a directory that is located in the current working directory.
    • current directory: ./
    • one directory above: ../
    • two directories above: ../../ …and so on
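You can safely try out the difference between the two kinds of paths in a throwaway directory. This sketch assumes a POSIX shell with `mktemp -d` available, which both Linux and macOS provide. The directory names are hypothetical and stand in for your own folders.

```shell
# Build a small throwaway directory tree in a temporary location
base="$(mktemp -d)"
mkdir -p "$base/documents"
touch "$base/documents/file.txt"

# Relative path: resolved from the current working directory
cd "$base"
ls ./documents/file.txt

# Absolute path: starts with / and points to the same file from anywhere
abs_path="$base/documents/file.txt"
cd /
ls "$abs_path"

# ../ refers to the directory one level above the current one
cd "$base/documents"
ls ../documents/file.txt
```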

SCP (secure copy)

scp (secure copy) is a command line tool for copying files between computers using the SSH (Secure Shell) protocol. It works by establishing an encrypted ssh connection between two computers and copying the data over this connection.

scp is usually available in the terminal on Linux and macOS, and in Windows PowerShell on Windows 10 and later (as part of the built-in OpenSSH client).

Getting started:
Open a terminal window on your local machine and copy and paste the command examples below, adjusting the paths and credentials to your needs (as described in the Command SYNTAX section).

Copy file: local to remote

scp /local/directory/file.txt username@remote-hostname:/remote/directory/

Copy file: remote to local

scp username@remote-hostname:/remote/directory/file.txt /local/directory/

Copy a directory

If you want to copy an entire directory, use the scp -r command, where the -r flag tells scp to copy the directory and its contents recursively.

- from local to remote

scp -r /local/directory username@remote-hostname:/remote/directory/

- from remote to local

scp -r username@remote-hostname:/remote/directory /local/directory/

Admins of some HPC systems, e.g., SCINet Scientific Computing, recommend using scp to transfer single files only. Please be aware of this note:

"It is not advised to use scp -r command to transfer directories to Ceres, since the setgid bit on directories at destination is not inherited. This is not a problem if directories are copied to /home/$USER but is a problem when copying to /project area and usually results in quota exceeded errors."

If you decide to use scp to transfer directories to the Ceres cluster, follow the instructions provided on the SCINet website: Small Data Transfer Using scp.
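If you are unsure whether directories at the destination kept the setgid bit, you can list the ones that lack it with the standard `find` command. This is a minimal sketch. The `dest` tree below is a hypothetical stand-in for a folder in the /project area, created in a temporary location so you can try the commands without touching real data. The commands only report the directories, so follow the SCINet instructions for how to handle them.

```shell
# Hypothetical destination tree, created in a temporary location for illustration
dest="$(mktemp -d)/project_area"
mkdir -p "$dest/results/run1"

# Set the setgid bit on the top-level directory only
chmod g+s "$dest"

# List directories that do NOT carry the setgid bit (-perm -2000 matches it)
missing="$(find "$dest" -type d ! -perm -2000)"
echo "$missing"
```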

You can learn more about the scp command and all its available options from its manual page: man scp.

Here are some options most commonly used with the scp command:

option      description
-r          Recursively copy the entire contents of a directory, including subdirectories and files.
-v          Verbose mode. Print debugging messages about the connection, authentication, and transfer; useful for troubleshooting.
-P 8080     Specify the port to use for the connection (8080 is just an example).
-C          Enable compression during transfer.
-q          Quiet mode. Disable the progress meter and the warning and diagnostic messages from ssh.

Example 1: Recursively copy a directory and its contents

scp -r ~/data user@example-hostname:~/backup

Example 2: Display verbose (debugging) output during the transfer

scp -r -v ~/data user@example-hostname:~/backup

Example 3: Specify the port to use for the connection

scp -r -P 8080 ~/data user@example-hostname:~/backup

Example 4: Enable data compression during transfer

scp -r -C ~/data user@example-hostname:~/backup

Example 5: Suppress the progress meter and diagnostic messages

scp -r -q ~/data user@example-hostname:~/backup

RSYNC (remote sync)

rsync (remote sync) is a command line tool for efficiently transferring and synchronizing files. For transfers between computers, it typically uses SSH (Secure Shell) as the transport: it establishes an encrypted ssh connection between the two computers and copies the data over this connection. This tool is commonly used for backup, data replication, and file distribution.

rsync is usually available in the terminal on Linux and macOS. Windows does not include it by default; you can use it, for example, through the Windows Subsystem for Linux (WSL).

rsync works by comparing the source and destination files and only transferring the differences, making it much more efficient than other file transfer tools, such as cp or scp, when the source and destination files are similar. This makes rsync particularly useful for transferring large files or large collections of files that change only slightly over time, as it can significantly reduce the amount of data that needs to be transferred.

In addition to its efficiency, rsync also provides a number of features that make it a versatile tool for file transfer and synchronization, such as:

  • support for preserving file permissions and attributes,
  • excluding files based on patterns,
  • and transferring files over an encrypted ssh connection.

Getting started:
Open a terminal window on your local machine and copy and paste the command examples below, adjusting the paths and credentials to your needs (as described in the Command SYNTAX section).

The general syntax for synchronization requires you to provide the source and destination locations. You can synchronize locations on a single machine or between different computers.

rsync <source> <destination>

In practice, it is convenient to use the rsync command with the -avz flags:

  • -a - archive mode: copies directories recursively and preserves file attributes such as permissions, timestamps, and ownership
  • -v - provides verbose output
  • -z - compresses the data during transfer

On the first run, rsync copies all the data; on subsequent runs, it transfers only the differences.
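The sketch below illustrates this behavior on a single machine, since rsync can also synchronize two local directories. It assumes rsync is installed. The `source` and `copy` directories are hypothetical and created in a temporary location. With -v, the first run lists every file, while the second run should list only the file that changed.

```shell
if ! command -v rsync >/dev/null 2>&1; then
  echo "rsync is not installed on this machine" >&2
else
  # Hypothetical source directory with two files
  work="$(mktemp -d)"
  mkdir -p "$work/source"
  echo "version 1" > "$work/source/notes.txt"
  echo "unchanged" > "$work/source/stable.txt"

  # First run: everything is copied
  rsync -av "$work/source/" "$work/copy/"

  # Change one file (its size changes too), then synchronize again;
  # stable.txt has the same size and timestamp, so rsync skips it
  echo "version 2 with more text" > "$work/source/notes.txt"
  rsync -av "$work/source/" "$work/copy/"
fi
```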

Synchronize local to remote

rsync -avz /local/directory username@remote-hostname:/remote/directory

Synchronize remote to local

rsync -avz username@remote-hostname:/remote/directory /local/directory
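rsync treats a trailing slash on the source path specially. Without it, rsync copies the source directory itself into the destination. With it, rsync copies only the directory's contents. The sketch below shows the difference locally, assuming rsync is installed. The temporary `data`, `backup1`, and `backup2` directories are hypothetical stand-ins for /local/directory and /remote/directory.

```shell
if ! command -v rsync >/dev/null 2>&1; then
  echo "rsync is not installed on this machine" >&2
else
  demo="$(mktemp -d)"
  mkdir -p "$demo/data/sub"
  echo "hello" > "$demo/data/sub/file.txt"

  # No trailing slash on the source: the directory "data" itself is copied
  rsync -a "$demo/data" "$demo/backup1"    # -> backup1/data/sub/file.txt

  # Trailing slash on the source: only the contents of "data" are copied
  rsync -a "$demo/data/" "$demo/backup2"   # -> backup2/sub/file.txt
fi
```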

Synchronize File or Dir

If you wanted to synchronize the file file.txt stored in your home directory (~/) from your local computer to a remote computer with the hostname example-hostname and place it in the directory ~/backup, you could run the following command:

rsync ~/file.txt user@example-hostname:~/backup

If you wanted to synchronize the directory ~/data from your local computer to a remote computer with the hostname example-hostname and place it in the directory ~/backup, you could run the following command:

rsync -avz ~/data user@example-hostname:~/backup

Using the -avz flags will:

  1. copy the directory recursively and preserve file attributes,
  2. provide verbose output,
  3. compress the data during transfer.

You can learn more about the rsync command and all its available options from its manual page: man rsync.

Here are some options most commonly used with the rsync command:

option             description
-a                 Archive mode. A shorthand (equivalent to -rlptgoD) for copying recursively while preserving symbolic links, permissions, timestamps, group, owner (when run as root), and device and special files.
-v                 Verbose output. List the files being transferred and print a short summary at the end.
-z                 Compress the data during transfer.
-r                 Recursively copy the entire contents of a directory, including subdirectories and files.
-n                 Dry run. Perform a test run without actually transferring any files (combine with -v to see what would be transferred).
-u                 Update only. Skip files that are newer on the destination than on the source.
--exclude='*.log'  Exclude files or directories matching a pattern from the transfer; '*.log' is an example pattern.

Example 1: Transfer files in archive mode

rsync -a ~/data user@example-hostname:~/backup

Example 2: Display verbose output during the transfer

rsync -av ~/data user@example-hostname:~/backup

Example 3: Compress the data during transfer

rsync -az ~/data user@example-hostname:~/backup

Example 4: Recursively copy a directory and its contents

rsync -r ~/data user@example-hostname:~/backup

Example 5: Perform a dry run without transferring any files

rsync -avn ~/data user@example-hostname:~/backup

Example 6: Update only files that are newer on the source than on the destination

rsync -au ~/data user@example-hostname:~/backup

Example 7: Exclude files or directories based on a pattern

rsync -a --exclude='*.log' ~/data user@example-hostname:~/backup
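You can combine options. For example, a dry run with verbose output previews which files a transfer would include before anything is copied. This sketch assumes rsync is installed and uses hypothetical temporary directories in place of ~/data and the remote ~/backup.

```shell
if ! command -v rsync >/dev/null 2>&1; then
  echo "rsync is not installed on this machine" >&2
else
  lab="$(mktemp -d)"
  mkdir -p "$lab/data"
  echo "a,b" > "$lab/data/table.csv"
  echo "debug" > "$lab/data/run.log"

  # Dry run (-n) with verbose output: lists what would be sent, copies nothing
  rsync -avn --exclude='*.log' "$lab/data/" "$lab/backup/"

  # Real run with the same exclude pattern: run.log is left out
  rsync -av --exclude='*.log' "$lab/data/" "$lab/backup/"
fi
```

Once the dry-run listing looks right, drop -n (as in the second command) to perform the transfer.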