Introduction to GitHub
GitHub is a code hosting platform for version control and collaboration. In science, there are three primary use cases for a version control system like Git, hosted on a platform like GitHub:
- Sharing bioinformatics scripts
- Developing bioinformatics programs
- Documenting bioinformatics analyses
Probably the most important use case for new users is documentation. For those transitioning from a wet lab, GitHub repositories can be thought of as the equivalent of a wet-lab notebook: every command performed in a bioinformatics analysis is recorded along with why it was performed, when it was performed (date), and where it was performed (pwd). GitHub documents can be written in Markdown (see the Markdown tutorial for an introduction).
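Part of this record-keeping can be automated from the shell. The function below is a minimal sketch and is not part of Git or GitHub: the name `log_cmd` and the notebook file `analysis_notes.md` are hypothetical. It appends the date, the working directory (pwd), the reason, and the command to a Markdown file, then runs the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: log a command to a Markdown "lab notebook", then run it.
# NOTEBOOK is an assumed file name; point it at a file inside your repository.
NOTEBOOK="${NOTEBOOK:-analysis_notes.md}"

log_cmd() {
  local reason="$1"   # why the command is being run
  shift               # the remaining arguments are the command itself
  {
    echo "### $(date '+%Y-%m-%d %H:%M')"   # when it was run
    echo "- Why: ${reason}"
    echo "- Where: $(pwd)"                 # where it was run
    echo "- Command: \`$*\`"               # what was run
    echo
  } >> "$NOTEBOOK"
  "$@"                # run the command with its original arguments
}

# Example: record and run a simple disk-usage check.
log_cmd "Check how much space the project uses" du -sh .
```

Because the notebook is an ordinary file in the repository, committing it alongside your scripts versions the record as well.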
How to get a GitHub account
Signing up for an account is easy: go to the sign-up page and fill out the form.
Most users choose the free option (unlimited public repositories) and do not set up an organization right away.
If you are a researcher or educator, you can then request a free upgrade at about-github-education-for-educators-and-researchers/.
Setting up an authentication key to streamline remote access
Authentication with an SSH key
Passwords are not always secure and can be annoying to type.
- SSH keys are much more secure and allow you to log in without typing your password (or by typing only a separate, simpler passphrase).
- When you generate a key, you create two things: a public key and a private key.
- You place the public key on any server and then unlock it by connecting to it with a client that already has the private key.
- When the two match up, the system unlocks without the need for a password.
SSH keys are also very important for using Git with remote hosts (e.g., GitHub).
Set up authentication with an SSH key
Log in to the remote machine (e.g., an HPC cluster), or open a terminal on your local machine.
Create the key pair in your home directory:
$ ssh-keygen -t rsa
Once the ssh-keygen command has been issued, you will be asked a few questions:
Enter file in which to save the key (/home/yourID/.ssh/id_rsa):
You can simply press Enter here to save the key to the default path shown.
Enter passphrase (empty for no passphrase):
Here is where you decide whether to password-protect your key. The downside of a passphrase is that you have to type it each time you use the key pair.
Create the SSH key
```
$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/userid/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/userid/.ssh/id_rsa.
Your public key has been saved in /home/userid/.ssh/id_rsa.pub.
The key fingerprint is:
4a:dd:0a:c6:35:4e:3f:ed:27:38:8c:74:44:4d:93:67 userid@Arrow
The key's randomart image is:
+--[ RSA 2048]----+
| .oo. |
| . o.E |
| + . o |
| . = = . |
| = S = . |
| o + = + |
| . o + o . |
| . o |
| |
+-----------------+
```
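The same answers can also be given as flags, which is convenient in setup scripts. This is a sketch under a few assumptions: it uses the Ed25519 key type that GitHub's documentation currently recommends (use `-t rsa -b 4096` on systems that lack Ed25519 support), and the key path, the e-mail comment, and the empty passphrase (`-N ""`) are placeholders to adjust.

```shell
#!/usr/bin/env bash
# Sketch: generate an SSH key pair without interactive prompts.
# KEYFILE and the e-mail comment are placeholders; adjust them for your setup.
KEYFILE="${KEYFILE:-$HOME/.ssh/id_ed25519}"

# Create the key directory with private permissions if it does not exist yet.
mkdir -p -m 700 "$(dirname "$KEYFILE")"

if [ -e "$KEYFILE" ]; then
  # Never overwrite an existing key: servers may already trust it.
  echo "A key already exists at $KEYFILE; leaving it unchanged." >&2
else
  # -t: key type      -C: comment stored in the public key
  # -f: output file   -N "": empty passphrase (drop -N to be prompted instead)
  ssh-keygen -q -t ed25519 -C "your_email@example.com" -f "$KEYFILE" -N ""
fi

# Show the fingerprint of the public key as a quick sanity check.
ssh-keygen -l -f "$KEYFILE.pub"
```

An empty passphrase matches the "empty for no passphrase" choice described above; omit `-N ""` if you want ssh-keygen to ask for one.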
Copy SSH key to GitHub
You now have a file called id_rsa.pub in your ~/.ssh directory. Display its contents with cat:
```
[userid@hpc-class ~]$ cat .ssh/id_rsa.pub
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQC8QgicqcpFPeyYZhJFW5lBTAdAjHBaYzLwH3l7+lrdmpEKMMMhXMZV5ucxN5WzWU/ERYviYQvQ8NBzkSuHo+SgNJkufF92UqeHIfI/KqgVEGbQn6NGfa5WFBgWZAJAjMzUUrAhJ2qsBez4M1f70os1S2SNcNfoFAJRdWEGE8SX2lpww8+VdCOY6ONw3AYbZbrZtn/ua2hJg+XjYb3T04ggV6TNyV4nnN5r2pRIjJA5OBX1TWcB9HOE4ZIGZoZlk5OYuUJ5rOfuzVLqanWayB3ujuPxW3IUmI6XJt7uDc1N5iVNs2FusjSZmuggWtzCw/pb7EExvNxYMYOxCsewjE0L userid@<remotehost>
```
Copy the entire contents of this file and add it to your GitHub account.
How to add your SSH key to GitHub
Add your new SSH key to your GitHub account under SSH and GPG keys in your profile Settings. The key belongs to your account, not to a single repository. You will need to be logged in to GitHub to reach this page.
image source: https://codenvy.com/docs/user-guide/git-svn/index.html
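Once the key is on GitHub, `ssh -T git@github.com` is a quick way to test it; on success GitHub replies with a greeting like "Hi USERNAME! You've successfully authenticated, but GitHub does not provide shell access." The script below is a sketch that wraps this check. The helper name `github_ssh_ok` is hypothetical, and the script assumes you have already connected once by hand to accept GitHub's host key, because `BatchMode=yes` makes ssh fail instead of prompting.

```shell
#!/usr/bin/env bash
# Sketch: check that GitHub accepts your SSH key.
# GitHub closes the session right after greeting you, so ssh can exit with a
# non-zero status even on success; we therefore inspect the message instead.

github_ssh_ok() {
  # $1: the text printed by `ssh -T git@github.com`
  case "$1" in
    *"successfully authenticated"*) return 0 ;;
    *) return 1 ;;
  esac
}

# -T: do not request a terminal.
# BatchMode=yes: fail instead of prompting (an unknown host key or a
#   passphrase-protected key not loaded in ssh-agent will make this fail).
# 2>&1: capture the reply whichever output stream it arrives on.
reply="$(ssh -T -o BatchMode=yes -o ConnectTimeout=10 git@github.com 2>&1)"

if github_ssh_ok "$reply"; then
  echo "SSH authentication to GitHub works."
else
  echo "SSH authentication did not succeed. ssh said:" >&2
  echo "$reply" >&2
fi
```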
Version Control Systems
What do they allow you to do?
- Track changes made to each file
- Revert the entire project or a single file to a previous version
- Review changes made over time
- View who modified the file (and `blame` them for something if necessary)
- Collaborate with others without overwriting their work or risk file corruption, etc.
- Have multiple independent branches of the same repository and make changes without affecting others' work.
- And more…
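Each capability in the list above maps to an everyday Git command. The script below is a sketch that runs them in a throwaway repository created with `mktemp -d`; the file name and its contents are made up for the demo.

```shell
#!/usr/bin/env bash
# Sketch: the Git commands behind the list above, run in a throwaway repository.
# The file name and its contents are invented for the demo.
set -euo pipefail
export GIT_PAGER=cat                      # print output directly instead of opening a pager

demo="$(mktemp -d)"
cd "$demo"
git init -q
git config user.name "Demo User"          # identity used for these commits only
git config user.email "demo@example.com"

echo "step 1: trim reads" > notes.txt
git add notes.txt
git commit -q -m "Add trimming step"

echo "step 2: align reads" >> notes.txt
git diff                                  # track changes: what is not yet committed
git commit -q -am "Add alignment step"

git log --oneline                         # review changes made over time
git blame notes.txt                       # who last changed each line, and when

git checkout HEAD~1 -- notes.txt          # revert a single file to the previous version
cat notes.txt                             # shows only "step 1: trim reads"
git checkout HEAD -- notes.txt            # bring back the latest committed version

git checkout -q -b experiment             # independent branch for trying things out
echo "step 3: call variants" >> notes.txt
git commit -q -am "Try variant calling"
git checkout -q -                         # back to the original branch, which is untouched
```

The final branch switch illustrates the last list item: the commit made on `experiment` does not appear on the original branch until you choose to merge it.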
Why ♥ Git?
Git manages a filesystem as a set of snapshots
Snapshots are called commits
(image source: https://git-scm.com)
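The snapshot model can be inspected directly. In this sketch (file names invented for the demo), two commits are made and each commit's snapshot is listed; the last command shows that a file left unchanged between commits is stored once and referenced by both snapshots rather than copied.

```shell
#!/usr/bin/env bash
# Sketch: each commit is a snapshot of the whole project.
# File names and contents are invented for the demo.
set -euo pipefail
export GIT_PAGER=cat

repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Demo User"
git config user.email "demo@example.com"

echo ">seqA" > seqA.fa
git add seqA.fa
git commit -q -m "First snapshot"

echo ">seqB" > seqB.fa
git add seqB.fa
git commit -q -m "Second snapshot"

git cat-file -p HEAD                        # the commit: tree (snapshot), parent, author, message
git ls-tree --name-only HEAD~1              # files in the first snapshot: seqA.fa
git ls-tree --name-only HEAD                # files in the second snapshot: seqA.fa and seqB.fa
git rev-parse HEAD~1:seqA.fa HEAD:seqA.fa   # same object ID twice: unchanged file is reused
```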
♥ Git ♥
Almost every interaction with Git happens locally
(image source: https://git-scm.com)
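Because a clone carries the full history, commands such as `git log` and `git diff` read from the local `.git` directory instead of a server. A sketch, using a local repository as the source of the clone so that it runs without network access (directory and file names are invented):

```shell
#!/usr/bin/env bash
# Sketch: after cloning, history lives on your disk, so log and diff
# are answered locally. Directory and file names are invented for the demo.
set -euo pipefail
export GIT_PAGER=cat

work="$(mktemp -d)"
cd "$work"
git init -q original
cd original
git config user.name "Demo User"
git config user.email "demo@example.com"
for i in 1 2 3; do
  echo "version $i" > results.txt
  git add results.txt
  git commit -q -m "Results v$i"
done
cd ..

git clone -q original copy   # the clone receives every commit, not just the latest files
cd copy
git log --oneline            # full history, read from the local .git directory
git diff HEAD~2 HEAD         # compare the first and last versions locally
```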
♥ Git ♥
A remote host adds an additional level to a Git repository
It also allows for collaboration and backup.
(image source: https://www.git-tower.com)
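The remote workflow can be tried without a GitHub account. In this sketch a bare repository on the local disk stands in for GitHub, and the collaborators Alice and Bob are hypothetical; with GitHub the remote URL would instead look like `git@github.com:USERNAME/REPO.git`, where USERNAME and REPO are placeholders.

```shell
#!/usr/bin/env bash
# Sketch: a remote as backup and meeting point for collaborators.
# A bare repository on local disk stands in for GitHub here; on GitHub the
# remote URL would look like git@github.com:USERNAME/REPO.git (placeholder).
set -euo pipefail

base="$(mktemp -d)"
cd "$base"
git init -q --bare remote.git            # the "server" copy: history only, no working files

git init -q alice
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "# Analysis notes" > README.md
git add README.md
git commit -q -m "Start notebook"
git remote add origin "$base/remote.git" # register the remote under the usual name "origin"
git push -q -u origin HEAD               # back up the current branch and track it
cd ..

git clone -q remote.git bob              # a collaborator gets the full history
cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "- QC done" >> README.md
git commit -q -am "Record QC"
git push -q                              # share Bob's commit through the remote

cd ../alice
git pull -q                              # Alice fetches and merges Bob's work
```

`git push` sends local commits to the remote (the backup), and `git pull` brings other people's commits back (the collaboration).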
Contributions to this Markdown document came from Matt Hufford; the material was modified with permission from his BCB546 course. Check out the excellent PowerPoint presentation he created at the above link.