Introduction

Contributions to this Markdown document came from Matt Hufford; the material was modified with permission from his BCB546 course.

Get Help

No matter the problem (with Git or anything else in this workbook), someone else has encountered it already.

  • Google is an immensely powerful tool for troubleshooting computational problems
    • If you can articulate your problem in the form of a Google search query, you will likely find the answer online
    • Good answers will primarily be on Stack Overflow, like my recent search for how to include Unicode emoji characters in my Markdown presentation 😎 💻

GitHub Repo - Best Practices

  • Commit often. Keep commits small and frequent. This also helps you have informative commit messages
    • Make sure every commit “works”. Never commit code that doesn’t compile, runs with errors, or requires files that only exist in your workspace. Since commits are snapshots, each one should be a working snapshot.
    • Write commit messages that will be readable and useful to others and future-you. (This is really difficult.) Also, remember that when your repository is public, anyone can see your commit messages.
  • Use multiple branches. This is one of Git’s most powerful features and enhances collaboration and reproducibility. You should also decide on a workflow (like this or this).

  • Pull from the remote before you do anything in your repository. This reduces potential merge conflicts. Run git pull before you make changes to any files, and run it again after you make a commit and before you push.

  • Don’t commit unnecessary files. Often these are files that may be generated by your project and can lead to merge conflicts. You can keep these files in your working directory without being tracked using the .gitignore file (or the ~/.gitignore_global file in your home directory).

  • Review your changes before committing.

  • Use aliases. You can add specific aliases for Git commands in the .gitconfig file that lives in your home directory.

    [alias]
    hist = log --graph --pretty=format:'%h %ad | %s%d [%an]' --date=short
    last = log -1 HEAD
    ci = commit
    st = status
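
Several of the practices above (ignoring generated files, reviewing changes before committing, and defining an alias) can be tried safely in a throwaway repository. The following is a minimal sketch, not a prescribed workflow: the file names analysis.py and results.log, the *.log pattern, and the commit message are hypothetical, and the alias is stored in the demo repository only rather than in ~/.gitconfig.

```shell
# Hypothetical demo in a temporary directory; nothing here touches real projects.
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Ada Lovelace"               # repo-local identity for the demo
git config user.email "adal@analyticalengine.com"

# Keep generated files out of version control.
printf '*.log\n' > .gitignore
printf 'print("hello")\n' > analysis.py           # hypothetical tracked file
printf 'run finished\n' > results.log             # hypothetical generated file
ignored=$(git check-ignore results.log)           # prints the path if it is ignored

# Stage, review, then commit.
git add .gitignore analysis.py
git status --short                 # results.log is not listed because it is ignored
git --no-pager diff --staged       # review exactly what will be committed
git commit -q -m "Add analysis script and ignore log files"

# Define an alias for this repository only (add --global to store it in ~/.gitconfig).
git config alias.last "log -1 HEAD"
last_subject=$(git last --pretty=format:%s)
echo "$last_subject"
```

Once the alias is defined, git last behaves like git log -1 HEAD, so extra options such as --pretty can still be appended.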
    

GitHub Pages

GitHub provides hosting for websites (https://pages.github.com)

You can create a website for any GitHub project repository, simply by enabling GitHub Pages in your settings. If you purchase your own domain, it is simple to redirect your GitHub Pages site to the new domain.

GitHub Project Example

A grant proposal that involved 5-6 contributors working on several documents:

3111123 Aug  2  2015 NSF_Project_Description_Aug2-4PM.pdf
2824531 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_v2_JN.docx
2366621 Aug  2  2015 NSF EAH final revisions 2015-08-02_fin.docx
1581164 Aug  2  2015 NSF Panama Grant Draft 2015-08-01_TAH_working.docx
2908139 Aug  2  2015 NSF EAH revisions 2015-08-02.docx
  32538 Aug  2  2015 Hypothesis4edited by Aafke 010815 RAR.docx
 164430 Aug  1  2015 Hypothesis4edited by Aafke 010815.docx
2905973 Aug  1  2015 NSF Panama Grant Draft 2015-08-01.docx
 157777 Aug  1  2015 Chemical volatile objectives.docx
 708115 Jul 31  2015 NSF Panama Grant Draft 2015-07-31 alt fig.docx
1086603 Jul 31  2015 NSF Panama Grant Draft 2015-07-30 alt fig.docx
 157636 Jul 31  2015 H2 Text.docx
 134487 Jul 30  2015 H1 Text.docx
 688048 Jul 30  2015 NSF Panama Grant Draft 2015-07-30.docx
 509467 Jul 30  2015 NSF Panama Grant Draft 2015-07-29.docx
 482474 Jul 28  2015 NSF Panama Grant Draft 2015-07-28.docx
 241316 Jul 27  2015 NSF Panama Grant Draft 2015-07-27.docx
 258099 Jul 25  2015 NSF Panama Grant Draft 2015-07-25.docx
 265654 Jul 22  2015 NSF Panama Grant Draft 2015-07-22.docx

Tracking Your Science

  • What are the potential shortcomings of this approach?

  • With a version control system, the file history looks a lot different.

Version Control

  • A version control system improves organization and collaboration

  • Imagine a project with many more collaborators…

A snapshot of RevBayes: 28 contributors; 47,938 files; 5,200,936 lines of code

Remote Git repository hosting services

  • There are several options for remote hosts

  • You can set up your own server and host all of your repositories privately using Gitolite or Gitosis (not recommended)

  • You can use a web-based Git host

    • GitHub: free public repositories & paid private repositories, with repositories over 1 GB discouraged

    • Bitbucket: unlimited free public & free private repositories, limited at 1-2 GB/repo (register with your .edu email to get unlimited collaborators on private repositories)

    • GitLab: unlimited free public & free private repositories with no limits on collaborators, but limited at 10 GB/repo

Git repositories do have some limitations

Most relevant to this course and our fields are:

  • Repository size: If your repository gets very large, working within it can be a problem. The network speed will be the main bottleneck. This is why the online Git hosts discourage repos over 1-2 GB.

  • File size: A single large file can be problematic, particularly if it is frequently being modified. This can also lead to swollen repositories. GitHub will not allow any file over 100 MB.

  • File type: Git works best with text files. You can have binary files in your repository, but you lose some functionality of version control (like diff). Binary files are also often very large. Thus, it is recommended that you keep binary files to a minimum. (This means that it is not practical to use Git to collaborate on MS Word documents.)
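
One way to catch oversized files before they reach a commit is to scan the working directory. This is a minimal sketch, assuming a find that accepts the M size suffix (the GNU and BSD versions do). The helper name large_files and the demo files are hypothetical, and the demo uses a 1 MB threshold only so that it runs quickly.

```shell
# List files larger than a size limit (in MB), skipping Git's own .git directory.
# large_files is a hypothetical helper name, not a Git command.
large_files() {
    dir=$1
    limit_mb=$2
    find "$dir" -type f -size +"${limit_mb}"M ! -path '*/.git/*'
}

# Demo: one 3 MB file and one tiny file in a temporary directory.
demo=$(mktemp -d)
dd if=/dev/zero of="$demo/big.bin" bs=1048576 count=3 2>/dev/null
printf 'small\n' > "$demo/notes.txt"

found=$(large_files "$demo" 1)     # only big.bin exceeds the 1 MB threshold
echo "$found"
```

To check against the GitHub limit mentioned above, the same helper could be run as large_files . 100 from the top of a repository.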

Demo: Clone a Repository

The best way to become familiar with these concepts is to use Git. We will start by cloning ipyrad.

ipyrad is a Python program for interactive assembly and analysis of RAD-seq data sets.

Start by going to https://github.com/dereneaton/ipyrad to get the URL.

$ git clone git@github.com:dereneaton/ipyrad.git
Cloning into 'ipyrad'...
remote: Counting objects: 11341, done.
remote: Compressing objects: 100% (304/304), done.
remote: Total 11341 (delta 199), reused 0 (delta 0), pack-reused 11037
Receiving objects: 100% (11341/11341), 70.75 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (8275/8275), done.
Checking connectivity... done.

Note that using the URL git@github... instead of https://github... uses SSH. If you haven’t configured your SSH key yet, this may not work.
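
If the git@github... address does not work on your machine, one option is to point the remote at the https://github... address instead. The sketch below shows the idea in an empty throwaway repository, so it needs no network access. Only the two ipyrad addresses come from the text above; the temporary directory is an assumption for the demo.

```shell
# Throwaway repository used only to demonstrate editing a remote's URL.
demo=$(mktemp -d)
cd "$demo"
git init -q

# Register the remote with the git@github... address.
git remote add origin git@github.com:dereneaton/ipyrad.git
git remote -v                      # lists the fetch and push URLs for origin

# Switch the same remote to the https://github... address.
git remote set-url origin https://github.com/dereneaton/ipyrad.git
remote_url=$(git remote get-url origin)
echo "$remote_url"
```

In an existing clone, only the git remote set-url line is needed. To check whether a key is already configured, ssh -T git@github.com attempts a login and reports whether authentication succeeded.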

Some helpful commands for the cloned repository

xkcd

(image source https://xkcd.com/1597)

git pull

Always pull from the repository before doing anything with the contents:

$ git pull origin master

git pull <branch>

Pull updates from the h5step7 branch:

$ git pull origin h5step7
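
To see what git pull does without touching a real project, the exchange between two collaborators can be simulated locally. This is a sketch under stated assumptions: a bare repository in a temporary directory stands in for the remote host, and the directory names alice and bob, the file notes.txt, and the commit messages are all hypothetical.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository stands in for the remote (normally hosted on GitHub).
git init -q --bare remote.git

# First collaborator: clone, commit, and push.
git clone -q remote.git alice
cd alice
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"
printf 'first note\n' > notes.txt
git add notes.txt
git commit -q -m "Add first note"
branch=$(git rev-parse --abbrev-ref HEAD)   # "master" or "main", depending on Git's settings
git push -q origin "$branch"
cd ..

# Second collaborator: clone the current state of the remote.
git clone -q remote.git bob

# The first collaborator pushes another commit...
cd alice
printf 'second note\n' >> notes.txt
git commit -q -a -m "Add second note"
git push -q origin "$branch"
cd ..

# ...and the second collaborator pulls it before starting work.
cd bob
git pull -q origin "$branch"
pulled_lines=$(wc -l < notes.txt | tr -d ' ')
echo "$pulled_lines"
```

After the pull, the second working copy contains both lines of notes.txt, which is why pulling first reduces the chance of later merge conflicts.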

git status

Check the status of your local files:

$ git status

git log

See the log of the snapshots and their commit messages:

$ git log

git diff

Compare the differences between a file you have modified and the last commit:

$ git diff README.rst

git checkout <file>

Replace a file you modified with its version from the most recent commit using checkout:

$ git checkout README.rst
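
The diff and checkout commands are easiest to understand together. The following is a minimal sketch in a throwaway repository; the file contents and commit message are hypothetical, and the -- before the file name is an optional guard that tells Git the argument is a path rather than a branch.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# Commit a first version of the file.
printf 'original text\n' > README.rst
git add README.rst
git commit -q -m "Add README"

# Make an uncommitted edit and inspect it.
printf 'an accidental edit\n' >> README.rst
git --no-pager diff README.rst     # the added line appears with a leading +

# Discard the edit; the file returns to its last committed state.
git checkout -- README.rst
restored=$(cat README.rst)
echo "$restored"
```

Because checkout discards the uncommitted edit for good, it is worth reading the git diff output before running it.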

git branch

Find out which branch you’re on:

$ git branch

git checkout <branch>

Change to a different branch:

$ git checkout h5step7
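
Switching branches changes the files in your working directory to match the branch. As a rough illustration, the sketch below creates a branch in a throwaway repository, commits on it, and switches back. The branch name experiment and the file notes.txt are hypothetical; the name of the starting branch is read from Git rather than assumed, since it may be master or main depending on your settings.

```shell
demo=$(mktemp -d)
cd "$demo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

printf 'first line\n' > notes.txt
git add notes.txt
git commit -q -m "Start notes"
start_branch=$(git rev-parse --abbrev-ref HEAD)

# Create a branch and switch to it.
git branch experiment
git checkout -q experiment
printf 'risky idea\n' >> notes.txt
git commit -q -a -m "Try a risky idea"

git branch                         # the asterisk marks the current branch
current=$(git rev-parse --abbrev-ref HEAD)

# Switch back: notes.txt returns to the state of the starting branch.
git checkout -q "$start_branch"
lines_on_start=$(wc -l < notes.txt | tr -d ' ')
echo "$current $lines_on_start"
```

The commit made on experiment is not lost; it reappears as soon as that branch is checked out again.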

What can you do with someone else’s GitHub repository?

  • If you do not have push rights
    • You can only clone the repository and make changes locally
    • You can fork their repository and develop it independently
    • You can submit a pull request to their repository if you want to contribute to the original project
    • You can contact the owner of the repository and ask them to include you as a contributor and give you push rights (it is recommended that you discuss the nature of your collaboration with them first)

Creating a Repository

Let’s create a new Git repository and host it on GitHub using our accounts (for the demo, I will create the repo in the EEOB-BioData organization so that everyone will have easy access to it).

But first, let’s tell Git who you are:

$ git config --global user.name "Ada Lovelace"
$ git config --global user.email "adal@analyticalengine.com"

Some helpful commands for your new repository

git init

Initialize a new Git repository:

$ git init

git add

After a file has been added or modified, you can stage the file:

$ git add README.md

git commit

Commit the file to your local repository and write a message:

$ git commit -m "initial commit (README.md)"

After you have made your commit, the repository is up-to-date locally. Next you need to connect your local repo to the remote.

git remote

Add the remote:

$ git remote add origin git@github.com:username/repo-name.git

git push

Push your snapshot to the remote:

$ git push -u origin master
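
The commands in this section can be rehearsed end to end without a GitHub account. This is a sketch under stated assumptions: a bare repository in a temporary directory stands in for the empty repository you would create on GitHub, the directory name my-project is hypothetical, and the first branch is explicitly named master to match the commands above (newer Git installations may default to a different name).

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the empty repository created on the hosting service.
git init -q --bare remote.git

# Create and configure the local repository.
mkdir my-project
cd my-project
git init -q
git symbolic-ref HEAD refs/heads/master     # name the first branch "master"
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# Stage and commit a first file.
printf '# My project\n' > README.md
git add README.md
git commit -q -m "initial commit (README.md)"

# Connect the local repository to the remote and push.
git remote add origin "$work/remote.git"
git push -q -u origin master

# Confirm that the remote holds the same commit as the local branch.
local_head=$(git rev-parse master)
remote_head=$(git ls-remote origin refs/heads/master | cut -f1)
echo "$local_head"
echo "$remote_head"
```

With a real host, only the address given to git remote add changes; the -u option records origin/master as the upstream, so later git push and git pull calls can omit the remote and branch names.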