Introduction
Contributions to this Markdown document came from Matt Hufford; the material was modified with permission from his BCB546 course.
Get Help
No matter the problem (with Git or anything else in this workbook), someone else has encountered it already.
- Google is an immensely powerful tool for troubleshooting computational problems
- If you can articulate your problem in the form of a Google search query, you will likely find the answer online
- Good answers will primarily be on Stack Overflow, like my recent search for how to include Unicode emoji characters in my Markdown presentation 😎 💻
GitHub Repo - Best Practices
- Commit often. Keep commits small and frequent. This also helps you write informative commit messages.
- Make sure every commit “works”. Never commit code that doesn’t compile, runs with errors, or requires files that only exist in your workspace. Since commits are snapshots, each one should be a working snapshot.
- Write commit messages that will be readable and useful to others and future-you. (This is really difficult.) Also, remember when your repository is public, anyone can see your commit messages.
- Use multiple branches. This is one of Git’s most powerful features and enhances collaboration and reproducibility. You should also decide on a workflow (like this or this).
- Pull from the remote before you do anything in your repository. This reduces potential merge conflicts. So before you make changes to any files, git pull; and after you make a commit and before you push, git pull.
- Don’t commit unnecessary files. Often these are files that may be generated by your project and can lead to merge conflicts. You can keep these files in your working directory without being tracked using the .gitignore file (or the ~/.gitignore_global file in your home directory).
- Review your changes before committing.
- Use aliases. You can add specific aliases for Git commands in the .gitconfig file that lives in your home directory.
    [alias]
      hist = log --graph --pretty=format:'%h %ad | %s%d [%an]' --date=short
      last = log -1 HEAD
      ci = commit
      st = status
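The .gitignore and review practices above can be tried safely in a throwaway repository. This is a minimal sketch: the file names (analysis.py, results.tmp) and the *.tmp pattern are hypothetical, and the identity is set only inside the temporary repository.

```shell
# Throwaway repository so nothing here touches real work.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# Hypothetical project files: one worth tracking, one generated.
echo 'print("hello")' > analysis.py
echo 'scratch output' > results.tmp

# Tell Git to leave generated *.tmp files untracked.
echo '*.tmp' > .gitignore

# Stage everything; results.tmp is skipped because it is ignored.
git add .

# Review what is staged before committing.
git diff --staged --stat
staged=$(git diff --staged --name-only)

git commit -q -m "add analysis script and ignore temporary files"
```

Only .gitignore and analysis.py end up in the commit; results.tmp stays in the working directory without being tracked.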
GitHub Pages
GitHub provides hosting for websites (https://pages.github.com)
- You can create a free website hosted on GitHub simply by creating a repository in your account called username.github.io
  - For example, Wade Dismukes (a student in EEB) created his site in GitHub Pages: https://wadedismukes.github.io
  - Another website (http://phyloworks.org) is hosted on GitHub, using a Bootstrap CSS template.
You can create a website for any GitHub project repository, simply by enabling GitHub Pages in your settings.
GitHub Project Example
A grant proposal that involved 5-6 contributors working on several documents:
3111123 Aug 2 2015 NSF_Project_Description_Aug2-4PM.pdf
2824531 Aug 2 2015 NSF Panama Grant Draft 2015-08-01_v2_JN.docx
2366621 Aug 2 2015 NSF EAH final revisions 2015-08-02_fin.docx
1581164 Aug 2 2015 NSF Panama Grant Draft 2015-08-01_TAH_working.docx
2908139 Aug 2 2015 NSF EAH revisions 2015-08-02.docx
32538 Aug 2 2015 Hypothesis4edited by Aafke 010815 RAR.docx
164430 Aug 1 2015 Hypothesis4edited by Aafke 010815.docx
2905973 Aug 1 2015 NSF Panama Grant Draft 2015-08-01.docx
157777 Aug 1 2015 Chemical volatile objectives.docx
708115 Jul 31 2015 NSF Panama Grant Draft 2015-07-31 alt fig.docx
1086603 Jul 31 2015 NSF Panama Grant Draft 2015-07-30 alt fig.docx
157636 Jul 31 2015 H2 Text.docx
134487 Jul 30 2015 H1 Text.docx
688048 Jul 30 2015 NSF Panama Grant Draft 2015-07-30.docx
509467 Jul 30 2015 NSF Panama Grant Draft 2015-07-29.docx
482474 Jul 28 2015 NSF Panama Grant Draft 2015-07-28.docx
241316 Jul 27 2015 NSF Panama Grant Draft 2015-07-27.docx
258099 Jul 25 2015 NSF Panama Grant Draft 2015-07-25.docx
265654 Jul 22 2015 NSF Panama Grant Draft 2015-07-22.docx
Tracking Your Science
- What are the potential shortcomings of this approach?
- With a version control system, the file history looks a lot different.
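As a rough illustration of how different that history looks, the sketch below replaces dated file copies with commits to a single file. The file name proposal.md, its contents, and the commit messages are hypothetical.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# One file, revised three times, instead of three dated copies.
echo "Draft of hypothesis 1" > proposal.md
git add proposal.md
git commit -q -m "draft hypothesis 1"

echo "Draft of hypothesis 2" >> proposal.md
git commit -q -am "add hypothesis 2"

echo "Revised figures" >> proposal.md
git commit -q -am "revise figures"

# The history of the file: abbreviated hash, date, author, and message.
git log --date=short --pretty=format:'%h %ad %an: %s' -- proposal.md

# Number of commits that touched the file.
n_commits=$(git rev-list --count HEAD -- proposal.md)
```

The working directory holds a single proposal.md, while every earlier version remains recoverable from the log.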
Version Control
- A version control system improves organization and collaboration
- Imagine a project with many more collaborators…
A snapshot of RevBayes: 28 contributors; 47,938 files; 5,200,936 lines of code
Remote Git repository hosting services
- There are several options for remote hosts
  - You can set up your own server and host all of your repositories privately using Gitolite or Gitosis (not recommended)
  - You can use a web-based Git host
    - GitHub: free public repositories & paid private repositories, with repositories over 1 GB discouraged
    - Bitbucket: unlimited free public & free private repositories, limited at 1-2 GB/repo (register with your .edu email to get unlimited collaborators on private repositories)
    - GitLab: unlimited free public & free private repositories with no limits on collaborators, but limited at 10 GB/repo
- Git repositories do have some limitations
Most relevant to this course and our fields are:
- Repository size: If your repository gets very large, working within it can be a problem. The network speed will be the main bottleneck. This is why the online Git hosts discourage repos over 1-2 GB.
- File size: A single large file can be problematic, particularly if it is frequently being modified. This can also lead to swollen repositories. GitHub will not allow any file over 100 MB.
- File type: Git works best with text files. You can have binary files in your repository, but you lose some functionality of version control (like diff). Binary files are also often very large. Thus, it is recommended that you keep binary files to a minimum. (This means that it is not practical to use Git to collaborate on MS Word documents.)
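One way to catch oversized files before they are committed is to scan the working directory. This is only a sketch: the helper name large_files is hypothetical, the threshold is given in bytes, and 104857600 bytes is used here as a stand-in for the 100 MB limit mentioned above.

```shell
# List regular files larger than a byte threshold,
# skipping Git's own .git directory.
# Usage: large_files <directory> <bytes>
large_files() {
    find "$1" -name .git -prune -o -type f -size +"$2"c -print
}

# Check the current directory against the 100 MB limit.
large_files . 104857600
```

Inside a repository, git count-objects -vH additionally reports how much disk space the stored objects occupy.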
Demo: Clone a Repository
You will become familiar with the concepts by using Git. We will start by cloning ipyrad.
ipyrad is a Python program for interactive assembly and analysis of RAD-seq data sets.
Start by going to https://github.com/dereneaton/ipyrad to get the URL.
$ git clone git@github.com:dereneaton/ipyrad.git
Cloning into 'ipyrad'...
remote: Counting objects: 11341, done.
remote: Compressing objects: 100% (304/304), done.
remote: Total 11341 (delta 199), reused 0 (delta 0), pack-reused 11037
Receiving objects: 100% (11341/11341), 70.75 MiB | 5.91 MiB/s, done.
Resolving deltas: 100% (8275/8275), done.
Checking connectivity... done.
Note that using the URL git@github... instead of https://github... uses SSH.
If you haven’t configured your SSH key yet, this may not work.
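If the SSH form fails because no key is configured, the remote of an existing clone can be pointed at the HTTPS form instead (or back again). The following is a minimal sketch in a throwaway repository, so no network access is involved; the HTTPS URL is assumed to be the repository page address with .git appended.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Register the remote using the SSH form from the clone command above.
git remote add origin git@github.com:dereneaton/ipyrad.git

# Switch the same remote to the HTTPS form.
git remote set-url origin https://github.com/dereneaton/ipyrad.git

# Confirm which URL fetches and pushes will now use.
git remote -v
url=$(git config --get remote.origin.url)
```

Once a key has been registered with GitHub, ssh -T git@github.com is a quick way to check that SSH authentication works before switching back.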
Some helpful commands for the cloned repository
(image source https://xkcd.com/1597)
git pull
Always pull from the repository before doing anything with the contents:
$ git pull origin master
git pull <remote> <branch>
Pull to update from the h5step7 branch:
$ git pull origin h5step7
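To see what a pull does without touching a real project, the sketch below uses a local bare repository as a stand-in for the remote host, plus two working copies. The directory names (remote.git, alice, bob) and file contents are hypothetical, and the branch name is read from Git rather than assumed to be master.

```shell
work=$(mktemp -d)
cd "$work"

# A bare repository plays the role of the remote host.
git init -q --bare remote.git

# First working copy: make a commit and push it.
git init -q alice
cd alice
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"
echo "first line" > notes.txt
git add notes.txt
git commit -q -m "start notes"
branch=$(git symbolic-ref --short HEAD)
git remote add origin "$work/remote.git"
git push -q -u origin "$branch"

# Second working copy: cloned while the remote holds one commit.
cd "$work"
git clone -q remote.git bob

# The first copy moves ahead and pushes again.
cd "$work/alice"
echo "second line" >> notes.txt
git commit -q -am "extend notes"
git push -q origin "$branch"

# The second copy is now behind; pulling brings it up to date.
cd "$work/bob"
git pull -q --ff-only origin "$branch"
bob_lines=$(wc -l < notes.txt | tr -d ' ')
```

The --ff-only option is a cautious choice for this sketch: the pull succeeds only when the local branch can simply be moved forward, and stops instead of creating a merge otherwise.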
git status
Check the status of your local files:
$ git status
git log
See the log of the snapshots and their commit messages:
$ git log
git diff
Compare the differences between a file you have modified and the last commit:
$ git diff README.rst
git checkout <file>
Replace a file you modified with the version from the most recent commit using checkout:
$ git checkout README.rst
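The status, diff, and checkout commands above fit together as a small "inspect, then undo" loop. The sketch below runs that loop in a throwaway repository; the README.rst created here is a stand-in with hypothetical contents, not the ipyrad file.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# Commit a first version of a stand-in README.
echo "original text" > README.rst
git add README.rst
git commit -q -m "add README"

# Make an unwanted local edit, then inspect it.
echo "accidental edit" >> README.rst
git status --short
git diff README.rst

# Discard the edit by restoring the committed version.
# The "--" tells Git that what follows is a file path, not a branch name.
git checkout -- README.rst
restored=$(cat README.rst)
```

Note that the discarded edit is gone for good, since it was never committed.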
git branch
Find out which branch you’re on:
$ git branch
git checkout <branch>
Change to a different branch:
$ git checkout h5step7
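Branch switching is easier to trust after watching it change the working directory. In this sketch the branch name experiment and the file notes.txt are hypothetical, and the starting branch name is read from Git because it may be master or main depending on configuration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"
echo "shared start" > notes.txt
git add notes.txt
git commit -q -m "initial commit"
main_branch=$(git symbolic-ref --short HEAD)

# Create a branch named "experiment" and switch to it.
git checkout -q -b experiment
echo "risky idea" >> notes.txt
git commit -q -am "try a risky idea"

# List branches; the asterisk marks the current one.
git branch

# Switch back: the file returns to its state on the starting branch.
git checkout -q "$main_branch"
lines_on_main=$(wc -l < notes.txt | tr -d ' ')
current=$(git symbolic-ref --short HEAD)
```

The commit made on experiment is not lost; checking that branch out again brings the extra line back.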
What can you do with someone else’s GitHub repository?
- If you do not have push rights:
  - You can only clone the repository and make changes locally
  - You can fork their repository and develop it independently
  - You can submit a pull request to their repository if you want to contribute to the original project
  - You can contact the owner of the repository and ask them to include you as a contributor and give you push rights (it is recommended that you discuss the nature of your collaboration with them first)
Creating a Repository
Let’s create a new Git repository and host it on GitHub using our accounts (for the demo, I will create the repo in the EEOB-BioData organization so that everyone will have easy access to it).
But first, let’s tell Git who you are:
$ git config --global user.name "Ada Lovelace"
$ git config --global user.email "adal@analyticalengine.com"
Some helpful commands for your new repository
git init
Initialize a new Git repository:
$ git init
git add
After a file has been added or modified, you can stage the file:
$ git add README.md
git commit
Commit the file to your local repository and write a message:
$ git commit -m "initial commit (README.md)"
After you have made your commit, the repository is up-to-date locally. Next you need to connect your local repo to the remote.
git remote
Add the remote:
$ git remote add origin git@github.com:username/repo-name.git
git push
Push your snapshot to the remote:
$ git push -u origin master
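The commands in this section can be rehearsed end to end before using a real GitHub repository. In the sketch below a local bare repository stands in for the empty repository you would create on GitHub, the directory name my-project is hypothetical, the identity is set per repository rather than with --global, and the branch name is read from Git instead of assuming master.

```shell
work=$(mktemp -d)
cd "$work"

# Stand-in for the empty repository created on the hosting service.
git init -q --bare remote.git

# git init: start a new local repository.
git init -q my-project
cd my-project
git config user.name "Ada Lovelace"
git config user.email "adal@analyticalengine.com"

# git add / git commit: stage and snapshot a README.
echo "# My project" > README.md
git add README.md
git commit -q -m "initial commit (README.md)"

# git remote: connect the local repository to the remote.
git remote add origin "$work/remote.git"

# git push: send the snapshot. The -u option records the remote branch
# as the upstream, so later pushes and pulls need no extra arguments.
branch=$(git symbolic-ref --short HEAD)
git push -q -u origin "$branch"

# Ask the remote which branches it now holds.
remote_heads=$(git ls-remote --heads origin)
```

With a real host, only the remote URL changes, for example to the git@github.com:username/repo-name.git form shown above.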