Storage & Version Control

Introduction

Version control is a system for managing changes to software projects, documents, or any other set of files. It provides a history of changes, so you can track who made changes and when, and it allows multiple people to collaborate on a project while ensuring that changes are properly managed and coordinated.

Version Control Tools

Git is widely regarded as the most popular version control system today, owing to its broad adoption, rich feature set, and active user community. Other version control tools, including Subversion (SVN) and Mercurial, are also used to manage code and data in software development and other projects.

Git

Git is a distributed version control system, which means that each user has a full copy of the code repository, including its history, on their own computer, rather than relying on a central repository. This makes it ideal for teams working on large projects or for individuals who need to work offline. Git was created by Linus Torvalds in 2005 and has since become one of the most popular version control systems in use. It is known for its speed, reliability, and flexibility, and it is used by organizations of all sizes, from small open-source projects to large multinational corporations.
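The distributed model described above can be illustrated with a toy sketch. This is not how Git stores data internally; the `Repository` class and its `commit`, `clone`, and `pull` methods are hypothetical names chosen for illustration, and a commit is reduced to an (author, message) pair.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Repository:
    """Toy repository (hypothetical): an ordered list of (author, message) commits."""
    commits: list = field(default_factory=list)

    def commit(self, author: str, message: str) -> None:
        # Committing changes only this copy; no server or network is involved.
        self.commits.append((author, message))

    def clone(self) -> "Repository":
        # A clone receives the complete history, not just the latest snapshot.
        return Repository(copy.deepcopy(self.commits))

    def pull(self, other: "Repository") -> None:
        # Simplified sync: append the commits this copy has not seen yet.
        for entry in other.commits:
            if entry not in self.commits:
                self.commits.append(entry)


shared = Repository()
shared.commit("alice", "Initial import")

laptop = shared.clone()                      # full copy of the history
laptop.commit("bob", "Fix parser offline")   # works without access to `shared`

shared.pull(laptop)                          # changes are exchanged later
```

After the last step, both `shared` and `laptop` hold the same two commits, which is the property that makes offline work possible in a distributed system.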

If you plan to contribute to any Git-based repository, it is worthwhile to learn more about the useful options Git offers, for example by following hands-on tutorials.

In the section Types of Git-based repositories below, you will learn which popular online repository-hosting platforms use Git-based version control.

Subversion (SVN)

Subversion (SVN) is a centralized version control system, which means that all data is stored in a central repository. It is designed to be easy to use and provides a number of features that make it a popular choice for many teams, including:

  • version history,
  • branching and merging,
  • and easy collaboration between multiple users.

Subversion is also known for its stability, scalability, and compatibility with a wide range of platforms and tools.
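As a rough illustration of the centralized model, the toy sketch below keeps all history on a single server object and numbers revisions sequentially. The `CentralRepository` and `WorkingCopy` classes are hypothetical, and the out-of-date check is applied to the whole repository for simplicity, whereas Subversion itself performs this check per file or directory.

```python
class CentralRepository:
    """Toy central server (hypothetical): the only place where history is stored."""

    def __init__(self) -> None:
        self.revisions = []  # revision N is stored at index N - 1

    @property
    def head(self) -> int:
        # Number of the latest revision; 0 means the repository is empty.
        return len(self.revisions)

    def commit(self, author: str, message: str, base_revision: int) -> int:
        # Simplification: reject any commit made from an outdated working copy.
        if base_revision != self.head:
            raise RuntimeError("working copy is out of date; update first")
        self.revisions.append((author, message))
        return self.head


class WorkingCopy:
    """Toy client checkout: it knows its base revision but holds no history."""

    def __init__(self, server: CentralRepository) -> None:
        self.server = server
        self.base_revision = server.head

    def update(self) -> None:
        # Bring the working copy up to the latest revision on the server.
        self.base_revision = self.server.head

    def commit(self, author: str, message: str) -> int:
        self.base_revision = self.server.commit(author, message, self.base_revision)
        return self.base_revision


server = CentralRepository()
alice = WorkingCopy(server)
bob = WorkingCopy(server)

first = alice.commit("alice", "Add analysis script")
try:
    bob.commit("bob", "Edit README")   # bob's copy is still based on revision 0
    rejected = False
except RuntimeError:
    rejected = True

bob.update()
second = bob.commit("bob", "Edit README")
```

Here every commit goes through the server, so collaborators always agree on a single, linear sequence of revision numbers.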

Mercurial

Mercurial is a distributed version control system that is similar to Git. It was created in 2005 and is used by software developers and organizations. Bitbucket was long a popular host for Mercurial projects, but it ended Mercurial support in 2020. Mercurial is designed to be fast and lightweight, and provides features for managing complex development workflows, such as branching and merging. It is also highly customizable and can be extended through extensions. Mercurial is known for its simplicity, performance, and ease of use.

Types of Git-based repositories

Git can be used to version control any type of resource that is stored in a file and changes over time, whether it's code, data, documentation, or something else. The key benefit of using Git is that it lets you keep track of changes over time, revert to previous versions if needed, and collaborate with others on a project.
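Tracking changes and reverting to a previous version can be sketched by driving the `git` command-line tool from Python. This is a minimal sketch, assuming `git` is installed and on the PATH; the file name `samples.csv`, the demo identity, and the helper names `git` and `track_and_revert` are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Optional


def git(repo: Path, *args: str) -> str:
    """Run a git command inside `repo` and return its standard output."""
    cmd = [
        "git",
        "-c", "user.name=Demo User",         # hypothetical identity for this demo
        "-c", "user.email=demo@example.com",
        "-c", "commit.gpgsign=false",        # avoid signing prompts in the demo
        *args,
    ]
    done = subprocess.run(cmd, cwd=repo, check=True, capture_output=True, text=True)
    return done.stdout


def track_and_revert() -> Optional[dict]:
    """Commit two versions of a data file, then restore the first one."""
    if shutil.which("git") is None:
        return None  # git is not available; nothing to demonstrate

    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        data = repo / "samples.csv"

        git(repo, "init")
        data.write_text("id,value\n1,10\n")
        git(repo, "add", "samples.csv")
        git(repo, "commit", "-m", "Add first sample")

        data.write_text("id,value\n1,10\n2,20\n")
        git(repo, "commit", "-a", "-m", "Add second sample")

        # Commit subjects, newest first.
        history = git(repo, "log", "--format=%s").splitlines()

        # Restore the file as it was one commit before the latest.
        git(repo, "checkout", "HEAD~1", "--", "samples.csv")
        restored = data.read_text()

    return {"history": history, "restored": restored}


result = track_and_revert()
```

When `git` is available, `result["history"]` lists both commit messages and `result["restored"]` holds the file contents from the first commit.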

Git can be used to manage and version control a wide variety of online resources, including:

| RESOURCE | DESCRIPTION | EXAMPLE |
|---|---|---|
| CODE | Git is primarily used for version control of software code and applications. This includes source code written in a variety of programming languages such as Python, Java, C++, etc. | ISUgenomics/data_wrangling repo hosted on GitHub; contains mini Python apps for common data-processing tasks |
| WEB DEVELOPMENT | Git can be used to manage changes to the HTML, CSS, and JavaScript files that make up a website or web application. | ISUgenomics/datascience-workbook repo hosted on GitHub; contains the source code of this workbook (rendered as GitHub Pages) |
| DATA | Git can be used to manage and version control data files, such as CSV, JSON, or Excel files. This is particularly useful in data science projects where you might want to keep track of changes to your data sets over time. | ISUgenomics/ideogram_db repo hosted on GitHub; contains a database of chromosome band data files |
| DOCUMENTATION | Git can be used to manage documentation, such as project reports, user manuals, and technical specifications. | ISUgenomics/2021_workshop_transcriptomics repo hosted on GitHub; contains transcriptomic analyses presented during a workshop |
| CONFIGURATION FILES | Git can be used to manage configuration files for servers, applications, and other types of systems. This allows you to keep track of changes to your configuration files and collaborate with other people on the configuration of a system. | ohmyzsh/ohmyzsh repo hosted on GitHub; contains a community-driven framework for managing your zsh configuration |
| BINARY FILES | Git can also be used to manage and version control binary files, such as images, videos, and audio files. | Tencent/tencent-ml-images repo hosted on GitHub; contains an open-source multi-label image database |
| BOOKS | Git can be used to manage book projects, academic publications, and technical manuals. This helps in tracking changes, collaborating with multiple authors, and maintaining versions of the manuscript. | christophM/christophm.github.io repo hosted on GitHub; contains the book "Interpretable Machine Learning" rendered via GitHub Pages |