Introduction to Project Management


Project management involves a variety of tasks that help to plan, organize, and manage the resources needed to complete a project successfully. The specific activities vary with the nature of the project, the size and complexity of the organization, and other factors. By managing the different aspects of a project in an informed and targeted way, organizations can increase the likelihood of success and reduce the risk of delays, cost overruns, or other problems.

Project management covers several key areas:

  • PROJECT PLANNING
    This involves defining the project scope, goals, and objectives, creating a project timeline, and determining the resources required to complete the project. Planning helps to ensure that the project stays on track, that resources are allocated effectively, and that the project is completed on time and within budget.

  • RESOURCE MANAGEMENT
    This involves allocating the people, materials, and equipment needed to complete the project, as well as managing and tracking the use of these resources.

  • DATA MANAGEMENT
    This involves organizing and storing the data collected during a scientific project, and ensuring that the data is accessible, secure, and well-documented. Good data management is essential for the reproducibility of scientific results and for maintaining the credibility of the research.

  • VERSION CONTROL
    Version control is a critical aspect of scientific project management, as it allows scientists to keep track of the changes made to their code or data over time. It helps scientists collaborate effectively, keep their work organized, and maintain the integrity of their research.

  • COLLABORATION AND COMMUNICATION
    Scientific projects often involve multiple researchers and institutions, and effective collaboration and communication are key to the success of the project. This includes using tools such as shared data repositories, wikis, and project management software to keep everyone informed and up-to-date on the project’s progress.

  • QUALITY CONTROL
    This involves ensuring that the data, code, and methods used in a scientific project meet high standards for quality and accuracy. It includes regular code and data reviews, audits, and other quality control measures, which help to minimize the risk of errors and improve the overall quality of the research.

  • PROJECT CLOSURE AND DATA SHARING
    This involves finalizing the project and making the data and results accessible to the wider scientific community. It includes publishing the results in peer-reviewed journals or online repositories, ensuring that the data is properly documented, and making it available for long-term access and use. This helps to promote transparency, encourage collaboration, and keep the results available to future generations of scientists.

Version Control

Version control is a system for managing changes to software projects, documents, or any other set of files. It provides a history of changes, so you can track who made changes and when, and it allows multiple people to collaborate on a project while ensuring that changes are properly managed and coordinated.
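To make the idea of a change history concrete, here is a minimal sketch in Python. It is a toy model, not how any real version control system is implemented: the `Revision` and `History` classes and the sample file contents are hypothetical, and each revision simply stores a full copy of the content together with the author, a message, and a timestamp.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Revision:
    """One recorded change: the full content plus who saved it and when."""
    content: str
    author: str
    message: str
    timestamp: datetime


@dataclass
class History:
    """Hypothetical append-only history for a single file."""
    revisions: list = field(default_factory=list)

    def commit(self, content, author, message):
        """Store a new revision and return its revision number."""
        rev = Revision(content, author, message, datetime.now(timezone.utc))
        self.revisions.append(rev)
        return len(self.revisions) - 1

    def checkout(self, number):
        """Return the content exactly as it was at the given revision."""
        return self.revisions[number].content

    def log(self):
        """List (revision number, author, message) in the order recorded."""
        return [(i, r.author, r.message) for i, r in enumerate(self.revisions)]


history = History()
history.commit("sample,value\nA,1\n", "alice", "Add first measurement")
history.commit("sample,value\nA,1\nB,2\n", "bob", "Add sample B")

print(history.log())        # who changed what, in order
print(history.checkout(0))  # the file as it was at revision 0
```

Real systems add much more on top of this idea (branching, merging, efficient storage), but the core record of who changed what, and when, is the same.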

Git is the most widely used version control system today, thanks to its powerful features and active community of users. Other version control tools, including Subversion (SVN) and Mercurial, are also used for managing code and data in software development and other projects.


Git

Git is a distributed version control system, which means that each user has a full copy of the code repository on their own computer, rather than relying on a central repository. This makes it well suited to teams working on large projects and to individuals who need to work offline. It was created by Linus Torvalds in 2005 and has since become one of the most popular version control systems in use. Git is known for its speed, reliability, and flexibility, and it is used by organizations of all sizes, from small open-source projects to large multinational corporations.

If you plan to contribute to any Git-based repository, it is worth learning more about the options Git offers. The hands-on tutorials provide instructions for more advanced Git commands.

In the next section, Types of Git-based repositories, you will learn which popular online platforms for hosting repositories use Git-based version control.

Subversion (SVN)

Subversion (SVN) is a centralized version control system, which means that all data is stored in a central repository. It is designed to be easy to use and provides a number of features that make it a popular choice for many teams, including:

  • version history,
  • branching and merging,
  • and easy collaboration between multiple users.

Subversion is also known for its stability, scalability, and compatibility with a wide range of platforms and tools.


Mercurial

Mercurial is a distributed version control system that is similar to Git. It was created in 2005 and is used by software developers and organizations; it was especially common among projects hosted on Bitbucket until that service ended Mercurial support in 2020. Mercurial is designed to be fast and lightweight, and provides features for managing complex development workflows, such as branching and merging. It is also highly customizable and can be extended through extensions. Mercurial is known for its simplicity, performance, and ease of use.

Types of Git-based repositories

Git can be used to manage and version control any type of resource that is stored in a file and changes over time, whether it’s code, data, documentation, or something else. The key benefit of using Git is that it allows you to keep track of changes over time, revert to previous versions if needed, and collaborate with others on a project.
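As a rough illustration of what "keeping track of changes" looks like, the sketch below compares two versions of a small data file using Python's standard `difflib` module. This is not Git itself: the file name and contents are made up for the example, and the output only resembles the line-level diff that version control tools typically report.

```python
import difflib

# Two hypothetical versions of the same small CSV file, as lists of lines.
old_version = ["sample,value", "A,1", "B,2"]
new_version = ["sample,value", "A,1", "B,3", "C,4"]

# unified_diff marks removed lines with "-" and added lines with "+".
diff = list(
    difflib.unified_diff(
        old_version,
        new_version,
        fromfile="data.csv (previous)",
        tofile="data.csv (current)",
        lineterm="",
    )
)
print("\n".join(diff))

# Collect the changed lines, skipping the "---" / "+++" file header lines.
removed = [line[1:] for line in diff if line.startswith("-") and not line.startswith("---")]
added = [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]
```

Here `removed` holds the line that disappeared (`B,2`) and `added` holds the lines that are new (`B,3` and `C,4`). Because the previous version is still available, reverting amounts to restoring `old_version`.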

Git can be used to version a wide variety of resources, including:

  • CODE
    Git is primarily used for version control of software code and applications. This includes source code written in a variety of programming languages such as Python, Java, C++, etc.
    Example: ISUgenomics/data_wrangling repo hosted on GitHub, contains mini Python apps for common tasks in data processing

  • WEBSITES
    Git can be used to manage changes to HTML, CSS, and JavaScript files that make up a website or web application.
    Example: ISUgenomics/datascience-workbook repo hosted on GitHub, contains the source code of this workbook (rendered as GitHub Pages)

  • DATA
    Git can be used to manage and version control data files, such as CSV, JSON, or Excel files. This is particularly useful in data science projects where you might want to keep track of changes to your data sets over time.
    Example: ISUgenomics/ideogram_db repo hosted on GitHub, contains a database of chromosome band data files

  • DOCUMENTATION
    Git can be used to manage documentation, such as project reports, user manuals, and technical specifications.
    Example: ISUgenomics/2021_workshop_transcriptomics repo hosted on GitHub, contains transcriptomic analyses presented during a workshop

  • CONFIGURATION FILES
    Git can be used to manage configuration files for servers, applications, and other types of systems. This allows you to keep track of changes to your configuration files and collaborate with other people on the configuration of a system.
    Example: ohmyzsh/ohmyzsh repo hosted on GitHub, contains a community-driven framework for managing your zsh configuration

  • BINARY FILES
    Git can also be used to manage and version control binary files, such as images, videos, and audio files.
    Example: Tencent/tencent-ml-images repo hosted on GitHub, contains an open-source multi-label image database

  • BOOKS
    Example: christophM/ repo hosted on GitHub, contains the book “Interpretable Machine Learning” rendered via GitHub Pages

Online hosting of Git repos

Git repositories do NOT have to be hosted on online platforms. Git is a distributed version control system, which means that a Git repository can be stored and managed locally on a single machine or on a network of machines.
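The following sketch shows a purely local repository in practice. It assumes the `git` command-line tool is installed and drives it from Python with `subprocess`; the helper names (`git`, `demo_local_repo`), the file name, and the placeholder identity passed with `-c` are all hypothetical choices made for the example. No online platform or remote is involved at any point.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def git(repo, *args):
    """Run one git command inside `repo` and return its standard output."""
    result = subprocess.run(
        [
            "git",
            # Placeholder identity so the example does not depend on global config.
            "-c", "user.name=Example User",
            "-c", "user.email=user@example.com",
            "-c", "commit.gpgsign=false",
            *args,
        ],
        cwd=repo,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def demo_local_repo():
    """Create a throwaway local repository, commit twice, and inspect it."""
    if shutil.which("git") is None:
        return None  # git is not installed; nothing to demonstrate
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        git(repo, "init", "--quiet")

        notes = repo / "notes.txt"
        notes.write_text("first draft\n")
        git(repo, "add", "notes.txt")
        git(repo, "commit", "--quiet", "-m", "Add notes")

        notes.write_text("first draft\nsecond line\n")
        git(repo, "commit", "--quiet", "-am", "Extend notes")

        log = git(repo, "log", "--format=%s").splitlines()  # newest first
        remotes = git(repo, "remote").split()                # empty: no remote configured
        return log, remotes


print(demo_local_repo())
```

The full history lives inside the repository directory itself, which is why the log is available even though the list of remotes is empty.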

However, hosting Git repositories on an online platform such as GitHub, GitLab, or Bitbucket provides several benefits. For example, hosting your repository on an online platform makes it easier to collaborate with others, as you can give other people access to your repository and work on it together. Additionally, online platforms provide a backup of your repository in case your local machine fails, which helps to keep your content safe.

Each platform provides its own set of features and tools for working with Git, and the choice of platform often depends on the specific needs of your project and organization. For example, GitHub and Bitbucket are popular choices for open-source projects, while GitLab and Visual Studio Team Services (VSTS) are more commonly used for enterprise-level projects.

Not all repositories in your account need to be public. The available options depend on the online platform you use and your account type.
For example, on GitHub you can make each repository public or private. Free accounts can create unlimited public and private repositories; paid plans mainly add features and higher usage limits.
Similarly, GitLab and Bitbucket let you make repositories public or private, and both offer private repositories on their free plans, subject to plan-specific limits (for example, on the number of users).


GitHub

GitHub is a web-based platform that provides hosting for Git repositories. It allows you to create public or private repositories, collaborate with other people on a project, and track changes over time. GitHub is widely used by developers and organizations to manage their code and is one of the most popular Git hosting services available.

Learn more from the hands-on tutorial Introduction to GitHub, provided in this section of the workbook.


Bitbucket

Bitbucket is a code hosting service for Git repositories (it also supported Mercurial until 2020). It provides a range of tools for managing teams and projects, including permissions management, role-based access control, support for large and binary files, and collaboration features. Bitbucket is part of the Atlassian suite of tools, and it integrates well with other Atlassian products such as Jira (issue tracking tool) and Confluence (collaboration and documentation platform).

Learn more from the hands-on tutorial Introduction to BitBucket, provided in this section of the workbook.


GitLab

GitLab is another web-based platform for hosting Git repositories. It provides similar features to GitHub and is popular for its integration with Continuous Integration/Continuous Deployment (CI/CD) pipelines.

SourceForge

SourceForge is a web-based platform that provides hosting for Git and other types of version control systems. It is popular among open-source projects and provides a variety of tools for project management and collaboration, with a focus on community support. SourceForge supports a wide range of programming languages, including C/C++, Java, Python, and more.

GitKraken

GitKraken is a Git client that provides a graphical user interface for working with Git repositories. It integrates with a variety of Git hosting services and provides features for collaboration, code review, and project management.

Visual Studio Team Services

Visual Studio Team Services (VSTS), renamed Azure DevOps Services in 2018, is a cloud-based development platform provided by Microsoft. It provides Git version control, continuous integration, and a variety of other tools for software development teams.

AWS CodeCommit

AWS CodeCommit is a Git-based source control service that is part of the Amazon Web Services (AWS) cloud platform. It provides a secure, scalable, and managed solution for version control that integrates with other AWS services.
