Version Control (Basics)

In this unit we'll cover the basics of version control systems (VCS), using git as an example. We'll start with a brief motivation for why traditional file storage solutions are inadequate for managing code, illustrate common scenarios for managing code with a VCS, and cover how to safely navigate these basic situations with git. The lecture concludes with a short cheat-sheet recapitulation of commands for the most standard situations.

Lecture upshot

Version Control Systems (VCS) allow multiple developers to simultaneously work on the same codebase in a coordinated and conflict-free manner. Git is the de-facto standard VCS and uses a graph-model to maintain code versions.

Motivation

Why do we not just place our code in a Google Drive / Dropbox / etc.?

Software development is a highly collaborative discipline.
- Most corporate software projects have more than one developer, constantly developing on the same code.
Computer programs are densely coupled
- Each occurrence of a variable or function must be correctly spelled. A single change in one place easily breaks the entire program.
- Human language is a lot more robust: it is ok for several people to edit a Google Doc in multiple places at the same time.
So it is important that simultaneous edits by multiple developers do not interfere.

Can you think of a simple solution to prevent inconsistent code ?

Locking: You can set up a mechanism that allows only one developer at a time to edit the codebase. If there is no concurrency and only one code instance, there can be no consistency conflicts. However, this approach is no longer acceptable in practice, because all other developers are blocked while one of them holds the lock.

VCS Repositories

All version tracking systems need a repository, that is: a database to track the historic evolution of code. Every tracked state is also called a "commit".

Progress is tracked as a series of states
Work can be easily reverted to earlier states
- This is especially useful to revert back to a stable state for software demos.

However, there are two repository setups: Central and decentralized repositories

Central repository

Earlier VCSs like CVS (Concurrent Versions System) and Subversion rely on a central repository.

There is exactly one server with all historic code versions (repository)
Developers have a local workspace with project files and exchange their modifications over the server
- Developers implicitly send their work to the server, when they create a new commit
- Developers do not have a local copy of all historic code versions

Decentralized repositories

Git, today's de-facto standard VCS, relies on decentralized repositories.

There may, or may not be a server. Even a USB drive can be used to exchange commits
All developers hold a full copy of all historic code versions (repository)
- Developers do not implicitly send their work anywhere, when they create a new commit

Commit vs Push

Note how the exchange between repositories only shows push/pull instead of commit. In git, commits refer to snapshots (like a photo of the code at a given moment in time), but they are not exchanged until a developer decides to push them to a server or pull them from another repository.

Git's graph model

Git is internally built on a graph model, where the commits are the nodes.

Each commit, or node represents one version of your code
The latest commit, also referred to as HEAD, is the most recent version tracked by git
Think of HEAD like a pointer to the most recent commit, or graph node

Navigating the graph

You can easily navigate the graph, that is, switch the visible content on disk to that of another commit.

When you switch to another commit, it is as if you set a pointer in the graph to an earlier commit.
This does not change the graph structure.
- Nothing is lost when you switch between commits.
- You can go backward, or forward. The code you then see in your file-system matches the state of the corresponding commit. But you can at any time go back to where you were before in the graph.
Switching the pointer from one commit to another is called checking out a commit.

Commits are backups

You can perfectly well use git just for your personal backups. Your local git repository does not need a server to function. While working on your project, you can regularly commit to build a linear trace of your project history. If you later need to revert, you can simply check out an earlier commit to load a backup.

How does it work ?

Every git repository has a hidden folder .git
- Git uses it to store the graph structure and information about all commits
It's best not to ever manually interfere with the contents of the .git folder
Especially do not remove or rename the .git folder

Local graph extension

When a developer has coded something new, e.g. a function, they can create a new commit and add it to the graph.

Over time, as the project advances, git's graph continuously grows
In the simplest case the graph is entirely linear, i.e. with every commit, the line gets a bit longer:

Pushing

Commits are always created within the scope of one repo, and are not automatically sent to other repos, e.g. the server.

When a developer has extended their local repository graph by a new commit, they still need to push the new node to the server's graph
In detail, the process is:
1. A developer is synced up with the server
2. They then add some new functionality, i.e. create a new local commit
3. Finally, they share their commit, by pushing it back to the server
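
The three steps can be tried out without a real server. The following is a minimal sketch under assumptions, not part of the lecture setup: a bare repository in a temporary folder stands in for the server, and the folder names (server.git, dev), the identity and the poem lines are made up for illustration. It also assumes Git 2.28 or newer for the -b option of git init.

```shell
# A bare repository in a temporary folder plays the role of the server
# (hypothetical stand-in; a GitLab repository would be used the same way).
work=$(mktemp -d)
git init -q --bare -b main "$work/server.git"

# Local repository with a first commit, connected to the "server".
git init -q -b main "$work/dev"
cd "$work/dev"
git config user.name "Demo Developer"       # illustrative identity, local to this repo
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
git remote add origin "$work/server.git"

# 1. Get in sync with the server (here: the very first push).
git push -q --set-upstream origin main

# 2. Add something new, i.e. create a new local commit.
echo "Violets are blue" >> poem.txt
git add poem.txt
git commit -q -m "Add second line"
ahead_before=$(git rev-list --count origin/main..main)
echo "Local commits the server does not have yet: $ahead_before"

# 3. Share the commit by pushing it to the server.
git push -q
ahead_after=$(git rev-list --count origin/main..main)
echo "Local commits the server does not have yet: $ahead_after"
```

Between step 2 and step 3 the new commit exists only in the local graph, which is why the counter drops only after the push.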

Branches

It is also possible to deliberately create multiple versions of your software, e.g. to work on independent functionality, and afterwards reunite the work. This can happen entirely locally, and is called branching and merging.

In this case the graph is no longer a linear alignment of commits, but becomes an actual graph:

Branches are an advanced git concept, which we will delve into in a future lecture. For now it is sufficient to retain that git's graph is built out of commits as nodes, and can be more complex than a linear sequence.

Git commands

Whether you want to initialize a project, create a commit, push work to a server or check out an earlier commit, the default way of interacting with git is via commands.

It is important to understand the underlying graph model, and think what a git command does before you type it.
Inexperienced users often just try out memorized commands, without understanding what they do.
- Often this causes a corrupted repository state and complicated error messages
- The naive fix is then often to copy paste the code somewhere else, delete the project, and download a new copy

Image credits: XKCD

Warning

There are many graphical tools to interact with git. It is ok to use these tools, after you have become proficient with the command line. Git is complex and you need a sound understanding of how it works.

Initializing a project

Any folder can be turned into a git repository.

While git is usually used for software projects, we can use git for any project

Let's imagine I want to use git as backup system for my personal love poems

So I have a folder MyLovePoems, and a single file poem.txt inside:
```
MyLovePoems
└── poem.txt
```

The poem is a short and sentimental piece of art:

Roses are red
Violets are blue
My dearest Keksli
I love you          ,_     _
                    |\\_,-~/
                    / _  _ |    ,--.
                   (  @  @ )   / ,-'
                    \  _T_/-._( (
                    /         `. \
                   |         _  \ |
                    \ \ ,  /      |
                     || |-_\__   /
                    ((_/`(____,-'

Credits: asciiart.eu

Obviously I am regularly moved to tears by the emotional intimacy of my poem. So I want to make sure I never lose it, when I experiment with new endings and sequels.

Therefore, I decided to use git to keep track of my poem's evolution !
This is as simple as going inside my project folder and typing: git init

The command creates a new hidden folder .git:

MyLovePoems
├── .git
│   ├── HEAD
│   ├── config
│   ├── description
│   ├── hooks
│   │   ├── applypatch-msg.sample
│   │   ├── commit-msg.sample
│   │   ├── fsmonitor-watchman.sample
│   │   ├── post-update.sample
│   │   ├── pre-applypatch.sample
│   │   ├── pre-commit.sample
│   │   ├── pre-merge-commit.sample
│   │   ├── pre-push.sample
│   │   ├── pre-rebase.sample
│   │   ├── pre-receive.sample
│   │   ├── prepare-commit-msg.sample
│   │   ├── push-to-checkout.sample
│   │   └── update.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
└── poem.txt

Never touch the .git folder

Git stores all its data in this directory, notably the complete commit graph, and the "pointer" to the node we're currently working with. You can delete the entire .git folder, if you are sure you no longer need version tracking. Otherwise, hands off the .git folder!

Note that git init only puts me in the position to begin tracking versions. That means:

The git graph is still empty
My poem is not yet protected
We can verify this by asking git to list the latest nodes in the commit graph:
```
$ git log
fatal: your current branch 'main' does not have any commits yet
```

First time setup

Before we continue using git, we should add a username and email.
Every contribution to a project is done by a person, so let's tell git who we are and how to reach us:
- git config --global user.name "Maximilian Schiedermeier"
- git config --global user.email "schiedermeier.maximilian@uqam.ca"

The --global flag tells git to store the settings for your user account (in a configuration file in your home directory), so we don't need to repeat these steps for every new repository.
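
To see where these settings end up, you can read them back. The sketch below is only an illustration: it redirects HOME to a throw-away folder for each command so that it does not touch your real settings, and the name and email are made up. In everyday use you would run the same git config commands without the HOME= prefix.

```shell
# Throw-away home directory, used only for this experiment.
demo_home=$(mktemp -d)

# Store an (illustrative) identity in the per-user configuration.
HOME="$demo_home" git config --global user.name "Ada Example"
HOME="$demo_home" git config --global user.email "ada@example.com"

# List all per-user settings together with the file they are stored in.
HOME="$demo_home" git config --global --list --show-origin

# Read a single value back by its key.
configured_name=$(HOME="$demo_home" git config --global user.name)
echo "Commits will be authored by: $configured_name"
```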

Commit

Commits are the nodes in our git graph. Creating a new commit takes three steps:

Change something on the file system. This can be...
- Changing the contents of a file
- Adding a new file
- Deleting a file
Telling git which files to consider for the next commit:
- The graph is your ticket to navigate the code history
- The more fine-grained your commits, the more precisely you can navigate
Creating the commit
- Actually create a node in the graph
- Contain all considered files (and ignore all the others)
- Add a short descriptive comment

Tip

The git status command is a useful helper throughout the process. The command provides you helpful information for step 1 and 2: Which files have changes, and which files are tracked by git.

We'll now illustrate the process on two scenarios.

Adding a new file

To create our first commit, we want to add our existing poem to a new node in the graph.

Let's start with a look around, to see which files exist, and which files are tracked:
- git status
```
On branch main

No commits yet

Untracked files:
  (use "git add <file>..." to include in what will be committed)
    poem.txt
```
- Interpretation: git tells us there is a file poem.txt that is not tracked, i.e. is not yet contained anywhere in the graph.
Next we want to tell git, that we wish to consider this file for the next commit.
- git add poem.txt (git actually already helped us with that command)
- Let's check again what has changed: git status
```
On branch main

No commits yet

Changes to be committed:
 (use "git rm --cached <file>..." to unstage)
   new file:   poem.txt
```
- Interpretation: All good, git tells us the next commit will contain our poem.txt
Finally, we want to actually create our commit:
- git commit -m "First poem version"
```
[main (root-commit) 3fc2807] First poem version
 1 file changed, 14 insertions(+)
 create mode 100644 poem.txt
```
- Interpretation: A first commit has been added to the graph
- (The -m argument allows us to add a comment for the commit, so later it is easier to remember what we contributed.)

Always a good idea to check with git log if our first commit appears in the graph history:

commit 3fc28074200a8503409234f60c1b0ad30fce4d4f (HEAD -> main)
Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
Date:   Sat Aug 24 07:56:19 2024 -0400

    First poem version

Looks good, we've created a first commit, with id 3fc2807

Checksums

Commit IDs are actually not random strings, but SHA-1 checksums. That is, they provide a direct integrity check for the contents of a commit.
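
A small sketch to look at these IDs yourself. It is an illustration under assumptions: it builds a throw-away repository in a temporary folder with a made-up identity, and it assumes Git 2.28 or newer (for git init -b) and the default SHA-1 object format.

```shell
# Throw-away repository with a single commit.
work=$(mktemp -d)
git init -q -b main "$work/MyLovePoems"
cd "$work/MyLovePoems"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# The full commit ID is a 40-character hexadecimal checksum.
full_id=$(git rev-parse HEAD)
echo "full:  $full_id"

# The short form printed by git commit is just a unique prefix of it.
short_id=$(git rev-parse --short HEAD)
echo "short: $short_id"

# Git resolves the prefix back to the full ID.
resolved=$(git rev-parse "$short_id")

# git fsck verifies the stored objects and reports any that are corrupt.
git fsck
```

This is why a short ID like 3fc2807 is enough for commands such as git checkout, as long as the prefix is unique within the repository.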

Changing a file

Most of the time we're working on existing files, e.g. to change a line of code.

Let's try to change something in our poem and create a new commit with the changes.

First, I'll make the poem a bit more cat appropriate:

Salmons are red
Tuna is blue
My dearest Keksli
I love you          ,_     _
                    |\\_,-~/
                    / _  _ |    ,--.
                   (  @  @ )   / ,-'
                    \  _T_/-._( (
                    /         `. \
                   |         _  \ |
                    \ \ ,  /      |
                     || |-_\__   /
                    ((_/`(____,-'

Once more we can ask git what it makes of these changes with git status

On branch main
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
    modified:   poem.txt

no changes added to commit (use "git add" and/or "git commit -a")

Interpretation: git detected that we modified a file that is already in the graph

Unless we tell git so, the changes to poem.txt won't be included in the next commit.
- Just like before, we tell git to consider the poem file:
- git add poem.txt
- git status
```
On branch main
Changes to be committed:
  (use "git restore --staged <file>..." to unstage)
    modified:   poem.txt
```
- Just like before, we still need to actually create the new commit:
- git commit -m "Improved poem for cat context"
```
[main 2cdfc6f] Improved poem for cat context
 1 file changed, 2 insertions(+), 2 deletions(-)
```

Finally, we can check if our new commit appears in the log:

git log --graph

* commit 2cdfc6ff084400b42ebe2b93f0fe282b313b73ad (HEAD -> main)
| Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
| Date:   Sat Aug 24 10:22:41 2024 -0400
|         
|     Improved poem for cat context
|         
* commit 3fc28074200a8503409234f60c1b0ad30fce4d4f
  Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
  Date:   Sat Aug 24 07:56:19 2024 -0400

     First poem version

Tip

Use git log with the --graph argument to get a textual visualization of the git graph.

Amazing, we've successfully extended our commit graph! It now looks like this:

Message typos

Sometimes we make typos in our commit messages.

That's pretty easy to fix, with:

git commit --amend -m "New commit message"
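
One detail worth knowing: amending does not edit the commit in place. Git replaces the last commit with a new one, so the commit ID changes. For that reason it is safest to amend only commits that have not been pushed yet. The sketch below illustrates this in a throw-away repository; the folder, identity and messages are made up, and Git 2.28 or newer is assumed for git init -b.

```shell
# Throw-away repository with one commit whose message contains a typo.
work=$(mktemp -d)
git init -q -b main "$work/MyLovePoems"
cd "$work/MyLovePoems"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "Frist poem version"
id_before=$(git rev-parse HEAD)

# Fix the message of the last commit.
git commit -q --amend -m "First poem version"
id_after=$(git rev-parse HEAD)

# Still a single commit, but with a new ID and the corrected message.
git log --oneline
echo "before: $id_before"
echo "after:  $id_after"
```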

Checkout

When we extended git's commit graph with commit, we implicitly advanced the pointer, i.e. we already navigated the graph.

The checkout command allows you to navigate the graph without modifying it.

git checkout 3fc2807

Note: switching to '3fc2807'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

git switch -c <new-branch-name>

Or undo this operation with:

git switch -

Turn off this advice by setting config variable advice.detachedHead to false

Interpretation: It worked, we are now at the previous commit in the graph. The warning means that we are not at the end of a "line", or HEAD.

Nonetheless, the content of poem.txt is back to our first "backup"

Roses are red
Violets are blue
My dearest Keksli
I love you          ,_     _
                    |\\_,-~/
                    / _  _ |    ,--.
                   (  @  @ )   / ,-'
                    \  _T_/-._( (
                    /         `. \
                   |         _  \ |
                    \ \ ,  /      |
                     || |-_\__   /
                    ((_/`(____,-'

We also use checkout to get back to the end of the main line: git checkout main

Moving to the end of a line

Always use the name of the line to move to the end. Intuitively we could think that using the commit ID of the last commit in line does the same, but git will only re-attach HEAD when we specify the branch name, e.g. main. So in our case git checkout 2cdfc6f is not the same as git checkout main.
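
The difference can be made visible with git branch --show-current, which prints the name of the line we are on, or nothing when we are detached. This is a sketch in a throw-away repository with made-up content and identity; it assumes Git 2.28 or newer (git init -b, and --show-current needs 2.22 or newer).

```shell
# Throw-away repository with two commits on main.
work=$(mktemp -d)
git init -q -b main "$work/MyLovePoems"
cd "$work/MyLovePoems"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
echo "Salmons are red" > poem.txt
git add poem.txt
git commit -q -m "Improved poem for cat context"
last=$(git rev-parse HEAD)

# Checking out the last commit by its ID: same content, but detached.
git checkout -q "$last"
by_id=$(git branch --show-current)
echo "after checkout by ID:   '$by_id'"

# Checking out by name: same commit, and we are back on the line.
git checkout -q main
by_name=$(git branch --show-current)
echo "after checkout by name: '$by_name'"
```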

Detached commits

Detached commits are nodes that do not belong to any line (branch).

HEAD can be read as "the latest commit on a line".
Detached head means, the code we are viewing (checked out), is not at the end of a line.
- Illustration: the square is the commit we've checked out. HEAD is the end of the line
Previously we've seen that the git commit command adds a new commit to the end of a line.
This is why git keeps telling us we're off track, when we create commits while in detached HEAD:

$ echo "We love Keksli. <3 <3 <3" >> poem.txt
$ git add poem.txt
$ git commit -m "Commit in detached HEAD state"
[detached HEAD 3fc2807] Commit in detached HEAD state
 1 file changed, 1 insertion(+)

Commits made in detached HEAD are not connected to any "line" (branch), so nothing in the graph keeps track of them.
- When we switch away, git prints a warning that we are leaving commits behind (which is a good thing, we might lose important work!)
- We can still get back to the end of the line, e.g. with git checkout main, but the detached commits are then hard to find again.

For now: Do not commit in detached HEAD

We'll soon see how to properly create "branches" to create commits on "new lines".

Corrections

Sometimes it happens that we're a bit too hasty and make a little mistake.

Undo git add

Let's assume you just used git add secret-poem.txt and told git to include a file for the next commit.
But you made a mistake and actually did not want to include the file.
Then you can just remove the file from the next commit again with: git reset secret-poem.txt

Heads up

This only works if you have not yet finalized the next commit.
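
A short sketch of the undo, using git status --short to watch the state of the file. The repository, identity and file contents are made up for illustration, and Git 2.28 or newer is assumed for git init -b.

```shell
# Throw-away repository with one existing commit.
work=$(mktemp -d)
git init -q -b main "$work/MyLovePoems"
cd "$work/MyLovePoems"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# Accidentally stage a file that should stay private.
echo "Top secret" > secret-poem.txt
git add secret-poem.txt
git status --short        # "A  secret-poem.txt": part of the next commit

# Undo the git add. The file itself stays on disk.
git reset -q secret-poem.txt
git status --short        # "?? secret-poem.txt": untracked again
```

Recent git versions also suggest git restore --staged <file> in the output of git status; for this situation it has the same effect.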

Undo git commit

Let's assume you just used git to add and commit the wrong files
Then you can use: git reset HEAD~
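HEAD~ refers to the commit just before the latest one. The reset moves the end of the line back to that commit, which removes the last commit from the graph but keeps the files of the commit untouched on disk.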

What's the difference to a checkout of the previous commit?

Checkout sets the graph pointer to the previous commit, but does not modify the graph. Reset removes the last commit from the graph.
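
The effect of the reset can be observed in a throw-away repository. This is a sketch with made-up content and identity, assuming Git 2.28 or newer for git init -b.

```shell
# Throw-away repository with a good commit and a mistaken one.
work=$(mktemp -d)
git init -q -b main "$work/MyLovePoems"
cd "$work/MyLovePoems"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
echo "Oops" >> poem.txt
git add poem.txt
git commit -q -m "Committed by mistake"
git log --oneline         # two commits

# Remove the last commit from the graph.
git reset -q HEAD~
git log --oneline         # only the first commit is left

# The change itself is still on disk, it is just no longer committed or staged.
git status --short        # " M poem.txt"
commits_left=$(git rev-list --count HEAD)
```

After a checkout of the previous commit, git log --oneline main would still list both commits.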

Gitlab

Previously we've learned that commits can be sent to and retrieved from a repository on a git server.
Your university has a git server running for you: Gitlab
A bit pedantic, but be precise with the terminology:
- GitLab is not git. GitLab is a commercial server software, git is a command-line tool to manage the commit graph !
- GitLab is not the same as a single online git repository. Many different repositories are maintained on the same GitLab server instance.

Online repos

Git servers like GitLab have two main purposes:

Provide a backup of your work
Serve as synchronization hubs for work in teams

We'll cover the work in teams in an extra lecture, for now let's look at how to use a git server as backup system.

For setup, there are two main scenarios:

You already have a local git repository, that is, you already ran git init on your computer, and have some code.
- The goal is to create an online repository that mirrors the content and graph of your local repository.
You do not yet have a local git repository, that is you have not yet run git init on your computer, and you have no code.
- The goal is to create a local copy (or clone) of a git repository, made available on the server.

Setting up a new online repo

Let's try to bring a copy of my poems, along with the entire git commit graph to the server.

Use your UQAM credentials to log into your personal GitLab account
Afterwards, you can create your own online repository:
- Click the + sign in the top left
- Select New Project / Repository
- Select Create blank project
- Give it a name, e.g. MyPoems
- Give it a namespace, usually just your username
- Select a visibility, e.g. Private, so only admitted users will have access to your code
- Uncheck the create README box (we'll handle that later)

At this point the repository is created, but empty. We'll now want to locally tell git about the empty online repository and send code and graph to the server:

Tell git about the online repository:

After creation, the GitLab page shows a blue Code button.
Click it and copy the first line to your clipboard. It will be something like: git@gitlab.info.uqam.ca:max/MyPoems.git
Afterwards, use the command line to tell git about this repository:
```
git remote add origin git@gitlab.info.uqam.ca:max/MyPoems.git
```
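
To double check what git has stored, you can list the configured remotes. The sketch below runs in a fresh throw-away repository so that it is self-contained; in your own project you would only need the last commands. Adding a remote does not contact the server, so this works even without network access.

```shell
# Throw-away repository, only used to demonstrate the remote settings.
work=$(mktemp -d)
git init -q "$work/MyLovePoems"
cd "$work/MyLovePoems"
git remote add origin git@gitlab.info.uqam.ca:max/MyPoems.git

# List all known remotes with their URLs (one line for fetch, one for push).
git remote -v

# Print the URL of a single remote.
remote_url=$(git remote get-url origin)
echo "origin points to: $remote_url"
```

If the URL contains a typo, it can be corrected with git remote set-url origin followed by the right URL.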

So far we've only told git that the repository exists. Now we have to actually bring our code and the git commit graph to the server:

$ git push --set-upstream origin main
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 624 bytes | 624.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0), pack-reused 0
To gitlab.info.uqam.ca:max/MyPoems.git
* [new branch]      main -> main
  branch 'main' set up to track 'origin/main'.

Tip

The --set-upstream origin main addendum is only needed the first time. From here on our local git repository knows about the associated server. New local commits can be sent to the server with a simple git push.

Sending a new commit to the server

You only need to set up the server once.
If I make further changes to my poem, e.g.: echo "<3 <3 <3 <3 <3 <3" >> poem.txt

I can send follow-up commits to the server with a simple push:

$ cat poem.txt
Salmons are red
Tuna is blue
My dearest Keksli
I love you          ,_     _
                    |\\_,-~/
                    / _  _ |    ,--.
                   (  @  @ )   / ,-'
                    \  _T_/-._( (
                    /         `. \
                   |         _  \ |
                    \ \ ,  /      |
                     || |-_\__   /
                    ((_/`(____,-'
<3 <3 <3 <3 <3 <3
$ git add poem.txt
$ git commit -m "Added more hearts"
[main f411bbd] Added more hearts
 1 file changed, 1 insertion(+)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 292 bytes | 292.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
To gitlab.info.uqam.ca:max/MyPoems.git
   5e2bd29..f411bbd  main -> main

Cloning an existing repo

Cloning a repo is the inverse of setting up an online repo. You clone, when:

The online repository exists, and...
You want a local repository copy of the code and git commit graph

It is either init or clone, but never both combined. Why?

Git init creates an entirely new graph, while clone tells git to make a linked copy. It is not possible to create a graph that is at once entirely new and a copy of an existing graph.

To make a local repository clone of an existing online repository:

Visit the project page, e.g. the MyPoems project on GitLab
Click the blue Code button and copy one of the URLs

Open a terminal and use git clone with the URL:

$ git clone git@gitlab.info.uqam.ca:max/MyPoems.git
Cloning into 'MyPoems'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 9 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (9/9), done.
Resolving deltas: 100% (3/3), done.

No need to set server

When you have cloned a repo, git already knows about the server (makes sense, you cloned it from the server). Therefore, once you have made new commits, you can directly use git push, without any further configuration.
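
This can be verified right after cloning. The sketch is an illustration under assumptions: a bare repository in a temporary folder stands in for the GitLab server, the folder names seed and server.git and the identity are made up, and Git 2.28 or newer is assumed for git init -b.

```shell
# Prepare a stand-in "server" repository that already contains one commit.
work=$(mktemp -d)
git init -q -b main "$work/seed"
cd "$work/seed"
git config user.name "Demo Author"          # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
git clone -q --bare "$work/seed" "$work/server.git"

# Clone it, as you would with the URL copied from GitLab.
git clone -q "$work/server.git" "$work/MyPoems"
cd "$work/MyPoems"
git config user.name "Demo Author"
git config user.email "demo@example.com"

# The clone already knows the server as "origin" and main tracks it.
git remote -v
upstream=$(git rev-parse --abbrev-ref 'main@{upstream}')
echo "main tracks: $upstream"

# A new commit can therefore be pushed without --set-upstream.
echo "Violets are blue" >> poem.txt
git add poem.txt
git commit -q -m "Add second line"
git push -q
```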

Getting more commits from the server

The code on the server will likely evolve, as new commits are added there.

If your own local graph has not evolved since you cloned, you can sync up with git pull.
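
A sketch of this situation with two local copies of the same repository, for example on a laptop and a desktop computer. Everything here is an assumption made for illustration: the bare repository in a temporary folder stands in for the server, and the folder names, identity and poem lines are made up. Git 2.28 or newer is assumed for git init -b.

```shell
work=$(mktemp -d)
git init -q --bare -b main "$work/server.git"      # stand-in for the server

# First machine: create the repository and push a first commit.
git init -q -b main "$work/laptop"
cd "$work/laptop"
git config user.name "Demo Developer"       # illustrative identity
git config user.email "demo@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
git remote add origin "$work/server.git"
git push -q --set-upstream origin main

# Second machine: clone the current state from the server.
git clone -q "$work/server.git" "$work/desktop"

# The server evolves: the first machine pushes another commit.
echo "Violets are blue" >> poem.txt
git add poem.txt
git commit -q -m "Add second line"
git push -q

# The second machine has not changed since the clone, so it can simply sync up.
cd "$work/desktop"
git pull -q
git log --oneline          # now lists both commits
commit_count=$(git rev-list --count HEAD)
```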

Good practices

Git is a complex tool, so it is important to respect some good practices:

THINK about what you want to do in the graph, before you type
Commit often (commits are checkpoints)
- Commit fine-grained: avoid git add * or git add . Only add the specific files you are interested in.
Write meaningful comments
- Do not skip the commit message dialogue, and do not name your commits toto, Another commit, ...
Don't clutter the repo
- Git can only make sense of textual files
- Do not put non-textual files (aka binaries) into the repo, notably:
  - PDFs
  - PNGs / JPGs
- Do not put generated files into the repo, notably:
  - Compiled code / class files
  - JAR files
  - Maven's target folder
- Do not put irrelevant files into your project:
  - MacOS tends to create .DS_Store files everywhere
  - VIM creates cache files
  - ...
- The less trash you have in your repo, the better

Gitignore

Adding files by accident is too easy, luckily there is a protective mechanism.
You can tell git about file types, i.e. file endings that should never be part of the repo.
It is as simple as adding a file: .gitignore
Example:
- If the .gitignore file contains a line *.class, git will not let you add compiled java classes to a commit.
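
A minimal sketch of a .gitignore in action. The file names Poem.java and Poem.class and the three patterns are only examples; the repository is a throw-away folder.

```shell
# Throw-away repository with a small .gitignore.
work=$(mktemp -d)
git init -q "$work/demo"
cd "$work/demo"
printf '%s\n' '*.class' '.DS_Store' 'target/' > .gitignore

# One source file and one compiled file (empty placeholders).
touch Poem.java Poem.class

# The compiled file is not even listed as untracked.
git status --short

# Ask git which rule is responsible for ignoring a file.
git check-ignore -v Poem.class

# Explicitly adding the ignored file is refused.
if git add Poem.class 2>/dev/null; then
    add_result="added"
else
    add_result="refused"
fi
echo "git add Poem.class was $add_result"
```

The .gitignore file itself is a normal text file and is usually committed, so that everyone working with the repository shares the same rules.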

Use a .gitignore generator

Most of the time it's not obvious which binary or generated files appear in a project, but you know which technologies you will be working with, e.g. "MacOS, VIM, Java". You can use generators, like Toptal, to create a holistic .gitignore file, tailored for the technologies you are working with.

Literature

Inspiration and further reads for the curious minds: