Version Control (Advanced)

In this unit we'll cover advanced functionality of version control systems (VCS), using git as the example. We'll begin with a short recap of the basics you've already learned, and afterwards look into branches and how best to use them for efficient collaboration.

Lecture upshot

Branches allow working on multiple software features in parallel. If used effectively, your software only gains in functionality, and never regresses. Notably, you'll always have a stable and presentable version of your product.

Git basics recap

The model

Git manages software versions using a graph model.

Every node in the graph is a commit, i.e. one snapshot version of the codebase.
Adding functionality corresponds to adding new commits at a leaf node, i.e. extending the graph.
The file system always shows contents based on a given commit. The commit you're currently working with is also called HEAD, i.e. where you are in the commit graph.
To switch the file system representation to another graph node, use the git checkout command.
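
The points above can be tried out in a throwaway repository. The following is only a minimal sketch: it assumes git 2.28 or newer (for `git init -b`), and the file name `notes.txt`, the commit messages and the demo identity are made up for illustration.

```shell
# Work in a temporary directory, so no real project is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption), stored in this repo only
git config user.email "demo@example.com"

# Two commits = two snapshots, i.e. two nodes in the graph.
echo "version 1" > notes.txt
git add notes.txt
git commit -q -m "first snapshot"
first=$(git rev-parse HEAD)                # remember the hash of the first commit

echo "version 2" > notes.txt
git add notes.txt
git commit -q -m "second snapshot"

# Move HEAD to the first commit: the file system shows the old snapshot.
git checkout -q "$first"
old_content=$(cat notes.txt)

# Move HEAD back to the end of main: the newest snapshot is shown again.
git checkout -q main
new_content=$(cat notes.txt)
echo "first commit: $old_content / end of main: $new_content"
```

Nothing is lost when moving around: the files on disk are simply rewritten to match the commit HEAD points to.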

Repositories

Every git repository contains a git graph.

A repository can be local, or on a remote server.
Git servers are useful for sharing work, but git can be used without a server, e.g. a communal USB-drive can be used for sharing work.

Syncing commits

When synchronizing work between repositories, e.g. local and remote (server), you're effectively trying to combine graph models.

If the differences between the graph models are purely additive, the graphs are combined automatically.
If diverging changes were made to the graph models, and the changes affect the same lines of the same files, the conflict must be resolved manually.

Creating branches

So far you've been working with a single branch, i.e. the graph was actually just a long line of commits:

---
title: "Single branch graph model"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "HEAD" type: HIGHLIGHT

That's actually a dangerous practice:

While you're working on a feature, your code might not be in a working state at all.
Tests could fail, or the program might not even compile.
On the one hand, it would be good practice to commit regularly...
... on the other hand you do not want regressing commits (with a broken software state) on your production line.

This is where branches come into play:

Branches allow you to fork from any existing commit, and then develop a specific feature in all tranquility.
If you commit intermediate "broken" software states, that's ok: the new commits are on a separate branch and do not interfere with anyone else who needs a proper main branch.
Example:
- While you're working on your halma controller, you might also work on something else, e.g. an alternative UI, or an AI robot player.
- An additional branch allows you to develop on one feature, while leaving the main "line" untouched.

Like everything else around git there's a command to create new branches:

To create and also place the HEAD on a new branch: git checkout -b user-interface
Let's take the command apart:
- checkout: go to a different commit
- -b (for branch): the commit is actually on a new branch
- user-interface: the name of your new branch
Note that new branches always spawn from your HEAD, i.e. the commit in the graph which you most recently selected.

Use the kebab convention for branch names

Branch names should be readable, but must not contain spaces. Somehow everyone agreed on all-lowercase words, separated by hyphens. This notation is also known as kebab notation (or kebab-case), alluding to the meat (or tofu) on a stick.

Note: There's also the option to create a branch without directly placing the HEAD on it. In my experience this is usually not what you want, but for completeness: git branch user-interface only creates the branch.
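
The difference between the two variants can be observed by asking git which branch HEAD is on after each command. This is a small sketch in a temporary repository, assuming git 2.28 or newer; the second branch name `ai-player` is made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"

# Variant 1: create the branch only. HEAD stays where it was.
git branch user-interface
after_branch=$(git rev-parse --abbrev-ref HEAD)

# Variant 2: create the branch and place HEAD on it in one step.
git checkout -q -b ai-player               # hypothetical branch name
after_checkout=$(git rev-parse --abbrev-ref HEAD)

echo "after 'git branch':      HEAD is on $after_branch"
echo "after 'git checkout -b': HEAD is on $after_checkout"
```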

Branching from main

In most cases you'll want to develop new features, based on the most recent code version.
- In that case, the first step is always to verify that you're on main's most recent commit. You want your HEAD to be at the end of the main line.
- To be sure, you can start with git checkout main.
- In any case check your git status. You should see this:
```
  $ git status
  On branch main <--- This is good. :)
```

Next, you can safely create a new branch with git checkout -b user-interface

---
title: "Creating a branch from main"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    commit id: "HEAD" type: HIGHLIGHT

What's the effect?
- You can develop new functionality, based on the most recent commit.
- Whatever experimental progress you make on the user-interface branch, main stays stable.
- If you're approaching a deadline, at least with main you have something to submit that will somewhat work for a demo / submission, even if things go horribly wrong on the user-interface branch.

Branching from detached head

Sometimes you want to add new functionality based on an earlier code version.
This translates to: you want to create a new branch, but you want it to be based on an earlier commit in the graph.
But careful, having a detached HEAD is usually not what you want. You'll effectively be working with outdated code. Yet it can make sense, if...
- You suspect a severe bug in the most recent commit on main, and are pretty sure it will be reverted.
- The last commit introduced severe refactoring, and you cannot mentally deal with both refactoring and adding new functionality. But for some reason the functionality work must not be delayed.

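What's a detached HEAD, again?

"HEAD" is simply a metaphor for where you are in the graph (which commit you're pointing at). "Detached" means that HEAD points directly at a commit instead of at a branch name. This typically happens when you check out a commit that is not the most recent commit of a branch.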

Creating a branch from an earlier commit would take place like this:

  $ git checkout cf48b16    <--- Here you intentionally detach the HEAD.
  $ git status
  HEAD detached at cf48b16  <--- This is usually not what you want. THINK TWICE :/
  $ git checkout -b user-interface

Conceptually, this results in the following commit graph:

---
title: "Creating a branch from detached HEAD"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    branch "user-interface"
    checkout "main"
    commit id: "3"
    checkout "user-interface"
    commit id: "4"
    commit id: "HEAD" type: HIGHLIGHT
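
As a side note, checkout -b also accepts a start point as an extra argument, which creates the branch at an earlier commit without passing through a detached HEAD first. The sketch below assumes a temporary repository and git 2.28 or newer; the commits are empty placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
git commit -q --allow-empty -m "commit 1"
git commit -q --allow-empty -m "commit 2"
earlier=$(git rev-parse HEAD)              # the commit we want to branch from
git commit -q --allow-empty -m "commit 3"

# Create the branch at the earlier commit and place HEAD on it, in one step.
git checkout -q -b user-interface "$earlier"

branch_now=$(git rev-parse --abbrev-ref HEAD)
tip_now=$(git rev-parse HEAD)
echo "HEAD is on $branch_now, which starts at commit 2; main is still at commit 3"
```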

Branching from a branch

Sometimes you want to create a new feature that needs another feature currently under development.
In that case you want to create a branch from another branch.
Just like before, the trick is to first place the HEAD on the branch you want to branch off from.
```
git checkout user-interface
git checkout -b some-extra-feature
```

The above code can be represented as:

---
title: "Creating a branch from a branch"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    commit id: "5"
    branch "some-extra-feature"
    commit id: "HEAD" type: HIGHLIGHT

Logging branches

As you've already learned in the git basics, it's always good to inspect the graph regularly, before typing more git commands.

However, the standard git log only shows commits on the branch you're currently on.
- When working with branches, this gives you an incomplete, and sometimes misleading representation.
If you truly want to know what's going on, add some parameters:
- --all: to see all commits on all branches
- --decorate: print branch information
- --oneline: to represent each commit on a single line, and get a more compact visualization
- --graph: get a visualization of the full graph layout
Example:

# Setup repo (-b main names the initial branch; requires git 2.28 or newer)
git init -b main

# Create initial commit on main
echo "Keksli is cute" > keksli.txt
git add keksli.txt
git commit -m "initial commit"

# Add two commits to other branch
git checkout -b funny-branch
echo "We <3 Keksli" >> keksli.txt
git add keksli.txt
git commit -m "branch commit 1"
echo "We love him so so much" >> keksli.txt
git add keksli.txt
git commit -m "branch commit 2"

# Add another commit on main
git checkout main
echo "He is so cute" >> keksli.txt
git add keksli.txt
git commit -m "main commit 2"

git log --all --decorate --oneline --graph

What should a complete git log print?

Answer: It should print a textual representation of the entire commit graph:

* 0f2fa81 (funny-branch) branch commit 2
* 2bd16e1 branch commit 1
| * 498c055 (HEAD -> main) main commit 2
|/
* 36415eb initial commit

A dog

The command is actually easy to remember! Just think "A Dog": --all, --decorate, --oneline, --graph.

Image credits: Stackoverflow

Create an alias

If you find that line too long to remember, create an alias with: git config --global alias.adog "log --all --decorate --oneline --graph". Then you can inspect the graph with git adog.

Switching branches

You've previously already seen the checkout command, and used it to navigate your HEAD to earlier commits on the main branch.

For example, the commands below move the HEAD back from the most recent commit to an earlier one:

git log --oneline
    02dd18a (HEAD -> main, origin/main, origin/HEAD) most recent commit
    c764ed0 some commit before
    cd1062f some commit even earlier before
    c5434da the first commit
git checkout c764ed0

The checkout command actually lets you navigate to any commit in the graph. It does not need to be a commit on the main branch.

So in the following example, you could move your HEAD to any commit of any branch, using the checkout command:

---
title: "Moving to another commit"
---
gitGraph:
    commit id: "1"
    commit id: "HEAD" type: HIGHLIGHT
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    commit id: "5"

git checkout 4

gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "HEAD" type: HIGHLIGHT
    commit id: "5"

Keeping the HEAD on your shoulders

Most of the time you want to move your HEAD to the end of a branch.
- Branch names are easier to remember than commit hashes. Changing to a branch name also guarantees that your HEAD stays attached.
- You can either just provide the branch name:
  git checkout user-interface
- But to avoid confusion with changing to a specific commit, there's also the switch command:
  git switch user-interface
  (Some prefer this. I have no preference. Use whatever you prefer.)
```
---
title: "Moving to the \"end\" of a branch"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    commit id: "HEAD" type: HIGHLIGHT
```
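
Whether HEAD is attached can also be checked on the command line. The sketch below uses git symbolic-ref, which exits with a non-zero status when HEAD is detached. It assumes a temporary repository and git 2.28 or newer (git switch itself needs 2.23 or newer).

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
git commit -q --allow-empty -m "commit 1"
first=$(git rev-parse HEAD)
git checkout -q -b user-interface
git commit -q --allow-empty -m "commit 2"

# Checking out a commit hash detaches HEAD: no branch name is attached.
git checkout -q "$first"
if git symbolic-ref -q HEAD >/dev/null; then state_hash="attached"; else state_hash="detached"; fi

# Switching to a branch name attaches HEAD to the end of that branch.
git switch -q user-interface
if git symbolic-ref -q HEAD >/dev/null; then state_name="attached"; else state_name="detached"; fi

echo "after checkout of a hash: $state_hash"
echo "after switch to a branch: $state_name"
```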

Extending branches

Commits are always added to the commit you're currently on.

Ideally your HEAD is attached to the end of a branch.

In that case you're just extending whatever branch you most recently checked out.

Example:

# Make sure to place HEAD at end of the user-interface branch
git checkout user-interface
echo "Some new file content" > some-new-file
git add some-new-file
git commit -m "Added some new file"
# Here we've added a new commit to the end of the user-interface branch.

If you are not at the end of a branch, e.g. because you placed HEAD on a previous commit, you're implicitly creating a new anonymous branch.

Avoid this! Create a new branch if you need one, but don't create unnamed extensions.
You can still turn them into "actual branches", but the intermediate state is confusing.

Example:

# Initial state: two commits, and HEAD is detached
git adog
  * a99d339 (main) added whiskers
  * 6a66966 (HEAD) initial commit
echo "..." >> keksli
git add keksli
git commit -m "added more ..."
  [detached HEAD 767f84c] added more ...
   1 file changed, 1 insertion(+)
git adog
  * 767f84c (HEAD) added more ...
  | * a99d339 (main) added whiskers
  |/
  * 6a66966 initial commit
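
If you do end up with commits on an unnamed line, giving it a branch name right away brings you back to a sane state. The following sketch reproduces the situation above in a temporary repository (git 2.28 or newer assumed) and then names the line; the branch name `more-dots` is made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "Keksli is cute" > keksli
git add keksli
git commit -q -m "initial commit"
first=$(git rev-parse HEAD)
echo "He has whiskers" >> keksli
git commit -q -am "added whiskers"

# Detach HEAD on the first commit and commit there: an unnamed side line appears.
git checkout -q "$first"
echo "..." >> keksli
git commit -q -am "added more ..."
unnamed_tip=$(git rev-parse HEAD)

# Give the unnamed line a proper branch name. HEAD is attached again afterwards.
git checkout -q -b more-dots               # hypothetical branch name
git log --all --decorate --oneline --graph
```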

Merging branches

So far you've seen how to create new branches, i.e. new lines of commits.
But whatever new functionality you implement, at the end of the day you want it back on the main, not on an isolated feature branch. That is:
- Once you've reached a stable state, you want to fuse your new code (the new commits on your feature branch) back with the main branch.
- In particular, you do not want to fuse code while it is still in the making. The code must be in a stable state: all your new code compiles, and is properly tested and commented.
This process of bringing functionality back from one branch to another is called merging.

More precisely: Merging is the attempt to combine the work of different local branches.

The basic syntax is:

# Go to the branch that wants to receive the commits:
git checkout main
# Merge from the branch that has the commits to receive:
git merge user-interface

Illustration

---
title: "Merging branch into main"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    commit id: "5"
    checkout "main"
    merge "user-interface"
    commit id: "HEAD" type: HIGHLIGHT

Merge scenarios

Depending on the graph structure, merging can be easy and fully automated, or require some manual intervention:

If only the branch you're merging from has new commits: The merge is straightforward. Git simply moves the receiving branch forward to the newest merged commit (a "fast-forward"); by default no extra merge commit is created.
If both branches have new commits...
- if they do not affect the same files: There is no conflict, a new commit is created, combining the outcome of all merged commits.
- if they do affect the same files, but not the same lines: There is no conflict, a new commit is created, combining the outcome of all merged commits.
- if they do affect the same files, and the same lines: There is a merge conflict. You have to decide manually which version of each conflicting line should win. Afterwards, a new commit is created.
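
The last scenario can be provoked on purpose. The sketch below changes the same line on two branches, lets the merge fail, and then resolves the conflict by hand. It assumes a temporary repository and git 2.28 or newer; the file contents are made up for illustration.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "Keksli is cute" > keksli.txt
git add keksli.txt
git commit -q -m "initial commit"

# Change the line on a feature branch ...
git checkout -q -b user-interface
echo "Keksli is very cute" > keksli.txt
git commit -q -am "feature version of the line"

# ... and change the same line differently on main.
git checkout -q main
echo "Keksli is super cute" > keksli.txt
git commit -q -am "main version of the line"

# Same file, same line: git stops and reports a conflict (non-zero exit code).
if git merge user-interface >/dev/null 2>&1; then
    merge_result="clean"
else
    merge_result="conflict"
fi

# Resolve manually: write the final content, stage it, and conclude the merge.
echo "Keksli is very, super cute" > keksli.txt
git add keksli.txt
git commit -q -m "merge user-interface into main"
echo "merge result: $merge_result"
```

Between the failed merge and the final commit, the file contains conflict markers showing both versions. Here it is simply overwritten with the content that should win.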

This is why you want a clutter-free repo

Merge conflicts are manageable for code, but extremely tedious for binary or generated files, because there is no meaningful way to manually resolve them at line level. That's why you do not want clutter in your repo. Keep unrelated files out of your repo, so you do not waste your time with nonsensical merge conflicts!

Frequent merging main into feature

Merging branches tends to get harder the longer you wait.
- The longer you wait, the more commits accumulate on both branches, and the higher the chances of changes in the same files and in the same lines.
If developing your feature takes a while, you can regularly merge main into your branch, before you eventually merge back into main.
Multiple, finer grained merges are easier than massive, rare merges!
Here's an illustration of merging main into a feature branch:

---
title: "Merging main into a feature branch"
---
gitGraph:
    commit id: "1"
    commit id: "2"
    commit id: "3"
    branch "user-interface"
    checkout "user-interface"
    commit id: "4"
    checkout "main"
    commit id: "5"
    checkout "user-interface"
    merge "main"
    checkout "main"
    commit id: "6"
    checkout "user-interface"
    commit id: "7"
    checkout "main"
    merge "user-interface"
    commit id: "HEAD" type: HIGHLIGHT
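
The workflow in the illustration can be sketched as follows. The example assumes a temporary repository and git 2.28 or newer; the file names `core.txt` and `ui.txt` are made up, and the changes deliberately touch different files so that no conflict occurs.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "core" > core.txt
git add core.txt
git commit -q -m "commit 1"

# Feature work starts on its own branch.
git checkout -q -b user-interface
echo "ui" > ui.txt
git add ui.txt
git commit -q -m "ui work, part 1"

# Meanwhile, main moves on.
git checkout -q main
echo "fix" >> core.txt
git commit -q -am "fix on main"

# Regularly bring main's progress into the feature branch ...
git checkout -q user-interface
git merge -q --no-edit main
# ... keep working on the feature ...
echo "more ui" >> ui.txt
git commit -q -am "ui work, part 2"

# ... and finally bring the finished feature back to main.
git checkout -q main
git merge -q --no-edit user-interface
git log --all --decorate --oneline --graph
```

In this sketch main received no further commits after the intermediate merge, so the final merge is a plain fast-forward.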

Frequent merging does not contradict frequent commits!

While you do want to wait with merging until your code is in a good state, you should keep committing frequently on the feature branch. Avoid mega-commits, they will be a pain in the neck to merge.

Syncing branches across repositories

Unlike Dropbox, iCloud, or Google Drive, git repositories do not automatically synchronize.
- (Auto-syncing would be horrible, you would constantly fear the broken work-in-progress updates of peers!)
In the VCS Basics lecture you've already learned that commits must be actively pushed or pulled to and from a remote server.
- git commit only creates a local commit.
- The commit is not yet on the server. It remains strictly on your machine until you also push the commit to the server.
Branches work just the same!
- git checkout -b my-new-branch creates a new local branch.
- The branch is not yet on the server. It remains strictly on your machine until you push it to the server.

Sending branches to remote

Pushing a new branch to the server is just a longer push command:
git push --set-upstream origin my-new-branch

You don't need to remember that

It's not necessary to remember that special push command. If you attempt a normal push, git will remind you of the correct syntax.
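
To see that a new branch really stays local until it is pushed, the following sketch uses a bare repository in a local folder as a stand-in for the server. This setup is an assumption for illustration only (real servers are usually reached via a URL), and it requires git 2.28 or newer.

```shell
work=$(mktemp -d)
# A bare repository in a local folder stands in for the server (assumption).
git init -q --bare "$work/server.git"
git init -q -b main "$work/local"
cd "$work/local"
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
git remote add origin "$work/server.git"
git commit -q --allow-empty -m "initial commit"
git push -q --set-upstream origin main

# A new branch exists only locally at first ...
git checkout -q -b my-new-branch
git commit -q --allow-empty -m "work on new branch"
before=$(git ls-remote --heads origin my-new-branch | wc -l | tr -d ' ')

# ... until it is pushed explicitly.
git push -q --set-upstream origin my-new-branch
after=$(git ls-remote --heads origin my-new-branch | wc -l | tr -d ' ')

echo "matching branches on server before push: $before, after push: $after"
```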

Pulling branches to local

Often you want to work on a branch that's on the remote repository, but not yet on your local repo.
- This happens e.g. when a colleague created a new branch on their machine and later pushed the branch to the remote server.
Before you can use their branch, and contribute new commits, you need a copy of that branch in your repo.
Example:
- You start by looking up your local branches
```
# Look up all branches (-a also lists remote-tracking branches):
git branch -a
  * main
    remotes/origin/main
```
- You know your colleague has started working on a branch user-interface, but it is not there.
- You do NOT want to create a new branch yourself, you want to receive the branch from your colleague!
There are two principal ways to retrieve branches from the server. They are closely related, but have a subtle difference:

Fetch

In the easiest case, you just want to retrieve the new branches from the remote.
You can do so with git fetch. Example:

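You start by looking up the branches your repository knows about:

    # Look up all branches (-a also lists remote-tracking branches):
    git branch -a
    * main
      remotes/origin/main

The branch you're looking for is not there, so you use fetch to retrieve new commits from the server:

    # Get new commits from remote and add to local graph:
    git fetch
    # Then you look up the branches again:
    git branch -a
    * main
      remotes/origin/main
      remotes/origin/user-interface
    # Great, the new branch is there (as a remote-tracking branch). Checking it out
    # by name creates a local branch user-interface that tracks it:
    git checkout user-interface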

Fetch does not auto-merge!

An important detail about fetch is that it does not attempt to merge, but only retrieves new commits.

The commits retrieved by fetch are always parked on extra, so-called remote-tracking branches (e.g. origin/main), i.e. the new commits are not combined with whatever branches you might already have.
In the previous example that was not an issue, because you wanted to retrieve an entirely new branch.
But most of the time you do not retrieve new commits or branches for the joy of having them parked on unused branches. You want to integrate them with your own new commits.
Unless you were interested in an entirely new branch, fetch is always followed by a merge, so you can actually integrate the new commits with your own progress.
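
The two steps can be observed separately. In the sketch below a bare repository in a local folder plays the server, and two working copies (`colleague` and `me`) play the two developers. All of these names are assumptions for illustration, and git 2.28 or newer is assumed.

```shell
work=$(mktemp -d)
git init -q --bare -b main "$work/server.git"     # stands in for the server (assumption)

# The colleague publishes a first commit.
git init -q -b main "$work/colleague"
cd "$work/colleague"
git config user.name "Colleague"                  # demo identity (assumption)
git config user.email "colleague@example.com"
git remote add origin "$work/server.git"
echo "line 1" > notes.txt
git add notes.txt
git commit -q -m "commit 1"
git push -q origin main

# You clone the server. Afterwards the colleague pushes one more commit.
git clone -q "$work/server.git" "$work/me"
echo "line 2" >> notes.txt
git commit -q -am "commit 2"
git push -q origin main

# fetch only updates the remote-tracking branch origin/main. Your files stay as they were.
cd "$work/me"
git fetch -q
lines_after_fetch=$(wc -l < notes.txt | tr -d ' ')

# The merge step integrates the fetched commits into your current branch.
git merge -q origin/main
lines_after_merge=$(wc -l < notes.txt | tr -d ' ')

echo "lines in notes.txt after fetch: $lines_after_fetch, after merge: $lines_after_merge"
```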

Pull

In the first VCS lecture and lab you've already briefly seen the pull command.
pull is actually not new, but just the combination of fetch + merge.
I rarely use fetch on its own. New commits are not much good unless they are also merged with your own progress.

Pull vs Fetch

Pull and fetch both retrieve commits from remote, but pull implicitly also attempts to merge the new commits (it tries to combine them with your work and move your HEAD forward in the current branch), while fetch only retrieves the commits without moving your HEAD. In most cases you want to pull.

Removing branches

At some point branches have served their purpose.
E.g. if a feature is finalized and the branch has been merged into main, you do not need the branch any more.
It's good practice to also delete the branch in that case, so it does not distract you when you inspect your repo with git branch -a.

One does not imply the other

Removing a local branch does not imply remote branch deletion.
Neither does removing a remote branch imply local branch deletion.

Local

To remove a local branch, just use either of:
- git branch --delete user-interface
- git branch -d user-interface
To prevent you from accidentally deleting unsaved work, git won't let you delete unmerged branches with these commands. You can force your way with either of:
- git branch --delete --force user-interface
- git branch -D user-interface
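
The safety net can be seen in action with a branch that has not been merged yet. A minimal sketch, assuming a temporary repository and git 2.28 or newer:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"
git checkout -q -b user-interface
git commit -q --allow-empty -m "unmerged work"
git checkout -q main

# -d refuses, because user-interface has a commit that main does not contain.
if git branch -d user-interface >/dev/null 2>&1; then
    safe_delete="deleted"
else
    safe_delete="refused"
fi

# -D (short for --delete --force) removes the branch name regardless.
git branch -q -D user-interface
remaining=$(git branch --list user-interface)

echo "git branch -d: $safe_delete"
```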

Remote

Removing a branch from a remote repo requires pushing your graph changes.
The corresponding command is: git push -d origin user-interface

Rebase

In general, you want developers to commit regularly, and for fine-grained progress.
- This gives you more freedom to backtrack in the event of issues.
However, many people working on the same git graph and pushing to many branches at once leads to "diamonds" in the graph history: every branch that forks off and is later merged back leaves a diamond-shaped loop behind.

Deleting a branch does not help, because that does not actually delete its commits, only the branch name associated with them. Every merged branch therefore leaves another diamond behind, no matter whether the branch was subsequently deleted or not.

It would be nicer to magically have branches that pick up from one specific commit on main and immediately afterwards push their (finalized) progress back to main.
- But that's almost impossible, because developing new features takes time
- The bigger the team the more new content will be constantly added to main

Illustration of main evolving, throughout development of user-interface:

---
title: main evolving in parallel to user-interface
---
gitGraph
    commit id: "1"
    commit id: "2"
    branch user-interface
    checkout user-interface
    commit id: "3"
    commit id: "4"
    commit id: "5" type: HIGHLIGHT
    checkout main
    commit id: "6"

Rebase is a trick to retroactively modify the commit graph, so that the commits on a branch appear to have been made sequentially rather than in parallel.

In the above example, the graph is changed to resemble:

---
title: Commit graph after rebase
---
gitGraph
    commit id: "1"
    commit id: "2"
    commit id: "6"
    branch user-interface
    checkout user-interface
    commit id: "3R"
    commit id: "4R"
    commit id: "5R" type: HIGHLIGHT
    checkout main

Syntax

To make commits of a feature branch rebase from the most recent main commit (see illustration above):

    # place the HEAD on the user-interface branch
    git checkout user-interface
    # make the commits on user-interface branch rebase from the most recent main commit
    git rebase main

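Most likely you then want to bring your changes back to main (without diamonds!).

Because you just rebased, main is a direct ancestor of your branch: the merge is a plain fast-forward and cannot produce merge conflicts (any conflicts would already have surfaced during the rebase).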

    # go back to the main branch
    git checkout main
    # merge the rebased feature branch into main
    git merge user-interface
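
Put together, the rebase workflow can be tried out as below. This is a sketch in a temporary repository (git 2.28 or newer assumed); the file names are made up, and the commit messages follow the ids in the illustration above. At the end it counts the merge commits on main to confirm that the history stayed linear.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "1" > main.txt
git add main.txt
git commit -q -m "commit 1"

# Feature work and main evolve in parallel.
git checkout -q -b user-interface
echo "ui" > ui.txt
git add ui.txt
git commit -q -m "commit 3"
git checkout -q main
echo "6" >> main.txt
git commit -q -am "commit 6"

# Replay the feature commits on top of the most recent main commit.
git checkout -q user-interface
git rebase -q main

# main is now a direct ancestor of user-interface, so this merge is a fast-forward.
git checkout -q main
git merge -q user-interface

merge_commits=$(git rev-list --merges --count main)
total_commits=$(git rev-list --count main)
echo "commits on main: $total_commits, of which merge commits: $merge_commits"
```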

Rebase internals

How does rebase actually work?

The idea of relocating commits is somewhat contrived, as the very idea of a commit is to define a certain position in the commit graph.
Rebasing does not actually move commits, but rather:
- Hides the original commits of a given series, e.g. a branch.
- Replays each of the original commits, i.e.:
  1. Creates a copy somewhere else in the graph
  2. Imitates the exact same file modifications as in the original commit
  3. Stops with a conflict whenever a replayed modification clashes with the new base, so you can resolve it manually
- Optional: Removes the hidden commits
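
That rebase creates copies rather than moving commits shows in the commit hashes: the branch points to a new hash afterwards, while the original commit object is still stored in the repository. A small sketch, assuming a temporary repository and git 2.28 or newer, with made-up file names:

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "1" > main.txt
git add main.txt
git commit -q -m "commit 1"
git checkout -q -b user-interface
echo "ui" > ui.txt
git add ui.txt
git commit -q -m "feature commit"
git checkout -q main
echo "2" >> main.txt
git commit -q -am "commit 2"

# Remember the hash of the feature commit before the rebase.
before=$(git rev-parse user-interface)

git checkout -q user-interface
git rebase -q main
after=$(git rev-parse user-interface)

# The branch now points to a copy; the original commit object still exists (hidden).
original_type=$(git cat-file -t "$before")
new_parent=$(git rev-parse user-interface~1)

echo "before rebase: $before"
echo "after rebase:  $after"
```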

Cherry picking

Merge attempts to combine the work of all commits on a different branch, up to the specified commit.
In certain situations that is too coarse-grained:
Imagine some colleague working on a feature branch also fixed a little bug that is relevant to you.
Then you do not necessarily want to merge all of their changes, just to get that little bugfix.
The cherry-pick command allows you to combine work more selectively.
More precisely, cherry-picking (the other commit being the cherry) applies the changes of one specific commit as a new commit on your current branch, regardless of what happened before on that branch.
Example:

# trace the commits of all branches
git adog
  * 9ff78e3 (HEAD -> main) some-recent-commit
  | * 89afd4f (some-other-branch) commit-to-cherrypick
  | * c081b9e some-non-relevant-commit
  |/
  * 151c4c6 first-commit
# just making sure the HEAD is attached to main
git checkout main
# go get that commit from another branch!
git cherry-pick 89afd4f
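
A self-contained variant of this example is sketched below, assuming a temporary repository and git 2.28 or newer. The file names are made up; each commit adds its own file, so it is easy to see which changes arrive on main.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Demo User"           # demo identity (assumption)
git config user.email "demo@example.com"
echo "base" > app.txt
git add app.txt
git commit -q -m "first-commit"

# The other branch holds unrelated work plus the bugfix we are after.
git checkout -q -b some-other-branch
echo "unfinished feature" > feature.txt
git add feature.txt
git commit -q -m "some-non-relevant-commit"
echo "bugfix" > fix.txt
git add fix.txt
git commit -q -m "commit-to-cherrypick"
cherry=$(git rev-parse HEAD)

# Back on main, pick only the bugfix commit.
git checkout -q main
git cherry-pick "$cherry" >/dev/null

picked_message=$(git log -1 --format=%s)
echo "newest commit on main: $picked_message"
```

Only `fix.txt` arrives on main. The unfinished feature stays on the other branch.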

Concluding thoughts

Q: I have a git button in my IDE, why should I bother learning the commands?
A: First off, feel free to use whatever tool works best for you. But here's my take on GUI tools for novices:

"git is complex, and you want to think twice before any action. Graphical tools make things faster, but easily lead to a trial-and-error approach, a bit like: 'I'll just click this and that button and hope it works' (without any understanding of what 'it works' actually means, or what exactly to expect). That's not how you gain proficiency with any complex technology, and that's not what you'll be hired for in a company. You do not learn by copy-pasting from ChatGPT, and no one will pay an engineer who can only copy-paste from ChatGPT. Actual understanding, however, is valued. But learning takes time and practice. If you type out the git commands yourself, you at least have more time to THINK about what they are for. When you feel comfortable with the commands, of course go ahead and try new tools."

Literature

Inspiration and further reads for the curious minds: