Version Control (Advanced)
In this unit we'll cover advanced functionality of version control systems (VCS), using git as the example.
We'll begin with a short recap of the previously learned basics and afterwards look into branches and how they can be
best used to efficiently collaborate.
Lecture upshot
Branches allow working on multiple software features in parallel. If used effectively, your software only gains in functionality and never regresses. Notably, you'll always have a stable and presentable version of your product.
Git basics recap
The model
Git manages software versions using a graph model.
- Every node in the graph is a commit, i.e. one snapshot version of the codebase.
- Adding functionality corresponds to adding new commits at a leaf node, i.e. extending the graph.
- The file system always shows contents based on a given commit. The commit you're currently working with is also called HEAD, i.e. where you are in the commit graph.
- To switch the file system representation to another graph node, use the git checkout command.
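The model above can be tried out in a throwaway repository. The following is a minimal sketch, assuming Git 2.28 or newer (for init -b); the temporary directory, the "Demo" identity and the commit messages are placeholders made up for illustration.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "commit 1"   # --allow-empty: a commit without file changes
git commit -q --allow-empty -m "commit 2"

git log --oneline          # two commits, newest first; HEAD is at "commit 2"
git checkout -q HEAD~1     # move HEAD (and the file system view) one commit back
git log --oneline          # only "commit 1" is reachable from HEAD now
git checkout -q main       # return to the newest commit on main
```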
Repositories
Every git repository contains a git graph.
- A repository can be local, or on a remote server
- Git servers are useful for sharing work, but git can be used without a server, e.g. a communal USB-drive can be used for sharing work.
Syncing commits
When synchronizing work between repositories, e.g. local and remote (server), you're effectively trying to combine graph models.
- If the graph model differences are purely additive, the graphs are combined automatically.
- If diverging changes were made to the graph models, and the changes affect the same files, manual resolving of the conflict is required.
Creating branches
So far you've been working with a single branch, i.e. the graph was actually just a long line of commits:
---
title: "Single branch graph model"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "HEAD" type: HIGHLIGHT
That's actually a dangerous practice:
- While you're working on a feature, your code might not at all be in a working state.
- Tests could fail, or the program might not even compile
- On the one hand, it would be good practice to commit regularly...
- ... on the other hand you do not want regressing commits (with a broken software state) on your production line.
Here is where branches come into play:
- Branches allow you to fork from any existing commit, and then develop a specific feature in all tranquility.
- If you have intermediate "broken" software states and commit them, that's ok: the new commits are on a separate branch and do not interfere with anyone else needing a proper main branch.
- Example:
  - While you're working on your halma controller, you might also work on something else, e.g. an alternative UI, or an AI robot player.
  - An additional branch allows you to develop one feature, while leaving the main "line" untouched.
As for everything else in git, there's a command to create new branches:
- To create a new branch and also place the HEAD on it:
git checkout -b user-interface
- Let's take the command apart:
  - checkout: go to a different commit
  - -b (for branch): the commit is actually on a new branch
  - user-interface: the name of your new branch
- Note that new branches always spawn from your HEAD, i.e. the commit in the graph which you most recently selected.
Use the kebab convention for branch names
Branch names should be readable, but must not contain spaces. Somehow everyone agreed on all-lowercase words, separated by hyphens. This notation is also known as kebab case, alluding to the meat (or tofu) on a skewer.
Note: There's also the option to create a branch without directly placing the HEAD on it. In my experience this is usually not what you want, but for completeness: git branch user-interface only creates the branch.
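The difference between the two commands is easiest to see side by side. This is a minimal sketch in a throwaway repository, assuming Git 2.28 or newer; the branch name ai-player and the "Demo" identity are made up for illustration.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "1"

git branch user-interface       # creates the branch, HEAD stays on main
git branch --show-current       # prints: main
git checkout -q -b ai-player    # creates the branch AND places HEAD on it
git branch --show-current       # prints: ai-player
```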
Branching from main
- In most cases you'll want to develop new features based on the most recent code version.
  - In that case, the first step is always to verify that you're on main's most recent commit. You want your HEAD to be at the end of the main line.
  - To be sure, you can start with git checkout main.
  - In any case check your git status. It should report that you are on branch main and that your working tree is clean.
- Next, you can safely create a new branch with git checkout -b user-interface
---
title: "Creating a branch from main"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
commit id: "HEAD" type: HIGHLIGHT
- What's the effect?
  - You can develop new functionality, based on the most recent commit.
  - Whatever experimental progress you make on the user-interface branch, main stays stable.
  - If you're approaching a deadline, at least with main you have something that will somewhat work for a demo or submission, even if things go horribly wrong on the user-interface branch.
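The whole procedure can be sketched in a throwaway repository, assuming Git 2.28 or newer. The commit messages and the "Demo" identity are placeholders; the point is that main does not move while the feature branch grows.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "1"
git commit -q --allow-empty -m "2"

git checkout -q main                  # make sure HEAD is at the end of main
git status                            # should name branch main and a clean working tree
git checkout -q -b user-interface     # branch off from the tip of main
git commit -q --allow-empty -m "experimental UI work"

git log --oneline main                # still only commits 1 and 2
git log --oneline user-interface      # 1, 2 and the experimental commit
```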
Branching from detached head
- Sometimes you want to add new functionality based on an earlier code version
- This translates to: you want to create a new branch, but you want it to be based on an earlier commit in the graph.
- But careful, working from a detached HEAD is usually not what you want. You'll effectively be working with outdated code. Yet it can make sense, if...
  - You suspect a severe bug in the most recent commit on main, and are pretty sure it will be reverted.
  - The last commit introduced severe refactoring, and you cannot mentally deal with both refactoring and adding new functionality. But for some reason the functionality work must not be delayed.
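What's a detached HEAD, again?
"HEAD" is simply a metaphor for where you are in the graph (which commit you're pointing at). "Detached" means that HEAD points directly at a commit instead of at a branch name. This typically happens when you check out a commit that is not the most recent commit of a branch.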
- Creating a branch from an earlier commit would take place like this:
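A minimal sketch in a throwaway repository, assuming Git 2.28 or newer. The commit messages and the "Demo" identity are placeholders, and HEAD~1 is just one way of naming the earlier commit (a commit id works as well).

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
for i in 1 2 3; do git commit -q --allow-empty -m "$i"; done

git checkout -q HEAD~1                # detach HEAD at commit "2"
git checkout -q -b user-interface     # new branch starts at "2"; HEAD is attached again
git commit -q --allow-empty -m "4"

git log --all --decorate --oneline --graph
```

The two checkout steps can also be combined by naming the start point directly, e.g. git checkout -b user-interface main~1.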
Will result conceptually in this commit graph:
---
title: "Creating a branch from detached HEAD"
---
gitGraph:
commit id: "1"
commit id: "2"
branch "user-interface"
checkout "main"
commit id: "3"
checkout "user-interface"
commit id: "4"
commit id: "HEAD" type: HIGHLIGHT
Branching from a branch
- Sometimes you want to create a new feature, that needs another feature currently under development.
- In that case you want to create a branch from another branch.
- Just like before the trick is to first place the HEAD on the branch you want to deviate from.
- The resulting commit graph can be represented as:
---
title: "Creating a branch from a branch"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
commit id: "5"
branch "some-extra-feature"
commit id: "HEAD" type: HIGHLIGHT
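The commands behind such a graph could look as follows. This is a sketch in a throwaway repository, assuming Git 2.28 or newer; commit messages and the "Demo" identity are placeholders.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
for i in 1 2 3; do git commit -q --allow-empty -m "$i"; done

git checkout -q -b user-interface        # first feature branch, spawned from main
git commit -q --allow-empty -m "4"
git commit -q --allow-empty -m "5"

git checkout -q user-interface           # place HEAD on the branch to deviate from
git checkout -q -b some-extra-feature    # second branch, spawned from user-interface
git commit -q --allow-empty -m "6"

git log --all --decorate --oneline --graph
```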
Logging branches
As you've already learned in the git basics, it's always good to inspect the graph regularly, before typing more git commands.
- However, the standard git log only shows commits on the branch you're currently on.
  - When working with branches, this gives you an incomplete, and sometimes misleading representation.
- If you truly want to know what's going on, add some parameters:
  - --all: to see all commits on all branches
  - --decorate: print branch information
  - --oneline: to represent each commit on a single line, and get a more compact visualization
  - --graph: get a visualization of the full graph layout
- Example:
# Setup repo (-b main names the initial branch "main"; requires git 2.28 or newer)
git init -b main
# Create initial commit on main
echo "Keksli is cute" > keksli.txt
git add keksli.txt
git commit -m "initial commit"
# Add two commits to other branch
git checkout -b funny-branch
echo "We <3 Keksli" >> keksli.txt
git add keksli.txt
git commit -m "branch commit 1"
echo "We love him so so much" >> keksli.txt
git add keksli.txt
git commit -m "branch commit 2"
# Add another commit on main
git checkout main
echo "He is so cute" >> keksli.txt
git add keksli.txt
git commit -m "main commit 2"
git log --all --decorate --oneline --graph
What should a wholesome git log print?
Answer: It should print a textual representation of the entire commit graph:
A dog
- The command is actually easy to remember! Just think "A Dog!"
Create an alias
If you find that line too long to remember, create an alias with: git config --global alias.adog "log --all --decorate --oneline --graph". Then you can inspect the graph with git adog.
Switching branches
- You've previously already seen the checkout command, and used it to navigate your HEAD to earlier commits on the main branch.
  - For example, git checkout main sets the HEAD back to the most recent commit on main.
- The checkout command actually lets you navigate to any commit on the graph, it does not need to be a commit on the main branch.
- So in the following example, you could move your HEAD to any commit of any branch, using the checkout command:
---
title: "Moving to another commit"
---
gitGraph:
commit id: "1"
commit id: "HEAD" type: HIGHLIGHT
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
commit id: "5"
git checkout 4
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "HEAD" type: HIGHLIGHT
commit id: "5"
Keeping the HEAD on your shoulders
- Most of the time you want to move your HEAD to the end of a branch.
- Branch names are easier to remember than commit ids. In addition, checking out a branch name guarantees that your HEAD stays attached.
- You can either just provide the branch name:
git checkout user-interface
- But to avoid confusion with changing to a specific commit, there's also the switch command:
git switch user-interface
(Some prefer this. I have no preference. Use whatever you prefer.)
---
title: "Moving to the \"end\" of a branch"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
commit id: "HEAD" type: HIGHLIGHT
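Whether HEAD is attached can be checked from the command line. The following sketch assumes Git 2.28 or newer and uses a throwaway repository; the variable name first and the "Demo" identity are made up for illustration.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "1"
git commit -q --allow-empty -m "2"
first=$(git rev-parse HEAD~1)        # commit id of the first commit

git checkout -q "$first"             # by commit id: HEAD is detached
git symbolic-ref -q HEAD || echo "HEAD is detached"
git switch -q main                   # by branch name: HEAD is attached
git symbolic-ref --short HEAD        # prints: main
```

git symbolic-ref succeeds only while HEAD points at a branch name, which makes it a convenient check in scripts.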
Extending branches
Commits are always added to the commit you're currently on.
- Ideally your HEAD is attached to the end of a branch.
- In that case you're just extending whatever branch you most recently checked out.
- Example:
- If you are not at the end of a branch, e.g. because you placed HEAD on a previous commit, you're implicitly creating a new anonymous branch.
- Avoid this! Create a new branch if you need one, but don't create unnamed extensions.
- You can still turn them into "actual branches", but the intermediate state is confusing.
- Example:
# Initial state: two commits, and HEAD is detached
git adog
* a99d339 (main) added whiskers
* 6a66966 (HEAD) initial commit
echo "..." >> keksli
git add keksli
git commit -m "added more ..."
[detached HEAD 767f84c] added more ...
1 file changed, 1 insertion(+)
git adog
* 767f84c (HEAD) added more ...
| * a99d339 (main) added whiskers
|/
* 6a66966 initial commit
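If you do end up with commits on such an anonymous extension, one way to turn them into an actual branch is to create a branch right where HEAD is. A minimal sketch, assuming Git 2.28 or newer; the branch name keksli-extras and the "Demo" identity are hypothetical.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "initial commit"
git commit -q --allow-empty -m "added whiskers"

git checkout -q HEAD~1                              # detach HEAD at "initial commit"
git commit -q --allow-empty -m "added more ..."     # lands on an unnamed line
git switch -q -c keksli-extras                      # name that line: branch created at HEAD
git log --all --decorate --oneline --graph          # both lines now carry a branch name
```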
Merging branches
- So far you've seen how to create new branches, i.e. new lines of commits.
- But whatever new functionality you implement, at the end of the day you want it back on main, not on an isolated feature branch. That is:
  - Once you've reached a stable state, you want to fuse your new code (the new commits on your feature branch) back with the main branch.
  - In particular, you do not want to fuse code while it is in the making. Code must be in a stable state, i.e. all your new code compiles, and is properly tested and commented.
- This process of bringing functionality back from one branch to another is called merging.
- More precisely: Merging is the attempt to combine the work of different local branches.
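- The basic syntax is git merge, followed by the name of the branch to merge in. Run it while your HEAD is on the branch that should receive the changes.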
- Illustration
---
title: "Merging branch into main"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
commit id: "5"
checkout "main"
merge "user-interface"
commit id: "HEAD" type: HIGHLIGHT
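A basic merge can be sketched as follows, in a throwaway repository and assuming Git 2.28 or newer. File names, commit messages and the "Demo" identity are placeholders.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "board" > board.txt && git add board.txt && git commit -q -m "1"

git checkout -q -b user-interface
echo "buttons" > ui.txt && git add ui.txt && git commit -q -m "4"

git checkout -q main              # HEAD on the branch that receives the work
git merge -q user-interface       # bring the feature commits into main
git branch --merged               # lists both main and user-interface
```

In this sketch main has no commits of its own after the fork, so git can simply move main forward.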
Merge scenarios
Depending on the graph structure, merging can be easy and fully automated, or require some manual intervention:
- If only the branch you're merging from has new commits: The merge is straightforward. By default git simply fast-forwards, i.e. it moves your current branch forward to the tip of the other branch, and no extra merge commit is needed.
- If both branches have new commits...
- if they do not affect the same files: There is no conflict, a new commit is created, combining the outcome of all merged commits.
- if they do affect the same files, but not the same lines: There is no conflict, a new commit is created, combining the outcome of all merged commits.
- if they do affect the same files, and the same lines: There is a merge conflict. You have to manually tell which line version should win. Afterwards, a new commit is created.
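The conflict case can be reproduced in a throwaway repository. This is a sketch, assuming Git 2.28 or newer; the file content and the "Demo" identity are invented for illustration.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "Keksli is cute" > keksli.txt
git add keksli.txt && git commit -q -m "initial commit"

git checkout -q -b user-interface
echo "Keksli is fluffy" > keksli.txt && git commit -q -am "feature version"
git checkout -q main
echo "Keksli is hungry" > keksli.txt && git commit -q -am "main version"

# Same file, same line, changed on both branches: git cannot decide
git merge user-interface || echo "merge stopped: conflict in keksli.txt"
git status --short                                 # "UU keksli.txt" marks the conflict
echo "Keksli is fluffy and hungry" > keksli.txt    # decide by hand which version wins
git add keksli.txt                                 # mark the conflict as resolved
git commit -q --no-edit                            # conclude the merge with a new commit
```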
This is why you want a clutter-free repo
Merge conflicts are manageable for code, but extremely tedious for binary or generated files, because there is no meaningful way to manually resolve them at line level. That's why you do not want clutter in your repo. Keep unrelated files out of your repo; you do not want to waste your time with nonsensical merge conflicts!
Frequent merging main into feature
- Merging branches tends to get harder the longer you wait.
  - The longer you wait, the more likely it is that there are new commits on both branches, with higher chances of changes in the same files, and in the same lines.
- If developing your feature takes a while, you can regularly merge main into your branch, before you eventually merge back into main.
- Multiple, finer grained merges are easier than massive, rare merges!
- Here's an illustration of merging main into a feature branch:
---
title: "Merging main into a feature branch"
---
gitGraph:
commit id: "1"
commit id: "2"
commit id: "3"
branch "user-interface"
checkout "user-interface"
commit id: "4"
checkout "main"
commit id: "5"
checkout "user-interface"
merge "main"
checkout "main"
commit id: "6"
checkout "user-interface"
commit id: "7"
checkout "main"
merge "user-interface"
commit id: "HEAD" type: HIGHLIGHT
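The pattern of merging main into the feature branch first, and only later merging back, can be sketched like this. It assumes Git 2.28 or newer; file names, commit messages and the "Demo" identity are placeholders.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "3"

git checkout -q -b user-interface
echo "ui" > ui.txt && git add ui.txt && git commit -q -m "4"
git checkout -q main
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "5"

git checkout -q user-interface
git merge -q --no-edit main        # bring main's news into the feature branch
ls                                 # base.txt fix.txt ui.txt

git checkout -q main
git merge -q user-interface        # later: merge back, main just moves forward
```

--no-edit accepts the default merge message, so no editor opens.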
Frequent merging does not contradict frequent commits!
While you do want to wait with merging until your code is in a good state, you should keep committing frequently on the feature branch. Avoid mega-commits, they will be a pain in the neck to merge.
Syncing branches across repositories
- Unlike Dropbox, iCloud, or Google Drive, git repositories do not automatically synchronize.
  - (Auto-syncing would be horrible, you would constantly fear the broken work-in-progress updates of peers!)
- In the VCS Basics lecture you've already learned that commits must be actively pushed or pulled to and from a remote server.
  - git commit only creates a local commit.
  - The commit is not yet on the server. It remains strictly on your machine until you also push the commit to the server.
- Branches work just the same!
  - git checkout -b my-new-branch creates a new local branch.
  - The branch is not yet on the server. It remains strictly on your machine until you push it to the server.
Sending branches to remote
- Pushing a new branch to the server is just a longer push command:
git push --set-upstream origin my-new-branch
You don't need to remember that
It's not necessary to remember that special push command. If you attempt a normal push, git will remind you of the correct syntax.
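Pushing a branch can be tried without a real server, since a bare repository on disk can play the role of the remote. This is a sketch under that assumption, with Git 2.28 or newer; the directory names server.git and work and the "Demo" identity are made up.

```shell
# Throwaway sandbox: a bare repo stands in for the remote server
cd "$(mktemp -d)"
git init -q --bare -b main server.git
git init -q -b main work && cd work
git config user.name "Demo" && git config user.email "demo@example.com"
git remote add origin ../server.git
git commit -q --allow-empty -m "1"
git push -q --set-upstream origin main

git checkout -q -b my-new-branch                   # exists only locally so far
git commit -q --allow-empty -m "2"
git push -q --set-upstream origin my-new-branch    # publish it and remember its upstream
git branch -r                                      # lists origin/main and origin/my-new-branch
```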
Pulling branches to local
- Often you want to work on a branch that's on the remote repository, but not yet on your local repo.
- This happens e.g. when a colleague created a new branch on their machine and later pushed the branch to the remote server.
- Before you can use their branch, and contribute new commits, you need a copy of that branch in your repo.
- Example:
- You start by looking up your local branches
- You know your colleague has started working on a branch user-interface, but it is not there.
- You do NOT want to create a new branch yourself, you want to receive the branch from your colleague!
- There are two principal ways to retrieve branches from the server. They are closely related, but have a subtle difference.
Fetch
- In the easiest case, you just want to retrieve the new branches from the remote.
- You can do so with git fetch
Example:
- You start by looking up your local branches:
- The branch you're looking for is not there, so you use fetch to retrieve new commits from the server:
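The scenario can be simulated on one machine, with a bare repository as the server and two working copies. This is a sketch, assuming Git 2.28 or newer; the directory names server.git, colleague and me are made up for illustration.

```shell
# Throwaway sandbox: bare repo = server, two working copies = two machines
root=$(mktemp -d) && cd "$root"
git init -q --bare -b main server.git

git init -q -b main colleague && cd colleague
git config user.name "Colleague" && git config user.email "colleague@example.com"
git remote add origin "$root/server.git"
git commit -q --allow-empty -m "1" && git push -q -u origin main

cd "$root" && git clone -q server.git me     # your copy, made before the branch exists

cd "$root/colleague"                         # colleague creates and publishes a branch
git checkout -q -b user-interface
git commit -q --allow-empty -m "UI work"
git push -q -u origin user-interface

cd "$root/me"
git branch                        # only main
git fetch -q                      # retrieve new commits and branches from the server
git branch -a                     # now also lists remotes/origin/user-interface
git checkout -q user-interface    # local branch created from origin/user-interface
```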
Fetch does not auto-merge!
An important detail about fetch is that it does not attempt to merge, but only retrieves new commits.
- The commits retrieved by fetch are always parked on separate remote-tracking branches (e.g. origin/main), i.e. the new commits are not combined with whatever branches you might already have.
- In the previous example that was not an issue, because you wanted to retrieve an entirely new branch.
- But most of the time you retrieve new commits or branches not for the joy of having them parked on unused branches, but because you want to integrate them with your own new commits.
- Unless you were interested in an entirely new branch, fetch is typically followed by a merge, so you can actually integrate the new commits with your own progress.
Pull
- In the first VCS lecture and lab you've already briefly seen the pull command.
- pull is actually not new, but just the combination of fetch + merge
- I rarely use fetch. New commits are not much good unless also merged with your own progress.
Pull vs Fetch
Pull and fetch both retrieve commits from remote, but pull implicitly also attempts to merge new commits (tries to combine and move forward your HEAD in the current branch), while fetch only retrieves the commits without moving your HEAD. In most cases you want to pull.
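The relation between the two commands can be sketched with a simulated server, assuming Git 2.28 or newer. The directory names server.git, colleague and me are made up, and the example only covers the simple case where your local main has no commits of its own.

```shell
# Throwaway sandbox: bare repo = server, two working copies = two machines
root=$(mktemp -d) && cd "$root"
git init -q --bare -b main server.git
git init -q -b main colleague && cd colleague
git config user.name "Colleague" && git config user.email "colleague@example.com"
git remote add origin "$root/server.git"
git commit -q --allow-empty -m "1" && git push -q -u origin main
cd "$root" && git clone -q server.git me

# Colleague publishes commit "2"
cd "$root/colleague" && git commit -q --allow-empty -m "2" && git push -q

# Variant A: two explicit steps
cd "$root/me"
git fetch -q                  # origin/main now has "2", local main is untouched
git log --oneline main        # still only "1"
git merge -q origin/main      # integrate: local main moves forward to "2"

# Colleague publishes commit "3"
cd "$root/colleague" && git commit -q --allow-empty -m "3" && git push -q

# Variant B: one step
cd "$root/me"
git pull -q                   # fetch + merge in one go
```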
Removing branches
- At some point branches have served their purpose.
- E.g. if a feature is finalized and the branch has been merged into main, you do not need the branch any more.
- It's good practice to also delete the branch in that case, so it does not distract you when you inspect your repo with git branch -a
One does not imply the other
Removing a local branch does not imply remote branch deletion.
Neither does removing a remote branch imply local branch deletion.
Local
-
To remove a local branch, just use either of:
git branch --delete user-interface
git branch -d user-interface
-
To prevent you from accidentally deleting unsaved work, git won't let you delete unmerged branches. You can force the deletion with either of:
git branch --delete --force user-interface
git branch -D user-interface
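The safety check can be observed in a throwaway repository. A minimal sketch, assuming Git 2.28 or newer; the branch names merged-feature and unmerged-feature and the "Demo" identity are hypothetical.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
git commit -q --allow-empty -m "1"

git branch merged-feature              # same commit as main, so nothing would be lost
git checkout -q -b unmerged-feature
git commit -q --allow-empty -m "unmerged work"
git checkout -q main

git branch -d merged-feature           # succeeds
git branch -d unmerged-feature || echo "refused: branch is not fully merged"
git branch -D unmerged-feature         # forced: the branch name is removed anyway
git branch                             # only main is left
```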
Remote
- Removing a branch from a remote repo requires pushing your graph changes.
- The corresponding command is:
git push -d origin user-interface
Rebase
- In general, you want developers to commit regularly, in fine-grained steps.
- This gives you more freedom to backtrack in the event of issues.
- However, many people working on the same git graph, and pushing to many branches at once leads to "diamonds" in the graph history:
The reason is that deleting a branch does not actually delete its commits, only the branch name associated with them. That being said, every merged branch leaves another diamond behind, no matter if the branch was subsequently deleted or not.
- It would be nicer to magically have branches that pick up from one specific commit on main and immediately afterwards push their (finalized) progress back to main.
  - But that's almost impossible, because developing new features takes time
  - The bigger the team the more new content will be constantly added to main
- Illustration of main evolving, throughout development of user-interface:
---
title: main evolving in parallel to user-interface
---
gitGraph
commit id: "1"
commit id: "2"
branch user-interface
checkout user-interface
commit id: "3"
commit id: "4"
commit id: "5" type: HIGHLIGHT
checkout main
commit id: "6"
- Rebase is a trick to belatedly modify the commit graph, so that the commits on a branch appear to have occurred sequentially rather than in parallel.
- In the above example, the graph is changed to resemble:
---
title: Commit graph after rebase
---
gitGraph
commit id: "1"
commit id: "2"
commit id: "6"
branch user-interface
checkout user-interface
commit id: "3R"
commit id: "4R"
commit id: "5R" type: HIGHLIGHT
checkout main
Syntax
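- To make the commits of a feature branch start from the most recent main commit (see illustration above), check out the feature branch and run git rebase main.
- Most likely you then want to bring your changes back to main (without diamonds!), by merging the feature branch into main.
- You just rebased, so this merge is a simple fast-forward and you are guaranteed to have no merge conflicts. Any conflicts were already dealt with during the rebase.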
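A rebase followed by a merge back into main can be sketched as follows, assuming Git 2.28 or newer. File names and commit messages loosely follow the illustration above; the "Demo" identity is a placeholder.

```shell
# Throwaway sandbox; names and identity are placeholders
cd "$(mktemp -d)" && git init -q -b main
git config user.name "Demo" && git config user.email "demo@example.com"
echo "one" > one.txt && git add one.txt && git commit -q -m "1"

git checkout -q -b user-interface
echo "ui" > ui.txt && git add ui.txt && git commit -q -m "3"
git checkout -q main
echo "six" > six.txt && git add six.txt && git commit -q -m "6"

git checkout -q user-interface
git rebase -q main              # replay the branch's commits on top of main's tip
git checkout -q main
git merge -q user-interface     # fast-forward: no merge commit, no diamond
git log --oneline --graph       # one straight line of three commits
```

Since rebase creates new copies of commits, it is usually applied to branches that nobody else has based work on yet.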
Rebase internals
How does rebase actually work?
- The idea of relocating commits is somewhat contrived, as the very idea of a commit is to define a certain position in the commit graph.
- Rebasing does not actually move commits but rather:
  - Hides the original commits of a given series, e.g. a branch.
  - Replays each of the original commits, i.e.:
    - Creates a copy somewhere else in the graph
    - Imitates the exact same file modifications as in the original commit
    - Issues a new merge conflict for any newly encountered issue
  - Optional: Removes the hidden commits
Cherry picking
- Merge attempts to combine the work of all commits on a different branch up to the specified commit.
- In certain situations that is too coarse grained:
  - Imagine some colleague working on a feature branch also fixed a little bug that is relevant to you.
  - Then you do not necessarily want to merge all of their changes, just to get that little bugfix.
- The cherry-pick command allows you to combine work more selectively.
- More precisely, cherry-picking (the other commit being the cherry) applies the changes of one specific commit to your current branch, regardless of what happened before on that branch.
- Example:
# trace the commits of all branches
git adog
* 9ff78e3 (HEAD -> main) some-recent-commit
| * 89afd4f (some-other-branch) commit-to-cherrypick
| * c081b9e some-non-relevant-commit
|/
* 151c4c6 first-commit
# just making sure the HEAD is attached to main
git checkout main
# go get that commit from another branch!
git cherry-pick 89afd4f
Tags
- Tags allow you to label a specific commit, so you can find it again more easily later.
- Tags are often used for major releases, e.g. Halma-1.4.0
Soft tags
- To tag the most recent commit (where your HEAD is pointing), use:
git tag Halma-1.4.0
- Alternatively, you can provide a longer description with -m (git then creates an annotated tag, which stores the message along with the tag):
git tag -m "A new release of Halma, showcasing some AI bot players." Halma-1.5.0
Showing tags
- As soon as you have tags, you can inspect the list with:
git tag
Publishing tags
- Like everything else in your local repo, tags remain local until you actively share them, by pushing them to the remote:
git push origin Halma-1.5.0
Concluding thoughts
- Q: I have a git button in my IDE, why should I bother learning the commands?
- A: First off, feel free to use whatever tool works best for you. But here's my take on GUI tools for novices:
"git is complex, and you want to think twice before any action. Graphical tools make things faster, but easily lead to a "trial and error" approach, a bit like: "I'll just click this and that button and hope it works" (without any understanding of what "it works" actually means or what exactly to expect). That's not how you gain proficiency with any complex technology, and that's not what you'll be hired for in a company. You do not learn by copy-pasting from ChatGPT, and no one will pay an engineer who can only copy-paste from ChatGPT. Actual understanding, however, is valued. But learning takes time and practice. If you type out the git commands yourself, you at least have more time to THINK about what they are for. When you feel comfortable with the commands, of course go ahead and try new tools."
Literature
Inspiration and further reads for the curious minds:
- Learn git branching
- Git Cheatsheet (basic)
- Pro Git (free)
- MIT, the missing semester: Git