Version Control (Basics)
In this unit we'll cover the basics of version control systems (VCS), using git as an example.
We'll start with a brief motivation of why traditional file storage solutions are inadequate for managing code, illustrate
common scenarios for managing code with a VCS, and cover how to safely navigate these basic situations with git. The
lecture concludes with a short cheat-sheet recap of commands for the most common situations.
Lecture upshot
Version Control Systems (VCS) allow multiple developers to simultaneously work on the same codebase in a coordinated and conflict-free manner. Git is the de-facto standard VCS and uses a graph-model to maintain code versions.
Motivation
Why do we not just place our code in Google Drive / Dropbox / etc.?
- Software development is a highly collaborative discipline.
- Most corporate software projects have more than one developer, constantly developing on the same code.
- Computer programs are densely coupled
- Each occurrence of a variable or function must be correctly spelled. A single change in one place easily breaks the entire program.
- Human language is a lot more robust: it is OK for several people to edit a Google Doc in multiple places at the same time.
- So it is important that simultaneous edits by multiple developers do not interfere.
Can you think of a simple solution to prevent inconsistent code ?
Locking: You can set up a mechanism that allows only one developer at a time to edit the codebase. If there is no concurrency and only one code instance, there can be no consistency conflicts. However, for practical reasons this approach is no longer acceptable, because developers would constantly be waiting for each other.
VCS Repositories
All version tracking systems need a repository, that is: a database to track the historic evolution of code. Every tracked state is also called a "commit".
- Progress is tracked as a series of states
- Work can be easily reverted to earlier states
- This is especially useful to revert back to a stable state for software demos.
However, there are two repository setups: Central and decentralized repositories
Central repository
Earlier VCSs like CVS (Concurrent Versions System) and Subversion rely on a central repository.
- There is exactly one server with all historic code versions (repository)
- Developers have a local workspace with project files and exchange their modifications over the server
- Developers implicitly send their work to the server, when they create a new commit
- Developers do not have a local copy of all historic code versions
Decentralized repositories
Git, nowadays the de-facto standard VCS, relies on decentralized repositories.
- There may or may not be a server. Even a USB drive can be used to exchange commits
- All developers hold a full copy of all historic code versions (repository)
- Developers do not implicitly send their work anywhere, when they create a new commit
Commit vs Push
Note how the exchange between repositories only shows push/pull instead of commit. In git, commits refer to snapshots (like a photo of the code at a given moment in time), but they are not exchanged until a developer decides to push them to a server or pull them from another repository.
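The difference between commit and push can be tried out without any real server. The following is a minimal sketch, assuming a throwaway directory and a local bare repository that merely stands in for the server (the names `playground`, `server.git` and the example identity are illustrative assumptions):

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"

# A bare repository stands in for "the server" (no GitLab needed).
git init -q --bare server.git

# A local repository that knows the stand-in server under the name "origin".
git init -q MyLovePoems
cd MyLovePoems
git config user.name "Example Poet"
git config user.email "poet@example.com"
git remote add origin "$playground/server.git"

echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# The commit exists locally, but the server has not received anything yet.
local_commits="$(git rev-list --count HEAD)"
server_before="$(git --git-dir="$playground/server.git" rev-list --count --all)"

# Only an explicit push transfers the commit.
git push -q origin HEAD
server_after="$(git --git-dir="$playground/server.git" rev-list --count --all)"

echo "local: $local_commits, server before push: $server_before, after push: $server_after"
```

The commit count on the stand-in server only changes after the push, which is exactly the separation described above.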
Git's graph model
Git internally relies on a graph model, where the commits define nodes.
- Each commit, or node, represents one version of your code
- The latest commit, also referred to as `HEAD`, is the most recent version tracked by git
- Think of `HEAD` like a pointer to the most recent commit, or graph node
Navigating the graph
You can easily navigate the graph, that is, switch the visible content on disk to that of another commit.
- When you switch to another commit, it is as if you set a pointer in the graph to an earlier commit.
- This does not change the graph structure.
- Nothing is lost when you switch between commits.
- You can go backward, or forward. The code you then see in your file-system matches the state of the corresponding commit. But you can at any time go back to where you were before in the graph.
- Switching the pointer from one commit to another is called checking out a commit (`checkout`).
Commits are backups
You can perfectly well use git just for your personal backups. Your local git repository does not need a server to function.
While working on your project, you can regularly `commit` to build a linear trace of your project history. If you later
need to revert, you can simply `checkout` an earlier commit to load a backup.
How does it work ?
- Every git repository has a hidden folder `.git`
- Git uses it to store the graph structure and information about all commits
- It's best not to ever manually interfere with the contents of the `.git` folder
- Especially do not remove or rename the `.git` folder
Local graph extension
When a developer has coded something new, e.g. a function, they can create a new commit and add it to the graph.
- Over time, as the project advances, git's graph continuously grows
- In the simplest case the graph is entirely linear, i.e. with every commit, the line gets a bit longer:
Pushing
Commits are always created within the scope of one repo, and are not automatically sent to other repos, e.g. the server.
- When a developer has extended their local repository graph by a new commit, they still need to `push` the new node to the server's graph
- In detail, the process is:
- A developer is synced up with the server
- They then add some new functionality, i.e. create a new local `commit`
- Finally, they share their `commit` by `push`ing it back to the server
Branches
It is also possible to deliberately create multiple versions of your software, e.g. to work on independent
functionality, and afterwards reunite the work. This can happen entirely locally, and is called `branch`ing and `merge`ing.
In this case the graph is no longer a linear alignment of commits, but becomes an actual graph:
Branches are an advanced git concept, which we will delve into in a future lecture. For now it is sufficient to retain that
`git`'s graph is built out of commits as nodes, and can be more complex than a linear sequence.
Git commands
Whether you want to `init`ialize a project, create a `commit`, `push` work to a server or `checkout` an earlier commit, the
default way of interacting with git is via commands.
- It is important to understand the underlying graph model, and think what a git command does before you type it.
- Inexperienced users often just try out memorized commands, without understanding what they do.
- Often this causes a corrupted repository state and complicated error messages
- The naive fix is then often to copy paste the code somewhere else, delete the project, and download a new copy
Image credits: XKCD
Warning
There are many graphical tools to interact with git. It is ok to use these tools, after you have become proficient with the command line. Git is complex and you need a sound understanding of how it works.
Initializing a project
Any folder can be turned into a git repository.
- While git is usually used for software projects, we can use git for any project
- Let's imagine I want to use git as a backup system for my personal love poems
- So I have a folder `MyLovePoems`, and a single file `poem.txt` inside:
- The poem is a short and sentimental piece of art:
Roses are red
Violets are blue
My dearest Keksli
I love you
 ,_     _
 |\\_,-~/
 / _  _ |    ,--.
(  @  @ )   / ,-'
 \  _T_/-._( (
 /         `. \
|         _  \ |
 \ \ ,  /      |
  || |-_\__   /
 ((_/`(____,-'
Credits: asciiart.eu
- Obviously I am regularly moved to tears by the emotional intimacy of my poem. So I want to make sure I never lose
it, when I experiment with new endings and sequels.
- Therefore, I decided to use git to keep track of my poem's evolution !
- This is as simple as going inside my project folder and typing:
git init
- The command creates a new hidden folder `.git`:
MyLovePoems
├── .git
│   ├── HEAD
│   ├── config
│   ├── description
│   ├── hooks
│   │   ├── applypatch-msg.sample
│   │   ├── commit-msg.sample
│   │   ├── fsmonitor-watchman.sample
│   │   ├── post-update.sample
│   │   ├── pre-applypatch.sample
│   │   ├── pre-commit.sample
│   │   ├── pre-merge-commit.sample
│   │   ├── pre-push.sample
│   │   ├── pre-rebase.sample
│   │   ├── pre-receive.sample
│   │   ├── prepare-commit-msg.sample
│   │   ├── push-to-checkout.sample
│   │   └── update.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
└── poem.txt
Never touch the .git folder
Git stores all its data in this directory, notably the complete commit graph, and the "pointer" to the node we're currently working with. You can delete the entire .git folder, if you are sure you no longer need version tracking. Otherwise, hands off the .git folder!
Note that `git init` only puts me in the position to begin tracking versions. That means:
- The git graph is still empty
- My poem is not yet protected
- We can verify this by asking git to list the latest nodes in the commit graph:
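A minimal sketch of that verification, assuming a fresh throwaway repository (the directory name and poem content are illustrative). It checks whether `HEAD` resolves to a commit and then lists the untracked poem in the short status format:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
echo "Roses are red" > poem.txt

# HEAD cannot be resolved to a commit yet, because the graph has no nodes.
if git rev-parse --verify --quiet HEAD > /dev/null; then
  graph_state="has commits"
else
  graph_state="empty"
fi
echo "Graph is $graph_state"

# "git log" has nothing to list either and exits with an error.
git log > /dev/null 2>&1 || echo "git log: no commits yet"

# The poem shows up as untracked ("??" in the short status format).
git status --short
```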
First time setup
- Before we continue using git, we should add a username and email.
- Every contribution to a project is done by a person, so let's tell git who we are and how to reach us:
git config --global user.name "Maximilian Schiedermeier"
git config --global user.email "schiedermeier.maximilian@uqam.ca"
The `--global` flag tells git to store the settings for your user account (not just for one project), so we don't need to repeat these steps for every new repository.
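To see where these settings end up, here is a small sketch. It assumes a temporary `HOME` directory so that your real configuration stays untouched; the name and email are placeholders:

```shell
set -e
# Use a throwaway HOME so this sketch does not touch your real ~/.gitconfig.
export HOME="$(mktemp -d)"
unset XDG_CONFIG_HOME GIT_CONFIG_GLOBAL

git config --global user.name "Example Poet"
git config --global user.email "poet@example.com"

# --global settings are written to the user's ~/.gitconfig file.
cat "$HOME/.gitconfig"

# Any repository created afterwards picks the settings up automatically.
git init -q "$HOME/MyLovePoems"
cd "$HOME/MyLovePoems"
configured_name="$(git config user.name)"
echo "Repository sees user.name = $configured_name"
```

Running `git config user.name "..."` without `--global` inside a repository would instead store the value for that one repository only.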
Commit
Commits are the nodes in our git graph. Creating a new commit takes three steps:
- Change something on the file system. This can be...
- Changing the contents of a file
- Adding a new file
- Deleting a file
- Telling git which files to consider for the next commit:
- The graph is your ticket to navigate the code history
- The more fine-grained your commits, the more precisely you can navigate
- Creating the commit
- Actually create a node in the graph
- Contain all considered files (and ignore all the others)
- Add a short descriptive comment
Tip
The `git status` command is a useful helper throughout the process. It provides helpful information for steps 1 and 2: which files have changes, and which files are tracked by git.
We'll now illustrate the process on two scenarios.
Adding a new file
To create our first commit, we want to add our existing poem to a new node in the graph.
- Let's start with a lookup, to see which files are around, and which files are tracked:
git status
- Interpretation: git tells us there is a file `poem.txt` that is not tracked, i.e. is not yet contained anywhere in the graph.
- Next we want to tell git that we wish to consider this file for the next commit.
git add poem.txt
(the `git status` output actually already suggested that command)
- Let's check again what has changed:
git status
- Interpretation: All good, git tells us the next commit will contain our `poem.txt`
- Finally, we want to actually create our commit:
git commit -m "First poem version"
- Interpretation: A first commit has been added to the graph
- (The `-m` argument allows us to add a comment for the commit, so later it is easier to remember what we contributed.)
It is always a good idea to check with `git log` whether our first commit appears in the graph history:
commit 3fc28074200a8503409234f60c1b0ad30fce4d4f (HEAD -> main)
Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
Date: Sat Aug 24 07:56:19 2024 -0400
First poem version
Looks good, we've created a first commit, with ID `3fc2807`.
Checksums
Commit IDs are actually not random strings, but SHA-1 checksums. That is, they provide a direct integrity check for the contents of a commit.
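The checksum property can be observed directly. The sketch below, run in a throwaway repository with placeholder identity and content, re-hashes the raw commit object and compares the result with the commit ID:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# The full commit ID, and the abbreviated form shown in many git outputs.
full_id="$(git rev-parse HEAD)"
short_id="$(git rev-parse --short HEAD)"

# Re-hash the raw commit object: the result must equal the commit ID.
recomputed_id="$(git cat-file commit HEAD | git hash-object -t commit --stdin)"

echo "full:       $full_id"
echo "short:      $short_id"
echo "recomputed: $recomputed_id"
```

Because the ID is derived from the commit's content (author, date, message, parent and file snapshot), any change to that content results in a different ID.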
Changing a file
Most of the time we're working on existing files, e.g. to change a line of code.
- Let's try to change something in our poem and create a new commit with the changes.
- First, I'll make the poem a bit more cat appropriate:
- Once more we can ask git what it makes of these changes with `git status`
- Interpretation: git detected that we modified a file that is already in the graph
- Unless we tell git so, the changes to `poem.txt` won't be included in the next commit.
- Just like before, we tell git to consider the poem file:
git add poem.txt
git status
- Just like before, we still need to actually create the new commit:
git commit -m "Improved poem for cat context"
- Finally, we can check if our new commit appears in the log:
git log --graph
* commit 2cdfc6ff084400b42ebe2b93f0fe282b313b73ad (HEAD -> main)
| Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
| Date:   Sat Aug 24 10:22:41 2024 -0400
|
|     Improved poem for cat context
|
* commit 3fc28074200a8503409234f60c1b0ad30fce4d4f
  Author: Maximilian Schiedermeier <schiedermeier.maximilian@uqam.ca>
  Date:   Sat Aug 24 07:56:19 2024 -0400

      First poem version
Tip
Use `git log` with the `--graph` argument to get a textual visualization of the git graph.
Amazing, we've successfully extended our commit graph! It now looks like this:
Message typos
- Sometimes we make typos in our commit messages.
- That's pretty easy to fix, with:
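One common way to do this is `git commit --amend`. The sketch below uses a throwaway repository with a placeholder identity and a deliberately misspelled message:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "Roses are red" > poem.txt
git add poem.txt

git commit -q -m "Frist poem version"          # oops, a typo in the message
id_before="$(git rev-parse HEAD)"

# Replace the latest commit with a copy that carries the corrected message.
git commit -q --amend -m "First poem version"
id_after="$(git rev-parse HEAD)"

fixed_message="$(git log -1 --format=%s)"
commit_count="$(git rev-list --count HEAD)"
echo "message: $fixed_message, commits: $commit_count"
```

Note that the amended commit gets a new ID, since the message is part of what the checksum covers. It is therefore safest to amend only commits that have not been pushed yet.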
Checkout
- When we extended git's commit graph with `commit`, we implicitly advanced the pointer, i.e. we already navigated the graph.
- The `checkout` command allows you to navigate the graph without modifying it.
git checkout 3fc2807
Note: switching to '3fc2807'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false
- Interpretation: It worked, we are now at the previous commit in the graph. The warning means that we are not at the end of a "line" (branch).
- Nonetheless, the content of `poem.txt` is back to our first "backup"
We also use checkout to get back to the end of the main line: git checkout main
Moving to the end of a line
Always use the name of the line to move to its end. Intuitively we could think that using the commit ID of the last commit in the line does the same, but git will only re-attach `HEAD` when we specify the branch name, e.g. `main`. So in our case `git checkout 2cdfc6f` is not the same as `git checkout main`.
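The difference can be checked in a throwaway repository. In this sketch the helper function `head_state`, the identity and the poem lines are illustrative assumptions; `git symbolic-ref` is used to ask whether `HEAD` is currently attached to a branch:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
git symbolic-ref HEAD refs/heads/main   # make sure the line is called "main"
git config user.name "Example Poet"
git config user.email "poet@example.com"

echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"
first_id="$(git rev-parse HEAD)"

echo "Salmons are red" > poem.txt
git add poem.txt
git commit -q -m "Improved poem for cat context"
latest_id="$(git rev-parse HEAD)"

# Helper (hypothetical name): report whether HEAD is attached to a branch.
head_state() {
  if git symbolic-ref --quiet HEAD > /dev/null; then echo "attached"; else echo "detached"; fi
}

git checkout -q "$first_id"      # go back to the first "backup"
old_content="$(cat poem.txt)"

git checkout -q "$latest_id"     # same files as main, but still detached
by_id="$(head_state)"

git checkout -q main             # re-attaches HEAD to the end of the line
by_name="$(head_state)"

echo "first version: $old_content"
echo "by ID: $by_id, by name: $by_name"
```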
Detached commits
Detached commits are nodes that do not belong to any line (branch).
`HEAD` normally paraphrases to "the latest commit on a line".
- Detached HEAD means that the code we are viewing (checked out) is not at the end of a line.
- Illustration: the square is the commit we've checked out. HEAD is the end of the line
- Previously we've seen that the `git commit` command adds a new commit to the end of a line.
- This is why git keeps telling us we're off track, when we create commits while in detached HEAD:
$ echo "We love Keksli. <3 <3 <3" >> poem.txt
$ git add poem.txt
$ git commit -m "Trying to commit in detached HEAD state"
[detached HEAD 3fc2807] Trying to commit in detached HEAD state
1 file changed, 1 insertion(+)
- Commits made in detached HEAD are not connected to any "line" (branch), so nothing in the graph keeps track of them.
- When we switch back to a line, e.g. with `git checkout main`, git warns us that we are leaving these commits behind.
- Abandoned commits no longer show up in the log of any line and may eventually be deleted, so we might lose important work!
For now: Do not commit in detached HEAD
We'll soon see how to properly create "branches" to create commits on "new lines".
Corrections
Sometimes we're a bit too hasty and make a little mistake.
Undo git add
- Let's assume you just used `git add secret-poem.txt` and told git to include a file for the next commit.
- But you made a mistake and actually did not want to include the file.
- Then you can just remove the file from the next commit again (the file itself stays on disk) with:
git reset secret-poem.txt
Heads up
This only works if you have not yet finalized the next commit.
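A short sketch of this correction, in a throwaway repository with placeholder file contents. It lists the files scheduled for the next commit before and after the reset:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# Accidentally schedule a private file for the next commit.
echo "For my eyes only" > secret-poem.txt
git add secret-poem.txt
staged_before="$(git diff --cached --name-only)"

# Take it out of the next commit again; the file itself stays on disk.
git reset -q secret-poem.txt
staged_after="$(git diff --cached --name-only)"

echo "scheduled before reset: '$staged_before', after reset: '$staged_after'"
```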
Undo git commit
- Let's assume you just used git to add and commit the wrong files
- Then you can use:
git reset HEAD~
- `HEAD~` refers to the commit just before the latest one. Resetting to it removes the last commit from the graph, but keeps the files of the commit untouched.
What's the difference to a checkout of the previous commit?
Checkout sets the graph pointer to the previous commit, but does not modify the graph. Reset removes the last commit from the graph.
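The effect of the reset can be sketched as follows, again in a throwaway repository with placeholder names. After the reset, the graph is one commit shorter while the wrongly committed file is still on disk:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyLovePoems
cd MyLovePoems
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# Oops: this commit includes a file that was never meant to be tracked.
echo "For my eyes only" > secret-poem.txt
git add secret-poem.txt
git commit -q -m "Added the wrong file"

# Remove the last commit from the graph, keep the files as they are.
git reset -q HEAD~

commit_count="$(git rev-list --count HEAD)"   # the graph is back to one commit
status_after="$(git status --short)"          # the file is untracked again
echo "commits: $commit_count, status: $status_after"
```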
GitLab
- Previously we've learned that commits can be sent to and retrieved from a repository on a git server.
- Your university has a git server running for you: GitLab
- A bit pedantic, but be precise with the terminology:
- GitLab is not git. GitLab is a commercial server software, `git` is a command-line tool to manage the commit graph!
- GitLab is not the same as a single online git repository. Many different repositories are maintained on the same GitLab server instance.
Online repos
Git servers like GitLab have two main purposes:
- Provide a backup of your work
- Serve as synchronization hubs for work in teams
We'll cover the work in teams in an extra lecture, for now let's look at how to use a git server as backup system.
For setup, there are two main scenarios:
- You already have a local git repository, that is, you already ran `git init` on your computer, and have some code.
- The goal is to create an online repository that mirrors the content and graph of your local repository.
- You do not yet have a local git repository, that is, you have not yet run `git init` on your computer, and you have no code.
- The goal is to create a local copy (or clone) of a git repository, made available on the server.
Setting up a new online repo
Let's try to bring a copy of my poems, along with the entire git commit graph to the server.
- Use your UQAM credentials to log into your personal GitLab account
- Afterwards, you can create your own online repository:
- Click the `+` sign in the top left
- Select `New Project / Repository`
- Select `Create blank project`
- Give it a name, e.g. `MyPoems`
- Give it a namespace, usually just your username
- Select a visibility, e.g. `Private`, so only admitted users will have access to your code
- Uncheck the `create README` box (we'll handle that later)
At this point the repository is created, but empty. We'll now want to locally tell `git` about the empty online repository and send code and graph to the server:
- Tell `git` about the online repository:
- After creation, the GitLab page shows a blue `Code` button.
- Click it and copy the first line to your clipboard. It will be something like: `git@gitlab.info.uqam.ca:max/MyPoems.git`
- Afterwards, use the command line to tell `git` about this repository:
git remote add origin git@gitlab.info.uqam.ca:max/MyPoems.git
- So far we've only told git that the repository exists. Now we have to actually bring our code and the git commit graph to the server:
$ git push --set-upstream origin main
Enumerating objects: 6, done.
Counting objects: 100% (6/6), done.
Delta compression using up to 8 threads
Compressing objects: 100% (4/4), done.
Writing objects: 100% (6/6), 624 bytes | 624.00 KiB/s, done.
Total 6 (delta 2), reused 0 (delta 0), pack-reused 0
To gitlab.info.uqam.ca:max/MyPoems.git
 * [new branch]      main -> main
branch 'main' set up to track 'origin/main'.
Tip
The `--set-upstream origin main` addendum is only needed the first time. From here on our local git repository knows about the associated server. New local commits can be sent to the server with a simple `git push`.
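The whole setup can be rehearsed offline. The following sketch assumes a local bare repository as a stand-in for the GitLab project (the path, identity and poem content are illustrative); the git commands are the same ones used against a real server URL:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"

# Stand-in for the empty online repository.
git init -q --bare server.git

git init -q MyLovePoems
cd MyLovePoems
git symbolic-ref HEAD refs/heads/main   # make sure the line is called "main"
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "Roses are red" > poem.txt
git add poem.txt
git commit -q -m "First poem version"

# Step 1: tell git that the (stand-in) server repository exists.
git remote add origin "$playground/server.git"

# Step 2: first push, which also links local "main" to the server's "main".
git push -q --set-upstream origin main
upstream="$(git rev-parse --abbrev-ref --symbolic-full-name "@{u}")"

# Follow-up commits only need a plain "git push".
echo "<3 <3 <3" >> poem.txt
git add poem.txt
git commit -q -m "Added more hearts"
git push -q

server_commits="$(git --git-dir="$playground/server.git" rev-list --count main)"
echo "main tracks $upstream, server holds $server_commits commits"
```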
Sending a new commit to the server
- You only need to set up the server once.
- If I make further changes to my poem, e.g.:
echo "<3 <3 <3 <3 <3 <3" >> poem.txt
- I can send follow-up commits to the server with a simple push:
$ cat poem.txt
Salmons are red
Tuna is blue
My dearest Keksli
I love you
 ,_     _
 |\\_,-~/
 / _  _ |    ,--.
(  @  @ )   / ,-'
 \  _T_/-._( (
 /         `. \
|         _  \ |
 \ \ ,  /      |
  || |-_\__   /
 ((_/`(____,-'
<3 <3 <3 <3 <3 <3
$ git add poem.txt
$ git commit -m "Added more hearts"
[main f411bbd] Added more hearts
 1 file changed, 1 insertion(+)
$ git push
Enumerating objects: 5, done.
Counting objects: 100% (5/5), done.
Delta compression using up to 8 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (3/3), 292 bytes | 292.00 KiB/s, done.
Total 3 (delta 1), reused 0 (delta 0), pack-reused 0
To gitlab.info.uqam.ca:max/MyPoems.git
   5e2bd29..f411bbd  main -> main
Cloning an existing repo
Cloning a repo is the inverse of setting up an online repo. You clone, when:
- The online repository exists, and...
- You want a local repository copy of the code and git commit graph
It is either `init` or `clone`, but never both combined. Why?
`git init` creates an entirely new graph, `clone` tells git to make a linked copy. It is not possible to create a graph that is at once entirely new and a copy of an existing graph.
To make a local repository clone of an existing online repository:
- Visit the project page, e.g. the MyPoems project on GitLab
- Click the blue `Code` button and copy one of the URLs
- Open a terminal and use `git clone` with the URL:
$ git clone git@gitlab.info.uqam.ca:max/MyPoems.git
Cloning into 'MyPoems'...
remote: Enumerating objects: 9, done.
remote: Counting objects: 100% (9/9), done.
remote: Compressing objects: 100% (6/6), done.
remote: Total 9 (delta 3), reused 0 (delta 0), pack-reused 0 (from 0)
Receiving objects: 100% (9/9), done.
Resolving deltas: 100% (3/3), done.
No need to set server
When you have cloned a repo, git already knows about the server (makes sense, you cloned it from the server). Therefore, when you have made new commits, you can directly use `git push`, without any further configuration.
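As a sketch of this, the example below clones from a local bare repository that stands in for the server (the `seed` repository only exists to give that stand-in some history; all names are illustrative). The clone already has `origin` configured and can push immediately:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"

# Build a stand-in "server" repository that already contains one commit.
git init -q seed
(
  cd seed
  git config user.name "Example Poet"
  git config user.email "poet@example.com"
  echo "Roses are red" > poem.txt
  git add poem.txt
  git commit -q -m "First poem version"
)
git clone -q --bare seed server.git

# Clone it: no "git init" and no "git remote add" involved.
git clone -q "$playground/server.git" MyPoems
cd MyPoems
origin_url="$(git remote get-url origin)"
cloned_commits="$(git rev-list --count HEAD)"

# New commits can be pushed right away.
git config user.name "Example Poet"
git config user.email "poet@example.com"
echo "<3 <3 <3" >> poem.txt
git add poem.txt
git commit -q -m "Added more hearts"
git push -q

server_commits="$(git --git-dir="$playground/server.git" rev-list --count --all)"
echo "origin: $origin_url, cloned: $cloned_commits, on server after push: $server_commits"
```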
Getting more commits from the server
The code on the server side likely evolves, with new commits being added there.
- If your own local graph has not evolved since you cloned, you can sync up with `git pull`.
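A sketch of such a sync, assuming two local clones of a stand-in server repository (the names `mine`, `colleague` and the helper `set_identity` are hypothetical). The `--ff-only` option is added here as a safety net: it makes `git pull` refuse to do anything if the local graph has evolved after all:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"

# Hypothetical helper: set a throwaway identity inside the current repository.
set_identity() {
  git config user.name "Example Poet"
  git config user.email "poet@example.com"
}

# Stand-in server with one commit.
git init -q seed
( cd seed && set_identity && echo "Roses are red" > poem.txt \
  && git add poem.txt && git commit -q -m "First poem version" )
git clone -q --bare seed server.git

git clone -q "$playground/server.git" mine
git clone -q "$playground/server.git" colleague

# A colleague adds a commit on the server side.
( cd colleague && set_identity && echo "<3 <3 <3" >> poem.txt \
  && git add poem.txt && git commit -q -m "Added more hearts" && git push -q )

# My own graph has not evolved since cloning, so a pull simply catches up.
cd mine
before="$(git rev-list --count HEAD)"
git pull -q --ff-only
after="$(git rev-list --count HEAD)"
echo "commits before pull: $before, after pull: $after"
```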
Good practices
Git is a complex tool, so it is important to respect some good practices:
- THINK about what you want to do in the graph, before you type
- Commit often (commits are checkpoints)
- Commit fine-grained: avoid `git add *` or `git add .`. Only add the specific files you are interested in.
- Write meaningful comments
- Do not skip the commit message dialogue, and do not name your commits `toto`, `Another commit`, ...
- Don't clutter the repo
- Git can only make sense of textual files
- Do not put non-textual files (aka binaries) into the repo, notably:
- PDFs
- PNGs / JPGs
- Do not put generated files into the repo, notably:
- Compiled code / class files
- JAR files
- Maven's `target` folder
- Do not put irrelevant files into your project:
- MacOS tends to create `.DS_Store` files everywhere
- VIM creates cache files
- ...
- The less trash you have in your repo, the better
Gitignore
- Adding files by accident is too easy, luckily there is a protective mechanism.
- You can tell git about file types, i.e. file endings that should never be part of the repo.
- It is as simple as adding a file: `.gitignore`
- Example:
- If the `.gitignore` file contains a line `*.class`, git will not let you add compiled Java classes to a commit.
Use a `.gitignore` generator
Most of the time it's not obvious which binary or generated files appear in a project, but you know which technologies you will be working with, e.g. "MacOS, VIM, Java". You can use generators, like Toptal, to create a holistic `.gitignore` file, tailored for the technologies you are working with.
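A hand-written `.gitignore` works the same way as a generated one. The sketch below uses a throwaway repository and two illustrative patterns; the empty `Poem.class` file merely stands in for compiled output:

```shell
set -e
playground="$(mktemp -d)"
cd "$playground"
git init -q MyProject
cd MyProject

# Two illustrative patterns: compiled Java classes and macOS folder metadata.
printf '%s\n' '*.class' '.DS_Store' > .gitignore

echo "class Poem {}" > Poem.java
touch Poem.class                      # stands in for a compiled class file

# Ignored files no longer show up as candidates for a commit.
git status --short

# Explicitly adding an ignored file is refused (unless forced with -f).
if git add Poem.class 2> /dev/null; then
  add_result="added"
else
  add_result="refused"
fi
echo "git add Poem.class was $add_result"
```

It is common to commit the `.gitignore` file itself, so that everyone working on the repository shares the same rules.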
Literature
Inspiration and further reads for the curious minds:
- Git Cheatsheet (basic)
- Pro Git (free)
- MIT, the missing semester: Git