Build Systems (Basics)
In this unit we'll cover the basics of build systems and illustrate how they work, using maven as the example. We'll start with a short recap of language compilation, notably in the context of the java programming language, and then investigate the challenges associated with assembling binaries. Finally, we take a look at the core features of maven, notably dependency management, and make first attempts to customize build system behaviour with a configuration file.
Lecture upshot
Build systems have one purpose: make sure your source code can be reliably translated into a usable product. Although this sounds straightforward, it is anything but a simple task. Build systems are a powerful and highly configurable means to bring some order and reliability into the path from source code to product.
Java compiler recap
For a start, we will take a closer look at how a simple program is executed on a computer.
Executing code
There are two ways to execute code. Which path is taken depends on the programming language.
- Using an interpreter: The computer tries to make sense of your source code, as it processes it, line by line. Examples:
- Python
- Bash
- Javascript
- Using a compiler: The computer does not make sense of your source code, but expects you to translate it first into an executable form (machine code or bytecode), using a compiler. Examples:
- Basic
- C/C++
- Java ???
About binaries
Usually, compiled code is bound to a specific target platform. That is, once a compiler has translated source code to machine code, the outcome can only be used on specific hardware.
Interpreted VS compiled languages
So which one is better ?
- Interpreted languages:
- Cross-platform compatibility
- Slightly faster development, no need to wait for compiler
- Often "easier" syntax, beginner friendly
- Compiled languages:
- More performant: compiler optimizations for target platform, executed in native code
- Safer: Fewer runtime errors, more compile time errors
Note that a compiler only needs to do its job once, whereas an interpreter needs to run every time a program is executed.
And what about Java ?
Java is a special case...
- Java is a compiled language
- Java bytecode works on every platform, because it is interpreted by a virtual machine, the JVM.
- Java unites some advantages of both:
- Cross-platform compatibility
- Performant compiler optimizations for VM
- Security by compiler checks
Java compiler illustration
Step 1: Write java code
A human developer writes human-readable java code:
class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello, World!");
  }
}
Step 2: Compile to java bytecode
The java compiler is called: javac HelloWorld.java
and produces bytecode:
CAFE BABE 0000 0042 001D 0A00 0200 0307 0004 0C00 0500 0601 0010 6A61 7661
2F6C 616E 672F 4F62 6A65 6374 0100 063C 696E 6974 3E01 0003 2829 5609 0008
0009 0700 0A0C 000B 000C 0100 106A 6176 612F 6C61 6E67 2F53 7973 7465 6D01
0003 6F75 7401 0015 4C6A 6176 612F 696F 2F50 7269 6E74 5374 7265 616D 3B08
000E 0100 0D48 656C 6C6F 2C20 576F 726C 6421 0A00 1000 1107 0012 0C00 1300
1401 0013 6A61 7661 2F69 6F2F 5072 696E 7453 7472 6561 6D01 0007 7072 696E
746C 6E01 0015 284C 6A61 7661 2F6C 616E 672F 5374 7269 6E67 3B29 5607 0016
0100 0A48 656C 6C6F 576F 726C 6401 0004 436F 6465 0100 0F4C 696E 654E 756D
6265 7254 6162 6C65 0100 046D 6169 6E01 0016 285B 4C6A 6176 612F 6C61 6E67
2F53 7472 696E 673B 2956 0100 0A53 6F75 7263 6546 696C 6501 000F 4865 6C6C
6F57 6F72 6C64 2E6A 6176 6100 2000 1500 0200 0000 0000 0200 0000 0500 0600
0100 1700 0000 1D00 0100 0100 0000 052A B700 01B1 0000 0001 0018 0000 0006
0001 0000 0001 0009 0019 001A 0001 0017 0000 0025 0002 0001 0000 0009 B200
0712 0DB6 000F B100 0000 0100 1800 0000 0A00 0200 0000 0300 0800 0400 0100
1B00 0000 0200 1C
(Hex dump produced with: xxd -u -p HelloWorld.class | sed 's/..../& /g')
Do you see something unusual ?
The hex dump of java bytecode shows that every compiled class starts with CAFEBABE. This is the magic number that identifies java class files, and apparently an easter egg added by the java developers.
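The magic number can also be checked programmatically. The following is a minimal sketch, not part of the lecture material: the class name MagicNumberCheck and the method looksLikeClassFile are made up for illustration, and the default file name HelloWorld.class assumes the compiled class from Step 2 sits in the current directory.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, for illustration only.
class MagicNumberCheck {

  // The first four bytes of every java class file.
  static final int CLASS_FILE_MAGIC = 0xCAFEBABE;

  // Returns true if the given bytes start with the class file magic number.
  static boolean looksLikeClassFile(byte[] content) {
    if (content.length < 4) {
      return false;
    }
    // ByteBuffer reads big-endian by default, which matches the class file format.
    return ByteBuffer.wrap(content).getInt() == CLASS_FILE_MAGIC;
  }

  public static void main(String[] args) throws IOException {
    // Assumption: a path to a compiled class is passed as first argument,
    // otherwise we fall back to HelloWorld.class in the current directory.
    String fileName = args.length > 0 ? args[0] : "HelloWorld.class";
    byte[] content = Files.readAllBytes(Path.of(fileName));
    System.out.println(fileName + " looks like a class file: " + looksLikeClassFile(content));
  }
}
```

Feeding it any other file type, for instance a zip archive, should report false, since those files start with different bytes.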
Step 3: Run byte code on JVM
Finally, the byte code is distributed to various target systems.
- Note that any system needs a JVM to run java bytecode.
- Other compiled languages do not have this requirement, as they directly produce CPU-executable code.
How is the JVM best described?
The JVM is an interpreter. It reads in java bytecode and immediately sends execution instructions to the host in the CPU's native machine language.
JARs
- In most cases your java program will be more than a single class.
- You could still translate all classes, and ship them, maybe as a zip-file
- But Java already has a file format for that: JARs.
- JAR stands for "Java ARchive"
- JARs are zip files
- JARs have all classes, and a manifest with meta information, notably the entry point to your application
- JARs are still executed by the JVM, and in the best case run on all systems
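To make the points above tangible, here is a small sketch using the java.util.jar classes from the standard library. It builds a tiny JAR in memory, then reads the entry point back from the manifest. The class name JarManifestDemo and the fake four-byte class entry are assumptions made for illustration; a real JAR would contain actual compiled classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarEntry;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

// Hypothetical demo class, for illustration only.
class JarManifestDemo {

  // Builds a tiny JAR in memory: a manifest plus one placeholder class entry.
  static byte[] buildJar(String mainClass) {
    Manifest manifest = new Manifest();
    Attributes attributes = manifest.getMainAttributes();
    // The manifest version must be set, otherwise the main attributes are not written.
    attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    // The entry point of the application.
    attributes.put(Attributes.Name.MAIN_CLASS, mainClass);

    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (JarOutputStream jar = new JarOutputStream(buffer, manifest)) {
      // Placeholder content, not a valid class: only the magic number.
      jar.putNextEntry(new JarEntry(mainClass + ".class"));
      jar.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
      jar.closeEntry();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Reads the entry point back from the JAR's manifest.
  static String readMainClass(byte[] jarBytes) {
    try (JarInputStream jar = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
      return jar.getManifest().getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    byte[] jarBytes = buildJar("HelloWorld");
    // Zip archives start with the characters "PK", and so do JARs.
    System.out.println("Starts with PK: " + (jarBytes[0] == 'P' && jarBytes[1] == 'K'));
    System.out.println("Main-Class: " + readMainClass(jarBytes));
  }
}
```

In everyday work the jar command line tool does all of this for us; the sketch only shows that there is no magic involved.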
JAR usage
- Creating a JAR file from sources is relatively simple:
# Compile all java files to *.class files, place them in a new build directory
javac -d ./build *.java
# Enter build directory
cd build
# Create a java archive (JAR) file, using all *.class files.
# Add MANIFEST.MF pointing to HelloWorld as launcher class.
jar cfe MyDeliverable.jar HelloWorld *.class
- This produces a JAR file: MyDeliverable.jar
- Content:
- With MANIFEST.MF content:
- The JAR file can be directly executed, using the JVM:
java -jar MyDeliverable.jar
JAR files for libraries
JAR files are also a great way to provide functionality to other programmers. Most java libraries are provided as JAR files. Whoever uses your code is most likely only interested in the functionality you offer, not the source code itself.
Dependencies
Most of the time you do not want to program everything from scratch (see the previous lecture on reuse-oriented development).
JSON example
- We will now look at how compiling and execution change when additional libraries are involved.
- Imagine we want to serialize a java object, that is, create a machine-readable string representation of it.
- A student object, as created by new Student(34, "Maximilian", "Schiedermeier") should be serialized to:
Manual string creation
- Of course, I could manually construct a JSON String:
// Create student
Student myStudent = new Student(34, "Maximilian", "Schiedermeier");
// Export student
String jsonString = "{\n"
    + "\t\"age\": " + myStudent.getAge()
    + ",\n\t\"firstName\": \"" + myStudent.getFirstName()
    + "\", \n\t\"lastName\": \"" + myStudent.getLastName()
    + "\"\n}";
System.out.println(jsonString);
- But what if I need to export another object? What if the object structure changes?
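The manual approach relies on a Student class that is not shown here. A minimal sketch could look as follows; the field names are an assumption, inferred from the getters used in the manual export, and the toJson method is a hypothetical addition that bundles the string concatenation in one place.

```java
// Hypothetical Student class, fields inferred from the getters used in the manual export.
class Student {

  private final int age;
  private final String firstName;
  private final String lastName;

  Student(int age, String firstName, String lastName) {
    this.age = age;
    this.firstName = firstName;
    this.lastName = lastName;
  }

  int getAge() {
    return age;
  }

  String getFirstName() {
    return firstName;
  }

  String getLastName() {
    return lastName;
  }

  // Builds the JSON representation by hand, field by field.
  // Every new or renamed field requires a manual change here.
  String toJson() {
    return "{\n"
        + "\t\"age\": " + age + ",\n"
        + "\t\"firstName\": \"" + firstName + "\",\n"
        + "\t\"lastName\": \"" + lastName + "\"\n"
        + "}";
  }

  public static void main(String[] args) {
    Student myStudent = new Student(34, "Maximilian", "Schiedermeier");
    System.out.println(myStudent.toJson());
  }
}
```

Note that this sketch does not escape special characters: a name containing a quotation mark would already produce invalid JSON, which is one more reason to prefer a library.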
The GSON library
- A lot easier would be to reuse the existing Google GSON library:
- However, we are now using code that is not ours, and both the compiler and the JVM need to know about this dependency.
- Download Gson library JAR file:
- This time we compile with the -cp (classpath) argument, telling the compiler that there are additional classes to consider:
javac -cp gson-2.11.0.jar *.java
- Same, when running the compiled bytecode, the JVM must know about the GSON library:
java -cp gson-2.11.0.jar:. MainWithGson
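To see what the -cp argument actually does, we can ask a running JVM for its classpath. The sketch below reads the standard java.class.path system property and splits it into entries; the class name ClasspathInspector and the method entries are made up for illustration.

```java
import java.io.File;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class ClasspathInspector {

  // Splits a raw classpath string into its individual entries.
  static List<String> entries(String classpath, String separator) {
    // Quote the separator, so it is treated as plain text and not as a regular expression.
    return Arrays.asList(classpath.split(Pattern.quote(separator)));
  }

  public static void main(String[] args) {
    // The JVM exposes its effective classpath as a system property.
    String classpath = System.getProperty("java.class.path");
    // File.pathSeparator is ":" on Linux and macOS, and ";" on Windows.
    for (String entry : entries(classpath, File.pathSeparator)) {
      System.out.println(entry);
    }
  }
}
```

Launched with the same -cp argument as above, the program should list the GSON JAR and the current directory as separate entries. The different separator on Windows is also why the run command above needs a ";" instead of ":" there.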
What could possibly go wrong?
By re-using the Google GSON library we have created a "dependency". Without that library at hand, our code can be neither compiled, nor executed.
The problem with JARs
JARs are a straightforward way to pass around functionality, but as projects grow, several issues tend to persist:
- The more dependencies you have, the more JARs you carry with you.
- Where to store the JARs? In the repo? What if you need the same JAR in multiple projects, do you store them twice?
- Every time a new developer joins the project, you need to pass on all the JARs and have them manually extend their classpath.
- Just compiling your project becomes somewhat tedious, because you always have to check that a long list of dependencies is correctly installed.
- The client complains that your software is not running. Most likely they forgot to install a JAR, or installed the wrong version. How do you find out which one it is?
- A JAR is a snapshot, it is one fixed version.
- What if a security vulnerability is found in a JAR you've downloaded? How would you know?
- You lost a JAR that you need to build your project, where do you find it again? Which version was it again that works with your project?
A true horror story
In a previous research lab we had a piece of software that was particularly hard to work with.
Before a developer could even write a single line of code, they needed to spend 30 minutes to 1 hour on manual project configuration.
The project even had JARs where no one knew exactly where they came from, whether they were still needed, or what exactly they were contributing.
There was a rumor that an intern, who had been around 3 years earlier, had created the JARs.
But the intern was long gone and no one had contact information. At the same time these were fat software artefacts that bloated our software executable.
Countless developer hours were wasted, because of poor dependency management.
Dependency management
Dependency management aims to eliminate all aforementioned issues by declaring which dependencies exist (and where to get them), instead of manually managing JAR files.
In essence, the ingredients for any dependency management tool are:
- An online repository, systematically archiving all versions of all libraries
- A local configuration file, describing for every dependency:
- A unique identifier, e.g. "Google GSON library"
- The specific version, e.g. "2.11.0"
Advantages:
- Configuration files are textual and lightweight. They can be stored in the project itself.
- Configuration files are written in a machine-interpretable syntax. A tool can collect all dependencies for you and even modify the classpath when needed.
- You have a clear trace of all exact dependency versions. You can easily scan your project for security vulnerabilities.
- No damage is done if you lose a library JAR, you can easily retrieve it again from the repository.
Maven
Maven is a build system for Java that offers exactly these two components:
- A central repository, with almost every java library ever created: mavencentral.org
- A project configuration file that (among others) lists all project dependencies: pom.xml
- POM stands for "Project Object Model"
- XML is a machine-readable file format
- A dependency is stated as a <dependency> entry with a groupId, an artifactId and a version.
Instead of ourselves downloading JAR files and placing them on the classpath, we ask maven to ensure all listed dependencies are in place.
Never ever
Never ever manually interfere with dependency management in a maven-ready project. If you need an additional library, edit the pom.xml, but never ever drag-and-drop a JAR file into your project, or edit the classpath.
Repositories
The local repository:
- Maven also maintains a local repository on your computer, the ~/.m2 directory. Every library you ever used is cached in this directory.
- The local repository has two purposes:
- Performance: It is faster to reuse a cached JAR file, than to download it from the internet every time
- Offline mode: You might not be online all the time. With the dependencies cached, you can develop without an internet connection
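The cache is an ordinary folder structure, derived from the dependency's identifier and version. The sketch below computes where a cached JAR would be expected, assuming the default layout with a repository folder inside ~/.m2. The class name LocalRepositoryPath and the method cachedJar are made up for illustration; the GSON coordinates are the library's published groupId and artifactId.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, for illustration only.
class LocalRepositoryPath {

  // Computes the expected location of a cached JAR below the given repository root.
  static Path cachedJar(Path repositoryRoot, String groupId, String artifactId, String version) {
    Path directory = repositoryRoot;
    // Every dot-separated part of the groupId becomes one folder level.
    for (String part : groupId.split("\\.")) {
      directory = directory.resolve(part);
    }
    // Followed by one folder for the artifact and one for the version.
    return directory
        .resolve(artifactId)
        .resolve(version)
        .resolve(artifactId + "-" + version + ".jar");
  }

  public static void main(String[] args) {
    // Assumption: the local repository sits at its default location.
    Path root = Paths.get(System.getProperty("user.home"), ".m2", "repository");
    System.out.println(cachedJar(root, "com.google.code.gson", "gson", "2.11.0"));
  }
}
```

If the printed file exists on your machine, maven can satisfy the dependency without any network access.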
Third party repositories:
- You might encounter situations where you need a library that is not in the official maven central repository.
- Examples:
- Libraries that are not free to use, and therefore not publicly accessible
- Your own libraries, that you do not want to upload
- Anyone can set up their own repository
- An online repository is just a few files accessible over an HTTP webserver
- However, by default maven does not know about third-party repositories. If you want maven to search your own repository, you need to edit the pom.xml file and indicate the location of your third-party repository.
Maven's dependency resolution algorithm
To build a project, maven tries to satisfy all dependencies with corresponding artifacts (the JAR files, and some metadata). To satisfy a dependency, maven will:
- First check the local .m2 repository for a cached file.
- If not cached, check whether any third-party repo is defined (usually there are none defined), and if so, look for the artifact there.
- Otherwise, contact the official maven repository servers to retrieve the needed artifact.
flowchart LR
resolve[\Resolve dependency/]
resolve --> localcheck{Artifact in local repo ?}
localcheck -. yes .-> done([Success])
localcheck ==>|no| remotecheck{3rd party repo defined ?}
remotecheck -. yes .-> 3rdpartycheck{Artifact in 3rd party ?}
3rdpartycheck -. yes .-> done
3rdpartycheck -. no .-> centralcheck{Artifact in central ?}
remotecheck ==>|no| centralcheck
centralcheck ==>|yes| done
centralcheck -. no .-> fail([Fail])
What happens when a project is built for the second time ?
Maven will already have all dependencies cached. It will take the topmost path.
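The lookup order from the flowchart can be sketched in a few lines of java. This is a deliberately simplified model and not maven's actual implementation: repositories are plain sets of artifact names, and the names DependencyResolver, Source and resolve are made up for illustration.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

// Simplified model of the lookup order, not maven's real implementation.
class DependencyResolver {

  enum Source { LOCAL, THIRD_PARTY, CENTRAL }

  // Returns where the artifact was found, or empty if resolution fails.
  static Optional<Source> resolve(String artifact, Set<String> localRepo,
      List<Set<String>> thirdPartyRepos, Set<String> centralRepo) {
    // 1) A cached artifact is used right away.
    if (localRepo.contains(artifact)) {
      return Optional.of(Source.LOCAL);
    }
    // 2) Third-party repositories are only consulted if any are defined.
    for (Set<String> repo : thirdPartyRepos) {
      if (repo.contains(artifact)) {
        localRepo.add(artifact); // cache for the next build
        return Optional.of(Source.THIRD_PARTY);
      }
    }
    // 3) Last resort: the central repository.
    if (centralRepo.contains(artifact)) {
      localRepo.add(artifact); // cache for the next build
      return Optional.of(Source.CENTRAL);
    }
    return Optional.empty();
  }

  public static void main(String[] args) {
    String gson = "com.google.code.gson:gson:2.11.0";
    Set<String> local = new HashSet<>();
    Set<String> central = Set.of(gson);
    // First build: nothing cached yet, no third-party repo defined.
    System.out.println(resolve(gson, local, List.of(), central));
    // Second build: the artifact is served from the local cache.
    System.out.println(resolve(gson, local, List.of(), central));
  }
}
```

The two calls in main mirror the question above: the first resolution has to go all the way to the central repository, the second one stops at the local cache.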
Maven in action
We'll now cover some basic usage scenarios for maven.
Maven project layout
- Maven projects stipulate a specific internal structure.
- We are not going to create the project structure manually, but use maven to initialize our projects:
mvn archetype:generate \
-DgroupId=ca.uqam.info \
-DartifactId=MavenHelloWorld \
-DarchetypeArtifactId=maven-archetype-quickstart \
-DinteractiveMode=false
Note: Some systems (windows) cannot handle multi-line commands. Remove the \ and place everything in a single line.
Let's take apart the above command:
- archetype translates to "we want to use a project template".
- There are different archetypes, for different purposes. E.g. for a webapp, or server backend we would have used a different archetypeArtifactId.
- Similar to any dependencies you might need, your own software should have a unique identifier. Other developers might actually end up using your software as a library!
- groupId represents an organization-specific string, usually this is just the reversed domain name of the company you are working for. Since we are all at UQAM's computer science department we use ca.uqam.info
- artifactId stands for the software you are building. It should be a descriptive name, indicating what your software does.
Once executed the above command will have created the following folder and file structure:
MavenHelloWorld/
├── pom.xml
└── src
    ├── main
    │   └── java
    │       └── ca
    │           └── uqam
    │               └── info
    │                   └── App.java
    └── test
        └── java
            └── ca
                └── uqam
                    └── info
                        └── AppTest.java
12 directories, 3 files
For now, we are only interested in the pom.xml and the initial class file App.java. We will deal with tests in a later lecture.
Initial App class
The initial App class is just a stub HelloWorld class:
package ca.uqam.info;

/**
 * Hello world!
 *
 */
public class App {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}
Package structures
Notice how the initial groupId argument has affected the project's package naming and internal folder structure?
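The relation between groupId and folder structure is purely mechanical, as the following sketch illustrates. It assumes that the package name equals the groupId, as is the case for the project generated above; the class name PackageLayout and the method sourceFile are made up for illustration.

```java
// Hypothetical helper, for illustration only.
class PackageLayout {

  // Derives the location of a source file from the groupId and the class name.
  static String sourceFile(String groupId, String className) {
    // Every dot in the package name becomes one folder level below src/main/java.
    return "src/main/java/" + groupId.replace('.', '/') + "/" + className + ".java";
  }

  public static void main(String[] args) {
    System.out.println(sourceFile("ca.uqam.info", "App"));
  }
}
```

The same rule applies to the test sources, only with src/test/java as the starting folder.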
Initial pom file
The initial pom file, as created by the archetype, looks as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>ca.uqam.info</groupId>
  <artifactId>MavenHelloWorld</artifactId>
  <packaging>jar</packaging>
  <version>1.0-SNAPSHOT</version>
  <name>MavenHelloWorld</name>
  <url>http://maven.apache.org</url>
  <dependencies>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>3.8.1</version>
      <scope>test</scope>
    </dependency>
  </dependencies>
</project>
We already see a first dependency entry, namely for junit.
- In the spirit of good software development, maven assumed that we will test our software.
- However, junit is not part of standard java. Hence, we need a dependency block.
Anything peculiar about the dependency block ?
The junit dependency block actually has an additional <scope>test</scope> entry. This is because maven makes a distinction between dependencies needed to build and test a software, and dependencies needed to run it. Junit is not needed at runtime, therefore maven added an additional test scope tag.
Building with maven
Let's use maven to build the project, that is, create java bytecode. The corresponding command is mvn package.
- The first time you run mvn package, we'll actually see how maven downloads junit.
- There will be some logging messages:
- Once the command is finished, we'll find a new directory target, with the following content:
- Among others, this is exactly the same outcome as we could have created manually, using the java compiler:
- A jar file
- Class files for our source code
Running Maven artifacts
Running the generated artifacts is almost identical to running manually created binaries.
Class files
We can run the generated class files without issues. Note however, that we must be at the package structure's root to call our program:
- Calling App.class program from wrong location:
- Calling App.class program from package root location:
Jar files
Running the jar file is not possible without specifying the main class, as by default the manifest does not contain a reference to a launcher class.
- Trying to run jar file without arguments:
- When we inspect the jar's internal MANIFEST file, we see there is no launcher specified:
- Running jar file with custom main class as classpath argument:
Note: Maven of course offers a way to integrate a working MANIFEST into the produced jar file. More on that in a bit.
A clean build
The target directory accumulates all artifacts ever built. If you modify your code or pom.xml and re-build, new files might be added and it can be confusing to distinguish between old and new files.
A good trick is to always use the clean argument before building, which wipes the entire target directory: Build your project systematically with mvn clean package
Maven plugins
Apart from downloading and caching dependencies, for usage in the local classpath, maven also has a second purpose: Modifying the build pipeline.
- By default, all that happens on mvn clean package is the standard compiling of source files (using any specified libraries for the process).
- But most of the time you want to do more, e.g. produce a human-readable documentation, run tests, or create a build artifact with all dependencies included.
- Maven's behaviour regarding the build pipeline can be modified with plugins.
A plugin is a short (or sometimes not so short) snippet in a dedicated plugins section of the pom.xml. There can be as many plugins as you want in the pom.xml:
<project>
  <build>
    <plugins>
      <!-- First plugin details -->
      <plugin>
        ...
      </plugin>
      <!-- Second plugin details -->
      <plugin>
        ...
      </plugin>
      ...
    </plugins>
  </build>
</project>
- Every plugin has a default location in the build pipeline, because most tasks only make sense at a given moment of the process.
- Example: building a jar with all dependencies inside should happen at the end, after all classes are compiled, all tests have passed etc.
We'll look at how plugins work in more detail, and at how maven lets plugins hook into the build process, in a future lecture. For now we'll look at some short, useful plugin examples.
Exec
The exec plugin lets you specify a main class for your code, that should be called by default when the code is executed.
- This is closest to the infamous green triangle ("▶")
- All you need to do is point to the main class to be called on execution:
<!-- Specify main class for exec goal -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>exec-maven-plugin</artifactId>
  <version>1.6.0</version>
  <executions>
    <execution>
      <goals>
        <goal>java</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <mainClass>full.package.name.YourMainClassLauncher</mainClass>
  </configuration>
</plugin>
Once the plugin is defined, you can conveniently run your program with: mvn clean compile exec:java
Add an IDE maven run configuration
Once the exec plugin is defined in your pom.xml, modify the IDE's "Run Configuration" (a.k.a. what is called when the green triangle is clicked) to simply call maven's exec plugin!
Maven Jar
The Maven jar plugin allows you to add additional information when your program is packaged into a JAR.
- Previously we've seen that a maven-produced JAR cannot be launched without explicitly stating the main class
- The maven-jar-plugin allows you to specify which main class should be listed in the JAR's manifest by default.
<!-- specify main class for JAR manifest-->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-jar-plugin</artifactId>
  <version>3.2.0</version>
  <configuration>
    <archive>
      <manifest>
        <mainClass>full.package.name.YourMainClassLauncher</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
JavaDoc
In the second lab session you've learned a command to manually extract all JavaDoc information from your code, to generate a human-readable website. The JavaDoc plugin lets you automate this step, as a standard component of the build process.
- Enabling the JavaDoc plugin is also a good practice, as you directly see whether there are issues in your code style, whenever you compile your code.
- Ideally the plugin is configured to fail on warnings, so no developer is ever tempted to work with or produce undocumented code
- "I'll document that later", easily turns into "I'll document that never."
<!-- Plugin to ensure all functions are commented and generate javadoc -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-javadoc-plugin</artifactId>
  <version>3.4.1</version>
  <configuration>
    <javadocExecutable>${java.home}/bin/javadoc</javadocExecutable>
    <reportOutputDirectory>${project.reporting.outputDirectory}/docs</reportOutputDirectory>
    <failOnWarnings>true</failOnWarnings>
    <quiet>true</quiet>
  </configuration>
  <executions>
    <execution>
      <id>attach-javadocs</id>
      <goals>
        <goal>jar</goal>
      </goals>
    </execution>
  </executions>
</plugin>
Use a snippet library
Most developers do not manually create their pom.xml line by line, but stitch it together from prepared blocks. Use a snippet library, e.g. https://m5c.github.io/MavenSnippetLibrary/ to rapidly create a working build pipeline.
Literature
Inspiration and further reads for the curious minds: