Build Systems (Basics)
In this unit we'll cover the basics of build systems and illustrate how they work, using maven as an example. We'll start with a short recap of language compilation, notably in the context of the Java programming language, and then investigate the challenges of assembling binaries. Finally, we take a look at the core features of maven, notably dependency management, and make first attempts to customize build system behaviour with a configuration file.
Lecture upshot
Build systems have one purpose: make sure your source code can be reliably translated into a usable product. Although this sounds straightforward, it is anything but a simple task. Build systems are a powerful and highly configurable means to bring order and reliability into the path from source code to product.
Recap: Packages, import and classpath
For programming novices the import statement is often perceived as "something I have to put at the start of the
class, so an error message goes away".
To answer the question of why imports are needed in the first place (wouldn't it be easier if everything were
automatically imported, all the time?) we'll now go through the basic principles of:
- How code context is actually structured into Java packages, and why it matters.
- Why you need imports to access some classes, but not others, and some exceptions to the rules.
- Why it is not advised to ever `import` more than is absolutely needed.
Packages
- Java organizes code into packages.
- Packages provide context for a set of related classes.
  - Example: `Mouse` might be a class imitating the behaviour of a biological mouse, or it might be a class to deal with the human interface device.
  - Surrounding packages `animals` and `devices` remove the ambiguity.
- By default, you can only refer to classes by their simple name if they are defined within the same package (or in `java.lang`).
  - Example: When working in the `animals` package, the statement `new Mouse()` is not ambiguous. It will always create the class imitating the behaviour of a biological mouse.
---
title: Example of ambiguous context
---
classDiagram
namespace animals {
class Mouse {
+String species
+void squeak()
}
}
namespace devices {
class Ꮇouse {
+String brand
+void click()
+void scroll()
}
}
Imports
- By default, you cannot instantiate (or even call) classes outside your current package by their simple name alone (you would have to spell out the fully-qualified name, e.g. `new animals.Mouse()`).
- This protects you from accidental ambiguity.
  - Example: When you're working outside the `animals` and `devices` packages, what should happen when you write `new Mouse()`? It is not clear what you want.
- Imports allow you to extend a class's current package context by additional classes:
  `import animals.Mouse` will allow you to use the model of the biological mouse (`new Mouse()`), but not the human interface device.
classDiagram
namespace animals {
class Mouse {
+String species
+void squeak()
}
}
namespace devices {
class Ꮇouse {
+String brand
+void click()
+void scroll()
}
}
class Main
Mouse <.. Main: import
Illustration of `Main` using an `import animals.Mouse` statement.
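The `Mouse` classes are a made-up illustration. The JDK itself ships a real pair of same-named classes in different packages, `java.util.Date` and `java.sql.Date`, so the mechanism can be tried out directly. The following minimal sketch (the class name `TwoDates` is hypothetical) imports one of them and reaches the other through its fully-qualified name:

```java
// Hypothetical example class, for illustration only.
// Only java.util.Date is imported: in this file, the simple name "Date" now means java.util.Date.
import java.util.Date;

public class TwoDates {

    /** Creates one Date from each package and reports which class each variable really holds. */
    static String describe() {
        Date utilDate = new Date(0L);                  // resolved through the import
        java.sql.Date sqlDate = new java.sql.Date(0L); // not imported: fully-qualified name required
        return utilDate.getClass().getName() + " / " + sqlDate.getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // java.util.Date / java.sql.Date
    }
}
```

Importing both classes would not compile: a single-type import of `java.sql.Date` collides with the single-type import of `java.util.Date`, which is exactly the ambiguity packages are meant to prevent.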
Why are packages and imports semantic antagonists?
Packages set context boundaries, imports extend existing boundaries.
What about `String`?
There are some classes that do not need importing, i.e. you can use them in your code without explicitly adding
an import, although they are not part of the current package.
An example is `String`. When you wrote your first HelloWorld program, you used `String` and called the `System`
class, but you did not need any imports:
// No import for String required here...
public class HelloWorld
{
public static void main(String[] args)
{
System.out.println("Hello, world!");
}
}
But we just learned that using any class outside the current package requires an import. What's going on here?
There's one exception to the rule
`java.lang.*` is auto-imported. The classes of this package are so general-purpose that all of them are automatically accessible: `String`, `System`, `Math`, ...
The asterisk
Imports are not necessarily class-based; you can also use the `*` wildcard to import entire packages:
- `import ca.uqam.mgl7010.animals.*;` will import all animals (classes) that might be in the package, but not those of its sub-packages.
- However, using wildcards is generally not recommended.
What could possibly go wrong, using wildcards (`*`)?
Wildcards import everything from a package. What if the content of that package changes (a new class is added) and suddenly there's a name conflict with something you already import? Better to import only what you definitely need, and to avoid unused imports.
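A classic instance of this problem, sketched as a small hypothetical class `WildcardClash`: both `java.util` and `java.awt` contain a type called `List`. With two wildcard imports, the simple name `List` becomes ambiguous, and an explicit single-type import is needed to settle it:

```java
// Hypothetical example class, for illustration only.
import java.awt.*;    // brings in java.awt.List (a GUI widget), among many others
import java.util.*;   // brings in java.util.List (a collection), among many others
// Without the next line, every use of the simple name "List" is a compile error
// ("reference to List is ambiguous"). A single-type import takes precedence over wildcards.
import java.util.List;

public class WildcardClash {

    /** Builds a small list; "List" unambiguously means java.util.List thanks to the explicit import. */
    static List<String> names() {
        List<String> names = new ArrayList<>(); // ArrayList only exists in java.util, so no clash here
        names.add("Maximilian");
        return names;
    }

    public static void main(String[] args) {
        System.out.println(names()); // [Maximilian]
    }
}
```

Replacing both wildcards with just the two imports actually used (`java.util.List` and `java.util.ArrayList`) avoids the issue altogether.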
Classpath limitations
- The `import` statement gives access to classes known to the JVM, which would otherwise be out of context.
- "Known to the JVM" means: the class is on the classpath (a list of directories and JAR files that the JVM scans for classes).
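To make "the classpath" a bit more tangible: the JVM exposes the classpath it was started with through the system property `java.class.path`. A minimal sketch (the class name `ClasspathPrinter` is made up) that lists its entries:

```java
// Hypothetical example class, for illustration only.
import java.io.File;
import java.util.Arrays;
import java.util.List;

public class ClasspathPrinter {

    /** Splits the classpath this JVM was started with into its entries (directories and JAR files). */
    static List<String> entries() {
        String classpath = System.getProperty("java.class.path", "");
        // Entries are separated by ':' on Linux/macOS and by ';' on Windows.
        return Arrays.asList(classpath.split(File.pathSeparator));
    }

    public static void main(String[] args) {
        entries().forEach(System.out::println);
    }
}
```

For instance, on Linux/macOS `java -cp .:somelib.jar ClasspathPrinter` should print `.` and `somelib.jar` on separate lines; the JVM does not check at this point whether `somelib.jar` actually exists.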
Classpath & imports
You can only import what's on the classpath. If you found a useful library on the internet, you cannot import it without first downloading it and adding it to the classpath. The JVM cannot scan the internet for you!
Software releases
In most cases, your client is less interested in your source code than in a single executable file. To them, it does not matter how your program works; it only matters that they can conveniently run it.
- Not convenient:
  - Having to install the Java development kit.
  - Having to install an IDE.
  - Having to compile the sources themselves.
  - Having to remember a command to run your code.
- Convenient:
  - A single file the client can just double-click, and your beautiful program starts.
Java has a dedicated file format, just for that: JAR files, or "Java ARchive" files.
JAR files
- JARs are zip files.
- They can contain whatever you put inside. To build a release, you should add:
  - All bytecode (the `*.class` files, which start with the magic number `CAFEBABE`).
  - A manifest, indicating (among other things) which of the classes serves as launcher.
- If you add the above, a JVM can readily consume (interpret and run) the JAR file.
  - Your client only needs a JVM, not a full Java development environment.
  - Since the JAR contains bytecode, it will run on any platform that has a compatible JVM.
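Since JARs are plain zip files, the JDK's `java.util.jar` and `java.util.zip` packages can be used to check this claim. The sketch below (hypothetical class `JarIsZip`) builds a tiny JAR in memory, with a manifest naming a launcher and a placeholder entry standing in for real bytecode, and then reads it back using only the generic zip API:

```java
// Hypothetical example class, for illustration only.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class JarIsZip {

    /** Builds a tiny JAR in memory: a manifest naming the launcher, plus one placeholder class entry. */
    static byte[] buildJar(String mainClass) {
        Manifest manifest = new Manifest();
        // The JDK's manifest writer skips all main attributes unless Manifest-Version is set.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, manifest)) {
            // ca.uqam.info.App -> ca/uqam/info/App.class
            jar.putNextEntry(new ZipEntry(mainClass.replace('.', '/') + ".class"));
            // Placeholder content: just the CAFEBABE magic number, not a real class file.
            jar.write(new byte[] {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE});
            jar.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Lists the entries using only the generic zip API: no JAR-specific code involved. */
    static List<String> entryNames(byte[] archive) {
        List<String> names = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            for (ZipEntry entry = zip.getNextEntry(); entry != null; entry = zip.getNextEntry()) {
                names.add(entry.getName());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        entryNames(buildJar("ca.uqam.info.App")).forEach(System.out::println);
    }
}
```

Running it should list `META-INF/MANIFEST.MF` followed by `ca/uqam/info/App.class`. The placeholder entry is not a valid class file, so this particular JAR is only for inspection, not for `java -jar`.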
Creating a JAR release
- Let's start with a simple program:
- Creating a JAR file from sources is relatively simple:
- This produces a JAR file: `MyDeliverable.jar`
  - Content:
  - With `MANIFEST.MF` content:
- The JAR file can be directly executed, using the JVM:
java -jar MyDeliverable.jar
JARs for other purposes
- JAR files are not necessarily self-contained applications.
- Since the JAR format serves only as a zip container, you can also wrap up code as a library. That means:
  - The JAR is not a program, only a bundle of useful functionality or interfaces.
  - The JAR does not need to specify a launcher class in its manifest, because it is not intended to be launched.
  - The JAR can optionally also contain documentation and source files, so whoever uses the JAR can understand how it works internally.
In general:
- When you build a release for a client, only behaviour matters. You only add bytecode and a manifest specifying the launcher.
- When you build a library for other developers, both behaviour and internal functioning matter. You also add source code and documentation.
Recap
JAR files are just containers. What to include depends on the intended audience.
Software releases with Maven
We could argue that manually selecting files and wrapping them up into a JAR is somewhat inconvenient.
Ideally:
- Creating a new release is fast and convenient
- Creating a new release is reliable
This is one of the key motivations for build systems: getting quickly, conveniently and reliably from project sources to something that can be delivered to the client.
In the following we'll take a look at how the build process is realized, using the build tool "Maven".
Maven project layout
Before we start, we have to respect a few constraints imposed by maven:
- Maven projects stipulate a specific internal structure, which is slightly different from a plain Java project.
- Luckily, we do not need to create the initial project structure manually, but can use maven to initialize our projects:
mvn archetype:generate \
-DgroupId=ca.uqam.info \
-DartifactId=MavenHelloWorld \
-DarchetypeArtifactId=maven-archetype-quickstart \
-DinteractiveMode=false
Note: Some shells (e.g. the Windows command prompt) do not understand the trailing `\` as a line continuation. Remove the
`\` characters and place everything on a single line.
Let's take apart the above command:
- `archetype` translates to "we want to use a project template".
  - There are different archetypes for different purposes. E.g. for a webapp or server backend we would have used a different `archetypeArtifactId`.
- Similar to any dependencies you might need, your own software should have a unique identifier. Other developers might actually end up using your software as a library!
  - `groupId` represents an organization-specific string; usually this is just the reversed domain name of the company you are working for. Since we are all at UQAM's computer science department, we use `ca.uqam.info`.
  - `artifactId` stands for the software you are building. It should be a descriptive name, indicating what your software does.
Once executed the above command will have created the following folder and file structure:
MavenHelloWorld/
├── pom.xml
└── src
├── main
│ └── java
│ └── ca
│ └── uqam
│ └── info
│ └── App.java
└── test
└── java
└── ca
└── uqam
└── info
└── AppTest.java
12 directories, 3 files
For now, we are only interested in the `pom.xml` and the initial class file `App.java`. We will deal with tests in a later lecture.
Initial App class
The initial `App.java` file is just a stub HelloWorld class:
package ca.uqam.info;
/**
* Hello world!
*
*/
public class App {
public static void main(String[] args) {
System.out.println("Hello World!");
}
}
Package structures
Notice how the initial `groupId` argument has affected the project's package naming and internal folder structure?
Initial pom file
The initial pom file, as created by the archetype, looks as follows:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>ca.uqam.info</groupId>
<artifactId>MavenHelloWorld</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>MavenHelloWorld</name>
<url>http://maven.apache.org</url>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>3.8.1</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
We already see a first dependency entry, namely for junit.
- In the spirit of good software development, maven assumed that we will test our software.
- However, junit is not part of standard java. Hence, we need a dependency block.
Anything peculiar about the dependency block?
The junit dependency block has an additional `<scope>test</scope>` entry. This is because maven distinguishes between dependencies needed by the software itself and dependencies needed only to compile and run its tests. JUnit is only needed for testing, not at runtime, therefore maven added the test scope.
Building with maven
Let's use maven to build the project, that is, create java bytecode. The corresponding command is mvn package.
- The first time you run `mvn package`, you'll actually see maven download junit.
  - There will be some logging messages:
- Once the command has finished, you'll find a new directory `target`, with the following content:
- Among other things, it contains exactly what we could have created manually, using the Java compiler:
  - A jar file
  - Class files for our source code
package produces a build
While the initial setup was a bit tedious, a project only needs to be configured once. From here on we can conveniently produce new builds (JARs) with the corresponding maven command mvn clean package.
Running Maven artifacts
Running the generated artifacts is almost identical to running manually created binaries.
Class files
We can run the generated class files without issues. Note, however, that we must be at the root of the package structure (for maven, `target/classes`) to call our program:
- Calling the `App.class` program from the wrong location:
- Calling the `App.class` program from the package root location:
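Why the location matters: the JVM looks up a class such as `ca.uqam.info.App` by turning each package segment into a directory and searching for `ca/uqam/info/App.class` relative to each classpath entry (by default, the current directory). A small sketch of that mapping (hypothetical class `ClassFileLocator`, not the JVM's actual class loader code):

```java
// Hypothetical example class, for illustration only.
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ClassFileLocator {

    /** Maps a fully-qualified class name to the relative path where its class file must live. */
    static String relativeClassFile(String fullyQualifiedName) {
        // Each package segment is one directory: ca.uqam.info.App -> ca/uqam/info/App.class
        return fullyQualifiedName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        // Usage: java ClassFileLocator <classpath-root> <fully.qualified.ClassName>
        Path root = Paths.get(args[0]);
        Path classFile = root.resolve(relativeClassFile(args[1]));
        System.out.println(classFile + (Files.exists(classFile) ? " found" : " NOT found"));
    }
}
```

After `mvn package`, passing `target/classes` as root should find the class file for `ca.uqam.info.App`, while passing the project root itself should not.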
Jar files
Running the jar file is not possible without specifying the main class, as by default the manifest does not contain a
reference to a launcher class.
- Trying to run the jar file without arguments:
- When we inspect the jar's internal MANIFEST file, we see there is no launcher specified:
- Running the jar file by passing it as classpath argument and naming the main class explicitly:
Note: Maven of course offers a way to integrate a working MANIFEST into the produced jar file. More on that in a bit.
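The manifest check can also be done from Java code. This minimal sketch (class name `ManifestInspector` is hypothetical) uses `java.util.jar.JarFile` to read the `Main-Class` attribute, the entry that `java -jar` relies on to find the launcher:

```java
// Hypothetical example class, for illustration only.
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class ManifestInspector {

    /** Returns the Main-Class attribute, or null when the manifest names no launcher. */
    static String mainClassOf(Manifest manifest) {
        if (manifest == null) {
            return null; // the JAR has no META-INF/MANIFEST.MF at all
        }
        return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    }

    /** Opens a JAR on disk and reads the launcher from its manifest. */
    static String mainClassOf(Path jarPath) {
        try (JarFile jar = new JarFile(jarPath.toFile())) {
            return mainClassOf(jar.getManifest()); // getManifest() returns null if there is no manifest
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String launcher = mainClassOf(Paths.get(args[0]));
        System.out.println(launcher == null
                ? "No Main-Class entry: 'java -jar' cannot start this JAR"
                : "Main-Class: " + launcher);
    }
}
```

Called as `java ManifestInspector target/MavenHelloWorld-1.0-SNAPSHOT.jar` on the JAR produced with the default configuration, it should report that no `Main-Class` entry exists.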
A clean build
The target directory accumulates all artifacts ever built. If you modify your code or pom.xml and re-build, new files might be added and it can be confusing to distinguish between old and new files.
A good trick is to always use the clean argument before building, which wipes the entire target directory: Build your project systematically with **mvn clean package**
Dependencies
Most of the time you do not want to program everything from scratch (see the previous lecture on reuse-oriented development).
JSON example
- We will now look at how compiling and execution change when additional libraries are involved.
- Imagine we want to serialize a Java object, i.e. create a machine-readable string representation of it:
  - A student object, as created by `new Student(34, "Maximilian", "Schiedermeier")`, should be serialized to:
Manual string creation
- Of course, I could manually construct a JSON String:
// Create student
Student myStudent = new Student(34, "Maximilian", "Schiedermeier");
// Export student: every field, quote and line break is assembled by hand
String jsonString = "{\n"
        + "\t\"age\": " + myStudent.getAge() + ",\n"
        + "\t\"firstName\": \"" + myStudent.getFirstName() + "\",\n"
        + "\t\"lastName\": \"" + myStudent.getLastName() + "\"\n"
        + "}";
System.out.println(jsonString);
- But what if I need to export another object? What if the object structure changes?
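To see why a generic solution scales better, here is a toy sketch of the idea that libraries such as Gson take care of for you: instead of hand-writing one string per class, inspect the object's structure at runtime. This is not Gson's code; it assumes Java 16+ and a hypothetical `Student` record standing in for the lecture's `Student` class, and it deliberately skips escaping, nested objects and collections:

```java
// Hypothetical example class, for illustration only (toy serializer, not a real JSON library).
import java.lang.reflect.RecordComponent;
import java.util.StringJoiner;

public class ToyJson {

    /** Hypothetical stand-in for the lecture's Student class. */
    record Student(int age, String firstName, String lastName) {}

    /** Serializes any record into a flat JSON object, by asking the record for its components. */
    static String toJson(Record obj) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        // Record components are returned in declaration order.
        for (RecordComponent component : obj.getClass().getRecordComponents()) {
            try {
                Object value = component.getAccessor().invoke(obj);
                String rendered = (value instanceof String) ? "\"" + value + "\"" : String.valueOf(value);
                json.add("\"" + component.getName() + "\":" + rendered);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot read " + component.getName(), e);
            }
        }
        return json.toString();
    }

    public static void main(String[] args) {
        System.out.println(toJson(new Student(34, "Maximilian", "Schiedermeier")));
        // {"age":34,"firstName":"Maximilian","lastName":"Schiedermeier"}
    }
}
```

Adding a field to the record changes the output automatically, which the hand-written concatenation cannot do. Handling all the remaining cases robustly is precisely the work a mature library has already done.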
Using a library
Conversion to JSON is a classic problem, which has been solved many times:
- Why reinvent the wheel when I can just reuse existing code?
- Maybe I can find something on the internet that solves my issue?
Indeed, a convenient way to simplify the code would be to reuse Google's existing Gson library:
import com.google.gson.Gson;
class MainWithGson {
public static void main(String[] args) {
// Create student
Student myStudent = new Student(34, "Maximilian", "Schiedermeier");
// Export student
String jsonString = new Gson().toJson(myStudent);
System.out.println(jsonString);
}
}
- However, we are now using code that is not ours, and both the compiler and the JVM need to know about this
dependency.
- Download the Gson library JAR file:
- This time we compile with the `-cp` (classpath) argument, telling the compiler that there are additional classes to consider:
javac -cp gson-2.11.0.jar *.java
- Likewise, when running the compiled bytecode, the JVM must know about the Gson library (on Windows, separate classpath entries with `;` instead of `:`):
java -cp gson-2.11.0.jar:. MainWithGson
What could possibly go wrong?
By reusing the Google Gson library we have created a "dependency". Without that library at hand, our code can be neither compiled nor executed.
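At runtime, the missing dependency surfaces as soon as the JVM tries to load a Gson class: `MainWithGson` would then fail with a `NoClassDefFoundError`. A minimal sketch (hypothetical class `ClasspathCheck`) that asks the class loader whether a class is available, without initializing it:

```java
// Hypothetical example class, for illustration only.
public class ClasspathCheck {

    /** True if the class can be found on the current classpath (looked up, but not initialized). */
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("Gson available: " + isOnClasspath("com.google.gson.Gson"));
    }
}
```

Running it once with plain `java ClasspathCheck` and once with `java -cp gson-2.11.0.jar:. ClasspathCheck` should print `false` and then `true`, provided the Gson JAR sits in the current directory.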
Dependency management
Dependency management aims to simplify the above procedure, by specifying which dependencies exist (and where to get them), instead of manually managing JAR files.
In essence, the ingredients for any dependency management tool are:
- An online repository, systematically archiving all versions of all libraries
- A local configuration file, describing for every dependency:
- A unique identifier, e.g. "Google GSON library"
- The specific version, e.g. "2.11.0"
Advantages:
- Configuration files are textual and lightweight. They can be stored in the project itself.
- Configuration files are written in a machine-interpretable syntax. A tool can collect all dependencies for you and even modify the classpath when needed.
- You have a clear trace of all exact dependency versions. You can easily scan your project for security vulnerabilities.
- No damage is done if you lose a library JAR: you can easily retrieve it again from the repository.
Maven
Maven is a build system for Java that offers exactly these two components:
- A central repository, Maven Central, containing most publicly released Java libraries (searchable at https://central.sonatype.com).
- A project configuration file that (among other things) lists all project dependencies: `pom.xml`
  - POM stands for "Project Object Model"
  - XML is a machine-readable file format
  - A dependency is stated as:
Instead of ourselves downloading JAR files and placing them on the classpath, we ask maven to ensure all listed dependencies are in place.
Never ever
Never ever manually interfere with dependency management in a maven-ready project. If you need an additional library, edit the pom.xml, but never ever drag-and-drop a JAR file into your project, or edit the classpath.
Repositories
The local repository:
- Maven also maintains a local repository on your computer, in the `~/.m2` directory (by default in `~/.m2/repository`). Every library you ever used is cached there.
- The local repository has two purposes:
  - Performance: It is faster to reuse a cached JAR file than to download it from the internet every time.
  - Offline mode: You might not be online all the time. With the dependencies cached, you can develop without an internet connection.
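Cached artifacts are stored under a path derived from their coordinates: the `groupId` dots become directories, followed by `artifactId`, `version`, and finally `artifactId-version.jar`. The sketch below (hypothetical class `LocalRepoPath`, assuming the default local repository location `~/.m2/repository`) computes where the Gson JAR from the JSON example, with coordinates `com.google.code.gson:gson:2.11.0`, would be cached:

```java
// Hypothetical example class, for illustration only.
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalRepoPath {

    /** Relative location of an artifact's JAR inside a Maven repository (default layout). */
    static String relativeJarPath(String groupId, String artifactId, String version) {
        // groupId dots become directories, then artifactId/version/artifactId-version.jar
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version + "/"
                + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        // Assumes the default local repository location: ~/.m2/repository
        Path localRepo = Paths.get(System.getProperty("user.home"), ".m2", "repository");
        System.out.println(localRepo.resolve(relativeJarPath("com.google.code.gson", "gson", "2.11.0")));
    }
}
```

Remote repositories such as Maven Central use the same layout, which is why a JAR can be located from its coordinates alone.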
Third party repositories:
- You might encounter situations where you need a library that is not in the official maven central repository.
- Examples:
- Libraries that are not free to use, and therefore not publicly accessible
- Your own libraries, that you do not want to upload
- Anyone can set up their own repository
  - An online repository is essentially a directory structure of files, served by an HTTP web server.
- However, by default maven does not know about third-party repositories. If you want maven to search your own
repository, you need to edit the `pom.xml` file and indicate the location of your third-party repository.
Maven's dependency resolution algorithm
To build a project, maven tries to satisfy all dependencies with corresponding artifacts (the JAR files, and some metadata). To satisfy a dependency, maven will:
- First check the local `.m2` repository for a cached file.
- If not cached, check whether any third-party repository is defined. (Usually there are none defined.)
- Contact the official maven repository servers to retrieve the needed artifact.
flowchart LR
resolve[\Resolve dependency/]
resolve --> localcheck{Artifact in local repo ?}
localcheck -. yes .-> done([Success])
localcheck ==>|no| remotecheck{3rd party repo defined ?}
remotecheck -. yes .-> 3rdpartycheck{Artifact in 3rd party ?}
3rdpartycheck -. yes .-> done
3rdpartycheck -. no .-> centralcheck{Artifact in central ?}
remotecheck ==>|no| centralcheck
centralcheck ==>|yes| done
centralcheck -. no .-> fail([Fail])
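The flowchart can be paraphrased in a few lines of Java. This is a simplified sketch, not Maven's implementation: repositories are modelled as hypothetical in-memory sets of `groupId:artifactId:version` strings, and a successful remote lookup is cached locally, which is what lets a second build take the topmost path:

```java
// Hypothetical example class, for illustration only (not Maven's actual resolver).
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

public class ResolveSketch {

    /** Hypothetical stand-in for a repository: a name plus the coordinates it can serve. */
    record Repo(String name, Set<String> artifacts) {}

    /**
     * Local cache first, then any third-party repositories, then central.
     * Returns the name of the repository that satisfied the dependency, or empty on failure.
     */
    static Optional<String> resolve(String coordinates, Repo local, List<Repo> thirdParty, Repo central) {
        if (local.artifacts().contains(coordinates)) {
            return Optional.of(local.name());
        }
        List<Repo> remotes = new ArrayList<>(thirdParty);
        remotes.add(central);
        for (Repo remote : remotes) {
            if (remote.artifacts().contains(coordinates)) {
                local.artifacts().add(coordinates); // cache the download for the next build
                return Optional.of(remote.name());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String gson = "com.google.code.gson:gson:2.11.0";
        Repo local = new Repo("local", new HashSet<>());
        Repo central = new Repo("central", Set.of(gson));
        System.out.println(resolve(gson, local, List.of(), central)); // Optional[central]
        System.out.println(resolve(gson, local, List.of(), central)); // Optional[local]
    }
}
```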
What happens when a project is built for the second time?
Maven will already have all dependencies cached. It will take the topmost path.
Compile time vs runtime
By default, maven uses dependencies only to build the project: the produced JAR contains only your own classes, so we cannot run it without manually providing all dependencies as classpath arguments.
In a later lecture, we'll learn how to configure maven to produce a self-contained JAR, which can be used as-is.
The problem with JARs
JARs are a straightforward way to pass around functionality, but as projects grow, several issues tend to arise:
- The more dependencies you have, the more JARs you carry with you.
- Where to store the JARs? In the repo? What if you need the same JAR in multiple projects, do you store it twice?
- Every time a new developer joins the project, you need to pass on all the JARs and have them manually extend their classpath.
- Just compiling your project becomes somewhat tedious, because you always have to check that a long list of dependencies is correctly installed.
- The client complains that your software is not running. Most likely they forgot to install a JAR, or installed the wrong version. How do you find out which one it is?
- A JAR is a snapshot, it is one fixed version.
- What if a security vulnerability was found in a JAR you've downloaded. How would you know?
- You lost a JAR that you need to build your project, where do you find it again? Which version was it again that works with your project?
A true horror story
In a previous research lab we had a piece of software that was particularly hard to work with.
Before a developer could even write a single line of code, they needed to spend at least 30 minutes to an hour on manual project configuration.
The project even had JARs where no one knew exactly where they came from, whether they were still needed, or what exactly they contributed.
There was a rumor that an intern, who had been around some 3 years earlier, had created the JARs.
But the intern was long gone and no one had their contact information. At the same time, these were fat software artefacts that bloated our software executable.
Countless developer hours were wasted because of poor dependency management.
Maven plugins
Apart from downloading and caching dependencies for use on the local classpath, maven also has a second purpose: modifying the build pipeline.
- By default, all that happens on `mvn clean package` is the standard build: compiling the source files (using any specified libraries), running the unit tests, and packaging the compiled classes into a JAR.
- But most of the time you want to do more, e.g. produce human-readable documentation, or create a build artifact with all dependencies included.
- Maven's behaviour regarding the build pipeline can be modified with `plugins`.
A plugin is a short (or sometimes not so short) snippet in a dedicated plugins section of the pom.xml. There can be
as many plugins as you want in the pom.xml:
<project>
<build>
<plugins>
<!-- First plugin details -->
<plugin>
...
</plugin>
<!-- Second plugin details -->
<plugin>
...
</plugin>
...
</plugins>
</build>
</project>
- Every plugin has a default location in the build pipeline, because most tasks only make sense at a given moment of the process.
  - Example: building a jar with all dependencies inside should happen at the end, after all classes are compiled, all tests have passed, etc.
We'll look at how plugins work in more detail, and at maven's understanding of plugin variation points in the build process, in a future lecture. For now, we'll look at some short, useful plugin examples.
Exec
The exec plugin lets you specify a main class for your code that should be called by default when the code is
executed.
- This is the closest equivalent to the IDE's infamous green triangle ("▶").
- All you need to do is point to the main class to be called on execution:
<!-- Specify main class for exec goal -->
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>exec-maven-plugin</artifactId>
<version>1.6.0</version>
<executions>
<execution>
<goals>
<goal>java</goal>
</goals>
</execution>
</executions>
<configuration>
<mainClass>full.package.name.YourMainClassLauncher</mainClass>
</configuration>
</plugin>
Once the plugin is defined, you can conveniently run your program with: mvn clean compile exec:java
Add an IDE maven run configuration
Once the exec plugin is defined in your pom.xml, modify the IDE's "Run Configuration" (a.k.a. what is called when the green triangle is clicked) to simply call maven's exec plugin!
Maven Jar
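The Maven jar plugin allows you to add additional information when your program is packaged into a JAR.
- Previously we've seen that a maven-produced JAR cannot be launched without explicitly stating the main class.
- The `maven-jar-plugin` lets you provide a default main class, which is then listed in the JAR's manifest.
- The snippet below uses the related `maven-assembly-plugin` instead: it writes the same manifest entry, and additionally bundles all dependencies into a single JAR (`jar-with-dependencies`), so the result can be run as-is.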
<!-- specify main class for JAR manifest-->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
<configuration>
<archive>
<manifest>
<addClasspath>true</addClasspath>
<mainClass>ca.uqam.info.MainWithGson</mainClass>
</manifest>
</archive>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
<finalName>MainWithGson</finalName>
<appendAssemblyId>true</appendAssemblyId>
</configuration>
</plugin>
JavaDoc
In the second lab session you've learned a command to manually extract all JavaDoc information from your code, to generate a human-readable website. The JavaDoc plugin lets you automate this step as a standard component of the build process.
- Enabling the JavaDoc plugin is also good practice, as you directly see whether there are issues in your documentation whenever you build your code.
- Ideally the plugin is configured to fail on warnings, so no developer is ever tempted to work with or produce
undocumented code.
  - "I'll document that later" easily turns into "I'll document that never."
<!-- Plugin to ensure all functions are commented and generate javadoc -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-javadoc-plugin</artifactId>
<version>3.4.1</version>
<configuration>
<javadocExecutable>${java.home}/bin/javadoc</javadocExecutable>
<reportOutputDirectory>${project.reporting.outputDirectory}/docs</reportOutputDirectory>
<failOnWarnings>true</failOnWarnings>
<quiet>true</quiet>
</configuration>
<executions>
<execution>
<id>attach-javadocs</id>
<goals>
<goal>jar</goal>
</goals>
</execution>
</executions>
</plugin>
Use a snippet library
Most developers do not manually create their pom.xml line by line, but stitch it together from prepared blocks. Use a snippet library, e.g. https://m5c.github.io/MavenSnippetLibrary/ to rapidly create a working build pipeline.
Literature
Inspiration and further reading for curious minds: