Build Systems (Basics)

In this unit we'll cover the basics of build systems and illustrate how they work, using maven as an example. We'll start with a short recapitulation of language compilation, notably in the context of the java programming language, and then investigate challenges associated with the assembling of binaries. Finally, we take a look at the core features of maven, notably dependency management, and make first attempts to customize build system behaviour with a configuration file.

Lecture upshot

Build systems have one purpose: make sure your source code can be reliably translated into a usable product. Although this sounds straightforward, it is anything but a simple task. Build systems are a powerful and highly configurable means to bring some order and reliability into the path from source code to product.

Recap: Packages, import and classpath

For programming novices the import statement is often perceived as "something I have to put at the start of the class, so an error message goes away".
To answer the question of why imports are needed in the first place (wouldn't it be easier if everything were automatically imported, all the time ?) we'll now go through the basic principles of:

  • How code context is actually structured into java packages, and why it matters.
  • Why you need imports to access some classes, but not others, and some exceptions to the rules.
  • Why it is not advised to ever import more than is absolutely needed.

Packages

  • Java organizes code into packages
    • Packages provide context for a set of related classes.
    • Example: Mouse.class might be a class imitating the behaviour of a biological mouse, or it might be a class to deal with the human interface device.
    • Surrounding packages animals and devices will remove the ambiguity.
  • By default, you can only access classes defined within the same package (or java.lang)
    • Example: When working in the animals package, the statement new Mouse() is not ambiguous. It will always create an instance of the class imitating the behaviour of a biological mouse.
---
title: Example of ambiguous context
---
classDiagram
    namespace animals {
        class Mouse {
            +String species
            +void squeak()
        }
    }

    namespace devices {
        class Ꮇouse {
            +String brand
            +void click()
            +void scroll()
        }
    }

Imports

  • By default, it is not possible to instantiate (or even call) classes outside your current package.
    • This protects you from accidental ambiguity.
    • Example: When you're working outside the animals and devices package, what should happen when you create a new Mouse() ? It is not clear what you want.
  • Imports allow you to extend a class's current package context with additional classes:
    • import animals.Mouse will allow you to use the model of the biological mouse (new Mouse()), but not the human interface device.
classDiagram
    namespace animals {
        class Mouse {
            +String species
            +void squeak()
        }
    }

    namespace devices {
        class Ꮇouse {
            +String brand
            +void click()
            +void scroll()
        }
    }

    class Main

    Mouse <.. Main: import

Illustration of Main using an import animals.Mouse statement.

Why are packages and imports semantic antagonists ?

Packages set context boundaries, imports extend existing boundaries.
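The same effect can be observed with classes that ship with the JDK. The following is a minimal sketch, assuming the two unrelated JDK classes java.util.Date and java.sql.Date stand in for the two Mouse classes; the class name ImportDemo is made up for illustration.

```java
import java.util.Date;

public class ImportDemo {

  // Thanks to the import above, the simple name "Date" means java.util.Date.
  public static String importedDate() {
    return Date.class.getName();
  }

  // The other Date was not imported, but stays reachable
  // through its fully qualified name.
  public static String otherDate() {
    return java.sql.Date.class.getName();
  }

  public static void main(String[] args) {
    System.out.println(importedDate());
    System.out.println(otherDate());
  }
}
```

The import only decides what the short name refers to. Both classes exist side by side, because their packages keep them apart.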

What about String

There are some classes that do not need importing, i.e. you can use them in your code without explicitly adding an import, although they are not part of the current package.

An example is String. When you wrote your first HelloWorld program, you defined a String and called the System class, but you did not need any imports:

// No import for String required here...

public class HelloWorld
{
  public static void main(String[] args)
  {
    System.out.println("Hello, world!");
  }
}

But we just learned that calling any non-package class requires an import. What's going on here ?

There's one exception to the rule

java.lang.* is auto-imported. The classes of this package are so commonly needed that all of them are automatically accessible: String, System, Math, ...

The asterisk

Imports are not necessarily class based, you can also use the * wildcard to import entire packages:

  • import ca.uqam.mgl7010.animals.* will import all animals that might be in the package.
  • However, using wildcards is generally not recommended.
What could possibly go wrong, using wildcards (*) ?

Wildcards import everything from a package. What if the content of that package changes (new class added) and suddenly there's a conflict with something you already import? Better just import only what you definitely need - avoid unused imports.
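A well-known instance of such a conflict exists inside the JDK: both java.util and java.sql contain a class named Date. With only the two wildcard imports below, any use of the simple name Date is rejected by the compiler as ambiguous. The sketch shows the usual way out, an explicit single-class import, which takes precedence over wildcards. The class name WildcardDemo is made up for illustration.

```java
import java.sql.*;
import java.util.*;
import java.util.Date; // explicit import: wins over both wildcards

public class WildcardDemo {

  // Without the explicit import above, this method would not compile,
  // because "Date" could mean java.util.Date or java.sql.Date.
  public static String resolvedDate() {
    return Date.class.getName();
  }

  public static void main(String[] args) {
    System.out.println(resolvedDate());
  }
}
```

Importing only java.util.Date in the first place avoids the situation entirely.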

Classpath limitations

  • The import statement gives access to classes known to the JVM, which would otherwise be out of context.
    • "Known to the JVM" means: A class is on the classpath (a variable, pointing to all directories scanned by the JVM for classes.)

Classpath & imports

You can only import what's on the classpath. If you found a useful library on the internet, you cannot import it without first downloading it and adding it to the classpath. The JVM cannot scan the internet for you!
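Whether a class is known to the JVM can be checked at runtime. The following is a minimal sketch; the class name ClasspathProbe and the helper isAvailable are made up for illustration, while java.class.path is the standard system property holding the classpath.

```java
public class ClasspathProbe {

  // Returns true if the JVM can locate the given fully qualified class name.
  public static boolean isAvailable(String className) {
    try {
      // "false": only locate the class, do not initialize it.
      Class.forName(className, false, ClasspathProbe.class.getClassLoader());
      return true;
    } catch (ClassNotFoundException | LinkageError e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println("Classpath: " + System.getProperty("java.class.path"));
    System.out.println("java.lang.String available: " + isAvailable("java.lang.String"));
    System.out.println("com.google.gson.Gson available: " + isAvailable("com.google.gson.Gson"));
  }
}
```

The last line prints true only if a Gson JAR was placed on the classpath when starting the program.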

Software releases

In most cases, your client is less interested in your source code than in a single executable file. To them, it does not matter how your program works, it only matters that they can conveniently run it.

  • Not convenient:

    • Having to install the Java development kit.
    • Having to install an IDE.
    • Having to compile the sources themselves.
    • Having to remember a command to run your code.
  • Convenient:

    • A single file the client can just double-click, and your beautiful program starts.

Java has a dedicated file format, just for that: JAR files, or "Java ARchive" files.

JAR files

  • JARs are zip files.
  • They can contain whatever you put inside. To build a release, you should add:
    • All bytecode: (*.class files, CAFEBABE...)
    • A manifest, hinting (amongst others) which of the classes serves as launcher.
  • If you add the above, a JVM can readily consume (interpret and run) the JAR file.
    • Your client only needs a JVM, not a full Java development environment.
    • Since the JAR contains bytecode, it will run on any platform that provides a suitable JVM.
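That a JAR is really just a zip file with a manifest can be verified with the JDK's own APIs. The sketch below writes a tiny JAR with java.util.jar and then opens it with the plain zip API. The class name JarSketch and the placeholder entry are made up for illustration; a real release would contain *.class files instead.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.jar.Attributes;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;
import java.util.stream.Collectors;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

public class JarSketch {

  // Writes a tiny JAR: a manifest naming the launcher, plus one placeholder entry.
  public static void writeJar(Path jar, String mainClass) throws IOException {
    Manifest manifest = new Manifest();
    Attributes attributes = manifest.getMainAttributes();
    // Without a Manifest-Version the other attributes are not written out.
    attributes.put(Attributes.Name.MANIFEST_VERSION, "1.0");
    attributes.put(Attributes.Name.MAIN_CLASS, mainClass);
    try (OutputStream out = Files.newOutputStream(jar);
         JarOutputStream jarOut = new JarOutputStream(out, manifest)) {
      jarOut.putNextEntry(new ZipEntry("placeholder.txt"));
      jarOut.write("stand-in for real class files".getBytes(StandardCharsets.UTF_8));
      jarOut.closeEntry();
    }
  }

  // Opens the JAR with the plain zip API and lists the names of its entries.
  public static List<String> listEntries(Path jar) throws IOException {
    try (ZipFile zip = new ZipFile(jar.toFile())) {
      return zip.stream().map(ZipEntry::getName).collect(Collectors.toList());
    }
  }

  // Convenience wrapper: writes a temporary JAR and returns its entry names.
  public static List<String> demo(String mainClass) {
    try {
      Path jar = Files.createTempFile("sketch", ".jar");
      try {
        writeJar(jar, mainClass);
        return listEntries(jar);
      } finally {
        Files.deleteIfExists(jar);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(demo("HelloWorld"));
  }
}
```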

Creating a JAR release

  • Let's start with a simple program:

    public class HelloWorld {
      public static void main(String[] args) {
        System.out.println("Bonjour, INF2050!");
      }
    }
    

  • Creating a JAR file from sources is relatively simple:

    # Compile all java files to *.class files, place them in a new build directory
    javac -d ./build *.java
    
    # Enter build directory
    cd build
    
    # Create a java archive (JAR) file, using all *.class files.
    # Add MANIFEST.MF pointing to HelloWorld as launcher class.
    jar cfe MyDeliverable.jar HelloWorld *.class
    

  • This produces a JAR file: MyDeliverable.jar

    • Content:

      MyDeliverable.jar
       ├── HelloWorld.class
       └── META-INF
           └── MANIFEST.MF
      

    • With MANIFEST.MF content:

      Manifest-Version: 1.0
      Created-By: 22.0.2 (Oracle Corporation)
      Main-Class: HelloWorld
      

  • The JAR file can be directly executed, using the JVM: java -jar MyDeliverable.jar

JARs for other purposes

  • JAR files are not necessarily self-contained applications.
  • Since the JAR format serves only as a zip container, you can also wrap up code as a library. That means:
    • The JAR is not a program, only a bundle of useful functionality, or interfaces.
    • The JAR does not need to specify a launcher class in its manifest, because it is not intended to be launched.
    • The JAR can optionally also contain documentation and source files, so whoever uses the JAR can understand how it internally works.

In general:

  • When you build a release for a client, only behaviour matters. You only add bytecode and a manifest specifying the launcher.
  • When you build a library for other developers, behaviour and internal functioning matter. You also add source code and documentation.

Recap

JAR files are just containers. What to include depends on the intended audience.

Software releases with Maven

We could argue that manually selecting files and wrapping them up into a JAR is somewhat inconvenient.

Ideally:

  • Creating a new release is fast and convenient
  • Creating a new release is reliable

This is one of the key motivations for build systems: getting quickly, conveniently and reliably from project sources to something that can be delivered to the client.

In the following we'll take a look at how the build process is realized, using the build tool "Maven".

Maven project layout

Before we start, we have to respect a few constraints imposed by maven:

  • Maven stipulates a specific internal project structure, which is slightly different from plain java projects.
  • Luckily, we do not need to create the initial project structure manually, but can use maven to initialize our projects:
  mvn archetype:generate \
  -DgroupId=ca.uqam.info \
  -DartifactId=MavenHelloWorld \
  -DarchetypeArtifactId=maven-archetype-quickstart \
  -DinteractiveMode=false

Note: The trailing \ continues a command on the next line in Unix shells. Some systems (e.g. the Windows command prompt) do not support this. There, remove the \ and place everything on a single line.

Let's take apart the above command:

  • archetype translates to "we want to use a project template"
    • There are different archetypes, for different purposes. E.g. for a webapp, or server backend we would have used a different archetypeArtifactId.
  • Similar to any dependencies you might need, your own software should have a unique identifier. Other developers might actually end up using your software as a library!
    • groupId represents an organization-specific string. Usually this is just the reversed domain name of the company you are working for. Since we are all at UQAM's computer science department, we use ca.uqam.info
    • artifactId stands for the software you are building. It should be a descriptive name, indicating what your software does.

Once executed the above command will have created the following folder and file structure:

MavenHelloWorld/
├── pom.xml
└── src
    ├── main
    │   └── java
    │        └── ca
    │            └── uqam
    │                └── info
    │                    └── App.java
    └── test
        └── java
            └── ca
                └── uqam
                    └── info
                        └── AppTest.java

12 directories, 3 files

For now, we are only interested in the pom.xml and the initial class file App.java. We will deal with tests in a later lecture.

Initial App class

The initial App class is just a stub HelloWorld class:

package ca.uqam.info;

/**
 * Hello world!
 *
 */
public class App {
  public static void main(String[] args) {
    System.out.println("Hello World!");
  }
}

Package structures

Notice how the initial groupId argument has affected the project's package naming and internal folder structure ?
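The mapping from package name to folder structure is purely mechanical: every dot becomes a directory separator below src/main/java. A minimal sketch, with the class name PackageLayout and the helper sourcePath made up for illustration:

```java
public class PackageLayout {

  // Maps a package and class name to the source file location Maven expects.
  public static String sourcePath(String packageName, String className) {
    return "src/main/java/" + packageName.replace('.', '/') + "/" + className + ".java";
  }

  public static void main(String[] args) {
    // prints src/main/java/ca/uqam/info/App.java
    System.out.println(sourcePath("ca.uqam.info", "App"));
  }
}
```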

Initial pom file

The initial pom file, as created by the archetype, looks as follows:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>ca.uqam.info</groupId>
    <artifactId>MavenHelloWorld</artifactId>
    <packaging>jar</packaging>
    <version>1.0-SNAPSHOT</version>
    <name>MavenHelloWorld</name>
    <url>http://maven.apache.org</url>

    <dependencies>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>
</project>

We already see a first dependency entry, namely for junit.

  • In the spirit of good software development, maven assumed that we will test our software.
  • However, junit is not part of standard java. Hence, we need a dependency block.

Anything peculiar about the dependency block ?

The junit dependency block has an additional <scope>test</scope> entry. This is because maven makes a distinction between dependencies needed to build a software, dependencies needed to test it, and dependencies needed to run it. Junit is only needed for testing, not at runtime, therefore maven added a test scope tag.

Building with maven

Let's use maven to build the project, that is, create java bytecode. The corresponding command is mvn package.

  • The first time we run mvn package, we'll actually see how maven downloads junit, along with the plugins it needs for the build.

    • There will be some logging messages:
      ...
      Downloading from central: 
        https://repo.maven.apache.org/maven2/org/apache
        /maven/surefire/common-java5/3.2.5/common-java5-3.2.5.pom
      Downloaded from central: 
        https://repo.maven.apache.org/maven2/org/apache
        /maven/surefire/common-java5/3.2.5/common-java5-3.2.5.pom
      (2.8 kB at 156 kB/s)
      ...
      
  • Once the command is finished, we'll find a new directory target, with the following content:

    target/
    ├── MavenHelloWorld-1.0-SNAPSHOT.jar
    ├── classes
    │        └── ca
    │              └── uqam
    │                     └── info
    │                            └── App.class
    ...
    
    21 directories, 10 files
    

  • Among others, this is exactly the same outcome as we could have created manually, using the java compiler:

    • A jar file
    • Class files for our source code

package produces a build

While the initial setup was a bit tedious, a project only needs to be configured once. From here on we can conveniently produce new builds (JARs) with the corresponding maven command mvn clean package.

Running Maven artifacts

Running the generated artifacts is almost identical to running manually created binaries.

Class files

We can run the generated class files without issues. Note, however, that we must be at the package structure's root to call our program:

  • Calling App.class program from wrong location:

    $ cd target/classes/ca/uqam/info; java App
    Error: Could not find or load main class App
    Caused by: java.lang.NoClassDefFoundError: App
    (wrong name: ca/uqam/info/App)
    

  • Calling App.class program from package root location:

    $ cd target/classes/
    $ tree
    .
    └── ca
        └── uqam
            └── info
                └── App.class
    $ java ca/uqam/info/App
    Hello World!
    

Jar files

Running the jar file is not possible without specifying the main class, as by default the manifest does not contain a reference to a launcher class.

  • Trying to run jar file without arguments:

    $ cd target; java -jar MavenHelloWorld-1.0-SNAPSHOT.jar
    no main manifest attribute, in MavenHelloWorld-1.0-SNAPSHOT.jar
    

  • When we inspect the jar internal MANIFEST file, we see there is no launcher specified:

    Manifest-Version: 1.0
    Created-By: Maven JAR Plugin 3.4.1
    Build-Jdk-Spec: 22
    

  • Running the jar file by placing it on the classpath and naming the main class explicitly:

    $ java -cp MavenHelloWorld-1.0-SNAPSHOT.jar ca.uqam.info.App
    Hello World!
    

Note: Maven of course offers a way to integrate a working MANIFEST into the produced jar file. More on that in a bit.
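The difference between the two manifests seen so far can also be checked programmatically. The sketch below parses both manifest texts with the JDK's java.util.jar.Manifest class and looks for the Main-Class entry. The class name ManifestCheck and the helper mainClassOf are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class ManifestCheck {

  // Returns the Main-Class entry of the given manifest text, or null if absent.
  public static String mainClassOf(String manifestText) {
    try {
      byte[] bytes = manifestText.getBytes(StandardCharsets.UTF_8);
      Manifest manifest = new Manifest(new ByteArrayInputStream(bytes));
      return manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    String mavenDefault = "Manifest-Version: 1.0\n"
        + "Created-By: Maven JAR Plugin 3.4.1\n"
        + "Build-Jdk-Spec: 22\n";
    String handMade = "Manifest-Version: 1.0\n"
        + "Created-By: 22.0.2 (Oracle Corporation)\n"
        + "Main-Class: HelloWorld\n";
    System.out.println("maven default: " + mainClassOf(mavenDefault)); // null
    System.out.println("hand made:     " + mainClassOf(handMade));     // HelloWorld
  }
}
```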

A clean build

The target directory accumulates the artifacts of previous builds. If you modify your code or pom.xml and re-build, new files might be added, and it can be confusing to distinguish between old and new files. A good trick is to always use the clean argument before building, which wipes the entire target directory. Build your project systematically with mvn clean package

Dependencies

Most of the time you do not want to program everything from scratch (see previous lecture on reuse-oriented development).

JSON example

  • We will now look at how compiling and execution change when additional libraries are involved.
  • Imagine we want to serialize a java object, that is, create a machine-readable string representation of it:

    class Student {
      private final int age;
      private final String firstName;
      private final String lastName;
    
      public Student(int age, String firstName, String lastName) {
        this.age = age;
        this.firstName = firstName;
        this.lastName = lastName;
      }
    
      //... and getters
    }
    

  • A student object, as created by new Student(34, "Maximilian", "Schiedermeier") should be serialized to:

    {
      "age": 34,
      "firstName": "Maximilian",
      "lastName": "Schiedermeier"
    }
    

Manual string creation

  • Of course, I could manually construct a JSON String:

        // Create student
        Student myStudent = new Student(34, "Maximilian", "Schiedermeier");
    
        // Export student
        String jsonString =
            "{\n"
                + "\t\"age\": " + myStudent.getAge()
                + ",\n\t\"firstName\": \"" + myStudent.getFirstName()
                + "\", \n\t\"lastName\": \"" + myStudent.getLastName()
                + "\"\n}";
        System.out.println(jsonString);
    

  • But what if I need to export another object ? What if object structure changes ?
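To make the problem concrete: a generic solution has to discover the structure of an object on its own, instead of hard-coding field names. The following is a naive sketch of that idea using reflection. It is an illustration only and not how any particular library is implemented; it handles neither nesting nor arrays nor escaping of special characters. The class name NaiveJson is made up, and Student is repeated to keep the sketch self-contained.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.StringJoiner;

public class NaiveJson {

  // Same shape as the Student class above.
  public static class Student {
    private final int age;
    private final String firstName;
    private final String lastName;

    public Student(int age, String firstName, String lastName) {
      this.age = age;
      this.firstName = firstName;
      this.lastName = lastName;
    }
  }

  // Renders the declared instance fields of any flat object as a JSON object.
  public static String toJson(Object object) {
    StringJoiner json = new StringJoiner(",", "{", "}");
    for (Field field : object.getClass().getDeclaredFields()) {
      if (Modifier.isStatic(field.getModifiers())) {
        continue; // only instance state belongs into the export
      }
      field.setAccessible(true); // read private fields
      Object value;
      try {
        value = field.get(object);
      } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
      }
      // Numbers, booleans and null are written bare, everything else quoted.
      boolean bare = value == null || value instanceof Number || value instanceof Boolean;
      String rendered = bare ? String.valueOf(value) : "\"" + value + "\"";
      json.add("\"" + field.getName() + "\":" + rendered);
    }
    return json.toString();
  }

  public static void main(String[] args) {
    System.out.println(toJson(new Student(34, "Maximilian", "Schiedermeier")));
  }
}
```

Even this short sketch leaves many cases open, which is a good argument for not writing it yourself.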

Using a library

Conversion to JSON is a classic problem, which has been solved many times:

  • Why reinvent the wheel when I can just reuse existing code ?
  • Maybe I can find something on the internet that solves my issue ?

Indeed, a convenient way to simplify the code would be to reuse the existing Google Gson library:

import com.google.gson.Gson;

class MainWithGson {

  public static void main(String[] args) {

    // Create student
    Student myStudent = new Student(34, "Maximilian", "Schiedermeier");

    // Export student
    String jsonString = new Gson().toJson(myStudent);
    System.out.println(jsonString);
  }
}

  • However, we are now using code that is not ours, and both the compiler and the JVM need to know about this dependency.
    • Download the Gson library JAR file (here: gson-2.11.0.jar).
    • This time we compile with the -cp (classpath) argument, telling the compiler that there are additional classes to consider: javac -cp gson-2.11.0.jar *.java
    • Likewise, when running the compiled bytecode, the JVM must know about the Gson library: java -cp gson-2.11.0.jar:. MainWithGson
What could possibly go wrong?

By re-using the Google GSON library we have created a "dependency". Without that library at hand, our code can be neither compiled, nor executed.
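A small additional trap: the : between classpath entries in the command above is the separator used on Linux and macOS, whereas Windows uses ; instead. Java exposes the separator of the current platform as File.pathSeparator. A minimal sketch, with the class name ClasspathArgument made up for illustration:

```java
import java.io.File;

public class ClasspathArgument {

  // Joins classpath entries with the separator of the current platform.
  public static String join(String... entries) {
    return String.join(File.pathSeparator, entries);
  }

  public static void main(String[] args) {
    System.out.println("java -cp " + join("gson-2.11.0.jar", ".") + " MainWithGson");
  }
}
```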

Dependency management

Dependency management aims to simplify the above procedure, by specifying which dependencies exist (and where to get them), instead of manually managing JAR files.

In essence, the ingredients for any dependency management tool are:

  • An online repository, systematically archiving all versions of all libraries
  • A local configuration file, describing for every dependency:
    • A unique identifier, e.g. "Google GSON library"
    • The specific version, e.g. "2.11.0"

Advantages:

  • Configuration files are textual and lightweight. They can be stored in the project itself.
  • Configuration files are written in a machine-interpretable syntax. A tool can collect all dependencies for you and even modify the classpath when needed.
  • You have a clear trace of all exact dependency versions. You can easily scan your project for security vulnerabilities.
  • No damage is done if you lose a library JAR, you can easily retrieve it again from the repository.

Maven

Maven is a build system for Java that offers exactly these two components:

  • A central repository, with almost every java library ever created: mavencentral.org
  • A project configuration file that (among others) lists all project dependencies: pom.xml
    • POM stands for "Project Object Model"
    • XML is a machine-readable file format
    • A dependency is stated as:
      <dependency>
          <groupId>com.google.code.gson</groupId>
          <artifactId>gson</artifactId>
          <version>2.11.0</version>
      </dependency>
      

Instead of ourselves downloading JAR files and placing them on the classpath, we ask maven to ensure all listed dependencies are in place.
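The three values of a dependency block are enough to locate the JAR in a repository. The sketch below derives the path from the layout visible in the download URLs of the build log shown earlier (group directories, then artifact, then version, then the file name). The class name RepositoryPath and the helper of are made up for illustration.

```java
public class RepositoryPath {

  // Builds the repository-relative path of an artifact's JAR file.
  public static String of(String groupId, String artifactId, String version) {
    return groupId.replace('.', '/')
        + "/" + artifactId
        + "/" + version
        + "/" + artifactId + "-" + version + ".jar";
  }

  public static void main(String[] args) {
    String base = "https://repo.maven.apache.org/maven2/";
    System.out.println(base + of("com.google.code.gson", "gson", "2.11.0"));
  }
}
```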

Never ever

Never ever manually interfere with dependency management in a maven-ready project. If you need an additional library, edit the pom.xml, but never ever drag-and-drop a JAR file into your project, or edit the classpath.

Repositories

The local repository:

  • Maven also maintains a local repository on your computer, the ~/.m2 directory. Every library you ever used is cached in this directory.
  • The local repository has two purposes:
    • Performance: It is faster to reuse a cached JAR file, than to download it from the internet every time
    • Offline mode: You might not be online all the time. With the dependencies cached, you can develop without an internet connection

Third party repositories:

  • You might encounter situations where you need a library that is not in the official maven central repository.
  • Examples:
    • Libraries that are not free to use, and therefore not publicly accessible
    • Your own libraries, that you do not want to upload
  • Anyone can set up their own repository
    • An online repository is just a few files accessible over an HTTP webserver
    • However, by default maven does not know about third-party repositories. If you want maven to search your own repository, you need to edit the pom.xml file and indicate the location of your third-party repository.

Maven's dependency resolution algorithm

To build a project, maven tries to satisfy all dependencies with corresponding artifacts (the JAR files, and some metadata). To satisfy a dependency, maven will:

  1. First check the local .m2 repository for a cached file.
  2. If not cached, it will check if any third-party repo is defined. (Usually there are none defined)
  3. Contact the official maven repository servers to retrieve the needed artifact
flowchart LR
    resolve[\Resolve dependency/]
    resolve --> localcheck{Artifact in local repo ?}
    localcheck -.  yes .-> done([Success])
    localcheck ==>|no| remotecheck{3rd party repo defined ?}
    remotecheck -.  yes .-> 3rdpartycheck{Artifact in 3rd party ?}
    3rdpartycheck -.  yes .-> done
    3rdpartycheck -.  no .-> centralcheck{Artifact in central ?}
    remotecheck ==>|no| centralcheck
    centralcheck ==>|yes| done
    centralcheck -.  no .-> fail([Fail])
What happens when a project is built for the second time ?

Maven will already have all dependencies cached. It will take the topmost path.
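The lookup order of the flowchart can be modelled in a few lines. The following is a simplified sketch, not Maven's actual implementation: repositories are plain in-memory sets, artifacts are identified by made-up "group:artifact:version" strings, and all class and method names are hypothetical.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ResolverSketch {

  public enum Source { LOCAL, THIRD_PARTY, CENTRAL, NOT_FOUND }

  private final Set<String> localRepo = new HashSet<>();
  private final List<Set<String>> thirdPartyRepos;
  private final Set<String> central;

  public ResolverSketch(List<Set<String>> thirdPartyRepos, Set<String> central) {
    this.thirdPartyRepos = thirdPartyRepos;
    this.central = central;
  }

  // Walks the repositories in the order of the flowchart and reports where
  // the artifact was found. Remote hits are cached in the local repository.
  public Source resolve(String artifact) {
    if (localRepo.contains(artifact)) {
      return Source.LOCAL;
    }
    for (Set<String> repo : thirdPartyRepos) {
      if (repo.contains(artifact)) {
        localRepo.add(artifact);
        return Source.THIRD_PARTY;
      }
    }
    if (central.contains(artifact)) {
      localRepo.add(artifact);
      return Source.CENTRAL;
    }
    return Source.NOT_FOUND;
  }

  public static void main(String[] args) {
    String gson = "com.google.code.gson:gson:2.11.0";
    ResolverSketch resolver = new ResolverSketch(List.of(), Set.of(gson));
    System.out.println(resolver.resolve(gson));                   // CENTRAL
    System.out.println(resolver.resolve(gson));                   // LOCAL
    System.out.println(resolver.resolve("no.such:artifact:1.0")); // NOT_FOUND
  }
}
```

The second call illustrates the answer above: once cached, the artifact is served from the local repository.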

Compile time vs runtime

By default, maven incorporates dependencies only at compile time. That is, we cannot run the produced JAR without manually providing all dependencies as classpath arguments.

In a later lecture, we'll learn how to configure maven to produce a self-contained JAR, which can be used as-is.

The problem with JARs

JARs are a straightforward way to pass around functionality, but as projects grow, several issues tend to persist:

  • The more dependencies you have, the more JARs you carry with you.
    • Where to store the JARs? In the repo? What if you need the same JAR in multiple projects, do you store them twice?
    • Every time a new developer joins the project you need to pass on all the JARs and have them manually extend their classpath.
    • Just compiling your project becomes somewhat tedious, because you always have to check that a long list of dependencies is correctly installed.
    • The client complains that your software is not running. Most likely they forgot to install a JAR, or installed the wrong version. How do you find out which one it is?
  • A JAR is a snapshot, it is one fixed version.
    • What if a security vulnerability was found in a JAR you've downloaded? How would you know?
    • You lost a JAR that you need to build your project, where do you find it again? Which version was it again that works with your project?

A true horror story

In a previous research lab we had a piece of software that was particularly hard to work with. Before a developer could even write a single line of code, they needed to spend 30 minutes to 1 hour on manual project configuration. The project even had JARs of which no one knew where exactly they came from, whether they were still needed, or what exactly they were contributing. There was a rumor of an intern who had been around 3 years earlier and had created the JARs. But the intern was long gone and no one had contact information. At the same time these were fat software artefacts that bloated our software executable.
Countless developer hours were wasted, because of poor dependency management.

Maven plugins

Apart from downloading and caching dependencies, for usage in the local classpath, maven also has a second purpose: Modifying the build pipeline.

  • By default, all that happens on mvn clean package is the standard compiling of source files (using any specified libraries for the process).
  • But most of the time you want to do more, e.g. produce a human-readable documentation, run tests, or create a build artifact with all dependencies included.
  • Maven's behaviour regarding the build pipeline can be modified with plugins.

A plugin is a short (or sometimes not so short) snippet in a dedicated plugins section of the pom.xml. There can be as many plugins as you want in the pom.xml:

<project>
    <build>
        <plugins>
            <!-- First plugin details -->
            <plugin>
                ...
            </plugin>
            <!-- Second plugin details -->
            <plugin>
                ...
            </plugin>
            ...
        </plugins>
    </build>
</project>
  • Every plugin has a default location in the build pipeline, because most tasks only make sense at a given moment of the process.
  • Example: building a jar with all dependencies inside should happen at the end, after all classes are compiled, all tests have passed etc.

In a future lecture we'll look in more detail at how plugins work, and at the points of the build process where maven lets plugins hook in. For now we'll look at some short, useful plugin examples.
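As a rough mental model, the build pipeline can be pictured as an ordered list of phases, where asking for one phase runs everything up to and including it. The sketch below uses a simplified subset of the phase names of Maven's default lifecycle; the real lifecycle has more phases, and the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

public class LifecycleSketch {

  // Simplified subset of phases, in pipeline order.
  public enum Phase { VALIDATE, COMPILE, TEST, PACKAGE, VERIFY, INSTALL, DEPLOY }

  // Returns all phases that run when the given target phase is requested.
  public static List<Phase> phasesUpTo(Phase target) {
    return new ArrayList<>(EnumSet.range(Phase.VALIDATE, target));
  }

  public static void main(String[] args) {
    // prints [VALIDATE, COMPILE, TEST, PACKAGE]
    System.out.println(phasesUpTo(Phase.PACKAGE));
  }
}
```

A plugin that builds a JAR with all dependencies would attach to the package phase, so it automatically runs after compiling and testing.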

Exec

The exec plugin lets you specify a main class for your code, that should be called by default when the code is executed.

  • This is closest to the infamous green triangle (the run button of your IDE)
  • All you need to do is point to the main class to be called on execution:
<!-- Specify main class for exec goal -->
<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>exec-maven-plugin</artifactId>
    <version>1.6.0</version>
    <executions>
        <execution>
            <goals>
                <goal>java</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <mainClass>full.package.name.YourMainClassLauncher</mainClass>
    </configuration>
</plugin>

Once the plugin is defined, you can conveniently run your program with: mvn clean compile exec:java
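Conceptually, launching a configured main class means looking up a class by its full name and calling its main method. The following sketch shows that idea with plain reflection. It illustrates the general principle only and makes no claim about how the exec plugin is implemented; LaunchSketch and the nested Hello class are made up for illustration.

```java
import java.lang.reflect.Method;

public class LaunchSketch {

  // Stand-in for the class named in the <mainClass> element.
  public static class Hello {
    public static int launches = 0;

    public static void main(String[] args) {
      launches++;
      System.out.println("Hello from " + Hello.class.getName());
    }
  }

  // Looks up the class by its fully qualified name and calls its main method.
  public static void launch(String mainClass, String... args) {
    try {
      Method main = Class.forName(mainClass).getMethod("main", String[].class);
      // Cast to Object so the array is passed as one argument, not spread.
      main.invoke(null, (Object) args);
    } catch (ReflectiveOperationException e) {
      throw new IllegalStateException("Cannot launch " + mainClass, e);
    }
  }

  public static void main(String[] args) {
    launch(Hello.class.getName());
  }
}
```

This is also why the mainClass entry must be the full package name: a simple class name alone is not enough to find the class.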

Add an IDE maven run configuration

Once the exec plugin is defined in your pom.xml, modify the IDE's "Run Configuration" (a.k.a. what is called when the green triangle is clicked) to simply call maven's exec plugin!

Maven Jar

The Maven jar plugin allows you to add additional information when your program is packaged into a JAR.

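  • Previously we've seen that a maven-produced JAR cannot be launched without explicitly stating the main class.
  • The maven-jar-plugin allows you to state which main class should be listed in the JAR's manifest. Note that the snippet below configures the related maven-assembly-plugin, which sets the same manifest entry and additionally bundles all dependencies into the produced JAR (jar-with-dependencies).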
<!-- specify main class for JAR manifest-->
<plugin>
    <artifactId>maven-assembly-plugin</artifactId>
    <executions>
        <execution>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
    <configuration>
        <archive>
            <manifest>
                <addClasspath>true</addClasspath>
                <mainClass>ca.uqam.info.MainWithGson</mainClass>
            </manifest>
        </archive>
        <descriptorRefs>
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <finalName>MainWithGson</finalName>
        <appendAssemblyId>true</appendAssemblyId>
    </configuration>
</plugin>

JavaDoc

In the second lab session you've learned a command to manually extract all JavaDoc information from your code, to generate a human-readable website. The JavaDoc plugin lets you automate this step, as a standard component of the build process.

  • Enabling the JavaDoc plugin is also a good practice, as you directly see whether there are issues in your code style, whenever you compile your code.
  • Ideally the plugin is configured to fail on warnings, so no developer is ever tempted to work with or produce undocumented code
    • "I'll document that later", easily turns into "I'll document that never."
<!-- Plugin to ensure all functions are commented and generate javadoc -->
<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-javadoc-plugin</artifactId>
    <version>3.4.1</version>
    <configuration>
        <javadocExecutable>${java.home}/bin/javadoc</javadocExecutable>
        <reportOutputDirectory>${project.reporting.outputDirectory}/docs
        </reportOutputDirectory>
        <failOnWarnings>true</failOnWarnings>
        <quiet>true</quiet>
    </configuration>
    <executions>
        <execution>
            <id>attach-javadocs</id>
            <goals>
                <goal>jar</goal>
            </goals>
        </execution>
    </executions>
</plugin>
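For orientation, this is roughly what fully documented code looks like: the class, the constructor and every public method carry a JavaDoc comment, including @param and @return tags. The class Greeter is made up for illustration, and which omissions actually trigger a warning depends on the javadoc version and its configuration.

```java
/**
 * Utility for building greetings.
 */
public final class Greeter {

  /**
   * Utility class, not meant to be instantiated.
   */
  private Greeter() {
  }

  /**
   * Builds a greeting for the given name.
   *
   * @param name the name to greet
   * @return the greeting, for example "Hello, Ada!"
   */
  public static String greet(String name) {
    return "Hello, " + name + "!";
  }

  /**
   * Prints a sample greeting.
   *
   * @param args command line arguments, ignored
   */
  public static void main(String[] args) {
    System.out.println(greet("Ada"));
  }
}
```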

Use a snippet library

Most developers do not manually create their pom.xml line by line, but stitch it together from prepared blocks. Use a snippet library, e.g. https://m5c.github.io/MavenSnippetLibrary/ to rapidly create a working build pipeline.

Literature

Inspiration and further reads for the curious minds: