
Advanced tools

In this final lecture we'll be looking at advanced software engineering tools to analyze and improve runtime performance, reverse-engineer byte-code, and protect byte-code against reverse-engineering.

Profilers

Optimizing your code for performance is a noble goal, however...

Danger

Premature optimization is the root of all evil.
-- Sir Tony Hoare / Donald Knuth

What is the gist of this statement?

  • Optimizing "by sight" is a bad idea:
    • Bottlenecks are not where you think they are:
      • Code can look complex, yet account for little resource consumption. For example, a complex algorithm that is seldom called is not relevant.
      • A statement can look benign, but excessive invocation can make it a primary resource consumer: sysouts, IO operations, hash functions, object creation.
    • What you see is not what you execute:
      • The compiler does all sorts of "behind the scenes" optimization. The code you write is not necessarily the exact code that gets executed.
      • A supposed manual optimization may introduce no performance gain at all.
    • If performance optimizations conflict with code readability, it is usually readability that's more relevant for project success: a program that's faster but impossible to understand has less value than a program that's slower but maintainable.
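To make the second point concrete, here is a small illustrative sketch (class and method names are invented for this example): the loop body below looks benign, but repeated string concatenation copies the entire string on every iteration.

```java
// Illustrative sketch: an innocent-looking statement that dominates runtime.
// All names are invented for this example, not taken from the lecture code.
public class BenignStatement {

  // Looks harmless, but "result += ..." copies the whole string on every
  // iteration: O(n^2) characters copied in total.
  static String concatNaive(int n) {
    String result = "";
    for (int i = 0; i < n; i++) {
      result += i; // primary resource consumer once n grows
    }
    return result;
  }

  // Same output, linear time.
  static String concatBuilder(int n) {
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < n; i++) {
      sb.append(i);
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    System.out.println(concatNaive(1000).equals(concatBuilder(1000))); // prints "true"
  }
}
```

Both methods read as equally simple; only a profiler (or knowledge of string immutability) reveals that almost all CPU time goes into the naive variant's implicit copying.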

Effective optimizations

If you want to optimize, use analytic tools. Never just change your code without empirical measurements.

Resource consumption

Bottlenecks can stem from various resource shortcomings:

  • Computational bottleneck:
    • A faster CPU is expensive, but provides only incremental speedup.
    • More efficient code is potentially cheap, but can account for a tremendous speedup. Example: switching from bubble-sort to quicksort.
  • Memory bottleneck:
    • More physical RAM or storage is expensive.
    • Switching to an improved data structure is potentially cheap.
  • Network bottleneck:
    • A faster internet connection / networking switches are expensive.
    • A more lightweight communication protocol is potentially cheap.
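As a hedged illustration of "more efficient code beats faster hardware": the sketch below times bubble-sort against the JDK's Arrays.sort on the same data. The class name, input size and timing method are chosen arbitrarily for this example; for rigorous measurements, a benchmark harness such as JMH is the proper tool.

```java
import java.util.Arrays;
import java.util.Random;

// Rough measurement sketch (illustrative only, not a rigorous benchmark).
public class SortComparison {

  // Classic O(n^2) bubble sort.
  static void bubbleSort(int[] a) {
    for (int i = 0; i < a.length - 1; i++) {
      for (int j = 0; j < a.length - 1 - i; j++) {
        if (a[j] > a[j + 1]) {
          int tmp = a[j];
          a[j] = a[j + 1];
          a[j + 1] = tmp;
        }
      }
    }
  }

  public static void main(String[] args) {
    int[] data = new Random(42).ints(20_000).toArray();
    int[] copy1 = data.clone();
    int[] copy2 = data.clone();

    long t0 = System.nanoTime();
    bubbleSort(copy1);
    long bubbleMs = (System.nanoTime() - t0) / 1_000_000;

    t0 = System.nanoTime();
    Arrays.sort(copy2); // dual-pivot quicksort for primitive arrays
    long quickMs = (System.nanoTime() - t0) / 1_000_000;

    // Same result, wildly different cost.
    System.out.println("bubble: " + bubbleMs + "ms, Arrays.sort: " + quickMs + "ms");
  }
}
```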

No bottleneck = no need for improvement

Quick reminder that before you improve your code, you actually have to identify the bottleneck. Otherwise, there is no performance gain at all, just more complex code.

Examples where resource consumption is not an issue:

  • You can buy the same personal computer with more RAM for 300$ extra, but for 98% of your daily routine there will be no difference.
  • Playing Halma with MadMax AIs on a small board layout takes only a handful of iterations to finish, less than a second of CPU time.
  • HTTP/REST is not the most efficient network protocol, but it is polyglot and easy to learn. 85% of all cloud services use HTTP/REST.

Profiler basics

Profilers are like flight-recorders. They track program behaviour at runtime and record what's going on inside the JVM.

Which other tool allows program analysis based on runtime observations?

Debuggers. Debuggers and profilers are both tools to assess program characteristics at runtime; both are dynamic analysis tools. However, debuggers interfere with execution (breakpoints, changing variable values), while profilers do not. A profiler is a silent bystander that observes execution and collects whatever is needed for performance analysis.

  • Technically, profilers still interfere with execution:
    • It is impossible to observe without consuming resources, so just by analyzing you already interfere with program execution.
    • Still, profilers are built to have a minimal footprint. The bias added by professional profilers is negligible.

In the remainder of this course we'll work with IntelliJ's built-in CPU-profiler. Alternatives for further metrics are YourKit (Commercial, no free education license) and the JVM's built-in HPROF profiler.

Sampling

  • Measuring which part of your source code causes which percentage of CPU consumption is not trivial:
    • Just watching the CPU is not enough: The OS scheduler rapidly switches between OS tasks, your program, and potentially other tasks.
    • Printing timestamps between method calls is also a bad idea: Slows down program execution and produces tons of output.
  • Most CPU profilers work by sampling the call stack:
    • Call stack definition: the list of unfinished method calls that led to the current point in code execution.
    • Every x milliseconds, take a snapshot of the entire call stack.
      • The top method is the one currently consuming CPU power.
      • All methods below are just what led to the top method, they do not currently consume power.
    • The more often a method appears at the top, the more CPU consumption we can assess for that method.

Sampling frequency is a tradeoff

Since sampling takes place at given intervals, it's all about finding the right frequency. If too fine-grained, we might interfere with execution; if too coarse-grained, we might miss shorter method calls and our results are less precise.
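The sampling mechanism itself can be sketched in plain Java. This is a toy, not a real profiler (production profilers hook into JVM internals to keep their bias minimal), and all names below are invented for the illustration:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy sampling profiler: every few milliseconds, snapshot a target thread's
// call stack and count which method sits on top.
public class ToySampler {

  static final Map<String, Integer> topFrameCounts = new ConcurrentHashMap<>();

  static Thread startSampling(Thread target, long intervalMillis) {
    Thread sampler = new Thread(() -> {
      while (target.isAlive()) {
        StackTraceElement[] stack = target.getStackTrace();
        if (stack.length > 0) {
          // stack[0] is the topmost frame: the method currently on the CPU
          String top = stack[0].getClassName() + "." + stack[0].getMethodName();
          topFrameCounts.merge(top, 1, Integer::sum);
        }
        try {
          Thread.sleep(intervalMillis); // the sampling interval tradeoff
        } catch (InterruptedException e) {
          return;
        }
      }
    });
    sampler.setDaemon(true);
    sampler.start();
    return sampler;
  }

  public static void main(String[] args) throws InterruptedException {
    Thread worker = new Thread(() -> {
      long sum = 0;
      for (long i = 0; i < 2_000_000_000L; i++) {
        sum += i; // busy loop, so samples have something to catch
      }
      System.out.println("worker done, sum=" + sum);
    });
    worker.start();
    startSampling(worker, 10);
    worker.join();
    System.out.println(topFrameCounts); // most samples land in the busy loop
  }
}
```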

Sampling example

  • Assume we have the following functions in our program:
    • main()
    • foo()
    • bar()
    • f1()
    • f2()
    • f0()
    • b0()
    • b2()
  • We sample the call stack at runtime, after 10, 20, 30, 40 and 50 milliseconds (each column is one sample, read bottom-up from main):

        10ms   20ms   30ms   40ms   50ms
        f0     b0     b0
        f1     f2     b2     f1     b2
        foo    foo    bar    foo    bar
        main   main   main   main   main

Who's consuming the most CPU cycles?

b0. Although main has the most occurrences in the listing, we are only interested in the topmost function of each stack when tracking CPU consumption.

Flame graphs

  • Longer listings of sampled call stacks can be hard to analyze.
    • One way around this is a visual representation called a Flame Graph.
    • Flame graphs are a tinted, sorted and merged variant of the original listing.

Tint

  • Human pattern recognition works a lot better with colours than with text.
  • The previous sample listing can be tinted to:

  • Unfortunately, for longer listings this does not really help us spot the most consuming method.
  • For thousands of samples, we would just see colourful noise, it would be hard to "see" anything useful.

Order

  • Ordering serves the purpose of grouping the same methods together.
  • However, we do not want to break individual call stacks.
    • We only sort the column order, not what's going on within any column.
    • This translates to: we ignore the order of collected samples, but leave each sample intact.
  • The algorithm:
    • Sort alphabetically
    • Sort bottom to top

Merge

  • Finally, the blocks (representing our functions) still all have the same size.
  • If we merge neighbouring identical functions, we get a sense of each function's relevance for CPU consumption.
  • Additionally, since we're only really interested in the "surface" of our flame graph, we can add a black bar to the topmost fraction.

  • Important: consumption reports are cumulative!
    • A function f can be called by multiple parent functions.
    • In the flame graph, such a function f is not merged into a single block (because sorting preserves the individual stacks).
    • Hovering over a specific function still indicates the fraction of CPU cycles for that function, no matter who the caller is.

In the flame graph, b0 comes before f0. What does that mean?

Absolutely nothing. We ordered the columns alphabetically. The resulting representation tells us only something about proportional CPU consumption, but nothing about the chronological order of method calls.
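The sort-and-merge step can be sketched with the five sampled stacks from the example above. Identical stacks are merged and counted, producing the "folded" format many flame-graph tools consume; the class name is invented, and the prefix-merging into the final graph shape is left to the rendering tool:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of the sort-and-merge step: raw samples become "stack;frames count"
// lines, i.e. the folded format consumed by flame-graph renderers.
public class FoldStacks {

  // Each sample is one call stack, listed bottom (main) to top.
  static Map<String, Integer> fold(List<List<String>> samples) {
    Map<String, Integer> folded = new TreeMap<>(); // TreeMap = alphabetical order
    for (List<String> stack : samples) {
      // Identical stacks merge into one entry and their counts add up,
      // which is what gives flame-graph blocks their width.
      folded.merge(String.join(";", stack), 1, Integer::sum);
    }
    return folded;
  }

  public static void main(String[] args) {
    List<List<String>> samples = List.of(
        List.of("main", "foo", "f1", "f0"), // 10ms
        List.of("main", "foo", "f2", "b0"), // 20ms
        List.of("main", "bar", "b2", "b0"), // 30ms
        List.of("main", "foo", "f1"),       // 40ms
        List.of("main", "bar", "b2"));      // 50ms
    fold(samples).forEach((stack, count) -> System.out.println(stack + " " + count));
  }
}
```

Note how the alphabetical key order groups the bar-stacks before the foo-stacks, exactly like the sorted columns of the flame graph, while each individual stack stays intact.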

Post profiling

  • Just because a function takes up the most CPU cycles does not mean it can be optimized.
  • The flame graph only tells us where optimizations would have the most impact. It does not tell us whether optimizations are possible.
    • It's probably still a good idea to focus our attention on the methods suggested by the flame graph.
    • Conversely, any effort spent optimizing functions not exposed on the flame graph surface is probably wasted.

Profiling with IntelliJ

IntelliJ has a series of built-in profiling tools, among others a flame graph CPU time visualization.

Starting the profiler

  • Since profilers are dynamic analysis tools, we first need to execute something.
  • This can be our main program, or a test.
    • Profiling very short executions can be tricky, as they might not produce enough samples. The flame graph can also be confusing for very short executions: there are always some boilerplate java-internal methods, which tend to clutter the output.
    • Integration tests are in general better suited than unit tests. Competing AIs in a game are excellent, as long as the AIs are not too sophisticated and do not account for significant resource consumption themselves.

Reading the IntelliJ flame graph

The colouring in the IntelliJ flame graph is a bit different:

  • Dark violet: Native code (JVM stuff, nothing we can do about it.)
  • Bright violet: Library code (Library calls, nothing we can do about it, except for calling less often.)
  • Orange: Our own methods (Here we can optimize.)

Expand JUnit runner, when invoking code via tests

When profiling unit tests, the profiler tends to conceal details within a JUnitCore.run frame. Expand the frame and scroll all the way to the top to see where your own code consumes CPU cycles.

The remaining visualization is just like before:

  • Wider bar: many merges of the same method; the method appears in many sampled stacks.
  • Top surface: the actual CPU consumption, i.e. time spent in one specific method itself, without further sub-calls.

Additional information is available on hover, notably the cumulative consumption of a method across the entire flame graph (added-up consumption of the same method under different callers).

The top boxes are purple, does that mean I cannot optimize my code?

If the top boxes are purple, most time is consumed by java-internal methods. While they are not your code, and while you cannot optimize these methods themselves, you can investigate whether all these method calls make sense, or whether they can be replaced by more efficient calls.

Back to the code

  • Ultimately, we're interested in code not in graphs.
    • We want a convenient way to spot performance issues in our code.
    • Luckily IntelliJ does a great job at linking flame graph results with lines of code for us.
  • Once we have run the profiler, new insights appear in the code listing:

Why no maven plugin?

So far we've pretty consistently added plugins not only to the IDE, but also to the maven build pipeline. However, profilers are commonly not integrated into the build pipeline. That's because profiler insights are only indicators of where code optimizations would make the most difference. This metric is useful for developers, but not for project maintenance. Profilers are therefore rarely found in build or CI pipelines.

Decompilers

  • Since the start of this course we've always zoomed in on the build process, i.e. getting from source code to a deliverable product.
  • At times, you may seek to do the inverse: Take a peek into a product and figure out how it's internally working.
  • Going back from bytecode to sourcecode is called "decompiling".

Have you already used a decompiler in this course?

Yes! When debugging the provided unit-tests, IntelliJ translated the test byte-code (downloaded by maven) back into test source-code. As you stepped through lines of test code, you were implicitly using IntelliJ's decompiler.

Decompiling in action

Next we'll compile, then decompile a little HelloWorld class and try to spot the effects. For this demonstration, we'll use the CFR (Class File Reader).

class HelloWorld {
  public static void main(String[] args) {
    System.out.println("Hello, INF2050!");
  }
}

Step by step:

  1. Like always we start with compiling the source code into java bytecode: javac HelloWorld.java
  2. We'll obtain human-unreadable bytecode HelloWorld.class:

    $ xxd -u -p HelloWorld.class | sed 's/..../& /g'
    CAFE BABE 0000 0042 001D 0A00 0200 0307 0004 0C00 0500 0601 0010 6A61 7661
    2F6C 616E 672F 4F62 6A65 6374 0100 063C 696E 6974 3E01 0003 2829 5609 0008
    0009 0700 0A0C 000B 000C 0100 106A 6176 612F 6C61 6E67 2F53 7973 7465 6D01
    0003 6F75 7401 0015 4C6A 6176 612F 696F 2F50 7269 6E74 5374 7265 616D 3B08
    000E 0100 0F48 656C 6C6F 2C20 494E 4632 3035 3021 0A00 1000 1107 0012 0C00
    1300 1401 0013 6A61 7661 2F69 6F2F 5072 696E 7453 7472 6561 6D01 0007 7072
    696E 746C 6E01 0015 284C 6A61 7661 2F6C 616E 672F 5374 7269 6E67 3B29 5607
    0016 0100 0A48 656C 6C6F 576F 726C 6401 0004 436F 6465 0100 0F4C 696E 654E
    756D 6265 7254 6162 6C65 0100 046D 6169 6E01 0016 285B 4C6A 6176 612F 6C61
    6E67 2F53 7472 696E 673B 2956 0100 0A53 6F75 7263 6546 696C 6501 000F 4865
    6C6C 6F57 6F72 6C64 2E6A 6176 6100 2000 1500 0200 0000 0000 0200 0000 0500
    0600 0100 1700 0000 1D00 0100 0100 0000 052A B700 01B1 0000 0001 0018 0000
    0006 0001 0000 0001 0009 0019 001A 0001 0017 0000 0025 0002 0001 0000 0009
    B200 0712 0DB6 000F B100 0000 0100 1800 0000 0A00 0200 0000 0700 0800 0800
    0100 1B00 0000 0200 1C
    

  3. Finally, we use CFR to convert back from byte-code to source-code:

    wget https://www.benf.org/other/cfr/cfr-0.152.jar
    java -jar cfr-0.152.jar --outputdir outdir HelloWorld
    

This produces a new file HelloWorld.java. We've made it back to the source-code.

Why does the byte-code start with CAFE BABE again?

The above is just a representation in hexadecimal encoding. The more natural way to represent bytecode would be binary, i.e. only 0s and 1s. We can do so with xxd -b HelloWorld.class | cut -d\ -f2-7, but it would be far more verbose. Hexadecimal is convenient because it always groups 4 bits into one digit. The first 8 hexadecimal digits are just an easter egg; the same 32 bits open every java class file.
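We can also verify the magic number programmatically. The sketch below reads the first four bytes the same way the JVM does; the class name is invented, and the synthetic byte array stands in for a real .class file:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Sketch: every java class file opens with the 4-byte magic number 0xCAFEBABE
// (followed by minor/major version fields, e.g. major 0x42 = 66 in the dump above).
public class MagicNumberCheck {

  static boolean isClassFile(InputStream raw) throws IOException {
    // readInt consumes 4 bytes big-endian, exactly as the class file stores them.
    return new DataInputStream(raw).readInt() == 0xCAFEBABE;
  }

  public static void main(String[] args) throws IOException {
    // The opening bytes from the hex dump, as a synthetic input:
    byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE};
    System.out.println(isClassFile(new ByteArrayInputStream(header))); // prints "true"
  }
}
```

To check a real file, pass a FileInputStream of HelloWorld.class to isClassFile instead.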

A deep comparison

If we take a closer look at the decompiled code, we'll see the equivalence is only semantic, i.e. the decompiled code is equivalent, but not identical to the original code.

$ icdiff HelloWorld.java outdir/HelloWorld.java
HelloWorld.java                                        outdir/HelloWorld.java
                                                       /*
                                                        * Decompiled with CFR 0.152.
                                                        */
class HelloWorld {                                     class HelloWorld {
                                                           HelloWorld() {
                                                           }

    /**
     * Our HelloWorld program.
     */
    public static void main(String[] args) {               public static void main(String[] stringArray) {
        System.out.println("Hello, INF2050!");                 System.out.println("Hello, INF2050!");
    }                                                      }
}                                                      }

Can you spot all the differences?

A new class comment, a new default constructor, the original main method comment removed, and the main method argument renamed.

Another decompiler

Similar results are obtained with the Vineflower decompiler:

wget https://github.com/Vineflower/vineflower/releases/download/1.11.1/vineflower-1.11.1.jar
mkdir outdir2
java -jar vineflower-1.11.1.jar HelloWorld.class outdir2
cat outdir2/HelloWorld.java

Produces the following source-code:

class HelloWorld {
  public static void main(String[] var0) {
    System.out.println("Hello, INF2050!");
  }
}

Lost information

Why would the decompiler not just produce the exact original source code? Because it cannot!

  • Compilers perform a lot of implicit modifications (and optimizations).
    • Adding default constructors. (Required for object instantiation)
    • Wiping comments. (Irrelevant for program behaviour)
    • Renaming variables. (Irrelevant for program behaviour)
    • ...
  • Even if the decompiler is 100% accurate, going back from byte-code to source-code cannot undo modifications made by the compiler.
  • However, if we really want to support decompiling, we can advise the compiler to stay closer to the original code (e.g. when we want our code to be easily debuggable by library users).

Why not just always keep comments and original names?

It would make our final product more voluminous. Internal details are usually not relevant to a library or product user and can be erased to reduce disk use.
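The compiler-added default constructor from the list above can be observed via reflection; the class names in this sketch are invented:

```java
import java.lang.reflect.Constructor;

// Sketch: even though Empty declares no constructor in its source, the
// compiler inserts a default one - which is why decompiled code shows
// an explicit "HelloWorld() {}".
public class DefaultConstructorDemo {

  static class Empty { } // no constructor written here

  public static void main(String[] args) {
    Constructor<?>[] ctors = Empty.class.getDeclaredConstructors();
    System.out.println(ctors.length);                 // prints "1"
    System.out.println(ctors[0].getParameterCount()); // prints "0"
  }
}
```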

Obfuscators

  • While there are good reasons to instruct the compiler to integrate as much source-code information as possible into the byte-code, there are also use cases where you want the produced code to be as obscure and hard to reverse-engineer as possible.
  • This is notably the case for commercial products. Imagine you just spent years and millions on the development of a product - the last thing you want is for a competitor to simply decompile your code and steal your intellectual property.
  • In that case, we can artificially extend the natural modifications made by the compiler, based on two premises:
    • The code must still be functional, and just as performant as before.
    • The code must be useless, when decompiled.

But how do you make code useless for anyone else? By making it hard to read!
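A hand-written toy illustration (not actual obfuscator output; all names are invented): both classes below compute the same result, but the second is stripped of every hint about its intent.

```java
// Toy illustration of obfuscation by renaming. Both classes behave
// identically at runtime; only readability differs.
public class Readable {

  // Intent is obvious from the names.
  static int boardWidthForBaseSize(int baseSize) {
    return baseSize * 2 - 1;
  }

  public static void main(String[] args) {
    System.out.println(boardWidthForBaseSize(5) == a.a(5)); // prints "true"
  }
}

class a {
  // The behaviour-identical twin: class, method and parameter
  // are all renamed to single letters.
  static int a(int a) {
    return a * 2 - 1;
  }
}
```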

ProGuard obfuscation example

Obfuscators like ProGuard work on bytecode, i.e. they take the modifications previously done by a compiler even further. However, their goal is not to optimize, but to protect.

In this example we'll obfuscate a halma jar. We begin by building our halma application to obtain a jar (mvn clean package).

  1. Next, download the ProGuard obfuscator:
wget https://github.com/Guardsquare/proguard/releases/download/v7.7/proguard-7.7.0.zip
unzip proguard-7.7.0.zip
  2. Create a configuration file: proguard.pro
-injars /Users/schieder/Desktop/build/HelloWorld.jar
-outjars output.jar

-libraryjars /Users/schieder/.sdkman/candidates/java/current/jmods/java.base.jmod

# Keep main method (Note: it MUST be public)
-keep public class HelloWorld {
    public static void main(java.lang.String[]);
}

-dontoptimize

# Optional: Allow optimization
-optimizations !code/simplification/arithmetic
  3. Run the obfuscator (from the unzipped bin directory), using the above configuration file:
cd proguard-7.7.0/bin
./proguard.sh @proguard.pro

This creates a new jar file: halma-obfuscated.jar. If we peek inside, we see:

ca
└── uqam
    └── info
        └── solanum
            ├── a
            │   └── a
            │       ├── a
            │       │   ├── a.class
            │       │   ├── b.class
            │       │   ├── c.class
            │       │   └── d.class
            │       ├── b
            │       │   ├── a.class
            │       │   ├── b.class
            │       │   ├── c.class
            │       │   ├── d.class
            │       │   ├── e.class
            │       │   ├── f.class
            │       │   └── g.class
            │       └── c
            │           ├── a.class
            │           ├── b.class
            │           ├── c.class
            │           └── d.class
            └── max
                └── halma
                    ├── a
                    │   └── a.class
                    ├── b
                    │   ├── a.class
                    │   ├── b.class
                    │   ├── c.class
                    │   ├── d.class
                    │   └── e.class
                    ├── c
                    │   ├── a.class
                    │   └── b.class
                    └── view
                        └── AdvancedConsoleLauncher.class

Next we can try to decompile a class:

$ cd Desktop
$ for i in $(find ca -name \*class ); do java -jar cfr-0.152.jar --outputdir outdir $i; done
Processing ca.uqam.info.solanum.a.a.a.b
Processing ca.uqam.info.solanum.a.a.a.d
Processing ca.uqam.info.solanum.a.a.a.a
Processing ca.uqam.info.solanum.a.a.a.c
Processing ca.uqam.info.solanum.a.a.c.b
...

Inspecting the outcome

When we inspect, for example, the class e.java:

/*
 * Decompiled with CFR 0.152.
 */

package ca.uqam.info.solanum.max.halma.b;

import ca.uqam.info.solanum.a.a.b.f;
import ca.uqam.info.solanum.max.halma.b.a;
import ca.uqam.info.solanum.max.halma.b.b;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Set;

public class e extends a {
  @Override
  protected int a(int n) {
    return n * 2 - 1;
  }

  @Override
  protected int b(int n) {
    return (n - 1) * 4 / 3 + 1;
  }

  @Override
  protected ca.uqam.info.solanum.a.a.b.b[] a(int n, int n2) {
    ArrayList<ca.uqam.info.solanum.a.a.b.b> arrayList = new ArrayList<ca.uqam.info.solanum.a.a.b.b>();
    for (int i = 0; i < n2; ++i) {
      for (int j = 0; j < n; ++j) {
        ca.uqam.info.solanum.a.a.b.b b2 = new ca.uqam.info.solanum.a.a.b.b(j, i);
        if (!this.a(b2, n, n2))
          continue;
        arrayList.add(b2);
      }
    }
    return arrayList.toArray(new ca.uqam.info.solanum.a.a.b.b[arrayList.size()]);
  }

What has changed?

We don't know what this code was before, but it seems all class names, method names and variable names have been renamed to a, b, c, ...

Testing functionality

Obfuscation must not break the functionality of our product. That is, we must still be able to play our game using the obfuscated jar:

$ java -jar halma-obfuscated.jar 2 Max Ryan Quentin
y\x |  00  01  02  03  04  05  06  07  08
----+------------------------------------
00  |         [ ]             [1]
01  |             [ ]     [1]
02  |         [ ]     ( )     [1]
03  |             ( )     ( )
04  |         ( )     ( )     ( )
05  |     [0]     ( )     ( )     [ ]
06  | [0]     ( )     ( )     ( )     [ ]
07  |     [0]     ( )     ( )     [ ]
08  |         ( )     ( )     ( )
09  |             ( )     ( )
10  |         [ ]     ( )     [2]
11  |             [ ]     [2]
12  |         [ ]             [2]
It's Max's turn. Max, your options are:
Available moves:
    00: (01,05) -> (02,04)  01: (01,05) -> (02,06)  02: (00,06) => (02,04)
    03: (00,06) => (02,08)  04: (01,07) -> (02,06)  05: (01,07) -> (02,08)

Enter move ID:

Literature

Inspiration and further reads for the curious minds: