A good build tool and process should not perform needless and redundant time-consuming work on unchanged sources each time the build runs. In other words, it should only do the work required to apply the results of the source modifications to the build artifacts. If your build tool or process doesn't exhibit this behavior, then think about optimizing your builds by avoiding the needless work.
For example, assume we have a bottom-up build process: our project is built from the persistence layer up to the higher layers. Usually, these kinds of projects have long builds caused by the code-generation and reverse-engineering tools used in them. As illustrated in Figure 1, first, the build process runs a SQL script on a database management system (e.g., MySQL) to create the database and fill it with sample data. Then, the Middlegen task runs to generate CMP (container-managed persistence) entity beans. Then the XDoclet task runs to generate remote, local, and home interfaces, value objects, and deployment descriptors for the CMP entity beans generated in the previous step. (For more on XDoclet and Middlegen, see Resources.) Next, this generated source code plus the developers' handwritten Java source code are compiled, and the compiled class files plus other resources are packed in jar, war, and ear files.
Now assume that a developer working on this system's source code has slightly modified an if-else block and wants to see the results of his change. So he does a rebuild, which is a clean build: it erases everything generated by the previous builds and builds everything from scratch. For the developer, that means minutes of waiting idle for what should be a simple recompilation of the modified class followed by an update of the relevant jar or war file.
In this article, I introduce some techniques to save you time by preventing redundant rebuilds of up-to-date sources and doing faster builds on outdated sources.
To begin our discussion, let's consider continuous integration, the strategy of ensuring that changes to the project's codebase are built, tested, and reported as soon as possible after they are introduced. This reduces the costs and time needed for development teams' integration sessions. This process requires:
- A source code control system like CVS so you have a central location for maintaining your code
- A fully automated build and test process (e.g., using Ant)
- An optional but highly recommended continuous integration automation tool like CruiseControl
Let's look at the overall picture (illustrated in Figure 2) of a development team using this discipline and the activities developers perform in such teams. Each developer has a local working copy of the project. Before starting a newly assigned task, the developer updates his local copy with the changes committed to the codebase. He then accomplishes the task (changes some portions of code, for example), updates his working directory again with any changes committed during his development session, does a build, executes the tests, and, if all tests execute successfully, commits the changes to the codebase.
Now the continuous integration automation tool polls for changes to the codebase (CVS) and starts a new build to determine whether the changes were integrated successfully with the codebase. If they were not, a notification message is sent to that developer. He should then roll back the changes, fix the problem, and commit the changes again.
In this article, my main focus in this scenario is when a developer works on his local working directory: He changes the project and wants to obtain feedback and see the results as fast as possible. Before ending with a task and committing it to the codebase, he wants to do a couple of fast incremental builds. By using techniques introduced in this article, he will quicken these builds and consequently save time.
Quickening the build process on the build server has its own tools and techniques, like clustering builds among multiple servers.
Before we go any further, let's define some primitive terms and concepts important to our discussion:
- Full build (clean build): A build that performs everything from scratch and executes all steps of the build completely. It treats all resources available in the project as if they have never been seen before by the builder and have undergone no processing. It ignores the previous build efforts completely.
- Incremental build: An optimized build based on the changes incurred since the last build. It only does as much work as required to update the build artifacts with those changes.
- Dependency checking: The act of checking project source against relevant products produced by previous builds and identifying modified or new sources and other sources dependent on them. Thus, the build process only works on the sources indicated for rebuild to perform an incremental build.
Most of the time, dependency checking occurs based on the timestamp of source files and their relevant products. That is, the modification timestamp of the source file is compared to the modification timestamp of its relevant product. If the product is older than its source, then it is outdated and marked for a rebuild.
However, zip-based tasks (zip, jar, and others) do a better dependency check. If we set the update parameter of these tasks to yes, then the zip file is updated with the files specified as its inputs (if the zip file already exists). New files are added and out-of-date files are replaced with the new versions.
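As a minimal sketch of the update behavior just described, a jar task might be declared as follows. The project and target names, the dist/app.jar archive, and the build/classes directory are hypothetical names chosen for illustration, and build/classes is assumed to exist already.

```xml
<project name="jar-update-sketch" default="package" basedir=".">
    <target name="package">
        <mkdir dir="dist"/>
        <!-- update="yes": if dist/app.jar already exists, keep it and only
             add new files or replace entries that are out of date,
             instead of rebuilding the whole archive -->
        <jar destfile="dist/app.jar" basedir="build/classes" update="yes"/>
    </target>
</project>
```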
I categorize the concept of dependency checking and build optimization into two levels:
- Task level: For example, the compile task only compiles those sources and their dependent classes that have changed.
- Target level: Execution of unnecessary targets is skipped completely. This kind of optimization also removes the redundant work required to check all of that target's sources one by one for the changes.
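The two levels can be sketched in a single Ant build file. This is only an illustration under stated assumptions: the target names, the generate.skip property, and the src and build/classes directories are hypothetical, and the example uses Ant's javac task together with the optional depend task to stand in for the "compile task" mentioned above.

```xml
<project name="levels-sketch" default="compile" basedir=".">
    <!-- Target level: if the (hypothetical) generate.skip property is set,
         Ant skips this whole target without inspecting any of its sources -->
    <target name="generate" unless="generate.skip">
        <echo message="running expensive code generation"/>
    </target>

    <!-- Task level: the target always runs, but its tasks decide file by
         file how much work is actually needed -->
    <target name="compile" depends="generate">
        <mkdir dir="build/classes"/>
        <!-- depend deletes class files that are out of date with respect to
             their sources, plus class files that depend on them -->
        <depend srcdir="src" destdir="build/classes"/>
        <!-- javac compiles only sources with a missing or older class file -->
        <javac srcdir="src" destdir="build/classes" includeantruntime="false"/>
    </target>
</project>
```

Running ant -Dgenerate.skip=true compile should skip the generate target entirely while still letting javac perform its per-file check.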
The make build tool
Makefiles are built around the concept of dependency rules. Make is a Unix utility intended to automate and optimize the construction of programs. Its purpose is to determine automatically which pieces of a large program need to be recompiled and then issue the commands to recompile them. To use make, you write a file called the makefile that describes the relationships among the files in your program and the commands for updating each file. A makefile consists of rules. A rule explains how and when to remake certain files, which are the targets of that rule. A rule consists of three parts: one or more targets, zero or more prerequisites, and zero or more commands.
The following is a snippet of a sample makefile:
Listing 1. Sample makefile
prog1: main.o file1.o
	cc -o prog1 main.o file1.o

main.o: main.c mydefs.h
	cc -c main.c
Make reads the makefile in the current directory and begins by processing the first target. Make looks at each of the target's dependencies (prerequisites) to see if they are also listed as targets. It follows the chain of dependencies, walking down the recursion chain until it finds a target with up-to-date prerequisites, a target with no prerequisites, or a target whose prerequisites have no rules. Once it reaches the end of the chain, it walks back up the recursion chain, executing the commands found in each target's rule.
Ant and make differ in the way they look at the build process. Make requires you to state the dependencies among resources and nonresources and the build commands required to transform them. Ant wants you to state the build steps and the order among them (like an assembly line). Ant tasks may or may not do their own dependency checking, whereas make has an explicit dependency-checking mechanism driven by the rules the user supplies in the makefile. Also, unlike Ant, make is not platform independent. Both views have their advantages and disadvantages; it would be nice to combine the ideas behind both of them.
Techniques and guidelines
After our review of concepts and definitions, now I want to propose some techniques and guidelines for quickening your builds and optimizing them for incremental builds. Please note that I briefly introduce most of these techniques; this article is only a starting point. For more information on the tools used, refer to Resources.
Jonathon Rasmusson discusses long builds and techniques for troubleshooting them in his article, "Long Build Trouble Shooting Guide." In that article, he focuses on quickening and troubleshooting the automated test processes.
Avoid unnecessary target executions
Make sure that correct and logical dependencies remain among your build targets, but prevent the execution of unnecessary targets when a chain of dependent targets is run. Omitting dependencies between targets to optimize the build is a bad practice because it forces programmers to remember to invoke a series of targets in a particular order to get a decent build (see "Top 15 Ant Best Practices" by Eric M. Burke (ONJava.com, December 2003)). Let the build file remember the correct dependencies and conduct an optimized build on its own.
Getting back to our sample bottom-up build process, every time we do a build, we don't need to run the SQL commands, Middlegen, XDoclet and so on. Nevertheless, we want to keep the correct dependencies between targets. However, in some cases, dependency checking itself takes a long time (e.g., checking entity beans against database schema). If possible, we want to skip these kinds of targets completely.
Here, I introduce a simple technique that you can use to skip unnecessary targets: Check the last execution timestamp of the target to be skipped against the last execution timestamps of the targets it depends on to determine whether that target's output is up-to-date. For example, if Target A depends on Targets B and C, and B or C has executed since A's last execution, then A should execute again to keep things consistent; otherwise, skip A. This rule works only when all inputs of A are produced by B and C, and no manual modifications are made to B's or C's outputs between target executions.
To add this functionality, we use the Ant touch task to either create a temporary file or update that temporary file's modification timestamp each time the unnecessary target or one of the targets it depends on executes. Then, before the unnecessary target's execution, we use the uptodate task to check the timestamp of the file created during that target's last execution against the timestamps of the files created during the last executions of the targets it depends on. This task sets a skip property, which we can use as the value of the unnecessary target's unless attribute so that Ant skips that target.
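The Target A, B, and C rule and the touch and uptodate mechanism described above can be sketched as follows. This is a simplified illustration, not the article's own build file: the target names, the *.timestamp file names, and the a.skip, b.skip, and c.skip properties are all hypothetical, and echo stands in for the real work of each target.

```xml
<project name="skip-sketch" default="a" basedir=".">
    <!-- B and C record their last execution by touching a marker file.
         b.skip and c.skip are hypothetical properties that some other
         check (or the command line) would set. -->
    <target name="b" unless="b.skip">
        <echo message="running B"/>
        <touch file="b.timestamp"/>
    </target>
    <target name="c" unless="c.skip">
        <echo message="running C"/>
        <touch file="c.timestamp"/>
    </target>

    <!-- Runs after B and C so that it sees their latest markers.
         a.skip is set only when a.timestamp exists and neither
         b.timestamp nor c.timestamp is newer than it. -->
    <target name="check-a" depends="b,c">
        <uptodate property="a.skip" targetfile="a.timestamp">
            <srcfiles dir="${basedir}" includes="b.timestamp,c.timestamp"/>
        </uptodate>
    </target>

    <!-- Ant evaluates the unless attribute after the dependencies have run -->
    <target name="a" depends="check-a" unless="a.skip">
        <echo message="running A"/>
        <touch file="a.timestamp"/>
    </target>
</project>
```

In this sketch the check runs after B and C rather than before them, so that a marker touched during the current build is taken into account when deciding whether to skip A.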
Let's return to our example. Obviously, when our database schema changes, our sample build process must run the Middlegen target, since it generates entity beans from our database schema. On the other hand, we know that we apply changes to our database schema by modifying our SQL script file and running it against the database using the SQL target. To embed the logic required to skip the Middlegen target when the database schema has not changed in our build, we check the timestamp of the last execution of the SQL target against the timestamp of the last execution of the Middlegen target. If the SQL target's execution timestamp is not newer than the Middlegen target's execution timestamp, we can skip the Middlegen target.
Listing 2. Sample Ant file that skips unnecessary targets
<project name="sample-build" default="" basedir=".">
    <target name="init-skip-properties" description="initializes the skip properties" depends="init">
        <uptodate srcfile="create-database.timestamp" targetfile="middlegen.timestamp" property="middlegen.skip" value="true"/>
    </target>
    <target name="create-database" description="runs sql script file on dbms to create db" depends="init-skip-properties">
        <sql src="MySQL.sql" ... />
        <touch file="create-database.timestamp"/>
    </target>
    <target name="middlegen" description="Runs Middlegen to create Entity Beans" depends="create-database" unless="middlegen.skip">
        ...
        <middlegen ...>
            <cmp20 ...>
            </cmp20>
        </middlegen>
        <touch file="middlegen.timestamp"/>
    </target>
</project>
In Listing 2, we perform a target-level optimization. The uptodate Ant task sets a property if a target file or set of target files is more current than a source file or set of source files. The touch task changes the modification time of a file, creating the file first if it does not already exist.
In our example build process, we can do another optimization and automation on the create-database target, which should run when the SQL script changes. Thus, we will check the timestamp of the SQL script file against the timestamp of the temporary file created in the SQL target's last execution.