A good build tool and process should not perform needless and redundant time-consuming work on unchanged sources each time the build runs. In other words, it should only do the work required to apply the results of the source modifications to the build artifacts. If your build tool or process doesn't exhibit this behavior, then think about optimizing your builds by avoiding the needless work.
For example, assume we have a bottom-up build process: our project is built from the persistence layer up to the higher layers. Usually, these kinds of projects have long builds caused by the code-generation and reverse-engineering tools used in them. As illustrated in Figure 1, the build process first runs a SQL script on a database management system (e.g., MySQL) to create the database and fill it with sample data. Then, the Middlegen task runs to generate CMP (container-managed persistence) entity beans. Next, the XDoclet task runs to generate remote, local, and home interfaces, value objects, and deployment descriptors for the CMP entity beans generated in the previous step. (For more on XDoclet and Middlegen, see Resources.) Finally, this generated source code and the developers' handwritten Java source code are compiled, and the compiled class files and other resources are packed in jar, war, and ear files.
Now assume that a developer working on this system's source code has slightly modified an if-else block and wants to see the results of the change, so the developer runs a rebuild. If that rebuild is a clean build, it erases everything generated by the previous builds and builds everything from scratch. For the developer, that means minutes of idle waiting for what should be a simple recompilation of the modified class followed by an update of the relevant jar or war file.
In this article, I introduce some techniques to save you time by preventing redundant rebuilds of up-to-date sources and doing faster builds on outdated sources.
To begin our discussion, let's consider continuous integration, the strategy of ensuring that changes to the project's codebase are built, tested, and reported as soon as possible after they are introduced. This reduces the costs and time needed for development teams' integration sessions. This process requires:
- A source code control system like CVS so you have a central location for maintaining your code
- A fully automated build and test process (e.g., using Ant)
- An optional but highly recommended continuous integration automation tool like CruiseControl
Let's look at the overall picture (illustrated in Figure 2) of a development team utilizing this discipline and the activities developers do in such teams. Each developer has a local working copy of the project. Before starting a newly assigned task, the developer updates his local copy with the changes committed to the codebase, accomplishes the task (changes some portions of code, for example), updates his working directory again with any changes committed during his development session, does a build, executes the tests, and, if all tests execute successfully, commits the changes to the codebase.
Now the continuous integration automation tool polls for changes to the codebase (CVS) and starts a new build to determine whether the changes were integrated successfully with the codebase. If they were not, a notification message is sent to that developer. He should then roll back the changes, fix the problem, and commit the changes again.
In this article, my main focus in this scenario is the time a developer spends in his local working directory: He changes the project and wants to obtain feedback and see the results as fast as possible. Before finishing a task and committing it to the codebase, he wants to do a couple of fast incremental builds. By using the techniques introduced in this article, he will quicken these builds and consequently save time.
Note: Quickening the build process on the build server has its own tools and techniques, such as clustering builds among multiple servers.
Before we go any further, let's define some primitive terms and concepts important to our discussion:
- Full build (clean build): A build that performs everything from scratch and executes all steps of the build completely. It treats all resources available in the project as if they have never been seen before by the builder and have undergone no processing. It ignores the previous build efforts completely.
- Incremental build: An optimized build based on the changes incurred since the last build. It only does as much work as required to update the build artifacts with those changes.
- Dependency checking: The act of checking project source against relevant products produced by previous builds and identifying modified or new sources and other sources dependent on them. Thus, the build process only works on the sources indicated for rebuild to perform an incremental build.
Most of the time, dependency checking occurs based on the timestamp of source files and their relevant products. That is, the modification timestamp of the source file is compared to the modification timestamp of its relevant product. If the product is older than its source, then it is outdated and marked for a rebuild.
However, zip-based tasks (zip, jar, and others) do a better dependency check. If we set the update parameter of these tasks to yes, then the zip file is updated with the files specified as its inputs (if the zip file already exists). New files are added and out-of-date files are replaced with the new versions.
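As a minimal sketch of this setting, the following target updates an existing jar in place rather than rebuilding it. The target name and the build/classes and build/app.jar paths are hypothetical placeholders, not names from the sample project.

```xml
<project name="jar-update-sketch" default="package" basedir=".">
  <!-- Hypothetical paths; adjust them to your project layout -->
  <target name="package">
    <!-- update="yes": add new files to an existing jar and
         replace entries that are out of date -->
    <jar destfile="build/app.jar" basedir="build/classes" update="yes"/>
  </target>
</project>
```

If build/app.jar does not exist yet, the task simply creates it.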
I categorize the concept of dependency checking and build optimization into two levels:
- Task level: For example, the compile task only compiles those sources and their dependent classes that have changed.
- Target level: Execution of unnecessary targets is skipped completely. This kind of optimization also removes the redundant work required to check all of that target's sources one by one for the changes.
The make build tool
Makefiles follow the concept of dependency rules. Make is a Unix utility intended to automate and optimize the construction of programs. The purpose of the make utility is to determine automatically which pieces of a large program need to be recompiled and then issue the commands to recompile them. To prepare to use make, you write a file called the makefile that describes the relationships among the files in your program and the commands for updating each file. A makefile consists of rules. A rule explains how and when to remake certain files, which are the targets of that particular rule. A rule consists of three parts: one or more targets, zero or more prerequisites, and zero or more commands.
The following is a snippet of a sample makefile:
Listing 1. Sample makefile
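# Link the program from its object files (command lines must start with a tab)
prog1: main.o file1.o
	cc -o prog1 main.o file1.o

# Recompile main.o when main.c or mydefs.h changes
main.o: main.c mydefs.h
	cc -c main.c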
Make reads the makefile in the current directory and begins by processing the first target. Make looks at each of the target's dependencies (prerequisites) to see if they are also listed as targets. Make follows the chain of dependencies and walks down the recursion chain until it finds a target with up-to-date prerequisites, a target with no prerequisites, or a target whose prerequisites have no rules. Once it reaches the end of the chain, it walks back up the recursion chain, executing the commands found in each target's rule.
Ant and make differ in the way they look at the build process. Make requires you to state resource and nonresource dependencies and the build commands required to transform them. Ant wants you to state build steps and the order between them (like an assembly line). Ant tasks may or may not do dependency checking themselves, whereas make has an explicit dependency-checking mechanism that the user supplies through the makefile. Also, unlike Ant, make is not platform independent. Both views have their advantages and disadvantages; it would be nice to combine the ideas behind both of them.
Techniques and guidelines
After our review of concepts and definitions, now I want to propose some techniques and guidelines for quickening your builds and optimizing them for incremental builds. Please note that I briefly introduce most of these techniques; this article is only a starting point. For more information on the tools used, refer to Resources.
Note: Jonathan Rasmusson discusses long builds and techniques for troubleshooting them in his article, "Long Build Trouble Shooting Guide." In that article, he focuses on quickening and troubleshooting automated test processes.
Avoid unnecessary target executions
Make sure that correct and logical dependencies remain among your build targets, but prevent the execution of unnecessary targets when a chain of dependent targets is run. Omitting dependencies between targets to optimize the build is a bad practice because it forces programmers to remember to invoke a series of targets in a particular order to get a decent build (see "Top 15 Ant Best Practices" by Eric M. Burke, ONJava.com, December 2003). Let the build file remember the correct dependencies and conduct an optimized build on its own.
Getting back to our sample bottom-up build process, every time we do a build, we don't need to run the SQL commands, Middlegen, XDoclet and so on. Nevertheless, we want to keep the correct dependencies between targets. However, in some cases, dependency checking itself takes a long time (e.g., checking entity beans against database schema). If possible, we want to skip these kinds of targets completely.
Here, I introduce a simple technique that you can use to skip unnecessary targets: Check the last execution timestamp of the target to be skipped against the last execution timestamps of the targets it depends on to determine whether that target's output is up-to-date. For example, if Target A depends on Targets B and C, and B or C has executed after A's last execution, then A should execute again to keep things consistent; otherwise, skip A. This rule works only when all inputs of A are produced by B and C, and no manual modifications are made to B's or C's outputs between target executions.
To add this functionality, we use the Ant touch task to either create a temporary file or update that file's modification timestamp each time the target to be skipped, or one of the targets it depends on, executes. Then, before the target's execution, we use the uptodate task to check the timestamp of the file written during its last execution against the timestamps of the files written during the last executions of the targets it depends on. This task sets a skip property, which we can use as the value of the target's unless attribute so that Ant skips that target.
Let's return to our example. Obviously, when our database schema changes, our sample build process must run the Middlegen target, since it generates entity beans from our database schema. On the other hand, we know that we apply changes to our database schema by modifying our SQL script file and running it against the database using the SQL target. To embed the logic required to skip the Middlegen target when the database schema has not changed in our build, we check the timestamp of the last execution of the SQL target against the timestamp of the last execution of the Middlegen target. If the SQL target's execution timestamp is not newer than the Middlegen target's execution timestamp, we can skip the Middlegen target.
Listing 2. Sample Ant file that skips unnecessary targets
<project name="sample-build" default="" basedir=".">

  <target name="init-skip-properties" description="initializes the skip properties" depends="init">
    <uptodate srcfile="create-database.timestamp" targetfile="middlegen.timestamp"
              property="middlegen.skip" value="true"/>
  </target>

  <target name="create-database" description="runs sql script file on dbms to create db" depends="init-skip-properties">
    <sql src="MySQL.sql" ... />
    <!-- record the time of this target's last execution -->
    <touch file="create-database.timestamp"/>
  </target>

  <target name="middlegen" description="Runs Middlegen to create Entity Beans" depends="create-database" unless="middlegen.skip">
    ...
    <middlegen ... >
      <cmp20 ... >
        ...
      </cmp20>
    </middlegen>
    <!-- record the time of this target's last execution, as described in the text -->
    <touch file="middlegen.timestamp"/>
  </target>

</project>
In Listing 2, we perform a target-level optimization. The uptodate Ant task sets a property if a target file or set of target files is more current than a source file or set of files. The touch task changes the modification time of a file and possibly creates it at the same time.
In our example build process, we can do another optimization and automation on the create-database target, which should run when the SQL script changes. Thus, we will check the timestamp of the SQL script file against the timestamp of the temporary file created in the SQL target's last execution.
Keep in mind that automating the execution of the create-database target should be considered carefully, because even an unintended change to the SQL file (e.g., pressing the Enter or Backspace key) will re-create the database and destroy existing data, which is undesirable in some projects, especially when the team uses a common database server. In these cases, keep the create-database target separate and independent from the other targets, and run it manually. But in project phases where the database schema changes repeatedly and you work on the bottom layers, automatically running or skipping the create-database target will help you type less and produce a more automated build.
We skip execution of the create-database target by adding this line to the init-skip-properties target:
<uptodate srcfile="MySQL.sql" targetfile="create-database.timestamp" property="create-database.skip" value="true"/>
And change the create-database target definition as follows:
<target name="create-database" description="runs sql commands file on dbms to create db" depends="init-skip-properties" unless="create-database.skip">
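For situations when you want to compare the timestamp of a target's last execution against the last execution timestamps of two or more of the targets it depends on, use the condition task. For example, the following Ant script sets the target3-skip property only if target3 last executed after the last executions of both target1 and target2: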
Listing 3. Ant's condition task
<condition property="target3-skip">
  <and>
    <uptodate srcfile="target1.timestamp" targetfile="target3.timestamp"/>
    <uptodate srcfile="target2.timestamp" targetfile="target3.timestamp"/>
  </and>
</condition>
Please note that this technique of skipping unnecessary targets is most appropriate for those parts of a build that run phases of automatic code generation one after another, meaning no manual intervention occurs among these phases and each target works on the output of the targets before it. For more on Ant tasks, see Resources.
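One point worth checking when you wire this up is the order of evaluation: Ant evaluates a target's unless attribute only when that target is about to run, after the targets it depends on have finished, and a property keeps the value it was first given. The following sketch evaluates the condition in a separate target that runs after target1 and target2, so the timestamps it compares already reflect the current build. The check-inputs and check-target3 targets and the input1.txt and input2.txt files are hypothetical names introduced for illustration (the input files are assumed to exist), and the real work of each target is elided.

```xml
<project name="skip-sketch" default="target3" basedir=".">

  <!-- Runs first: decide whether target1 and target2 need to run at all.
       input1.txt and input2.txt are hypothetical hand-edited inputs. -->
  <target name="check-inputs">
    <uptodate property="target1-skip" srcfile="input1.txt" targetfile="target1.timestamp"/>
    <uptodate property="target2-skip" srcfile="input2.txt" targetfile="target2.timestamp"/>
  </target>

  <target name="target1" depends="check-inputs" unless="target1-skip">
    <!-- ... generate output from input1.txt ... -->
    <touch file="target1.timestamp"/>
  </target>

  <target name="target2" depends="check-inputs" unless="target2-skip">
    <!-- ... generate output from input2.txt ... -->
    <touch file="target2.timestamp"/>
  </target>

  <!-- Runs after target1 and target2, so it sees any timestamps they just wrote -->
  <target name="check-target3" depends="target1,target2">
    <condition property="target3-skip">
      <and>
        <uptodate srcfile="target1.timestamp" targetfile="target3.timestamp"/>
        <uptodate srcfile="target2.timestamp" targetfile="target3.timestamp"/>
      </and>
    </condition>
  </target>

  <target name="target3" depends="check-target3" unless="target3-skip">
    <!-- ... work that consumes the outputs of target1 and target2 ... -->
    <touch file="target3.timestamp"/>
  </target>

</project>
```

With this layout, editing input1.txt causes target1 and then target3 to run on the next build, while a build with no changes skips all three targets.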
|For those familiar with CruiseControl, you can perform incremental, or conditional, builds in the CruiseControl server machine by setting the |
Use faster and smarter tasks in your builds
Use faster alternatives for tasks if available, and avoid tasks that blindly redo their work on every build. If a task doesn't do a dependency check, or doesn't do it correctly, replace that task with a smarter version if one is available. If none is available, consider extending the existing task so that it does smart dependency checking. This will enable you to perform fast incremental builds.
Use Jikes for faster code compilation
In this section, I discuss the Java source code compilation task, the most executed task in builds. For faster compilations, instead of the standard javac compiler, use Jikes because it is much faster and smarter, and does better dependency checking. However, Jikes is less portable than javac because it is written in C++.
To set Jikes as your project's default compiler, set the build.compiler property in your build.properties file to jikes. Also, if you set the build.compiler.fulldepend property, then Jikes does a full dependency check. The full dependency analysis of Jikes is more reliable because it also checks the classes used by the out-of-date class, up to any level of indirection.
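As a minimal sketch, the same two properties can also be set directly in the build file before the javac task runs. The src and classes directory names are hypothetical, and the sketch assumes an Ant version that still accepts the jikes compiler value and a jikes executable on the PATH of the machine running the build.

```xml
<project name="jikes-sketch" default="compile" basedir=".">
  <!-- Select Jikes for every javac task in this build -->
  <property name="build.compiler" value="jikes"/>
  <!-- Ask Jikes to do full dependency checking -->
  <property name="build.compiler.fulldepend" value="true"/>

  <target name="compile">
    <mkdir dir="classes"/>
    <!-- javac picks up the compiler choice from build.compiler -->
    <javac srcdir="src" destdir="classes"/>
  </target>
</project>
```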
To illustrate the concept of full dependency analysis in Jikes, assume that we have three classes, A, B, and C, where A has a dependency on B and B has a dependency on C. The full dependency analysis of Jikes causes the recompilation of C when we modify C and then run Jikes on A. Without the fulldepend option, the compiler doesn't look beyond the immediately adjacent dependencies to find classes lower in the hierarchy where the source has changed. In our example, if we run Jikes without the fulldepend option, then jikes A won't trigger a recompilation of C.
Jikes is so fast that performing a complete recompilation (cleaning all class files and then compiling) is usually recommended. I should also mention that you can use Jikes to generate dependency information for use with make.
Ant uses only the names of the source and class files to find the classes that need a rebuild when using normal compilers. It does not scan the source and therefore has no knowledge of nested classes, classes named differently from the source file, and so on. You can use Ant's depend task to solve this problem.
The depend task does dependency checking based on criteria other than just existence/modification times. It works by determining which classes are out of date with respect to their sources and then removing the class files of any other classes that depend on the out-of-date classes. To determine the class dependencies, the depend task analyzes the class files and does not parse the source code in any way. It relies upon the class references encoded into the class files by the compiler. This approach is generally faster than parsing the Java source.
Once depend discovers all of the class dependencies, it inverts this relationship to determine, for each class, the other classes that depend on it. This "affects" list is used to discover which classes are invalidated by the out-of-date class. The class files of the invalidated classes are removed, triggering the compilation of the affected classes. The depend task supports the closure attribute, which controls whether depend considers only direct class-class relationships or also transitive, indirect relationships. This task can be used as a complement to compilers that don't provide proper dependency analysis.
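A minimal sketch of how depend is typically placed in front of javac follows. The src, classes, and depcache directory names are hypothetical; closure="yes" asks depend to follow indirect relationships as well as direct ones.

```xml
<project name="depend-sketch" default="compile" basedir=".">
  <target name="compile">
    <mkdir dir="classes"/>
    <!-- Remove class files that depend on out-of-date classes;
         cache names a directory where dependency information is kept between builds -->
    <depend srcdir="src" destdir="classes" cache="depcache" closure="yes"/>
    <!-- javac then recompiles every source whose class file is missing or stale -->
    <javac srcdir="src" destdir="classes"/>
  </target>
</project>
```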
Note that some compilers do more optimized and accurate dependency checking—for example, Eclipse's incremental compiler is able to recompile single methods.
For more on Jikes, depend, and javac, see Resources.
Divide your project into highly cohesive and loosely coupled modules
Always analyze your code to extract dependencies between packages and layers. Then you can identify which dependencies are wrong and refactor your system's design. Break your system into smaller highly cohesive and loosely coupled components, and extract shared or common components as separate modules. Java classes must be packaged correctly and placed in their proper layer in the system architecture. Classes in different layers should have correct external dependencies on each other. For example, the user interface layer code should not have a dependency on business layer code.
This best practice has many advantages: It encourages better object-oriented design. Design qualities like extensibility, reusability, and maintainability are all influenced by the design's inter-package dependencies. It also reduces build time. By cutting wrong dependencies, it prevents needless recompilation of code in other layers when code is modified in just one layer.
Also, when components are separate from each other, a developer working on a component only runs that component's build and automated test process, which ensures that no extra rebuilds and test cases from other modules execute. Many free tools and packages perform this kind of analysis. For example, you can use the JDepend tool, which analyzes dependencies between packages and generates reports for you. These reports contain metrics like afferent couplings (packages that use a package) and efferent couplings (packages that a package depends upon). You can run this tool from both the command line and Ant. It provides both graphical and textual outputs. Running it from Ant is as easy as adding the following code snippet into your Ant file (first you will need to download the JDepend package, see Resources):
Listing 4. Run JDepend from Ant
<jdepend outputfile="jdepend.xml" fork="yes" format="xml">
  <sourcespath>
    <pathelement location="src"/>
  </sourcespath>
  <classpath>
    <pathelement location="classes"/>
    <pathelement location="lib/jdepend.jar"/>
  </classpath>
</jdepend>
This code generates an XML file containing the analysis results.
Execute tasks and targets in parallel
You can run tasks in parallel in Ant using the parallel task. Each task nested in a parallel task executes in its own thread, thereby reducing build time by leveraging available processing resources:
Listing 5. Ant's parallel task
<parallel>
  <task1 ... />
  <task2 ... />
</parallel>
This task is typically used for testing: The application server runs in one thread and the test harness runs in another thread. For more on the parallel task, see Resources. Note: To run tasks in parallel, they should have no dependencies on each other.
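As a minimal sketch, assuming two hypothetical modules (module-a and module-b) whose sources do not depend on each other, the following target compiles both at the same time. The directory names and the threadCount value are illustrative assumptions.

```xml
<project name="parallel-sketch" default="compile-all" basedir=".">
  <target name="compile-all">
    <!-- Create the output directories before the parallel block starts -->
    <mkdir dir="build/module-a"/>
    <mkdir dir="build/module-b"/>
    <!-- Each nested task runs in its own thread; at most two run at once -->
    <parallel threadCount="2">
      <javac srcdir="module-a/src" destdir="build/module-a"/>
      <javac srcdir="module-b/src" destdir="build/module-b"/>
    </parallel>
  </target>
</project>
```

If one module did depend on the other, the two javac tasks would have to run one after the other instead, for example inside a sequential element.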
Execute system command on updated sources using the apply task
If you want to execute a system command on some sources only when they are updated, you can use the apply task. In a way, it acts like a make rule. If you specify a nested mapper and the dest attribute, the timestamp of each source file is compared to the timestamp of a target file defined by the nested mapper element and searched for in the given dest. Then, the command executes only for updated sources. Remember that calling system commands from Ant reduces the build file's portability.
Listing 6. Ant's apply task
<apply executable="cc" dest="src/C">
  <arg value="-c"/>
  <arg value="-o"/>
  <targetfile/>
  <srcfile/>
  <fileset dir="src/C" includes="*.c"/>
  <mapper type="glob" from="*.c" to="*.o"/>
</apply>
For more on the apply task, see Resources.
Incremental fast builds have great impact on project progression, especially in teams utilizing XP's continuous integration discipline.
Here is a summary of this article's techniques for optimizing and quickening builds:
- Keep correct and logical dependencies between tasks and let your build skip the unnecessary targets
- Use smart and fast tasks like Jikes
- Break your system into smaller, highly cohesive and loosely coupled components, and extract shared or common components as separate modules
- Use Ant's apply task to execute system commands on updated sources
- Run tasks in parallel whenever possible
You can develop an Ant main launcher that implements the technique described for skipping targets internally, which will result in less modification to your original Ant file.
Learn more about this topic
- "Top 15 Ant Best Practices," Eric M. Burke, (ONJava.com, December 2003)
- "Long Build Trouble Shooting Guide," Jonathan Rasmusson (ThoughtWorks)
- Ant homepage
- Depend task documentation
- Parallel task documentation
- Javac task documentation
- Apply task documentation
- "Continuous Integration," Martin Fowler (ThoughtWorks)
- GNU make
- For more articles on Java development tools, browse the Development Tools section of JavaWorld's Topical Index