Automated code reviews with Checkstyle, Part 1

Code reviews are essential to code quality, but no team wants to review tens of thousands of lines of code, or should have to. In this two-part article, ShriKant Vashishtha and Abhishek Gupta show you how to overcome the challenges associated with code reviews by automating them. Find out why Checkstyle is one of the most popular tools used for code review automation, then learn how to quickly enhance its built-in rules with custom ones just for your project. Level: Intermediate

If you've worked on a large-scale project you know first-hand the value of an automated code review. Really big projects require the input of hundreds of programmers, often geographically dispersed and with great differences in skill level. Code written by engineers interacts with code written by novices in such projects -- haiku interspersed with high-school poetry.

In many cases, a QA team is assigned to review this code manually, based on coding standards used as guidelines for development. Manually reviewing millions of lines of code is a tedious job, though; ultimately as exhausting as it must be exhaustive.

Smart teams don't do code reviews manually: instead they rely on source code analyzers like Checkstyle, PMD, and JTest. Such tools come with readymade rules that help in maintaining code standards. These rules are a good starting point, but they don't account for project-specific requirements. The trick to a successful automated code review is to combine the built-in rules with custom ones. The more refined your rules, the more truly automated your code review becomes.

Given the benefits of automated code review, you might expect more people to do it. In fact, many developers want to implement project-specific custom rules like those available with Checkstyle. To the uninitiated, custom rule creation seems difficult and time-consuming. There's very little documentation on the internals of code review tools, and very few tutorials show how to create custom checks.

In this first article in JavaWorld's two-part introduction to Checkstyle, we'll remedy that situation, making the task of writing custom Checkstyle rules so simple that any Java developer can potentially do it in a day. After reading this article, you will be able to write your own custom Checkstyle rules without the help of specialized skills. In Part 2, we'll show you how to be more proactive about code quality, by stopping faulty code before it enters your code base.

Checkstyle and Java grammar

Checkstyle is a free and open source development tool that helps ensure that your Java code conforms to the coding conventions you've established. It automates the boring but crucial task of checking Java code. Checkstyle is often used as an Eclipse plugin, and also as part of a project build to create a report of coding-standard violations. It can be used in conjunction with build tools such as Ant or Maven. Checkstyle provides many readymade standard coding rules, which are very useful. However, this article focuses on creating custom rules that are more useful in enterprise development.

Before you write any custom rules for Java files, you need to consider the grammar used to write those Java files. Whenever you think about Java classes, a certain structure comes to mind. A Java class begins with a package definition, followed by import statements. In the object block (for a class or interface) you will find instance variables, a constructor, and methods. You could compare this to an XML tree structure. When you want to read an XML file, you use a parser. You do the same thing with Checkstyle, but for Java files. Checkstyle uses the ANTLR Parser. Figure 1 illustrates the tree structure you get when the ANTLR Parser takes on a Java file.

A diagram of the tree structure of a Java file.
Figure 1. Tree structure of a Java file

You'll find no surprises in the structure shown in Figure 1. Continuing with the metaphor of an XML structure, the Type column in Figure 1 corresponds to XML tags, the Text column corresponds to the value of a tag, and the Line and Column columns correspond to tag attributes.
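To make the XML metaphor concrete, here is a minimal sketch of a tree node that carries the same four pieces of information as the columns in Figure 1. The class AstNodeSketch is hypothetical and is not Checkstyle's own node type; the token names are real Checkstyle token types, but the line and column values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a parse-tree node; not a Checkstyle class. */
final class AstNodeSketch {
    final String type;   // plays the role of an XML tag name, e.g. "CLASS_DEF"
    final String text;   // plays the role of the tag's value
    final int line;      // "attribute": line in the source file
    final int column;    // "attribute": column in the source file
    final List<AstNodeSketch> children = new ArrayList<>();

    AstNodeSketch(String type, String text, int line, int column) {
        this.type = type;
        this.text = text;
        this.line = line;
        this.column = column;
    }

    /** Appends a child and returns this node so calls can be chained. */
    AstNodeSketch add(AstNodeSketch child) {
        children.add(child);
        return this;
    }

    /** Renders the subtree as indented "TYPE [text] line:column" rows. */
    String render() {
        StringBuilder out = new StringBuilder();
        render(this, 0, out);
        return out.toString();
    }

    private static void render(AstNodeSketch node, int depth, StringBuilder out) {
        for (int i = 0; i < depth; i++) {
            out.append("  ");
        }
        out.append(node.type).append(" [").append(node.text).append("] ")
           .append(node.line).append(':').append(node.column).append('\n');
        for (AstNodeSketch child : node.children) {
            render(child, depth + 1, out);
        }
    }

    public static void main(String[] args) {
        // Invented positions for a class declared as "public class Foo {".
        AstNodeSketch root = new AstNodeSketch("CLASS_DEF", "CLASS_DEF", 3, 0)
            .add(new AstNodeSketch("IDENT", "Foo", 3, 13))
            .add(new AstNodeSketch("OBJBLOCK", "OBJBLOCK", 3, 17));
        System.out.print(root.render());
    }
}
```

Each nesting level in the printed output corresponds to one level of the tree, much as nested elements would in an XML document.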

Checkstyle provides a Java Swing GUI tool that will let you view the tree structure for your Java files. You can invoke this tool with the following command:

java -classpath checkstyle-all-<version>.jar com.puppycrawl.tools.checkstyle.gui.Main <JavaFileToParse>

To produce Figure 1, we used SessionAwareCacheStore.java on the command line in place of <JavaFileToParse>. This class, among the many others discussed in this article, is included in the article source, checkstyle-src.zip. This package provides the code for all the Checkstyle rules, or checks, used in this article. It also contains some very useful utility classes that simplify the task of writing Checkstyle checks, along with a readme.txt file that provides details on how to build, configure, and use these custom checks.

The Java tree structure shown by the GUI tool forms the basis for the creation of Checkstyle checks. The tree structure helps define the test cases for creating them.

How does Checkstyle work?

When you say that you want to write a custom Checkstyle rule or check, you're essentially saying that you want to write a class that extends the Check class. Checkstyle is implemented in terms of modules of checks. Modules can contain other modules and hence form a tree structure, as you can see in Listing 1.

Listing 1. File containing the list of modules in a custom Check configuration file

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE module PUBLIC "-//Puppy Crawl//DTD Check Configuration 1.2//EN" "http://www.puppycrawl.com/dtds/configuration_1_2.dtd">
<module name="Checker">
  <property name="severity" value="error" />
  <module name="TreeWalker">
    <property name="severity" value="error" />
    <module
      name="com.abc.checkstyle.check.IllegalMethodCallInForCheck">
      <property name="severity" value="error" />
      <property name="methodNames" value="length,size" />
    </module>
  </module>
</module>
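As a rough picture of how the methodNames property in Listing 1 could reach a check: Checkstyle hands configuration properties to a module through JavaBean-style setters, so a property called methodNames implies a setMethodNames method on the check. The sketch below imitates that idea in plain Java. The class MethodNamesPropertySketch and its helper methods are hypothetical; this is not the article's IllegalMethodCallInForCheck, and the comma splitting is done by hand here rather than by Checkstyle.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical holder for the "methodNames" property; not the article's check class. */
final class MethodNamesPropertySketch {
    private final Set<String> methodNames = new LinkedHashSet<>();

    /** Setter named after the property, in JavaBean style. */
    void setMethodNames(String... names) {
        methodNames.clear();
        for (String name : names) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                methodNames.add(trimmed);
            }
        }
    }

    /** True if the given method name was listed in the configuration. */
    boolean isIllegal(String methodName) {
        return methodNames.contains(methodName);
    }

    /** Splits a comma-separated attribute value such as "length,size". */
    static String[] splitValue(String raw) {
        return raw.split(",");
    }

    public static void main(String[] args) {
        MethodNamesPropertySketch check = new MethodNamesPropertySketch();
        check.setMethodNames(splitValue("length,size"));
        System.out.println("size flagged: " + check.isIllegal("size"));
        System.out.println("get flagged: " + check.isIllegal("get"));
    }
}
```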

Using custom rules for code analysis

Custom rules can be used for the purposes of code analysis, as well as review automation. For instance, if you suspect a memory leak in an application, one of the first steps you should take is to take a look at the Java collection classes, which often do not discard objects that are no longer in use. As a rule, you may want to look at static collections. The problem becomes a bit more complex, however, when a collection as a variable is not static, but the class that contains it is a static data member for another class. In those cases, simple grep-like tools may not do the trick. Rather than analyze thousands of classes, try creating a custom Checkstyle rule that will pick up all the instances of collection instance variables in different classes. This rule could also pick up all the data members in and out of the collection. Once you know how to write custom rules, you'll find that they are useful for more than just code reviews.

The Checkstyle kernel interacts with modules that implement the FileSetCheck interface. Checkstyle provides some FileSetCheck implementations by default. One of them is TreeWalker. TreeWalker walks through all modules (classes) that derive from the Check class. To write a custom rule, you need to extend the Check class and plug it into the Checkstyle configuration file.

In Listing 1, the class com.abc.checkstyle.check.IllegalMethodCallInForCheck is a custom Checkstyle rule that extends the Check class.

TreeWalker is based on the fundamentals of the Visitor pattern. It walks through all classes that extend from the Check class. However, as a custom rule developer, you can specify the event that should prompt TreeWalker to visit a particular extension of the Check class.
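The relationship between the walker and its checks can be sketched in a few lines of plain Java. This is a simplified model, not the real Checkstyle API: the method names getDefaultTokens and visitToken mirror those on the Check class, but the types here (strings for token types, a hypothetical Node class) are stand-ins so the example runs without Checkstyle on the classpath.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the walker/check relationship; NOT the real Checkstyle API. */
final class TreeWalkerSketch {

    /** Hypothetical tree node: a token type plus child nodes. */
    static final class Node {
        final String type;
        final List<Node> children = new ArrayList<>();

        Node(String type, Node... kids) {
            this.type = type;
            children.addAll(Arrays.asList(kids));
        }
    }

    /** Stand-in for the Check base class: a check names the tokens it wants to see. */
    abstract static class CheckSketch {
        abstract List<String> getDefaultTokens();
        abstract void visitToken(Node node);
    }

    /** Example check that simply counts the for loops it is shown. */
    static final class ForLoopCounter extends CheckSketch {
        int count;

        @Override
        List<String> getDefaultTokens() {
            return Arrays.asList("LITERAL_FOR");
        }

        @Override
        void visitToken(Node node) {
            count++;
        }
    }

    private final List<CheckSketch> checks = new ArrayList<>();

    void register(CheckSketch check) {
        checks.add(check);
    }

    /** Depth-first walk; a check is called only for token types it registered. */
    void walk(Node node) {
        for (CheckSketch check : checks) {
            if (check.getDefaultTokens().contains(node.type)) {
                check.visitToken(node);
            }
        }
        for (Node child : node.children) {
            walk(child);
        }
    }

    public static void main(String[] args) {
        Node tree = new Node("METHOD_DEF",
            new Node("LITERAL_FOR", new Node("METHOD_CALL")),
            new Node("LITERAL_FOR"));
        ForLoopCounter counter = new ForLoopCounter();
        TreeWalkerSketch walker = new TreeWalkerSketch();
        walker.register(counter);
        walker.walk(tree);
        System.out.println("for loops visited: " + counter.count);
    }
}
```

The point of the design is that the walker owns the traversal while each check only declares which node types interest it, so adding a new rule never requires touching the walking code.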
