Make bad code good

Refactor broken Java code for fun and profit

1 2 3 Page 2
Page 2 of 3

Fourthly, if your class is one of several implementations of an interface, you might find that all implementations of that class share some common behavior, but the code to implement it is duplicated in each class -- a sure sign that you need an abstract superclass that implements the shared behavior. By taking the shared behavior up into the superclass, you're removing code from two classes, and coincidentally documenting the expected behavior of implementations of the interface. At the very least, you can write an empty abstract superclass, extend it in each implementation, and move shared behavior into the superclass as you resolve incompatibilities.

When dealing with interfaces of classes, it's best to keep them small. The Law of Demeter says that you should never write code that invokes a getter on an object returned from a getter, for example x.getBlah().getFoo(). Instead, x's class should have a getFoo() method that does that for you. I think that law is wrong. If x.getBlah()returns a complex object, the public interface expands way out of control. Furthermore, changes to the interface of x.getBlah() cause changes to the interface of x. For that reason, I suggest exactly the opposite: Convenience methods with simple alternatives should be removed from the public interface of the class. If you suspect that a public method does not need to be public, you can do an experiment by deprecating it. You won't break the build, and you'll soon find out if anyone cares whether or not it can be called.

Finally, it's my conviction that big classes with multiple instance variables and synchronized methods represent reliable sources of deadlocks. Programmers often use synchronization to protect against concurrent access to instance variables that could corrupt relationships among those variables. For example, consider the case above where we had a map and its reverse as instance variables of a class. Obviously, synchronization should be used to ensure that updates to the pair of variables remain consistent. If the class has another, separate pair of related variables, synchronization should be used to ensure their consistency as well. However, as the instance has only one synchronization lock, access to the separate pairs of dependent variables is sequentialized as well, although not required. With the inevitable addition of a little more complexity, the unnecessary synchronization will result in a deadlock, when all that was being attempted was to protect the instance variables. The related variables should be extracted into subclasses, where they can have their own synchronization locks.

Break up big methods

Just as big classes prove difficult to understand, so do big methods, whose usual causes include:

  • Too many options, causing the method to do too many things
  • Not enough support from other methods, causing the method to do tasks at a lower level than it should
  • Overly complicated exception handling

My rule of thumb for methods: If I can't fit the entire thing on one screen, it's too long. It's an entirely practical rule with no theoretical basis, but it seems right and works for me. Remember though, when you read a method of 100 statements for the first time, it looks like 100 unrelated statements, and it is only after some study that you can see the internal structure. Life would be easier for you and for others if you made that internal structure explicit.

The most obvious target when attacking overgrown methods is code repeated in multiple methods. You would have noticed these when you were reformatting the code. A typical case occurs when a getter method may throw an exception, for example to indicate that the value is not available. Of course, everywhere you need to get the value, you handle the exception and log an error message. This means that in every place you get the value, instead of one simple function invocation you have five or six lines of code. Duplicate those in 10 methods where you need to look up that value, and you have 50 or 60 lines of bad code, even though all you are trying to do is get some value.

To solve the problem, you can start by extracting the five or six lines of code into a new private (or protected) method. Decide on some way of handling the exceptional condition, for example by returning a default value, or by throwing another exception type that is propagated by the caller. Code the exception logging and handling only in the new method, and fix the callers to simply invoke the new method. This suggestion seems entirely trivial and it's basic object-oriented practice, but I have seen good programmers fall into the trap many times. The best defense: refuse to write any line of code twice, and always make a new method in preference to duplicating even a small piece of code.

Loops serve as another obvious target for removal from a method. Often a method will call another method to get a Collection back from it, and immediately iterate over the Collection. This is known as mixed levels of abstraction -- the caller doesn't need to look inside the collection to get it, but the caller does need to look inside to use it. It's cleaner to write a method that will do the iteration over the collection for you, so the original method can use the collection without being concerned about its implementation. By the way, studying the Java Collections API can save you a lot of effort in this type of code.

Once you've extracted any loops that don't belong, it'll be easier to read the method. You'll be able to see what is being done to what without the distraction of loop variables, indentation, and Iterator declarations. You might then notice that a chunk of the method only works with one of the parameters, and the results of that computation are combined with the values of the other parameters separately. This is a perfect opportunity to extract the operations on that parameter into a separate method. If you give the new method a good name, it'll be much clearer as to what's happening.

Moving on, exceptions represent an excellent source of long methods and hard-to-read code. As exceptions are a new thing for most people (since Java was invented, or C++ became popular), there's not much folklore on how to use them. Even worse, as exceptions usually indicate that something has gone wrong, and you hardly ever know why, you don't know what to do when you do get one. This generally results in overly long, complex, and useless exception-handling code.

With exceptions, it's best to realize that you don't have to handle every one. If you don't know what to do with a ClassNotFoundException, or an InterruptedException, don't do anything. You're allowed to pass them back to your caller. Of course, that changes the method's signature, which is not always possible, so it's best to declare your methods as throwing a nested exception (an exception that has another exception as its cause). That way, your caller can discover that the FooException you threw was caused by a ClassNotFoundException, and deal with it appropriately (or rethrow it).

It's advisable for your foo package to have a generic FooException to throw from any method, just in case something goes wrong. Then your caller has to catch only FooException. For particular problems, you can implement FooNotFoundException and IllegalFooException. As long as these subclass FooException, you won't have to change your method signature, and the caller can include enough detail to make sensible responses to the problem.

Now consider the following code, which uses the Reflection API in a fairly simple way:

public Object getInstance(String s) throws FooException {
   try {
      Class c = Class.forName(s);
      Class[] argTypes = { Integer.TYPE };
      Constructor constr = c.getConstructor(argTypes);
      Object[] args = { new Integer(0) };
      return constr.newInstance(args);
   } catch (ClassNotFoundException ex) {
      throw new FooException("class not found", ex);
   } catch (NoSuchMethodException ex) {
      throw new FooException("no such constructor", ex);
   } catch (InvocationTargetException ex) {
      throw new FooException("exception in constructor", ex);
   } catch (IllegalAccessException ex) {
      throw new FooException("can't access constructor", ex);
   } catch (InstantiationException ex) {
      throw new FooException("something bad", ex);
   }
}

In the example above, I consider the five catch statements in a row to be bad code, as they're essentially useless, they take up a lot of space, and if I ever change FooException I might have to modify them. Instead, I recommend putting them in a method by themselves, like this:

public Object newGetInstance(String s) throws FooException {
   try {
      Class c = Class.forName(s);
      Class[] argTypes = { Integer.TYPE };
      Constructor constr = c.getConstructor(argTypes);
      Object[] args = { new Integer(0) };
      return constr.newInstance(args);
   } catch (Exception ex) {
      throwFooException(ex, "constructing new " + s);
      return null;
   }
}
private void throwFooException(Throwable ex1, String explain)
      throws FooException {
   explain = toString() + ": " + explain;
   try {
      throw ex1;
   } catch (ClassNotFoundException ex) {
      throw new FooException(explain, ex);
   } catch (NoSuchMethodException ex) {
      throw new FooException(explain, ex);
   } catch (InvocationTargetException ex) {
      ex = ex1.getTargetException();
      throw new FooException(explain, ex);
   } catch (IllegalAccessException ex) {
      throw new FooException(explain, ex);
   } catch (InstantiationException ex) {
      throw new FooException(explain, ex);
   } catch (Exception ex) {
      throw new FooException(explain, ex);
   }
}

I claim that almost all of the class's exceptions will be converted into FooException, and almost always in the same mindless way, so you might as well do it explicitly and efficiently. The throwFooException() method can be reused many times. For each invocation, you can change 10 lines of exception-handling code into 3. In addition, you automatically add contextual information (toString()), providing useful debugging information in all FooExceptions. If you're ever required to improve that debugging information, you only have to do it in one place. Remember the lesson from Animal Farm: "Less code good, more code bad." (That's not exactly what George Orwell wrote ...)

Rearchitecting

As you increase your knowledge of the code, making it more comprehensible and better factored, you'll inevitably find some bits you don't like. When you started this project, you weren't really capable of changing much, but now you've played with the code, and it's not as damaged as it once was. Along the way, you have seen some bad things and had some ideas on how to fix them. It's time to make some serious changes.

Rewrite code you don't understand

If you know what the code should do, and it seems to do it, but you can't figure out quite how, it's bad code. If the original author did something tricky, some comments should have been provided. More likely though, the author was clueless and the code really is a disaster. Rewrite it.

By rewriting the code, you have the opportunity to do it the easy way, and to include useful comments. Start by writing down in Javadoc comments what the new code does (just to make sure that you know). Then you can take advantage of the byproducts of your refactoring: call this new method that does that, use that new data structure to store this. The new code will make more sense to everybody and will probably run faster as well. Your unit tests will tell you whether your new code works. Go into your bug tracking system and close all the bugs you have fixed.

Move to a layered architecture

Let's hope that your inherited, formerly bad code now looks better. It's time to look at the bigger picture: how the packages relate to each other. Using some sort of tool that tells you what classes invoke what methods (that information can be extracted from the class files), find out the interclass dependencies. From those dependencies, infer the interpackage dependencies. If you can't draw a hierarchy (without any loops!) of what packages depend on what other packages, you have some architectural work to do.

Using your instincts, a code browser, and a great deal of guesswork, figure out what such an architecture might look like one fine day. Then identify the particular problem spots. A particular class that frequently refers to other packages or a particular constant referred to from other packages hints that classes or parts of classes reside in the wrong package. Ask yourself whether it would make sense to move that code to another package.

After a lot of consideration, you can derive a package hierarchy that looks sensible. Although it may require a lot of work to achieve the clean architecture, it gives you a goal to work toward. Your unit tests have clarified what the code should do, and now the architecture allocates the responsibility to do it to packages and classes. All you have to do is fill in the code.

Tools

All this cutting and pasting, extracting, and creating does not come without a cost. You will find yourself mindlessly changing parameter lists more frequently than you would like. You will have more compilation errors than you ever dreamed possible. You will (sometimes) insert more bugs than you take out. To help you through the boring bits, you need good tool support.

Assuming you have a good editor with search and replace functionality (preferably on multiple files at a time), you need a fast compiler to quickly detect what you've broken. Multiple invocations of javac probably won't run fast enough, instead, to avoid the startup cost, you may turn to a tool like Jikes or something that keeps javac running in the background.

A good IDE, one that compiles fast and saves you from referring to the Javadoc all the time, will stand you in good stead. I was never a fan of Java IDEs, but they seem to have evolved into useful tools. Being able to compile without changing windows and losing concentration proves vital.

1 2 3 Page 2
Page 2 of 3