Java decompilers compared

Our detailed examples of how three top decompilers handle an extensive test suite will help you determine which, if any, meet your needs

The object of a Java decompiler is to convert Java class files into Java source code. In the chaotic world of software development there are many reasons, legitimate and otherwise, to wish for such a tool. Decompilers can save the day when you have the binary for your own code, but have misplaced or otherwise lost the corresponding source code. On the other hand, decompilers are the prized components of any good software piracy kit. Most often, however, decompilers help programmers clarify poor documentation (one decompiled function is worth a thousand words) or provide a means for creating not-yet-written documentation. When was the last time you thought the documentation for any software was complete and correct?

In any case, the transparent and information-rich structure of Java class files -- a feature that makes Java's dynamic linking much better than previous models -- also makes such tools particularly easy to build. In fact, there is an arms race brewing between decompilers and so-called obfuscators, which profess to provide Java code some measure of protection from decompilers. In essence, obfuscators remove all non-essential symbolic information from your class files and, optionally, replace it with fake symbolic information designed to confuse the decompiler. Crema, the companion obfuscator to the Mocha decompiler, was examined in detail in the December issue of JavaWorld. (See the Resources section at the end of this column for a link to this article and to several obfuscator products.)

Product overview

I'll be reviewing three Java decompilers in this article: DejaVu, Mocha, and WingDis. These products are the only commercial decompilers I'm aware of, but surely there are more to come.

  • DejaVu, distributed as part of Innovative Software's OEW for Java development environment, appears to be completely independent of it. DejaVu is available on a trial basis for free.

  • Mocha, the first and most widely known decompiler, is free. Although Mocha's creator, Hanpeter van Vliet, met with an untimely demise, you can still obtain a copy of the program free of charge on the Web. An official descendant of Mocha will probably be commercially available before long.

  • WingDis version 2.06, a product from WingSoft, is available free as a crippled demo version and as a time-limited fully capable trial version. The full version costs 9.95.

See the Resources section at the end of this article for more information on where to find each of these products.

Each of these tools is 100% Pure Java, so the essential distribution consists of a Java class library and instructions to invoke it. They're all a little quirky to set up and use, a characteristic shared by many standalone Java applications.

These are all command-line-oriented tools, so the most practical way to invoke them is to embed the detailed class path and other invocation instructions in a command file. Unfortunately, there is no standardized way to do this; the details vary depending on your choice of operating system. However, once you've conquered the setup, the decompilers easily produce output that is virtually compiler-ready.

Testing method

I chose a small utility library, consisting of about 15 classes, as my standard test set. I compiled the library using JDK 1.0.2, with optimization (the -O switch) and without debugger information (no -g switch), settings that correspond to how most Java code would actually be delivered. I decompiled the class files with each of the three decompilers, then manually edited the decompiled sources until they could be successfully recompiled. I then decompiled these three sets of "second-generation" binaries with each of the three decompilers, yielding nine sets of "third-generation" sources. Once I had my data, I manually compared various pairs of sources, looking for inconsistencies that might indicate incorrectly decompiled code.

Keep in mind that in performing this set of tests I had the luxury of referring to the original sources at any time, and the double luxury of having written these sources myself -- two advantages not generally available to anyone using a decompiler in earnest.

I organized decompilation errors into the categories described below. I've ranked the error classes from 1 through 6 (class 1 being the least offensive) on the assumption that easy-to-spot and easy-to-fix errors are less significant than hidden or hard-to-fix errors. In the last portion of this article I'll examine detailed code examples of these error types.

Class 1 errors

Description: Errors flagged by the compiler that are easily fixed

Examples: Boolean variable incorrectly identified as an int; missing, but trivial, type cast

Class 2 errors

Description: Errors flagged by the compiler that are not easily fixed

Example: Generating code containing goto

Class 3 errors

Description: Errors that create ugly and incomprehensible, but correct code

Examples: Unreconstructed flow control; unreconstructed use of + for string appends

Class 4 errors

Description: Errors that cause subtle misprints and create subtly incorrect code

Examples: Failing to use \ to escape characters in string constants; misprinting character constants

Class 5 errors

Description: Errors that cause total failure

Example: Crashing without producing output

Class 6 errors

Description: Errors not flagged by the compiler that result in severely damaged semantics

Example: Misuse or non-use of this, and other patently incorrect code

The following table shows you which decompiler is guilty of which type of error.

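Decompiler errors by type

DejaVu version 1.0

  • Class 1 errors: Several
  • Class 2 errors: No
  • Class 3 errors: Major problem with flow analysis
  • Class 4 errors: Yes
  • Class 5 errors: No
  • Class 6 errors: No

Mocha version beta 1

  • Class 1 errors: Several
  • Class 2 errors: No
  • Class 3 errors: No
  • Class 4 errors: No
  • Class 5 errors: Crashes on some class files
  • Class 6 errors: No

WingDis version 2.06

  • Class 1 errors: One
  • Class 2 errors: No
  • Class 3 errors: Overuse of if(x!=false) and similar construction
  • Class 4 errors: No
  • Class 5 errors: No
  • Class 6 errors: Misuse or non-use of super; mistranslation of x=a++ to a++; x=a;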
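
The x=a++ mistranslation listed for WingDis deserves a closer look, because both forms compile cleanly yet compute different values. The following is a minimal sketch of the difference; the class and method names are illustrative only and are not taken from the test library.

```java
public class PostIncrementDemo {
    // The original meaning: x receives the value a held before the increment.
    static int[] original(int a) {
        int x = a++;
        return new int[] { x, a };
    }

    // The mistranslated form: a is incremented first, then copied into x.
    static int[] mistranslated(int a) {
        a++;
        int x = a;
        return new int[] { x, a };
    }

    public static void main(String[] args) {
        int[] good = original(5);
        int[] bad = mistranslated(5);
        System.out.println("original:      x=" + good[0] + " a=" + good[1]); // x=5 a=6
        System.out.println("mistranslated: x=" + bad[0] + " a=" + bad[1]);   // x=6 a=6
    }
}
```

Because the compiler accepts both versions, a discrepancy like this surfaces only when the recompiled code misbehaves, which is why it belongs in class 6.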

Caveat emptor: The test set was not specifically designed to validate or torture the decompilers, and it is impossible to know if the results here are representative of all classes, or if the list of problems encountered is complete.

Let's get to the heart of the matter and see some of my testing in action. The remainder of this article provides the actual code examples of the tests, which will allow you to see how the individual decompilers fared on each class of error.

Class 1 errors: Errors flagged by the compiler that are easily fixed

All three decompilers sometimes failed to infer a Boolean type for integer operations, although it is interesting to note that they failed in different places.

Example 1: Missed inference to Boolean

PrintStream PrintStream()
{
    return new PrintStream(outputstream, 1);  // 1 should be true
}
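
To see why the compiler flags this output and how small the repair is, consider the sketch below. It assumes only the standard java.io.PrintStream constructor that takes an OutputStream and a boolean autoflush flag; the class and helper names are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class BooleanInferenceDemo {
    // Hypothetical helper, named for this sketch only.
    static PrintStream autoFlushing(ByteArrayOutputStream sink) {
        // The second parameter is declared boolean, so the decompiled
        // literal 1 does not compile and must be rewritten as true:
        // return new PrintStream(sink, 1);   // rejected by the compiler
        return new PrintStream(sink, true);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream out = autoFlushing(sink);
        out.println("ok");
        System.out.println(sink.toString().startsWith("ok")); // true
    }
}
```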

At the level of bytecodes, Boolean does not exist as a type; rather, Boolean values are handled as integers, and the Boolean nature of variables has to be deduced. In the case shown above, 1 should have been true, which could have been deduced by examining the definition of PrintStream.

Example 2: Beautiful, but it's not Java

Mocha transformed a static initializer into an elegant, but illegal, construction:

public ConsoleWindow(String string, int i1)
{
    dead = false;
    styles = { "Plain", "Bold", "Italic" };
    sizes = { "8", "9", "10", "12", "14", "16", "18", "24" };
    ...

Bracketed initializer lists for arrays are valid only as initializers for variable declarations (either class or local), not for other assignments. The reason for this differentiation is obscure to me, but I'm sure Sun must have had a reason. In any case, it's apparent that these initializers are actually implemented by inline code inside constructors, generated by the compiler.
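
The rule can be illustrated with a short sketch. The field names echo the example above, but the class itself is hypothetical. Note one assumption: the array creation expression used in the assignment arrived with Java 1.1, so it would not have been available to the JDK 1.0.2 compiler used for these tests.

```java
public class ArrayInitializerDemo {
    // Legal: a braced list used as the initializer of a declaration.
    static String[] styles = { "Plain", "Bold", "Italic" };

    static String[] sizes;

    static void reassign() {
        // Illegal as a plain assignment, which is the form Mocha emitted:
        // sizes = { "8", "9", "10" };

        // Legal from Java 1.1 onward: an array creation expression
        // that carries its own initializer list.
        sizes = new String[] { "8", "9", "10" };
    }

    public static void main(String[] args) {
        reassign();
        System.out.println(styles.length + " styles, " + sizes.length + " sizes"); // 3 styles, 3 sizes
    }
}
```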

When decompiling this same static initializer, WingDis produced equally beautiful and syntactically correct code. Unfortunately, the code was not semantically correct, which results in a class 6 error type.

Using this same static initializer, DejaVu emitted perfectly legal (but ugly) code, as shown in this snippet:

public ConsoleWindow(String arg1, int arg2)
{
    ...
    String[] Har1;
    Har1 = new String[3];
    Har1[0] = "Plain";
    Har1[1] = "Bold";
    Har1[2] = "Italic";
    this.styles = Har1;
    ...

Class 2 errors: Errors flagged by the compiler that are not easily fixed

The ability to reverse-engineer code and reproduce the same for, while, or if statements as the original code is the most surprising (approaching magical) capability of Java decompilers. Java is "decompiler friendly" in several ways:

  • At the level of bytecodes, the much-maligned goto statement is the workhorse within any function, so the task of inferring the original structure from raw gotos is daunting indeed. In Java, however, there are no explicit goto statements added by the programmer. If any gotos do exist in the code to be decompiled, they must be part of some higher-level construction.

  • The set of control structures in Java is small, and compilers compile them in fairly stylized ways.

  • The Java compiler technology is immature. Highly optimizing compilers (which will eventually appear) will be able to transform code much more significantly than do current compilers.

  • There is a close semantic match between Java source code and Java bytecode.

Earlier versions of WingDis sometimes produced code containing incorrect goto statements. These erroneous statements were nearly impossible to understand and a royal pain to recode correctly. I'm pleased to report that this class of error seems to be extinct. The reviewed version of WingDis seems to handle flow analysis flawlessly, as does Mocha. However, despite their success on my test cases, I'm sure that "sufficiently complex" cases would present a challenge that neither Mocha nor WingDis could overcome.

DejaVu sometimes emits correct code in this situation, but it is often incomprehensible. This type of error falls into the next error class.

Class 3 errors: Errors that create ugly and incomprehensible, but correct, code

This type of error involves reconstructions that are correct Java-wise, but perhaps not as easy to read or understand as the original. The quality of reconstructions I encountered varied widely -- from elegant to abysmal.

All three programs reconstructed simple loops quite well, but Mocha and WingDis were also able to handle complex reconstructions with equal grace. DejaVu frequently resorts to emitting legal, but nearly incomprehensible, code dominated by switch statements. Let's look at some simple cases of loop reconstruction first.

Note: I've edited the following source and reconstruction examples slightly to optimize indentation and whitespace for this presentation, but the non-whitespace code is unchanged. Also, note that these functions are uncommented, as was the case in the original. This is a realistic level of commentary in small functions found in the wild.

Original source:

public String toString ()
{
    String myname = this.getName();
    return("#<" + super.toString() + (myname!=null ? (" " + myname) : "" ) + ">");
}

public static int LList_Length(LList l)
{
    int len=0;
    while(l!=null) {
        len++;
        l=l.next;
    }
    return(len);
}

Mocha's reconstruction:

public String toString()
{
    String string;
    string = getName();
    return "##<" + super.toString() + ((string != null) ? (" " + string) : "") + "#>";
}

public static int LList_Length(LList lList)
{
    int i = 0;
    for (; lList != null; lList = lList.next)
        i++;
    return i;
}

Note that in the toString method, string's initialization was not combined with its declaration; a minor stylistic point. In LList_Length, my while became an equivalent for.
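
The claim that the for loop is equivalent to the original while loop can be checked directly. The sketch below uses a minimal stand-in for LList that assumes only a next field, and places the two loop shapes side by side; the class and method names are invented for illustration.

```java
public class LoopEquivalenceDemo {
    // Minimal stand-in for the article's LList; only the next field is assumed.
    static class LList {
        LList next;
        LList(LList next) { this.next = next; }
    }

    // Shape of the original source: a while loop.
    static int lengthWhile(LList l) {
        int len = 0;
        while (l != null) {
            len++;
            l = l.next;
        }
        return len;
    }

    // Shape of Mocha's reconstruction: a for loop with an empty initializer.
    static int lengthFor(LList l) {
        int i = 0;
        for (; l != null; l = l.next)
            i++;
        return i;
    }

    public static void main(String[] args) {
        LList three = new LList(new LList(new LList(null)));
        System.out.println(lengthWhile(three) + " " + lengthFor(three)); // 3 3
    }
}
```

Both methods test the same condition before each iteration and advance the list in the same place, so a decompiler has no way to know which spelling the programmer originally chose.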

WingDis's reconstruction:

public String toString()
{
    String Stri1= getName();
    return "#<" + super.toString() + ( (Stri1 == null) ? "" : new StringBuffer(" " + Stri1).toString() ) + ">";
}

public static int LList_Length( LList LLis0)
{
    int int1;
    for (int1= 0; (LLis0!= null) ; LLis0= LLis0.next) {
        int1++;
    }
    return int1;
}

Score: 99 percent. Other than the choice of for instead of while, these first two decompilations are just like the originals. Closer than this you can't reasonably ask.

DejaVu's reconstruction:

public String toString()
{
    String obj;
    StringBuffer Hobj1;
    String Hobj;
    obj = this.getName();
    Hobj1 = new StringBuffer().append("#>").append(super.toString());
    if (!(obj == null)) {
        Hobj = new StringBuffer().append(" ").append(obj).toString();
    } else {
        Hobj = "";
    }
    return Hobj1.append(Hobj).append("<").toString();
}

public static int LList_Length(LList arg0)
{
    int i;
    i = 0;
    while (arg0 != null) {
        i++;
        arg0 = arg0.next;
    } /* end while loop */
    return i;
}

The reconstruction of the toString method allows lots of ugly detail from Java's low-level implementation of + to show through. However, the reconstruction of LList_Length using while rather than for is just like the original, and it reads much better.
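
To make the relationship between + and StringBuffer concrete, the sketch below writes the same expression both ways and confirms that the results match. It uses the string constants from the original source; the class, method, and parameter names are hypothetical, and the second method only approximates the shape of the expanded code.

```java
public class ConcatDemo {
    // Source-level form, as a programmer would write it.
    static String withPlus(String base, String name) {
        return "#<" + base + (name != null ? (" " + name) : "") + ">";
    }

    // Expanded form: the same work spelled out as explicit
    // StringBuffer append calls.
    static String withStringBuffer(String base, String name) {
        StringBuffer buf = new StringBuffer().append("#<").append(base);
        String suffix;
        if (name != null) {
            suffix = new StringBuffer().append(" ").append(name).toString();
        } else {
            suffix = "";
        }
        return buf.append(suffix).append(">").toString();
    }

    public static void main(String[] args) {
        System.out.println(withPlus("Widget", "demo"));         // #<Widget demo>
        System.out.println(withStringBuffer("Widget", "demo")); // #<Widget demo>
        System.out.println(withPlus("Widget", null));           // #<Widget>
    }
}
```

A decompiler that recognizes the append chain can fold it back into the one-line + expression; one that does not leaves the reader with the longer second form.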

Now let's look at a more troublesome case of complex loop reconstruction.

Original source:
