Aug 1, 1997 1:00 AM PT

Take a look inside Java classes

Learn to deduce properties of a Java class from inside a Java program

Welcome to this month's installment of "Java In Depth." One of the earliest challenges for Java was whether or not it could stand as a capable "systems" language. The root of the question involved Java's safety features that prevent a Java class from knowing other classes that are running alongside it in the virtual machine. This ability to "look inside" the classes is called introspection. In the first public Java release, known as Alpha3, the strict language rules regarding visibility of the internal components of a class could be circumvented through the use of the ObjectScope class. Then, during beta, when ObjectScope was removed from the run time because of security concerns, many people declared Java to be unfit for "serious" development.

Why is introspection necessary in order for a language to be considered a "systems" language? One part of the answer is fairly mundane: Getting from "nothing" (that is, an uninitialized VM) to "something" (that is, a running Java class) requires that some part of the system be able to inspect the classes to be run so as to figure out just what to do with them. The canonical example of this problem is simply the following: "How does a program, written in a language that cannot look 'inside' another language component, begin executing the first language component, which is the starting point of execution for all other components?"

There are two ways to deal with introspection in Java: class file inspection and the new reflection API that is part of Java 1.1.x. I'll cover both techniques, but in this column I'll focus on the first -- class file inspection. In a future column I will look at how the reflection API solves this problem. (Links to complete source code for this column are available in the Resources section.)

Look deeply into my files...

In the 1.0.x releases of Java, one of the biggest warts on the Java run time is the way in which the Java executable starts a program. What is the problem? Execution is transiting from the domain of the host operating system (Win 95, SunOS, and so on) into the domain of the Java virtual machine. Typing the line "java MyClass arg1 arg2" sets in motion a series of events that are completely hard-coded by the Java interpreter.

As the first event, the operating system command shell loads the Java interpreter and passes it the string "MyClass arg1 arg2" as its argument. The next event occurs when the Java interpreter attempts to locate a class named MyClass in one of the directories identified in the class path. If the class is found, the third event is to locate a method inside the class named main, whose signature has the modifiers "public" and "static" and which takes an array of String objects as its argument. If this method is found, the Java interpreter converts "arg1 arg2" into an array of strings, constructs a primordial thread, and invokes the method with that array. Once this method is invoked, everything else is pure Java.
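The sequence above can be sketched in Java itself. This is only an illustration, not the interpreter's actual startup code (which is native): it leans on the Java 1.1 reflection API that a later column covers, and the names LauncherSketch and EchoApp are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical stand-in for the interpreter's hard-coded startup sequence. */
final class LauncherSketch {

    /** Mimics "java MyClass arg1 arg2": find the class, find main, invoke it. */
    static void launch(String className, String... args) throws Exception {
        // Locate the class by name (the real interpreter searches the class path).
        Class<?> target = Class.forName(className);

        // Locate a public method named main that takes an array of String objects.
        Method main = target.getMethod("main", String[].class);
        if (!Modifier.isStatic(main.getModifiers())
                || main.getReturnType() != void.class) {
            throw new NoSuchMethodException(
                className + " has no static void main(String[])");
        }

        // Hand over the argument array; no instance exists yet, hence the null receiver.
        main.invoke(null, (Object) args);
    }
}

/** Hypothetical application class used only to exercise the sketch. */
final class EchoApp {
    static String lastArgs = "";

    public static void main(String[] args) {
        lastArgs = String.join(" ", args);
    }
}
```

Calling LauncherSketch.launch with the name of EchoApp and the arguments "arg1" and "arg2" leaves "arg1 arg2" in EchoApp.lastArgs. The null receiver passed to invoke is the reason main has to be static.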

This is all well and good except that the main method has to be static because the run time can't invoke it with a Java environment that doesn't exist yet. Further, the first method has to be named main because there isn't any way to tell the interpreter the method's name on the command line. Even if you did tell the interpreter the name of the method, there isn't any general way in which to find out if it was in the class you had named in the first place. Finally, because the main method is static, you can't declare it in an interface, and that means you can't specify an interface like this:

public interface Application {
    public void main(String args[]);
}

If the above interface were defined, and classes implemented it, then at least you could use the instanceof operator in Java to determine if you had an application or not and thus determine whether or not it was suitable for invoking from the command line. The bottom line is that you can't (define the interface), it wasn't (built into the Java interpreter), and so you can't (determine if a class file is an application easily). So what can you do?

Actually, you can do quite a bit if you know what to look for and how to use it.

Decompiling class files

The Java class file is architecture-neutral, which means it is the same set of bits whether it is loaded from a Windows 95 machine or a Sun Solaris machine. It is also very well documented in the book The Java Virtual Machine Specification by Lindholm and Yellin. The class file structure was designed, in part, to be easily loaded into the SPARC address space. Basically, the class file could be mapped into the virtual address space, then the relative pointers inside the class fixed up, and presto! You had instant class structure. This was less useful on the Intel architecture machines, but the heritage left the class file format easy to comprehend, and even easier to break down.

In the summer of 1994, I was working in the Java group and building what is known as a "least privilege" security model for Java. I had just finished figuring out that what I really wanted to do was to look inside a Java class, excise those pieces that were not allowed by the current privilege level, and then load the result through a custom class loader. It was then that I discovered there weren't any classes in the main run time that knew about the construction of class files. There were versions in the compiler class tree (which had to generate class files from the compiled code), but I was more interested in building something for manipulating pre-existing class files.

I started by building a Java class that could decompose a Java class file that was presented to it on an input stream. I gave it the less-than-original name ClassFile. The beginning of this class is shown below.

public class ClassFile {
    int                 magic;
    short               majorVersion;
    short               minorVersion;
    ConstantPoolInfo    constantPool[];
    short               accessFlags;
    ConstantPoolInfo    thisClass;
    ConstantPoolInfo    superClass;
    ConstantPoolInfo    interfaces[];
    FieldInfo           fields[];
    MethodInfo          methods[];
    AttributeInfo       attributes[];
    boolean             isValidClass = false;
    public static final int ACC_PUBLIC      = 0x1;
    public static final int ACC_PRIVATE     = 0x2;
    public static final int ACC_PROTECTED   = 0x4;
    public static final int ACC_STATIC      = 0x8;
    public static final int ACC_FINAL       = 0x10;
    public static final int ACC_SYNCHRONIZED    = 0x20;
    public static final int ACC_THREADSAFE  = 0x40;
    public static final int ACC_TRANSIENT   = 0x80;
    public static final int ACC_NATIVE      = 0x100;
    public static final int ACC_INTERFACE   = 0x200;
    public static final int ACC_ABSTRACT    = 0x400;

As you can see, the instance variables for class ClassFile define the major components of a Java class file. In particular, the central data structure for a Java class file is known as the constant pool. Other interesting chunks of class file get classes of their own: MethodInfo for methods, FieldInfo for fields (which are the variable declarations in the class), AttributeInfo to hold class file attributes, and a set of constants that was taken directly from the specification on class files to decode the various modifiers that apply to field, method, and class declarations.
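To make the role of those constants concrete, here is a minimal sketch of decoding an access-flag bit mask into modifier keywords. AccessFlagsSketch and describe are hypothetical names, and only a subset of the ACC_* values from the listing above is handled.

```java
import java.util.StringJoiner;

/** Hypothetical helper: turns an access-flag bit mask into modifier keywords. */
final class AccessFlagsSketch {
    // A subset of the ACC_* values listed in ClassFile, in the order they are printed.
    private static final int[] BITS = {
        0x1, 0x2, 0x4, 0x8, 0x10, 0x100, 0x400
    };
    private static final String[] NAMES = {
        "public", "private", "protected", "static", "final", "native", "abstract"
    };

    /** Returns the keywords for every bit set in flags, separated by spaces. */
    static String describe(int flags) {
        StringJoiner out = new StringJoiner(" ");
        for (int i = 0; i < BITS.length; i++) {
            if ((flags & BITS[i]) != 0) {
                out.add(NAMES[i]);
            }
        }
        return out.toString();
    }
}
```

For example, describe(0x9) yields "public static", which is the modifier pair the interpreter looks for on main.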

The primary method of this class is read, which is used to read a class file from disk and create a new ClassFile instance from the data. The code for the read method is shown below. I've interspersed the description with the code since the method tends to be pretty long.

1   public boolean read(InputStream in)
2       throws IOException {
3       DataInputStream di = new DataInputStream(in);
4       int count;
5
6       magic = di.readInt();
7       if (magic != (int) 0xCAFEBABE) {
8           return (false);
9       }
10
11      minorVersion = di.readShort();
12      majorVersion = di.readShort();
13      count = di.readUnsignedShort();
14      constantPool = new ConstantPoolInfo[count];
15      if (debug)
16          System.out.println("read(): Read header...");
17      constantPool[0] = new ConstantPoolInfo();
18      for (int i = 1; i < constantPool.length; i++) {
19          constantPool[i] = new ConstantPoolInfo();
20          if (! constantPool[i].read(di)) {
21              return (false);
22          }
23          // These two types take up "two" spots in the table
24          if ((constantPool[i].type == ConstantPoolInfo.LONG) ||
25              (constantPool[i].type == ConstantPoolInfo.DOUBLE))
26              i++;
27      }

As you can see, the code above begins by wrapping a DataInputStream around the input stream referenced by the variable in. Further, lines 6 through 12 read all of the information necessary to determine that the code is indeed looking at a valid class file. This information consists of the magic "cookie" 0xCAFEBABE, and the version numbers 45 and 3 for the major and minor values respectively. Next, in lines 13 through 27, the constant pool is read into an array of ConstantPoolInfo objects. The source code to ConstantPoolInfo is unremarkable -- it simply reads in data and identifies it based on its type. Later, elements from the constant pool are used to display information about the class.
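As a standalone illustration of the header check, the following sketch reads just the first ten bytes of a class file. ClassHeaderSketch is a hypothetical name. Two details come from the class file specification rather than from the listing above: the minor version is stored before the major version, and the two-byte quantities are unsigned.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical holder for the first ten bytes of a class file. */
final class ClassHeaderSketch {
    final int minorVersion;
    final int majorVersion;
    final int constantPoolCount;

    private ClassHeaderSketch(int minor, int major, int count) {
        this.minorVersion = minor;
        this.majorVersion = major;
        this.constantPoolCount = count;
    }

    /** Returns the parsed header, or null if the magic cookie is missing. */
    static ClassHeaderSketch read(InputStream in) throws IOException {
        DataInputStream di = new DataInputStream(in);
        if (di.readInt() != 0xCAFEBABE) {
            return null;
        }
        int minor = di.readUnsignedShort();  // minor version is stored first
        int major = di.readUnsignedShort();
        int count = di.readUnsignedShort();  // one more than the number of usable slots
        return new ClassHeaderSketch(minor, major, count);
    }
}
```

A file compiled by the 1.0.x or 1.1 compilers would report majorVersion 45 and minorVersion 3 here.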

Following the above code, the read method re-scans the constant pool and "fixes up" references in the constant pool that refer to other items in the constant pool. The fix-up code is shown below. This fix-up is necessary since the references typically are indexes into the constant pool, and it is useful to have those indexes already resolved. This also provides a check for the reader to know that the class file isn't corrupt at the constant pool level.

28      for (int i = 1; i < constantPool.length; i++) {
29          if (constantPool[i] == null)
30              continue;
31          if (constantPool[i].index1 > 0)
32              constantPool[i].arg1 = constantPool[constantPool[i].index1];
33          if (constantPool[i].index2 > 0)
34              constantPool[i].arg2 = constantPool[constantPool[i].index2];
35      }
36
37      if (dumpConstants) {
38          for (int i = 1; i < constantPool.length; i++) {
39              System.out.println("C"+i+" - "+constantPool[i]);
30          }
31      }

In the above code each constant pool entry uses the index values to figure out the reference to another constant pool entry. When complete in line 36, the entire pool is optionally dumped out.
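Since the source of ConstantPoolInfo is not shown, here is a minimal sketch of what reading and resolving the pool involves. It is not the column's ConstantPoolInfo class: ConstantPoolSketch, Entry, and className are hypothetical names, the tag values are those of the class file specification, and only the tags of the 1.0/1.1-era format are handled (later class file versions add more).

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical, simplified constant pool reader. */
final class ConstantPoolSketch {
    static final int UTF8 = 1, INTEGER = 3, FLOAT = 4, LONG = 5, DOUBLE = 6,
                     CLASS = 7, STRING = 8, FIELDREF = 9, METHODREF = 10,
                     INTERFACE_METHODREF = 11, NAME_AND_TYPE = 12;

    /** One pool slot: a tag, up to two indexes into the pool, and any UTF-8 text. */
    static final class Entry {
        int tag;
        int index1;
        int index2;
        String text;
    }

    /** Reads slots 1..count-1; slot 0 and the slot after a long or double stay null. */
    static Entry[] read(DataInputStream di, int count) throws IOException {
        Entry[] pool = new Entry[count];
        for (int i = 1; i < count; i++) {
            Entry e = new Entry();
            e.tag = di.readUnsignedByte();
            switch (e.tag) {
                case UTF8:
                    e.text = di.readUTF();      // u2 length plus modified UTF-8 bytes
                    break;
                case INTEGER: case FLOAT:
                    di.readInt();               // value not needed for this sketch
                    break;
                case LONG: case DOUBLE:
                    di.readLong();
                    break;
                case CLASS: case STRING:
                    e.index1 = di.readUnsignedShort();
                    break;
                case FIELDREF: case METHODREF:
                case INTERFACE_METHODREF: case NAME_AND_TYPE:
                    e.index1 = di.readUnsignedShort();
                    e.index2 = di.readUnsignedShort();
                    break;
                default:
                    throw new IOException("Unknown tag " + e.tag + " at index " + i);
            }
            pool[i] = e;
            if (e.tag == LONG || e.tag == DOUBLE) {
                i++;                            // these two types take up two slots
            }
        }
        return pool;
    }

    /** Follows a CLASS entry to the UTF8 entry that holds its name. */
    static String className(Entry[] pool, int classIndex) {
        Entry cls = pool[classIndex];
        if (cls == null || cls.tag != CLASS) {
            throw new IllegalArgumentException("Slot " + classIndex + " is not a class");
        }
        Entry name = pool[cls.index1];
        if (name == null || name.tag != UTF8) {
            throw new IllegalArgumentException("Class name is not a UTF8 entry");
        }
        return name.text;
    }
}
```

The className method performs on demand the same index-to-entry step that the fix-up loop performs ahead of time for every entry.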

Once the code has scanned past the constant pool, the class file defines the primary class information: its class name, superclass name, and implemented interfaces. The read code scans for these values as shown below.

32      accessFlags = di.readShort();
33
34      thisClass = constantPool[di.readUnsignedShort()];
35      superClass = constantPool[di.readUnsignedShort()];
36      if (debug)
37          System.out.println("read(): Read class info...");
38
39      /*
30       * Identify all of the interfaces implemented by this class
31       */
32      count = di.readUnsignedShort();
33      if (count != 0) {
34          if (debug)
35              System.out.println("Class implements "+count+" interfaces.");
36          interfaces = new ConstantPoolInfo[count];
37          for (int i = 0; i < count; i++) {
38              int iindex = di.readUnsignedShort();
39              if ((iindex < 1) || (iindex > constantPool.length - 1))
40                  return (false);
41              interfaces[i] = constantPool[iindex];
42              if (debug)
43                  System.out.println("I"+i+": "+interfaces[i]);
44          }
45      }
46      if (debug)
47          System.out.println("read(): Read interface info...");

Once this code is complete, the read method has built up a pretty good idea of the structure of the class. All that remains is to collect the field definitions, the method definitions, and, perhaps most importantly, the class file attributes.

The class file format breaks each of these three groups into a section consisting of a number, followed by that number of instances of the thing you are looking for. So, for fields, the class file has the number of defined fields, and then that many field definitions. The code to scan in the fields is shown below.

48      count = di.readUnsignedShort();
49      if (debug)
50          System.out.println("This class has "+count+" fields.");
51      if (count != 0) {
52          fields = new FieldInfo[count];
53          for (int i = 0; i < count; i++) {
54              fields[i] = new FieldInfo();
55              if (! fields[i].read(di, constantPool)) {
56                 return (false);
57              }
58              if (debug)
59                  System.out.println("F"+i+": "+
60                      fields[i].toString(constantPool));
61          }
62      }
63      if (debug)
64          System.out.println("read(): Read field info...");

The above code starts by reading a count in line #48; then, if the count is non-zero, it reads in that many new fields using the FieldInfo class. The FieldInfo class simply fills out the data that define a field to the Java virtual machine. The code to read methods and attributes is the same, simply replacing the references to FieldInfo with references to MethodInfo or AttributeInfo as appropriate. That source is not included here; however, you can look at the source using the links in the Resources section below.
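For readers without the source at hand, the sketch below shows roughly what such a reader has to consume. It is not the column's FieldInfo or MethodInfo: MemberSketch is a hypothetical name, and the layout (access flags, name index, descriptor index, then a counted list of attributes) is taken from the class file specification, where fields and methods share the same shape.

```java
import java.io.DataInputStream;
import java.io.IOException;

/** Hypothetical stand-in for FieldInfo/MethodInfo; not the column's classes. */
final class MemberSketch {
    int accessFlags;
    int nameIndex;        // constant pool index of the member's name
    int descriptorIndex;  // constant pool index of its type descriptor
    int attributeCount;

    /** Reads a counted section: a two-byte count followed by that many members. */
    static MemberSketch[] readAll(DataInputStream di) throws IOException {
        int count = di.readUnsignedShort();
        MemberSketch[] members = new MemberSketch[count];
        for (int i = 0; i < count; i++) {
            members[i] = readOne(di);
        }
        return members;
    }

    private static MemberSketch readOne(DataInputStream di) throws IOException {
        MemberSketch m = new MemberSketch();
        m.accessFlags = di.readUnsignedShort();
        m.nameIndex = di.readUnsignedShort();
        m.descriptorIndex = di.readUnsignedShort();
        m.attributeCount = di.readUnsignedShort();
        for (int a = 0; a < m.attributeCount; a++) {
            di.readUnsignedShort();           // attribute name index (ignored here)
            int length = di.readInt();        // four-byte length of the payload
            if (length < 0) {
                throw new IOException("Attribute too large for this sketch");
            }
            di.readFully(new byte[length]);   // skip the opaque payload
        }
        return m;
    }
}
```

Because each member carries its own attribute list, the reader cannot jump to the next member without walking the attributes of the current one, even when it has no interest in their contents.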

Ok, so now what?

At this point you might be asking, "What good does this do me?" The answer is "Quite a bit."

If you've compiled up these classes and have them in your class path, the simplest thing you can do is to print them out and have a look.

The ClassFile class defines a method named display for dumping the structure of the class file out to the terminal. I wrote a simple program named dumpclass to show how it is used. The source code to dumpclass is shown below.

import java.io.*;
import java.util.*;
import util.*;

public class dumpclass {
    public static void main(String args[]) {
        try {
            FileInputStream fi = new FileInputStream(args[0]);
            util.ClassFile cf = new util.ClassFile();
            // cf.debug = true;
            // cf.dumpConstants = true;
            if (! cf.read(fi)) {
                System.out.println("Unable to read class file.");
                System.exit(1);
            }
            cf.display(System.out);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

The code above shows how dumpclass easily reads in a named class file and then displays it using the display method. The output of the display is shown below. If you look at the output you will see that generic imports in the source such as import java.io.*; are regenerated with the specific files that the dumpclass code actually imports. If nothing else, using dumpclass on your class files, and cutting and pasting the specific imports in for your generic imports, will save compile time on some compilers. The other interesting thing is that the source code looks like, well, source code. This is because the class file structure contains structural as well as implementation information. You should not use such information to illegally decompile other people's class files.

import java.io.FileInputStream;
import java.io.PrintStream;
import java.lang.Exception;
import java.lang.System;
import java.lang.Throwable;
import util.ClassFile;
/*
 * This class has 1 optional class attributes.
 * These attributes are: 
 * Attribute 1 is of type SourceFile
 *  SourceFile : dumpclass.java
 */
public synchronized class dumpclass extends java.lang.Object {
/* Methods */
    public static void main(java.lang.String a[]);
    public void dumpclass();
}
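The method lines in a listing like the one above can be regenerated because each method's parameter and return types are stored in the constant pool as a compact descriptor string; for main that string is "([Ljava/lang/String;)V". The sketch below decodes such a descriptor. It is not the column's display method: DescriptorSketch and toDeclaration are hypothetical names, the output format is my own, and the input is assumed to be well formed.

```java
/** Hypothetical decoder for method type descriptors. */
final class DescriptorSketch {

    /** Turns a name plus a descriptor such as "(IJ)D" into "double name(int, long)". */
    static String toDeclaration(String name, String descriptor) {
        if (descriptor.isEmpty() || descriptor.charAt(0) != '(') {
            throw new IllegalArgumentException("Not a method descriptor: " + descriptor);
        }
        int[] pos = { 1 };                       // cursor shared with nextType
        StringBuilder params = new StringBuilder();
        while (descriptor.charAt(pos[0]) != ')') {
            if (params.length() > 0) {
                params.append(", ");
            }
            params.append(nextType(descriptor, pos));
        }
        pos[0]++;                                // skip the closing parenthesis
        String returnType = nextType(descriptor, pos);
        return returnType + " " + name + "(" + params + ")";
    }

    /** Decodes one type starting at pos[0] and advances the cursor past it. */
    private static String nextType(String d, int[] pos) {
        int dims = 0;
        while (d.charAt(pos[0]) == '[') {        // each '[' adds an array dimension
            dims++;
            pos[0]++;
        }
        char c = d.charAt(pos[0]++);
        String base;
        switch (c) {
            case 'B': base = "byte"; break;
            case 'C': base = "char"; break;
            case 'D': base = "double"; break;
            case 'F': base = "float"; break;
            case 'I': base = "int"; break;
            case 'J': base = "long"; break;
            case 'S': base = "short"; break;
            case 'Z': base = "boolean"; break;
            case 'V': base = "void"; break;
            case 'L': {
                int end = d.indexOf(';', pos[0]);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated class name in " + d);
                }
                base = d.substring(pos[0], end).replace('/', '.');
                pos[0] = end + 1;
                break;
            }
            default:
                throw new IllegalArgumentException("Unknown type code '" + c + "' in " + d);
        }
        StringBuilder sb = new StringBuilder(base);
        for (int i = 0; i < dims; i++) {
            sb.append("[]");
        }
        return sb.toString();
    }
}
```

Parameter names are not part of the descriptor, which is why a dump of this kind has to invent them or leave them out.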

More interesting to me when I wrote these classes was the optional class file attribute. Since the ClassFile class can write as well as read class files, it is ideal for "adding on" an optional class file attribute.

For those of you who haven't seen the specification on class files, an optional class file attribute consists of a string type name followed by a chunk of opaque binary data. Sun defines a few well-known attributes (the "SourceFile" attribute shown above is one such attribute), but you can use attributes to store arbitrarily interesting data. In my secure system prototype I had space reserved in an optional class attribute for a public key signature and a capabilities certificate.
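On disk an attribute is small: a two-byte constant pool index naming its type, a four-byte length, and then the payload. The sketch below writes and reads that layout. AttributeSketch is a hypothetical name and this is not the column's AttributeInfo class; the layout itself is from the class file specification.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical in-memory form of one class file attribute. */
final class AttributeSketch {
    final int nameIndex;   // constant pool index of the UTF8 entry naming the type
    final byte[] data;     // opaque payload; the virtual machine ignores unknown types

    AttributeSketch(int nameIndex, byte[] data) {
        this.nameIndex = nameIndex;
        this.data = data.clone();
    }

    /** Writes the attribute in class file layout. */
    void write(DataOutputStream out) throws IOException {
        out.writeShort(nameIndex);
        out.writeInt(data.length);
        out.write(data);
    }

    /** Reads one attribute written in the same layout. */
    static AttributeSketch read(DataInputStream in) throws IOException {
        int nameIndex = in.readUnsignedShort();
        int length = in.readInt();
        if (length < 0) {
            throw new IOException("Attribute too large for this sketch");
        }
        byte[] data = new byte[length];
        in.readFully(data);
        return new AttributeSketch(nameIndex, data);
    }
}
```

Adding a custom attribute to a real class file takes two more steps that this sketch leaves out: a UTF8 constant holding the type name has to be appended to the constant pool, and the attribute count that precedes the class-level attributes has to be incremented.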

Another interesting application of class file attributes is demonstrated by the SBKTech application Jinstall, which uses an attribute to store the compressed data for its self-extracting archive process. Using these classes and the new ZIP file routines in 1.1 makes it pretty easy to generate this type of application.

Finally, perhaps the most intriguing application of reading and rewriting class files uses attributes and class loaders. Recall my article on writing class loaders, and note that attributes can be attached to individual methods as well as to the class as a whole (in fact, one of the attributes attached to a method lists the exceptions it throws). With that in mind, consider the following application.

Let's say you have a Java class in which the real code for each method is stored in an attribute associated with that method and encrypted with a key known only to the author's server. The code actually installed in each method body is a placeholder that simply throws an UnlicensedUsageException. (Note that this is a fictional exception used to illustrate the design.) Now bundle with the application a custom class loader that is designed to load such a class. This class loader would work in the following way.

First, the code for the class would be read. Then the class would be decomposed into a ClassFile structure. After this, the methods in the class would be checked for encryption. The class loader, once satisfied such a thing was allowed, would contact, via the Internet, the author's server and request a decryption key. That key would be applied to the encrypted code, and the decrypted code would be substituted for the place holder code. The class would be rewritten into a byte stream and then fed into the class loader for loading and execution.

The result of these steps would be a Java class file that was very much more difficult to decompile than a "normal" Java class. Further, since the decryption happens on the fly, only a modified virtual machine could be used to extract the running code (assuming a secure decrypting key exchange).

I had thought about coding an example but realized that such a class loader would no doubt be declared to be a munition and I would be branded an arms dealer. So this description will have to suffice!

Wrapping up and further thoughts

Being able to see inside a Java class can enable a Java application to manipulate that class in useful ways. I've looked at reading and writing class files directly, and then through a custom class loader importing the class into the Java run time. Being able to write classes enables such applications as "self extracting" classes. These are meta classes around a distribution of classes. Another interesting application is the notion of an encrypted class whose contents are self-decrypted just prior to running by accessing a remote key. It all goes to show that we can learn new skills by looking inside ourselves!

Next month we will look at the Reflection API and how it achieves introspection while keeping a rein on security, and I'll show you how I'd write the initial code of the Java interpreter if I had an opportunity to update that code.

Chuck McManis currently is the director of system software at FreeGate Corp., a venture-funded start-up that is exploring opportunities in the Internet marketplace. Before joining FreeGate, Chuck was a member of the Java Group. He joined the Java Group just after the formation of FirstPerson Inc. and was a member of the portable OS group (the group responsible for the OS portion of Java). Later, when FirstPerson was dissolved, he stayed with the group through the development of the alpha and beta versions of the Java platform. He created the first "all Java" home page on the Internet when he did the programming for the Java version of the Sun home page in May 1995. He also developed a cryptographic library for Java and versions of the Java class loader that could screen classes based on digital signatures. Before joining FirstPerson, Chuck worked in the operating systems area of SunSoft, developing networking applications, where he did the initial design of NIS+. Check out his home page.

Learn more about this topic

  • "SBKTech Tools" -- Cool tools that take advantage of class file knowledge. http://www.sbktech.org/
  • Source files for this column:
  • "How to build an interpreter in Java, Part 2: The structure"
    The trick to assembling the foundation classes for a simple interpreter. http://www.javaworld.com/javaworld/jw-06-1997/jw-06-indepth.html
  • "How to build an interpreter in Java, Part 1: The BASICs"
    For complex applications requiring a scripting language, Java can be used to implement the interpreter, adding scripting abilities to any Java app. http://www.javaworld.com/javaworld/jw-05-1997/jw-05-indepth.html
  • "Lexical analysis, Part 2: Build an application"
    How to use the StreamTokenizer object to implement an interactive calculator. http://www.javaworld.com/javaworld/jw-02-1997/jw-02-indepth.html
  • "Lexical analysis and Java: Part 1"
    Learn how to convert human-readable text into machine-readable data using the StringTokenizer and StreamTokenizer classes. http://www.javaworld.com/javaworld/jw-01-1997/jw-01-indepth.html
  • "Code reuse and object-oriented systems"
    Use a helper class to enforce dynamic behavior. http://www.javaworld.com/javaworld/jw-12-1996/jw-12-indepth.html
  • "Container support for objects in Java 1.0.2"
    Organizing objects is easy when you put them into containers. This article walks you through the design and implementation of a container. http://www.javaworld.com/javaworld/jw-11-1996/jw-11-indepth.html
  • "The basics of Java class loaders"
    The fundamentals of this key component of the Java architecture. http://www.javaworld.com/javaworld/jw-10-1996/jw-10-indepth.html
  • "Not using garbage collection"
    Minimize heap thrashing in your Java programs. http://www.javaworld.com/javaworld/jw-09-1996/jw-09-indepth.html
  • "Threads and applets and visual controls"
    This final part of the series explores reading multiple data channels. http://www.javaworld.com/javaworld/jw-07-1996/jw-07-mcmanis.html
  • "Using communication channels in applets, Part 3"
    Develop Visual Basic-style techniques to applet design -- and convert temperatures in the process. http://www.javaworld.com/javaworld/jw-06-1996/jw-06-mcmanis.html
  • "Synchronizing threads in Java, Part II"
    Learn how to write a data channel class, and then create a simple example application that illustrates a real-world implementation of the class. http://www.javaworld.com/javaworld/jw-05-1996/jw-05-mcmanis.html
  • "Synchronizing threads in Java"
    Former Java team developer Chuck McManis walks you through a simple example illustrating how to synchronize threads to assure reliable and predictable applet behavior. http://www.javaworld.com/javaworld/jw-04-1996/jw-04-synch.html