Take a look inside Java classes

Learn to deduce properties of a Java class from inside a Java program

Welcome to this month's installment of "Java In Depth." One of the earliest challenges for Java was whether or not it could stand as a capable "systems" language. The root of the question involved Java's safety features, which prevent a Java class from knowing about other classes that are running alongside it in the virtual machine. The ability to "look inside" classes in this way is called introspection. In the first public Java release, known as Alpha3, the strict language rules regarding visibility of the internal components of a class could be circumvented through the use of the ObjectScope class. Then, during beta, when ObjectScope was removed from the run time because of security concerns, many people declared Java to be unfit for "serious" development.

Why is introspection necessary in order for a language to be considered a "systems" language? One part of the answer is fairly mundane: Getting from "nothing" (that is, an uninitialized VM) to "something" (that is, a running Java class) requires that some part of the system be able to inspect the classes to be run so as to figure out just what to do with them. The canonical example of this problem is simply the following: "How does a program, written in a language that cannot look 'inside' another language component, begin executing the first language component, which is the starting point of execution for all other components?"

There are two ways to deal with introspection in Java: class file inspection and the new reflection API that is part of Java 1.1.x. I'll cover both techniques, but in this column I'll focus on the first -- class file inspection. In a future column I will look at how the reflection API solves this problem. (Links to complete source code for this column are available in the Resources section.)

Look deeply into my files...

In the 1.0.x releases of Java, one of the biggest warts on the Java run time is the way in which the Java executable starts a program. What is the problem? Execution is transiting from the domain of the host operating system (Win 95, SunOS, and so on) into the domain of the Java virtual machine. Typing the line "java MyClass arg1 arg2" sets in motion a series of events that are completely hard-coded by the Java interpreter.

As the first event, the operating system command shell loads the Java interpreter and passes it the string "MyClass arg1 arg2" as its argument. The next event occurs when the Java interpreter attempts to locate a class named MyClass in one of the directories identified in the class path. If the class is found, the third event is to locate a method inside the class named main, whose signature has the modifiers "public" and "static" and which takes an array of String objects as its argument. If this method is found, the Java interpreter converts "arg1 arg2" into an array of strings, constructs a primordial thread, and invokes the method with that array. Once this method is invoked, everything else is pure Java.
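To make the sequence concrete, here is a rough sketch of the same steps written in Java itself, using the reflection API mentioned earlier. It is only an illustration of the steps, not the interpreter's actual code: the names LaunchSketch and launch are invented for this example, and the nested MyClass is a hypothetical stand-in for the class named on the command line.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

class LaunchSketch {
    // Hypothetical stand-in for the class named on the command line.
    static class MyClass {
        static String[] seen;

        public static void main(String[] args) {
            seen = args;
        }
    }

    // Follows the launch steps described above: find the class, find a
    // public static main(String[]), then invoke it with the arguments.
    static void launch(String className, String[] args) {
        try {
            Class<?> target = Class.forName(className);
            // getMethod only returns public methods, so a non-public main
            // fails here with NoSuchMethodException.
            Method main = target.getMethod("main", String[].class);
            if (!Modifier.isStatic(main.getModifiers())) {
                throw new IllegalStateException("main is not static");
            }
            // A null receiver is enough: a static method needs no instance.
            main.invoke(null, (Object) args);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot launch " + className, e);
        }
    }

    public static void main(String[] argv) {
        launch(MyClass.class.getName(), new String[] {"arg1", "arg2"});
        System.out.println(String.join(" ", MyClass.seen)); // prints "arg1 arg2"
    }
}
```

The sketch leaves out the primordial thread and the class path search, both of which the real interpreter handles before any Java code can run.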

This is all well and good except that the main method has to be static because the run time can't invoke it with a Java environment that doesn't exist yet. Further, the first method has to be named main because there isn't any way to tell the interpreter the method's name on the command line. Even if you did tell the interpreter the name of the method, there isn't any general way in which to find out if it was in the class you had named in the first place. Finally, because the main method is static, you can't declare it in an interface, and that means you can't specify an interface like this:

public interface Application {
    public void main(String args[]);
}

If the above interface were defined, and classes implemented it, then at least you could use the instanceof operator in Java to determine if you had an application or not and thus determine whether or not it was suitable for invoking from the command line. The bottom line is that you can't (define the interface), it wasn't (built into the Java interpreter), and so you can't (determine if a class file is an application easily). So what can you do?
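For illustration only, here is roughly what that check might look like if main were an ordinary instance method. Everything in this sketch is hypothetical (Application as a nested interface, the Greeter and NotAnApp classes, the isApplication helper), and it sidesteps the real obstacle: an instanceof test needs an object, which in turn needs a virtual machine that is already running.

```java
class InterfaceCheckSketch {
    // Hypothetical interface from the text; main is an instance method here.
    interface Application {
        void main(String[] args);
    }

    // A class that opts in to being launched.
    static class Greeter implements Application {
        String last = "";

        public void main(String[] args) {
            last = "Hello, " + (args.length > 0 ? args[0] : "world");
        }
    }

    // A class that does not implement the interface.
    static class NotAnApp { }

    // A launcher could use instanceof to decide whether an object is runnable.
    static boolean isApplication(Object candidate) {
        return candidate instanceof Application;
    }

    public static void main(String[] args) {
        Object a = new Greeter();
        Object b = new NotAnApp();
        System.out.println(isApplication(a)); // prints "true"
        System.out.println(isApplication(b)); // prints "false"
        if (isApplication(a)) {
            ((Application) a).main(new String[] {"Java"});
        }
    }
}
```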

Actually, you can do quite a bit if you know what to look for and how to use it.

Decompiling class files

The Java class file is architecture-neutral, which means it is the same set of bits whether it is loaded from a Windows 95 machine or a Sun Solaris machine. It is also very well documented in the book The Java Virtual Machine Specification by Lindholm and Yellin. The class file structure was designed, in part, to be easily loaded into the SPARC address space. Basically, the class file could be mapped into the virtual address space, then the relative pointers inside the class fixed up, and presto! You had instant class structure. This was less useful on the Intel architecture machines, but the heritage left the class file format easy to comprehend, and even easier to break down.

In the summer of 1994, I was working in the Java group and building what is known as a "least privilege" security model for Java. I had just finished figuring out that what I really wanted to do was to look inside a Java class, excise those pieces that were not allowed by the current privilege level, and then load the result through a custom class loader. It was then that I discovered there weren't any classes in the main run time that knew about the construction of class files. There were versions in the compiler class tree (which had to generate class files from the compiled code), but I was more interested in building something for manipulating pre-existing class files.

I started by building a Java class that could decompose a Java class file that was presented to it on an input stream. I gave it the less-than-original name ClassFile. The beginning of this class is shown below.

public class ClassFile {
    int                 magic;
    short               majorVersion;
    short               minorVersion;
    ConstantPoolInfo    constantPool[];
    short               accessFlags;
    ConstantPoolInfo    thisClass;
    ConstantPoolInfo    superClass;
    ConstantPoolInfo    interfaces[];
    FieldInfo           fields[];
    MethodInfo          methods[];
    AttributeInfo       attributes[];
    boolean             isValidClass = false;
    public static final int ACC_PUBLIC      = 0x1;
    public static final int ACC_PRIVATE     = 0x2;
    public static final int ACC_PROTECTED   = 0x4;
    public static final int ACC_STATIC      = 0x8;
    public static final int ACC_FINAL       = 0x10;
    public static final int ACC_SYNCHRONIZED    = 0x20;
    public static final int ACC_THREADSAFE  = 0x40;
    public static final int ACC_TRANSIENT   = 0x80;
    public static final int ACC_NATIVE      = 0x100;
    public static final int ACC_INTERFACE   = 0x200;
    public static final int ACC_ABSTRACT    = 0x400;

As you can see, the instance variables for class ClassFile define the major components of a Java class file. In particular, the central data structure for a Java class file is known as the constant pool. Other interesting chunks of the class file get classes of their own: MethodInfo for methods, FieldInfo for fields (which are the variable declarations in the class), and AttributeInfo to hold class file attributes. The ACC_ constants were taken directly from the specification on class files; they decode the various modifiers that apply to field, method, and class declarations.
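Because each modifier occupies its own bit, decoding an access flags value is a matter of testing each mask in turn. The following is a minimal sketch of that idea, not code from ClassFile: the class AccessFlagsSketch and its describe method are invented here, and the names simply mirror the ACC_ constants listed above.

```java
class AccessFlagsSketch {
    // Bit masks in the same order as the ACC_ constants shown above.
    static final int[] MASKS = {
        0x1, 0x2, 0x4, 0x8, 0x10, 0x20, 0x40, 0x80, 0x100, 0x200, 0x400
    };
    static final String[] NAMES = {
        "public", "private", "protected", "static", "final", "synchronized",
        "threadsafe", "transient", "native", "interface", "abstract"
    };

    // Builds a space-separated list of the modifiers set in accessFlags.
    static String describe(int accessFlags) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < MASKS.length; i++) {
            if ((accessFlags & MASKS[i]) != 0) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(NAMES[i]);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(describe(0x9));  // prints "public static"
        System.out.println(describe(0x19)); // prints "public static final"
    }
}
```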

The primary method of this class is read, which is used to read a class file from disk and create a new ClassFile instance from the data. The code for the read method is shown below. I've interspersed the description with the code since the method tends to be pretty long.

1   public boolean read(InputStream in)
2       throws IOException {
3       DataInputStream di = new DataInputStream(in);
4       int count;
6       magic = di.readInt();
7       if (magic != (int) 0xCAFEBABE) {
8           return (false);
9       }
11      minorVersion = di.readShort();  // the file stores minor_version first,
12      majorVersion = di.readShort();  // then major_version
13      count = di.readShort();
14      constantPool = new ConstantPoolInfo[count];
15      if (debug)
16          System.out.println("read(): Read header...");
17      constantPool[0] = new ConstantPoolInfo();
18      for (int i = 1; i < constantPool.length; i++) {
19          constantPool[i] = new ConstantPoolInfo();
20          if (! constantPool[i].read(di)) {
21              return (false);
22          }
23          // These two types take up "two" spots in the table
24          if ((constantPool[i].type == ConstantPoolInfo.LONG) ||
25              (constantPool[i].type == ConstantPoolInfo.DOUBLE))
26              i++;
27      }

As you can see, the code above begins by wrapping a DataInputStream around the input stream referenced by the variable in. Next, lines 6 through 12 read all of the information needed to determine that the code is indeed looking at a valid class file: the magic "cookie" 0xCAFEBABE, and the version numbers 45 and 3 for the major and minor values respectively. Only the magic number is actually checked; the version numbers are stored but not validated. Then, in lines 13 through 27, the constant pool is read into an array of ConstantPoolInfo objects. The source code to ConstantPoolInfo is unremarkable -- it simply reads in data and identifies it based on its type. Later, elements from the constant pool are used to display information about the class.
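The same header and constant pool walk can be tried without a class file on disk. The sketch below is a simplified stand-in, not the ConstantPoolInfo source: the class ClassHeaderSketch, its parse and sample methods, and the use of plain Object values for pool entries are all assumptions made for this example. It handles only the original constant pool tags (1 and 3 through 12), reads the 16-bit values as unsigned, and relies on the class file format storing the minor version before the major version.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ClassHeaderSketch {
    int minorVersion;
    int majorVersion;
    Object[] pool; // slot 0 and the second slot of a long/double stay null

    static ClassHeaderSketch parse(byte[] classBytes) {
        try (DataInputStream di =
                 new DataInputStream(new ByteArrayInputStream(classBytes))) {
            if (di.readInt() != 0xCAFEBABE) {
                throw new IllegalArgumentException("not a class file");
            }
            ClassHeaderSketch h = new ClassHeaderSketch();
            h.minorVersion = di.readUnsignedShort();
            h.majorVersion = di.readUnsignedShort();
            int count = di.readUnsignedShort();
            h.pool = new Object[count];
            for (int i = 1; i < count; i++) {
                int tag = di.readUnsignedByte();
                switch (tag) {
                    case 1: h.pool[i] = di.readUTF(); break;        // Utf8
                    case 3: h.pool[i] = di.readInt(); break;        // Integer
                    case 4: h.pool[i] = di.readFloat(); break;      // Float
                    case 5: h.pool[i] = di.readLong(); i++; break;  // Long: two slots
                    case 6: h.pool[i] = di.readDouble(); i++; break; // Double: two slots
                    case 7: case 8:                                 // Class, String
                        h.pool[i] = new int[] {di.readUnsignedShort()};
                        break;
                    case 9: case 10: case 11: case 12:              // refs, NameAndType
                        h.pool[i] = new int[] {
                            di.readUnsignedShort(), di.readUnsignedShort()};
                        break;
                    default:
                        throw new IllegalArgumentException("unknown tag " + tag);
                }
            }
            return h;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds a tiny header by hand: version 45.3, one Utf8 and one Long entry.
    static byte[] sample() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(0xCAFEBABE);
            out.writeShort(3);   // minor version
            out.writeShort(45);  // major version
            out.writeShort(4);   // pool count: slots 1 through 3
            out.writeByte(1);
            out.writeUTF("Hi");
            out.writeByte(5);
            out.writeLong(42L);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ClassHeaderSketch h = parse(sample());
        System.out.println(h.majorVersion + "." + h.minorVersion); // prints "45.3"
        System.out.println(h.pool[1] + " " + h.pool[2] + " " + h.pool[3]); // prints "Hi 42 null"
    }
}
```

DataInputStream.readUTF fits the Utf8 entry because both use a two-byte length followed by modified UTF-8 bytes.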

Following the above code, the read method re-scans the constant pool and "fixes up" references in the constant pool that refer to other items in the constant pool. The fix-up code is shown below. This fix-up is necessary since the references typically are indexes into the constant pool, and it is useful to have those indexes already resolved. This also provides a check for the reader to know that the class file isn't corrupt at the constant pool level.

28      for (int i = 1; i < constantPool.length; i++) {
29          if (constantPool[i] == null)
30              continue;
31          if (constantPool[i].index1 > 0)
32              constantPool[i].arg1 = constantPool[constantPool[i].index1];
33          if (constantPool[i].index2 > 0)
34              constantPool[i].arg2 = constantPool[constantPool[i].index2];
35      }
37      if (dumpConstants) {
38          for (int i = 1; i < constantPool.length; i++) {
39              System.out.println("C"+i+" - "+constantPool[i]);
30          }
31      }

In the above code, each constant pool entry uses its index values to obtain references to the other constant pool entries it depends on. Once the loop completes at line 35, the entire pool is optionally dumped out if dumpConstants is set.
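The fix-up pass can be shown in isolation with a stripped-down entry type. This is a sketch under assumptions rather than the real ConstantPoolInfo: the Entry class, its text field, and the valid helper are invented here, while index1, index2, arg1, and arg2 follow the names used in the listing. Unlike the listing, the sketch checks each index explicitly, so a corrupt pool is reported with a false return instead of an array index exception.

```java
class FixupSketch {
    // Simplified stand-in for a constant pool entry.
    static class Entry {
        final String text; // payload for Utf8-style entries, otherwise null
        final int index1;  // 0 means "no reference"
        final int index2;
        Entry arg1;
        Entry arg2;

        Entry(String text, int index1, int index2) {
            this.text = text;
            this.index1 = index1;
            this.index2 = index2;
        }
    }

    // True if index names a populated slot in the pool.
    static boolean valid(Entry[] pool, int index) {
        return index < pool.length && pool[index] != null;
    }

    // Replaces index references with direct object references.
    // Returns false if any index points outside the pool or at an empty slot.
    static boolean fixUp(Entry[] pool) {
        for (int i = 1; i < pool.length; i++) {
            Entry e = pool[i];
            if (e == null) {
                continue; // unused second slot of a long or double
            }
            if (e.index1 > 0) {
                if (!valid(pool, e.index1)) {
                    return false;
                }
                e.arg1 = pool[e.index1];
            }
            if (e.index2 > 0) {
                if (!valid(pool, e.index2)) {
                    return false;
                }
                e.arg2 = pool[e.index2];
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Slot 1 is a class-style entry whose name lives in slot 2.
        Entry[] pool = {
            null,
            new Entry(null, 2, 0),
            new Entry("java/lang/Object", 0, 0)
        };
        System.out.println(fixUp(pool));        // prints "true"
        System.out.println(pool[1].arg1.text);  // prints "java/lang/Object"
    }
}
```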

After the constant pool, the class file defines the primary class information: its class name, superclass name, and implemented interfaces. The read code scans for these values as shown below.

32      accessFlags = di.readShort();
34      thisClass = constantPool[di.readShort()];
35      superClass = constantPool[di.readShort()];
36      if (debug)
37          System.out.println("read(): Read class info...");
39      /*
30       * Identify all of the interfaces implemented by this class
31       */
32      count = di.readShort();
33      if (count != 0) {
34          if (debug)
35              System.out.println("Class implements "+count+" interfaces.");
36          interfaces = new ConstantPoolInfo[count];
37          for (int i = 0; i < count; i++) {
38              int iindex = di.readShort();
39              if ((iindex < 1) || (iindex > constantPool.length - 1))
40                  return (false);
41              interfaces[i] = constantPool[iindex];
42              if (debug)
43                  System.out.println("I"+i+": "+interfaces[i]);
44          }
45      }
46      if (debug)
47          System.out.println("read(): Read interface info...");

Once this code is complete, the read method has built up a pretty good idea of the structure of the class. All that remains is to collect the field definitions, the method definitions, and, perhaps most importantly, the class file attributes.

The class file format breaks each of these three groups into a section consisting of a number, followed by that number of instances of the thing you are looking for. So, for fields, the class file has the number of defined fields, and then that many field definitions. The code to scan in the fields is shown below.

48      count = di.readShort();
49      if (debug)
50          System.out.println("This class has "+count+" fields.");
51      if (count != 0) {
52          fields = new FieldInfo[count];
53          for (int i = 0; i < count; i++) {
54              fields[i] = new FieldInfo();
55              if (! fields[i].read(di, constantPool)) {
56                 return (false);
57              }
58              if (debug)
59                  System.out.println("F"+i+": "+
60                      fields[i].toString(constantPool));
61          }
62      }
63      if (debug)
64          System.out.println("read(): Read field info...");

The above code starts by reading a count in line #48; then, if the count is non-zero, it reads that many fields using the FieldInfo class. The FieldInfo class simply fills out the data that define a field to the Java virtual machine. The code to read methods and attributes is the same, simply replacing the references to FieldInfo with references to MethodInfo or AttributeInfo as appropriate. That source is not included here; however, you can look at the source using the links in the Resources section below.
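To see the count-then-items layout end to end, here is a small self-contained sketch that reads a fields section from a byte array. It is not the FieldInfo source: the FieldSectionSketch and Field names are invented for this example, and the sketch simply skips over each field's attributes instead of decoding them. The layout it assumes is the one given in the class file specification: each field is three 16-bit values (access flags, name index, descriptor index) followed by its own counted list of attributes, each with a 16-bit name index and a 32-bit length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class FieldSectionSketch {
    // Simplified stand-in for a field definition.
    static class Field {
        int accessFlags;
        int nameIndex;
        int descriptorIndex;
    }

    // Reads a count followed by that many field definitions.
    static Field[] readFields(byte[] section) {
        try (DataInputStream di =
                 new DataInputStream(new ByteArrayInputStream(section))) {
            int count = di.readUnsignedShort();
            Field[] fields = new Field[count];
            for (int i = 0; i < count; i++) {
                Field f = new Field();
                f.accessFlags = di.readUnsignedShort();
                f.nameIndex = di.readUnsignedShort();
                f.descriptorIndex = di.readUnsignedShort();
                // Each field carries its own counted list of attributes.
                int attrCount = di.readUnsignedShort();
                for (int a = 0; a < attrCount; a++) {
                    di.readUnsignedShort();          // attribute name index
                    int length = di.readInt();       // attribute length in bytes
                    di.readFully(new byte[length]);  // skip the attribute body
                }
                fields[i] = f;
            }
            return fields;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // One public static field with a single two-byte attribute.
        byte[] section = {
            0, 1,        // field count
            0, 9,        // access flags: public static
            0, 5,        // name index
            0, 6,        // descriptor index
            0, 1,        // attribute count
            0, 7,        // attribute name index
            0, 0, 0, 2,  // attribute length
            0, 8         // attribute body
        };
        Field[] fields = readFields(section);
        System.out.println(fields.length);          // prints "1"
        System.out.println(fields[0].accessFlags);  // prints "9"
    }
}
```

The methods and attributes sections can be walked with the same loop shape, which is why the text describes their code as the same.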
