Modify archives, Part 1

Supplement Java's util.zip package to make it easy to write or modify existing archives

I started out this month intending to write an article about caching class loaders. I wanted to create a class loader that a client-side application could use to automatically update itself from a server-side version every time the application ran. The idea was for the class loader to maintain a jar archive of the class files that made up the application, and to update that archive as needed during the class-loading process. I still intend to write the class loader article at some point in the future. But when I started writing the code, I quickly got bogged down in implementing the archive-related piece. It turned out that the java.util.zip APIs weren't nearly as flexible or complete as I had thought, and I had to put significant effort into a set of archive-maintenance classes before I could proceed with my class loader. Consequently, this article and Part 2 will present those classes; I'll return to the class loader afterwards.

Reading a jar from a URL

Java features a rich set of jar APIs. For example, you can read archives using a URL that takes the form jar:<url_of_jar>!/<path_within_jar>. The following URL gets the source code for one of the classes in my threads package from an archive on my website:

jar:http://www.holub.com/taming.java.threads.zip!/src/com/holub/asynch/Mutex.java

The code in Listing 1 demonstrates how to access a jar file with a URL.

Listing 1. JarURL.java
import java.net.*;
import java.io.*;
import com.holub.io.P;

public class JarURL
{
    public static void main(String[] args)
    {   new JarURL();
    }

    public JarURL()
    {
        try
        {
            String url_of_file  = "file:/tmp/foo.jar";
            String path_to_file = "tmp/foo.txt";

            // Read from the jar using a URL

            URL cache_url = new URL("jar:" + url_of_file + "!/" );

            URL file_url = new URL( cache_url, path_to_file );
            JarURLConnection connection =
                    (JarURLConnection)( file_url.openConnection() );

            connection.setDoInput(true);
            connection.setDoOutput(false);
            connection.connect();

            InputStream in = connection.getInputStream();
            BufferedReader reader =
                    new BufferedReader( new InputStreamReader(in) );
            P.rintln("Read " + reader.readLine() );

            // Unfortunately, you can't write to the jar via a URL.
            // The following code does not work:
            //
            //  OutputStream out = connection.getOutputStream();
            //  PrintWriter writer = new PrintWriter( out );
            //  writer.println("Goodbye world");
            //  P.rintln("Wrote");

            // Write to a local jar
        }
        catch( MalformedURLException e ){   System.out.println(e);  }
        catch( IOException e )          {   System.out.println(e);  }
    }

    private String path_for(String class_name)
    {
        String path_name = class_name.replace('.', '/');
        return "/" + path_name ;
    }
}

Use ZipFile and ZipEntry

Unfortunately, URL access to a jar file doesn't permit write operations, so I turned to the various classes in java.util.zip. The ZipFile class seemed particularly promising: you ask it for a list of ZipEntry objects that represent the files in the archive, and then ask each ZipEntry for information about the file it represents. The following code demonstrates the process by printing the names of all the files in my_file.zip:

ZipFile zip_file = new ZipFile( "my_file.zip" );
for( Enumeration<? extends ZipEntry> e = zip_file.entries(); e.hasMoreElements(); )
{
    ZipEntry entry = e.nextElement();
    System.out.println( entry.getName() );
}

You can request an InputStream to read the file associated with a given entry from the ZipFile:

InputStream in = zip_file.getInputStream(entry);

and then read from it in the normal way. So far so good; but, to my horror, I found that there is no getOutputStream() method available. It was back to the drawing board!
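For reference, the read side described above can be put together roughly as follows. This is only a sketch: the class name ZipRead, the method firstLine, and the choice of UTF-8 are assumptions made for the example, not part of the classes this article presents.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class ZipRead
{
    // Returns the first line of the named entry, or null if the archive
    // has no such entry (or the entry is empty).
    static String firstLine( String archive, String entry_name ) throws IOException
    {
        try( ZipFile zip_file = new ZipFile( archive ) )
        {
            ZipEntry entry = zip_file.getEntry( entry_name );
            if( entry == null )
                return null;

            // The stream is only valid while the ZipFile is open.
            try( InputStream in = zip_file.getInputStream( entry );
                 BufferedReader reader = new BufferedReader(
                         new InputStreamReader( in, StandardCharsets.UTF_8 ) ) )
            {
                return reader.readLine();
            }
        }
    }
}
```

A call such as ZipRead.firstLine("my_file.zip", "tmp/foo.txt") would then return the first line of that entry. Nothing comparable exists for the write direction, which is the gap the rest of this article deals with.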

Modify a jar file: The problem

More digging unearthed ZipOutputStream, but this class is far from easy to use. There are examples in Chan, Lee, and Kramer's Java Class Libraries book (see Resources), but they prove hideously complicated.

It turns out that the only way to modify an archive is to make a new archive from scratch and copy the old one to the new one, making any changes along the way. Java's archive classes offer no simple way to modify, replace, or add a file in an existing archive.

With that in mind, the not-so-basic drill is as follows:

  1. Get the ZipEntry objects for the existing archive.
  2. Create a temporary file to hold the new archive as it's being built.
  3. Open a FileOutputStream on that temporary file and wrap it in a ZipOutputStream.
  4. To remove a file:
    • Remove its entry from the list of ZipEntry objects made in Step 1.
  5. To replace a file in the archive:
    1. Remove the old ZipEntry from the list of entries.
    2. Make a new ZipEntry by copying relevant fields from the old one.
    3. Put the ZipEntry into the ZipOutputStream.
    4. Open an InputStream on the new contents of the file.
    5. Copy the new contents of the file from the InputStream to the ZipOutputStream.
    6. Tell the ZipOutputStream that you're done with the entry.
    7. Close the InputStream.
  6. To add a file to the archive:
    • Follow the steps above for replacing a file, but just write the new bytes rather than transferring them.
  7. Once all modifications have been made, transfer the contents of the files represented by the ZipEntry objects that remain in the list created in Step 1 (that is, the files you haven't deleted or replaced). Use the copy process described in Step 5, reading each file from the original archive.
  8. Close the new and old archives, then rename the new one so that it has the same name as the old one.
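The steps above can be sketched in code roughly as follows. This is a minimal illustration, not the Archive class that Part 2 presents: the names ArchiveRewrite and rewrite are made up for the example, the new contents arrive as a byte array for brevity, and every entry is written with the stream's default (compressed) method.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

class ArchiveRewrite
{
    // Rewrites `archive`, dropping the entry called `remove` (may be null)
    // and adding or replacing the entry `name` with `contents`.
    static void rewrite( File archive, String remove, String name, byte[] contents )
            throws IOException
    {
        // Steps 2 and 3: a temporary file, wrapped in a ZipOutputStream.
        File temp = File.createTempFile( "archive", ".tmp",
                                archive.getAbsoluteFile().getParentFile() );

        try( ZipFile old = new ZipFile( archive );
             ZipOutputStream out = new ZipOutputStream( new FileOutputStream( temp ) ) )
        {
            // Steps 5 and 6: write the new or replacement entry.
            out.putNextEntry( new ZipEntry( name ) );
            out.write( contents );
            out.closeEntry();

            // Steps 1, 4, and 7: transfer every old entry that was neither
            // removed nor replaced.
            for( Enumeration<? extends ZipEntry> e = old.entries(); e.hasMoreElements(); )
            {
                ZipEntry entry = e.nextElement();
                if( entry.getName().equals( name ) || entry.getName().equals( remove ) )
                    continue;

                // Copy only the name and time. The data is recompressed, so the
                // old compressed size must not be carried over.
                ZipEntry copy = new ZipEntry( entry.getName() );
                copy.setTime( entry.getTime() );
                out.putNextEntry( copy );
                try( InputStream in = old.getInputStream( entry ) )
                {
                    byte[] buffer = new byte[ 8192 ];
                    for( int n; (n = in.read( buffer )) != -1; )
                        out.write( buffer, 0, n );
                }
                out.closeEntry();
            }
        }

        // Step 8: both archives are closed; give the new one the old name.
        Files.move( temp.toPath(), archive.toPath(), StandardCopyOption.REPLACE_EXISTING );
    }
}
```

Even in this stripped-down form, a one-file change costs a full copy of the archive.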

Ugh! (That's a technical term we Java programmers use.) To make matters worse, the requirements for writing a compressed (DEFLATED) file differ from those for writing an uncompressed (STORED) file. The ZipEntry for uncompressed files must be initialized with a CRC value and file size before it can be written to the ZipOutputStream. Since the ZipEntry must be written before the file contents, this means that you have to process the new data twice -- once to figure out the CRC and once to copy the bytes to the ZipOutputStream. Fortunately, the process isn't so brain-dead for a compressed file; you can give the ZipOutputStream a ZipEntry with uninitialized size and CRC fields, and the ZipOutputStream will modify the fields for you as it does the compression.
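To make the difference concrete, here is a small sketch of both cases. The class and method names (EntryWriter, writeStored, writeDeflated) are invented for the example, and the data is assumed to be in memory already, which sidesteps the question of how to avoid the second pass.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class EntryWriter
{
    // STORED: size and CRC must be set on the entry before putNextEntry(),
    // so the data is traversed twice (once for the CRC, once for the write).
    static void writeStored( ZipOutputStream out, String name, byte[] data )
            throws IOException
    {
        CRC32 crc = new CRC32();
        crc.update( data, 0, data.length );          // first pass

        ZipEntry entry = new ZipEntry( name );
        entry.setMethod( ZipEntry.STORED );
        entry.setSize( data.length );
        entry.setCompressedSize( data.length );      // same as size when uncompressed
        entry.setCrc( crc.getValue() );

        out.putNextEntry( entry );
        out.write( data );                           // second pass
        out.closeEntry();
    }

    // DEFLATED: the stream fills in the sizes and CRC as it compresses.
    static void writeDeflated( ZipOutputStream out, String name, byte[] data )
            throws IOException
    {
        ZipEntry entry = new ZipEntry( name );
        entry.setMethod( ZipEntry.DEFLATED );
        out.putNextEntry( entry );
        out.write( data );
        out.closeEntry();
    }
}
```

If you leave the size or CRC off a STORED entry, putNextEntry() rejects it with a ZipException.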

The double processing of the uncompressed file gave me substantial grief. First, I didn't want to read the file twice. Second, what if my program generated the file programmatically? I didn't want to generate the file contents twice. A temptingly easy strategy is to use the ByteArrayOutputStream -- transfer the file to one of these, extract the resulting buffer, and then process the buffer twice. The problem with this approach is the size of the runtime memory footprint. If I put a 1 MB file into my archive, I'll need 1 MB of memory for the underlying byte array. Even if I have this much memory available, the program's memory footprint would probably get so large that the operating system would start swapping the executable image to disk to allow other programs to run. A program can slow down by an order of magnitude (or more) once the virtual memory manager starts swapping pages to disk -- not a good outcome. On the other hand, most of the files that I would be processing in the class-loader application would be small -- a typical class file comprises a couple of KB or less. Nonetheless, it seemed to me that writing the class with the assumption that all files would be small was a bad idea. I wanted to write this class once and be done with it.

The solution

I solved the archive-modification problem by writing an Archive class that simplifies the process. Archive will be the subject of Part 2 of this series; here in Part 1, we'll look at the support classes that Archive uses.

The D class

The D class, the first support class of interest, solves a common problem: you always seem to remove debugging diagnostics five minutes before you need them. The class achieves this by allowing you to leave the diagnostics in place in such a way that they can be optimized out of existence in the production code. In other words, I didn't want to just disable the diagnostics with a runtime test of an enabled flag because all those tests (and associated method calls) would still be executed in the production system. I wanted the diagnostics to be gone entirely.

I solved the problem with the two classes, both called D, found in Listing 2 and Listing 3.

Both classes define an ebug() method that works like println. For example:

import com.holub.tools.debug.D;
// import com.holub.tools.D;
D.ebug("Hello world");

(The D.ebug() is either way cool or hideous, depending on your perspective. It's cleaner than DebugStream.println("hello") or some such alternative, though.)

Note that the classes have identical methods, though the version in the com.holub.tools package (Listing 2) contains nothing but empty methods, while the version in com.holub.tools.debug (Listing 3) actually does something. You can choose which of the two versions you want with an import statement. The earlier code employed the version that did something; I'll pick the other one once I'm done debugging. I'm counting on the JVM inlining the empty version. If you replace the call with the contents, you'll end up effectively removing the call, since there are in fact no contents. The only question is whether the arguments will be evaluated. For example, given:

D.ebug( "Var=" + var );

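will the string concatenation be executed if the empty version of ebug is used? The answer depends on the JVM. The compiler always emits the concatenation, since Java evaluates a method's arguments before making the call, so it's up to the JIT to get rid of it. HotSpot can discard the concatenation as dead code once it inlines the empty method, provided that evaluating the argument has no side effects, but other JVMs might not.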
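One thing no JVM may do is drop an argument whose evaluation has a visible side effect. The sketch below illustrates the point; NoOpD is a hypothetical stand-in for the empty version of D (which isn't reproduced here), and describe() and demo() are invented for the example.

```java
class ArgumentEvaluation
{
    // Hypothetical stand-in for the empty version of D: same signature
    // as the working ebug(), but with an empty body.
    static final class NoOpD
    {
        public static final void ebug( String text ) { /* intentionally empty */ }
    }

    static int calls = 0;

    // Has a visible side effect, so we can tell whether the argument
    // expression was evaluated.
    static String describe()
    {
        ++calls;
        return "expensive";
    }

    // Returns the number of times describe() ran during one ebug() call.
    static int demo()
    {
        calls = 0;
        NoOpD.ebug( "Var=" + describe() );   // describe() still runs
        return calls;
    }
}
```

Here demo() returns 1 on any conforming JVM. The moral is to keep diagnostic arguments free of side effects and cheap to compute, so that an optimizer has a chance of removing them and nothing changes when you swap in the empty class.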

An alternative approach to commenting out import statements: Split the package between two directories. Put both versions in the com.holub.tools package, then put the definition for the empty version of D in /src/com/holub/tools and the definition for the working version of D in /src/debug/com/holub/tools. The CLASSPATH will then determine which of the two versions you pull into the program. You'll get the working version if /src/debug comes first and you'll get the empty version if /src comes first.

Listing 3. /src/com/holub/tools/debug/D.java
package com.holub.tools.debug;

import com.holub.io.Std;

public class D
{   static private boolean enabled = true;

    public static final void ebug_enable (){ enabled = true;  }
    public static final void ebug_disable(){ enabled = false; }

    public static final void ebug( String text )
    {   if( enabled )
            Std.err().println( text );
    }
}

The Tester class

The Tester class (Listing 4) simplifies writing unit tests. (A unit test is a test that verifies the correct operation of a single thing -- a class, in this case.) I generally like to include a unit test in every class I write, using the following form:

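package com.holub.test;

import java.io.PrintWriter;
// Tester (Listing 4) must also be imported or live in this package.

class Under_test
{
    //...
    public static class Test
    {
        public static void main( String[] arguments )
        {   // A local variable: main() is static, so it can't use an instance field.
            Tester t = new Tester( arguments.length > 0, new PrintWriter(System.out) );
            //...
            t.exit();
        }
    }
}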

The inner Test class is represented by its own class file (called Under_test$Test.class), which I don't ship with the production code. You run the unit test as follows:

java com.holub.test.Under_test\$Test

(Note: In the instructions above, omit the backslash if you're running a Windows shell.)

My basic philosophy vis-a-vis unit tests is to not print anything if everything's OK. That way, when you run groups of tests all at once, you end up with a list of what's wrong, with no additional clutter. I do get paranoid sometimes, though, and want to prove to myself that the test is actually running by seeing the results of a successful test as well. The first argument to the Tester constructor controls verbosity. If it's false, the Tester object reports only test failures; if it's true, the object reports successes too. In the earlier example, I'll print verbose output if any command-line arguments are specified:

java com.holub.test.Under_test\$Test -v
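
The "quiet unless something fails" policy is easy to picture in code. The following is not the Tester class from Listing 4; it's a hypothetical miniature (QuietTester, check, and errors are invented names) that shows only how a verbosity flag of this kind might gate the output.

```java
import java.io.PrintWriter;

class QuietTester
{
    private final boolean     verbose;
    private final PrintWriter log;
    private int               errors = 0;

    QuietTester( boolean verbose, PrintWriter log )
    {
        this.verbose = verbose;
        this.log     = log;
    }

    // Records one check. Failures are always reported; successes are
    // reported only when verbose output was requested.
    void check( String description, boolean passed )
    {
        if( !passed )
        {
            ++errors;
            log.println( "FAILED: " + description );
        }
        else if( verbose )
            log.println( "ok:     " + description );
        log.flush();
    }

    int errors() { return errors; }
}
```

With verbose set to false, a run in which every check passes prints nothing at all, which is what keeps a batch of unit tests down to a list of failures.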