Modify archives, Part 1

Supplement Java's java.util.zip package to make it easy to write to or modify existing archives

I started out this month intending to write an article about caching class loaders. I wanted to create a class loader that a client-side application could use to automatically update itself from a server-side version every time the application ran. The idea was for the class loader to maintain a jar archive of the class files that made up the application, and to update that archive as needed during the class-loading process. I still intend to write the class loader article at some point in the future. But when I started writing the code, I quickly bogged down in implementing the archive-related piece. It turned out that the java.util.zip APIs weren't nearly as flexible or complete as I had thought, and I had to put a significant effort into a set of archive-maintenance classes before I could proceed with my class loader. Consequently, this article and Part 2 will present those classes; I'll return to the class loader afterwards.

Reading a jar from a URL

Java features a rich set of jar APIs. For example, you can read archives using a URL that takes the form jar:<url_of_jar>!/<path_within_jar>, where the !/ sequence separates the location of the archive from the path of the entry inside it. The following URL gets the source code for one of the classes in my threads package from an archive on my Web site:

jar:http://www.holub.com/taming.java.threads.zip!/src/com/holub/asynch/Mutex.java

The code in Listing 1 demonstrates how to access a jar file with a URL.

Listing 1. JarURL.java
import java.net.*;
import java.io.*;
import com.holub.io.P;

public class JarURL
{
    public static void main(String[] args)
    {   new JarURL();
    }

    public JarURL()
    {
        try
        {
            String url_of_file  = "file:/tmp/foo.jar";
            String path_to_file = "tmp/foo.txt";

            // Read from the jar using a URL

            URL cache_url = new URL("jar:" + url_of_file + "!/" );

            URL file_url = new URL( cache_url, path_to_file );
            JarURLConnection connection =
                    (JarURLConnection)( file_url.openConnection() );

            connection.setDoInput(true);
            connection.setDoOutput(false);
            connection.connect();

            InputStream in = connection.getInputStream();
            BufferedReader reader =
                    new BufferedReader( new InputStreamReader(in) );
            P.rintln("Read " + reader.readLine() );

            // Unfortunately, you can't write to the jar via a URL.
            // The following code does not work:
            //
            //  OutputStream out = connection.getOutputStream();
            //  PrintWriter writer = new PrintWriter( out );
            //  writer.println("Goodbye world");
            //  P.rintln("Wrote");

            // Write to a local jar
        }
        catch( MalformedURLException e ){   System.out.println(e);  }
        catch( IOException e )          {   System.out.println(e);  }
    }

    private String path_for(String class_name)
    {
        String path_name = class_name.replace('.', '/');
        return "/" + path_name ;
    }
}

Use the ZipFile and ZipEntry

Unfortunately, URL access to a jar file doesn't permit write operations, so I turned to the various classes in java.util.zip. The ZipFile class seemed particularly promising: you ask it for a list of ZipEntry objects that represent the files in the archive, and then ask each ZipEntry for information about the file it represents. The following code demonstrates the process by printing the names of all the files in my_file.zip:

ZipFile zip_file = new ZipFile( "my_file.zip" );
for( Enumeration e = zip_file.entries(); e.hasMoreElements(); )
{   
    ZipEntry entry = (ZipEntry) e.nextElement();
    System.out.println( entry.getName() );
}

You can request an InputStream to read the file associated with a given entry from the ZipFile:

InputStream in = zip_file.getInputStream(entry);

and then read from it in the normal way. So far so good; but, to my horror, I found that there is no getOutputStream() method available. It was back to the drawing board!
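To make the read side concrete, here is a minimal, self-contained sketch. It assumes Java 7 or later (for try-with-resources), and the class name ZipReadSketch, the helper names, and the hello.txt entry are all made up for illustration. It builds a throwaway archive purely as setup, then lists the entries and reads one back through ZipFile.getInputStream().

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;

class ZipReadSketch
{
    // Setup only: build a small throwaway archive so the sketch is self-contained.
    static File makeSampleArchive() throws IOException
    {
        File archive = File.createTempFile("sample", ".zip");
        archive.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(archive)))
        {
            out.putNextEntry(new ZipEntry("hello.txt"));
            out.write("Hello world".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return archive;
    }

    // Return the names of all the entries in the archive.
    static List<String> names(File archive) throws IOException
    {
        List<String> result = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive))
        {
            for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); )
                result.add(e.nextElement().getName());
        }
        return result;
    }

    // Read the contents of one entry as UTF-8 text.
    static String read(File archive, String name) throws IOException
    {
        try (ZipFile zip = new ZipFile(archive))
        {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null)
                throw new FileNotFoundException(name);

            try (InputStream in = zip.getInputStream(entry))
            {
                ByteArrayOutputStream bytes  = new ByteArrayOutputStream();
                byte[]                buffer = new byte[1024];
                for (int got; (got = in.read(buffer)) != -1; )
                    bytes.write(buffer, 0, got);
                return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws IOException
    {
        File archive = makeSampleArchive();
        System.out.println(names(archive));
        System.out.println(read(archive, "hello.txt"));
    }
}
```

Note that everything here flows in one direction: ZipFile hands out input streams and nothing else.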

Modify a jar file: The problem

More digging unearthed ZipOutputStream, but this class is far from easy to use. There are examples in Chan, Lee, and Kramer's Java Class Libraries book (see Resources), but they prove hideously complicated.

It turns out that the only way to modify an archive is to make a new archive from scratch and copy the old one to the new one, making any changes along the way. In short, Java's archive classes give you no simple way to modify, replace, or add a file in an existing archive.

With that in mind, the not-so-basic drill is as follows:

  1. Get the ZipEntry objects for the existing archive.
  2. Create a temporary file to hold the new archive as it's being built.
  3. Wrap that temporary file with a FileOutputStream, and wrap that stream with a ZipOutputStream.
  4. To remove a file:
    • Remove its entry from the list of ZipEntry objects made in step 1.
  5. To replace a file in the archive:
    1. Remove the old ZipEntry from the list of entries.
    2. Make a new ZipEntry by copying relevant fields from the old one.
    3. Put the ZipEntry into the ZipOutputStream.
    4. Open an InputStream to the file in the original archive.
    5. Copy the new contents of the file from the InputStream to the ZipOutputStream.
    6. Tell the ZipOutputStream that you're done with the entry.
    7. Close the InputStream.
  6. To add a file to the archive:
    • Follow the steps above for replacing a file, but just write the new bytes rather than transferring them.
  7. Once all modifications have been made, transfer the contents of the files represented by the ZipEntry objects that remain in the list created in step 1 (that is, the files you haven't deleted or replaced). Use the process described earlier.
  8. Close the new and old archives, then rename the new one so that it has the same name as the old one.
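The drill above can be sketched in a few dozen lines. This is not the Archive class from Part 2, just a minimal illustration under stated assumptions: it needs Java 7 or later, the names ArchiveRewriteSketch and rewrite() are hypothetical, the new contents arrive as in-memory byte arrays, and every entry is written compressed (DEFLATED), which sidesteps the STORED complications. It also does no cleanup of the temporary file if something fails midway.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.util.zip.*;

class ArchiveRewriteSketch
{
    // Rebuild the archive: drop the names in toRemove, write the entries in
    // toWrite (replacing any old entry of the same name), copy everything else.
    static void rewrite(File archive, Set<String> toRemove, Map<String, byte[]> toWrite)
                                                                    throws IOException
    {
        File temp = File.createTempFile("rewrite", ".zip",
                                        archive.getAbsoluteFile().getParentFile());

        try (ZipFile         old = new ZipFile(archive);
             ZipOutputStream out = new ZipOutputStream(new FileOutputStream(temp)))
        {
            // Replaced and added files: just write the new bytes.
            for (Map.Entry<String, byte[]> e : toWrite.entrySet())
            {
                out.putNextEntry(new ZipEntry(e.getKey()));  // DEFLATED by default
                out.write(e.getValue());
                out.closeEntry();
            }

            // Surviving files: transfer them from the old archive.
            for (Enumeration<? extends ZipEntry> en = old.entries(); en.hasMoreElements(); )
            {
                ZipEntry source = en.nextElement();
                String   name   = source.getName();
                if (toRemove.contains(name) || toWrite.containsKey(name))
                    continue;

                ZipEntry copy = new ZipEntry(name);     // copy only the relevant fields
                if (source.getTime() != -1)
                    copy.setTime(source.getTime());

                out.putNextEntry(copy);
                try (InputStream in = old.getInputStream(source))
                {
                    byte[] buffer = new byte[4096];
                    for (int got; (got = in.read(buffer)) != -1; )
                        out.write(buffer, 0, got);
                }
                out.closeEntry();
            }
        }

        // Both archives are closed now; give the new one the old one's name.
        Files.move(temp.toPath(), archive.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }

    public static void main(String[] args) throws IOException
    {
        File archive = File.createTempFile("demo", ".zip");
        archive.deleteOnExit();
        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(archive)))
        {
            for (String name : new String[] { "keep.txt", "drop.txt", "change.txt" })
            {
                out.putNextEntry(new ZipEntry(name));
                out.write(("old " + name).getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }

        Map<String, byte[]> writes = new HashMap<>();
        writes.put("change.txt", "new contents".getBytes(StandardCharsets.UTF_8));
        writes.put("added.txt",  "brand new".getBytes(StandardCharsets.UTF_8));
        rewrite(archive, Collections.singleton("drop.txt"), writes);

        try (ZipFile zip = new ZipFile(archive))
        {
            for (Enumeration<? extends ZipEntry> e = zip.entries(); e.hasMoreElements(); )
                System.out.println(e.nextElement().getName());
        }
    }
}
```

Even in this stripped-down form, changing one file means rewriting every byte of the archive.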

Ugh! (That's a technical term we Java programmers use.) To make matters worse, the requirements for writing a compressed (DEFLATED) file differ from those for writing an uncompressed (STORED) file. The ZipEntry for uncompressed files must be initialized with a CRC value and file size before it can be written to the ZipOutputStream. Since the ZipEntry must be written before the file contents, this means that you have to process the new data twice -- once to figure out the CRC and once to copy the bytes to the ZipOutputStream. Fortunately, the process isn't so brain-dead for a compressed file; you can give the ZipOutputStream a ZipEntry with uninitialized size and CRC fields, and the ZipOutputStream will modify the fields for you as it does the compression.
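The difference between the two cases is easiest to see side by side. The following sketch assumes the data is already in a byte array (so the "two passes" are cheap), and the class and method names are hypothetical. For a STORED entry, the standard ZipOutputStream rejects putNextEntry() with a ZipException unless the size and CRC have been set, which is why the checksum pass has to come first.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.zip.*;

class StoredEntrySketch
{
    // Uncompressed entry: size and CRC must be known before putNextEntry().
    static void writeStored(ZipOutputStream out, String name, byte[] data)
                                                            throws IOException
    {
        // Pass 1: compute the checksum over the data.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);   // same as size when nothing is compressed
        entry.setCrc(crc.getValue());

        // Pass 2: copy the bytes into the archive.
        out.putNextEntry(entry);
        out.write(data, 0, data.length);
        out.closeEntry();
    }

    // Compressed entry: leave size and CRC unset; the stream fills them in.
    static void writeDeflated(ZipOutputStream out, String name, byte[] data)
                                                            throws IOException
    {
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);

        out.putNextEntry(entry);
        out.write(data, 0, data.length);
        out.closeEntry();
    }

    public static void main(String[] args) throws IOException
    {
        byte[] data = "Goodbye world".getBytes(StandardCharsets.UTF_8);

        try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream("sketch.zip")))
        {
            writeStored  (out, "stored.txt",   data);
            writeDeflated(out, "deflated.txt", data);
        }
    }
}
```

The catch, of course, is the byte array: the sketch works only because all the data is in memory before the entry is written.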

The double processing of the uncompressed file gave me substantial grief. First, I didn't want to read the file twice. Second, what if my program generated the file programmatically? I didn't want to generate the file contents twice. A temptingly easy strategy is to use the ByteArrayOutputStream -- transfer the file to one of these, extract the resulting buffer, and then process the buffer twice. The problem with this approach is the size of the runtime memory footprint. If I put a 1 MB file into my archive, I'll need 1 MB of memory for the underlying byte array. Even if I have this much memory available, the program's memory footprint would probably get so large that the operating system would start swapping the executable image to disk to allow other programs to run. A program can slow down by an order of magnitude (or more) once the virtual memory manager starts swapping to disk -- not a good outcome. On the other hand, most of the files that I would be processing in the class-loader application would be small -- a typical class file is a couple of KB or less. Nonetheless, it seemed to me that writing the class with the assumption that all files would be small was a bad idea. I wanted to write this class once and be done with it.

The solution

I solved the archive-modification problem by writing an Archive class that simplifies the process. Archive will be the subject of Part 2 of this series; here in Part 1, we'll look at the support classes that Archive uses.

The D class

The D class, the first support class of interest, solves a common problem: you always seem to remove debugging diagnostics five minutes before you need them. The class achieves this by allowing you to leave the diagnostics in place in such a way that they can be optimized out of existence in the production code. In other words, I didn't want to just disable the diagnostics with a runtime test of an enabled flag because all those tests (and associated method calls) would still be executed in the production system. I wanted the diagnostics to be gone entirely.

I solved the problem with the two classes, both called D, found in Listing 2 and Listing 3.

Both classes define an ebug() method that works like println. For example:

import com.holub.tools.debug.D;
// import com.holub.tools.D;
D.ebug("Hello world");

(The D.ebug() is either way cool or hideous, depending on your perspective. It's cleaner than DebugStream.println("hello") or some such alternative, though.)

Note that the classes have identical methods, though the version in the com.holub.tools package (Listing 2) contains nothing but empty methods, while the version in com.holub.tools.debug (Listing 3) actually does something. You can choose which of the two versions you want with an import statement. The earlier code employed the version that did something; I'll pick the other one once I'm done debugging. I'm counting on the JVM inlining the empty version. If you replace the call with the contents, you'll end up effectively removing the call, since there are in fact no contents. The only question is whether the arguments will be evaluated. For example, given:

D.ebug( "Var=" + var );

will the string concatenation be executed if the empty version of ebug is used? The answer depends on the JVM. An optimizing JIT compiler such as HotSpot can discard the concatenation as dead code once the empty method has been inlined, but it isn't obliged to, and other JVMs might not.
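One thing doesn't depend on the JVM: Java evaluates a method's arguments before making the call, so an optimizer may remove only work that has no observable effect. The following sketch uses a stand-in for the empty D class (nested here just to keep the example in one file) and a hypothetical expensive() method with a visible side effect, to show that such an argument is still evaluated even though ebug() does nothing.

```java
class EmptyDebugSketch
{
    // Stand-in for the empty version of D; the real one lives in its own file.
    static final class D
    {   public static final void ebug( String text ){ /* intentionally empty */ }
    }

    static int evaluations = 0;

    // Hypothetical argument expression with an observable side effect.
    static String expensive()
    {   ++evaluations;
        return "value";
    }

    public static void main(String[] args)
    {
        D.ebug( "Var=" + expensive() );     // the call itself does nothing...

        // ...but the argument expression still ran:
        System.out.println( "expensive() ran " + evaluations + " time(s)" );
        // prints "expensive() ran 1 time(s)"
    }
}
```

So keep the arguments to ebug() cheap and free of side effects; the most an optimizer can do with anything else is run it and throw the result away.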

An alternative approach to commenting out import statements: Split the package between two directories. Put both versions in the com.holub.tools package, then put the definition for the empty version of D in /src/com/holub/tools and the definition for the working version of D in /src/debug/com/holub/tools. The CLASSPATH will then determine which of the two versions you pull into the program. You'll get the working version if /src/debug comes first and you'll get the empty version if /src comes first.

Listing 3. /src/com/holub/tools/debug/D.java
package com.holub.tools.debug;

import com.holub.io.Std;

public class D
{   static private boolean enabled = true;

    public static final void ebug_enable (){ enabled = true;  }
    public static final void ebug_disable(){ enabled = false; }

    public static final void ebug( String text )
    {   if( enabled )
            Std.err().println( text );
    }
}

The Tester class

The Tester class (Listing 4) simplifies writing unit tests. (A unit test is a test that verifies the correct operation of a single thing -- a class, in this case.) I generally like to include a unit test in every class I write, using the following form:

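package com.holub.test;

import java.io.PrintWriter;
import com.holub.tools.Tester;

class Under_test
{
    //...
    public static class Test
    {
        public static void main( String[] arguments )
        {   // Autoflush, so each report appears as soon as it's printed.
            Tester t = new Tester( arguments.length > 0,
                                   new PrintWriter(System.out, true) );
            //...
            t.exit();
        }
    }
}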

The inner Test class is represented by its own class file (called Under_test$Test.class), which I don't ship with the production code. You run the unit test as follows:

java com.holub.test.Under_test\$Test

(Note: In the instructions above, omit the backslash if you're running a Windows shell.)

My basic philosophy vis-a-vis unit tests is to not print anything if everything's OK. That way, when you run groups of tests all at once, you end up with a list of what's wrong, with no additional clutter. I do get paranoid sometimes, though, and want to prove to myself that the test is actually running by seeing the results of a successful test as well. The first argument to the Tester constructor controls verbosity. If it's false, the Tester object reports only test failures; if it's true, the object reports successes too. In the earlier example, I'll print verbose output if any command-line arguments are specified:

java com.holub.test.Under_test\$Test -v

The second constructor argument is a Writer, to which the reports are sent. The t.exit() call at the end of main() causes the program to terminate with an exit status equal to the number of test failures; that number is a useful thing to know in a test script. It also prints the total error count if verbose mode is set.

The Tester class has several methods of interest. The check() method (of which there are several overloads) compares an expected value with an actual value and prints a message if they don't match:

t.check( "test_id", 0, f() );

The example above prints an error message if f() doesn't return 0.

I couldn't provide a general overload for methods that return arbitrary objects, so an alternative is provided:

t.check( "test_id", f()==null, "f()==null" );

This version prints a message (including the string passed as the third argument) if the test specified in the second argument evaluates to false.

A third method:

t.println("Message");

simply prints the message if the tester is operating in verbose mode; otherwise, it prints nothing.

You can override the verbosity value passed to the constructor by calling t.verbose(Tester.ON) or t.verbose(Tester.OFF). Restore verbosity to the constructor-specified value with t.verbose(Tester.RESTORE). To check if errors were found, use t.errors_were_found().

You'll note that I'm following good object-oriented design practice by not exposing the number of errors anywhere. There is no get_error_count() method. I can do this because the Tester object itself does everything that it needs to do with the error count when exit() is called.

We'll see lots of examples of the Tester class being used in subsequent listings.

Listing 4. /src/com/holub/tools/Tester.java
package com.holub.tools;

import com.holub.tools.debug.Assert;
import java.io.*;

/******************************************************
 A simple class to help in testing. Various check() methods are passed
 a test id, an expected value, and an actual value. The test prints
 appropriate messages, and keeps track of the total error count for
 you. For example:

    class Test
    {    public static void main(String[] args)
         {
             // Create a tester that sends output to standard output, and
             // that operates in verbose mode if there are any command-line
             // arguments.

             Tester t = new Tester( args.length > 0,
                                         com.holub.io.Std.out() );
             //...

             t.check("test.1", 0,     foo());  // check that foo() returns 0.
             t.check("test.2", "abc", bar());  // check that bar() returns "abc".
             t.check("test.3", true , cow());  // check that cow() returns true
             t.check("test.4", true , dog()==null);  // check that dog() returns null

             // Check arbitrary statement
             t.check("test.5", f()==g(), "Expected f() to return same value as g()" );

             //...
             t.exit();
         }
     }
*/
public class Tester
{
    private int               errors = 0;
    private boolean           verbose;
    private final PrintWriter log;
    private final boolean     original_verbosity;

    /**
     Create a tester that has the specified behavior and output stream:
     @param verbose Print messages even if test succeeds. (Normally,
                    only failures are indicated.)
     @param log     All output is sent here; must not be null.
    */
    public Tester( boolean verbose, PrintWriter log )
    {   this.verbose            = verbose;
        this.original_verbosity = verbose;
        this.log                = log;
    }

    /******************************************************
     Change the verbose mode, overriding the mode passed to the constructor.
     @param mode
        Tester.ON       Messages are reported.
        Tester.OFF      Messages aren't reported.
        Tester.RESTORE  Use verbose mode specified in constructor.
    */
    public void verbose( int mode )
    {   switch( mode )
        {
        case ON:    verbose = true;                 break;
        case OFF:   verbose = false;                break;
        default:    verbose = original_verbosity;   break;
        }
    }

    public static final int ON      = 0;
    public static final int OFF     = 1;
    public static final int RESTORE = 2;

    /******************************************************
     Check that an expected result of type String is equal to the
     actual result.
     @param test_id  String that uniquely identifies this test.
     @param expected the expected result.
     @param actual   the value returned from the function under test.
     @return true if the expected and actual parameters matched.
    */
    public boolean check( String test_id, String expected, String actual)
    {
        Assert.is_true( log != null    , "Tester.check(): log is null"      );
        Assert.is_true( test_id != null, "Tester.check(): test_id is null"  );

        boolean okay = expected.equals( actual );

        if( !okay )
            ++errors;

        if( !okay || verbose )
        {   log.println (  (okay ? "   okay " : "** FAIL ")
                         + "("  + test_id + ")"
                         + " expected: " + expected
                         + " got: "      + actual
                    );
        }
        return okay;
    }

    /******************************************************
     Print the message if verbose mode is on.
    */
    public void println( String message )
    {   if( verbose )
            log.println( "\t" + message );
    }

    /******************************************************
     For situations not covered by normal check() methods. If okay is
     false, ups the error count and prints the associated message
     string. If okay is true, prints the message only if verbose is on.
    */
    public void check( String test_id, boolean okay, String message )
    {   Assert.is_true( message != null );

        if( !okay )
            ++errors;

        if( !okay || verbose )
        {
            log.println (  (okay ? "   okay " : "** FAIL ")
                         + "("  + test_id + ") "
                         + message
                    );
        }
    }

    /******************************************************
     Convenience method, compares a string against a StringBuffer.
    */
    public boolean check( String test_id, String expected, StringBuffer actual)
    {   return check( test_id, expected, actual.toString());
    }

    /******************************************************
     Convenience method, compares two doubles.
    */
    public boolean check( String test_id, double expected, double actual)
    {   return check( test_id, "" + expected, "" + actual );
    }

    /******************************************************
     Convenience method, compares two longs.
    */
    public boolean check( String test_id, long expected, long actual)
    {   return check( test_id, "" + expected, "" + actual );
    }

    /******************************************************
     Convenience method, compares two booleans.
    */
    public boolean check( String test_id, boolean expected, boolean actual)
    {   return check( test_id, expected?"true":"false", actual?"true":"false" );
    }

    /******************************************************
     Return true if any preceding check() call resulted in an error.
    */
    public boolean errors_were_found()
    {   return errors != 0;
    }

    /******************************************************
     Exit the program, using the total error count as the exit status.
    */
    public void exit()
    {   if( verbose )
            log.println( "\n" + errors + " errors detected" );
        log.flush();    // don't lose buffered reports when the JVM exits
        System.exit( errors );
    }
}

The FastBufferedOutputStream class

Let's move on to the first class we actually need for the archive-file class. The FastBufferedOutputStream class (Listing 5) solves two problems with Java's BufferedOutputStream, but otherwise works identically to a BufferedOutputStream.

First, recall that the BufferedOutputStream's write() methods are synchronized, even though it's rare that two threads will ever write to the same stream simultaneously without external synchronization; as a result, the fact that write() is synchronized can slow your program down measurably. I once found that something like 80 percent of the synchronization overhead associated with a program I wrote was caused by the unnecessary synchronization on the BufferedOutputStream's write() method. As seen in Listing 5, I solve the problem by simply not synchronizing the write() methods.

Second, FastBufferedOutputStream can export the buffered data from the stream if the stream has never been flushed to disk, a feature unsupported by BufferedOutputStream. I've provided two similar methods for this purpose, export_buffer_and_close() and export_buffer_and_close(OutputStream).

Both methods close the stream. If the buffer was never flushed to the disk, the first overload returns a byte array that contains the buffered bytes, while the second overload writes those bytes to the OutputStream you specify as an argument rather than to the OutputStream that the FastBufferedOutputStream wraps. A call to close() (or flush()) flushes the buffer to the wrapped OutputStream, of course.

The raison d'être for these two export methods will be apparent in Part 2's Archive class. A file that's being stored (uncompressed) in the archive is copied to a file wrapped with a FastBufferedOutputStream. If the file is small enough, it's never flushed to disk, so I can get it directly from the stream without incurring any file-I/O overhead. If it's large enough to have been written to the disk, then I can fall back and read the file manually. For example, the following code takes data from the FastBufferedOutputStream called temporary_file and transfers it to the destination stream:

FastBufferedOutputStream temporary_file
        = new FastBufferedOutputStream( new FileOutputStream("file.tmp") );
OutputStream destination;
//...
// export_buffer_and_close() returns false if the buffer had already
// been flushed to disk, in which case the data must be read back.
if( !temporary_file.export_buffer_and_close(destination) )
{
    InputStream in     = new FileInputStream("file.tmp");
    byte[]      buffer = new byte[1024];
    int         got    = 0;
    while( (got = in.read(buffer)) > 0 )
        destination.write( buffer, 0, got );
    in.close();
}

In the current application, the FastBufferedOutputStream wraps a temporary file in which I'm storing the data that's supposed to go into the archive. The destination stream references the ZipOutputStream for the archive. The foregoing code transfers the data from the temporary file to the archive, but it does so efficiently (from memory rather than from the disk) if the dataset is small enough that it was never flushed to the disk.

Listing 5. /src/com/holub/io/FastBufferedOutputStream.java
package com.holub.io;
import  java.io.*;
import  com.holub.tools.Tester; // for testing
import  com.holub.io.Std;       // for testing

import  com.holub.tools.D;      // for testing
//import com.holub.tools.debug.D;// for testing

/**
 This version of BufferedOutputStream isn't thread safe, so it is much
 faster than the standard BufferedOutputStream in situations where the
 stream is not shared between threads; otherwise, it works identically
 to java.io.BufferedOutputStream.
*/
public class FastBufferedOutputStream extends FilterOutputStream
{
    private final int      size;        // buffer size
    private       byte[]   buffer;
    private       int      current       = 0;
    private       boolean  flushed       = false;
    private       int      bytes_written = 0;

    public static final int DEFAULT_SIZE = 2048;

    /** Create a FastBufferedOutputStream whose buffer is
        FastBufferedOutputStream.DEFAULT_SIZE in size.
    */
    public FastBufferedOutputStream( OutputStream out )
    {   this( out, DEFAULT_SIZE );
    }

    /**
     Create a FastBufferedOutputStream whose buffer is the indicated size.
    */
    public FastBufferedOutputStream( OutputStream out, int size )
    {   super( out );
        this.size   = size;
        buffer      = new byte[ size ];
    }

    // Inherit write(byte[]);

    public void close() throws IOException
    {   D.ebug("\t\tFastBufferedOutputStream closing");

        flush();
        buffer = null;
        current = 0;
        super.close();
    }

    public void flush() throws IOException
    {   if( current > 0 )
        {   D.ebug("\t\tFlushing");
            out.write( buffer, 0, current );
            out.flush( );
            current = 0;
            flushed = true;
        }
    }

    /**
     Write a character on the stream. Flush the buffer first if the
     buffer is full. That is, if you have a 10-character buffer, the
     flush occurs just before writing the 11th character.
    */
    public void write(int the_byte) throws IOException
    {
        if( current >= buffer.length )
            flush();    // resets current to 0

        D.ebug(   "\t\twrite(" + the_byte
                + "): current=" + current
                + ", buffer.length=" + buffer.length );

        buffer[current++] = (byte)the_byte;
        ++bytes_written;
    }

    public void write(byte[] bytes, int offset, int length)
                                                throws IOException
    {   while( --length >= 0 )
            write( bytes[offset++] );
    }

    /******************************************************
     Return the total number of bytes written to this stream.
    */
    public int bytes_written(){ return bytes_written; }

    /******************************************************
     Return the object wrapped by the FastBufferedOutputStream. (I don't
     consider this to be a violation of encapsulation because that
     object is passed into the Decorator, so is externally accessible
     anyway.) The internal buffer is flushed so it is safe to write
     directly to the "contents" object.
    */
    public OutputStream contents() throws IOException
    {   flush();
        return out;
    }

    /******************************************************
     Return true if the buffer has been flushed to the underlying stream.
    */
    public boolean has_flushed(){ return flushed; }

    /******************************************************
     If the buffer has never been flushed to the wrapped stream, copy it
     to the destination stream and return true (without sending the
     characters to the wrapped stream), otherwise return false; in any
     event, close the stream.
     @see #has_flushed
    */
    public boolean export_buffer_and_close(OutputStream destination)
                                                throws IOException
    {
        if( !flushed )
        {   destination.write( buffer, 0, current );
            current = 0;
        }
        close();
        return !flushed;
    }

    /******************************************************
     If the buffer has never been flushed to the wrapped stream, return
     it; otherwise return null. In any event, close the stream.
     @see #has_flushed
    */
    public byte[] export_buffer_and_close() throws IOException
    {   byte[] buffer = null;

        if( !flushed )
        {   buffer = this.buffer;
            current = 0;
        }
        close();
        return buffer ;
    }

    /******************************************************
     A test class.
    */
    static public class Test
    {   static public void main(String[] args) throws Exception
        {   Tester  t = new Tester( args.length > 0, Std.out() );
            try
            {
                File f = File.createTempFile( "FastBufferedOutputStream", ".test");
                FastBufferedOutputStream out = new FastBufferedOutputStream
                                                ( new FileOutputStream(f), 10 );

                for( char c = 'a'; c <= 'x'; ++c )
                    out.write( (byte)c );

                out.write( new byte[]{ (byte)'y', (byte)'z' } );
                out.close();

                t.check("FastBufferedOutputStream.1.0", 'z'-'a'+1, out.bytes_written() );

                t.verbose(Tester.OFF);
                FileInputStream in = new FileInputStream(f);
                for( char c = 'a'; c <= 'z'; ++c )
                    t.check("FastBufferedOutputStream.1.1", c,  in.read() );
                t.verbose(Tester.RESTORE);
                t.check("FastBufferedOutputStream.1.1", !t.errors_were_found(),
                                                            "read/write test");

                t.check("FastBufferedOutputStream.1.2", -1, in.read() );
                in.close();

                if( !t.errors_were_found() )
                    f.delete();
                else
                    Std.out().println("Test file not deleted: " + f.getName() );

                //----------------------------------------------------------------
                File temp = new File("Fast.2.test");

                t.check( "FastBufferedOutputStream.2.0", false, temp.exists() );

                DelayedOutputStream stream  = new DelayedOutputStream(temp);

                t.check( "FastBufferedOutputStream.2.1", false, temp.exists() );

                out = new FastBufferedOutputStream( stream, 2 );

                t.check( "FastBufferedOutputStream.2.2", false, temp.exists() );

                out.write( (byte)'x' );

                t.check( "FastBufferedOutputStream.2.3", false, temp.exists() );

                out.write( (byte)'x' );

                t.check( "FastBufferedOutputStream.2.4", false, temp.exists() );

                out.write( (byte)'x' );
                t.check( "FastBufferedOutputStream.2.5", true, temp.exists() );

                out.close();
                boolean deleted = temp.delete();
                t.check( "FastBufferedOutputStream.2.6", deleted,
                                                        "Deleting temporary file");

                //------------------------------------------------------

                stream = new DelayedOutputStream("Fast",".3.test");
                out = new FastBufferedOutputStream( stream, 2 );

                out.write( 'a' );
                out.write( 'b' );

                byte[] buffer = out.export_buffer_and_close();
                t.check("FastBufferedOutputStream.3.1", buffer != null, "Expected non null");
                t.check("FastBufferedOutputStream.3.2", buffer[0]=='a' && buffer[1]=='b',
                                                    "Expected \"ab\"");

                t.check("FastBufferedOutputStream.3.3", stream.temporary_file()==null,
                                                    "Expected no temporary-file reference");

                //------------------------------------------------------

                stream = new DelayedOutputStream("Fast",".4.test");
                out = new FastBufferedOutputStream( stream, 2 );

                out.write( 'a' );
                out.write( 'b' );

                ByteArrayOutputStream bytes = new ByteArrayOutputStream();

                t.check("FastBufferedOutputStream.4.1",
                           true, out.export_buffer_and_close(bytes) );
                t.check("FastBufferedOutputStream.4.2", bytes.toString().equals("ab"),
                                                    "Expected \"ab\"");
                t.check("FastBufferedOutputStream.4.3", stream.temporary_file()==null,
                                                    "Expected no temporary-file reference");

                //------------------------------------------------------
 196: 
 197:                 stream = new DelayedOutputStream("Fast",".5.test");
 198:                 out = new FastBufferedOutputStream( stream, 2 );
 199: 
 200:                 for( char c = 'a'; c <= 'z'; ++c )
 201:                     out.write( (byte)c );
 202:                 out.close();
 203: 
 204:                 buffer = out.export_buffer_and_close();
 205:                 t.check("FastBufferedOutputStream.5.1", buffer==null, "Expected null");
 206:                 t.check("FastBufferedOutputStream.5.2",
                           false, out.export_buffer_and_close(bytes) );
 207:                 t.check("FastBufferedOutputStream.5.3", stream.temporary_file()!=null,
 208:                                                     "Expected temporary-file reference");
 209: 
 210:                 in = new FileInputStream(stream.temporary_file());
 211:                 for( char c = 'a'; c <= 'z'; ++c )
 212:                     t.check("FastBufferedOutputStream.5.4", c,  in.read() );
 213:                 t.check("FastBufferedOutputStream.5.5", -1, in.read() );
 214:                 in.close();
 215: 
 216:                 String name = stream.temporary_file().getName();
 217:                 stream.delete_temporary();
 218:                 t.check("FastBufferedOutputStream.5.6", !(new File(name).exists()),
 219:                                                     "Temporary file destroyed" );
 220:             }
 221:             catch( Exception e )
 222:             {   t.check( "FastBufferedOutputStream.Abort", false,
 223:                                 "Terminated by Exception toss" );
 224:                 e.printStackTrace();
 225:             }
 226:             finally
 227:             {   t.exit();
 228:             }
 229:         }
 230:     }
 231: }

The DelayedOutputStream class

The main difficulty with the earlier example is that I don't want the temporary file to be created until the flush occurs. In other words, I want no disk I/O to be performed at all if the incoming dataset is small enough. I solve this problem with another OutputStream variant called DelayedOutputStream (Listing 6). The DelayedOutputStream -- a real class, not a wrapper -- works just like FileOutputStream with one important exception: the file doesn't open until the first write() operation is performed. The constructors just cache the information needed to open the file, and the open_file() method (Listing 6, line 185), called from the various write overloads (starting on line 71), actually opens the file.
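The delayed-open idea can be seen in isolation in a much smaller class. The following is only a minimal sketch, not the Listing 6 code: the names LazyOpenDemo, LazyFileOutputStream, and openIfNeeded are hypothetical, and the sketch handles just the File-based case with none of the temporary-file support.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class LazyOpenDemo {
    /** Hypothetical, stripped-down illustration of delayed opening. */
    static class LazyFileOutputStream extends OutputStream {
        private final File target;      // cached by the constructor; nothing is opened yet
        private FileOutputStream out;   // stays null until the first write
        private boolean closed;

        LazyFileOutputStream(File target) {
            this.target = target;
        }

        /** Opens the underlying file on demand; every write funnels through here. */
        private void openIfNeeded() throws IOException {
            if (closed)
                throw new IOException("stream is closed");
            if (out == null)
                out = new FileOutputStream(target);   // the file comes into existence here
        }

        @Override
        public void write(int b) throws IOException {
            openIfNeeded();
            out.write(b);
        }

        @Override
        public void write(byte[] bytes, int offset, int length) throws IOException {
            openIfNeeded();
            out.write(bytes, offset, length);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            if (out != null)
                out.close();   // if nothing was ever written, there is nothing to close
        }
    }

    public static void main(String[] args) throws IOException {
        File f = new File(System.getProperty("java.io.tmpdir"),
                          "lazy-open-demo-" + System.nanoTime() + ".tmp");
        LazyFileOutputStream s = new LazyFileOutputStream(f);
        System.out.println("exists after construction: " + f.exists());
        s.write('x');
        System.out.println("exists after first write:  " + f.exists());
        s.close();
        f.delete();
    }
}
```

Running main should report that the file does not exist after construction and does exist after the first write, which is the behavior the real class provides for all of its constructors.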

Replacing the temporary_file declared in the earlier example with:

FastBufferedOutputStream temporary_file
        = new FastBufferedOutputStream( new DelayedOutputStream("file.tmp") );

ensures that file.tmp will not be created if the buffer is never flushed to disk.

The only implementation problem relates to temporary files. In Java, you create a temporary file by calling File.createTempFile("root", ".ext"), which creates a file named something like root123456.ext in the default temporary-file directory (the directory named by the java.io.tmpdir system property, which on Windows typically follows the TEMP or TMP environment variable). The generated number ensures that the file name is unique. The trouble is that this method actually creates the file rather than just generating the name, and the whole point of this exercise is to avoid unnecessary disk I/O. Consequently, something like:

new DelayedOutputStream( File.createTempFile("root", ".ext") );

won't do what I want, which is to create the file only when (or if) it's actually needed.
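A quick check makes the problem concrete. This is a small sketch with a hypothetical class name (EagerTempFileDemo) that uses only the standard java.io.File API: it asks for a temporary file and immediately tests whether an empty file is already on disk, before anything has been written to it.

```java
import java.io.File;
import java.io.IOException;

class EagerTempFileDemo {
    /**
     * Returns true if File.createTempFile left an empty file on disk
     * even though nothing has been written to it yet.
     */
    static boolean createsFileImmediately() throws IOException {
        File f = File.createTempFile("root", ".ext");
        try {
            return f.exists() && f.length() == 0;
        } finally {
            f.delete();   // clean up the file we never wanted
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("created eagerly: " + createsFileImmediately());
    }
}
```

Because the file already exists by the time createTempFile returns, passing its result to a DelayedOutputStream constructor defers only the opening of the stream, not the disk activity.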

I've solved this problem by providing a few methods not found in OutputStream itself. The DelayedOutputStream(String,String) constructor (Listing 6, line 37) just records the root and extension; a uniquely named temporary file is created only when (and if) the first write occurs.

I've also provided two methods to make it easy to delete or rename this temporary file. The delete_temporary() method (Listing 6, line 139) deletes the temporary file if it exists, and the rename_temporary_to(...) method (Listing 6, line 158) renames it. Neither of these last two methods may be called if the stream hasn't been closed. (A java.lang.IllegalStateException object is thrown in this situation.)
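The lifecycle just described (no file until the first write, and no delete until the stream is closed) can be sketched with a small stand-in class. Everything here is hypothetical and simplified: TempLifecycleDemo, LazyTempStream, temporaryFile(), and deleteTemporary() are illustrative names, not the methods in Listing 6, and renaming is left out.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class TempLifecycleDemo {
    /** Hypothetical stand-in for the temporary-file behavior described above. */
    static class LazyTempStream extends OutputStream {
        private final String root;
        private final String extension;
        private File file;              // null until the first write creates it
        private FileOutputStream out;
        private boolean closed;

        LazyTempStream(String root, String extension) {
            this.root = root;
            this.extension = extension;
        }

        @Override
        public void write(int b) throws IOException {
            if (closed)
                throw new IOException("stream is closed");
            if (out == null) {
                // The unique name and the file itself are created together, on demand.
                file = File.createTempFile(root, extension);
                out = new FileOutputStream(file);
            }
            out.write(b);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            if (out != null)
                out.close();
        }

        /** Returns the temporary file, or null if nothing was ever written. */
        File temporaryFile() {
            return file;
        }

        /** Deletes the temporary file, but only once the stream has been closed. */
        boolean deleteTemporary() {
            if (!closed)
                throw new IllegalStateException("close the stream first");
            return file != null && file.delete();
        }
    }

    public static void main(String[] args) throws IOException {
        LazyTempStream s = new LazyTempStream("root", ".ext");
        s.write('a');
        try {
            s.deleteTemporary();
        } catch (IllegalStateException e) {
            System.out.println("rejected while open: " + e.getMessage());
        }
        s.close();
        System.out.println("deleted after close: " + s.deleteTemporary());
    }
}
```

Insisting on a closed stream keeps the sketch from deleting a file that still has an open handle, which is one plausible reason for the same restriction in the real class.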

Listing 6. /src/com/holub/io/DelayedOutputStream.java
   1: package com.holub.io;
   2: import  java.io.*;
   3: 
   4: import com.holub.tools.Tester;  // for testing
   5: import com.holub.io.Std;        // for testing
   6: import com.holub.tools.Assert;  // for testing
   7: 
   8: // import com.holub.tools.debug.D;      // for testing
   9: import com.holub.tools.D;       // for testing
  10: 
  11: /*  A DelayedOutputStream works like a FileOutputStream, except
  12:  *  that the file is not opened until the first write occurs.
  13:  *  Note that, though you'd like this class to extend
  14:  *  FileOutputStream rather than OutputStream,
  15:  *  that approach won't work here because all of the former's
  16:  *  constructors actually open the file, and the whole point of
  17:  *  the current exercise is not to do that.
  18:  *
  19:  *  This class is not thread safe---two threads cannot safely
  20:  *  write simultaneously to the same DelayedOutputStream without
  21:  *  some sort of external synchronization.
  22:  */
  23: 
  24: public class DelayedOutputStream extends OutputStream
  25: {   
  26:   private String              file_name = null;
  27:   private String              extension = null;
  28:   private boolean             temporary = false;
  29:   private File                file      = null;
  30:   private FileDescriptor      descriptor= null;
  31:   private boolean             append    = false;
  32:   private boolean             closed    = false;
  33:   private FileOutputStream    out       = null;
  34: 
  35:     //======================================================
  36: 
/** 
Creates a temporary file on first write. The file name is the concatenation of the root, an arbitrary number, and the extension. @see #temporary_file
*/
  37:   public DelayedOutputStream( String root, String extension )
  38:     {   Assert.is_true( root != null && extension != null );
  39:         this.temporary = true;
  40:         this.file_name = root;
  41:         this.extension = extension;
  42:     }
  43: 
  44:   public DelayedOutputStream( String file_name )
  45:     {   Assert.is_true( file_name != null );
  46: 
  47:         this.file_name = file_name;
  48:     }
  49: 
  50:   public DelayedOutputStream( String file_name, boolean append )
  51:     {   Assert.is_true( file_name != null );
  52: 
  53:         this.file_name = file_name;
  54:         this.append    = append;
  55:     }
  56: 
  57:   public DelayedOutputStream( File file )
  58:     {   Assert.is_true( file != null );
  59: 
  60:         this.file = file;
  61:     }
  62: 
  63:   public DelayedOutputStream( FileDescriptor descriptor )
  64:     {   Assert.is_true( descriptor != null );
  65: 
  66:         this.descriptor = descriptor;
  67:     }
  68: 
  69:     //======================================================
  70: 
  71:   public void write(int the_byte) throws IOException
  72:     {   open_file();
  73:         out.write( the_byte );
  74:     }
  75: 
  76:   public void write(byte[] bytes, int offset, int length)
  77:                                                 throws IOException
  78:     {   Assert.is_true( bytes != null );
  79: 
  80:         open_file();
  81:         out.write(bytes, offset, length);
  82:     }
  83: 
  84:   public void write(byte[] bytes) throws IOException
  85:     {   open_file();
  86:         out.write(bytes, 0, bytes.length);
  87:     }
  88: 
/**
Close the stream. This method can be called even if the underlying file has not been opened (because nobody's written anything to it). It just silently does nothing in this case.
*/
  89:   public void close() throws IOException
  90:     {   
  91:         if( !closed )
  92:         {
  93:             closed = true;
  94: 
  95:             D.ebug("\t\tDelayedOutputStream closing "
  96:                   + ( file!=null      ? file.getPath() :
  97:                       file_name!=null ? file_name : "???"
  98:                     )
  99:                   );
 100: 
 101:             if( out != null )
 102:             {
 103:                 out.close();
 104:                 D.ebug("\t\t\tclose accomplished");
 105:             }
 106:             else D.ebug("\t\t\tno-op (file never opened)");
 107: 
 108:             // Null out all references *except* the "file"
 109:             // reference for temporary files (which might be
 110:             // needed by a subsequent call to delete_temporary).
 111: 
 112:             file_name = null;
 113:             extension = null;
 114:             descriptor= null;
 115:             out       = null;
 116: 
 117:             if( !temporary )
 118:                 file = null;
 119:         }
 120:     }
 121: 
 122:   public void flush() throws IOException
 123:     {   open_file();
 124:         out.flush();
 125:     }
 126: 
 127:   public FileDescriptor getFD() throws IOException
 128:     {   if( out==null )
 129:             throw new IOException("No FD yet in DelayedOutputStream");
 130:         return out.getFD();
 131:     }
 132: 
/**
Return a File object that represents the temporary file created by the DelayedOutputStream(String,String) constructor. @return the File reference or null if the temporary file hasn't been created (either because nobody's written to the current stream or because the current object doesn't represent a temporary file).
*/
 133:   public File temporary_file() throws IOException
 134:     {   if( temporary )
 135:             return file;
 136:         return null;
 137:     }
 138: 
/** 
If a temporary file has been created by a write operation, delete it, otherwise do nothing. @return true if the temporary file existed and was successfully deleted, false if it didn't exist or wasn't successfully deleted. @throws IllegalStateException if the file hasn't been closed.
*/
 139:   public boolean delete_temporary()
 140:     {   if( !closed )
 141:             throw new IllegalStateException(
 142:                           "Stream must be closed before underlying"
 143:                         + " File can be deleted");
 144: 
 145:         boolean it_exists = temporary && (file != null);
 146: 
 147:         if( it_exists )
 148:         {   D.ebug("\t\tDelayedOutputStream deleting " +file.getPath());
 149: 
 150:             if( !file.delete() )
 151:                 return false;
 152:         }
 153: 
 154:         return it_exists;
 155: 
 156:     }
 157: 
/**
If a temporary file has been created by a write operation, rename it, otherwise do nothing. If a file with the same name as the target exists, that file is deleted first. @return true if the temporary file existed and was successfully renamed, false if it didn't exist or the rename attempt failed. @throws IllegalStateException if the file hasn't been closed.
*/
 158:   public boolean rename_temporary_to( File new_name )
 159:     {   if( !closed )
 160:         {   throw new IllegalStateException(
 161:                           "Stream must be closed before underlying"
 162:                         + " File can be renamed");
 163:         }
 164:     
 165:         boolean it_exists = temporary && (file != null);
 166: 
 167:         if( it_exists )
 168:         {   D.ebug("\t\tDelayedOutputStream renaming "
 169:                     +   file.getPath()
 170:                     + " to "
 171:                     +   new_name.getPath()
 172:                   );
 173: 
 174:             if( new_name.exists() )
 175:                 if( !new_name.delete() )
 176:                     return false;
 177: 
 178:             if( !file.renameTo(new_name) )
 179:                 return false;
 180:         }
 181: 
 182:         return it_exists;
 183:     }
 184: 
/** Workhorse function called by write() variants before they
do any I/O in order to bring the file into existence.
*/
 185:   private void open_file() throws IOException
 186:     {   
 187:         if( closed )
 188:             throw new IOException(
 189:                         "Tried to access closed DelayedOutputStream");
 190:         if( out == null )
 191:         {
 192:             if( temporary )
 193:             {   file = File.createTempFile(file_name,extension);
 194:                 file_name = null;
 195:                 extension = null;
 196:             }
 197: 
 198:             if( file_name != null )
 199:             {   out = new FileOutputStream(file_name, append);
 200:                 D.ebug("\t\tDelayedOutputStream created " + file_name );
 201:             }
 202:             else if( file != null )
 203:             {   out = new FileOutputStream(file);
 204:                 D.ebug("\t\tDelayedOutputStream created " + file.getPath());
 205:             }
 206:             else if( descriptor != null )
 207:             {   out = new FileOutputStream(descriptor);
 208:                 D.ebug("\t\tDelayedOutputStream created file from fd "
 209:                                                     + descriptor.toString());
 210:             }
 211:             else
 212:                 Assert.failure(
 213:                     "DelayedOutputStream internal error: nothing to open");
 214:         }
 215:     }
 216: 
 217:   static public class Test
 218:   {   static public void main(String[] args)
 219:         {
 220:             // Note that the temporary-file creation stuff is tested
 221:             // in the FastBufferedOutputStream class's test method,
 222:             // so you should run that test too.
 223: 
 224:             Tester  t = new Tester( args.length > 0, Std.out() );
 225:             try
 226:             {   
 227:                 File f = File.createTempFile("DelayedOutputStream",".test");
 228:                 OutputStream out = new DelayedOutputStream(f);
 229: 
 230:                 for( char c = 'a'; c <= 'x'; ++c )
 231:                     out.write( (byte)c );
 232: 
 233:                 out.write( new byte[]{ (byte)'y', (byte)'z' } );
 234:                 out.close();
 235: 
 236:                 t.verbose(Tester.OFF);
 237:                 char got;
 238:                 FileInputStream in = new FileInputStream(f);
 239:                 for( char c = 'a'; c <= 'z'; ++c )
 240:                     t.check("DelayedOutputStream.1", c,  in.read() );
 241:                 t.verbose(Tester.RESTORE);
 242:                 t.check("DelayedOutputStream.1",
                           !t.errors_were_found(), "Read/Write test" );
 243: 
 244:                 t.check("DelayedOutputStream.2", -1, in.read() );
 245:                 in.close();
 246: 
 247:                 if( !t.errors_were_found() )
 248:                     f.delete();
 249:                 else
 250:                     Std.out().println("Test file not deleted: " + f.getPath() );
 251:             }
 252:             catch( Exception e )
 253:             {   t.println("DelayedOutputStream.3:  Exception Toss");
 254:                 e.printStackTrace();
 255:             }
 256: 
 257:             t.exit();
 258:         }
 259:     }
 260: }

Conclusion

That's it for now. The four classes I presented here -- D, Tester, FastBufferedOutputStream, and DelayedOutputStream -- are useful in their own right. The D class lets you insert disappearing debugging diagnostics into your code, while Tester eases the process of writing automated unit tests. FastBufferedOutputStream can speed up your programs considerably (by eliminating the unnecessary synchronization of BufferedOutputStream) and also lets you read back the output without having to go to the disk. Finally, DelayedOutputStream lets you set up a file that you may need, without creating it on disk unless you actually write to it.

In Part 2, I'll put these classes to work with an Archive class that makes it easy to read, write, or modify an existing zip or jar file.

Allen Holub has been working in the computer industry since 1979. He is widely published in magazines (Dr. Dobb's Journal, Programmers Journal, Byte, and MSJ, among others) and is a contributing editor for JavaWorld. He has eight books to his credit, the latest of which covers the traps and pitfalls of Java threading (Taming Java Threads [Apress, 2000]). He's been designing and building object-oriented software for longer than he cares to remember. After eight years as a C++ programmer, Allen abandoned C++ for Java in early 1996. He now looks at C++ as a bad dream, the memory of which is mercifully fading. He's been teaching programming (first C, then C++ and MFC, now object-oriented design and Java) both on his own and for the University of California Berkeley Extension since 1982. Allen offers both public classes and in-house training in Java and object-oriented design topics. He also does object-oriented design consulting and contract Java programming. Get information, and contact Allen, via his Website (http://www.holub.com).

Learn more about this topic