Modifying archives, Part 2: The Archive class

The Archive class allows you to write or modify stored archive files

Author's note: Before we get started on this month's article, I'd like to mention that my new book on Java threading, Taming Java Threads (APress, June 2000; see Resources), is finally out. The book shows you how to create production-quality multithreaded programs; it presents a full-blown industrial-strength threading library along with a lot of advice about threading pitfalls and good architecture. Much of the material in this book first appeared in JavaWorld as a nine-part series on threading (see Resources), though the material has been expanded considerably and the code has been cleaned up and expanded as well.

Modifying Archives

As I discussed in Part 1 of this series, the built-in Java archive classes contain no support for modifying an existing archive. They only let you build one from scratch. To modify an archive, you must copy it to another archive, performing the modifications along the way. Three classes are involved in the transfer:

  • ZipFile: Represents the file as a whole; you get ZipEntry objects that represent the archive's contents from here. The constructor takes the full path name of the .zip or .jar file as an argument.
  • ZipEntry: Essentially the directory entry for a file within the archive. You get an InputStream for a particular file within the archive by calling a_ZipFile_object.getInputStream(a_ZipEntry).
  • ZipOutputStream: An output stream that builds an archive. You can write ZipEntry objects onto this stream as well as the actual data (the ZipEntry object has to be written first, then the data). A ZipOutputStream is a standard java.io-style decorator used along the same lines as BufferedOutputStream. You pass an OutputStream representing the physical archive file to the ZipOutputStream as a constructor argument, and you write to the ZipOutputStream wrapper.

Next, we see the general (but not so easy) process for modifying an archive:

  1. Get all the ZipEntry objects for the existing archive.
  2. Create a temporary file to hold the new archive as it's being built.
  3. Wrap that temporary with a ZipOutputStream.
  4. To remove a file:
    • Remove its entry from the list of ZipEntry objects made in Step 1.
  5. To replace a file in the archive:
    1. Remove the old ZipEntry from the list of entries made in Step 1.
    2. Make a new ZipEntry by copying relevant fields from the old one.
    3. Put the new ZipEntry into the ZipOutputStream.
    4. Copy the new contents of the file to the ZipOutputStream.
    5. Tell the ZipOutputStream that you're done with the entry.
    6. Close the InputStream.
  6. To add a file to the archive:
    • It's just like replacing a file, but there's no ZipEntry in the old archive, so you have to create one from scratch.
  7. Once you've made all the modifications, transfer the contents of the files represented by the ZipEntry objects that remain in the list created in Step 1 (that is, the files you haven't deleted or replaced). To do this, you'll have to open an InputStream for each of the entries remaining in the list (by asking the ZipFile for an InputStream for a particular ZipEntry), then transfer bytes from that stream to the ZipOutputStream using the process described earlier.
  8. Close the new and old archives, then rename the new one to have the same name as the old one.
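
The steps above can be sketched with nothing but java.util.zip. This is a minimal illustration, not the Archive implementation: the class and method names (ZipRewrite, rewrite) are hypothetical, it relies on try-with-resources and java.nio.file.Files (both newer than this article), and it handles only one removal and one replacement or addition per call. Every entry is rewritten as DEFLATED, which sidesteps the extra bookkeeping that STORED entries need.

```java
import java.io.*;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.*;
import java.util.zip.*;

class ZipRewrite {
    /**
     * Copies 'archive' to a temporary file, dropping the entry named 'remove'
     * (may be null) and adding or replacing the entry named 'replace' (may be
     * null) with 'newContents'; then renames the temporary file over the original.
     */
    static void rewrite(File archive, String remove,
                        String replace, byte[] newContents) throws IOException {
        // Steps 2 and 3: a temporary file wrapped in a ZipOutputStream.
        File temp = File.createTempFile("rewrite", ".tmp",
                                        archive.getAbsoluteFile().getParentFile());
        try (ZipFile source = new ZipFile(archive);
             ZipOutputStream destination = new ZipOutputStream(
                     new BufferedOutputStream(new FileOutputStream(temp)))) {

            // Step 1: collect the existing entries, indexed by name.
            Map<String, ZipEntry> entries = new LinkedHashMap<>();
            for (Enumeration<? extends ZipEntry> e = source.entries(); e.hasMoreElements();) {
                ZipEntry current = e.nextElement();
                entries.put(current.getName(), current);
            }

            // Step 4: removing a file just drops it from the list.
            if (remove != null) entries.remove(remove);

            // Steps 5 and 6: write the replacement (or brand-new) entry.
            if (replace != null) {
                entries.remove(replace);
                destination.putNextEntry(new ZipEntry(replace)); // entry first...
                destination.write(newContents);                  // ...then the data
                destination.closeEntry();
            }

            // Step 7: copy every entry that is still in the list.
            byte[] buffer = new byte[4096];
            for (ZipEntry old : entries.values()) {
                ZipEntry copy = new ZipEntry(old.getName());
                copy.setTime(old.getTime());
                destination.putNextEntry(copy);
                try (InputStream in = source.getInputStream(old)) {
                    for (int got; (got = in.read(buffer)) > 0;)
                        destination.write(buffer, 0, got);
                }
                destination.closeEntry();
            }
        }
        // Step 8: both archives are closed now; replace the original.
        Files.move(temp.toPath(), archive.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }
}
```

A call such as ZipRewrite.rewrite(new File("input.zip"), "old.txt", "input.txt", bytes) would remove old.txt and replace input.txt in one pass. The sketch leaves the temporary file behind if an exception is thrown part way through; production code would delete it.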

To make matters worse, the requirements for writing a compressed (ZipEntry.DEFLATED) file differ from those for writing an uncompressed (ZipEntry.STORED) file. The ZipEntry for uncompressed files must be initialized with a CRC value (a checksum) and file size before it can be written to the ZipOutputStream. The checksum can be built using Java's CRC32 class (which is passed the bytes that comprise the file and provides a checksum when all the bytes have been imported). The ZipEntry must be written before the file contents, however, so you have to process the data twice -- once to figure out the CRC and once again to copy the bytes to the ZipOutputStream. Fortunately, the process isn't so brain dead for a compressed file; you can give the ZipOutputStream a ZipEntry with uninitialized size and CRC fields, and the ZipOutputStream will modify the fields for you as it does the compression.
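
The difference between the two cases is easiest to see side by side. The following is a small sketch using only java.util.zip; the class and method names (StoredEntryWriter, writeStored, writeDeflated) are hypothetical, and the data is assumed to fit in a byte array so that the "two passes" are simply two traversals of that array.

```java
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

class StoredEntryWriter {
    /** Writes 'data' to 'out' as an uncompressed (STORED) entry named 'name'. */
    static void writeStored(ZipOutputStream out, String name, byte[] data)
                                                        throws IOException {
        // First pass: compute the checksum over all the bytes.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        // A STORED entry must carry its size and CRC before it's written.
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCompressedSize(data.length);
        entry.setCrc(crc.getValue());

        // Second pass: the entry goes first, then the data.
        out.putNextEntry(entry);
        out.write(data, 0, data.length);
        out.closeEntry();
    }

    /** A DEFLATED entry needs none of that; the stream fills in the fields. */
    static void writeDeflated(ZipOutputStream out, String name, byte[] data)
                                                        throws IOException {
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.DEFLATED);
        out.putNextEntry(entry);
        out.write(data, 0, data.length);
        out.closeEntry();
    }
}
```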

The entire process proves ridiculously complicated, and it's mysterious to me that Sun, which has worked so hard to hide complexity elsewhere in the Java packages, has given us this hideous mechanism for archive management.

Using the Archive class

I've hidden all this complexity in the Archive class -- the subject of this month's Java Toolbox. Compared to Sun's APIs, the Archive is refreshingly easy to use.

To get started, first create an Archive object using one of two constructors:

  • Archive(String jar_file_path, boolean compress)
  • Archive(String jar_file_path)

In both cases, the first argument is a pathname string that identifies the .zip or .jar file that you want to access. Note: the file doesn't need to exist if you're creating a new archive, as opposed to modifying or examining an existing one.

Meanwhile, the compress argument tells the Archive what to do with new files that you add to the archive. If it's true, we compress the new files (using the maximum compression ratio); otherwise, we simply store the files. If you're modifying a file in an existing archive, the original compression mode is preserved.

Once you've created the Archive, reading or writing a file is simply a matter of asking for an appropriate InputStream or OutputStream. Three methods are provided for this purpose:

  • InputStream input_stream_for(String internal_path)
  • OutputStream output_stream_for(String internal_path, boolean appending)
  • OutputStream output_stream_for(String internal_path)

The internal_path argument specifies the path (within the archive) to the file you want to access. If we've specified the appending flag and it's true, then the characters sent to the returned OutputStream are appended to the existing file rather than overwriting the original contents. The version of output_stream_for() without an append argument always overwrites. When you're done with the read or write operation, just close the stream in the normal way (by passing it a close() message).

You can remove a file from the archive by calling:

void remove(java.lang.String internal_path) 

which works as expected.

When you're done with the archive, you have to close it using one of two methods. The close() request closes the archive file, preserving any changes you've made, while the revert() request closes the archive file, discarding any changes you've made. It's important to call revert() if you're discarding changes (as opposed to simply abandoning the Archive reference) because otherwise temporary files used to perform the archive manipulation will remain on the disk.

Listing 1 shows a simple example of a program that copies standard input into a file called input.txt in the root directory of an archive called input.zip. If the archive already exists, then the existing input.txt file is overwritten with new contents.

Listing 1. Arc.java
    import com.holub.tools.Archive;  // Listing 2 declares package com.holub.tools
    import java.io.*;

    public class Arc
    {
        public static void main(String[] s) throws Exception
        {
            // input.zip is created if it doesn't already exist.
            Archive archive = new Archive("input.zip");

            // Overwrites input.txt inside the archive if it's there.
            OutputStream out = archive.output_stream_for( "input.txt" );

            int c;
            while( (c = System.in.read()) != -1 )
                out.write( c );

            out.close();        // closing the stream unlocks the Archive
            archive.close();    // commits the changes to input.zip
        }
    }

Be aware that Archive is thread safe in a rather primitive way: no two threads may access the Archive simultaneously. The input_stream_for(...) or output_stream_for(...) methods effectively lock the Archive object, and it remains locked until the stream returned by one or the other of these methods closes. Any thread that tries to get an input or output stream or otherwise use the Archive object while a stream remains active will block until that stream closes. Keep in mind that at some point I might make this access a bit less restrictive. Indeed, there's no theoretical reason why one thread couldn't read an archive while another is writing, for example, provided that they aren't accessing the same file. I haven't had need of this behavior, however, so I haven't implemented it.

There's one other foible of the existing implementation: once you've written to a file within the archive, that file is no longer available for reading. If you want to read the modified file, you'll have to close() the Archive and then reopen it by creating a new Archive object. Again, I could change this behavior to allow reading a modified file, but I saw no reason to complicate the code by implementing features that I didn't use.

The final implementation detail focuses on three minor methods that let you get (limited) information about entries in the original source archive without having to deal with the ZipEntry object. First up, the is_newer_than(String file_name, Date d) method returns true if the file indicated by the first argument was modified after the Date passed in as the second argument. Second, the is_older_than(String file_name, Date d) method does the obvious. Finally, the contains(String file_name) method returns true if the source archive contains the file specified by its argument. If you need to delve deeper into the attributes of the entries, you'll have to open a ZipFile and get the attributes that way. Note that none of these last three methods work reliably if you enquire about a file that's been modified or didn't exist in the original source archive.
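
Going to the ZipFile directly looks something like the following sketch. The class and method names (EntryInfo, isNewerThan, describe) are hypothetical and are not part of Archive; the only library calls used are ZipFile.getEntry() and the ZipEntry accessors.

```java
import java.io.File;
import java.io.IOException;
import java.util.Date;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class EntryInfo {
    /** True if the archive contains 'name' and it was modified after 'when'. */
    static boolean isNewerThan(File archive, String name, Date when)
                                                        throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(name);   // null if there's no such file
            return entry != null && entry.getTime() > when.getTime();
        }
    }

    /** Reports a couple of the other attributes that ZipEntry exposes. */
    static String describe(File archive, String name) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null)
                return name + ": not found";
            return name + ": " + entry.getSize() + " bytes, "
                 + (entry.getMethod() == ZipEntry.STORED ? "stored" : "deflated");
        }
    }
}
```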

The architecture

The architecture for the Archive class falls roughly under the aegis of the Abstract Factory design pattern, so let's look at the pattern first. A good example of a pure Abstract Factory in Java is a Collection with respect to an Iterator. In this example, you can write a method that traverses a data structure entirely in terms of interfaces, without having any idea what data structure you're traversing. Here's a method that leverages this ability to print all the elements of some unknown data structure:

public void print( Collection data_structure )
{
    Iterator i = data_structure.iterator();
    while( i.hasNext() )
    {   Object current = i.next();
        System.out.println( current.toString() );
    }
}

This way of working gives you tremendous flexibility at implementation time -- you can completely change the data structure and the way it's traversed without at all modifying the print() method.

Figure 1. The Abstract Factory pattern

Figure 1 shows the general pattern, which you might implement as follows:

public interface Collection
{   //...
   Iterator iterator();
}
public interface Iterator
{   boolean hasNext();
   Object next();
   void remove();
}
public class LinkedList implements Collection
{
   private static class List_iterator implements Iterator
    {   public boolean hasNext(){  /*...*/ }
        public Object  next()   {  /*...*/ }
        public void    remove() {  /*...*/ }
    }
   public Iterator iterator()
    {   return new List_iterator();
    }
}

At the basic level, the implementation of the Iterator interface is completely hidden from the user of the interface. The user knows literally nothing about the implementation other than the fact that it implements a well-defined interface. The iterator() factory method returns a private inner class that implements a public interface, thereby guaranteeing that even if the users of the Iterator object know the actual class name, they still can't access any methods of that implementation class other than the ones defined in the public interface. (Other methods may exist to provide a private communication system between the Iterator implementation and the object across which it's iterating.)
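
For readers who want something that compiles, here is a small runnable variant of the same idea. It is a sketch under a few assumptions: the Countdown class and its join() method are hypothetical, and modern generics plus java.lang.Iterable stand in for the simplified Collection interface shown above.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

class Countdown implements Iterable<Integer> {
    private final int start;
    Countdown(int start) { this.start = start; }

    // Factory method: callers see only the Iterator interface.
    @Override public Iterator<Integer> iterator() { return new Down(start); }

    // Concrete product: the class name is invisible outside Countdown.
    private static class Down implements Iterator<Integer> {
        private int next;
        Down(int start) { next = start; }
        @Override public boolean hasNext() { return next > 0; }
        @Override public Integer next() {
            if (next <= 0) throw new NoSuchElementException();
            return next--;
        }
    }

    // Client code written purely in terms of interfaces; it works with
    // any Iterable, not just a Countdown.
    static String join(Iterable<?> data) {
        StringBuilder result = new StringBuilder();
        for (Iterator<?> i = data.iterator(); i.hasNext();)
            result.append(i.next()).append(' ');
        return result.toString().trim();
    }
}
```

Here join() plays the role of print(): passing it a Countdown or a java.util.List requires no change to the method itself.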

The Archive

Figure 2 shows the static model for the Archive class implemented in Listing 2. As you can see from the figure, Archive follows the Abstract Factory pattern with one exception: there is no player in the actual Abstract Factory role -- there's only a Concrete Factory. The main point, though, is that when you ask an Archive for an OutputStream (by calling output_stream_for (Listing 2, line 124)), the method returns an instance of a private inner class Archive_OutputStream (Listing 2, line 370) that implements a public interface (java.io.OutputStream). The same reasoning applies to the InputStream derivative returned from input_stream_for(...) (Listing 2, line 195). (Yeah, I know that InputStream and OutputStream are abstract classes, not Java interfaces, but they're both interfaces in the design sense of the word regardless of the implementation details.)

You can find lots of similar examples of this design pattern in Java. For example, a URL object returns a generic implementation of the URLConnection interface in response to an openConnection() request. You, the user of the URL object, know nothing about the class that extends URLConnection. You must access it through the effective interface. (Again, URLConnection is a class that's being used here as an interface in the design sense.)

Figure 2. The Archive-class static model

The Archive_InputStream class

Looking at the implementations of the concrete products, Archive_InputStream (Listing 2, line 330) adds only a small amount of functionality to the default behavior defined in InputStream. For the most part, the methods just chain to the methods of the wrapped InputStream object. The one exception is the close() method (Listing 2, line 364), which notifies the creating Archive object when the stream is closed by passing it a read_accomplished() message.

The Archive_InputStream demonstrates another design pattern: Decorator. A BufferedInputStream serves as an example of a decorator built into Java -- it wraps an InputStream, implementing the same interface as InputStream but slightly modifying the behaviors of a few of the InputStream methods (to add buffering). Decorators are effective ways to modify behavior by using interface, rather than implementation, inheritance.
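
A decorator that does nothing but report its own close() can be sketched in a few lines. This is not the Archive_InputStream code from Listing 2: the class name NotifyingInputStream and the Runnable callback are hypothetical stand-ins for the read_accomplished() message, and java.io.FilterInputStream is used to supply the chain-to-the-wrapped-stream methods.

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

class NotifyingInputStream extends FilterInputStream {
    private final Runnable onClose;   // hypothetical callback to the creator
    private boolean closed = false;

    NotifyingInputStream(InputStream wrapped, Runnable onClose) {
        super(wrapped);               // read(), skip(), etc. chain to 'wrapped'
        this.onClose = onClose;
    }

    // The one method whose behavior changes: close the wrapped stream,
    // then tell the creator that the read is finished.
    @Override public void close() throws IOException {
        if (closed) return;           // notify only once
        closed = true;
        try     { super.close(); }
        finally { onClose.run(); }
    }
}
```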

The Archive_OutputStream class

The Archive_OutputStream (Listing 2, line 370) class isn't a simple decorator that wraps an object and provides minor modifications to behavior, though. It's a full-blown class that implements all the relevant methods of the OutputStream base class in significant ways. Its close() override (Listing 2, line 406) does notify the creating Archive when it's closed (in the same way that Archive_InputStream does), but the Archive_OutputStream does a lot more additional work. In particular, most of the messy mechanics of talking to a .zip archive are buried in the methods of Archive_OutputStream.

The Archive_OutputStream(...) constructor (Listing 2, line 378) is passed a ZipEntry that represents the file to which it's writing. It puts that ZipEntry into the current temporary-file archive's stream (destination). Then, in the case of a DEFLATED (compressed) file, it just sets up to use the destination stream as the data sink. If the file isn't compressed, then a checksum must be computed, so it uses a DelayedOutputStream wrapped by a FastBufferedOutputStream (both discussed in Part 1 of this series) for the output stream. Characters written to this stream are buffered internally until 2K characters are written to the stream, at which time a temporary file is created and the characters are staged to the temporary file. In this way, you don't incur the overhead of creating an on-disk temporary file unless the file size is large enough to justify doing so.

The write(...) override (Listing 2, line 397) writes the characters to whichever data sink is in use (the actual zip file or the buffer), and it updates the CRC value as it does so.

Most of the real work goes on in the close() override (Listing 2, line 406). In the case of a DEFLATED file, the CRC isn't needed and the characters have already been written to the actual archive file, so you'll skip the big else clause, close the current zip-file entry, and notify the creating Archive object that the write operation is complete (by calling write_accomplished()).

In the case of an uncompressed file, close() updates the ZipEntry object's CRC and size fields; then, if the data hasn't been flushed to a temporary file yet, the buffer is extracted from the FastBufferedOutputStream and written to the real file. Otherwise, the temporary file is opened and its contents are transferred to the actual output file.
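
The buffer-then-spill behavior described above can be illustrated with a single class. This is only a sketch of the general technique, not the DelayedOutputStream or FastBufferedOutputStream from Part 1: the name SpillingOutputStream, the copyTo() method, and the choice to spill before the threshold is exceeded are all assumptions made for the example.

```java
import java.io.*;

class SpillingOutputStream extends OutputStream {
    private final int threshold;
    private ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private File tempFile;            // null until the threshold is crossed
    private OutputStream disk;        // stream onto tempFile, once it exists

    SpillingOutputStream(int threshold) { this.threshold = threshold; }

    @Override public void write(int b) throws IOException {
        write(new byte[] { (byte) b }, 0, 1);
    }

    @Override public void write(byte[] b, int off, int len) throws IOException {
        if (disk == null && memory.size() + len > threshold) {
            // Too big for memory: create the temporary file and stage
            // everything buffered so far onto it.
            tempFile = File.createTempFile("spill", ".tmp");
            disk = new BufferedOutputStream(new FileOutputStream(tempFile));
            memory.writeTo(disk);
            memory = null;
        }
        if (disk != null) disk.write(b, off, len);
        else              memory.write(b, off, len);
    }

    boolean spilled() { return tempFile != null; }

    @Override public void close() throws IOException {
        if (disk != null) disk.close();
    }

    /** Call after close(): copies the staged bytes to 'out', then deletes any temp file. */
    void copyTo(OutputStream out) throws IOException {
        if (tempFile == null) {       // everything is still in memory
            memory.writeTo(out);
            return;
        }
        try (InputStream in = new FileInputStream(tempFile)) {
            byte[] buffer = new byte[1024];
            for (int got; (got = in.read(buffer)) > 0;)
                out.write(buffer, 0, got);
        } finally {
            tempFile.delete();
        }
    }
}
```

With a threshold of 2048, a 100-byte file never touches the disk, while a 5,000-byte file is staged to a temporary file and read back when copyTo() is called.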

The Archive class

That's all the really hard work. Meanwhile, the Archive class does two additional things: it manages the list of ZipEntry objects that represent the directory of the archive that we're modifying, and it handles the locking required to stop two threads from simultaneously accessing the same Archive object.

The zip entries are loaded up at the top of the Archive(String,boolean) constructor (Listing 2, line 51). After the load completes, the constructor creates a temporary file to hold the modified archive and wraps that temporary file in a ZipOutputStream (which in turn wraps a FastBufferedOutputStream). I use FastBufferedOutputStream rather than BufferedOutputStream to avoid the unnecessary synchronization overhead imposed by the fact that BufferedOutputStream's write() method is (inappropriately) declared as synchronized.

The remove(...) (Listing 2, line 102) method is trivial -- it just removes the ZipEntry corresponding to the desired file from the list of source files, thereby preventing that file from being copied to the destination archive when the Archive object is closed.

The output_stream_for method (Listing 2, line 124) is less trivial. Rather than being synchronized, this method acquires a roll-your-own Mutex object (declared on line 30 and acquired on line 129; see Resources). I've done this rather than simply synchronizing output_stream_for() because I want the Archive object to be locked from the point at which the output stream is created until that output stream has closed. The output_stream_for() method will have returned long before the caller finishes with the stream. A roll-your-own Mutex lets me acquire the lock in the current method and release it from another method entirely: write_accomplished() (Listing 2, line 191), which is called from the returned Archive_OutputStream's close() method (Listing 2, line 406).

If you're unclear about how a Mutex works, go back and read "Programming Java Threads in the Real World, Part 4" (or get a copy of Taming Java Threads).
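
The acquire-here, release-there arrangement can be demonstrated without the Mutex class. In the sketch below, java.util.concurrent.Semaphore (which postdates this article) is swapped in for com.holub.asynch.Mutex, and the names LockedResource, outputStream(), writeAccomplished(), and isAvailable() are hypothetical; only the shape of the locking mirrors what Archive does.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.Semaphore;

class LockedResource {
    // One permit: stands in for the article's Mutex. Unlike a synchronized
    // block, it can be released from a different method than the one that
    // acquired it.
    private final Semaphore lock = new Semaphore(1);
    private final ByteArrayOutputStream sink = new ByteArrayOutputStream();

    /** Locks the object; the lock is held until the returned stream is closed. */
    OutputStream outputStream() throws InterruptedException {
        lock.acquire();
        return new FilterOutputStream(sink) {
            private boolean closed = false;
            @Override public void close() throws IOException {
                if (closed) return;       // release only once
                closed = true;
                flush();
                writeAccomplished();
            }
        };
    }

    // Called from the stream's close(), long after outputStream() returned.
    private void writeAccomplished() { lock.release(); }

    /** True if no stream is currently open (for demonstration only). */
    boolean isAvailable() {
        boolean free = lock.tryAcquire();
        if (free) lock.release();
        return free;
    }
}
```

While a stream returned by outputStream() is open, a second call to outputStream() from any thread blocks in acquire(); it proceeds only after the first stream's close() has run.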

After acquiring the lock Mutex, the method then pulls the ZipEntry for the desired file out of the entries list (or manufactures a ZipEntry from scratch if the file is new). It then creates an Archive_OutputStream object that will handle output for that specific file. If this is an append -- rather than an overwrite -- request, output_stream_for() copies the old file contents to the output stream before returning the stream.

The input_stream_for(...) method (Listing 2, line 195) works in a similar way in that it acquires the lock Mutex, which read_accomplished() (Listing 2, line 225) releases when the returned stream is closed.

The final method of interest is close() (Listing 2, line 229), which copies to the destination archive all files that haven't been removed from the entries list due to a previous remove() or output_stream_for() call. It then closes everything and gives the destination file the same name as the original archive. The revert() method (Listing 2, line 301) is just like close(), except that it destroys the destination archive rather than overwriting it.

The final 250 lines or so of the file are just a giant unit test that guarantees that everything works as expected. You'll find it all in Listing 2 below.

Listing 2: /src/com/holub/tools/Archive.java
   1: package com.holub.tools;
   2: 
   3: import java.io.*;
   4: import java.util.*;
   5: import java.util.zip.*;
   6: import com.holub.asynch.Mutex;
   7: import com.holub.io.FastBufferedOutputStream;
   8: import com.holub.io.DelayedOutputStream;
   9: 
  10: import com.holub.io.Std;        // for testing
  11: import com.holub.tools.Tester;  // for testing
  12: // import com.holub.tools.debug.D;  // for testing
  13: import com.holub.tools.D;           // for testing
  14: 
         
/**********************************

A class that simplifies reading from, writing to, and modifying jar/zip files. Sun's support for JAR files is dicey. It's not difficult to read them, but writing or updating is nigh on impossible. The only way to update a jar, for example, is to copy an existing jar into a new one, writing the changes as you copy, and then rename the new file so that you overwrite the old one. If you have many updates to perform, then this is a time-consuming process, to say the least. The Archive works by creating a second jar file that holds the modified files; when you close the archive, the close method overwrites the original jar with the copy. Note that the internal_path passed to the various methods of this class must be a fully formed path name (no "." or ".." components) that uses forward slashes as a path separator. This class is thread safe, but access to the Archive is serialized. Only one thread at a time can access the archive. You modify the Archive by requesting an output or input stream for a specific internal file. The returned InputStream or OutputStream must be closed before anyone else is granted access for read or write. In a multi-threaded scenario, the requesting threads will block until the Archive becomes available.

(c) 2000, Allen I. Holub.
You may not distribute this code except in binary form, 
incorporated into a Java .class file. You may use this code 
freely for personal purposes, but you 
may not incorporate it into any product (commercial, shareware, 
or free) without the express written permission 
of Allen I. Holub.
@author Allen I. Holub
*/
  15: public class Archive
  16: {
  17:   private File                source_file;
  18:   private DelayedOutputStream destination_stream;
  19: 
  20:   private ZipFile         source;
  21:   private ZipOutputStream destination; // Temporary file that holds
  22:                                          // modified archive. Overwrites
  23:                                          // source file on close.
  24: 
  25:   private int compression = ZipEntry.DEFLATED ;
  26: 
  27:   private boolean closed  = false;
  28:   private boolean archive_has_been_modified = false;
  29: 
  30:   private Mutex   lock    = new Mutex();   // Locks Archive while read
  31:                                              // or write is in progress.
  32:   private Map     entries = new HashMap(); // Zip entries in the
  33:                                              // source archive, indexed
  34:                                              // by name.
  35: 
  36:   private static final boolean running_under_windows =
  37:                     System.getProperty("os.name").startsWith("Windows");
  38: 
         
/** Alias for true, useful as self-documenting second argument to 
Archive(String,boolean)
*/
  39:   public static final boolean COMPRESSED   = true;
  40: 
         
/** Alias for false, useful as self-documenting second argument to 
Archive(String,boolean)
*/
  41:   public static final boolean UNCOMPRESSED = false;
  42: 
         
/** Alias for true, useful as self-documenting second argument to 
output_stream_for.
*/
  43:   public static final boolean APPEND = true;
  44: 
         
/** Alias for false, useful as self-documenting second argument to 
output_stream_for.
*/
  45:   public static final boolean OVERWRITE = false;
  46: 
         
/*****************************************************************
Thrown by all methods of this class (except the constructor) if you try to access a closed archive.
*/
  47:   public static class Closed extends RuntimeException
  48:     {   Closed(){ super("Tried to modify a closed Archive"); }
  49:     }
  50: 
         
/********************************
Create a new Archive object that represents the .zip or .jar file at the indicated path.
@param jar_file_path The path in the file system to the .jar or .zip file.
@param compress If true, new files written to the archive (as compared to modifications of existing files) are compressed, otherwise the data in the file is simply stored. The maximum possible compression level is used.
*/
  51:   public Archive( String jar_file_path, boolean compress )
  52:                                                 throws IOException
  53:     {   source_file = new File( jar_file_path );
  54:         try
  55:         {   source = new ZipFile( jar_file_path );
  56: 
  57:             // Transfer all the zip entries into local memory to make
  58:             // them easier to access and manipulate.
  59:             for(Enumeration e = source.entries(); e.hasMoreElements();)
  60:             {   
  61:                 ZipEntry current = (ZipEntry) e.nextElement();
  62:                 entries.put( current.getName(), current );
  63:             }
  64:         }
  65:         catch( Exception e )    // Assume file doesn't exist
  66:         {   source = null;      // Since the "entries" list will be
  67:         }                       // empty, "source" won't be used
  68: 
  69:         // The following constructor causes a temporary file to be created
  70:         // when the first write occurs on the output stream. If no
  71:         // writes happen, the file isn't created.
  72: 
  73:         destination_stream = new DelayedOutputStream(
  74:                                 source_file.getName(), ".tmp");
  75: 
  76:         destination        = new ZipOutputStream(
  77:                                 new FastBufferedOutputStream(
  78:                                                 destination_stream));
  79: 
  80:         destination.setLevel(9);    // compression level == max
  81: 
  82:         this.compression=compress ? ZipEntry.DEFLATED : ZipEntry.STORED;
  83:         destination.setMethod( compress ? ZipOutputStream.DEFLATED
  84:                                         : ZipOutputStream.STORED   );
  85:                                              
  86:     }
  87: 
         
/** Convenience method; creates a compressed archive
*/
  88:   public Archive( String jar_file_path ) throws IOException
  89:     {   this( jar_file_path, true );
  90:     }
  91: 
         
/********************************
Clean up from a close, closing all handles and freeing memory.
*/
  92:   private final void deconstruct() throws IOException
  93:     {
  94:                                      // The archive is now unusable,
  95:         entries             = null; // so free up any internal
  96:         source              = null; // memory in case it's needed
  97:         destination         = null; // elsewhere and the Archive
  98:         source_file         = null; // itself isn't freed for some
  99:         destination_stream  = null; // reason.
 100:     }
 101: 
         
/********************************
Remove a file from the archive.
*/
 102:   public void remove( String internal_path )
 103:                                 throws IOException, InterruptedException
 104:     {
 105:         lock.acquire();
 106:         try
 107:         {   if( closed )
 108:                 throw new Closed();
 109: 
 110:             // When the archive is closed, all files in the "entries"
 111:             // list are copied from the original jar to the new
 112:             // one. By removing the file from the list, you will
 113:             // prevent that file from being copied to the new
 114:             // archive.
 115: 
 116:             archive_has_been_modified = true;
 117:             entries.remove( internal_path );
 118:         }
 119:         finally
 120:         {   lock.release();
 121:         }
 122:     }
 123: 
         
/********************************
Return an OutputStream that you can use to write to the indicated file. The current Archive object is locked until the returned stream is closed. Any existing file that is overwritten will no longer be accessible for reading via the current 'Archive' object. It's an error to call this method more than once with the same internal_path argument. Note that the returned output stream is not thread safe. Two threads cannot write to the same output stream simultaneously without some sort of explicit synchronization. I've done it this way because output streams are typically not shared between threads, and the overhead of synchronization would be nontrivial. The file is created within the archive if it doesn't already exist.
@param internal_path The path (within the archive) to the desired file.
@param appending true if you want the bytes written to the returned OutputStream to be appended to the original contents.
@return A stream to receive the new data.
*/
 124:   public OutputStream output_stream_for( String internal_path,
 125:                                                    boolean appending )
 126:                                                     throws IOException
 127:     {   try
 128:         {
 129:             lock.acquire(); // Lock the archive. The lock is released
 130:                             // by write_accomplished when the stream
 131:                             // is closed.
 132:             if( closed )
 133:             {   lock.release();
 134:                 throw new Closed();
 135:             }
 136: 
 137:             archive_has_been_modified = true;
 138: 
 139:             // See if it's an existing file, and if so, remove
 140:             // it from the list of files that were in the original
 141:             // archive. Otherwise the new contents will be
 142:             // overwritten by the original when the Archive is closed.
 143: 
 144:             ZipEntry found = (ZipEntry)( entries.remove(internal_path));
 145: 
 146:             ZipEntry entry = new ZipEntry(internal_path);
 147: 
 148:             entry.setMethod( found != null 
 149:                                 ? found.getMethod() : compression ); 
 150: 
 151:             entry.setComment( entry.getMethod()==ZipEntry.DEFLATED
 152:                                 ? "compressed" : "uncompressed");
 153: 
 154:             OutputStream out = new Archive_OutputStream( entry );
 155: 
 156:             if( found != null && appending )
 157:                 copy_entry_to_stream( entry, out );
 158: 
 159:             return out;
 160:         }
 161:         catch(IOException e) // release on an exception toss, but
 162:         {   lock.release();  // not a finally.
 163:             throw e;
 164:         }
 165:         catch(InterruptedException e)
 166:         {   // fall through to null return.
 167:         }
 168:         return null;
 169:     }
 170: 
         
/** Convenience method. Original file is overwritten if it's there.
*/
 171:   public OutputStream output_stream_for( String internal_path )
 172:                                             throws IOException
 173:     {   return output_stream_for( internal_path, false );
 174:     }
 175: 
         
/********************************
Copies the contents of a file in the source archive to the indicated destination stream. This method is private because it doesn't do any locking. The output stream is not closed by this method.
*/
 176:   private void copy_entry_to_stream(ZipEntry entry, OutputStream out)
 177:                                                     throws IOException
 178:     {   InputStream in = source.getInputStream(entry);
 179:         try
 180:         {   byte[] buffer = new byte[1024];
 181: 
 182:             for(int got=0; (got = in.read(buffer,0,buffer.length)) >0 ;)
 183:             {   out.write(buffer, 0, got);
 184:             }
 185:         }
 186:         finally
 187:         {   in.close();
 188:         }
 189:     }
 190: 
         
/********************************
Called from the Archive_OutputStream's close() method once the entry has been written. All this method does is release the lock. In the case of a "stored" write, close() itself has already transferred the buffered bytes to the archive and set the necessary checksum and size on the entry.
*/
 191:   private void write_accomplished()
 192:     {   lock.release();
 193:     }
 194: 
         
/********************************
Return an InputStream that you can use to read from the indicated file. The current Archive object is locked until the returned stream is closed. Once a particular archived file is overwritten (by a call to output_stream_for), that file is no longer available for reading, and an attempt to call input_stream_for() on that file will fail. @return a reference to an InputStream, or null if the thread is interrupted while waiting for the lock. @throws ZipException if the requested file doesn't exist. @throws IOException if an I/O error occurs when the method tries to open the stream.
*/
 195:   public InputStream input_stream_for( String internal_path )
 196:                                         throws ZipException, IOException
 197:     {   
 198:         Assert.is_true( source != null, "source is null" );
 199: 
 200:         try
 201:         {   lock.acquire(); // Lock the archive. The lock is released
 202:                             // when the returned InputStream is closed.
 203:             if( closed )
 204:             {   lock.release();
 205:                 throw new Closed();
 206:             }
 207: 
 208:             ZipEntry current = (ZipEntry)entries.get( internal_path );
 209:             if( current == null )
 210:                 throw new ZipException(internal_path +" doesn't exist");
 211: 
 212:             InputStream in  = source.getInputStream(current);
 213:             return new Archive_InputStream( in );
 214:         }
 215:         catch( IOException e )  // ZipException extends IOException
 216:         {   lock.release();
 217:             throw e;
 218:         }
 219:         catch(InterruptedException e)
 220:         {   // fall through to null return.
 221:         }
 222:         return null;
 223:     }
 224: 
         
/********************************
Called from the Archive_InputStream's close() method.
*/
 225:   private void read_accomplished()
 226:     {   lock.release();
 227:     }
 228: 
         
/********************************
Close the current Archive object (rendering it unusable) and overwrite the original archive with the new contents. The original archive is not actually modified until close() is called. A call to this method blocks until any ongoing read or write operations complete (and the associated stream is closed). @throws ZipException Zip files must have at least one entry in them. A ZipException is thrown if the destination file is empty, either because you've removed everything or because you never put anything into it. The original archive will not have been changed if this exception is thrown.
*/
 229:   public void close()
 230:                 throws IOException, InterruptedException, ZipException
 231:     {   
 232:         // The main thing that close() does is copy any files that
 233:         // remain in the "entries" list from the original archive
 234:         // to the new one. The original compression mode of the
 235:         // file is preserved. Finally, the new archive is renamed
 236:         // to the original archive's name (thereby blasting the
 237:         // original out of existence.)
 238: 
 239:         lock.acquire();
 240:         try
 241:         {   if( !closed ) // Closing a closed archive is harmless
 242:             {
 243:                 if( source != null ) // there is a source archive
 244:                 {
 245:                     if( archive_has_been_modified )
 246:                       copy_remaining_files_from_source_to_destination();
 247:                     source.close();
 248:                 }
 249: 
 250:                 if( archive_has_been_modified )
 251:                 {   
 252:                     destination.close();
 253:                     if( !destination_stream.rename_temporary_to(
 254:                                                         source_file ) )
 255:                     {   D.ebug("***\t\tTemporary file not renamed!");
 256:                     }
 257:                 }
 258:                 else
 259:                 {   destination_stream.close();
 260:                     destination_stream.delete_temporary();
 261:                 }
 262: 
 263:                 closed = true;
 264:                 deconstruct();
 265:             }
 266:         }
 267:         catch( ZipException e )
 268:         {   // Thrown if the destination archive is empty.
 269:             // Clean up the temporary file, then rethrow
 270:             // the exception.
 271: 
 272:             destination_stream.close();
 273:             destination_stream.delete_temporary();
 274:             throw e;
 275:         }
 276:         finally
 277:         {   lock.release();
 278:         }
 279:     }
 280: 
         
/********************************
Copies any files from the source archive that have not been modified to the destination archive (and removes them from the entries list as they are copied).
*/
 281:   private void copy_remaining_files_from_source_to_destination()
 282:                                                     throws IOException
 283:     {   for(Iterator i = entries.values().iterator(); i.hasNext() ;)
 284:         {   ZipEntry current = (ZipEntry)i.next();
 285:             i.remove();
 286: 
 287:             ZipEntry entry = new ZipEntry(current.getName());
 288: 
 289:             entry.setMethod ( current.getMethod()   );
 290:             entry.setSize   ( current.getSize()     );
 291:             entry.setCrc    ( current.getCrc()      );
 292:             entry.setComment( current.getComment()  );
 293: 
 294:             D.ebug( "\t\tTransferring "+current.getName()+" to output");
 295: 
 296:             destination.putNextEntry( entry );
 297:             copy_entry_to_stream( current, destination );
 298:         }
 299:     }
 300: 
         
/********************************
Close the archive, abandoning any changes that you've made. It's important to call this method (as compared to simply abandoning the Archive object) if you want to discard changes; otherwise, temporary files will be left on your disk. Reverting a closed archive is considered harmless, so is not flagged as an error.
*/
 301:   public void revert() throws IOException, InterruptedException
 302:     {   
 303:         // All that this method does is close, and then destroy, the
 304:         // temporary file that contains the partially assembled
 305:         // new archive. It also puts the Archive object into the
 306:         // "closed" state so that no further modifications are
 307:         // possible.
 308: 
 309:         lock.acquire();
 310:         try
 311:         {   if( closed )
 312:                 return;
 313: 
 314:             if( source != null ) source.close();
 315: 
 316:             if( archive_has_been_modified )
 317:                 destination.close();
 318:             else
 319:                 destination_stream.close();
 320: 
 321:             destination_stream.delete_temporary();
 322:             closed = true;
 323:             deconstruct();
 324:         }
 325:         finally
 326:         {   lock.release();
 327:         }
 328:     }
 329: 
         
/********************************
A Gang of Four Decorator that wraps an InputStream in such a way that the Archive object that created it is notified when the stream is closed.
*/
 330:   private class Archive_InputStream extends InputStream
 331:   {   private final InputStream wrapped;
 332: 
 333:       public Archive_InputStream( InputStream wrapped )
 334:         {   this.wrapped = wrapped;
 335:         }
 336: 
 337:       public int available()   throws IOException 
 338:         {   return wrapped.available();
 339:         }
 340: 
 341:       public void reset()  throws IOException
 342:         {   wrapped.reset();
 343:         }
 344: 
 345:       public long skip(long n)throws IOException
 346:         {   return wrapped.skip(n);
 347:         }
 348: 
 349:       public int read() throws IOException
 350:         {   return wrapped.read();
 351:         }
 352: 
 353:       public int read(byte[] b)throws IOException
 354:         {   return wrapped.read(b);
 355:         }
 356: 
 357:       public void mark( int limit ) {       wrapped.mark(limit);    }
 358:       public boolean markSupported(){return wrapped.markSupported();}
 359: 
 360:       public int read(byte[] b,int o,int l) throws IOException
 361:         {   return wrapped.read(b,o,l);
 362:         }
 363: 
 364:       public void close() throws IOException
 365:         {   wrapped.close();
 366:             read_accomplished();
 367:         }
 368:     }
 369: 
         
/********************************
The Archive_OutputStream is a real class, not a Decorator. The main problem is that storing a file mandates computing a checksum before writing the bytes. Though it's tempting to just save the file to a ByteArrayOutputStream and then get the bytes, the in-memory footprint can be too large. The current implementation transfers the bytes to a temporary file, then transfers them from a temporary file to the archive. The bytes are buffered internally, so for small files (under 2K in size), the temporary file is actually never created.
*/
 370:   private class Archive_OutputStream extends OutputStream
 371:     {
 372:       private final   ZipEntry     entry;
 373:       private final   CRC32        crc  = new CRC32();
 374: 
 375:       private         OutputStream        out;
 376:       private         DelayedOutputStream stream;
 377: 
 378:       public Archive_OutputStream( ZipEntry entry ) throws IOException
 379:         {   this.entry  = entry;
 380: 
 381:             destination.setMethod( entry.getMethod()  );
 382: 
 383:             if( entry.getMethod() == ZipEntry.DEFLATED )
 384:             {   destination.putNextEntry( entry );
 385:                 out = destination;
 386:             }
 387:             else
 388:             {   stream = new DelayedOutputStream("Archive", ".tmp");
 389:                 out    = new FastBufferedOutputStream( stream );
 390:             }
 391: 
 392:             D.ebug("\t\tOpened " + entry.getComment() + " stream" );
 393:         }
 394: 
 395:       public void flush(){ /* meaningless in context */ }
 396: 
 397:       public void write(int the_byte) throws IOException
 398:         {   
 399:             // The other variants of write are inherited from
 400:             // OutputStream, and will call the current version.
 401: 
 402:             crc.update( the_byte );
 403:             out.write ( the_byte );
 404:         }
 405: 
 406:       public void close() throws IOException
 407:         {   
 408:             if( entry.getMethod() == ZipEntry.DEFLATED )
 409:             {
 410:                 D.ebug("\t\tClosing compressed stream. crc="
 411:                                     + entry.getCrc()
 412:                                     + " size="
 413:                                     + entry.getSize() );
 414:             }
 415:             else
 416:             {
 417:                 FastBufferedOutputStream buffer_stream =
 418:                                         (FastBufferedOutputStream)out;
 419: 
 420:                 entry.setCrc( crc.getValue() );
 421:                 entry.setSize( buffer_stream.bytes_written() );
 422: 
 423:                 destination.putNextEntry( entry );
 424: 
 425:                 D.ebug("\t\tClosing stored stream. crc="
 426:                                         + entry.getCrc()
 427:                                         + " size="
 428:                                         + entry.getSize() );
 429: 
 430: 
 431:                 // Transfer data from the buffer to the zip file
 432: 
 433:                 if( buffer_stream.export_buffer_and_close(destination) )
 434:                     D.ebug("\t\t\tGot data from internal buffer");
 435:                 else
 436:                 {   
 437:                     D.ebug("\t\tCopying from temporary file");
 438: 
 439:                     // If we get here, then the data couldn't be
 440:                     // transferred from the internal buffer to the
 441:                     // destination archive because the file was
 442:                     // large enough that the whole file wasn't
 443:                     // contained in the in-memory buffer.
 444: 
 445:                     InputStream in
 446:                         = new FileInputStream(stream.temporary_file());
 447: 
 448:                     byte[]      buffer = new byte[1024];
 449:                     int         got    = 0;
 450:                     while( (got = in.read(buffer)) > 0 )
 451:                         destination.write( buffer, 0, got );
 452:                     in.close();
 453:                     stream.delete_temporary();
 454:                 }
 455:             }
 456: 
 457:             destination.closeEntry();
 458:             write_accomplished();     
 459:         }
 460:     }
 461: 
         
/********************************
A Unit-test class. Run the unit test with
java com.holub.tools.Archive\$Test
Omit the backslash if you're running a Windows shell.
Include an (arbitrary) command-line argument if you want
verbose output, otherwise output is generated (on standard
output) only if a test fails. The exit status is the number
of failed tests -- 0 if none.
*/
 462:   static public class Test
 463:   {   private static final String STORED_ZIP      ="A.stored.zip"    ;
 464:       private static final String COMPRESSED_ZIP  ="A.compressed.zip";
 465:       private static final String FILE_1          ="root.txt"        ;
 466:       private static final String FILE_2          ="/subdir/file.txt";
 467: 
 468:       private static final boolean overwrite = false;
 469:       private static final boolean append    = true;
 470: 
 471:       static public void main(String[] args)
 472:         {
 473:             Tester  t = new Tester(args.length > 0, Std.out());
 474:             try
 475:             {
 476:                 // Remove previous test files to make sure that
 477:                 // everything works as expected.
 478: 
 479:                 new File( STORED_ZIP     ).delete();
 480:                 new File( COMPRESSED_ZIP ).delete();
 481: 
 482:                 // Test to see if we can do simple reads and writes.
 483: 
 484:                 Archive stored  = new Archive( 
 485:                                 STORED_ZIP, Archive.UNCOMPRESSED  );
 486: 
 487:                 Archive compressed  = new Archive(
 488:                                 COMPRESSED_ZIP, Archive.COMPRESSED  );
 489: 
 490:                 OutputStream out = 
 491:                         stored.output_stream_for( FILE_1, overwrite);
 492:                 out.write('a');
 493:                 out.write('b');
 494:                 out.close();
 495: 
 496:                 out = stored.output_stream_for( FILE_2, overwrite);
 497:                 out.write('c');
 498:                 out.close();
 499: 
 500: 
 501:                 out = compressed.output_stream_for( FILE_1, overwrite);
 502:                 out.write('d');
 503:                 out.close();
 504: 
 505:                 out = compressed.output_stream_for( FILE_2, overwrite);
 506:                 out.write('e');
 507:                 out.close();
 508: 
 509:                 stored.close();
 510:                 compressed.close();
 511:                 stored      = new Archive(
 512:                                     STORED_ZIP, Archive.UNCOMPRESSED);
 513:                 compressed  = new Archive(
 514:                                     COMPRESSED_ZIP,Archive.COMPRESSED );
 515: 
 516:                 InputStream in = stored.input_stream_for( FILE_1 );
 517:                 t.check( "Archive.1.0", 'a', in.read() );
 518:                 t.check( "Archive.1.1", 'b', in.read() );
 519:                 t.check( "Archive.1.2", -1,  in.read() );
 520:                 in.close();
 521: 
 522:                 in = stored.input_stream_for( FILE_2 );
 523:                 t.check( "Archive.2.0", 'c', in.read() );
 524:                 t.check( "Archive.2.1", -1,  in.read() );
 525:                 in.close();
 526: 
 527:                 in = compressed.input_stream_for( FILE_1 );
 528:                 t.check( "Archive.3.0", 'd', in.read() );
 529:                 t.check( "Archive.3.1", -1,  in.read() );
 530:                 in.close();
 531: 
 532:                 in = compressed.input_stream_for( FILE_2 );
 533:                 t.check( "Archive.4.0", 'e', in.read() );
 534:                 t.check( "Archive.4.1", -1,  in.read() );
 535:                 in.close();
 536: 
 537:                 stored.close();
 538:                 compressed.close();
 539: 
 540:                 // Test to see if we can do append to existing files
 541:                 
 542:                 stored      = new Archive(
 543:                                     STORED_ZIP, Archive.UNCOMPRESSED);
 544:                 compressed  = new Archive(
 545:                                     COMPRESSED_ZIP, Archive.COMPRESSED);
 546: 
 547:                 out = stored.output_stream_for( FILE_1, append);
 548:                 out.write('B');
 549:                 out.close();
 550: 
 551:                 out = stored.output_stream_for( FILE_2, append);
 552:                 out.write('C');
 553:                 out.close();
 554: 
 555: 
 556:                 out = compressed.output_stream_for( FILE_1, append);
 557:                 out.write('D');
 558:                 out.close();
 559: 
 560:                 out = compressed.output_stream_for( FILE_2, append);
 561:                 out.write('E');
 562:                 out.close();
 563: 
 564:                 stored.close();
 565:                 compressed.close();
 566: 
 567:                 stored      = new Archive(
 568:                                     STORED_ZIP, Archive.UNCOMPRESSED);
 569:                 compressed  = new Archive(
 570:                                     COMPRESSED_ZIP, Archive.COMPRESSED);
 571: 
 572:                 in = stored.input_stream_for( FILE_1 );
 573:                 t.check( "Archive.5.0", 'a', in.read() );
 574:                 t.check( "Archive.5.1", 'b', in.read() );
 575:                 t.check( "Archive.5.2", 'B', in.read() );
 576:                 t.check( "Archive.5.3", -1,  in.read() );
 577:                 in.close();
 578: 
 579:                 in = stored.input_stream_for( FILE_2 );
 580:                 t.check( "Archive.6.0", 'c', in.read() );
 581:                 t.check( "Archive.6.1", 'C', in.read() );
 582:                 t.check( "Archive.6.2", -1,  in.read() );
 583:                 in.close();
 584: 
 585:                 in = compressed.input_stream_for( FILE_1 );
 586:                 t.check( "Archive.7.0", 'd', in.read() );
 587:                 t.check( "Archive.7.1", 'D', in.read() );
 588:                 t.check( "Archive.7.2", -1,  in.read() );
 589:                 in.close();
 590: 
 591:                 in = compressed.input_stream_for( FILE_2 );
 592:                 t.check( "Archive.8.0", 'e', in.read() );
 593:                 t.check( "Archive.8.1", 'E', in.read() );
 594:                 t.check( "Archive.8.2", -1,  in.read() );
 595:                 in.close();
 596: 
 597:                 stored.close();
 598:                 compressed.close();
 599: 
 600:                 // Test to see if we can modify an existing file. Also,
 601:                 // use a large file this time to see if it's really
 602:                 // compressing correctly.
 603: 
 604:                 t.println("Checking large read/write "
 605:                                         +"[takes a few seconds]");
 606: 
 607:                 stored      = new Archive(
 608:                                     STORED_ZIP, Archive.UNCOMPRESSED  );
 609:                 compressed  = new Archive(
 610:                                     COMPRESSED_ZIP);
 611: 
 612:                 OutputStream out2;
 613: 
 614:                 out  = stored.output_stream_for( FILE_1 );
 615:                 out2 = compressed.output_stream_for( FILE_1 );
 616: 
 617:                 in = new FileInputStream("Archive.java");
 618:                 byte[] buffer = new byte[1024];
 619:                 for( int got=0; (got = in.read(buffer)) > 0 ;)
 620:                 {   out.write(buffer,0,got);
 621:                     out2.write(buffer,0,got);
 622:                 }
 623:                 out.close();
 624:                 out2.close();
 625: 
 626:                 stored.close();
 627:                 compressed.close();
 628: 
 629:                 stored      = new Archive(
 630:                                     STORED_ZIP, Archive.UNCOMPRESSED );
 631:                 compressed  = new Archive(
 632:                                     COMPRESSED_ZIP );
 633: 
 634:                 in              = stored.input_stream_for( FILE_1 );
 635:                 InputStream in2 = compressed.input_stream_for( FILE_1 );
 636:                 InputStream sample= new FileInputStream("Archive.java");
 637:                 int         c;
 638: 
 639:                 t.verbose(Tester.OFF);
 640:                 while( (c = sample.read()) >= 0 )
 641:                 {   t.check( "Archive.9.0", c, in.read()  );
 642:                     t.check( "Archive.9.0", c, in2.read() );
 643:                 }
 644:                 in.close();
 645:                 in2.close();
 646:                 sample.close();
 647:                 t.verbose(Tester.RESTORE);
 648:                 t.check( "Archive.9.1", t.errors_were_found()==false,
 649:                                                     "Overwrite-test" );
 650: 
 651:                 stored.close();
 652:                 compressed.close();
 653: 
 654:                 // Test file removal. First remove one file
 655: 
 656:                 stored = new Archive( STORED_ZIP );
 657:                 stored.remove( FILE_1 );
 658:                 stored.close();
 659: 
 660:                 stored = new Archive( STORED_ZIP );
 661:                 try
 662:                 {
 663:                     stored.input_stream_for( FILE_1 ); // should fail
 664:                     t.check( "Archive.10.0.a", false,
 665:                                             "Removal (failure a)" );
 666:                 }
 667:                 catch( ZipException e )
 668:                 {   if( e.getMessage().indexOf(FILE_1) >= 0 )
 669:                         t.check( "Archive.10.0", true, "Removal" );
 670:                     else
 671:                         t.check( "Archive.10.0", false,
 672:                                             "Removal (failure b)" );
 673:                 }
 674: 
 675:                 // Now remove second file, should cause an exception on
 676:                 // the close since an empty archive isn't permitted.
 677: 
 678:                 stored = new Archive( STORED_ZIP );
 679:                 stored.remove( FILE_2 );
 680:                 try
 681:                 {   stored.close();
 682:                     t.check( "Archive.11.0", false, "Remove all files");
 683:                 }
 684:                 catch( ZipException e )
 685:                 {   t.check( "Archive.11.0", true, "Remove all files" );
 686:                 }
 687: 
 688:                 // Finally, test the revert method.
 689: 
 690:                 compressed = new Archive( STORED_ZIP );
 691:                 out = compressed.output_stream_for( "foo" );
 692:                 out.write('x');
 693:                 out.close();
 694:                 compressed.revert();
 695: 
 696:                 stored = new Archive( STORED_ZIP );
 697:                 try
 698:                 {
 699:                     in = stored.input_stream_for("foo"); // should fail
 700:                     t.check( "Archive.12.0.a", false,
 701:                                             "Revert (failure a)" );
 702:                 }
 703:                 catch( ZipException e )
 704:                 {   
 705:                     if( e.getMessage().indexOf("foo") >= 0 )
 706:                         t.check( "Archive.12.0", true, "Revert" );
 707:                     else
 708:                         t.check( "Archive.12.0", false,
 709:                                             "Revert (failure b)" );
 710:                 }
 711:             }
 712:             catch( Exception e )
 713:             {   t.check( "Archive.Abort", false,
 714:                                 "Terminated by Exception toss" );
 715:                 e.printStackTrace();
 716:             }
 717:             finally{ t.exit(); }
 718:         }
 719:     }
 720: }
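
The copy-then-rename strategy that close() implements can also be seen in isolation. What follows is a minimal sketch, not the listing's code: it assumes nothing but the standard java.util.zip and java.nio.file classes, and the names RewriteArchiveSketch, replaceEntry, and read are hypothetical. It is also cruder than Archive: there is no locking, and every entry is rewritten with the default (deflated) method rather than preserving the original compression mode.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of the Archive listing.
final class RewriteArchiveSketch {

    /** Replaces (or adds) one entry by rewriting the archive through a temporary file. */
    static void replaceEntry(File archive, String name, byte[] contents) throws IOException {
        File directory = archive.getAbsoluteFile().getParentFile();
        File temporary = File.createTempFile("Archive", ".tmp", directory);

        try (ZipFile source = new ZipFile(archive);
             ZipOutputStream destination =
                     new ZipOutputStream(new FileOutputStream(temporary))) {

            // Write the new contents first.
            destination.putNextEntry(new ZipEntry(name));
            destination.write(contents, 0, contents.length);
            destination.closeEntry();

            // Then copy every entry that was not replaced.
            for (Enumeration<? extends ZipEntry> e = source.entries(); e.hasMoreElements(); ) {
                ZipEntry current = e.nextElement();
                if (current.getName().equals(name))
                    continue;
                // A fresh entry carries only the name, so sizes and CRC are recomputed.
                destination.putNextEntry(new ZipEntry(current.getName()));
                try (InputStream in = source.getInputStream(current)) {
                    copy(in, destination);
                }
                destination.closeEntry();
            }
        } catch (IOException e) {
            temporary.delete();     // don't leave a half-built archive behind
            throw e;
        }

        // The original is untouched until this point.
        Files.move(temporary.toPath(), archive.toPath(), StandardCopyOption.REPLACE_EXISTING);
    }

    /** Returns the contents of the named entry. */
    static byte[] read(File archive, String name) throws IOException {
        try (ZipFile zip = new ZipFile(archive)) {
            ZipEntry entry = zip.getEntry(name);
            if (entry == null)
                throw new IOException(name + " doesn't exist");
            try (InputStream in = zip.getInputStream(entry)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                copy(in, out);
                return out.toByteArray();
            }
        }
    }

    private static void copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        for (int got; (got = in.read(buffer, 0, buffer.length)) > 0; )
            out.write(buffer, 0, got);
    }

    public static void main(String[] args) throws IOException {
        File archive = File.createTempFile("sketch", ".zip");
        try (ZipOutputStream zip = new ZipOutputStream(new FileOutputStream(archive))) {
            zip.putNextEntry(new ZipEntry("root.txt"));
            zip.write('a');
            zip.closeEntry();
            zip.putNextEntry(new ZipEntry("subdir/file.txt"));
            zip.write('c');
            zip.closeEntry();
        }
        replaceEntry(archive, "root.txt", new byte[] {'x', 'y'});
        System.out.println(new String(read(archive, "root.txt"), "US-ASCII"));
        System.out.println(new String(read(archive, "subdir/file.txt"), "US-ASCII"));
        archive.delete();
    }
}
```

Creating the temporary file in the archive's own directory keeps the final move on one file system, where a rename is most likely to succeed.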
         

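The Archive_OutputStream comment notes that a "stored" entry needs its checksum before any bytes are written. The requirement can be demonstrated with the standard classes alone. The sketch below is an illustration under simplifying assumptions, not the listing's approach: it holds the whole file in a byte array (exactly the footprint the listing avoids with its temporary file), and the names StoredEntrySketch, writeStored, and readFirst are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration class; not part of the Archive listing.
final class StoredEntrySketch {

    /** Writes one uncompressed (STORED) entry and returns the archive's bytes. */
    static byte[] writeStored(String name, byte[] data) throws IOException {
        // The checksum and size must be known before putNextEntry() is called.
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);

        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);
        entry.setCrc(crc.getValue());

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(entry);    // rejects a STORED entry that lacks size or CRC
            zip.write(data, 0, data.length);
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    /** Reads back the contents of the first entry in the archive. */
    static byte[] readFirst(byte[] archive) throws IOException {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            if (in.getNextEntry() == null)
                throw new IOException("empty archive");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            for (int got; (got = in.read(buffer, 0, buffer.length)) > 0; )
                out.write(buffer, 0, got);
            return out.toByteArray();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] archive = writeStored("root.txt", new byte[] {'a', 'b'});
        System.out.println(new String(readFirst(archive), "US-ASCII"));
    }
}
```

A deflated entry has no such requirement, which is why the listing can hand compressed writes straight to the destination stream and only buffers the stored ones.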
Conclusion

So that's the Archive class. Judging by the volume of complaints in Sun's bug database about how hard it is to modify archives, this class should be pretty useful to many people in its own right. As I mentioned in Part 1 of this series, the reason I wrote it was to support a caching class loader, which I'll present in a subsequent column. Next month, I'm going to briefly digress into a discussion of a threading-related problem that's come over my horizon only recently, and which can seriously affect the behavior of your code on multiple-processor machines. This is an important enough issue that I wanted to get it in front of everybody as soon as possible, so I hope you won't be put off by having to wait an extra month for that class loader.

Allen Holub has been working in the computer industry since 1979. He is widely published in magazines (Dr. Dobb's Journal, Programmers Journal, Byte, and MSJ, among others). He writes the Java Toolbox column for JavaWorld and also writes the OO-Design Process column for the IBM developerWorks Component Zone. Moreover, Allen moderates the Programming Theory & Practice discussion in ITworld.com's Java forum. Allen has eight books to his credit, the latest of which covers the traps and pitfalls of Java threading (Taming Java Threads [Berkeley: Apress, 2000; ISBN 1893115100]). He's been designing and building object-oriented software for longer than he cares to remember. After eight years as a C++ programmer, Allen abandoned C++ for Java in early 1996. He now looks at C++ as a bad dream, the memory of which is mercifully fading. He's been teaching programming (first C, then C++ and MFC, now OO-design and Java) both on his own and for the University of California at Berkeley Extension since 1982. Allen offers both public classes and in-house training in Java and object-oriented design topics. He also does object-oriented design consulting and contract Java programming. Get information, and contact Allen, via his Website at http://www.holub.com.

Learn more about this topic