Zip your data and improve the performance of your network-based applications

Find out how Java can help you package and compress your application's data for better performance

After spending the last few months exploring a small region on the frontier of computer science, I've decided it's time to return to more familiar locales -- the Java class library.

This month, I'd like to show you how to use four classes in the Java class library to package and compress your application's data. Please read on -- you'll be surprised at the difference in performance these four classes can make.

We're living in the '90s, so why compress data?

It may seem like ancient history now, but I remember a time when disk space was an expensive commodity -- 20 megabytes (MB) of storage cost upwards of 00! To stretch hard disk space as far as it would go, we compressed data with a number of different compression and archiving tools: DoubleSpace and PKZip spring to mind.

Even though times have changed, and disk space is now relatively inexpensive, the need to compress data hasn't disappeared. However, instead of expensive hard disk space, users are faced with expensive network bandwidth. For network-savvy languages like Java, this creates a real problem.

Remember patiently waiting for those first applets to load? It seemed to take forever to pull all the class files and associated data files across the network. Java's designers noticed this as well, and in version 1.1 they added the java.util.zip package to the Java class library to improve this situation. It provided a standard way for Java applications and applets to compress data.

The savings are impressive. The deflate algorithm used in zip files typically shrinks class files by 25 to 40 percent and text files by 70 to 85 percent.

The java.util.zip package includes a number of classes that either compress or support the compression of data. I'll present four. The ZipEntry class represents an entry in a zip file. The ZipOutputStream class and the ZipInputStream class allow applications to write and read zip file data in stream format. The ZipFile class allows applications to read zip file data as a file.

Conceptually speaking...

Let's begin with a handful of concepts.

It's important that you understand a little about the zip file format itself. Figure 1 illustrates the layout of a zip file. A zip file consists of zero or more zip entries, one for each file stored in the zip file. Each entry contains initial header information followed by the file's data, which may or may not be compressed. At the end of the zip file, after the last zip entry, is the directory (formally, the central directory). The directory contains information about each entry in the zip file: its name, its size, the method used to compress it, and where the entry begins in the file.

Figure 1. The zip file format
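To see the layout in Figure 1 from code, here's a minimal sketch that builds a two-entry archive in memory and looks for the record signatures the zip format defines: each entry's header starts with the bytes P, K, 3, 4 (the signature 0x04034b50 stored little-endian), each directory record starts with P, K, 1, 2, and the end-of-directory record starts with P, K, 5, 6. The class and entry names are hypothetical, the code assumes Java 7 or later, and the check on the final 22 bytes assumes the archive has no comment (without one, the end record is exactly 22 bytes long).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: shows where the zip format's records land.
class ZipLayout {
    // Record signatures as they appear on disk (little-endian):
    // entry header, directory record, and end-of-directory record.
    static final byte[] LOCAL_HEADER = {'P', 'K', 3, 4};
    static final byte[] DIRECTORY_HEADER = {'P', 'K', 1, 2};
    static final byte[] END_OF_DIRECTORY = {'P', 'K', 5, 6};

    // Builds a small two-entry archive entirely in memory.
    static byte[] buildZip() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : new String[] {"a.txt", "b.txt"}) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write(("contents of " + name).getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // True if sig occurs in data starting at offset.
    static boolean hasSignatureAt(byte[] data, int offset, byte[] sig) {
        if (offset < 0 || offset + sig.length > data.length) return false;
        for (int j = 0; j < sig.length; j++) {
            if (data[offset + j] != sig[j]) return false;
        }
        return true;
    }

    // Offset of the first occurrence of sig in data, or -1 if absent.
    static int indexOf(byte[] data, byte[] sig) {
        for (int i = 0; i + sig.length <= data.length; i++) {
            if (hasSignatureAt(data, i, sig)) return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] zip = buildZip();
        System.out.println("archive size:          " + zip.length);
        System.out.println("first entry header at: " + indexOf(zip, LOCAL_HEADER));
        System.out.println("directory starts at:   " + indexOf(zip, DIRECTORY_HEADER));
        System.out.println("end record at:         " + indexOf(zip, END_OF_DIRECTORY));
    }
}
```

The entries come first, the directory follows them, and the short end record closes the file, which is exactly the order Figure 1 describes.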

Entries can be added to zip files in two different ways: they can be stored or they can be deflated. Stored entries are not compressed -- they are added to the zip file as-is. Deflated entries are compressed.

A question springs to mind here: why choose store over deflate? It turns out that there are two reasons, one obvious and the other not so obvious (in fact, you might even find it counterintuitive).

First, it's faster to store and retrieve an entry if it's not deflated.

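Second, any lossless compression algorithm (deflation is one) that can compress some files must expand others. It's a matter of counting: a lossless algorithm must give every input its own distinct output, and there aren't enough shorter outputs to go around, so if some files get shorter, others must get longer. In plain English, there's a catch: the deflation algorithm will compress some files, but it will expand others. Data that is already compressed, such as GIF and JPEG images or other zip files, typically falls into the second group. Obviously, it's better to simply store the files that would be expanded.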
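One way to act on this advice is to deflate each file on a trial basis and store it instead whenever deflation fails to shrink it. The following is a minimal sketch that does this with ZipOutputStream alone: it deflates into a throwaway in-memory archive, reads the compressed size that closeEntry() records in the ZipEntry, and picks STORED when that size is not smaller than the original. The class and entry names are hypothetical, and the code assumes Java 7 or later. Trial compression doubles the deflation work, so it suits small files or archives that are written once and read many times.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Random;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: store an entry whenever deflating it would not help.
class StoreOrDeflate {
    // Deflates data into a throwaway archive and returns the compressed
    // size that closeEntry() records in the entry.
    static long deflatedSize(byte[] data) {
        ZipEntry probe = new ZipEntry("probe");
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.setMethod(ZipOutputStream.DEFLATED);
            zip.putNextEntry(probe);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return probe.getCompressedSize();
    }

    // STORED if deflation would not make the data smaller, else DEFLATED.
    static int chooseMethod(byte[] data) {
        return deflatedSize(data) < data.length ? ZipEntry.DEFLATED : ZipEntry.STORED;
    }

    // Adds one entry using whichever method chooseMethod() picks.
    static void addEntry(ZipOutputStream zip, String name, byte[] data) throws IOException {
        ZipEntry entry = new ZipEntry(name);
        entry.setMethod(chooseMethod(data));
        if (entry.getMethod() == ZipEntry.STORED) {
            // Stored entries need their size and CRC-32 before putNextEntry().
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            entry.setSize(data.length);
            entry.setCrc(crc.getValue());
        }
        zip.putNextEntry(entry);
        zip.write(data);
        zip.closeEntry();
    }

    public static void main(String[] args) throws IOException {
        byte[] noise = new byte[4096];
        new Random(42).nextBytes(noise);  // random bytes do not compress
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 200; i++) text.append("hello, zip world\n");
        byte[] prose = text.toString().getBytes(StandardCharsets.UTF_8);

        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            addEntry(zip, "noise.bin", noise);  // ends up STORED
            addEntry(zip, "prose.txt", prose);  // ends up DEFLATED
        }
        System.out.println("archive size: " + bytes.size() + " bytes");
    }
}
```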

When you compress data, you can also choose a compression level. Which level you select will depend on your time-versus-size requirements. The java.util.zip classes support ten levels of compression, ranging from 0 (no compression, but very fast) to 9 (best compression, but slow).
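ZipOutputStream exposes the level through its setLevel() method, and the Deflater class names the common choices: Deflater.NO_COMPRESSION (0), Deflater.BEST_SPEED (1), and Deflater.BEST_COMPRESSION (9). Here's a minimal sketch that deflates the same sample text at those three levels and reports each compressed size. The class name and sample text are hypothetical, the code assumes Java 7 or later, and the sizes you see will depend on your data. Level 0 deserves a note: the entry is still marked as deflated and wrapped in deflate's block framing, so its compressed size comes out slightly larger than the original. If you want no compression at all, the STORED method avoids that overhead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: compare compressed sizes across levels.
class Levels {
    // Deflates data at the given level (0-9) and returns the compressed size.
    static long compressedSize(byte[] data, int level) {
        ZipEntry entry = new ZipEntry("sample.txt");
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.setLevel(level);  // set before writing so it applies to the whole entry
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();     // records the compressed size in the entry
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entry.getCompressedSize();
    }

    // Repetitive sample text, so that the compressor has patterns to find.
    static byte[] sampleText() {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 500; i++) {
            text.append("record ").append(i % 17).append(" of the sample file\n");
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = sampleText();
        int[] levels = {Deflater.NO_COMPRESSION, Deflater.BEST_SPEED, Deflater.BEST_COMPRESSION};
        for (int level : levels) {
            System.out.println("level " + level + ": " + compressedSize(data, level)
                    + " of " + data.length + " bytes");
        }
    }
}
```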

Now let's take a look at the classes.

The ZipEntry class

The ZipEntry class is the central figure in our cadre of classes. It represents an entry in a zip file.

ZipEntry is essentially a collection of properties that describe an entry in a zip file. A ZipEntry instance contains an entry's uncompressed size, its last modification time, the method used to compress the entry, the entry's CRC-32 checksum value (a calculated value used to verify the integrity of the entry), an optional user comment, and any extra (possibly platform-specific) information about the entry. It also contains the entry's name and (optionally) its compressed size.

The following methods manipulate these properties:

getSize()

setSize()

Get and set the entry's uncompressed size.

getTime()

setTime()

Get and set the entry's last modification time.

getMethod()

setMethod()

Get and set the entry's compression method.

getCrc()

setCrc()

Get and set the entry's CRC-32 checksum. The CRC-32 checksum is calculated from the uncompressed (not the compressed) data.

getComment()

setComment()

Get and set the entry's comment.

getExtra()

setExtra()

Get and set any extra (possibly platform-specific) information about the entry.

getName()

Get the entry's name. The name is set when the ZipEntry instance is created.

getCompressedSize()

Get the entry's compressed size.
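Here's a minimal sketch that exercises several of these accessors. A freshly created entry knows only its name: the numeric properties read as -1 and the comment and extra data read as null until they are set. Once the entry has been written through a ZipOutputStream, closeEntry() fills in the sizes and the CRC-32, and that CRC-32 matches one computed with java.util.zip.CRC32 over the uncompressed bytes, not the compressed ones. The class, entry name, and comment are hypothetical, and the code assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: ZipEntry properties before and after writing.
class EntryProperties {
    // Writes data as one deflated entry and returns that same entry,
    // whose sizes and CRC-32 ZipOutputStream fills in during closeEntry().
    static ZipEntry written(ZipEntry entry, byte[] data) {
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.setMethod(ZipOutputStream.DEFLATED);
            zip.putNextEntry(entry);
            zip.write(data);
            zip.closeEntry();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return entry;
    }

    // CRC-32 of the uncompressed bytes.
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    static void print(ZipEntry entry) {
        System.out.println("name    = " + entry.getName());
        System.out.println("method  = " + entry.getMethod());          // -1 until set
        System.out.println("size    = " + entry.getSize());            // -1 if unknown
        System.out.println("csize   = " + entry.getCompressedSize());  // -1 if unknown
        System.out.println("crc     = " + entry.getCrc());             // -1 if unknown
        System.out.println("comment = " + entry.getComment());         // null if none
    }

    public static void main(String[] args) {
        ZipEntry entry = new ZipEntry("docs/readme.txt");
        entry.setComment("project notes");
        print(entry);  // only the name and comment are known so far

        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        print(written(entry, data));
        System.out.println("CRC-32 of the uncompressed bytes = " + crcOf(data));
    }
}
```

For the input "123456789", both the recorded and the computed CRC-32 are the standard check value 0xCBF43926 (3421780262 in decimal).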

I'll show you how to use this class in the examples (available for download in the Resources section) that follow.

The ZipOutputStream class

ZipOutputStream is a subclass of DeflaterOutputStream, which is in turn a subclass of FilterOutputStream. It writes data to an output stream in zip file format. It redefines the write() method so that data written for a deflated entry is compressed first (data for a stored entry is written as-is). It works with any output stream. For example, if the output stream were an instance of FileOutputStream, it would create a zip file on disk.

The following methods manipulate these output streams:

setMethod()

Set the default compression method, used for entries that don't specify their own.

setLevel()

Set the default compression level.

setComment()

Set the zip file's comment.

putNextEntry()

Begin writing a new entry. This method writes information from the ZipEntry instance to the output stream.

write()

Write data to the output stream.

closeEntry()

Finish writing an entry. This method writes additional information about the entry to the output stream.

finish()

Finish writing the zip file data to the output stream.

close()

Close the output stream.

You might be a little confused by the need for both the finish() and close() methods. These two methods serve related but slightly different purposes. The finish() method finishes writing the zip file data (specifically, it writes the zip file directory) without closing the underlying output stream. This is useful if more than one zip file is to be written to the stream. The close() method finishes writing the zip file data (close() calls finish()) and then closes the underlying stream.
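One way to see the difference is to hand ZipOutputStream an underlying stream that records whether its own close() method was ever called. In the minimal sketch below, finish() leaves that stream open while still writing the directory, and close() writes the directory and then closes the stream. The TrackingStream class, the entry name, and the other names are hypothetical, and the code assumes Java 7 or later. The finished ZipOutputStream is deliberately left unclosed to show the effect.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: finish() versus close() on a ZipOutputStream.
class FinishVsClose {
    // An in-memory sink that records whether close() was called on it.
    static class TrackingStream extends ByteArrayOutputStream {
        boolean closed;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    // Writes one entry, then ends the archive with close() or finish().
    static TrackingStream writeArchive(boolean useClose) {
        TrackingStream out = new TrackingStream();
        try {
            ZipOutputStream zip = new ZipOutputStream(out);
            zip.putNextEntry(new ZipEntry("hello.txt"));
            zip.write("hello".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
            if (useClose) {
                zip.close();   // writes the directory, then closes out
            } else {
                zip.finish();  // writes the directory; out stays open
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    // Reads back the name of the first entry in the given archive bytes.
    static String firstEntryName(byte[] zipBytes) {
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            ZipEntry entry = in.getNextEntry();
            return entry == null ? null : entry.getName();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TrackingStream finishedOut = writeArchive(false);
        TrackingStream closedOut = writeArchive(true);
        System.out.println("after finish(): underlying closed = " + finishedOut.closed);
        System.out.println("after close():  underlying closed = " + closedOut.closed);
        System.out.println("first entry: " + firstEntryName(finishedOut.toByteArray()));
    }
}
```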

The following code demonstrates how to create a ZipOutputStream and how to add entries to it:

import java.io.File;
import java.io.FileInputStream;

import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class Zip {
    public static void main(String [] rgstring) throws Exception {
        // Write the zip data to standard output.
        ZipOutputStream zipoutputstream = new ZipOutputStream(System.out);

        // Select your choice of stored (not compressed) or
        // deflated (compressed).
        zipoutputstream.setMethod(ZipOutputStream.DEFLATED);

        for (int i = 0; i < rgstring.length; i++) {
            File file = new File(rgstring[i]);

            byte [] rgb = new byte [1000];
            int n;
            FileInputStream fileinputstream;

            // Calculate the CRC-32 value. Stored entries require it; for
            // deflated entries it isn't strictly necessary, but it doesn't hurt.
            CRC32 crc32 = new CRC32();
            fileinputstream = new FileInputStream(file);
            while ((n = fileinputstream.read(rgb)) > -1) {
                crc32.update(rgb, 0, n);
            }
            fileinputstream.close();

            // Create a zip entry.
            ZipEntry zipentry = new ZipEntry(rgstring[i]);
            zipentry.setSize(file.length());
            zipentry.setTime(file.lastModified());
            zipentry.setCrc(crc32.getValue());

            // Add the zip entry and associated data.
            zipoutputstream.putNextEntry(zipentry);
            fileinputstream = new FileInputStream(file);
            while ((n = fileinputstream.read(rgb)) > -1) {
                zipoutputstream.write(rgb, 0, n);
            }
            fileinputstream.close();

            zipoutputstream.closeEntry();
        }

        // Write the directory and close the stream.
        zipoutputstream.close();
    }
}
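The first pass over each file in the Zip program matters more for stored entries than for deflated ones. ZipOutputStream can fill in a deflated entry's size and CRC-32 after its data has been written, but a stored entry's header must carry them up front, so putNextEntry() rejects a STORED entry whose CRC-32 hasn't been set. Here's a minimal sketch showing both outcomes; the class and entry name are hypothetical, and the code assumes Java 7 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.ZipEntry;
import java.util.zip.ZipException;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: STORED entries need size and CRC-32 in advance.
class StoredEntries {
    // Writes data as a STORED entry, setting the CRC-32 only if asked.
    // Returns null on success, or the ZipException message on failure.
    static String tryStore(byte[] data, boolean setCrc) {
        ZipEntry entry = new ZipEntry("stored.txt");
        entry.setMethod(ZipEntry.STORED);
        entry.setSize(data.length);  // a stored entry's compressed size equals this
        if (setCrc) {
            CRC32 crc = new CRC32();
            crc.update(data, 0, data.length);
            entry.setCrc(crc.getValue());
        }
        try (ZipOutputStream zip = new ZipOutputStream(new ByteArrayOutputStream())) {
            zip.putNextEntry(entry);  // throws ZipException if size or CRC-32 is missing
            zip.write(data);
            zip.closeEntry();
            return null;
        } catch (ZipException e) {
            return String.valueOf(e.getMessage());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = "stored, not compressed".getBytes(StandardCharsets.UTF_8);
        System.out.println("with CRC-32:    " + (tryStore(data, true) == null ? "ok" : "failed"));
        System.out.println("without CRC-32: " + tryStore(data, false));
    }
}
```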

The ZipInputStream class

The ZipInputStream class is a subclass of InflaterInputStream, which is in turn a subclass of FilterInputStream. It reads data in zip file format from an input stream. It redefines the read() method so that data read from a deflated entry is decompressed first. It works with any input stream.

The following methods manipulate these input streams:

getNextEntry()

Begin reading a new entry. This method reads information from the input stream and creates and returns a ZipEntry instance.

read()

Read data from the current entry. This method returns -1 when it reaches the end of the entry.

closeEntry()

Finish reading an entry. This method skips past any additional information about the entry in the input stream.

close()

Close the input stream.

The following code demonstrates how to create a ZipInputStream and how to read entries from it:

import java.util.zip.ZipInputStream;
import java.util.zip.ZipEntry;

public class Unzip {
    public static void main(String [] rgstring) throws Exception {
        ZipInputStream zipinputstream = new ZipInputStream(System.in);

        while (true) {
            // Get the next zip entry. Break out of the loop if there are
            // no more.
            ZipEntry zipentry = zipinputstream.getNextEntry();
            if (zipentry == null) break;

            // Read data from the zip entry. The read() method will return
            // -1 when there is no more data to read.
            byte [] rgb = new byte [1000];
            int n;
            while ((n = zipinputstream.read(rgb)) > -1) {
                // In real life, you'd probably write the data to a file.
            }

            zipinputstream.closeEntry();
        }

        zipinputstream.close();
    }
}
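The Unzip example throws away what it reads. This minimal sketch keeps each entry's contents instead, so the result can be checked against what was written: it zips a small map of names to text into memory with ZipOutputStream and reads it back with ZipInputStream, using the same getNextEntry(), read(), and closeEntry() loop. The class name and sample data are hypothetical, and the code assumes Java 7 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: write entries, then read them back.
class RoundTrip {
    // Zips each name/text pair into an in-memory archive.
    static byte[] zip(Map<String, String> files) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(bytes)) {
            for (Map.Entry<String, String> file : files.entrySet()) {
                out.putNextEntry(new ZipEntry(file.getKey()));
                out.write(file.getValue().getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads every entry, in archive order, into a name-to-text map.
    static Map<String, String> unzip(byte[] zipBytes) {
        Map<String, String> files = new LinkedHashMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(zipBytes))) {
            byte[] buffer = new byte[1000];
            ZipEntry entry;
            while ((entry = in.getNextEntry()) != null) {
                ByteArrayOutputStream contents = new ByteArrayOutputStream();
                int n;
                // read() returns -1 at the end of the current entry.
                while ((n = in.read(buffer)) != -1) {
                    contents.write(buffer, 0, n);
                }
                files.put(entry.getName(), new String(contents.toByteArray(), StandardCharsets.UTF_8));
                in.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return files;
    }

    public static void main(String[] args) {
        Map<String, String> files = new LinkedHashMap<>();
        files.put("a.txt", "alpha");
        files.put("docs/b.txt", "beta");
        System.out.println(unzip(zip(files)));  // prints {a.txt=alpha, docs/b.txt=beta}
    }
}
```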

The ZipFile class

The ZipFile class provides flexible access to entries in a zip file. Because it works from the zip file's directory, it can go directly to any entry, which is much faster than reading through each entry in series. Its only drawback is that it provides read-only access.

The following methods manipulate these files:

entries()

Get all of the entries in the zip file. This method returns an enumeration of ZipEntry instances -- one for each entry in the zip file.

getEntry()

Get an entry by name.

getInputStream()

Get an input stream for an entry. This method returns an input stream that an application can use to read the entry's data, decompressed if necessary.

close()

Close the zip file.

When the zip file is first opened, the ZipFile instance reads the directory information and creates an instance of ZipEntry for each entry in the zip file.

The following code demonstrates how to create a ZipFile instance and how to use it to read entries:

import java.util.zip.ZipFile;
import java.util.zip.ZipEntry;

import java.util.Date;
import java.util.Enumeration;

public class Info {
    public static void main(String [] rgstring) throws Exception {
        for (int i = 0; i < rgstring.length; i++) {
            // Open an existing zip file.
            ZipFile zipfile = new ZipFile(rgstring[i]);

            // Get all of the zip entries.
            Enumeration<? extends ZipEntry> enumeration = zipfile.entries();

            // Print out information.
            while (enumeration.hasMoreElements()) {
                ZipEntry zipentry = enumeration.nextElement();

                System.out.println();
                System.out.println("name = " + zipentry.getName());
                System.out.println("time = " + new Date(zipentry.getTime()));
                System.out.println("size = " + zipentry.getSize());
                System.out.println("crc = " + zipentry.getCrc());
            }

            zipfile.close();
        }
    }
}
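The Info example walks every entry with entries(). When you need only one entry, getEntry() and getInputStream() let ZipFile use the directory to jump straight to it. Here's a minimal sketch that writes a small archive to a temporary file (ZipFile needs a real file) and then reads a single entry by name. The class, file prefix, and entry names are hypothetical, and the code assumes Java 7 or later, where ZipFile can be used in a try-with-resources statement.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class: read one entry by name with ZipFile.
class RandomAccess {
    // Writes a three-entry archive to a temporary file and returns it.
    static File makeArchive() {
        try {
            File file = File.createTempFile("sample", ".zip");
            file.deleteOnExit();
            try (ZipOutputStream out = new ZipOutputStream(new FileOutputStream(file))) {
                for (String name : new String[] {"one.txt", "two.txt", "three.txt"}) {
                    out.putNextEntry(new ZipEntry(name));
                    out.write(("this is " + name).getBytes(StandardCharsets.UTF_8));
                    out.closeEntry();
                }
            }
            return file;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Looks the entry up in the directory and reads just that entry,
    // or returns null if the archive has no entry by that name.
    static String read(File archive, String name) {
        try (ZipFile zipfile = new ZipFile(archive)) {
            ZipEntry entry = zipfile.getEntry(name);
            if (entry == null) return null;
            try (InputStream in = zipfile.getInputStream(entry)) {
                ByteArrayOutputStream contents = new ByteArrayOutputStream();
                byte[] buffer = new byte[1000];
                int n;
                while ((n = in.read(buffer)) != -1) {
                    contents.write(buffer, 0, n);
                }
                return new String(contents.toByteArray(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        File archive = makeArchive();
        System.out.println(read(archive, "three.txt"));    // prints "this is three.txt"
        System.out.println(read(archive, "missing.txt"));  // prints "null"
    }
}
```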

Conclusion

As conceptually simple as this topic is, it was surprisingly difficult to put this column together. Sun's documentation in this area is very poor, and its design is idiosyncratic at best. I suspect just enough work was done to solve a few immediate problems without much thought given to future needs. In any event, I hope the examples I've provided will answer any lingering questions and allow you to make effective use of these tools.

Todd Sundsted has been writing programs since computers became available in convenient desktop models. Though originally interested in building distributed object applications in C++, Todd moved on to the Java programming language when it became the obvious choice for that sort of thing. In addition to writing, Todd is president of Etcee, which offers training, mentoring, consulting, and software development services.

Learn more about this topic

  • Download the complete source as a zip file http://www.javaworld.com/jw-11-1998/howto/jw-11-howto.zip
  • See all of Todd's previous How-To Java columns http://www.javaworld.com/topicalindex/jw-ti-howto.html