Sometimes you save more space by not compressing data

Can compressed data be longer than the uncompressed data?


Q I am working on a highly secure application and need to compress data such as strings and byte arrays. I am using the java.util.zip.* classes, but I am having some problems.

First, when using the Deflater and Inflater classes, I get DataFormatExceptions when the string is shorter than 30 characters.

Second, I have a question about the compression itself. I am using ByteArrayOutputStream and DeflaterOutputStream. I noticed that compressdata.length() > OriginalData.length(), where OriginalData is the uncompressed data. It doesn't seem to make sense that the compressed length is longer than the uncompressed length. Can this be right?

A Before I answer your question directly, there are a few things I need to mention in order to avoid any confusion or misconceptions. Specifically, you say that you are working on a highly secure system. Compressing your data does nothing to make it more secure. You may know this already, but for everyone's benefit I thought it worth mentioning. Anyone can decompress your compressed data. True, it will take a little work, but once someone knows how you have achieved your compression, decoding it becomes trivial. To be truly secure, you'll need to encrypt your data before compressing it, or encrypt the compressed data, for example by sending it over something like SSL.

To answer the first part of your question, I tested one string shorter than 30 characters and one longer than 30 characters. The only time that I could get a DataFormatException was when the Inflater and Deflater were constructed with different nowrap values. Be sure that the Inflater and Deflater specify nowrap the same way. If the Deflater sets nowrap to false, the Inflater must do the same. Likewise, if the Deflater sets it to true, the Inflater must set it to true.

Whether to set nowrap to true or false depends on your needs. Setting nowrap to true omits the ZLIB header and checksum from the compressed data. Setting it to false keeps them. Either way, the Inflater's nowrap must match the setting that produced the compressed input. Otherwise, as we have seen, you will get a DataFormatException.
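To make the nowrap rule concrete, here is a minimal sketch of the kind of test I described. The class and method names (NowrapDemo, compress, decompress, roundTrips) are my own for illustration and are not part of java.util.zip. The extra trailing byte passed to the Inflater when nowrap is true follows the note in the Inflater constructor's Javadoc.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

class NowrapDemo {

    // Compress in memory. nowrap = true omits the ZLIB header and checksum.
    static byte[] compress(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress in memory. nowrap must match the value used by compress().
    static byte[] decompress(byte[] compressed, boolean nowrap) throws DataFormatException {
        Inflater inflater = new Inflater(nowrap);
        // The Inflater Javadoc asks for one extra "dummy" input byte when nowrap is true.
        byte[] input = nowrap ? Arrays.copyOf(compressed, compressed.length + 1) : compressed;
        inflater.setInput(input);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new DataFormatException("input ended before the stream did");
                }
                out.write(buffer, 0, n);
            }
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    // True if the text survives a round trip, false if inflation fails.
    static boolean roundTrips(String text, boolean deflateNowrap, boolean inflateNowrap) {
        byte[] original = text.getBytes(StandardCharsets.UTF_8);
        try {
            byte[] restored = decompress(compress(original, deflateNowrap), inflateNowrap);
            return Arrays.equals(original, restored);
        } catch (DataFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shortText = "hello, world"; // well under 30 characters
        System.out.println("deflate false, inflate false: " + roundTrips(shortText, false, false));
        System.out.println("deflate true,  inflate true:  " + roundTrips(shortText, true, true));
        System.out.println("deflate false, inflate true:  " + roundTrips(shortText, false, true));
        System.out.println("deflate true,  inflate false: " + roundTrips(shortText, true, false));
    }
}
```

The two matching combinations should round-trip the short string, while the two mismatched ones should fail inside inflate(). The length of the string plays no part in the failure.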

Your second question raises an important fact about data compression. As strange as it may seem, the compressed data size can be larger than the uncompressed size. Depending on your Deflater settings, the Deflater may add a header and checksum to the compressed data. These are used to decode the information and check it for errors. If you deal with very small strings, it is likely that not much real compression has gone on. Cutting a string of 30 characters to 15, while a 50 percent reduction, saves only 15 characters. As a result, the added size of the header can make the compressed string longer than the original. You will not see the benefits of compression until your data reaches a certain larger, precompression size. It's hard to say what this size is, but in general it is where: (compressed size + header size) < uncompressed size. If your data is not large enough, you're wasting time using compression.
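You can measure the overhead yourself. The following is a rough sketch that uses ByteArrayOutputStream and DeflaterOutputStream, as in your question. SizeDemo and compressedLength are hypothetical names of my own, and the sample inputs are arbitrary.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.Deflater;
import java.util.zip.DeflaterOutputStream;

class SizeDemo {

    // Number of bytes produced when input is deflated into memory.
    // nowrap = true leaves out the ZLIB header and checksum.
    static int compressedLength(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(sink, deflater)) {
            out.write(input);
        } catch (IOException e) {
            // Not expected for an in-memory sink.
            throw new UncheckedIOException(e);
        } finally {
            // A Deflater passed in by the caller is not released by the stream.
            deflater.end();
        }
        return sink.size();
    }

    public static void main(String[] args) {
        byte[] small = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] large = new byte[10000];
        Arrays.fill(large, (byte) 'a'); // highly repetitive, so it compresses well

        System.out.println("small: " + small.length + " bytes in, "
                + compressedLength(small, false) + " out with ZLIB wrapper, "
                + compressedLength(small, true) + " out with nowrap");
        System.out.println("large: " + large.length + " bytes in, "
                + compressedLength(large, false) + " out with ZLIB wrapper, "
                + compressedLength(large, true) + " out with nowrap");
    }
}
```

The ZLIB wrapper is a 2-byte header plus a 4-byte Adler-32 checksum, so a three-character string cannot come out shorter than it went in. Setting nowrap to true removes those 6 bytes, but very short input may still grow because the deflate stream carries its own block overhead.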

You may also want to consider some of the other compression settings. Some settings are optimized for speed, while others achieve better compression but take longer to run. With Deflater, for example, the compression level ranges from Deflater.BEST_SPEED to Deflater.BEST_COMPRESSION. So the settings that you choose go a long way in determining the final size of your compressed data.
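As a quick way to compare settings, the sketch below deflates the same input at several of the level constants that Deflater defines. LevelDemo, compressedLength, and sampleText are names I made up for the example, and the repeated sentence is only a stand-in for your real data, which may behave differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

class LevelDemo {

    // Number of bytes produced when input is deflated at the given level.
    static int compressedLength(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[4096];
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Arbitrary repetitive sample input; substitute your own data.
    static byte[] sampleText() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 200; i++) {
            sb.append("The quick brown fox jumps over the lazy dog. ");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = sampleText();
        System.out.println("input length:        " + input.length);
        System.out.println("NO_COMPRESSION:      " + compressedLength(input, Deflater.NO_COMPRESSION));
        System.out.println("BEST_SPEED:          " + compressedLength(input, Deflater.BEST_SPEED));
        System.out.println("DEFAULT_COMPRESSION: " + compressedLength(input, Deflater.DEFAULT_COMPRESSION));
        System.out.println("BEST_COMPRESSION:    " + compressedLength(input, Deflater.BEST_COMPRESSION));
    }
}
```

Note that NO_COMPRESSION only wraps the input in stored blocks, so its output is always somewhat longer than the input. How much the other levels differ from each other depends heavily on the data.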

About the author

Tony Sintes is a principal consultant at BroadVision. Tony, a Sun-certified Java 1.1 programmer and Java 2 developer, has worked with Java since 1997.