Newsletter sign-up
View all newsletters

Enterprise Java Newsletter
Stay up to date on the latest tutorials and Java community news posted on JavaWorld

Sponsored Links

Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs

Java Tip 130: Do you know your data size?

Don't pay the price for hidden class fields

  • Print
  • Feedback

Page 3 of 4

Of course, the penalty depends on your application's data distribution. Somehow I suspected that 10 characters represents the typical String length for a variety of applications. To get a concrete data point, I instrumented the SwingSet2 demo (by modifying the String class implementation directly) that came with JDK 1.3.x to track the lengths of the Strings it creates. After a few minutes playing with the demo, a data dump showed that about 180,000 Strings were instantiated. Sorting them into size buckets confirmed my expectations:

[0-10]:  96481
[10-20]: 27279
[20-30]: 31949
[30-40]: 7917
[40-50]: 7344
[50-60]: 3545
[60-70]: 1581
[70-80]: 1247
[80-90]: 874
...


That's right, more than 50 percent of all String lengths fell into the 0-10 bucket, the very hot spot of String class inefficiency!

In reality, Strings can consume even more memory than their lengths suggest: Strings generated out of StringBuffers (either explicitly or via the '+' concatenation operator) likely have char arrays with lengths larger than the reported String lengths because StringBuffers typically start with a capacity of 16, then double it on append() operations. So, for example, createString(1) + ' ' ends up with a char array of size 16, not 2.

What do we do?

"This is all very well, but we don't have any choice but to use Strings and other types provided by Java, do we?" I hear you ask. Let's find out.

Wrapper classes

Wrapper classes like java.lang.Integer seem a bad choice for storing large data amounts in memory. If you strive to be memory-economic, avoid them altogether. Rolling your own vector class for primitive ints isn't difficult. Of course, it would be great if the Java core API already contained such libraries. Perhaps the situation will improve when Java has generic types.

Multidimensional arrays

For large data structures built with multidimensional arrays, you can oftentimes reduce the extra dimension overhead by an easy indexing change: convert every int[dim1][dim2] instance to an int[dim1*dim2] instance and change all expressions like a[i][j] to a[i*dim1 + j]. Of course, you pay a price from the lack of index-range checking on dim1 dimension (which also boosts performance).

java.lang.String

You can try a few simple tricks to reduce your application's String static memory size.

First, you can try one common technique when an application loads and caches many Strings from a data file or a network connection, and the String value range proves limited. For example, if you want to parse an XML file in which you frequently encounter a certain attribute, but the attribute is limited to just two possible values. Your goal: filter all Strings through a hash map and reduce all equal but distinct Strings to identical object references:

    public String internString (String s)
    {
        if (s == null) return null;
        
        String is = (String) m_strings.get (s);
        if (is != null)
            return is;
        else
        {
            m_strings.put (s, s);
            return s;
        }
    }
    
    private Map m_strings = new HashMap ();


When applicable, that trick can decrease your static memory requirements by hundreds of percent. An experienced reader may observe that the trick duplicates java.lang.String.intern()'s functionality. Numerous reasons exist to avoid the String.intern() method. One is that few modern JVMs can intern large amounts of data.

  • Print
  • Feedback

Resources