Recommended: Sing it, brah! 5 fabulous songs for developers
JW's Top 5
Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs
Page 3 of 4
Of course, the penalty depends on your application's data distribution. Somehow I suspected that 10 characters represents
the typical String length for a variety of applications. To get a concrete data point, I instrumented the SwingSet2 demo (by modifying the String class implementation directly) that came with JDK 1.3.x to track the lengths of the Strings it creates. After a few minutes playing with the demo, a data dump showed that about 180,000 Strings were instantiated. Sorting them into size buckets confirmed my expectations:
[0-10]: 96481 [10-20]: 27279 [20-30]: 31949 [30-40]: 7917 [40-50]: 7344 [50-60]: 3545 [60-70]: 1581 [70-80]: 1247 [80-90]: 874 ...
That's right, more than 50 percent of all String lengths fell into the 0-10 bucket, the very hot spot of String class inefficiency!
In reality, Strings can consume even more memory than their lengths suggest: Strings generated out of StringBuffers (either explicitly or via the '+' concatenation operator) likely have char arrays with lengths larger than the reported String lengths because StringBuffers typically start with a capacity of 16, then double it on append() operations. So, for example, createString(1) + ' ' ends up with a char array of size 16, not 2.
"This is all very well, but we don't have any choice but to use Strings and other types provided by Java, do we?" I hear you ask. Let's find out.
Wrapper classes like java.lang.Integer seem a bad choice for storing large data amounts in memory. If you strive to be memory-economic, avoid them altogether. Rolling
your own vector class for primitive ints isn't difficult. Of course, it would be great if the Java core API already contained such libraries. Perhaps the situation
will improve when Java has generic types.
For large data structures built with multidimensional arrays, you can oftentimes reduce the extra dimension overhead by an
easy indexing change: convert every int[dim1][dim2] instance to an int[dim1*dim2] instance and change all expressions like a[i][j] to a[i*dim1 + j]. Of course, you pay a price from the lack of index-range checking on dim1 dimension (which also boosts performance).
You can try a few simple tricks to reduce your application's String static memory size.
First, you can try one common technique when an application loads and caches many Strings from a data file or a network connection, and the String value range proves limited. For example, if you want to parse an XML file in which you frequently encounter a certain attribute,
but the attribute is limited to just two possible values. Your goal: filter all Strings through a hash map and reduce all equal but distinct Strings to identical object references:
public String internString (String s)
{
if (s == null) return null;
String is = (String) m_strings.get (s);
if (is != null)
return is;
else
{
m_strings.put (s, s);
return s;
}
}
private Map m_strings = new HashMap ();
When applicable, that trick can decrease your static memory requirements by hundreds of percent. An experienced reader may
observe that the trick duplicates java.lang.String.intern()'s functionality. Numerous reasons exist to avoid the String.intern() method. One is that few modern JVMs can intern large amounts of data.