Recommended: Sing it, brah! 5 fabulous songs for developers
JW's Top 5
Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs
Page 4 of 4
What if your Strings are all different? For the second trick, recollect that for small Strings the underlying char array takes half the memory occupied by the String that wraps it. Thus, when my application caches many distinct String values, I can just keep the arrays in memory and convert them to Strings as needed. That works well if each such String then serves as a transient, quickly discarded object. A simple experiment with caching 90,000 words taken from a sample dictionary
file shows that this data takes about 5.6 MB in String form and only 3.4 MB in char[] form, a 65 percent reduction.
The second trick contains one obvious disadvantage: you cannot convert a char[] back to a String through a constructor that would take ownership of the array without cloning it. Why? Because the entire public String API ensures that every String is immutable, so every String constructor defensively clones input data passed through its parameters.
Still, you can try a third trick when the cost of converting from char arrays to Strings proves too high. The trick exploits java.lang.String.substr()'s ability to avoid data copying: the method implementation exploits String immutability and creates a shallow String object that shares the char content array with the original String but has its internal start and end indices adjusted correspondingly. To make an example, new String("smiles").substring(1,5) is a String configured to start at index 1 and end at index 4 within a char buffer "smiles" shared by reference with the originally constructed String. You can exploit that fact as follows: given a large String set, you can merge its char content into one large char array, create a String out of it, and recreate the original Strings as subStrings of this master String, as the following method illustrates:
public static String [] compactStrings (String [] strings)
{
String [] result = new String [strings.length];
int offset = 0;
for (int i = 0; i < strings.length; ++ i)
offset += strings [i].length ();
// Can't use StringBuffer due to how it manages capacity
char [] allchars = new char [offset];
offset = 0;
for (int i = 0; i < strings.length; ++ i)
{
strings [i].getChars (0, strings [i].length (), allchars, offset);
offset += strings [i].length ();
}
String allstrings = new String (allchars);
offset = 0;
for (int i = 0; i < strings.length; ++ i)
result [i] = allstrings.substring (offset,
offset += strings [i].length ());
return result;
}
The above method returns a new set of Strings equivalent to the input set but more compact in memory. Recollect from earlier measurements that every char[] adds 16 bytes of overhead; effectively removed by this method. The savings could be significant when cached data comprises
mostly short Strings. When you apply this trick to the same 90,000-word dictionary mentioned above, the memory size drops from 5.6 MB to 4.2
MB, a 30 percent reduction. (An astute reader will observe in that particular example the Strings tend to share many prefixes and the compactString() method could be further optimized to reduce the merged char array's size.)
As a side effect, compactString() also removes StringBuffer-related inefficiencies mentioned earlier.
To many, the techniques I presented may seem like micro-optimizations not worth the time it takes to implement them. However, remember the applications I had in mind: server-side applications that cache massive amounts of data in memory to achieve performance impossible when data comes from a disk or database. Several hundred megabytes of cached data represents a noticeable fraction of maximum heap sizes of today's 32-bit JVMs. Shaving 30 percent or more off is nothing to scoff at; it could push an application's scalability limits quite noticeably. Of course, these tricks cannot substitute for beginning with well-designed data structures and profiling your application to determine its actual hot spots. In any case, you're now more aware of how much memory your objects consume.
Read more about Tools & Methods in JavaWorld's Tools & Methods section.