December 12, 2003
Do interned Strings get garbage collected?
Unless you're a Java novice, I am sure you have seen numerous discussions about how Java treats interned Strings and String literals. It is common to see explanations of how s1 and s2 in:
String s1 = "JavaWorld";
String s2 = ("Java" + new String ("World")).intern ();
will always be identical (s1 == s2 always true). In a way, this is correct. However, there is another, more subtle way of looking at what happens with interned
Strings that I discuss below.
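A minimal sketch of the usual claim, assuming a modern JDK. The class name InternIdentityDemo and its two helper methods are made up for illustration. It contrasts the run-time concatenation with and without the intern() call; without it the comparison yields false, with it true.

```java
class InternIdentityDemo {
    // The concatenation involves a non-constant operand, so it is evaluated
    // at run time and yields a fresh String that is not the literal's instance.
    static boolean sameWithoutIntern() {
        String s1 = "JavaWorld";
        String s2 = "Java" + new String("World");
        return s1 == s2;
    }

    // intern() returns the canonical instance, which is the one the literal uses.
    static boolean sameWithIntern() {
        String s1 = "JavaWorld";
        String s2 = ("Java" + new String("World")).intern();
        return s1 == s2;
    }

    public static void main(String[] args) {
        System.out.println("without intern(): " + sameWithoutIntern());
        System.out.println("with intern():    " + sameWithIntern());
    }
}
```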
The documentation regarding java.lang.String.intern() might make you think that the JVM maintains a String "pool" that stores all interned Strings. Some people also go as far as to claim that once a String is interned, it stays in this pool forever. While such a pool might indeed be used by the implementation, to the best of
my knowledge, existing specifications do not forbid reclamation of unused data in it. (Nor do they explicitly require it;
but not allowing this pool to shrink would make for a less than stellar JVM implementation. Indeed, code that interned too
many Strings could crash earlier JVMs.)
Furthermore, it is incorrect to assume that to uphold the String.intern() contract, the VM must keep the same interned String instance in its internal pool at all times. String.intern() guarantees that if two strings s1 and s2 are equal, then s1.intern() == s2.intern(). To verify this you must have at least two strong references that are the results of s1.intern() and s2.intern(), and you must have them at the same point in time. Should all such references get cleared at any moment and another String equal to s1 and s2 get interned afterwards, the JVM would be within its rights to throw away the original interned String instance and re-intern the same value again when necessary.
Does a falling tree make a sound if no one is around to hear it? By the same token, does an interned String need to exist if there are no references to it? (As it turns out, things are trickier than this.)
Without using something like a custom JVMPI (JVM Profiler Interface) agent, can garbage collection of interned Strings be seen in pure Java code? I present some strong evidence below. My code example exploits the fact that Sun Microsystems'
JVMs use native object pointer values (perhaps after a small bit shift) as Java identity hash values. Thus, if you see different
identity hash codes (System.identityHashCode()), you can be quite certain you're looking at distinct Java object handles. This trick is necessary because comparing the two references directly would require holding on to the original
interned String, which would keep it from being garbage collected. Here is the code:
public class Main
{
    public static void main (String [] args)
    {
        char [] c = new char [] {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'};
        String s1 = new String (c).intern ();
        System.out.println (System.identityHashCode (s1));
        s1 = null; // Release s1 to GC
        runGC ();
        String s2 = new String (c).intern ();
        System.out.println (System.identityHashCode (s2));
    }

    private static void runGC ()
    {
        for (int r = 0; r < 4; ++ r) _runGC ();
    }

    private static void _runGC ()
    {
        long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
        for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++ i)
        {
            s_runtime.runFinalization ();
            s_runtime.gc ();
            Thread.yield ();
            usedMem2 = usedMem1;
            usedMem1 = usedMemory ();
        }
    }

    private static long usedMemory ()
    {
        return s_runtime.totalMemory () - s_runtime.freeMemory ();
    }

    private static final Runtime s_runtime = Runtime.getRuntime ();
} // End of class
On Sun's 1.4.2 JVM, running Main produces:
>java -cp bin Main
17332331
26533766
As you can see, s1 is interned and later garbage collected. s2 represents the same string value but is re-interned as a new object. In other words, Java does not prevent the "same" interned
String from being different objects at different times. Comment out the s1 = null; line above and the same interned String handle will be used throughout, as you would expect:
>java -cp bin Main
17332331
17332331
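A weak reference offers another way to watch for the same effect without depending on pointer-derived hash codes. The following is only a sketch: the class name WeakInternProbe and its helpers are hypothetical, and because System.gc() is merely a request, whether the interned String is reported as collected may vary by JVM and collector.

```java
import java.lang.ref.WeakReference;

class WeakInternProbe {
    // Interns a String built at run time and hands back only a weak reference,
    // so no strong reference to the interned instance survives this method.
    static WeakReference<String> internWeakly(char[] chars) {
        return new WeakReference<>(new String(chars).intern());
    }

    // Requests garbage collection a few times and reports whether the referent
    // has been cleared. System.gc() is a hint, so a false result proves nothing.
    static boolean wasCollected(WeakReference<String> ref) {
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.yield();
        }
        return ref.get() == null;
    }

    public static void main(String[] args) {
        char[] c = {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'};
        WeakReference<String> ref = internWeakly(c);
        System.out.println("interned String collected: " + wasCollected(ref));
    }
}
```

As with commenting out the s1 = null; line, keeping a strong reference to the interned String alive guarantees that the weak reference is never cleared.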
Perhaps you're thinking at this point that the above applies only to Strings "manually" interned via an explicit call to String.intern() and that compile-time String literals are different?
The answer is both yes and no. The same argument from above applies to compile-time String literals as well. When a Java compiler sees "JavaWorld" in:
String foo ()
{
return "JavaWorld";
}
it compiles this code to a .class definition that embeds "JavaWorld" in a special data entry within a constant_pool table. The JVM uses this data (basically, a UTF-8 encoded string) at runtime to create a java.lang.String instance that contains the requested {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'} sequence of characters.
Note that this doesn't happen at classloading and initialization time. It happens on demand, when (and if) foo() is executed for the first time. However, whenever such a String instance is created, it must undergo the usual interning process: all currently loaded classes that referenced this same
string literal in their code will share the same String instance. But when all such classes have been unloaded there's no need to continue to maintain this interned data.
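The sharing described above can be observed directly. In this sketch the nested classes First and Second (hypothetical names) are compiled to separate .class files, each with its own constant_pool entry for the literal, yet both methods return the same instance. A String built at run time and then interned resolves to that instance too.

```java
class LiteralSharingDemo {
    // Each nested class gets its own .class file and its own constant pool.
    static class First {
        static String foo() { return "JavaWorld"; }
    }

    static class Second {
        static String bar() { return "JavaWorld"; }
    }

    // Equal literals in different classes refer to one shared String instance.
    static boolean sharedAcrossClasses() {
        return First.foo() == Second.bar();
    }

    // A manually interned String resolves to the same instance as the literal
    // for as long as a class using that literal keeps it alive.
    static boolean internMatchesLiteral() {
        char[] c = {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'};
        return new String(c).intern() == First.foo();
    }

    public static void main(String[] args) {
        System.out.println(sharedAcrossClasses());
        System.out.println(internMatchesLiteral());
    }
}
```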
One subtle difference exists here from the case of manually interned Strings. A String instance representing a literal must not be garbage collected for as long as its parent class is loaded and referenced. This
is so that code relying on identity hash codes behaves as expected:
IdentityHashMap map = new IdentityHashMap ();
map.put ("JavaWorld", "VALUE");
// ... later, in the same or a different class:
String value = (String) map.get ("JavaWorld");
It would be very confusing if the second "JavaWorld" literal above sometimes represented a different object instance than the first one (resulting in a null return from map.get()) because the first one has been garbage collected. (Well, this can't really happen above because the map implementation retains
references to all keys, but pretend that it doesn't.) Thus, unlike manually interned Strings, the lifetime of all String instances derived from compile-time string literals must be scoped to the set of their parent classes. It doesn't, however,
need to be scoped to the JVM's lifetime.
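A runnable variant of the IdentityHashMap scenario, written with generics (an assumption about the JDK in use; the class name IdentityLookupDemo and its helpers are illustrative). Looking up by the literal, or by an interned copy, finds the entry, while an equal but distinct String does not.

```java
import java.util.IdentityHashMap;
import java.util.Map;

class IdentityLookupDemo {
    // The key is the String instance that backs the "JavaWorld" literal.
    static Map<String, String> buildMap() {
        Map<String, String> map = new IdentityHashMap<>();
        map.put("JavaWorld", "VALUE");
        return map;
    }

    // Same literal, hence the same instance: the lookup succeeds.
    static String lookupByLiteral(Map<String, String> map) {
        return map.get("JavaWorld");
    }

    // Equal contents but a different object: the identity lookup fails.
    static String lookupByCopy(Map<String, String> map) {
        return map.get(new String("JavaWorld"));
    }

    // Interning the copy recovers the literal's instance.
    static String lookupByInternedCopy(Map<String, String> map) {
        return map.get(new String("JavaWorld").intern());
    }

    public static void main(String[] args) {
        Map<String, String> map = buildMap();
        System.out.println(lookupByLiteral(map));      // VALUE
        System.out.println(lookupByCopy(map));         // null
        System.out.println(lookupByInternedCopy(map)); // VALUE
    }
}
```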
The next code example demonstrates how equal String literals do not always correspond to the same String instance in a program. It is more involved than the first example because I need to effect class unloading. To unload a class
I must clear all references to its instances, its Class object, and the class's defining classloader. Because I can't do the latter with the usual application classloader, my code
uses an extra URLClassLoader instance. This custom classloader is also constructed with a null parent to prevent the default delegation to the application classloader:
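A minimal sketch of the null-parent point alone, not of the full unloading experiment. The class name IsolatedLoaderDemo and its helpers are hypothetical, and the empty URL array stands in for whatever location the unloadable class would really be loaded from. With a null parent, the loader still resolves core classes through the bootstrap loader but cannot see classes that only the application classloader knows about.

```java
import java.net.URL;
import java.net.URLClassLoader;

class IsolatedLoaderDemo {
    // A null parent means lookups are delegated to the bootstrap loader only,
    // never to the application classloader.
    static URLClassLoader newIsolatedLoader(URL[] urls) {
        return new URLClassLoader(urls, null);
    }

    // Reports whether the given loader can resolve the named class.
    static boolean canSee(ClassLoader loader, String className) {
        try {
            loader.loadClass(className);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        try (URLClassLoader isolated = newIsolatedLoader(new URL[0])) {
            // Core classes remain visible through the bootstrap loader.
            System.out.println(canSee(isolated, "java.lang.String"));
            // Application classes are not, since delegation to the app loader is cut.
            System.out.println(canSee(isolated, IsolatedLoaderDemo.class.getName()));
        }
    }
}
```

Because such a loader is the only one that defines the classes it loads from its own URLs, dropping every reference to it, to those classes, and to their instances makes them eligible for unloading.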