Does an object exist if you can't test its identity?

Can interned Strings be garbage collected?

December 12, 2003

Q: Do interned Strings get garbage collected?

A: Unless you're a Java novice, I am sure you have seen numerous discussions about how Java treats interned Strings and String literals. It is common to see explanations of how s1 and s2 in:

    String s1 = "JavaWorld";
    String s2 = ("Java" + new String ("World")).intern ();

will always be identical (s1 == s2 always true). In a way, this is correct. However, there is another, more subtle way of looking at what happens with interned Strings that I discuss below.

Yes, interned Strings are garbage collectable

The documentation regarding java.lang.String.intern() might make you think that the JVM maintains a String "pool" that stores all interned Strings. Some people also go as far as to claim that once a String is interned, it stays in this pool forever. While such a pool might indeed be used by the implementation, to the best of my knowledge, existing specifications do not forbid reclamation of unused data in it. (Nor do they explicitly require it; but not allowing this pool to shrink would make for a less than stellar JVM implementation. Indeed, code that interned too many Strings could crash earlier JVMs.)

Furthermore, it is incorrect to assume that to uphold the String.intern() contract, the VM must keep the same interned String instance in its internal pool at all times. String.intern() guarantees that if two strings s1 and s2 are equal, then s1.intern() == s2.intern(). To verify this you must have at least two strong references that are the results of s1.intern() and s2.intern(), and you must have them at the same point in time. Should all such references get cleared at any moment and another String equal to s1 and s2 get interned afterwards, the JVM would be within its rights to throw away the original interned String instance and re-intern the same value again when necessary.
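As a quick illustration of the contract itself (not of garbage collection), here is a minimal sketch that builds two equal but distinct String objects at run time and checks that interning both yields a single canonical instance while both references are held. The class and method names are hypothetical and exist only for this example.

```java
final class InternContractDemo
{
    // Returns true if interning both arguments yields the very same object.
    // Both interned references are strongly held while they are compared.
    static boolean sameCanonicalInstance (String a, String b)
    {
        String ia = a.intern ();
        String ib = b.intern ();
        return ia == ib;
    }

    public static void main (String [] args)
    {
        // Two distinct String objects with equal contents, built at run time:
        String s1 = new String (new char [] {'J', 'a', 'v', 'a'});
        String s2 = new StringBuilder ("Ja").append ("va").toString ();

        System.out.println (s1 == s2);                       // false: different objects
        System.out.println (s1.equals (s2));                 // true: equal contents
        System.out.println (sameCanonicalInstance (s1, s2)); // true: intern() contract
    }
}
```

Note that the check says nothing about which instance plays the canonical role over time. It only compares two references that are alive at the same moment.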

Does a falling tree make a sound if no one is around to hear it? By the same token, does an interned String need to exist if there are no references to it? (As it turns out, things are trickier than this.)

Without using something like a custom JVMPI (JVM Profiler Interface) agent, can garbage collection of interned Strings be seen in pure Java code? I present some strong evidence below. My code example exploits the fact that Sun Microsystems' JVMs use native object pointer values (perhaps after a small bit shift) as Java identity hash values. Thus, if you see different identity hash codes (System.identityHashCode()), you can be quite certain you're looking at distinct Java object handles. The trick is necessary because comparing references directly would require keeping the first one, and I must instead release the original interned String to allow it to be garbage collected. Here is the code:

public class Main
{
    public static void main (String [] args)
    {
        char [] c = new char [] {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'};
        
        String s1 = new String (c).intern ();
        System.out.println (System.identityHashCode (s1));
        s1 = null; // Release s1 to GC
        
        runGC ();
        
        String s2 = new String (c).intern ();
        System.out.println (System.identityHashCode (s2));
    }
    private static void runGC ()
    {
        for (int r = 0; r < 4; ++ r) _runGC ();
    }
    private static void _runGC ()
    {
        long usedMem1 = usedMemory (), usedMem2 = Long.MAX_VALUE;
        for (int i = 0; (usedMem1 < usedMem2) && (i < 500); ++ i)
        {
            s_runtime.runFinalization ();
            s_runtime.gc ();
            Thread.yield ();
            
            usedMem2 = usedMem1;
            usedMem1 = usedMemory ();
        }
    }
    private static long usedMemory ()
    {
        return s_runtime.totalMemory () - s_runtime.freeMemory ();
    }
    private static final Runtime s_runtime = Runtime.getRuntime ();
} // End of class

On Sun's 1.4.2 JVM, running Main produces:

>java -cp bin Main
17332331
26533766 

As you can see, s1 is interned and later garbage collected. s2 represents the same string value but is re-interned as a new object. In other words, Java does not prevent the "same" interned String from being different objects at different times. Comment out the s1 = null; line above and the same interned String handle will be used throughout, as you would expect:

>java -cp bin Main
17332331
17332331
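A complementary way to probe the same behavior, without relying on how a particular JVM computes identity hash codes, is to hold the interned String only through a java.lang.ref.WeakReference and check whether the reference is cleared after garbage collection. The following is a sketch under stated assumptions: the class and method names are hypothetical, System.gc() is only a request, and whether the released String is actually reclaimed depends on the JVM and its collector.

```java
import java.lang.ref.WeakReference;

final class WeakInternProbe
{
    // Interns a String built at run time and returns only a weak reference to it.
    static WeakReference<String> internWeakly (char [] chars)
    {
        return new WeakReference<> (new String (chars).intern ());
    }

    // Requests garbage collection a few times; returns true if the referent was cleared.
    static boolean clearedAfterGC (WeakReference<?> ref)
    {
        for (int i = 0; i < 20 && ref.get () != null; ++ i)
        {
            System.gc ();
            Thread.yield ();
        }
        return ref.get () == null;
    }

    public static void main (String [] args)
    {
        // Built from char arrays so that no class constant pool mentions these values:
        char [] kept = {'k', 'e', 'p', 't', '-', 'v', 'a', 'l', 'u', 'e'};
        char [] dropped = {'d', 'r', 'o', 'p', 'p', 'e', 'd', '-', 'v', 'a', 'l', 'u', 'e'};

        String held = new String (kept).intern ();
        WeakReference<String> heldRef = new WeakReference<> (held);
        WeakReference<String> droppedRef = internWeakly (dropped);

        // Stays false for as long as 'held' is strongly reachable:
        System.out.println ("held cleared:    " + clearedAfterGC (heldRef));
        // JVM-dependent: true means the interned String was reclaimed:
        System.out.println ("dropped cleared: " + clearedAfterGC (droppedRef));

        System.out.println (held); // Keeps 'held' strongly reachable until here
    }
}
```

If the second line reports true, the interned String was reclaimed even though an equal value could be interned again later. A false result proves nothing either way, since the collector is free to ignore the request.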

String literals are no exception

Perhaps you're thinking at this point that the above applies only to Strings "manually" interned via an explicit call to String.intern() and that compile-time String literals are different?

The answer is both yes and no. The same argument from above applies to compile-time String literals as well. When a Java compiler sees "JavaWorld" in:

    String foo ()
    {
        return "JavaWorld";
    }

it compiles this code to a .class definition that embeds "JavaWorld" in a special data entry within a constant_pool table. The JVM uses this data (basically, a UTF-8 encoded string) at runtime to create a java.lang.String instance that contains the requested {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'} sequence of characters.

Note that this doesn't happen at classloading and initialization time. It happens on demand, when (and if) foo() is executed for the first time. However, whenever such a String instance is created, it must undergo the usual interning process: all currently loaded classes that referenced this same string literal in their code will share the same String instance. But when all such classes have been unloaded there's no need to continue to maintain this interned data.
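The sharing described above is easy to check while the classes involved are loaded. The sketch below uses two hypothetical nested classes, A and B, that each return the same literal. It also shows that a String built at run time joins the shared instance only after intern() is called.

```java
final class LiteralSharingDemo
{
    // Two separate classes whose constant pools both mention "JavaWorld":
    static final class A { static String get () { return "JavaWorld"; } }
    static final class B { static String get () { return "JavaWorld"; } }

    public static void main (String [] args)
    {
        String fromA = A.get ();
        String fromB = B.get ();

        // A fresh object with the same characters, created at run time:
        String built = new String (new char [] {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'});

        System.out.println (fromA == fromB);           // true: literals share one instance
        System.out.println (built == fromA);           // false: distinct object
        System.out.println (built.intern () == fromA); // true: interning finds the shared one
    }
}
```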

One subtle difference exists here from the case of manually interned Strings. A String instance representing a literal must not be garbage collected for as long as its parent class is loaded and referenced. This is so that code relying on object identity behaves as expected:

    IdentityHashMap map = new IdentityHashMap ();
    map.put ("JavaWorld", "VALUE");
    // ... later, in the same or different class:
    String value = (String) map.get ("JavaWorld");

It would be very confusing if the second "JavaWorld" literal above sometimes represented a different object instance than the first one (resulting in a null return from map.get()) because the first one has been garbage collected. (Well, this can't really happen above because the map implementation retains references to all keys, but pretend that it doesn't.) Thus, unlike manually interned Strings, the lifetime of all String instances derived from compile-time string literals must be scoped to the set of their parent classes. It doesn't, however, need to be scoped to the JVM's lifetime.
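To make the identity-based lookup concrete, here is a small runnable sketch. It uses generics, so it assumes a newer Java version than the 1.4.2 JVM used elsewhere in this article, and the class and method names are hypothetical. It shows that an IdentityHashMap finds the entry only when given the very same String instance, which is exactly what equal literals are expected to provide.

```java
import java.util.IdentityHashMap;
import java.util.Map;

final class IdentityKeyDemo
{
    // Builds an identity-based map keyed by the "JavaWorld" literal.
    static Map<String, String> newMap ()
    {
        Map<String, String> map = new IdentityHashMap<> ();
        map.put ("JavaWorld", "VALUE");
        return map;
    }

    public static void main (String [] args)
    {
        Map<String, String> map = newMap ();

        // Equal contents, but a different object than the literal:
        String copy = new String (new char [] {'J', 'a', 'v', 'a', 'W', 'o', 'r', 'l', 'd'});

        System.out.println (map.get ("JavaWorld"));    // VALUE: same literal instance
        System.out.println (map.get (copy));           // null: equal but not identical
        System.out.println (map.get (copy.intern ())); // VALUE: intern() yields the literal's instance
    }
}
```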

The next code example demonstrates how equal String literals do not always correspond to the same String instance in a program. It is more involved than the first example because I need to effect class unloading. To unload a class I must clear all references to its instances, its Class object, and the class's defining classloader. Because I can't do the latter with the usual application classloader, my code uses an extra URLClassLoader instance. This custom classloader is also constructed with a null parent to prevent the default delegation to the application classloader:

import java.io.File;
import java.net.URL;
import java.net.URLClassLoader;
public class Main
{
    public static void main (String [] args) throws Exception
    {
        // File.toURI ().toURL () escapes special characters that File.toURL () does not:
        URL [] customClassPath = new URL [] {new File (args [0]).toURI ().toURL ()};
        {
            ClassLoader loader = new URLClassLoader (customClassPath, null);
            Object obj = loader.loadClass ("Main$X").newInstance ();
            
            String s1 = obj.toString ();
            System.out.println (System.identityHashCode (s1));
            s1 = null;
            
            // Release X to GC:
            obj = null;
            loader = null;
        }
        runGC ();
        {
            ClassLoader loader = new URLClassLoader (customClassPath, null);
            Object obj = loader.loadClass ("Main$Y").newInstance ();
            
            String s2 = obj.toString ();
            System.out.println (System.identityHashCode (s2));
        }
    }
        
    public static class X
    {
        public String toString ()
        {
            return "JavaWorld";
        }
        
    } // End of nested class
    
    public static class Y
    {
        public String toString ()
        {
            return "JavaWorld";
        }
        
    } // End of nested class
    // ... the rest of Main code as before ...
} // End of class

For the sake of simplicity, I also cheat a little and pass Strings via Object.toString() overrides. When you run this in Sun's 1.4.2 JVM (assume that the bin directory contains your compiled classes), you should see a result similar to this:

>java -cp bin Main bin
26533766
8187137

As you can see, after class X is unloaded, the "JavaWorld" string literal is re-interned by class Y. What appears to be the same global String literal in a program is in fact two different objects at different times during execution. Did you know this was possible?

Since Java does not allow conversion of object pointers to other data types, the only way to remember an object's identity is to retain the object pointer itself. If you relinquish this pointer, you give up all rights to know what happens to the object afterwards. Since Java Strings are immutable, for most programs it won't matter whether the same String literal is represented by the same String instance at all times. But occasionally a curious programmer might figure out what happens under the hood using tricks like System.identityHashCode().

Vladimir Roubtsov has programmed in a variety of languages for more than 14 years, including Java since 1995. Currently, he develops enterprise software as a senior engineer for Trilogy in Austin, Texas.
