Don't be strung along

The String class's strange behavior explained

September 6, 2002

Q I've received quite a collection of string questions in my JavaWorld Java Q&A mailbox. Here is just a sampling:

  1. "If I create two strings the following way:
        String string1 = "hello world";
        String string2 = new String("hello world");

    then string1 and string2 will have the same hash code. Does that mean that they are actually the same object in the JVM?"

  2. "In Java, if I create a String object, I can compare it to other String objects using the equals() method. However, if I initialize a String like this:
        String str = "hello";

    and another like this:

        String s = "hello";

    then I can compare them using the == sign. Why?"

  3. "If I code this:
        String a = "hello";
        String b = "hello";
        String c = new String("hello");
        String d = "hello";

    a and b both refer to the same "hello" String object in memory, whereas c refers to a separate "hello" String object that exists concurrently with the first "hello" String object. Therefore, two "hello" objects exist in memory. Which of the two "hello" objects will d refer to? How is it decided?"



A

Let's examine each question in turn.

Question 1: Are they the same object?

You are correct: both objects will have the same hash code. As stated in the Javadocs, the string's hash code is computed according to the following formula:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] using int arithmetic, where s[i] is the ith character of the string, n is the length of the string, and ^ indicates exponentiation. (The hash value of the empty string is zero.)
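
To make the formula concrete, here is a minimal sketch that evaluates it by hand and compares the result with String.hashCode(). The class name HashFormulaDemo and the method manualHash are my own, invented for illustration; only hashCode() comes from the JDK.

```java
// Hypothetical demo class; only String.hashCode() is a real JDK method.
class HashFormulaDemo {

    // Evaluates s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1] in int arithmetic.
    // Horner's rule (multiply the running total by 31, then add the next
    // character) gives the same sum without explicit exponentiation.
    static int manualHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i); // int overflow simply wraps around
        }
        return h;
    }

    public static void main(String[] args) {
        String text = "hello world";
        System.out.println(manualHash(text) == text.hashCode()); // prints true
        System.out.println(manualHash(""));                      // prints 0
    }
}
```

Because the result depends only on the characters, any two strings with the same character sequence must produce the same value.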


Since both strings contain the same character sequence, the hashCode() method computes the same value for each. That said, string1 and string2 do not refer to the same object. They refer to two different objects! The statement String string1 = "hello world"; results in one String object: "hello world". Explicitly calling new in String string2 = new String("hello world"); forces the creation of a second String object in memory with the same contents.
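
A short sketch of the reader's two declarations shows the difference between an equal hash code and object identity. The class SameHashDemo and its two helper methods are hypothetical names used only for this example.

```java
// Hypothetical demo class built around the reader's two declarations.
class SameHashDemo {

    // True when both strings report the same hash code.
    static boolean sameHash(String a, String b) {
        return a.hashCode() == b.hashCode();
    }

    // True only when both references point to the very same object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String string1 = "hello world";
        String string2 = new String("hello world");

        System.out.println(sameHash(string1, string2));   // prints true
        System.out.println(sameObject(string1, string2)); // prints false
        System.out.println(string1.equals(string2));      // prints true
    }
}
```

An equal hash code therefore tells you nothing about whether two references share one object.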

String allocation, like all object allocation, costs both time and memory. To cut down the number of String objects created in the JVM, the String class keeps a pool of strings. Each time your code uses a string literal, the pool is checked. If an equal string already exists in the pool, a reference to the pooled instance is returned. If not, a new String object is instantiated and placed in the pool. Java can make this optimization because strings are immutable and can be shared without fear of data corruption.

Unfortunately, creating a string through new defeats this pooling mechanism: every call creates another String object, even if an equal string already exists in the pool. Considering all that, avoid new String unless you specifically know that you need it!
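
The pooling behavior can be observed with reference comparisons, as in the sketch below. It also uses String.intern(), a standard JDK method that returns the pooled instance for a given string. The class StringPoolDemo and the helper canonical are hypothetical names for this example.

```java
// Hypothetical demo class; String.intern() is the only JDK call of note.
class StringPoolDemo {

    // Returns the pooled instance that has the same characters as s.
    static String canonical(String s) {
        return s.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";
        String another = "hello";            // same pooled instance as literal
        String fresh = new String("hello");  // new bypasses the pool
        String pooled = canonical(fresh);    // back to the pooled instance

        System.out.println(literal == another); // prints true
        System.out.println(literal == fresh);   // prints false
        System.out.println(literal == pooled);  // prints true
    }
}
```

The == comparisons here are for demonstration only; they inspect object identity, not string contents.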

Question 2: Why can we use equals() and == on strings?



The answer to Question 2 directly relates to the answer to Question 1. Because string literals are pooled, the following code causes the JVM to create just one String object:

    String str = "hello";
    String s = "hello";

Thus, str and s refer to the same object, so str == s returns true. However, relying on == to check string equality is unsafe. Let's say someone writes String tricky = new String("hello"). In that case, the JVM will not check whether a "hello" String object already exists. Instead, the new call forces the JVM to allocate another String object. The expression tricky == s then returns false, since tricky and s are not references to the same object, even though tricky.equals(s) returns true.
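
Here is a minimal sketch of the safe alternative: compare contents rather than references. The class StringEqualityDemo and the helper sameText are hypothetical, and the helper assumes Java 7 or later, because it relies on java.util.Objects.equals for null safety.

```java
import java.util.Objects;

// Hypothetical demo class; assumes Java 7+ for java.util.Objects.
class StringEqualityDemo {

    // Compares contents and tolerates null on either side.
    static boolean sameText(String a, String b) {
        return Objects.equals(a, b);
    }

    public static void main(String[] args) {
        String s = "hello";
        String tricky = new String("hello");

        System.out.println(tricky == s);         // prints false
        System.out.println(tricky.equals(s));    // prints true
        System.out.println(sameText(tricky, s)); // prints true
        System.out.println(sameText(null, s));   // prints false
    }
}
```

Calling equals() gives the same answer no matter how either string was created, which is why it is the dependable choice for content comparison.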
