Java Challengers #2: String comparisons

How String methods, keywords, and operators process comparisons in the String pool

In Java, the String class encapsulates an array of characters (a char[] up to Java 8, a byte[] plus an encoding flag since Java 9). Put simply, a String is a sequence of characters used to compose words, sentences, or any other data you want.

Encapsulation is one of the most powerful concepts in object-oriented programming. Because of encapsulation, you don’t need to know how the String class works; you just need to know what methods to use on its interface.

When you look at the String class in Java, you can see how the array of char is encapsulated:


public String(char value[]) {
    this(value, 0, value.length, null);
}

To understand encapsulation better, consider a physical object: a car. Do you need to know how the car works under the hood in order to drive it? Of course not, but you do need to know what the interfaces of the car do: things like the accelerator, brakes, and steering wheel. Each of these interfaces supports certain actions: accelerate, brake, turn left, turn right. It’s the same in object-oriented programming.

My first post in the Java Challengers series introduced method overloading, which is a technique the String class uses extensively. Overloading can make your classes, String included, really flexible:


public String(String original) {}
public String(char value[], int offset, int count) {}
public String(int[] codePoints, int offset, int count) {}
public String(byte bytes[], int offset, int length, String charsetName) {}
// And so on...
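As a rough illustration of what these overloads buy you, the sketch below builds the same text through four of them. The class name StringConstructorsDemo and the method buildAll() are made up for this example, and the byte-array call uses the overload that takes a Charset rather than the charsetName variant listed above, simply to avoid a checked exception.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; only the String constructors come from the JDK.
class StringConstructorsDemo {

    // Builds the text "Duke" through four overloaded String constructors.
    static String[] buildAll() {
        char[] chars = {'D', 'u', 'k', 'e'};
        int[] codePoints = {68, 117, 107, 101}; // Unicode code points for D, u, k, e
        byte[] bytes = "Duke".getBytes(StandardCharsets.UTF_8);

        return new String[] {
            new String("Duke"),                            // String(String original)
            new String(chars, 0, chars.length),            // String(char[], int offset, int count)
            new String(codePoints, 0, codePoints.length),  // String(int[], int offset, int count)
            new String(bytes, 0, bytes.length, StandardCharsets.UTF_8) // String(byte[], int, int, Charset)
        };
    }

    public static void main(String[] args) {
        for (String s : buildAll()) {
            System.out.println(s); // each line prints Duke
        }
    }
}
```

The caller picks whichever constructor matches the data it already has, and the compiler selects the overload from the argument types.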

Rather than trying to understand how the String class works, this Java Challenger will help you understand what it does and how to use it in your code.

What is a String pool?

String is possibly the most-used class in Java. If a new object were created in the memory heap every time we used a String, we would waste a lot of memory. The String pool solves this problem by storing just one object for each String value, as shown below.

Figure 1. Strings in the String pool (image credit: Rafael Chinelato Del Nero)

Although we created String variables for the Duke and Juggy Strings, only two objects are created and stored in the memory heap. For proof, look at the following code sample. (Recall that when it is applied to objects, the “==” operator in Java compares references: it tells you whether two variables point to the same object.)


String juggy = "Juggy";
String anotherJuggy = "Juggy";
System.out.println(juggy == anotherJuggy);

This code prints true because the two variables point to the same object in the String pool. The == operator is comparing references here, not just values.
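Pooling applies to compile-time constants, not to every String with the same characters. The sketch below, using a made-up class named PoolDemo, contrasts a constant expression that the compiler folds into a single literal with a concatenation that only happens at run time.

```java
// Hypothetical demo class illustrating which Strings end up sharing a pooled object.
class PoolDemo {

    // Two identical literals refer to the same pooled object.
    static boolean literalsShareObject() {
        String juggy = "Juggy";
        String anotherJuggy = "Juggy";
        return juggy == anotherJuggy;
    }

    // "Jug" + "gy" is a constant expression, so the compiler folds it into the literal "Juggy".
    static boolean constantExpressionIsPooled() {
        String juggy = "Juggy";
        String folded = "Jug" + "gy";
        return juggy == folded;
    }

    // part is not final, so the concatenation runs at run time and produces a new object.
    static boolean runtimeConcatenationIsPooled() {
        String juggy = "Juggy";
        String part = "Jug";
        String built = part + "gy";
        return juggy == built;
    }

    public static void main(String[] args) {
        System.out.println(literalsShareObject());          // true
        System.out.println(constantExpressionIsPooled());   // true
        System.out.println(runtimeConcatenationIsPooled()); // false
    }
}
```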

An exception: The ‘new’ operator

Now look at this code. It looks similar to the previous sample, but there is a difference.


String duke = new String("duke");
String anotherDuke = new String("duke");

System.out.println(duke == anotherDuke);

Based on the previous example, you might expect this code to print true, but it actually prints false. The new operator forces the creation of a new String in the memory heap each time it is used. Thus, the JVM will create two different objects.

String pools and the intern() method

To store a String in the String pool, we use a technique called String interning. Here’s what Javadoc tells us about the intern() method:


    /**
     * Returns a canonical representation for the string object.
     *
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     *
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     *
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * 
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * The Java™ Language Specification.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     * @jls 3.10.5 String Literals
     */
    public native String intern();

The intern() method is used to store Strings in a String pool. First, it checks whether a String equal to the one you’ve created already exists in the pool. If so, it returns the pooled String; if not, it adds your String to the pool and returns it. Behind the scenes, the logic of String pooling is based on the Flyweight pattern.
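To make the Flyweight idea concrete, here is a deliberately simplified pool built on a HashMap. It is only a sketch: the class TinyStringPool and its methods are invented for this example, and the real String pool is managed inside the JVM, not implemented like this.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified pool used only to illustrate the Flyweight idea.
// The real String pool lives inside the JVM and is not implemented this way.
class TinyStringPool {
    private final Map<String, String> pool = new HashMap<>();

    // Follows the contract described in the intern() Javadoc: return the pooled
    // instance if an equal one exists, otherwise add this instance and return it.
    String intern(String candidate) {
        String existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return pool.size();
    }

    public static void main(String[] args) {
        TinyStringPool pool = new TinyStringPool();
        String first = pool.intern(new String("duke"));
        String second = pool.intern(new String("duke"));
        System.out.println(first == second); // true: both calls return the first instance
        System.out.println(pool.size());     // 1
    }
}
```

However many equal Strings are offered to the pool, callers end up sharing a single instance, which is the memory saving the Flyweight pattern aims for.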

Now, notice what happens when we use the new keyword to force the creation of two Strings:


String duke = new String("duke");
String duke2 = new String("duke");
System.out.println(duke == duke2); // The result will be false here
System.out.println(duke.intern() == duke2.intern()); // The result will be true here

Unlike the previous example with the new keyword, in this case the second comparison turns out to be true. That’s because intern() returns the single pooled String for that value, so both calls return the same object.
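One detail that is easy to miss is that intern() does not change the object it is called on; you have to use the reference it returns. The following sketch, with a made-up class named InternDemo, shows both sides of that.

```java
// Hypothetical demo class showing that only the value returned by intern() is the pooled object.
class InternDemo {

    // The reference returned by intern() is the same object as the literal.
    static boolean internedMatchesLiteral() {
        String literal = "duke";
        String fromNew = new String("duke");
        return fromNew.intern() == literal;
    }

    // Calling intern() and ignoring the result leaves fromNew pointing at its own heap object.
    static boolean originalMatchesLiteral() {
        String fromNew = new String("duke");
        fromNew.intern(); // result discarded on purpose
        return fromNew == "duke";
    }

    public static void main(String[] args) {
        System.out.println(internedMatchesLiteral()); // true
        System.out.println(originalMatchesLiteral()); // false
    }
}
```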

Equals method with the String class

The equals() method is used to verify whether the state of two Java objects is the same. Because equals() is defined in the Object class, every Java class inherits it. But the inherited version only compares references, so a class has to override equals() to compare state. Of course, String overrides equals().

Take a look:


public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    
    if (anObject instanceof String) {
        String aString = (String)anObject;
        if (coder() == aString.coder()) {
          return isLatin1() ? StringLatin1.equals(value, aString.value)
            : StringUTF16.equals(value, aString.value);
        }
    }
    
    return false;
}

As you can see, equals() compares the state of the String, that is, its characters, and not the object reference. It doesn’t matter if the object references are different; two Strings with the same characters are equal.
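The same approach works for your own classes. Below is a minimal sketch of a class that overrides equals() to compare state; the Mascot class and its fields are invented for illustration. It also overrides hashCode(), because equal objects are expected to produce equal hash codes.

```java
import java.util.Objects;

// Hypothetical value class; the name and fields are made up for this example.
class Mascot {
    private final String name;
    private final int age;

    Mascot(String name, int age) {
        this.name = name;
        this.age = age;
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // same reference, so certainly equal
        }
        if (!(other instanceof Mascot)) {
            return false; // also covers null
        }
        Mascot that = (Mascot) other;
        return age == that.age && Objects.equals(name, that.name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(name, age); // keeps hashCode consistent with equals
    }

    public static void main(String[] args) {
        Mascot first = new Mascot("Duke", 25);
        Mascot second = new Mascot("Duke", 25);
        System.out.println(first == second);      // false: two different objects
        System.out.println(first.equals(second)); // true: same state
    }
}
```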

Most common String methods

There’s just one last thing you need to know before taking the String comparison challenge. Consider these common methods of the String class:


// Removes leading and trailing whitespace
trim()
// Returns the substring from beginIndex (inclusive) to endIndex (exclusive)
substring(int beginIndex, int endIndex)
// Returns the length of the String, in char units
length()
// Replaces every match of the regular expression with the replacement
replaceAll(String regex, String replacement)
// Checks whether the String contains the given CharSequence
contains(CharSequence s)
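Here is a short sketch that exercises all five methods on one input. The class StringMethodsDemo and the helper tidy() are hypothetical names chosen for this example.

```java
// Hypothetical demo class; only the String methods come from the JDK.
class StringMethodsDemo {

    // Trims the ends, then collapses each run of whitespace into a single space.
    static String tidy(String raw) {
        return raw.trim().replaceAll("\\s+", " ");
    }

    public static void main(String[] args) {
        String tidy = tidy("  Java   Challengers  ");
        System.out.println(tidy);                   // Java Challengers
        System.out.println(tidy.length());          // 16
        System.out.println(tidy.substring(0, 4));   // Java
        System.out.println(tidy.contains("Chall")); // true
    }
}
```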

Take the String comparison challenge!

Let’s try out what you’ve learned about the String class in a quick challenge.

For this challenge, you’ll compare a number of Strings using the concepts we’ve explored. Looking at the code below, can you determine the final value of the result variable?


public class ComparisonStringChallenge {
    public static void main(String... doYourBest) {
        String result = "";
        result += " powerfulCode ".trim() == "powerfulCode"
                ? "0" : "1";

        result += "flexibleCode" == "flexibleCode" ? "2" : "3";

        result += new String("doYourBest")
                == new String("doYourBest") ? "4" : "5";

        result += new String("noBugsProject")
                .equals("noBugsProject") ? "6" : "7";

        result += new String("breakYourLimits").intern()
                == new String("breakYourLimits").intern() ? "8" : "9";

        System.out.println(result);
    }
}

Which output represents the final value of the result variable?

A: 02468
B: 12469
C: 12579
D: 12568

Check your answer against the walkthrough below.

What just happened? Understanding String behavior

In the first line of the code, we see:


result += " powerfulCode ".trim() == "powerfulCode" 
				? "0" : "1";

Although the two Strings hold the same value once trim() has run, the literal “ powerfulCode ” started out as a different String. In this case the comparison is false, because when the trim() method actually removes whitespace from either end it returns a new String created with the new operator, and that object is not the one in the String pool.

Next, we see:


result += "flexibleCode" == "flexibleCode" ? "2" : "3";

No mystery here: both literals refer to the same object in the String pool. This comparison returns true.

Next, we have:


result += new String("doYourBest") 
				== new String("doYourBest") ? "4" : "5";

Using the new reserved keyword forces the creation of two new Strings, whether they are equal or not. In this case the comparison will be false even if the String values are the same.

Next is:


result += new String("noBugsProject")
				.equals("noBugsProject") ? "6" : "7";

Because we’ve used the equals() method, the value of the String is compared and not the object reference. It doesn’t matter that the two objects are different. This comparison returns true.

Finally, we have:


result += new String("breakYourLimits").intern()
                == new String("breakYourLimits").intern() ? "8" : "9";

As you’ve seen before, the intern() method returns the String from the String pool. Both calls return the same pooled object, so in this case the comparison is true. Putting the five results together, the program prints 12568, which is answer D.

Common mistakes with Strings

It can be difficult to know if two Strings are pointing to the same object, especially when the Strings contain the same value. It helps to remember that using the reserved keyword new always results in a new object being created in memory, even if the values are the same.

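Comparing the references of Strings returned by String methods can also be tricky. Because Strings are immutable, a method can never modify the String it is called on; if the result differs from the original, the method has to return a different object, so the references will be different.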

A few examples to help clarify:


System.out.println("duke".trim() == "duke".trim());

This comparison will be true because there is nothing to trim, so the trim() method returns the same String instead of generating a new one.


System.out.println(" duke".trim() == "duke".trim()); 

In this case, the first trim() call has a leading space to remove, so it generates a new String and the references will be different.

Finally, when trim() executes its action, it creates a new String:


// Excerpt from the JDK's trim() implementation (Latin-1 path, Java 9 and later)
new String(Arrays.copyOfRange(val, index, index + len),
                          LATIN1);
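Since it is hard to tell at a glance whether a method returned the original object or a new one, the safer habit is to compare the results with equals(). The sketch below uses a made-up class named TrimComparisonDemo. The first result follows from the trim() Javadoc, which says the method returns a reference to the same String when there is nothing to remove; the second reflects the behavior described in this article rather than a documented guarantee.

```java
// Hypothetical demo class contrasting reference and value comparisons after trim().
class TrimComparisonDemo {

    // Nothing to trim: trim() returns the very same object.
    static boolean sameReferenceWhenNothingToTrim() {
        String duke = "duke";
        return duke.trim() == duke;
    }

    // Something to trim: the result is a new object, not the pooled literal.
    static boolean sameReferenceAfterTrimming() {
        return " duke".trim() == "duke";
    }

    // equals() gives the answer you usually want, whatever object trim() returned.
    static boolean sameValueAfterTrimming() {
        return " duke".trim().equals("duke");
    }

    public static void main(String[] args) {
        System.out.println(sameReferenceWhenNothingToTrim()); // true
        System.out.println(sameReferenceAfterTrimming());     // false
        System.out.println(sameValueAfterTrimming());         // true
    }
}
```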

What to remember about Strings

  • Strings are immutable, so a String’s state can’t be changed.
  • To conserve memory, the JVM keeps String literals in a String pool. When a literal is used, the JVM checks the pool and points the variable to the existing object with that value. If there is no String with that value in the pool, then the JVM adds a new one.
  • Using the == operator compares the object reference. Using the equals() method compares the value of the String. The == rule applies to all objects; equals() compares values only in classes that override it.
  • When using the new operator, a new String will be created in the memory heap, outside the String pool, even if the pool already holds a String with the same value.
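The immutability point can be checked with a few lines of code. In this sketch, with a made-up class named ImmutabilityDemo, methods that appear to modify a String actually return a new one and leave the original untouched.

```java
// Hypothetical demo class showing that String methods never modify the original object.
class ImmutabilityDemo {

    // Calls several "modifying" methods and returns the original String afterwards.
    static String originalAfterCalls() {
        String duke = "duke";
        duke.toUpperCase();      // result discarded: duke itself is unchanged
        duke.concat(" rocks");   // result discarded: duke itself is unchanged
        duke.replace('d', 'D');  // result discarded: duke itself is unchanged
        return duke;
    }

    // To "change" a String, assign the returned value to a variable.
    static String upperCased() {
        String duke = "duke";
        return duke.toUpperCase();
    }

    public static void main(String[] args) {
        System.out.println(originalAfterCalls()); // duke
        System.out.println(upperCased());         // DUKE
    }
}
```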