Java 101: Java's character and assorted string classes support text-processing

Explore Character, String, StringBuffer, and StringTokenizer


Java also offers a variety of String constructors for creating String objects. I detail three below:

  1. public String(char [] value) creates a new String object that contains a copy of all characters found in the value array parameter. If value is null, this constructor throws a NullPointerException object.
  2. public String(char [] value, int offset, int count) creates a new String that contains a portion of the characters found in value. Copying begins at the offset array index and continues for count characters. If value is null, this constructor throws a NullPointerException object. If offset or count contains a value that leads to an invalid array index, this constructor throws an IndexOutOfBoundsException object.
  3. public String(String original) creates a new String that contains the same characters as the original-referenced String.

The following code demonstrates the first two constructors:

char [] trueFalse = { 't', 'f', 'T', 'F' };
String s1 = new String (trueFalse);
String s2 = new String (trueFalse, 2, 2);

After this fragment executes, s1 references a String containing tfTF, and s2 references a String containing TF.

The following code demonstrates the third constructor:

String s3 = new String ("123");

That fragment passes a reference to a string literal-based String containing 123 to the String(String original) constructor. That constructor copies original's contents to the new s3-referenced String.
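The copying behavior and the exception described above can be checked with a short sketch. The class name StringCtorDemo and the helper safeSlice are hypothetical names introduced only for this illustration; returning null for an invalid range is an assumption of the sketch, not something the constructors do.

```java
// Sketch: the char-array constructors copy the array's characters, so later
// changes to the array do not affect the String that was built from it.
class StringCtorDemo
{
   // Hypothetical helper: builds a String from part of an array and
   // returns null when offset/count lead to an invalid array index.
   static String safeSlice(char[] value, int offset, int count)
   {
      try
      {
         return new String(value, offset, count);
      }
      catch (IndexOutOfBoundsException e)
      {
         return null;
      }
   }

   public static void main(String[] args)
   {
      char[] trueFalse = { 't', 'f', 'T', 'F' };
      String s1 = new String(trueFalse);
      trueFalse[0] = 'x';                             // change the array afterward
      System.out.println(s1);                         // tfTF (the copy is unaffected)
      System.out.println(safeSlice(trueFalse, 2, 2)); // TF
      System.out.println(safeSlice(trueFalse, 3, 2)); // null (index 4 is out of range)
   }
}
```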

No matter how many times the same string literal appears in source code, the compiler ensures that only one copy is stored in the class file's constant pool. Furthermore, the compiler evaluates string constant expressions (such as "a" + 3) and ensures that their results end up in the constant pool as string literals, without duplicates. When a classloader creates Strings from all string literal entries in the constant pool, each String's contents are unique. The classloader interns, or confines, those Strings in a common string memory pool located in JVM-managed memory.
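A minimal sketch of the constant-expression behavior follows. The class and method names are hypothetical. The second method is included for contrast: because its operand is an ordinary (non-final) variable, the concatenation is not a constant expression and is carried out at runtime.

```java
// Sketch: a string constant expression is folded at compile time and shares
// the pooled literal; a runtime concatenation produces a separate object.
class ConstantPoolDemo
{
   static boolean constantExpressionShared()
   {
      String x = "a3";
      String y = "a" + 3;   // constant expression, evaluated by the compiler
      return x == y;        // same pooled String
   }

   static boolean runtimeConcatShared()
   {
      String a = "a";       // ordinary variable, so a + 3 is not a constant expression
      String y = a + 3;     // built at runtime
      return y == "a3";     // different object from the pooled literal
   }

   public static void main(String[] args)
   {
      System.out.println(constantExpressionShared()); // true
      System.out.println(runtimeConcatShared());      // false
   }
}
```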

For performance reasons, Strings created at runtime are not automatically interned in the common string memory pool. Checking whether a String already exists in that pool takes time, especially if the pool already holds many Strings. However, thanks to a String method, code can intern newly created String objects into the pool, which I'll show you how to accomplish later in this article.

For proof that string literal-based Strings are stored in the common string memory pool and new Strings are not, consider the following code fragment:

String a = "123";
String b = "123";
String c = new String (a);
System.out.println ("a == b: " + (a == b));
System.out.println ("a == c: " + (a == c));

System.out.println ("a == b: " + (a == b)); outputs a == b: true because the compiler stores one copy of 123 in the class file's constant pool. At runtime, a and b receive the same reference to the 123 String object that exists in the common string memory pool. In contrast, System.out.println ("a == c: " + (a == c)); outputs a == c: false because the a-referenced (and b-referenced) String is stored in the common string memory pool and the c-referenced String is not. If the c-referenced String were in the common string memory pool, c, b, and a would all reference the same String (since that pool contains no duplicates), and System.out.println ("a == c: " + (a == c)); would output a == c: true.
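As a preview of the intern() method described in the method sampler, the following sketch extends the fragment above. The class name, the method name, and the variable d are hypothetical; the point is only that the reference returned by intern() is the pooled one.

```java
// Sketch: intern() returns the pooled String with the same contents,
// so the returned reference compares equal (==) to the literal's reference.
class InternDemo
{
   static boolean internedMatchesLiteral()
   {
      String a = "123";          // pooled literal
      String c = new String(a);  // new object, not in the pool
      String d = c.intern();     // reference to the pooled "123"
      return a != c && a == d;
   }

   public static void main(String[] args)
   {
      String a = "123";
      String c = new String(a);
      String d = c.intern();
      System.out.println("a == c: " + (a == c)); // a == c: false
      System.out.println("a == d: " + (a == d)); // a == d: true
   }
}
```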

Interning Strings in the common string memory pool poses a problem. Because that pool does not permit duplicates, what happens if code interned a String and then changed that object's contents? The pool might then contain two Strings with the same contents, which defeats the string memory savings that internment in the pool provides. For that reason, Java does not let code modify a String. Thus, Strings are immutable, or unchangeable.

Note
Some String methods (such as public String toUpperCase(Locale l)) appear to modify a String, but in actuality, don't. Instead, such methods create a new String containing a modified string.
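The following sketch illustrates the note: calling toUpperCase(Locale l) leaves the original String untouched and hands back a new one. The class name ImmutableDemo and the helper shout are hypothetical names chosen for this example.

```java
import java.util.Locale;

// Sketch: a "modifying" method returns a new String; the original is unchanged.
class ImmutableDemo
{
   static String shout(String s)
   {
      return s.toUpperCase(Locale.ENGLISH);
   }

   public static void main(String[] args)
   {
      String s = "hello";
      String t = shout(s);
      System.out.println(s); // hello (unchanged)
      System.out.println(t); // HELLO (a different String)
   }
}
```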

String method sampler

String contains more than 50 methods (not including constructors), but we'll examine only 13:

Note
Many String methods require an index (also known as an offset) argument for accessing a character in the String object's value array (or a character array argument). That index/offset is always zero-based: index/offset 0 refers to the array's first character.
  • public char charAt(int index) extracts the character at the index position in the current String object's value array and returns that character. This method throws an IndexOutOfBoundsException object if index is negative or equals/exceeds the string's length. Example: String s = "Hello"; System.out.println (s.charAt (0)); (output: H).
  • public int compareToIgnoreCase(String anotherString) performs a lexicographic (dictionary order) case-insensitive comparison between characters in the current String's value array and the value array of the anotherString-referenced String. A zero return value indicates that both arrays contain the same characters; a positive return value indicates that the current String's value array identifies a string that follows the string that anotherString's value array represents; and a negative value indicates that anotherString's string follows the current String's string. This method throws a NullPointerException object if anotherString is null. Example: String s = "abc"; String t = "def"; System.out.println (s.compareToIgnoreCase (t)); (output: -3).
  • public String concat(String str) creates a new String containing the current String's characters followed by the str-referenced String's characters, and returns a reference to the new String. However, if str contains no characters, a reference to the current String returns. Example: String s = "Hello,"; System.out.println (s.concat (" World")); (output: Hello, World). Although you can choose concat(String str), the string concatenation operator (+) produces more compact source code. For example, String t = "a"; String s = t + "b"; is more compact than String t = "a"; String s = t.concat ("b");. Because the compiler converts String t = "a"; String s = t + "b"; to String t = "a"; String s = new StringBuffer ().append (t).append ("b").toString (); (compilers for Java 5 and later use other strategies, such as StringBuilder), using concat(String str) might seem cheaper. In practice, their execution times prove similar.
    Caution
    Do not use either concat(String str) or the string concatenation operator in a loop that executes repeatedly; that can affect your program's performance. (Each approach creates several objects, which can increase garbage collections, and several methods are called behind the scenes.)
  • public static String copyValueOf(char [] data) creates a new String containing a copy of all characters in the data array and returns the new String's reference. Example: char [] yesNo = { 'y', 'n', 'Y', 'N' }; String s = String.copyValueOf (yesNo); System.out.println (s); (output: ynYN).
  • public boolean equalsIgnoreCase(String anotherString) performs a case-insensitive comparison of the current String's characters with anotherString's characters. If those characters match (from a case-insensitive perspective), this method returns true. But if either the characters do not match or if anotherString contains a null reference, this method returns false. Example: System.out.println ("Abc".equalsIgnoreCase ("aBC")); (output: true).
  • public int indexOf(int ch) returns the index of ch's first occurrence in the current String's value array. If that character does not exist, -1 returns. Example: System.out.println ("First index = " + "The quick brown fox.".indexOf ('o')); (output: First index = 12).
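  • public String intern() returns a reference to the String in the common string memory pool that has the same contents as the current String, first adding the current String to the pool if no such String exists there. Because the pooled reference is the return value, assign it to a variable. Example: String potentialNapoleonQuote = new String ("Able was I, ere I saw Elba!"); potentialNapoleonQuote = potentialNapoleonQuote.intern ();.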
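    Tip
    To quicken string searches, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, each distinct string has a unique reference, so two interned Strings have the same contents exactly when their references are equal. Using == to compare references proves faster than using a method to compare a string's characters, but the comparison is valid only when both Strings have been interned.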
  • public int lastIndexOf(int ch) returns the index of ch's last occurrence in the current String's value. If that character does not exist, -1 returns. Example: System.out.println ("Last index = " + "The quick brown fox.".lastIndexOf ('o')); (output: Last index = 17).
  • public int length() returns the value stored in count. In other words, this method returns a string's length. If the string is empty, length() returns 0. Example: System.out.println ("abc".length ()); (output: 3).
    Caution
    Confusing length() with length leads to compiler errors. length() is a method that returns the current number of characters in a String's value array, whereas length is a read-only array field that returns the maximum number of elements in an array.
  • public String substring(int beginIndex, int endIndex) creates a new String that contains every character in the string beginning at beginIndex and ending at one position less than endIndex, and returns that object's reference. However, if beginIndex contains 0 and endIndex contains the string's length, this method returns a reference to the current String. Furthermore, if beginIndex is negative, endIndex is greater than the string's length, or beginIndex is greater than endIndex, this method throws an IndexOutOfBoundsException object. Example: System.out.println ("Test string.".substring (5, 11)); (output: string).
  • public char [] toCharArray() creates a new character array, copies the contents of the current String's value array to the new character array, and returns the new array's reference. Example: String s = new String ("account"); char [] ch = s.toCharArray ();.
  • public String trim() completes one of two tasks:

    1. Creates a new String with the same contents as the current String—except for leading and trailing white space characters (that is, characters with Unicode values less than or equal to 32)—and returns that reference
    2. Returns the current String's reference if no leading/trailing white space characters exist

    Example: System.out.println ("[" + " \tabcd ".trim () + "]"); (output: [abcd]).

  • public static String valueOf(int i) creates a new String containing the character representation of i's integer value and returns that object's reference. Example: String s = String.valueOf (20); s += " dollars"; System.out.println (s); (output: 20 dollars).
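The caution about concat(String str) and the concatenation operator in loops suggests accumulating text in a single buffer instead. The following is a minimal sketch of that idea using StringBuffer; the class name JoinDemo and the method joinNumbers are hypothetical, and the comma-separated format is just an example.

```java
// Sketch: build a string inside a loop with one StringBuffer rather than
// creating a new String on every pass with + or concat().
class JoinDemo
{
   // Returns "0,1,2,...,n-1" (an empty string when n is 0 or negative).
   static String joinNumbers(int n)
   {
      StringBuffer sb = new StringBuffer();
      for (int i = 0; i < n; i++)
      {
         if (i > 0)
            sb.append(',');
         sb.append(i);           // appends i's decimal representation
      }
      return sb.toString();      // one String is created at the end
   }

   public static void main(String[] args)
   {
      System.out.println(joinNumbers(5)); // 0,1,2,3,4
   }
}
```

On Java 5 and later, StringBuilder offers the same append methods without synchronization and could be substituted here.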

To demonstrate String's charAt(int index) and length() methods, I prepared a HexDec hexadecimal-to-decimal conversion application:

Listing 2: HexDec.java

// HexDec.java
// Hexadecimal to Decimal
class HexDec
{
   public static void main (String [] args)
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java HexDec hex-character-sequence");
          return;
      }
      // Convert argument from hexadecimal to decimal
      int dec = 0;
      String s = args [0];
      for (int i = 0; i < s.length (); i++)
      {
           char c = s.charAt (i); // Extract character
           // If character is an uppercase letter, convert character to
           // lowercase
           if (Character.isUpperCase (c))
               c = Character.toLowerCase (c);
           if (!(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f'))
           {
               System.err.println ("invalid character detected");
               return;
           }
           dec <<= 4;
           if (c <= '9')
               dec += (c - '0');
           else
               dec += (c - 'a' + 10);
      }
      System.out.println ("decimal equivalent = " + dec);
   }
}

If you want to convert the hexadecimal number 7fff to decimal, run java HexDec 7fff. You then observe the following output:

decimal equivalent = 32767
Caution
HexDec includes the expression i < s.length () in its for loop header. For long loops, do not call length() in a for loop header because of method call overhead (which can affect performance). Instead, call that method and save its return value before entering the loop, and then use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs, like HexDec, it doesn't matter if I call length() in the for loop header. But for professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)
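As a rough sketch, HexDec's conversion loop can be moved into a method that saves length() before the loop, and its result can be cross-checked against the standard Integer.parseInt(String s, int radix) method. The class name HexCheck, the method hexToDec, and the use of -1 to signal an invalid character are assumptions made for this illustration. Like HexDec, the method does not guard against overflow on long inputs.

```java
// Sketch: HexDec's digit-by-digit conversion as a method, with the string's
// length saved before the loop, compared against Integer.parseInt (s, 16).
class HexCheck
{
   // Returns the decimal value of a short hexadecimal string, or -1 if an
   // invalid character is found (a convention assumed for this sketch).
   static int hexToDec(String s)
   {
      int dec = 0;
      int len = s.length();      // call length() once, before the loop
      for (int i = 0; i < len; i++)
      {
         char c = Character.toLowerCase(s.charAt(i));
         int digit;
         if (c >= '0' && c <= '9')
            digit = c - '0';
         else if (c >= 'a' && c <= 'f')
            digit = c - 'a' + 10;
         else
            return -1;           // invalid character detected
         dec = (dec << 4) + digit;
      }
      return dec;
   }

   public static void main(String[] args)
   {
      System.out.println(hexToDec("7fff"));           // 32767
      System.out.println(Integer.parseInt("7fff", 16)); // 32767
   }
}
```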

For another String method demonstration, see Listing 3, which shows how the intern() method and the == operator enable a rapid search of a partial list of country names for a specific country:

Listing 3: CS.java
