Java's character and assorted string classes support text-processing

Explore Character, String, StringBuffer, and StringTokenizer

Text can represent a combination of digits, letters, punctuation, words, sentences, and more. Computer programs that process text need assistance (from their associated languages) to represent and manipulate text. Java provides such assistance through the Character, String, StringBuffer, and StringTokenizer classes. In this article, you'll create objects from these classes and examine their various methods. You'll also receive answers to three mysteries: why Java regards a string literal as a String object, why String objects are immutable (and how immutability relates to string internment), and what happens behind the scenes when the string concatenation operator concatenates two strings into a single string.

Note
Future articles will cover the Character, String, StringBuffer, and StringTokenizer methods that I omit in this discussion.

The Character class

Though Java already has a character type and char keyword to represent and manipulate characters, the language also requires a Character class for two reasons:

  1. Many data structure classes require their data structure objects to store other objects—not primitive type variables. Because directly storing a char variable in these objects proves impossible, that variable's value must wrap inside a Character object, which subsequently stores in a data structure object.
  2. Java needs a class to store various character-oriented utility methods—static methods that perform useful tasks and do not require Character objects; for example, a method that converts an arbitrary character argument representing a lowercase letter to another character representing the uppercase equivalent.
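Both reasons can be seen in a few lines. The following is only a minimal sketch, not code from this article: the WrapDemo class and its upperCaseAll method are hypothetical names, and the sketch assumes a Java version that offers generics and Character.valueOf.

```java
import java.util.ArrayList;
import java.util.List;

class WrapDemo
{
   // Hypothetical helper: uppercases each char of text with a static
   // utility method, wraps the result in a Character object, and stores
   // that object in a list.
   static List<Character> upperCaseAll (String text)
   {
      List<Character> result = new ArrayList<> ();
      for (int i = 0; i < text.length (); i++)
      {
           // Reason 2: a static utility method needs no Character object
           char upper = Character.toUpperCase (text.charAt (i));
           // Reason 1: the list stores objects, so the char is wrapped
           result.add (Character.valueOf (upper));
      }
      return result;
   }

   public static void main (String [] args)
   {
      System.out.println (upperCaseAll ("abc")); // Output: [A, B, C]
   }
}
```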

Character objects

The java.lang.Character class declares a private value field of character type. A character stores in value when code creates a Character object via class Character's public Character(char c) constructor, as the following code fragment demonstrates:

Character c = new Character ('A');

The constructor stores the character that 'A' represents in the value field of a new Character object that c references. Because the Character object wraps itself around the character, Character is a wrapper class.

By calling Character's public char charValue() method, code extricates the character from the Character object. Furthermore, by calling Character's public String toString() method, code returns the character as a String object. The following code, which builds on the previous fragment, demonstrates both method calls:

System.out.println (c.charValue ());
String s = c.toString ();

System.out.println (c.charValue ()); returns value's contents and outputs those contents (A) to the standard output device. String s = c.toString (); creates a String object containing value's contents, returns the String's reference, and assigns that reference to String variable s.

Character supplies three methods that compare Character objects for ordering or other purposes. The public int compareTo(Character anotherCharacter) method compares the contents of two Characters by subtracting anotherCharacter's value field from the current Character's value field, and returns the integer result. If the result is zero, both objects are the same (based on the value field only). If the result is negative, the current Character's value is numerically less than the anotherCharacter-referenced Character's value. Finally, a positive result implies that the current Character's value field is numerically greater than anotherCharacter's value field.

A second overloaded public int compareTo(Object o) method works the same as compareTo(Character anotherCharacter) (and returns the same result), but compares the current Character and the o-referenced object (which must be of type Character, or the method throws a ClassCastException object). compareTo(Object o) allows Java's Collections Framework to sort Characters according to natural order. (A future article will discuss that method, sorting, and natural order.)

Finally, the public boolean equals(Object o) method compares the contents of the value field in the current Character with the contents of the value field in o. A Boolean true value returns if o is of type Character and if both value fields contain the same contents. Otherwise, false returns. To see the compareTo(Character anotherCharacter) and equals(Object o) methods in action, examine the following code fragment:

Character c1 = new Character ('A');
Character c2 = new Character ('B');
Character c3 = new Character ('A');
System.out.println ("c1.compareTo (c2): " + c1.compareTo (c2));
System.out.println ("c1.equals (c2): " + c1.equals (c2));
System.out.println ("c1.equals (c3): " + c1.equals (c3));

System.out.println ("c1.compareTo (c2): " + c1.compareTo (c2)); outputs -1 because A is (numerically) less than B. System.out.println ("c1.equals (c2): " + c1.equals (c2)); outputs false because the Characters that c1 and c2 reference contain different characters (A and B). Finally, System.out.println ("c1.equals (c3): " + c1.equals (c3)); outputs true because, although c1 and c3 reference different Characters, both objects contain the same character (A).
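As a brief preview of how compareTo supports sorting by natural order, here is a minimal sketch. The SortDemo class and its sortedChars method are hypothetical names, and the sketch assumes a Java version with generics and autoboxing.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortDemo
{
   // Hypothetical helper: returns the characters of text sorted by
   // Character's natural order (numeric order of the wrapped values).
   static List<Character> sortedChars (String text)
   {
      List<Character> chars = new ArrayList<> ();
      for (char ch : text.toCharArray ())
           chars.add (ch); // autoboxing wraps each char in a Character
      Collections.sort (chars); // relies on Character's compareTo method
      return chars;
   }

   public static void main (String [] args)
   {
      System.out.println (sortedChars ("bCa")); // Output: [C, a, b]
   }
}
```

Uppercase C sorts ahead of the lowercase letters because its numeric value (67) is less than the values of a (97) and b (98).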

Character-oriented utility methods

Character serves as a repository for character-oriented utility methods. Examples of those methods include:

  • public static boolean isDigit(char c), which returns a Boolean true value if c's character is a digit. Otherwise, false returns.
  • public static boolean isLetter(char c), which returns a Boolean true value if c's character is a letter. Otherwise, false returns.
  • public static boolean isUpperCase(char c), which returns a Boolean true value if c's character is an uppercase letter. Otherwise, false returns.
  • public static char toLowerCase(char c), which returns the lowercase equivalent of c's character if it is uppercase. Otherwise c's character returns.
  • public static char toUpperCase(char c), which returns the uppercase equivalent of c's character if it is lowercase. Otherwise c's character returns.

The following code fragment demonstrates those five methods:

System.out.println (Character.isDigit ('4')); // Output: true
System.out.println (Character.isLetter (';')); // Output: false
System.out.println (Character.isUpperCase ('X')); // Output: true
System.out.println (Character.toLowerCase ('B')); // Output: b
System.out.println (Character.toUpperCase ('a')); // Output: A

Another useful utility method is Character's public static char forDigit(int digit, int radix), which converts digit's integer value to its character equivalent in the number system that radix specifies and returns the result. However, if digit identifies an integer less than zero or greater than or equal to radix's value, forDigit(int digit, int radix) returns the null character (represented in source code as Unicode escape sequence '\u0000'). Similarly, if radix identifies an integer less than Character's MIN_RADIX constant or greater than Character's MAX_RADIX constant, forDigit(int digit, int radix) returns the null character. The following code demonstrates that method:

for (int i = 0; i < 16; i++)
     System.out.println (Character.forDigit (i, 16));

That fragment converts integer numbers 0 through 15 to their character equivalents in the hexadecimal number system and outputs those character equivalents (0 through f).
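One way to put forDigit to work is to build a number's textual form in some radix, one digit at a time. The following is a minimal sketch under stated assumptions: RadixDemo and toRadixString are hypothetical names, and the helper handles only nonnegative values.

```java
class RadixDemo
{
   // Hypothetical helper: converts a nonnegative int to its textual form
   // in the given radix by peeling off the lowest digit repeatedly.
   static String toRadixString (int value, int radix)
   {
      if (value < 0 || radix < Character.MIN_RADIX ||
          radix > Character.MAX_RADIX)
          throw new IllegalArgumentException ("unsupported value or radix");
      if (value == 0)
          return "0";
      StringBuffer sb = new StringBuffer ();
      while (value > 0)
      {
         // value % radix is always a valid digit for this radix
         sb.insert (0, Character.forDigit (value % radix, radix));
         value /= radix;
      }
      return sb.toString ();
   }

   public static void main (String [] args)
   {
      System.out.println (toRadixString (255, 16)); // Output: ff
      System.out.println (toRadixString (5, 2));    // Output: 101
      // A digit that equals the radix is out of range, so forDigit
      // returns the null character
      System.out.println (Character.forDigit (16, 16) == '\u0000'); // Output: true
   }
}
```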

To complement the forDigit(int digit, int radix) method, Character provides the public static int digit(char c, int radix) method, which converts the c-specified character value in the radix-specified number system, to the value's integer equivalent and returns the result. If c contains a nondigit character for the specified number system or radix is not in the MIN_RADIX/MAX_RADIX range, digit(char c, int radix) returns -1. The following code demonstrates that method:

char [] digits = { '0', '1', '2', '3', '4', '5', '6', '7',
                   '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'x' };
for (int i = 0; i < digits.length; i++)
     System.out.println (Character.digit (digits [i], 16));

The fragment above converts the digits array's digit characters to their integer equivalents and outputs the results. Apart from the last character, each character represents a hexadecimal digit. (Passing 16 as the radix argument informs digit(char c, int radix) that the number system is hexadecimal.) Because x does not represent a hexadecimal digit, digit(char c, int radix) outputs -1 when it encounters that character.
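Going the other way, digit can drive a simple text-to-number conversion. This is a minimal sketch rather than a robust parser: ParseDemo and parse are hypothetical names, -1 is an assumed error signal, and overflow is not checked.

```java
class ParseDemo
{
   // Hypothetical helper: parses text as a nonnegative number in the
   // given radix. Returns -1 if text is empty or contains a character
   // that is not a digit in that radix. Overflow is not checked.
   static int parse (String text, int radix)
   {
      if (text.length () == 0)
          return -1;
      int result = 0;
      for (int i = 0; i < text.length (); i++)
      {
           int d = Character.digit (text.charAt (i), radix);
           if (d == -1)
               return -1;
           result = result * radix + d; // shift left one digit, add new digit
      }
      return result;
   }

   public static void main (String [] args)
   {
      System.out.println (parse ("7fff", 16)); // Output: 32767
      System.out.println (parse ("7FFF", 16)); // Output: 32767
      System.out.println (parse ("12x", 16));  // Output: -1
   }
}
```

The second call shows that digit accepts uppercase and lowercase letters alike.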

To demonstrate Character's isDigit(char c) and isLetter(char c) methods, I've created a CA (character analysis) application that counts a text file's digits, letters, and other characters. In addition to printing those counts, CA calculates and prints each count's percentage of the total count. Listing 1 presents CA's source code (don't worry about the file-reading logic: I'll explain FileInputStream and other file-related concepts in a future article):

Listing 1: CA.java

// CA.java
// Character Analysis
import java.io.*;
class CA
{
   public static void main (String [] args)
   {
      int ch, ndigits = 0, nletters = 0, nother = 0;
      if (args.length != 1)
      {
          System.err.println ("usage: java CA filename");
          return;
      }
      FileInputStream fis = null;
      try
      {
          fis = new FileInputStream (args [0]);
          while ((ch = fis.read ()) != -1)
             if (Character.isLetter ((char) ch))
                 nletters++;
             else
             if (Character.isDigit ((char) ch))
                 ndigits++;
             else
                 nother++;
          System.out.println ("num letters = " + nletters);
          System.out.println ("num digits = " + ndigits);
          System.out.println ("num other = " + nother + "\r\n");
          int total = nletters + ndigits + nother;
          System.out.println ("% letters = " +
                              (double) (100.0 * nletters / total));
          System.out.println ("% digits = " +
                              (double) (100.0 * ndigits / total));
          System.out.println ("% other = " +
                              (double) (100.0 * nother / total));
      }
      catch (IOException e)
      {
          System.err.println (e);
      }
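      finally
      {
          // fis is still null if the FileInputStream constructor threw
          if (fis != null)
              try
              {
                  fis.close ();
              }
              catch (IOException e)
              {
                  // Nothing useful can be done if close() fails
              }
      }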
   }
}

If you want to perform a character analysis on CA's source file, CA.java, execute java CA CA.java. You see output similar to the following (the exact counts depend on the file's contents):

num letters = 609
num digits = 18
num other = 905
% letters = 39.75195822454308
% digits = 1.174934725848564
% other = 59.07310704960835
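The counting logic can also be tried without a file. The following minimal sketch applies the same isLetter/isDigit tests to a String. CountDemo and classify are hypothetical names chosen for illustration.

```java
class CountDemo
{
   // Hypothetical helper: returns { letters, digits, other } for text,
   // using the same tests and the same order as CA's loop.
   static int [] classify (String text)
   {
      int [] counts = new int [3];
      for (int i = 0; i < text.length (); i++)
      {
           char ch = text.charAt (i);
           if (Character.isLetter (ch))
               counts [0]++;
           else
           if (Character.isDigit (ch))
               counts [1]++;
           else
               counts [2]++;
      }
      return counts;
   }

   public static void main (String [] args)
   {
      int [] c = classify ("Java 101!");
      System.out.println ("num letters = " + c [0]); // Output: num letters = 4
      System.out.println ("num digits = " + c [1]);  // Output: num digits = 3
      System.out.println ("num other = " + c [2]);   // Output: num other = 2
   }
}
```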

The String class

The String class contrasts with Character in that a String object stores a sequence of characters—a string—whereas a Character object stores one character. Because strings are pervasive in text-processing and other programs, Java offers two features that simplify developer interaction with String objects: simplified assignment and an operator that concatenates strings. This section examines those features.

String objects

A java.lang.String object stores a character sequence in a character array that String's private value field variable references. Furthermore, String's private count integer field variable maintains the number of characters in that array. Each String has its own copy of those fields, and Java's simplified assignment shortcut offers the easiest way to create a String and store a string in the String's value array, as the following code demonstrates:

public static void main (String [] args)
{
   String s = "abc";
   System.out.println (s); // Output: abc
}

When the compiler compiles the preceding fragment, it stores the abc string literal in a special area of the class file—the constant pool, which is a collection of string literals, integer literals, and other constants. The compiler also generates a byte code instruction (ldc—load constant) that pushes a reference to a String object containing abc onto the calling thread's stack, and generates another instruction (astore_1) that pops that reference from the stack and stores it in the s local variable, which corresponds to local variable 1 at the JVM level.

What creates that String object and when? Neither the Java Language Specification nor the Java Virtual Machine Specification offers answers that I can find. Instead, I speculate the following: When a classloader (a concept I'll discuss in a future article) loads a class file, it scans its constant pool's memory copy. For each string literal in that pool, the classloader creates a String, populates that object with the string literal's characters, and modifies the string literal's entry in the constant pool's memory copy so ldc pushes the String's reference onto the calling thread's stack.

Because the compiler and classloader treat string literals as String objects, "abc".length() and synchronized ("sync object") are legal. "abc".length() returns the length of the String containing abc; and synchronized ("sync object") grabs the lock associated with the String containing sync object. Java regards these and other string literals as String objects to serve as a convenience for developers. As with the simplified assignment shortcut, substituting string literals for String object reference variables reduces the amount of code you must write.

Java also offers a variety of String constructors for creating String objects. I detail three below:

  1. public String(char [] value) creates a new String object that contains a copy of all characters found in the value array parameter. If value is null, this constructor throws a NullPointerException object.
  2. public String(char [] value, int offset, int count) creates a new String that contains a portion of those characters found in value. Copying begins at the offset array index and continues for count characters. If value is null, this constructor throws a NullPointerException object. If either offset or count contains a value that leads to invalid array indexes, this constructor throws an IndexOutOfBoundsException object.
  3. public String(String original) creates a new String that contains the same characters as the original-referenced String.

The following code demonstrates the first two constructors:

char [] trueFalse = { 't', 'f', 'T', 'F' };
String s1 = new String (trueFalse);
String s2 = new String (trueFalse, 2, 2);

After this fragment executes, s1 references a String containing tfTF, and s2 references a String containing TF.

The following code demonstrates the third constructor:

String s3 = new String ("123");

That fragment passes a reference to a string literal-based String containing 123 to the String(String original) constructor. That constructor copies original's contents to the new s3-referenced String.

No matter how many times the same string literal appears in source code, the compiler ensures that only one copy stores in the class file's constant pool. Furthermore, the compiler ensures that only nonduplicates of all string constant expressions (such as "a" + 3) end up in the constant pool as string literals. When a classloader creates Strings from all string literal entries in the constant pool, each String's contents are unique. The classloader interns, or confines, those Strings in a common string memory pool located in JVM-managed memory.

At runtime, as new Strings are created, they do not intern in the common string memory pool for performance reasons. Verifying a String's nonexistence in that pool eats up time, especially if many Strings already exist. However, thanks to a String method, code can intern newly created String objects into the pool, which I'll show you how to accomplish later in this article.

For proof that string literal-based Strings store in the common string memory pool and new Strings do not, consider the following code fragment:

String a = "123";
String b = "123";
String c = new String (a);
System.out.println ("a == b: " + (a == b));
System.out.println ("a == c: " + (a == c));

System.out.println ("a == b: " + (a == b)); outputs a == b: true because the compiler stores one copy of 123 in the class file's constant pool. At runtime, a and b receive the same reference to the 123 String object that exists in the common string memory pool. In contrast, System.out.println ("a == c: " + (a == c)); outputs a == c: false because the a-referenced (and b-referenced) String stores in the common string memory pool and the c-referenced String does not. If c existed in the common string memory pool, c, b, and a would all reference the same String (since that pool contains no duplicates). Hence, System.out.println ("a == c: " + (a == c)); would output a == c: true.
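The earlier point about string constant expressions can be checked the same way. In this minimal sketch (ConstantDemo and its two methods are hypothetical names), "a" + 3 is a constant expression that the compiler folds into the literal a3, whereas part + 3 involves a nonfinal variable and is concatenated at runtime.

```java
class ConstantDemo
{
   // Compares a compile-time constant expression with an equal literal.
   static boolean constantExpressionIsPooled ()
   {
      String x = "a" + 3; // folded to "a3" by the compiler
      String y = "a3";
      return x == y;
   }

   // Compares a runtime concatenation result with an equal literal.
   static boolean runtimeResultIsPooled ()
   {
      String part = "a";   // not final, so part + 3 is not a constant
      String z = part + 3; // concatenated at runtime into a new String
      return z == "a3";
   }

   public static void main (String [] args)
   {
      System.out.println (constantExpressionIsPooled ()); // Output: true
      System.out.println (runtimeResultIsPooled ());      // Output: false
   }
}
```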

Interning Strings in the common string memory pool poses a problem. Because that pool does not permit duplicates, what happens if code interned a String and then changed that object's contents? The pool might then contain two Strings with the same contents, which defeats the string memory savings that internment in the pool provides. For that reason, Java does not let code modify a String. Thus, Strings are immutable, or unchangeable.

Note
Some String methods (such as public String toUpperCase(Locale l)) appear to modify a String, but in actuality, don't. Instead, such methods create a new String containing a modified string.
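A short sketch makes the point concrete. ImmutableDemo and describe are hypothetical names. The sketch uses the no-argument toUpperCase() overload for brevity.

```java
class ImmutableDemo
{
   // Hypothetical helper: uppercases s and reports both strings, which
   // shows that the original String is left untouched.
   static String describe (String s)
   {
      String upper = s.toUpperCase (); // a new String is returned
      return s + " -> " + upper;
   }

   public static void main (String [] args)
   {
      String original = "abc";
      System.out.println (describe (original)); // Output: abc -> ABC
      System.out.println (original);            // Output: abc
   }
}
```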

String method sampler

String contains more than 50 methods (not including constructors); however, we'll examine only 13:

Note
Many String methods require an index (also known as an offset) argument for accessing a character in the String object's value array (or a character array argument). That index/offset is always zero-based: index/offset 0 refers to the array's first character.
  • public char charAt(int index) extracts the character at the index position in the current String object's value array and returns that character. This method throws an IndexOutOfBoundsException object if index is negative or equals/exceeds the string's length. Example: String s = "Hello"; System.out.println (s.charAt (0)); (output: H).
  • public int compareToIgnoreCase(String anotherString) performs a lexicographic (dictionary order) case-insensitive comparison between characters in the current String's value array and the value array of the anotherString-referenced String. A zero return value indicates that both arrays contain the same characters; a positive return value indicates that the current String's value array identifies a string that follows the string that anotherString's value array represents; and a negative value indicates that anotherString's string follows the current String's string. This method throws a NullPointerException object if anotherString is null. Example: String s = "abc"; String t = "def"; System.out.println (s.compareToIgnoreCase (t)); (output: -3).
  • public String concat(String str) creates a new String containing the current String's characters followed by the str-referenced String's characters. A reference to the new String returns. However, if str contains no characters, a reference to the current String returns. Example: String s = "Hello,"; System.out.println (s.concat (" World")); (output: Hello, World). Although you can choose concat(String str), the string concatenation operator (+) produces more compact source code. For example, String t = "a"; String s = t + "b"; is more compact than String t = "a"; String s = t.concat ("b");. Because the compiler converts String t = "a"; String s = t + "b"; to String t = "a"; String s = new StringBuffer ().append (t).append ("b").toString ();, using concat(String str) might seem cheaper. In practice, however, their execution times prove similar.
  • Caution
    Do not use either concat(String str) or the string concatenation operator in a loop that executes repeatedly; that can affect your program's performance. (Each approach creates several objects—which can increase garbage collections—and several methods are called behind the scenes.)
  • public static String copyValueOf(char [] data) creates a new String containing a copy of all characters in the data array and returns the new String's reference. Example: char [] yesNo = { 'y', 'n', 'Y', 'N' }; String s = String.copyValueOf (yesNo); System.out.println (s); (output: ynYN).
  • public boolean equalsIgnoreCase(String anotherString) performs a case-insensitive comparison of the current String's characters with anotherString's characters. If those characters match (from a case-insensitive perspective), this method returns true. But if either the characters do not match or if anotherString contains a null reference, this method returns false. Example: System.out.println ("Abc".equalsIgnoreCase ("aBC")); (output: true).
  • public int indexOf(int ch) returns the index of ch's first occurrence in the current String's value array. If that character does not exist, -1 returns. Example: System.out.println ("First index = " + "The quick brown fox.".indexOf ('o')); (output: First index = 12).
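  • public String intern() interns a String in the common string memory pool and returns a reference to the pooled String. If the pool already contains a String with the same contents, that existing String's reference returns instead, so assign the return value back to your variable. Example: String potentialNapoleonQuote = new String ("Able was I, ere I saw Elba!"); potentialNapoleonQuote = potentialNapoleonQuote.intern ();.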
  • Tip
    To quicken string searches, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, each object has a unique reference. Plus, using == to compare references proves faster than using a method to compare a string's characters.
  • public int lastIndexOf(int ch) returns the index of ch's last occurrence in the current String's value. If that character does not exist, -1 returns. Example: System.out.println ("Last index = " + "The quick brown fox.".lastIndexOf ('o')); (output: Last index = 17).
  • public int length() returns the value stored in count. In other words, this method returns a string's length. If the string is empty, length() returns 0. Example: System.out.println ("abc".length ()); (output: 3).
  • Caution
    Confusing length() with length leads to compiler errors. length() is a method that returns the current number of characters in a String's value array, whereas length is a read-only array field that returns the maximum number of elements in an array.
  • public String substring(int beginIndex, int endIndex) creates a new String that contains every character in the string beginning at beginIndex and ending at one position less than endIndex, and returns that object's reference. However, if beginIndex contains 0 and endIndex contains the string's length, this method returns a reference to the current String. Furthermore, if beginIndex is negative, endIndex is greater than the string's length, or beginIndex is greater than endIndex, this method throws an IndexOutOfBoundsException object. Example: System.out.println ("Test string.".substring (5, 11)); (output: string).
  • public char [] toCharArray() creates a new character array, copies the contents of the current String's value array to the new character array, and returns the new array's reference. Example: String s = new String ("account"); char [] ch = s.toCharArray ();.
  • public String trim() completes one of two tasks:

    1. Creates a new String with the same contents as the current String—except for leading and trailing white space characters (that is, characters with Unicode values less than or equal to 32)—and returns that reference
    2. Returns the current String's reference if no leading/trailing white space characters exist

    Example: System.out.println ("[" + " \tabcd ".trim () + "]"); (output: [abcd]).

  • public static String valueOf(int i) creates a new String containing the character representation of i's integer value and returns that object's reference. Example: String s = String.valueOf (20); s += " dollars"; System.out.println (s); (output: 20 dollars).

To demonstrate String's charAt(int index) and length() methods, I prepared a HexDec hexadecimal-to-decimal conversion application:

Listing 2: HexDec.java

// HexDec.java
// Hexadecimal to Decimal
class HexDec
{
   public static void main (String [] args)
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java HexDec hex-character-sequence");
          return;
      }
      // Convert argument from hexadecimal to decimal
      int dec = 0;
      String s = args [0];
      for (int i = 0; i < s.length (); i++)
      {
           char c = s.charAt (i); // Extract character
           // If character is an uppercase letter, convert character to
           // lowercase
           if (Character.isUpperCase (c))
               c = Character.toLowerCase (c);
           if (!(c >= '0' && c <= '9') && !(c >= 'a' && c <= 'f'))
           {
               System.err.println ("invalid character detected");
               return;
           }
           dec <<= 4;
           if (c <= '9')
               dec += (c - '0');
           else
               dec += (c - 'a' + 10);
      }
      System.out.println ("decimal equivalent = " + dec);
   }
}

If you want to convert hexadecimal number 7fff to a decimal, use java HexDec 7fff. You then observe the following output:

decimal equivalent = 32767

Caution
HexDec includes the expression i < s.length () in its for loop header. For long loops, do not call length() in a for loop header because of method call overhead (which can affect performance). Instead, call that method and save its return value before entering the loop, and then use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs, like HexDec, it doesn't matter if I call length() in the for loop header. But for professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)

For another String method demonstration, see Listing 3, which shows how the intern() method and the == operator enable a rapid search of a partial list of country names for a specific country:

Listing 3: CS.java

// CS.java
// Country search
import java.io.*;
class CS
{
   static String [] countries =
   {
      "Argentina",
      "Australia",
      "Bolivia",
      "Brazil",
      "Canada",
      "Chile",
      "China",
      "Denmark",
      "Egypt",
      "England",
      "France",
      "India",
      "Iran",
      "Ireland",
      "Iraq",
      "Israel",
      "Japan",
      "Jordan",
      "Pakistan",
      "Russia",
      "Scotland",
      "South Africa",
      "Sweden",
      "Syria",
      "United States"
   };
   public static void main (String [] args)
   {
      int i;
      if (args.length != 1)
      {
          System.err.println ("usage: java CS country-name");
          return;
      }
      String country = args [0];
      // First search attempt using == operator
      for (i = 0; i < countries.length; i++)
           if (country == countries [i])
           {
               System.out.println (country + " found");
               break;
           }
      if (i == countries.length)
          System.out.println (country + " not found");
      // Intern country string
      country = country.intern ();
      // Second search attempt using == operator
      for (i = 0; i < countries.length; i++)
           if (country == countries [i])
           {
               System.out.println (country + " found");
               break;
           }
      if (i == countries.length)
          System.out.println (country + " not found");
   }       
}

CS attempts twice to locate a specific country name in an array of country names with the == operator. The first attempt fails because the country name string literals end up as Strings in the common string memory pool, and the String containing the name being searched is not in that pool. After the first search attempt, country = country.intern (); interns that String in the pool and assigns the pooled String's reference to country. The second search then succeeds if the array contains the name being searched. For example, java CS Argentina produces the following output:

Argentina not found
Argentina found

The StringBuffer class

String is not always the best choice for representing strings in a program. The reason: Its immutability causes String methods, such as substring(int beginIndex, int endIndex), to create new String objects, rather than modify the original String objects. In many situations, that leads to unreferenced Strings that become eligible for garbage collection. When many unreferenced Strings are created within a long loop, available heap memory shrinks, and the garbage collector might need to perform many collections, which can affect a program's performance, as the following code demonstrates:

String s = "abc";
String t = "def";
String u = "";
for (int i = 0; i < 100000; i++)
     u = u.concat (s).concat (t);

u.concat (s) creates a String containing the u-referenced String's characters followed by the s-referenced String's characters. The new String's reference subsequently returns and identifies a String, named a to prevent confusion, on which concat (t) is called. The concat (t) method call results in a new String object, b, that contains a's characters followed by the t-referenced String's characters. a is discarded (because its reference disappears) and b's reference assigns to u (which results in the String that u previously referenced becoming eligible for garbage collection).

During each loop iteration, two Strings are discarded. By the loop's end, roughly 200,000 Strings have been discarded. Because u grows by six characters on every iteration, each discarded String is longer than the one before it, and together they account for tens of billions of characters, far more than the 600,000 characters in the final result. The garbage collector must therefore run repeatedly during the loop, and all that copying and collecting makes this portion of a program's execution take longer to complete. That could prove problematic if the above code must complete within a limited time period. The StringBuffer class solves this problem.
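As a minimal sketch of that solution, the loop can append into a single StringBuffer and create one String at the end. AppendDemo and repeat are hypothetical names. The append and toString methods are the standard StringBuffer ones.

```java
class AppendDemo
{
   // Hypothetical helper: builds the same result as the concat loop, but
   // appends into one mutable StringBuffer instead of creating two new
   // Strings per iteration.
   static String repeat (String s, String t, int times)
   {
      StringBuffer sb = new StringBuffer ();
      for (int i = 0; i < times; i++)
           sb.append (s).append (t); // append returns the same StringBuffer
      return sb.toString (); // the only String created by this method
   }

   public static void main (String [] args)
   {
      String u = repeat ("abc", "def", 100000);
      System.out.println (u.length ());         // Output: 600000
      System.out.println (u.substring (0, 12)); // Output: abcdefabcdef
   }
}
```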

StringBuffer objects

In many ways, the java.lang.StringBuffer class resembles its String counterpart. For example, as with String, a StringBuffer object stores a character sequence in a character array that StringBuffer's private value field variable references. Also, StringBuffer's private count integer field variable records the number of characters in that array. Finally, both classes declare a few same-named methods with identical signatures, such as public int indexOf(String str).

Unlike String objects, StringBuffer objects represent mutable, or changeable, strings. As a result, a StringBuffer method can modify a StringBuffer object. If the modification produces more characters than value can accommodate, the StringBuffer object automatically creates a new value array with double the capacity (plus two additional array elements) of the current value array, and copies all characters from the old array to the new array. (After all, Java arrays have a fixed size.) Capacity represents the maximum number of characters a StringBuffer's value array can store.
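The expansion can be observed through StringBuffer's public int capacity() method. In this minimal sketch, GrowthDemo and capacityAfter are hypothetical names. The exact capacity after an expansion depends on the library implementation, so the second comment below states what the doubling rule predicts rather than a guaranteed result.

```java
class GrowthDemo
{
   // Hypothetical helper: appends count characters to an empty
   // StringBuffer and returns the capacity reported afterward.
   static int capacityAfter (int count)
   {
      StringBuffer sb = new StringBuffer (); // initial capacity of 16
      for (int i = 0; i < count; i++)
           sb.append ('x');
      return sb.capacity ();
   }

   public static void main (String [] args)
   {
      // 16 characters still fit, so no expansion occurs
      System.out.println (capacityAfter (16)); // Output: 16
      // The 17th character forces an expansion; the doubling rule
      // predicts a new capacity of 2 * 16 + 2 = 34
      System.out.println (capacityAfter (17));
   }
}
```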

Create a StringBuffer object via any of the following constructors:

  • public StringBuffer() creates a new StringBuffer object that contains no characters but can contain up to 16 characters before automatically expanding. StringBuffer has an initial capacity of 16 characters.
  • public StringBuffer(int initCap) creates a new StringBuffer that contains no characters but can contain up to initCap characters before automatically expanding. If initCap is negative, this constructor throws a NegativeArraySizeException object. StringBuffer has an initial capacity of initCap.
  • public StringBuffer(String str) creates a new StringBuffer that contains all characters in the str-referenced String and up to 16 additional characters before automatically expanding. StringBuffer's initial capacity is the length of str's string plus 16.

The following code fragment demonstrates all three constructors:

StringBuffer sb1 = new StringBuffer ();
StringBuffer sb2 = new StringBuffer (100);
StringBuffer sb3 = new StringBuffer ("JavaWorld");

StringBuffer sb1 = new StringBuffer (); creates a StringBuffer with no characters and an initial capacity of 16. StringBuffer sb2 = new StringBuffer (100); creates a StringBuffer with no characters and an initial capacity of 100. Finally, StringBuffer sb3 = new StringBuffer ("JavaWorld"); creates a StringBuffer containing JavaWorld and an initial capacity of 25.
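To confirm those capacities, you can print each object's length and capacity. This is a small sketch; the ConstructorDemo class and its describe() helper are hypothetical names introduced only for this illustration:

```java
// ConstructorDemo.java (hypothetical name, for illustration only)
class ConstructorDemo
{
   // Formats a StringBuffer's length and capacity for display.
   static String describe (StringBuffer sb)
   {
      return "length = " + sb.length () + ", capacity = " + sb.capacity ();
   }
   public static void main (String [] args)
   {
      System.out.println (describe (new StringBuffer ()));
      System.out.println (describe (new StringBuffer (100)));
      System.out.println (describe (new StringBuffer ("JavaWorld")));
   }
}
```

The three lines report lengths of 0, 0, and 9 and capacities of 16, 100, and 25, respectively.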

StringBuffer method sampler

Since we already examined StringBuffer's constructor methods, we now examine the nonconstructor methods. For brevity, I focus on only 13 methods.

Note
Like String, many StringBuffer methods require an index argument for accessing a character in the StringBuffer's value array (or a character array argument). That index/offset is always zero-based.
  • public StringBuffer append(char c) appends c's character to the contents of the current StringBuffer's value array and returns a reference to the current StringBuffer. Example: StringBuffer sb = new StringBuffer ("abc"); sb.append ('d'); System.out.println (sb); (output: abcd).
  • public StringBuffer append(String str) appends the str-referenced String's characters to the contents of the current StringBuffer's value array and returns a reference to the current StringBuffer. Example: StringBuffer sb = new StringBuffer ("First,"); sb.append (" second"); System.out.println (sb); (output: First, second).
  • public int capacity() returns the current StringBuffer's current capacity (that is, value's length). Example: StringBuffer sb = new StringBuffer (); System.out.println (sb.capacity ()); (output: 16).
  • public char charAt(int index) extracts and returns the character at the index position in the current StringBuffer's value array. This method throws an IndexOutOfBoundsException object if index is negative, equals the string's length, or exceeds that length. Example: StringBuffer sb = new StringBuffer ("Test string"); for (int i = 0; i < sb.length (); i++) System.out.print (sb.charAt (i)); (output: Test string).
  • public StringBuffer deleteCharAt(int index) removes the character at the index position in the current StringBuffer's value array. If index is negative, equals the string's length, or exceeds that length, this method throws a StringIndexOutOfBoundsException object. Example: StringBuffer sb = new StringBuffer ("abc"); sb.deleteCharAt (1); System.out.println (sb); (output: ac).
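  • public void ensureCapacity(int minimumCapacity) ensures the current StringBuffer's capacity is at least minimumCapacity. If the current capacity is less than minimumCapacity, the StringBuffer creates a new value array whose capacity is the larger of minimumCapacity and twice the current capacity plus two. If minimumCapacity is zero or negative, this method returns without doing anything. The following code demonstrates this method: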

    StringBuffer sb = new StringBuffer ("abc"); 
    System.out.println (sb.capacity ()); 
    sb.ensureCapacity (20);
    System.out.println (sb.capacity ());
    

    The fragment produces the following output:

    19
    40
    
    Tip
    Because it takes time for a StringBuffer to create a new character array and copy characters from the old array to the new array (during an expansion), use ensureCapacity(int minimumCapacity) to minimize expansions prior to entering a loop that appends many characters to a StringBuffer. That improves performance.
  • public StringBuffer insert(int offset, String str) inserts the str-referenced String's characters into the current StringBuffer beginning at the index that offset identifies. Any characters starting at offset move upwards. If str contains a null reference, the four characters n, u, l, and l (that is, the text null) are inserted into the StringBuffer. Example: StringBuffer sb = new StringBuffer ("ab"); sb.insert (1, "cd"); System.out.println (sb); (output: acdb).
  • public int length() returns the value stored in count. In other words, this method returns a string's length. If the string is empty, length() returns 0. A StringBuffer's length differs from its capacity; length specifies value's current character count, whereas capacity specifies the maximum number of characters that array can store. Example: StringBuffer sb = new StringBuffer (); System.out.println (sb.length ()); (output: 0).
  • public StringBuffer replace(int start, int end, String str) replaces all characters in the current StringBuffer's value array that range between indexes start and one position less than end (inclusive) with characters from the str-referenced String. This method throws a StringIndexOutOfBoundsException object if start is negative, exceeds the value array's length, or is greater than end. Example: StringBuffer sb = new StringBuffer ("abcdef"); sb.replace (0, 3, "x"); System.out.println (sb); (output: xdef).
  • public StringBuffer reverse() reverses the character sequence in the current StringBuffer's value array. Example: StringBuffer sb = new StringBuffer ("reverse this"); System.out.println (sb.reverse ()); (output: siht esrever).
  • public void setCharAt(int index, char c) sets the character at position index in the current StringBuffer's value array to c's contents. If index is negative, equals the string's length, or exceeds that length, this method throws an IndexOutOfBoundsException object. Example: StringBuffer sb = new StringBuffer ("abc"); sb.setCharAt (0, 'd'); System.out.println (sb); (output: dbc).
  • public void setLength(int newLength) establishes a new length for the current StringBuffer's character sequence. Every character located at an index less than newLength remains unchanged. If newLength exceeds the current length, null characters append to the array beginning at the index equal to the old length, until the length reaches newLength. If necessary, StringBuffer expands by creating a new value array of the appropriate length. This method throws an IndexOutOfBoundsException object if newLength is negative. The following fragment demonstrates this method:

    StringBuffer sb = new StringBuffer ("abc");
    System.out.println (sb.capacity ());
    System.out.println (sb.length ());
    sb.setLength (100);
    System.out.println (sb.capacity ());
    System.out.println (sb.length ());
    System.out.println ("[" + sb + "]");
    

    The fragment produces this output (in the last line, null characters, after abc, appear as spaces):

    19
    3
    100
    100
    [abc                                        ]
    
  • public String toString() creates a new String object containing the same characters as the current StringBuffer's value array and returns a reference to String. The following code demonstrates toString() in a more efficient (and faster) alternative to String's concat(String str) method for concatenating strings within a loop:

    String s = "abc";
    String t = "def";
          
    StringBuffer sb = new StringBuffer (2000000);
    for (int i = 0; i < 100000; i++)
         sb.append (s).append (t);
    String u = sb.toString ();
    sb = null;
    System.out.println (u);
    

    As the output is large, I don't include it here. Try converting this code into a program and compare its performance with the earlier String s = "abc"; String t = "def"; String u = ""; for (int i = 0; i < 100000; i++) u = u.concat (s).concat (t); code.
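One way to carry out that comparison is sketched below. The ConcatTiming class, its method names, and the reduced iteration count of 10,000 (chosen so the concat(String str) version finishes quickly) are my own assumptions, and System.currentTimeMillis() gives only a rough measurement:

```java
// ConcatTiming.java (hypothetical name, for illustration only)
class ConcatTiming
{
   // Builds the result with String's concat(String str) method. Each call
   // copies all existing characters into a brand-new String.
   static String viaConcat (String s, String t, int n)
   {
      String u = "";
      for (int i = 0; i < n; i++)
           u = u.concat (s).concat (t);
      return u;
   }
   // Builds the same result with a StringBuffer whose capacity is large
   // enough that no expansion is needed.
   static String viaStringBuffer (String s, String t, int n)
   {
      StringBuffer sb = new StringBuffer ((s.length () + t.length ()) * n);
      for (int i = 0; i < n; i++)
           sb.append (s).append (t);
      return sb.toString ();
   }
   public static void main (String [] args)
   {
      int n = 10000; // Assumed iteration count; raise it to widen the gap
      long start = System.currentTimeMillis ();
      String a = viaConcat ("abc", "def", n);
      long middle = System.currentTimeMillis ();
      String b = viaStringBuffer ("abc", "def", n);
      long end = System.currentTimeMillis ();
      System.out.println ("concat:       " + (middle - start) + " ms");
      System.out.println ("StringBuffer: " + (end - middle) + " ms");
      System.out.println ("same result:  " + a.equals (b));
   }
}
```

The timings vary from machine to machine, so I show no output here. The last line should report true, because both methods build the same character sequence.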

For a demonstration of StringBuffer's append(String str) and toString() methods, and the StringBuffer() constructor, examine Listing 4's DigitsToWords, which converts an integer value's digits to its equivalent spelled-out form (for example, 10 versus ten):

Listing 4: DigitsToWords.java

// DigitsToWords.java
class DigitsToWords
{
   public static void main (String [] args)
   {
      for (int i = 0; i < 10000; i++)
           System.out.println (convertDigitsToWords (i));
   }
   static String convertDigitsToWords (int integer)
   {
      if (integer < 0 || integer > 9999)
          throw new IllegalArgumentException ("Out of range: " + integer);
      if (integer == 0)
          return "zero";
      String [] group1 =
      {
         "one",
         "two",
         "three",
         "four",
         "five",
         "six",
         "seven",
         "eight",
         "nine"
      };
      String [] group2 =
      {
         "ten",
         "eleven",
         "twelve",
         "thirteen",
         "fourteen",
         "fifteen",
         "sixteen",
         "seventeen",
         "eighteen",
         "nineteen"
      };
      String [] group3 =
      {
         "twenty",
         "thirty",
         "forty",
         "fifty",
         "sixty",
         "seventy",
         "eighty",
         "ninety"
      };
      StringBuffer result = new StringBuffer ();
      if (integer >= 1000)
      {
          int tmp = integer / 1000;
          result.append (group1 [tmp - 1] + " thousand");
          integer -= tmp * 1000;
          if (integer == 0)
              return result.toString ();
          result.append (" ");
      }
      if (integer >= 100)
      {
          int tmp = integer / 100;
          result.append (group1 [tmp - 1] + " hundred");
          integer -= tmp * 100;
          if (integer == 0)
              return result.toString ();
          result.append (" and ");
      }
      if (integer >= 10 && integer <= 19)
      {
          result.append (group2 [integer - 10]);
          return result.toString ();
      }
      if (integer >= 20)
      {
          int tmp = integer / 10;
          result.append (group3 [tmp - 2]);
          integer -= tmp * 10;
          if (integer == 0)
              return result.toString ();
          result.append ("-");
      }
      result.append (group1 [integer - 1]);
      return result.toString ();
   }
}

DigitsToWords has a limit of 9,999; it cannot convert integer values that exceed 9,999. Below are the first 22 lines of output:

zero
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
twenty-one

For another practical illustration of StringBuffer's append(String str) method, as well as StringBuffer(int initCap), append(char c), and deleteCharAt(int index), I created an Editor application that demonstrates a basic line-oriented text editor:

Listing 5: Editor.java

// Editor.java
import java.io.IOException;
class Editor
{
   public static int MAXLINES = 100;
   static int curline = -1; // Current line.
   static int lastline = -1; // Last appended line index.
   // The following array holds all lines of text. (Maximum is MAXLINES.)
   static StringBuffer [] lines = new StringBuffer [MAXLINES];
   static
   {
      // We assume 80-character lines. But who knows? Because StringBuffers
      // dynamically expand, you could end up with some very long lines.
      for (int i = 0; i < lines.length; i++)
           lines [i] = new StringBuffer (80);
   }
   public static void main (String [] args)
   {
      do
      {
          // Prompt user to enter a command
          System.out.print ("C: ");
          // Obtain the command, and make sure there is no leading/trailing
          // white space
          String cmd = readString ().trim ();
          // Process command
          if (cmd.equalsIgnoreCase ("QUIT"))
              break;
          if (cmd.equalsIgnoreCase ("ADD"))
          {
              if (lastline == MAXLINES - 1)
              {
                  System.out.println ("FULL");
                  continue;
              }
              String line = readString ();
              lines [++lastline].append (line);
              curline = lastline;
              continue;
          }
          if (cmd.equalsIgnoreCase ("DELFCH"))
          {
              if (curline > -1 && lines [curline].length () > 0)
                  lines [curline].deleteCharAt (0);
              continue;
          }
          if (cmd.equalsIgnoreCase ("DUMP"))
              for (int i = 0; i <= lastline; i++)
                   System.out.println (i + ": " + lines [i]);
      }
      while (true);
   }
   static String readString ()
   {
      StringBuffer sb = new StringBuffer (80);
      try
      {
         do
         {
             int ch = System.in.read ();
             if (ch == '\n')
                 break;
             if (ch != '\r') // Skip the carriage return in Windows line endings
                 sb.append ((char) ch);
         }
         while (true);
      }
      catch (IOException e)
      {
      }
      return sb.toString ();
   }
}

To see how Editor works, type java Editor. Here is one example of this program's output:

C: add
some text
C: dump
0: some text
C: delfch
C: dump
0: ome text
C: quit

Among Editor's various commands, add appends a line of text to the lines array of StringBuffers, dump dumps all lines to the standard output device, and delfch removes the current line's first character. Obviously, delfch is not very useful: a better program would specify an index after the command name and delete the character at that index. However, before you can accomplish that task, you must learn about the StringTokenizer class.

The StringTokenizer class

What do the Java compiler, a text-based adventure game, and a Linux shell program have in common? Each program contains code that extracts, from user-specified text, the fundamental character sequences, or tokens, such as identifiers and punctuation (compiler), game-play instructions (adventure game), or command name and arguments (Linux shell). Java accomplishes the token extraction process (known as string tokenizing because user-specified text exists as one or more character strings) via the StringTokenizer class.

Unlike the frequently-used Character, String, and StringBuffer language classes, the less-frequently-used StringTokenizer utility class exists in package java.util and requires an explicit import directive to import that class into a program.

StringTokenizer objects

Before a program can extract tokens from a string, the program must create a StringTokenizer object by calling one of the following constructors:

  • public StringTokenizer(String s), which creates a StringTokenizer that extracts tokens from the s-referenced String. Furthermore, the constructor specifies the space character (' '), tab character ('\t'), new-line character ('\n'), carriage-return character ('\r'), and form-feed character ('\f') as delimiters—characters that separate tokens from each other. Delimiters do not return as tokens.
  • public StringTokenizer(String s, String delim), which is identical to the previous constructor except you also specify a string of delimiter characters via the delim-referenced String. During string tokenizing, StringTokenizer ignores all delimiter characters as it searches for the next token's beginning. Delimiters do not return as tokens.
  • public StringTokenizer(String s, String delim, boolean returnDelim), which resembles the previous constructors except you also specify whether delimiter characters should return as tokens. Delimiter characters return when you pass true to returnDelim.

Examine the following fragment to learn how these constructors create StringTokenizer objects:

String s = "A sentence to tokenize.|A second sentence.";
StringTokenizer stok1 = new StringTokenizer (s);
StringTokenizer stok2 = new StringTokenizer (s, "|");
StringTokenizer stok3 = new StringTokenizer (s, " |", true);

stok1 references a StringTokenizer that extracts tokens from the s-referenced String and recognizes space, tab, new-line, carriage-return, and form-feed characters as delimiters. stok2 references a StringTokenizer that also extracts tokens from s. This time, however, only a vertical bar character (|) classifies as a delimiter. Finally, in the stok3-referenced StringTokenizer, the space and vertical bar characters classify as delimiters and return as tokens. Now that these StringTokenizers exist, how do you extract tokens from their s-referenced Strings? Let's find out.

Token extraction

StringTokenizer provides four methods for extracting tokens: public int countTokens(), public boolean hasMoreTokens(), public String nextToken(), and public String nextToken(String delim). The countTokens() method returns an integer containing a count of a string's tokens. Use this return value to determine the maximum tokens to extract. However, you should call hasMoreTokens() to determine when to end tokenizing because countTokens() is undependable (as you will see). hasMoreTokens() returns a Boolean true value if at least one more token exists to extract. Otherwise, that method returns false. Finally, the nextToken() and nextToken(String delim) methods return a String's next token. But if no more tokens are available, either method throws a NoSuchElementException object. nextToken() and nextToken(String delim) differ only in that nextToken(String delim) lets you reset a StringTokenizer's delimiter characters to those characters in the delim-referenced String. Given this information, the following code, which builds on the previous fragment, shows how to use the previous three StringTokenizers to extract a string's tokens:

System.out.println ("count1 = " + stok1.countTokens ());
while (stok1.hasMoreTokens ())
   System.out.println ("token = " + stok1.nextToken ());
System.out.println ("\r\ncount2 = " + stok2.countTokens ());
while (stok2.hasMoreTokens ())
   System.out.println ("token = " + stok2.nextToken ());
System.out.println ("\r\ncount3 = " + stok3.countTokens ());
while (stok3.hasMoreTokens ())
   System.out.println ("token = " + stok3.nextToken ());

The fragment above divides into three parts. The first part focuses on stok1. After retrieving and printing a token count, a while loop calls nextToken() to extract all tokens if hasMoreTokens() returns true. The second and third parts use identical logic for the other StringTokenizers. If you execute the code fragment, you observe the following output:

count1 = 6
token = A
token = sentence
token = to
token = tokenize.|A
token = second
token = sentence.

count2 = 2
token = A sentence to tokenize.
token = A second sentence.

count3 = 13
token = A
token =  
token = sentence
token =  
token = to
token =  
token = tokenize.
token = |
token = A
token =  
token = second
token =  
token = sentence.

The output above reveals three different token counts for the same string. The counts differ because the sets of delimiters differ. For stok1, the default delimiter set applies. For stok2, only one delimiter is present: the vertical bar. stok3 records a space and a vertical bar as its delimiters. The output's final portion reveals that the space and vertical bar delimiters return as tokens due to passing true as returnDelim's value in the stok3 call.
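Recall the Editor limitation noted earlier: delfch can delete only a line's first character. As a sketch of how StringTokenizer's default delimiters could split a command name from an index argument, consider the following code. The DELCH command name, the CommandSketch class, and the apply() method are my own assumptions for illustration; they are not part of Listing 5:

```java
// CommandSketch.java (hypothetical name, for illustration only)
import java.util.StringTokenizer;

class CommandSketch
{
   // Applies a hypothetical "DELCH index" command to line. Returns true
   // if a character was deleted, and false if the command is malformed or
   // the index is out of range.
   static boolean apply (String cmd, StringBuffer line)
   {
      StringTokenizer st = new StringTokenizer (cmd); // Default delimiters
      if (!st.hasMoreTokens () || !st.nextToken ().equalsIgnoreCase ("DELCH"))
          return false;
      if (!st.hasMoreTokens ())
          return false; // Missing index argument
      int index;
      try
      {
         index = Integer.parseInt (st.nextToken ());
      }
      catch (NumberFormatException e)
      {
         return false; // Argument is not an integer
      }
      if (st.hasMoreTokens ())
          return false; // Unexpected extra argument
      if (index < 0 || index >= line.length ())
          return false;
      line.deleteCharAt (index);
      return true;
   }
   public static void main (String [] args)
   {
      StringBuffer line = new StringBuffer ("some text");
      System.out.println (apply ("delch 4", line) + ": " + line);
      System.out.println (apply ("delch 99", line) + ": " + line);
   }
}
```

The first call deletes the space at index 4 and prints true: sometext. The second call prints false: sometext because index 99 lies beyond the line's end.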

Earlier, I cautioned you against relying on countTokens() for determining the number of tokens to extract. countTokens()'s return value is often meaningless when a program dynamically changes a StringTokenizer's delimiters with a nextToken(String delim) method call, as the following fragment demonstrates:

String record = "Ricard Santos,Box 99,'Sacramento,CA'";
StringTokenizer st = new StringTokenizer (record, ",");
int ntok = st.countTokens ();
System.out.println ("Number of tokens = " + ntok);
for (int i = 0; i < ntok; i++)
{
     String token = st.nextToken ();
     System.out.println (token);
     if (token.startsWith ("Box"))
         st.nextToken ("'"); // Throw away comma between Box 99 and
                             // 'Sacramento,CA'
}

The code creates a String that simulates a database record. Within that record, commas delimit fields (record portions). Although there are three commas, only three fields exist: a name, a box number, and a city-state. A pair of single quotes surrounds the city-state field to indicate that the comma between Sacramento and CA is part of the field.

After creating a StringTokenizer recognizing only comma characters as delimiters, the code counts the tokens and prints that count. It then uses the count to control the duration of the loop that extracts and prints tokens. When the Box 99 token returns, the code executes st.nextToken ("'"); to change the delimiter from a comma to a single quote and discard the comma token between Box 99 and 'Sacramento,CA'. The comma returns as a token because st.nextToken ("'"); first replaces the comma delimiter with the single-quote delimiter and only then extracts the next token, at which point the comma no longer counts as a delimiter. The code produces this output:

Number of tokens = 4
Ricard Santos
Box 99
Sacramento,CA
Exception in thread "main" java.util.NoSuchElementException
        at java.util.StringTokenizer.nextToken(StringTokenizer.java:232)
        at STDemo.main(STDemo.java:18)

The output indicates four tokens because three commas imply four tokens. But after displaying three tokens, a NoSuchElementException object is thrown from st.nextToken ();. The exception occurs because the program assumes that countTokens()'s return value indicates the exact number of tokens to extract. However, countTokens() can only base its count on the current set of delimiters. Because the fragment changes those delimiters during the loop, via st.nextToken ("'");, method countTokens()'s return value is no longer valid.

Caution
Do not use countTokens()'s return value to control a string tokenization loop's duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Failure to heed that advice often leads to one of the nextToken() methods throwing a NoSuchElementException object and the program terminating prematurely.
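The record example can be rewritten so that hasMoreTokens() controls the loop instead of countTokens()'s return value. This is a minimal sketch of that change; the STFixed class and its fields() method are hypothetical names, and I collect the tokens in a java.util.List only to make the result easy to inspect:

```java
// STFixed.java (hypothetical name, for illustration only)
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class STFixed
{
   // Extracts the fields from a record such as
   // Ricard Santos,Box 99,'Sacramento,CA'
   static List<String> fields (String record)
   {
      List<String> result = new ArrayList<String> ();
      StringTokenizer st = new StringTokenizer (record, ",");
      // hasMoreTokens() always consults the current delimiter set, so the
      // loop ends correctly even after the delimiters change.
      while (st.hasMoreTokens ())
      {
         String token = st.nextToken ();
         result.add (token);
         // Same delimiter switch as before: throw away the comma between
         // the Box field and the quoted city-state field.
         if (token.startsWith ("Box") && st.hasMoreTokens ())
             st.nextToken ("'");
      }
      return result;
   }
   public static void main (String [] args)
   {
      List<String> f = fields ("Ricard Santos,Box 99,'Sacramento,CA'");
      for (int i = 0; i < f.size (); i++)
           System.out.println (f.get (i));
   }
}
```

This version prints Ricard Santos, Box 99, and Sacramento,CA on separate lines and then ends normally, without throwing a NoSuchElementException object.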

For a practical demonstration of StringTokenizer's methods, I created a PigLatin application that translates English text to its pig Latin equivalent. For those unfamiliar with the pig Latin game, this coded language moves a word's first letter to its end and then adds ay. For example: computer becomes omputercay; Java becomes Avajay, etc. Punctuation is not affected. Listing 6 presents PigLatin's source code:

Listing 6: PigLatin.java

// PigLatin.java
import java.util.StringTokenizer;
class PigLatin
{
   public static void main (String [] args)
   {
      if (args.length != 1)
      {
          System.err.println ("usage: java PigLatin phrase");
          return;
      }
      StringTokenizer st = new StringTokenizer (args [0], " \t:;,.-?!");
      while (st.hasMoreTokens ())
      {
         StringBuffer sb = new StringBuffer (st.nextToken ());
         sb.append (sb.charAt (0));
         sb.append ("ay");
         sb.deleteCharAt (0);
         System.out.print (sb.toString () + " ");
      }
      System.out.print ("\r\n");
   }
}

To see what Hello, World! looks like in pig Latin, execute java PigLatin "Hello, World!". You see the following output:

elloHay orldWay 

According to pig Latin's rules, the output is not quite correct. First, the wrong letters are capitalized. Second, the punctuation is missing. The correct output is:

Ellohay, Orldway! 

Use what you've learned in this article to fix those problems.
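If you want to check your solution against one possible approach, here is a sketch. It assumes that passing true to the StringTokenizer(String s, String delim, boolean returnDelim) constructor is an acceptable way to preserve punctuation, and that a word's capitalization should move to its new first letter. The PigLatinSketch class and translate() method are hypothetical names, and this is not the only way to solve the exercise:

```java
// PigLatinSketch.java (hypothetical name, for illustration only)
import java.util.StringTokenizer;

class PigLatinSketch
{
   static final String DELIMS = " \t:;,.-?!";

   static String translate (String phrase)
   {
      // Passing true returns each delimiter as a one-character token, so
      // spaces and punctuation can be copied to the result unchanged.
      StringTokenizer st = new StringTokenizer (phrase, DELIMS, true);
      StringBuffer out = new StringBuffer (phrase.length () * 2);
      while (st.hasMoreTokens ())
      {
         String token = st.nextToken ();
         if (token.length () == 1 && DELIMS.indexOf (token.charAt (0)) >= 0)
         {
             out.append (token); // Delimiter: copy as is
             continue;
         }
         char first = token.charAt (0);
         StringBuffer word = new StringBuffer (token);
         word.deleteCharAt (0);
         word.append (Character.toLowerCase (first)).append ("ay");
         // Move the capitalization to the word's new first letter
         if (Character.isUpperCase (first))
             word.setCharAt (0, Character.toUpperCase (word.charAt (0)));
         out.append (word);
      }
      return out.toString ();
   }
   public static void main (String [] args)
   {
      System.out.println (translate ("Hello, World!"));
   }
}
```

For the input Hello, World!, this sketch prints Ellohay, Orldway!.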

Review

Java's Character, String, StringBuffer, and StringTokenizer classes support text-processing programs. Such programs use Character to indirectly store char variables in data structure objects and access a variety of character-oriented utility methods; use String to represent and manipulate immutable strings; use StringBuffer to represent and manipulate mutable strings; and use StringTokenizer to extract a string's tokens.

This article also cleared up three mysteries about strings. First, you saw how the compiler and classloader allow you to treat string literals (at the source-code level) as if they were String objects. Thus, you can legally specify synchronized ("sync object") in a multithreaded program requiring synchronization. Second, you learned why Strings are immutable, and how immutability works with internment to save heap memory when a program requires many strings and to allow fast string searches. Finally, you learned what happens when you use the string concatenation operator to concatenate strings and how StringBuffer is involved in that task.

I encourage you to email me with any questions you might have involving either this or any previous article's material. (Please keep such questions relevant to material discussed in this column's articles.) Your questions and my answers will appear in the relevant study guides.

Next month, I will deviate from my roadmap and introduce you to the world of Java tools.

Jeff Friesen has been involved with computers for the past 20 years. He holds a degree in computer science and has worked with many computer languages. Jeff has also taught introductory Java programming at the college level. In addition to writing for JavaWorld, he has written his own Java book for beginners— Java 2 by Example, Second Edition (Que Publishing, 2001; ISBN: 0789725932)—and helped write Using Java 2 Platform, Special Edition (Que Publishing, 2001; ISBN: 0789724685). Jeff goes by the nickname Java Jeff (or JavaJeff). To see what he's working on, check out his Website at http://www.javajeff.com.
