Java's character and assorted string classes support text-processing

Explore Character, String, StringBuffer, and StringTokenizer

Text can represent a combination of digits, letters, punctuation, words, sentences, and more. Computer programs that process text need assistance (from their associated languages) to represent and manipulate text. Java provides such assistance through the Character, String, StringBuffer, and StringTokenizer classes. In this article, you'll create objects from these classes and examine their various methods. You'll also receive answers to three mysteries: why Java regards a string literal as a String object, why String objects are immutable (and how immutability relates to string interning), and what happens behind the scenes when the string concatenation operator concatenates two strings into a single string.

Note
Future articles will cover the Character, String, StringBuffer, and StringTokenizer methods that I omit in this discussion.

The Character class

Though Java already has a character type and char keyword to represent and manipulate characters, the language also requires a Character class for two reasons:

  1. Many data structure classes require their data structure objects to store other objects, not primitive type variables. Because a char variable cannot be stored directly in these objects, code must wrap that variable's value inside a Character object and then store the Character object in the data structure object.
  2. Java needs a class to store various character-oriented utility methods—static methods that perform useful tasks and do not require Character objects; for example, a method that converts an arbitrary character argument representing a lowercase letter to another character representing the uppercase equivalent.
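
Both reasons can be sketched in a few lines. The fragment below is only an illustration, not part of the original article's code: the CharBoxingSketch class and its toList method are hypothetical names, and the sketch assumes a JDK with generics (Java 5 or later) and uses Character.valueOf(char) rather than the constructor to obtain the wrapper objects.

```java
import java.util.ArrayList;
import java.util.List;

class CharBoxingSketch
{
   // Reason 1: a list stores objects, so each char is wrapped in a
   // Character object before it is added.
   static List<Character> toList (String text)
   {
      List<Character> list = new ArrayList<Character> ();
      for (int i = 0; i < text.length (); i++)
           list.add (Character.valueOf (text.charAt (i)));
      return list;
   }

   public static void main (String [] args)
   {
      List<Character> letters = toList ("abc");
      // Reason 2: a static utility method needs no Character object.
      char upper = Character.toUpperCase (letters.get (0).charValue ());
      System.out.println (letters + " " + upper); // Output: [a, b, c] A
   }
}
```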

Character objects

The java.lang.Character class declares a private value field of character type. Code stores a character in value when it creates a Character object via class Character's public Character(char c) constructor, as the following code fragment demonstrates:

Character c = new Character ('A');

The constructor stores the character that 'A' represents in the value field of a new Character object that c references. Because the Character object wraps itself around the character, Character is a wrapper class.

By calling Character's public char charValue() method, code extricates the character from the Character object. Furthermore, by calling Character's public String toString() method, code returns the character as a String object. The following code, which builds on the previous fragment, demonstrates both method calls:

System.out.println (c.charValue ());
String s = c.toString ();

System.out.println (c.charValue ()); returns value's contents and outputs those contents (A) to the standard output device. String s = c.toString (); creates a String object containing value's contents, returns the String's reference, and assigns that reference to String variable s.

Character supplies three methods that compare Character objects for ordering or equality.

The public int compareTo(Character anotherCharacter) method compares the contents of two Characters by subtracting anotherCharacter's value field from the current Character's value field and returning the integer result. A zero result means both objects hold the same character (the comparison considers the value field only). A negative result means the current Character's value is numerically less than the anotherCharacter-referenced Character's value, and a positive result means it is numerically greater.

A second, overloaded public int compareTo(Object o) method works the same as compareTo(Character anotherCharacter) (and returns the same result), but compares the current Character with the o-referenced object, which must be of type Character; otherwise, the method throws a ClassCastException object. compareTo(Object o) allows Java's Collections Framework to sort Characters according to natural order. (A future article will discuss that method, sorting, and natural order.)

Finally, the public boolean equals(Object o) method compares the contents of the value field in the current Character with the contents of the value field in o. It returns a Boolean true value if o is of type Character and both value fields contain the same contents; otherwise, it returns false. To see the compareTo(Character anotherCharacter) and equals(Object o) methods in action, examine the following code fragment:

Character c1 = new Character ('A');
Character c2 = new Character ('B');
Character c3 = new Character ('A');
System.out.println ("c1.compareTo (c2): " + c1.compareTo (c2));
System.out.println ("c1.equals (c2): " + c1.equals (c2));
System.out.println ("c1.equals (c3): " + c1.equals (c3));

System.out.println ("c1.compareTo (c2): " + c1.compareTo (c2)); outputs -1 because A is (numerically) less than B. System.out.println ("c1.equals (c2): " + c1.equals (c2)); outputs false because the Characters that c1 and c2 reference contain different characters (A and B). Finally, System.out.println ("c1.equals (c3): " + c1.equals (c3)); outputs true because, although c1 and c3 reference different Characters, both objects contain the same character (A).
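
Because only the sign of compareTo's result carries meaning for ordering, code often reduces the result to -1, 0, or 1. The following sketch shows one way to do that; CompareSketch and its sign method are hypothetical names introduced for illustration, and the Character objects come from Character.valueOf(char) rather than the constructor.

```java
class CompareSketch
{
   // Reduces a.compareTo (b) to -1, 0, or 1 according to its sign.
   static int sign (Character a, Character b)
   {
      int result = a.compareTo (b);
      return (result < 0) ? -1 : (result > 0) ? 1 : 0;
   }

   public static void main (String [] args)
   {
      Character a = Character.valueOf ('A');
      Character d = Character.valueOf ('D');
      // Prints a negative number because A is numerically less than D.
      System.out.println (a.compareTo (d));
      System.out.println (sign (a, d)); // Output: -1
      System.out.println (sign (d, a)); // Output: 1
      System.out.println (sign (a, Character.valueOf ('A'))); // Output: 0
   }
}
```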

Character-oriented utility methods

Character serves as a repository for character-oriented utility methods. Examples of those methods include:

  • public static boolean isDigit(char c), which returns a Boolean true value if c's character is a digit. Otherwise, false returns.
  • public static boolean isLetter(char c), which returns a Boolean true value if c's character is a letter. Otherwise, false returns.
  • public static boolean isUpperCase(char c), which returns a Boolean true value if c's character is an uppercase letter. Otherwise, false returns.
  • public static char toLowerCase(char c), which returns the lowercase equivalent of c's character if it is uppercase. Otherwise c's character returns.
  • public static char toUpperCase(char c), which returns the uppercase equivalent of c's character if it is lowercase. Otherwise c's character returns.

The following code fragment demonstrates those five methods:

System.out.println (Character.isDigit ('4')); // Output: true
System.out.println (Character.isLetter (';')); // Output: false
System.out.println (Character.isUpperCase ('X')); // Output: true
System.out.println (Character.toLowerCase ('B')); // Output: b
System.out.println (Character.toUpperCase ('a')); // Output: A

Another useful utility method is Character's public static char forDigit(int digit, int radix), which converts digit's integer value to its character equivalent in the number system that radix specifies and returns the result. However, if digit identifies an integer less than zero or greater than or equal to radix's value, forDigit(int digit, int radix) returns the null character (represented in source code as Unicode escape sequence '\u0000'). Similarly, if radix identifies an integer less than Character's MIN_RADIX constant or greater than Character's MAX_RADIX constant, forDigit(int digit, int radix) returns the null character. The following code demonstrates that method:

for (int i = 0; i < 16; i++)
     System.out.println (Character.forDigit (i, 16));

That fragment converts integer numbers 0 through 15 to their character equivalents in the hexadecimal number system and outputs those character equivalents (0 through f).
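
The failure cases described above are easy to overlook because forDigit(int digit, int radix) reports them through its return value instead of an exception. The sketch below probes those cases; ForDigitSketch and isInvalid are hypothetical names used only for this illustration.

```java
class ForDigitSketch
{
   // Returns true if forDigit signals failure by returning the null character.
   static boolean isInvalid (int digit, int radix)
   {
      return Character.forDigit (digit, radix) == '\u0000';
   }

   public static void main (String [] args)
   {
      System.out.println (Character.forDigit (11, 16)); // Output: b
      System.out.println (isInvalid (16, 16)); // Output: true (digit equals radix)
      System.out.println (isInvalid (-1, 10)); // Output: true (negative digit)
      // A radix above MAX_RADIX is rejected.
      System.out.println (isInvalid (1, Character.MAX_RADIX + 1)); // Output: true
      // The largest digit in the largest radix is valid.
      System.out.println (isInvalid (35, Character.MAX_RADIX)); // Output: false
   }
}
```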

To complement the forDigit(int digit, int radix) method, Character provides the public static int digit(char c, int radix) method, which converts the character in c, interpreted as a digit in the radix-specified number system, to its integer equivalent and returns the result. If c contains a nondigit character for the specified number system or radix is not in the MIN_RADIX/MAX_RADIX range, digit(char c, int radix) returns -1. The following code demonstrates that method:

char [] digits = { '0', '1', '2', '3', '4', '5', '6', '7',
                   '8', '9', 'a', 'b', 'c', 'd', 'e', 'f', 'x' };
for (int i = 0; i < digits.length; i++)
     System.out.println (Character.digit (digits [i], 16));

The fragment above converts the digits array's digit characters to their integer equivalents and outputs the results. Apart from the last character, each character represents a hexadecimal digit. (Passing 16 as the radix argument informs digit(char c, int radix) that the number system is hexadecimal.) Because x does not represent a hexadecimal digit, digit(char c, int radix) returns -1 when it encounters that character, and the fragment outputs that value.
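
One practical use of digit(char c, int radix) is converting a whole string of digit characters to a number. The following is a minimal sketch under stated assumptions: HexParseSketch and parseHex are hypothetical names, the method returns -1 for invalid input (so it suits only nonnegative results), and it makes no attempt to detect overflow.

```java
class HexParseSketch
{
   // Converts a string of hexadecimal digits to an int. Returns -1 if any
   // character is not a hexadecimal digit. Overflow is not checked.
   static int parseHex (String text)
   {
      int result = 0;
      for (int i = 0; i < text.length (); i++)
      {
           int d = Character.digit (text.charAt (i), 16);
           if (d == -1)
               return -1;
           result = result * 16 + d;
      }
      return result;
   }

   public static void main (String [] args)
   {
      System.out.println (parseHex ("ff")); // Output: 255
      System.out.println (parseHex ("1A")); // Output: 26
      System.out.println (parseHex ("1x")); // Output: -1
      // Round trip: digit and forDigit undo each other for valid digits.
      System.out.println (Character.forDigit (Character.digit ('c', 16), 16)); // Output: c
   }
}
```

For production code, Integer.parseInt(String s, int radix) performs the same conversion with error reporting through an exception.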

To demonstrate Character's isDigit(char c) and isLetter(char c) methods, I've created a CA (character analysis) application that counts a text file's digits, letters, and other characters. In addition to printing those counts, CA calculates and prints each count's percentage of the total count. Listing 1 presents CA's source code (don't worry about the file-reading logic: I'll explain FileInputStream and other file-related concepts in a future article):

Listing 1: CA.java

// CA.java
// Character Analysis
import java.io.*;
class CA
{
   public static void main (String [] args)
   {
      int ch, ndigits = 0, nletters = 0, nother = 0;
      if (args.length != 1)
      {
          System.err.println ("usage: java CA filename");
          return;
      }
      FileInputStream fis = null;
      try
      {
          fis = new FileInputStream (args [0]);
          while ((ch = fis.read ()) != -1)
             if (Character.isLetter ((char) ch))
                 nletters++;
             else
             if (Character.isDigit ((char) ch))
                 ndigits++;
             else
                 nother++;
          System.out.println ("num letters = " + nletters);
          System.out.println ("num digits = " + ndigits);
          System.out.println ("num other = " + nother + "\r\n");
          int total = nletters + ndigits + nother;
          System.out.println ("% letters = " +
                              (double) (100.0 * nletters / total));
          System.out.println ("% digits = " +
                              (double) (100.0 * ndigits / total));
          System.out.println ("% other = " +
                              (double) (100.0 * nother / total));
      }
      catch (IOException e)
      {
          System.err.println (e);
      }
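      finally
      {
          // fis is still null if the FileInputStream constructor threw an
          // exception (for example, because the file does not exist).
          if (fis != null)
              try
              {
                  fis.close ();
              }
              catch (IOException e)
              {
                  // Ignored: nothing useful can be done if close fails.
              }
      }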
   }
}

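If you want to perform a character analysis on CA's source file (CA.java), execute java CA CA.java. (On a case-sensitive file system, the filename argument must match the file's name exactly.) You should see output similar to the following: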

num letters = 609
num digits = 18
num other = 905

% letters = 39.75195822454308
% digits = 1.174934725848564
% other = 59.07310704960835
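
CA's classification logic does not depend on files. To experiment with it in isolation, the same isLetter/isDigit chain can run over an in-memory string, as in the following sketch. CountSketch and its count method are hypothetical names introduced here, not part of CA.

```java
class CountSketch
{
   // Returns { letters, digits, other } counts for the characters in text,
   // using the same isLetter/isDigit test order as CA.
   static int [] count (String text)
   {
      int [] counts = new int [3];
      for (int i = 0; i < text.length (); i++)
      {
           char ch = text.charAt (i);
           if (Character.isLetter (ch))
               counts [0]++;
           else
           if (Character.isDigit (ch))
               counts [1]++;
           else
               counts [2]++;
      }
      return counts;
   }

   public static void main (String [] args)
   {
      int [] c = count ("abc 123!");
      System.out.println (c [0] + " " + c [1] + " " + c [2]); // Output: 3 3 2
   }
}
```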

The String class

The String class contrasts with Character in that a String object stores a sequence of characters—a string—whereas a Character object stores one character. Because strings are pervasive in text-processing and other programs, Java offers two features that simplify developer interaction with String objects: simplified assignment and an operator that concatenates strings. This section examines those features.

String objects

A java.lang.String object stores a character sequence in a character array that String's private value field variable references. Furthermore, String's private count integer field variable maintains the number of characters in that array. Each String has its own copy of those fields, and Java's simplified assignment shortcut offers the easiest way to create a String and store a string in the String's value array, as the following code demonstrates:

public static void main (String [] args)
{
   String s = "abc";
   System.out.println (s); // Output: abc
}

When the compiler compiles the preceding fragment, it stores the abc string literal in a special area of the class file—the constant pool, which is a collection of string literals, integer literals, and other constants. The compiler also generates a byte code instruction (ldc—load constant) that pushes a reference to a String object containing abc onto the calling thread's stack, and generates another instruction (astore_1) that pops that reference from the stack and stores it in the s local variable, which corresponds to local variable 1 at the JVM level.

What creates that String object and when? Neither the Java Language Specification nor the Java Virtual Machine Specification offers an answer that I can find. Instead, I speculate the following: When a classloader (a concept I'll discuss in a future article) loads a class file, it scans the in-memory copy of that file's constant pool. For each string literal in that pool, the classloader creates a String, populates that object with the string literal's characters, and modifies the string literal's entry in the constant pool's memory copy so ldc pushes the String's reference onto the calling thread's stack.
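
Whichever component creates the String, one consequence is observable from ordinary code: every occurrence of the same string literal yields a reference to the same String object, whereas a constructor call always creates a distinct object. The sketch below illustrates that behavior; LiteralSketch and sameObject are hypothetical names used only here.

```java
class LiteralSketch
{
   // Compares references, not contents.
   static boolean sameObject (String a, String b)
   {
      return a == b;
   }

   public static void main (String [] args)
   {
      String s1 = "abc";
      String s2 = "abc";
      String s3 = new String ("abc");
      System.out.println (sameObject (s1, s2)); // Output: true
      System.out.println (sameObject (s1, s3)); // Output: false
      System.out.println (s1.equals (s3)); // Output: true
   }
}
```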

Because the compiler and classloader treat string literals as String objects, "abc".length() and synchronized ("sync object") are legal. "abc".length() returns the length of the String containing abc; and synchronized ("sync object") grabs the lock associated with the String containing sync object. Java regards these and other string literals as String objects to serve as a convenience for developers. As with the simplified assignment shortcut, substituting string literals for String object reference variables reduces the amount of code you must write.
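
The following sketch exercises both expressions. LiteralObjectSketch and its two methods are hypothetical names, and the sketch assumes Thread.holdsLock(Object) is available (JDK 1.4 or later) to confirm that the lock really belongs to the literal's String object.

```java
class LiteralObjectSketch
{
   // Calls a method directly on a string literal.
   static int literalLength ()
   {
      return "abc".length ();
   }

   // Locks on a string literal and reports whether the current thread
   // holds the lock of the String that the same literal denotes.
   static boolean holdsLiteralLock ()
   {
      synchronized ("sync object")
      {
         return Thread.holdsLock ("sync object");
      }
   }

   public static void main (String [] args)
   {
      System.out.println (literalLength ()); // Output: 3
      System.out.println ("abc".charAt (1)); // Output: b
      System.out.println (holdsLiteralLock ()); // Output: true
   }
}
```

Although synchronizing on a literal is legal, unrelated code that uses the same literal contends for the same lock, so a private lock object is usually the safer design.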

Java also offers a variety of String constructors for creating String objects. I detail three below:
