Study guide: Java's character and assorted string classes support text-processing

Brush up on Java terms, learn tips and cautions, review homework assignments, and read Jeff's answers to student questions

Glossary of terms

capacity
Length of the internal array in a data structure object, such as a StringBuffer object.
delimiters
Characters that separate tokens.
interns
Stores in the common string memory pool, which holds no duplicate Strings.
immutable
Unchangeable.
mutable
Changeable.
text
Digits, letters, punctuation, words, sentences, and so on.
wrapper class
A class whose objects wrap primitive type values so that those values can be stored in a data structure object that stores only objects, not primitive values. Character is an example.

Tips and cautions

These tips and cautions will help you write better programs and save you from agonizing over why the compiler produces error messages.

Tips

  • To speed up string comparisons, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, each interned String has a unique reference, so two interned Strings are equal exactly when == reports that their references match. Comparing references with == is faster than calling a method to compare the strings' characters.
  • Because it takes time for a StringBuffer to create a new character array and copy characters from the old array to the new array (during an expansion), use ensureCapacity(int minimumCapacity) to minimize expansions prior to entering a loop that appends many characters to a StringBuffer. That improves performance.
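
The first tip can be sketched as follows. This is only an illustration: the class name InternDemo and the helper sameAfterIntern are hypothetical names chosen for the example, and the only library call involved is String's intern() method.

```java
class InternDemo {
    // Returns true when both strings refer to the same pooled object after interning.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";           // string literals are interned automatically
        String built = new String("hello"); // a distinct object with the same characters

        System.out.println(literal == built);          // false: different objects
        System.out.println(literal.equals(built));     // true: same characters
        System.out.println(literal == built.intern()); // true: both refer to the pooled String
    }
}
```

The == comparison is only meaningful when both operands have been interned; comparing an interned String with a non-interned one can report false even though the characters match.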

Cautions

  • Do not use either concat(String str) or the string concatenation operator in a loop that executes repeatedly; that can affect your program's performance. (Each approach creates several objects—which can increase garbage collections—and several methods are called behind the scenes.)
  • Confusing length() with length leads to compiler errors. length() is a method that returns the current number of characters in a String's value array, whereas length is a read-only array field that holds the number of elements in an array.
  • HexDec includes the expression i < s.length () in its for loop header. For long loops, do not call length() in a for loop header because of method call overhead (which can affect performance). Instead, call that method and save its return value before entering the loop, and then use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs, like HexDec, it doesn't matter if I call length() in the for loop header. But for professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)
  • Do not use countTokens()'s return value to control a string tokenization loop's duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Failure to heed that advice often leads to one of the nextToken() methods throwing a NoSuchElementException object and the program terminating prematurely.
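
The last caution can be illustrated with a small sketch. The class and method names (TokenLoopDemo, countedLoopFails, tokensWithCheck) and the sample text are assumptions made for this example. The second method uses hasMoreTokens() instead of a precomputed count; note that hasMoreTokens() tests against whatever delimiter set is in effect when it is called, so it is a safer pattern rather than a guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

class TokenLoopDemo {
    // Risky: the count is computed with the default (whitespace) delimiters,
    // but the loop body switches to a different delimiter set.
    static boolean countedLoopFails(String text, String newDelims) {
        StringTokenizer st = new StringTokenizer(text);
        int count = st.countTokens();
        try {
            for (int i = 0; i < count; i++)
                st.nextToken(newDelims);
            return false;
        } catch (NoSuchElementException e) {
            return true; // fewer tokens existed under the new delimiters
        }
    }

    // Safer: ask the tokenizer before each call instead of trusting an earlier count.
    static List<String> tokensWithCheck(String text, String newDelims) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens())
            result.add(st.nextToken(newDelims));
        return result;
    }

    public static void main(String[] args) {
        // "a b,c d" holds three whitespace-separated tokens but only two comma-separated ones.
        System.out.println(countedLoopFails("a b,c d", ",")); // true
        System.out.println(tokensWithCheck("a b,c d", ","));  // [a b, c d]
    }
}
```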

Miscellaneous notes and thoughts

Subsequent to the publication of my article on character and string classes, a problem with the StringTokenizer class was brought to my attention. The problem deals with the String delim parameter that is part of two StringTokenizer constructors. To many developers, that parameter's String type suggests that StringTokenizer recognizes multicharacter delimiters (such as ###). Instead, StringTokenizer interprets that parameter as a set of one-character delimiters. This confusion over what delim means would disappear if delim had character array type (char []). For more information on the problem and a multicharacter delimiter solution, read "Steer Clear of Java Pitfalls," Michael Daconta (JavaWorld, September 2000).
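
To see the behavior described above, consider this short sketch. DelimSetDemo and its tokens helper are hypothetical names used only for illustration; the sample strings are likewise made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DelimSetDemo {
    // Collects every token that StringTokenizer produces for the given text and delim argument.
    static List<String> tokens(String text, String delims) {
        List<String> result = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens())
            result.add(st.nextToken());
        return result;
    }

    public static void main(String[] args) {
        // If "###" were one multicharacter delimiter, the result would be [one, two#three].
        // Because delim is a set of characters, every single # separates tokens.
        System.out.println(tokens("one###two#three", "###")); // [one, two, three]

        // Each character in the delim argument is an independent delimiter.
        System.out.println(tokens("a#b$c", "#$")); // [a, b, c]
    }
}
```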

Reader questions

Find out what questions your fellow readers are asking and my answers to those questions.

Jeff,

When you need to create strings dynamically, what should you use performance wise, String or StringBuffer? I thought StringBuffer performs better than String. So I always start with a StringBuffer, and after completing the String, return a buffer.toString().

Michel

Michel,

The answer to your question depends on the nature of the string. If it is immutable (that is, unchangeable), use String. You could use StringBuffer, but doing so would allow code to directly change the immutable string's characters, and the string would no longer be immutable. Similarly, if the string is mutable, use StringBuffer. You could use String, but then you would end up creating additional String objects during string modification, because String methods that could potentially modify a string create additional String objects that contain modified strings. Eventually, the garbage collector would see the additional (and probably unreferenced) Strings and perform a collection, possibly affecting the program's performance. StringBuffer does not suffer from that performance problem because it does not create additional StringBuffer (or String) objects.

Additional Strings that arise from using String (instead of StringBuffer) to represent a mutable string are one performance problem. A second problem, which can prove just as serious, could occur when you use StringBuffer to represent a mutable string. StringBuffer's character array (which holds a mutable string) has finite length. Any modification that results in a mutable string whose length exceeds the character array's length causes StringBuffer to create a new character array of appropriate length, copy characters from the old character array to the new character array, and erase the reference to the old character array (making that character array eligible for garbage collection).

Because array creation, array copying, and garbage collection take time, how do you solve this potential performance problem? Either create a StringBuffer object with large enough initial capacity (character array length) or call StringBuffer's ensureCapacity() method to set an appropriate character array length prior to changing the array. That way, you minimize the number of extra activities.

Both performance problems manifest themselves during looped string concatenation. Consider the following code fragment:

String s = "a";
for (int i = 0; i < 2000; i++)
    s = s + "b";

The compiler translates that code fragment into byte code equivalent to the following source code:

String s = "a";
for (int i = 0; i < 2000; i++)
    s = new StringBuffer ().append (s).append ("b").toString ();

The code fragment above creates a StringBuffer and a String (via toString()) during each loop iteration. These objects are temporary and become unreferenced after each loop iteration (although the last-created String is still referenced after the loop completes). Eventually, the garbage collector will probably run. How do you solve this potential performance problem? Consider the following code fragment:

String s = "a";
StringBuffer sb = new StringBuffer (2500); // Assume a maximum character array length of 2500 characters.
sb.append (s);
for (int i = 0; i < 2000; i++)
    sb.append ("b");
s = sb.toString ();

The code fragment does not create any StringBuffer or String objects during the loop. Therefore, the potential for garbage collection is quite low. (Garbage collection can still occur because the garbage collector thread runs at various times and there may be unreferenced objects from previously executed code to collect.)

To sum up, understanding whether strings should be immutable or mutable will lead you to select the appropriate String/StringBuffer classes, which benefits performance. Furthermore, performance improves when you set an appropriate StringBuffer capacity prior to making many modifications and use care when dealing with looped string concatenation.

Jeff
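
The effect of setting the capacity up front can be observed with StringBuffer's capacity() method. The following is a minimal sketch; CapacityDemo and capacityAfterAppends are hypothetical names, and the figures 2500 and 2000 simply mirror the fragment above.

```java
class CapacityDemo {
    // Appends count copies of 'b' to the buffer and returns its final capacity.
    static int capacityAfterAppends(StringBuffer sb, int count) {
        for (int i = 0; i < count; i++)
            sb.append('b');
        return sb.capacity();
    }

    public static void main(String[] args) {
        // Presized buffer: 2001 characters fit in the initial array, so no expansion occurs.
        StringBuffer presized = new StringBuffer(2500);
        presized.append("a");
        System.out.println(capacityAfterAppends(presized, 2000)); // 2500

        // Default buffer: starts with a capacity of 16 and must expand repeatedly,
        // creating and copying a new character array each time.
        StringBuffer grown = new StringBuffer();
        System.out.println(grown.capacity()); // 16
        grown.append("a");
        System.out.println(capacityAfterAppends(grown, 2000) >= grown.length()); // true
    }
}
```

Calling ensureCapacity(2500) on the default buffer before the loop would have the same effect as the presized constructor: a single expansion instead of several.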

Homework

  • Why does Java require a Character class?
  • Why does Java regard string literals as String objects?
  • Enhance the Editor application with the following capabilities:

    • Rename DELFCH to DELCH and modify that command to take a single integer argument identifying the zero-based index of the character to delete in the current line. Example: delch 2 deletes the current line's third character. Provide appropriate error checking to warn users when they specify an invalid index (or curline contains -1, indicating no lines of text). If no more characters are in the current line, delete it and update curline as appropriate.
    • Create a DEL command that deletes the current line. Use error checking to deal with the situation when curline contains -1. Update curline as appropriate.
    • Create a REPL command that replaces all occurrences of a specific character in the current line with another character. Two character arguments should follow REPL: the first argument identifies the character to replace, and the second argument identifies its replacement character. Example: repl # * replaces all occurrences of # with *. Use appropriate error checking in case no current line exists (i.e., curline contains -1).
    • Create a SETCL command that takes a single zero-based integer argument and sets curline to that value. Use appropriate error checking in case the value is out of range or curline contains -1.
    • If you feel ambitious, include LOAD and SAVE commands that let you load the contents of arbitrary text files and save the current text to a specific text file. What sort of error checking will you need?

Answers to last month's homework

Last month, I asked you to answer some questions and create a package. My answers follow each question.

  • What is the unnamed package?
  • The unnamed package is the package to which a source file's classes/interfaces belong when their source file lacks a package directive. From an implementation perspective, the unnamed package corresponds to whatever directory is current when you invoke the java command.

  • What is the purpose of classpath?
  • classpath is an environment variable that helps the JVM's classloader locate class and jar files.

  • Create a shapes package with classes Point, Circle, Rectangle, and Square. Of those classes, ensure that Point is the only class not accessible outside its package. Use implementation inheritance to derive Circle from Point and Rectangle from Square. Provide an Area interface with a double getArea() method that returns the area of a Circle, a Square, or a Rectangle. Once you finish creating the package, create a TestShapes program that imports class and interface names from shapes, creates objects from shape classes, and computes the area of the shape each object represents. After compiling and running TestShapes (successfully), move shapes to another location on your hard drive and change classpath so that a second attempt to run TestShapes results in the same output as the previous run.
  • Complete the following steps:

    1. Ensure no classpath environment variable exists.
    2. Create a shapes directory.
    3. Copy the following source code into an Area.java file that appears in shapes:

      // Area.java
      package shapes;
      public interface Area
      {
         double getArea ();
      }
      
      
    4. Copy the following source code into a Circle.java file that appears in shapes:

      // Circle.java
      package shapes;
      public class Circle extends Point implements Area
      {
         private int radius;
         public Circle (int x, int y, int radius)
         {
            super (x, y);
            this.radius = radius;
         }
         // Why do I need to redeclare getX () and getY ()? Hint: Comment
         // out both methods and try to call them from TestShapes.
         public int getX () { return super.getX (); }
         public int getY () { return super.getY (); }
         public int getRadius () { return radius; }
         public double getArea () { return Math.PI * radius * radius; }
      }
      
      
    5. Copy the following source code into a Point.java file that appears in shapes:

      // Point.java
      package shapes;
      class Point
      {
         private int x, y;
         Point (int x, int y)
         {
            this.x = x;
            this.y = y;
         }
         int getX () { return x; }
         int getY () { return y; }
      }
      
      
    6. Copy the following source code into a Rectangle.java file that appears in shapes:

      // Rectangle.java
      package shapes;
      public class Rectangle extends Square
      {
         private int length;
         public Rectangle (int width, int length)
         {
            super (width);
            this.length = length;
         }
         public int getLength () { return length; }
         public double getArea () { return getWidth () * length; }
      }
      
      
    7. Copy the following source code into a Square.java file that appears in shapes:

      // Square.java
      package shapes;
      public class Square implements Area
      {
         private int width;
         public Square (int width)
         {
            this.width = width;
         }
         public int getWidth () { return width; }
         public double getArea () { return width * width; }
      }
      
      
    8. Copy the following source code into a TestShapes.java file that appears in shapes's parent directory:

      // TestShapes.java
      import shapes.*;
      class TestShapes
      {
         public static void main (String [] args)
         {
            Area [] a = { new Circle (10, 10, 20),
                          new Square (5),
                          new Rectangle (10, 15) };
            for (int i = 0; i < a.length; i++)
                 System.out.println (a [i].getArea ());
         }
      }
      
      
    9. Assuming the directory that contains TestShapes.java is the current directory, execute javac TestShapes.java to compile TestShapes.java and all files in the shapes directory. Then execute java TestShapes to run this application.
    10. Move shapes to another directory and set classpath to refer to that directory and the current directory. For example, under Windows, the command move shapes \temp moves shapes into the temp directory (just below the root directory), and the command set classpath=\temp;. points classpath at both the temp directory and the current directory, so java TestShapes still runs.