Study guide: Java's character and assorted string classes support text-processing

Brush up on Java terms, learn tips and cautions, review homework assignments, and read Jeff's answers to student questions


Glossary of terms

capacity
Length of the internal array in a data structure object, such as a StringBuffer object.


delimiters
Characters that separate tokens.


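interns
Stores a string in the common string memory pool (via String's intern() method), which holds at most one String object for each distinct character sequence.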


immutable
Unchangeable.


mutable
Changeable.


text
Digits, letters, punctuation, words, sentences, and so on.


wrapper class
A class whose objects wrap primitive-type values so that those values can be stored in data structure objects that hold only objects, not primitive values. Character is an example.
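As a minimal sketch of the wrapper-class idea, the following program stores char values in a java.util.List, which can hold only object references. It assumes Java 7 or later (for generics with the diamond operator); the class and method names are illustrative, not from the article. The Character.valueOf() and charValue() calls are written out to make the wrapping visible, although Java 5 and later can perform them automatically (autoboxing).

```java
import java.util.ArrayList;
import java.util.List;

public class WrapperDemo {
    // Wraps each char of s in a Character object so it can go into a List.
    static List<Character> wrapAll(String s) {
        List<Character> chars = new ArrayList<>();
        for (int i = 0; i < s.length(); i++)
            chars.add(Character.valueOf(s.charAt(i))); // explicit wrapping
        return chars;
    }

    // Unwraps the Character objects back into primitive chars.
    static String unwrapAll(List<Character> chars) {
        StringBuffer sb = new StringBuffer(chars.size());
        for (Character c : chars)
            sb.append(c.charValue()); // explicit unwrapping
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Character> chars = wrapAll("Java");
        System.out.println(chars);            // [J, a, v, a]
        System.out.println(unwrapAll(chars)); // Java
    }
}
```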


Tips and cautions

These tips and cautions will help you write better programs and save you from agonizing over why the compiler produces error messages.

Tips

  • To quicken string searches and comparisons, use intern() to intern your Strings in the common string memory pool. Because that pool contains no duplicate Strings, all interned strings with the same characters share one object, and therefore one reference. Comparing references with == proves faster than calling a method such as equals() that compares the strings' characters. (This shortcut works only when both strings have been interned.)
  • Because it takes time for a StringBuffer to create a new character array and copy characters from the old array to the new array during an expansion, call ensureCapacity(int minimumCapacity) before entering a loop that appends many characters to a StringBuffer. That minimizes expansions and improves performance.
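Both tips can be sketched in a few lines. The program below is a minimal illustration, with hypothetical helper names sameAfterIntern() and repeat(): it shows that a String created with new is a distinct object from an equal literal until both are interned, and it reserves a StringBuffer's capacity with ensureCapacity() before an append loop.

```java
public class InternDemo {
    // True when the interned forms of a and b are the same object,
    // which happens exactly when a and b contain the same characters.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    // Builds a string of n copies of ch, reserving room for all n
    // characters up front so the loop never forces an expansion.
    static String repeat(char ch, int n) {
        StringBuffer sb = new StringBuffer();
        sb.ensureCapacity(n);
        for (int i = 0; i < n; i++)
            sb.append(ch);
        return sb.toString();
    }

    public static void main(String[] args) {
        String a = new String("hello"); // a new object, not the pooled literal
        String b = "hello";             // literals are already in the pool
        System.out.println(a == b);                // false: different objects
        System.out.println(sameAfterIntern(a, b)); // true: one pooled object
        System.out.println(repeat('x', 5));        // xxxxx
    }
}
```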


Cautions

  • Do not use either concat(String str) or the string concatenation operator (+) in a loop that executes many times; that can hurt your program's performance. (Each iteration creates several temporary objects, which can increase garbage collections, and calls several methods behind the scenes.)
  • Confusing length() with length leads to compiler errors. length() is a String method that returns the number of characters in the string, whereas length is a read-only field of every array that holds the array's fixed number of elements.
  • HexDec includes the expression i < s.length () in its for loop header. For long loops, do not call length() in a for loop header because of method call overhead (which can affect performance). Instead, call that method and save its return value before entering the loop, and then use the saved value in the loop header. Example: int len = s.length (); for (int i = 0; i < len; i++). For toy programs, like HexDec, it doesn't matter if I call length() in the for loop header. But for professional programs where performance matters, every time-saving trick helps. (Some Java compilers perform this optimization for you.)
  • Do not use countTokens()'s return value to control a string tokenization loop's duration if the loop changes the set of delimiters via a nextToken(String delim) method call. Failure to heed that advice often leads to one of the nextToken() methods throwing a NoSuchElementException object and the program terminating prematurely.
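The following sketch illustrates the length()/length distinction, the saved-length loop header, and the countTokens() pitfall. The names countHexDigits(), unsafeTokens(), and safeTokens() are hypothetical. In unsafeTokens(), countTokens() reports three space-separated tokens, but the first nextToken(",") call switches the delimiter set to a comma, so the whole string "a b c" comes back as one token and the second call throws NoSuchElementException. Checking hasMoreTokens() before every nextToken() call avoids that failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class CautionsDemo {
    // length() is a String method; its result is saved before the loop.
    static int countHexDigits(String s) {
        int count = 0;
        int len = s.length();
        for (int i = 0; i < len; i++)
            if (Character.digit(s.charAt(i), 16) != -1) // -1 means "not a hex digit"
                count++;
        return count;
    }

    // Unsafe: n is computed with the original delimiter set (" "), but each
    // nextToken(",") call switches to a different set, so fewer tokens exist
    // than n predicts and nextToken() eventually throws.
    static List<String> unsafeTokens(String text) {
        StringTokenizer st = new StringTokenizer(text, " ");
        int n = st.countTokens();
        List<String> out = new ArrayList<>();
        for (int i = 0; i < n; i++)
            out.add(st.nextToken(","));
        return out;
    }

    // Safe: hasMoreTokens() is checked before every nextToken() call.
    static List<String> safeTokens(String text) {
        StringTokenizer st = new StringTokenizer(text, " ");
        List<String> out = new ArrayList<>();
        while (st.hasMoreTokens())
            out.add(st.nextToken(","));
        return out;
    }

    public static void main(String[] args) {
        String[] words = { "cafe", "babe", "xyz" };
        // length is an array field: no parentheses.
        for (int i = 0; i < words.length; i++)
            System.out.println(words[i] + ": " + countHexDigits(words[i]));

        System.out.println(safeTokens("a b c")); // [a b c]
        try {
            unsafeTokens("a b c");
        } catch (NoSuchElementException e) {
            System.out.println("unsafe loop threw NoSuchElementException");
        }
    }
}
```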


Miscellaneous notes and thoughts

Subsequent to the publication of my article on character and string classes, a problem with the StringTokenizer class was brought to my attention. The problem deals with the String delim parameter that is part of two StringTokenizer constructors. To many developers, that parameter's String type suggests that StringTokenizer recognizes multicharacter delimiters (such as ###). Instead, StringTokenizer interprets that parameter as a set of one-character delimiters. This confusion over what delim means would disappear if delim had character array type (char []). For more information on the problem and a multicharacter delimiter solution, read "Steer Clear of Java Pitfalls," Michael Daconta (JavaWorld, September 2000).
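A short program makes the set-of-characters behavior visible. In this sketch (the tokenize() helper is hypothetical), passing "###" as the delimiter gives exactly the same tokens as passing "#". For comparison, it also uses String.split(), available since Java 1.4; that is a later-Java alternative, not the solution in Daconta's article. Because split() takes a regular expression, "###" there matches only three consecutive # characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class DelimDemo {
    // Collects every token StringTokenizer finds using the given delimiter SET.
    static List<String> tokenize(String text, String delims) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(text, delims);
        while (st.hasMoreTokens())
            out.add(st.nextToken());
        return out;
    }

    public static void main(String[] args) {
        String text = "a#b##c###d";
        // "###" is treated as the set {'#'}, so any run of '#' separates tokens.
        System.out.println(tokenize(text, "###")); // [a, b, c, d]
        System.out.println(tokenize(text, "#"));   // [a, b, c, d]
        // split() treats "###" as one three-character pattern.
        System.out.println(Arrays.asList(text.split("###"))); // [a#b##c, d]
    }
}
```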

Reader questions

Find out what questions your fellow readers are asking and my answers to those questions.



Jeff,

When you need to create strings dynamically, what should you use performance-wise, String or StringBuffer? I thought StringBuffer performs better than String. So I always start with a StringBuffer, and after completing the string, return buffer.toString().

Michel

Michel, the answer to your question depends on the nature of the string. If it is immutable (that is, unchangeable), use String. You could use StringBuffer, but doing so would allow code to change the string's characters directly, and the string would no longer be immutable. Similarly, if the string is mutable, use StringBuffer. You could use String, but then you would end up creating additional String objects during string modification, because String methods that appear to modify a string (such as concat()) instead return new String objects containing the modified characters. Eventually, the garbage collector would notice those additional (and probably unreferenced) Strings and perform a collection, possibly affecting the program's performance. StringBuffer does not suffer from that performance problem because its methods modify the buffer in place instead of creating additional StringBuffer (or String) objects.
Additional Strings that arise from using String (instead of StringBuffer) to represent a mutable string are one performance problem. A second problem, which can prove just as serious, can occur when you use StringBuffer to represent a mutable string. StringBuffer's character array (which holds the mutable string) has finite length. Any modification that would make the string longer than that array causes StringBuffer to create a new, larger character array, copy the characters from the old array to the new array, and drop its reference to the old array (making that array eligible for garbage collection). Because array creation, array copying, and garbage collection take time, how do you solve this potential performance problem? Either create the StringBuffer object with a large enough initial capacity (character array length), or call StringBuffer's ensureCapacity() method to set an appropriate character array length before modifying the buffer. That way, you minimize the number of extra activities.
Both performance problems manifest themselves during looped string concatenation. Consider the following code fragment:
String s = "a";
for (int i = 0; i < 2000; i++)
    s = s + "b";
The compiler translates that fragment into bytecode roughly equivalent to the following source:
String s = "a";
for (int i = 0; i < 2000; i++)
    s = new StringBuffer ().append (s).append ("b").toString ();
The code fragment above creates a StringBuffer and a String (via toString()) during each loop iteration, and each iteration copies every character accumulated so far, so the total copying work grows with the square of the iteration count. These objects are temporary and become unreachable after each loop iteration (although the last-created String is still referenced after the loop completes). Eventually, the garbage collector will probably run to reclaim them. How do you solve this potential performance problem? Consider the following code fragment:
String s = "a";
StringBuffer sb = new StringBuffer (2500); // Initial capacity of 2500 characters: enough for the 2001-character result, so the array never expands.
sb.append (s);
for (int i = 0; i < 2000; i++)
    sb.append ("b");
s = sb.toString ();
The code fragment does not create any StringBuffer or String objects during the loop, and because the initial capacity exceeds the final string length, the buffer never has to replace its character array. Therefore, the potential for garbage collection is quite low. (Garbage collection can still occur because the garbage collector thread runs at various times, and there may be unreferenced objects from previously executed code to collect.)
To sum up, understanding whether strings should be immutable or mutable will lead you to select the appropriate class, String or StringBuffer, which benefits performance. Furthermore, performance improves when you set an appropriate StringBuffer capacity before making many modifications and when you take care with looped string concatenation.
Jeff
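To try both fragments side by side, here is a minimal, self-contained sketch (the class and method names are hypothetical). It checks that the two approaches build the same 2001-character string and that the presized buffer's capacity() is unchanged after the loop, meaning no expansion took place. It deliberately avoids timing measurements, since single-run timings in Java are noisy.

```java
public class ConcatDemo {
    // Naive approach: each iteration builds a brand-new String that copies
    // every character accumulated so far.
    static String withConcat(int n) {
        String s = "a";
        for (int i = 0; i < n; i++)
            s = s + "b";
        return s;
    }

    // Presized approach: one StringBuffer whose initial capacity already
    // holds "a" plus n 'b' characters, so append() never expands the array.
    static StringBuffer fillBuffer(int n) {
        StringBuffer sb = new StringBuffer(n + 1);
        sb.append("a");
        for (int i = 0; i < n; i++)
            sb.append('b');
        return sb;
    }

    public static void main(String[] args) {
        int n = 2000;
        String s1 = withConcat(n);
        StringBuffer sb = fillBuffer(n);
        String s2 = sb.toString();
        System.out.println(s1.equals(s2));            // true
        System.out.println(s2.length());              // 2001
        System.out.println(sb.capacity() == n + 1);   // true: no expansion occurred
    }
}
```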


Homework

  • Why does Java require a Character class?
  • Why does Java regard string literals as String objects?
  • Enhance the Editor application with the following capabilities:

    • Rename DELFCH to DELCH and modify that command to take a single integer argument identifying the zero-based index of the character to delete in the current line. Example: delch 2 deletes the current line's third character. Provide appropriate error checking to warn users when they specify an invalid index (or curline contains -1, indicating no lines of text). If no more characters are in the current line, delete it and update curline as appropriate.
    • Create a DEL command that deletes the current line. Use error checking to deal with the situation when curline contains -1. Update curline as appropriate.
    • Create a REPL command that replaces all occurrences of a specific character in the current line with another character. Two character arguments should follow REPL: the first argument identifies the character to replace, and the second argument identifies its replacement character. Example: repl # * replaces all occurrences of # with *. Use appropriate error checking in case no current line exists (i.e., curline contains -1).
    • Create a SETCL command that takes a single zero-based integer argument and sets curline to that value. Use appropriate error checking in case the value is out of range or curline contains -1.
    • If you feel ambitious, include LOAD and SAVE commands that let you load the contents of arbitrary text files and save the current text to a specific text file. What sort of error checking will you need?


Answers to last month's homework

Last month, I asked you to answer some questions and create a package. My answers appear in red.
