Found bc starting at index 1 and ending at index 3
Found bc starting at index 4 and ending at index 6
Found bc starting at index 7 and ending at index 9
The output shows that the program displays only the matches associated with capturing group number 2, along with those matches' starting and ending positions.
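The command line that produced this output does not appear on this page. As a minimal sketch, assuming the regex (a(bc)) and the text abcabcabc (both are illustrative guesses, and GroupPositions and report are hypothetical names), the following code prints the same three lines by passing the group number to Matcher's group(), start(), and end() methods:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositions
{
   // Builds one report line per match, describing only the given capturing group.
   static List<String> report (String regex, String text, int group)
   {
      List<String> lines = new ArrayList<> ();
      Matcher m = Pattern.compile (regex).matcher (text);
      while (m.find ())
         lines.add ("Found " + m.group (group) + " starting at index " +
                    m.start (group) + " and ending at index " + m.end (group));
      return lines;
   }

   public static void main (String [] args)
   {
      // Assumed inputs: the regex and text below are illustrative choices.
      for (String line : report ("(a(bc))", "abcabcabc", 2))
         System.out.println (line);
   }
}
```

Note that end() returns the index just past the last matched character, which is why the first bc (at indexes 1 and 2) is reported as ending at index 3.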
| Note |
|---|
String incorporates two convenience methods that invoke their equivalent Matcher methods: public String replaceFirst(String regex, String replacement) and public String replaceAll(String regex, String replacement).
|
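To illustrate the note, here is a small sketch that performs the same replacements through String's convenience methods and through Matcher. The class name ReplaceDemo, its helper methods, and the sample text are hypothetical:

```java
import java.util.regex.Pattern;

class ReplaceDemo
{
   // Replaces the first match by going through Pattern and Matcher explicitly.
   static String viaMatcherFirst (String text, String regex, String replacement)
   {
      return Pattern.compile (regex).matcher (text).replaceFirst (replacement);
   }

   // Replaces every match by going through Pattern and Matcher explicitly.
   static String viaMatcherAll (String text, String regex, String replacement)
   {
      return Pattern.compile (regex).matcher (text).replaceAll (replacement);
   }

   public static void main (String [] args)
   {
      String text = "cat and cat";
      // String's convenience methods
      System.out.println (text.replaceFirst ("cat", "dog"));
      System.out.println (text.replaceAll ("cat", "dog"));
      // The equivalent Matcher calls
      System.out.println (viaMatcherFirst (text, "cat", "dog"));
      System.out.println (viaMatcherAll (text, "cat", "dog"));
   }
}
```

The convenience methods compile the regex on every call, so code that applies the same regex repeatedly may prefer to compile a Pattern once and reuse it.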
Pattern's compilation methods throw PatternSyntaxException objects when they detect illegal syntax in a regex's pattern. An exception handler may call the following PatternSyntaxException methods to obtain information from a thrown PatternSyntaxException object about the syntax error:
public String getDescription(): returns the syntax error's description
public int getIndex(): returns either the approximate index (within a pattern) where the syntax error occurs or -1 if the index is unknown
public String getMessage(): builds a multiline string that contains the combined information the other three methods return along with a visual indication of the syntax error position within the pattern
public String getPattern(): returns the erroneous regex pattern
Because PatternSyntaxException inherits from java.lang.RuntimeException, code doesn't need to specify an exception handler. This proves appropriate when regexes are known to have correct patterns.
But when potential for bad pattern syntax exists, an exception handler is necessary. Thus, RegexDemo's source code (see Listing 1) includes try { ... } catch (PatternSyntaxException e) { ... }, which calls each of the four previous PatternSyntaxException methods to obtain information about an illegal pattern.
What constitutes an illegal pattern? Not specifying the closing parenthesis metacharacter in an embedded flag expression represents one example. Suppose you execute java RegexDemo (?itree Treehouse. That command line's illegal (?itree pattern causes p = Pattern.compile (args [0]); to throw a PatternSyntaxException object. You then observe the following output:
Regex syntax error: Unknown inline modifier near index 3
(?itree
   ^
Error description: Unknown inline modifier
Error index: 3
Erroneous pattern: (?itree
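Listing 1 is not reproduced on this page, so the following is only a minimal sketch of the same idea: compile the illegal (?itree pattern, catch the resulting exception, and call the four methods. The class name SyntaxErrorDemo and the tryCompile helper are hypothetical, and the exact message text may vary between Java versions:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SyntaxErrorDemo
{
   // Returns null when the regex compiles; otherwise returns the thrown exception.
   static PatternSyntaxException tryCompile (String regex)
   {
      try
      {
         Pattern.compile (regex);
         return null;
      }
      catch (PatternSyntaxException e)
      {
         return e;
      }
   }

   public static void main (String [] args)
   {
      PatternSyntaxException e = tryCompile ("(?itree");
      if (e == null)
      {
         System.out.println ("Pattern compiled without error");
         return;
      }
      System.err.println ("Regex syntax error: " + e.getMessage ());
      System.err.println ("Error description: " + e.getDescription ());
      System.err.println ("Error index: " + e.getIndex ());
      System.err.println ("Erroneous pattern: " + e.getPattern ());
   }
}
```

Adding the missing parenthesis, as in (?i)tree, makes the pattern legal, and tryCompile then returns null.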
| Note |
|---|
The public PatternSyntaxException(String desc, String regex, int index) constructor lets you create your own PatternSyntaxException objects. That constructor comes in handy should you ever create your own preprocessing compilation method that recognizes
your own pattern syntax, translates that syntax to syntax recognized by Pattern's compilation methods, and calls one of those compilation methods. If your method's caller violates your custom pattern syntax,
you can throw an appropriate PatternSyntaxException object from that method.
|
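As a sketch of the preprocessing idea in the note, the hypothetical WildcardCompiler below accepts an invented wildcard syntax (* for any run of characters, ? for a single character, everything else literal), translates it to regex syntax, and hands the result to Pattern.compile(). The rule that two adjacent asterisks are illegal exists only to give the method a reason to throw its own PatternSyntaxException:

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class WildcardCompiler
{
   // Translates the hypothetical wildcard syntax to a regex and compiles it.
   static Pattern compile (String wildcard)
   {
      StringBuilder regex = new StringBuilder ();
      for (int i = 0; i < wildcard.length (); i++)
      {
         char ch = wildcard.charAt (i);
         if (ch == '*')
         {
            // Invented rule: adjacent asterisks violate the custom syntax.
            if (i > 0 && wildcard.charAt (i - 1) == '*')
               throw new PatternSyntaxException ("Adjacent asterisks",
                                                 wildcard, i);
            regex.append (".*");
         }
         else if (ch == '?')
            regex.append ('.');
         else
            regex.append (Pattern.quote (String.valueOf (ch))); // literal
      }
      return Pattern.compile (regex.toString ());
   }

   public static void main (String [] args)
   {
      System.out.println (compile ("*.java").matcher ("ExtCmnt.java").matches ());
      try
      {
         compile ("a**b");
      }
      catch (PatternSyntaxException e)
      {
         System.err.println (e.getMessage ());
      }
   }
}
```

Because the exception carries the caller's original wildcard string and an index into it, the error report refers to the custom syntax and not to the translated regex, which the caller never saw.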
Regexes let you create powerful text-processing applications. One application you might find helpful extracts comments from a Java, C, or C++ source file, and records those comments in another file. Listing 2 presents that application's source code:
Listing 2. ExtCmnt.java
// ExtCmnt.java
import java.io.*;
import java.util.regex.*;
class ExtCmnt
{
   public static void main (String [] args)
   {
      if (args.length != 2)
      {
         System.err.println ("usage: java ExtCmnt infile outfile");
         return;
      }
      Pattern p;
      try
      {
         // The following pattern lets this extract multiline comments that
         // appear on a single line (e.g., /* same line */) and single-line
         // comments (e.g., // some line). Furthermore, the comment may
         // appear anywhere on the line.
         p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
      }
      catch (PatternSyntaxException e)
      {
         System.err.println ("Regex syntax error: " + e.getMessage ());
         System.err.println ("Error description: " + e.getDescription ());
         System.err.println ("Error index: " + e.getIndex ());
         System.err.println ("Erroneous pattern: " + e.getPattern ());
         return;
      }
      BufferedReader br = null;
      BufferedWriter bw = null;
      try
      {
         FileReader fr = new FileReader (args [0]);
         br = new BufferedReader (fr);
         FileWriter fw = new FileWriter (args [1]);
         bw = new BufferedWriter (fw);
         Matcher m = p.matcher ("");
         String line;
         while ((line = br.readLine ()) != null)
         {
            m.reset (line);
            if (m.matches ()) /* entire line must match */
            {
               bw.write (line);
               bw.newLine ();
            }
         }
      }
      catch (IOException e)
      {
         System.err.println (e.getMessage ());
         return;
      }
      finally // Close file.
      {
         try
         {
            if (br != null)
               br.close ();
            if (bw != null)
               bw.close ();
         }
         catch (IOException e)
         {
         }
      }
   }
}
After creating Pattern and Matcher objects, ExtCmnt reads a text file's contents line by line. For each line read, the matcher attempts to match that line against a pattern, identifying either a single-line comment or a multiline comment that appears on a single line. If the line matches the pattern, ExtCmnt writes that line to another text file. For example, java ExtCmnt ExtCmnt.java out reads each ExtCmnt.java line, attempts to match that line against the pattern, and outputs matched lines to a file named out. (Don't worry about understanding the file reading and writing logic. I will explore that logic in a future article.) After ExtCmnt completes, out contains the following lines:
// ExtCmnt.java
// The following pattern lets this extract multiline comments that
// appear on a single line (e.g., /* same line */) and single-line
// comments (e.g., // some line). Furthermore, the comment may
// appear anywhere on the line.
p = Pattern.compile (".*/\\*.*\\*/|.*//.*$");
if (m.matches ()) /* entire line must match */
finally // Close file.
The output shows that ExtCmnt is not perfect: p = Pattern.compile (".*/\\*.*\\*/|.*//.*$"); doesn't represent a comment. That line appears in out because ExtCmnt's matcher matches the // characters.
There is something interesting about the pattern in ".*/\\*.*\\*/|.*//.*$": the vertical bar metacharacter (|). According to the SDK documentation, the parentheses metacharacters in a capturing group and the vertical bar metacharacter
are logical operators. The vertical bar tells a matcher to use that operator's left regex construct operand to locate a match
in the matcher's text. If no match exists, the matcher uses that operator's right regex construct operand in another match
attempt.
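To see the alternation at work, the following sketch applies ExtCmnt's pattern to a few sample lines with Matcher's matches() method. The class name AlternationDemo, the isCommentLine helper, and the sample lines are hypothetical:

```java
import java.util.regex.Pattern;

class AlternationDemo
{
   // The same pattern that ExtCmnt compiles.
   static final Pattern COMMENT = Pattern.compile (".*/\\*.*\\*/|.*//.*$");

   // Reports whether the entire line matches either alternative.
   static boolean isCommentLine (String line)
   {
      return COMMENT.matcher (line).matches ();
   }

   public static void main (String [] args)
   {
      String [] lines =
      {
         "x = 1; /* set x */",               // left operand matches
         "y = 2; // set y",                  // left fails, right matches
         "z = 3;",                           // neither operand matches
         "String s = \"http://example.com\";" // right matches a non-comment
      };
      for (String line : lines)
         System.out.println (isCommentLine (line) + ": " + line);
   }
}
```

The last sample line matches for the same reason that the Pattern.compile() line appears in out: the right operand accepts any line containing //, even when those characters sit inside a string literal.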
Although regexes simplify pattern-matching code in text-processing applications, you cannot effectively use regexes in your applications until you understand them. This article gave you a basic understanding of regexes by introducing you to regex terminology, the java.util.regex package, and a program that demonstrates regex constructs. Now that you possess a basic understanding of regexes, build on that understanding by reading additional articles (see Resources) and studying java.util.regex's SDK documentation, where you can learn about more regex constructs, such as POSIX (Portable Operating System Interface for Unix) character classes.
I encourage you to email me with any questions you might have involving either this or any previous article's material. (Please keep such questions relevant to material discussed in this column's articles.) Your questions and my answers will appear in the relevant study guides.
After writing Java 101 articles for 28 consecutive months, I'm taking a two-month break. I'll return in May and introduce a series on data structures and algorithms.