Study guide: Regular expressions simplify pattern-matching code

Brush up on Java terms, learn tips and cautions, review homework assignments, and read answers to student questions


Glossary of terms

back reference
A regex construct, specified as a backslash character (\) followed by a digit character denoting a capturing group number, that recalls a capturing group's captured text characters.


boundary matcher
A regex construct that identifies a match location, such as the beginning of a line (^) or a word boundary (\b).


capturing group
A regex construct, specified as a sequence of characters surrounded by parentheses metacharacters (()), that captures a match's characters for later recall.


character class
A regex construct that identifies a set of characters between open and close square bracket metacharacters ([]).


character sequences
Objects whose classes implement the java.lang.CharSequence interface and serve as text sources.


embedded flag expression
A regex construct, specified as parentheses metacharacters surrounding a question mark metacharacter (?) followed by a specific lowercase letter, that overrides a matcher default.


line terminator
A one- or two-character sequence that identifies the end of a line of text.


matchers
Matcher objects.


matches
Strings that match a regex's pattern.


metacharacters
Characters that have special meaning instead of a literal meaning.


noncapturing group
A regex construct, specified as a sequence of characters between (?: and a closing parenthesis metacharacter, that groups part of a pattern without capturing text characters.


pattern
A template. Pattern objects are also known as patterns.


quantifier
A regex construct that implicitly or explicitly binds a numeric value to a pattern. The numeric value determines how many times the pattern is matched.


quote
Convert from meta status to literal status.


pattern matching
The process of searching text to identify matches.


regex constructs
Regex pattern categories.


regular expression
A string whose pattern describes a set of strings. Also known as a regex or regexp.


zero-length match
A match of zero length where the start and end indexes are the same.
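
Several of the glossary terms above are easier to see in running code. The following is a minimal sketch, assuming the java.util.regex package; the class name GlossaryDemo, its method names, and the sample patterns are hypothetical and chosen only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GlossaryDemo {
    // Capturing group (\w+) followed by back reference \1, which recalls
    // the text the group captured. Finds a word repeated after one space.
    static boolean hasRepeatedWord(String text) {
        Matcher m = Pattern.compile("\\b(\\w+) \\1\\b").matcher(text);
        return m.find();
    }

    // The noncapturing group (?:ab) is not numbered, so the pattern
    // reports only the one capturing group (c).
    static int capturingGroupCount() {
        return Pattern.compile("(?:ab)+(c)").matcher("").groupCount();
    }

    // The embedded flag expression (?i) overrides the matcher's default
    // case-sensitive behavior.
    static boolean matchesIgnoringCase(String text) {
        return Pattern.compile("(?i)java").matcher(text).matches();
    }

    // a* may match no characters at all, which yields a zero-length
    // match: the start and end indexes are the same.
    static boolean firstMatchIsZeroLength(String text) {
        Matcher m = Pattern.compile("a*").matcher(text);
        return m.find() && m.start() == m.end();
    }

    public static void main(String[] args) {
        System.out.println(hasRepeatedWord("the the cat"));
        System.out.println(capturingGroupCount());
        System.out.println(matchesIgnoringCase("JAVA"));
        System.out.println(firstMatchIsZeroLength("bbb"));
    }
}
```

Each method isolates one term, so a result can be traced back to a single construct rather than to an interaction between several.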


Tips and cautions

These tips and cautions will help you write better programs and save you from agonizing over why the compiler produces error messages.

Tips

  • To specify . or any metacharacter as a literal character in a regex construct, quote the metacharacter (convert it from meta status to literal status) in one of two ways:
    • Precede the metacharacter with a backslash character.
    • Place the metacharacter between \Q and \E (e.g., \Q.\E).
    In either scenario, don't forget to double each backslash character (as in \\. or \\Q.\\E) that appears in a string literal (e.g., String regex = "\\.";). Do not double the backslash character when it appears as part of a command-line argument.
  • Combine multiple ranges within the same range character class by placing them side by side. Example: [a-zA-Z] matches all lowercase and uppercase alphabetic characters.
  • To specify multiple embedded flag expressions in a regex, either place them side by side (e.g., (?m)(?i)) or place their lowercase letters side by side (e.g., (?mi)).
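
The three tips above can be tried out with a short program. This is a sketch under stated assumptions: the class name TipsDemo, the constant names, and the sample strings are hypothetical, and every regex is written as a string literal, so each backslash is doubled.

```java
import java.util.regex.Pattern;

public class TipsDemo {
    // Period quoted with a backslash (doubled inside the string literal).
    static final Pattern DOT_BACKSLASH = Pattern.compile("a\\.b");
    // Period quoted by placing it between \Q and \E.
    static final Pattern DOT_QE = Pattern.compile("a\\Q.\\Eb");
    // Unquoted period: still a metacharacter that matches any character.
    static final Pattern DOT_META = Pattern.compile("a.b");

    // Two ranges side by side in one character class.
    static final Pattern LETTERS = Pattern.compile("[a-zA-Z]+");

    // Multiple embedded flag expressions, written both ways.
    static final Pattern FLAGS_COMBINED = Pattern.compile("(?mi)^end$");
    static final Pattern FLAGS_SEPARATE = Pattern.compile("(?m)(?i)^end$");
    // Same pattern with no flags, for comparison.
    static final Pattern NO_FLAGS = Pattern.compile("^end$");

    // True if the pattern matches the entire string.
    static boolean matchesAll(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    // True if the pattern matches somewhere in the string.
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String text = "start\nEND\nmore";
        System.out.println(matchesAll(DOT_BACKSLASH, "axb")); // quoted period rejects x
        System.out.println(matchesAll(DOT_META, "axb"));      // unquoted period accepts x
        System.out.println(matchesAll(LETTERS, "JavaWorld"));
        System.out.println(found(FLAGS_COMBINED, text));
        System.out.println(found(NO_FLAGS, text));
    }
}
```

The doubled backslashes are needed only because the regexes appear in Java source code; the same regex passed as a command-line argument would use single backslashes, as the first tip notes.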


Cautions

  • Do not assume that Pattern's and Perl 5's regex constructs are identical. Although they have much in common, they also differ, both in the constructs they support and in their treatment of dangling metacharacters. (For more information, examine Pattern's SDK documentation.)

Homework

  • Not specifying the closing parenthesis metacharacter in an embedded flag expression is one example of an illegal pattern. Identify two other illegal pattern examples.
  • Create a regex that matches phone numbers with or without area codes. If present, a three-digit area code must appear between parentheses characters. Optional space characters may appear between the area code and the number. The number consists of three digit characters followed by a hyphen character followed by four digit characters. No space characters may appear on either side of the hyphen. Example: (123) 555-4678.


Answers to last month's homework

Last month, I asked you to answer two questions. Here are my answers (which appear in red).
