Regular expressions simplify pattern-matching code

Discover the elegance of regular expressions in text-processing scenarios that involve pattern matching

Regex = .ox
Text = The quick brown fox jumps over the lazy ox.
Found fox
  starting at index 16 and ending at index 19
Found  ox
  starting at index 39 and ending at index 42


The output reveals two matches: fox and ox (with a leading space character). The . metacharacter matches the f in the first match and the space character in the second match.
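RegexDemo's source does not appear in this part of the article, so the following is only a minimal sketch. It assumes a plain Pattern/Matcher loop built on Matcher.find(), and the class name DotOxDemo and its matches() helper are hypothetical. If the reported ending index is read as Matcher.end(), which points just past the last matched character, the sketch yields the same positions as the output above:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: a sketch of a find() loop, not the article's RegexDemo source.
class DotOxDemo {
    // Returns one "'group' [start, end)" entry per match located by find().
    static List<String> matches(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            // start() is the index of the first matched character;
            // end() is the index just past the last matched character.
            result.add("'" + m.group() + "' [" + m.start() + ", " + m.end() + ")");
        }
        return result;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy ox.";
        for (String s : matches(".ox", text)) {
            System.out.println(s);
        }
    }
}
```

Running main() prints 'fox' [16, 19) followed by ' ox' [39, 42); the second match includes the leading space that . consumed.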

What happens if we replace .ox with the period metacharacter alone? That is, what does RegexDemo output when we specify java RegexDemo . "The quick brown fox jumps over the lazy ox."? Because the period metacharacter matches any character, RegexDemo outputs a match for each character in its text command-line argument, including the terminating period character.
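To check that claim, we can count matches instead of printing them. The sketch below uses a hypothetical PeriodDemo class with a countMatches() helper. Because the sample text contains no line terminators (which . does not match by default), the number of matches should equal the length of the text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: counts how many times a regex matches within some text.
class PeriodDemo {
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        // Each successful find() consumes one match and advances past it
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy ox.";
        // A lone period matches every (non-line-terminator) character
        System.out.println(countMatches(".", text) + " matches, text length " + text.length());
    }
}
```

For the fox sentence, both numbers come out to 43: one match per character, terminating period included.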

Tip
To specify . or any other metacharacter as a literal character in a regex construct, quote the metacharacter (that is, convert it from meta status to literal status) in one of two ways:
  • Precede the metacharacter with a backslash character.
  • Place the metacharacter between \Q and \E (e.g., \Q.\E).
In either scenario, don't forget to double each backslash character (as in \\. or \\Q.\\E) that appears in a string literal (e.g., String regex = "\\.";). Do not double the backslash character when it appears as part of a command-line argument.
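The following sketch contrasts the two quoting styles from the tip with an unquoted period, counting matches against the same sentence. QuoteDemo and countMatches() are hypothetical names. Note the doubled backslashes: the Java compiler turns the string literal "\\." into the two-character regex \. before Pattern ever sees it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: compares quoted and unquoted periods.
class QuoteDemo {
    static int countMatches(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "The quick brown fox jumps over the lazy ox.";
        // Backslash-quoted period: the literal "\\." reaches Pattern as \.
        System.out.println(countMatches("\\.", text));
        // \Q...\E-quoted period: everything between \Q and \E is literal
        System.out.println(countMatches("\\Q.\\E", text));
        // Unquoted period: still a metacharacter that matches any character
        System.out.println(countMatches(".", text));
    }
}
```

Both quoted forms match only the sentence's terminating period (one match each), while the unquoted . matches all 43 characters.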


Character classes

Sometimes we want to restrict matches to a specific set of characters. For example, we might search text for the vowels a, e, i, o, and u, where any occurrence of any vowel indicates a match. A character class, a regex construct that identifies a set of characters between open and close square bracket metacharacters ([ ]), helps us accomplish that task. Pattern supports the following character classes:

  • Simple: consists of characters placed side by side and matches any one of those characters. Example: [abc] matches a single character that is a, b, or c. The following command line offers a second example:

    java RegexDemo [csw] cave
    


    This command line matches the c in [csw] against the c in cave. No other matches exist, because a, v, and e are not members of the class.
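Here is a small sketch of simple character classes in Java code, assuming matches are collected with Matcher.find(). CharClassDemo, its matches() helper, and the sample words are hypothetical. The [aeiou] pattern expresses the vowel search described at the start of this section. Each match consumes exactly one character, which is why [abc] against cab yields three separate matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class: shows that a simple character class matches one character at a time.
class CharClassDemo {
    // Collects "group@start" for every match located by find().
    static List<String> matches(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            result.add(m.group() + "@" + m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // Only the c in cave belongs to the class [csw]
        System.out.println(matches("[csw]", "cave"));
        // Every character of cab belongs to [abc], so each one is its own match
        System.out.println(matches("[abc]", "cab"));
        // [aeiou] matches any single lowercase vowel
        System.out.println(matches("[aeiou]", "regular"));
    }
}
```

The three lines printed are [c@0], [c@0, a@1, b@2], and [e@1, u@3, a@5].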
