Regular expressions simplify pattern-matching code

Discover the elegance of regular expressions in text-processing scenarios that involve pattern matching

  • java RegexDemo .*+end "This is the end": uses a possessive quantifier to match zero or more characters followed by end in This is the end. The following output results:

    Regex = .*+end
    Text = This is the end
    


    The possessive quantifier produces no matches because it causes the matcher to consume the entire text and never give any characters back, leaving nothing to match end. In contrast, the greedy quantifier in java RegexDemo .*end "This is the end" produces a match because it causes the matcher to back off one character at a time until the rightmost end matches.
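
    The difference between the two quantifiers can also be checked directly against java.util.regex, without RegexDemo. The following is a minimal sketch; the class name PossessiveVsGreedy and the helper finds() are illustrative names, not part of the article's RegexDemo program.

```java
import java.util.regex.Pattern;

class PossessiveVsGreedy {
    // Hypothetical helper: true if regex matches anywhere in text.
    static boolean finds(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "This is the end";
        // Possessive: .*+ consumes the whole text and never backs off.
        System.out.println("possessive: " + finds(".*+end", text)); // possessive: false
        // Greedy: .* consumes the whole text, then backs off until end matches.
        System.out.println("greedy: " + finds(".*end", text));      // greedy: true
    }
}
```

    Note that regex backslashes must be doubled inside Java string literals, whereas on the command line they are typed once.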



  • Boundary matchers

    We sometimes want to match patterns at the beginning of lines, at word boundaries, at the end of text, and so on. Accomplish that task with a boundary matcher, a regex construct that identifies a match location. Table 2 presents Pattern's supported boundary matchers.

    Table 2. Boundary matchers

    Boundary matcher   Description
    ^                  The beginning of a line
    $                  The end of a line
    \b                 A word boundary
    \B                 A nonword boundary
    \A                 The beginning of the text
    \G                 The end of the previous match
    \Z                 The end of the text (but for the final line terminator, if any)
    \z                 The end of the text
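
    To see how several of these matchers behave on the same input, the following sketch collects the start index of every match. The class name BoundaryDemo, the helper starts(), and the sample strings are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Hypothetical helper: start index of every match of regex in text.
    static List<Integer> starts(String regex, int flags, String text) {
        List<Integer> result = new ArrayList<>();
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        while (m.find()) {
            result.add(m.start());
        }
        return result;
    }

    public static void main(String[] args) {
        // 15 characters; "cat" occurs at indexes 0, 7, and 11.
        String text = "cat concat\ncat\n";
        System.out.println(starts("\\bcat\\b", 0, text));              // [0, 11]
        System.out.println(starts("\\Bcat", 0, text));                 // [7]
        System.out.println(starts("^cat", 0, text));                   // [0]
        System.out.println(starts("^cat", Pattern.MULTILINE, text));   // [0, 11]
        System.out.println(starts("\\Acat", Pattern.MULTILINE, text)); // [0]
        System.out.println(starts("cat\\Z", 0, text));                 // [11]
        System.out.println(starts("cat\\z", 0, text));                 // []
        // \G forces each match to begin where the previous one ended.
        System.out.println(starts("\\Ga", 0, "aaabaa"));               // [0, 1, 2]
    }
}
```

    By default, ^ and $ match only at the beginning and end of the entire text; they match at individual line boundaries when the pattern is compiled with Pattern.MULTILINE, which is why the two ^cat calls differ. The cat\Z and cat\z calls differ because the sample text ends with a line terminator.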


    The following command-line example uses the ^ boundary matcher metacharacter to ensure that a line begins with The followed by zero or more word characters:

    java RegexDemo ^The\w* Therefore
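
    The same pattern can be tried directly with java.util.regex. This is a minimal sketch rather than the RegexDemo source; the class name CaretDemo, the helper firstMatch(), and the two extra sample strings are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretDemo {
    // Hypothetical helper: first match of ^The\w* in text, or null if none.
    static String firstMatch(String text) {
        Matcher m = Pattern.compile("^The\\w*").matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("Therefore"));     // Therefore
        System.out.println(firstMatch("The end"));       // The
        System.out.println(firstMatch("Now Therefore")); // null
    }
}
```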
    


    ^ anchors the match at the beginning of the line, so the first three text characters must match the pattern's T, h, and e characters. Any number of word characters may follow. The command line above produces the following output:
