Regular expressions simplify pattern-matching code

Discover the elegance of regular expressions in text-processing scenarios that involve pattern matching

To accomplish pattern matching, RegexDemo calls various methods in java.util.regex's classes. Don't concern yourself with understanding those methods right now; we'll explore them later in this article. More importantly, compile Listing 1: you need RegexDemo.class to explore Pattern's regex constructs.

Explore Pattern's regex constructs

Pattern's SDK documentation presents a section on regular expression constructs. Unless you're an avid regex user, an initial examination of that section might confuse you. What are quantifiers, and how do greedy, reluctant, and possessive quantifiers differ? What are character classes, boundary matchers, back references, and embedded flag expressions? To answer those and other questions, we explore many of the regex constructs, or regex pattern categories, that Pattern recognizes. We begin with the simplest regex construct: literal strings.

Caution
Do not assume that Pattern's and Perl 5's regex constructs are identical. Although they share many similarities, they also differ, from the constructs each supports to the treatment of dangling metacharacters. (For more information, examine your SDK documentation on the Pattern class, which you should have on your platform.)


Literal strings

You specify the literal string regex construct whenever you type a literal string in the search text field of your word processor's search dialog box. Execute the following RegexDemo command line to see this regex construct in action:

java RegexDemo apple applet


The command line above identifies apple as a literal string regex construct that consists of literal characters a, p, p, l, and e (in that order). The command line also identifies applet as text for pattern-matching purposes. After executing the command line, observe the following output:

Regex = apple
Text = applet
Found apple
  starting at index 0 and ending at index 5


The output identifies the regex and text command-line arguments, indicates a successful match of apple within applet, and presents the starting and ending indexes of that match: 0 and 5, respectively. The starting index identifies the first text location where a pattern match occurs, and the ending index identifies the first text location after the match. In other words, the range of matching text is inclusive of the starting index and exclusive of the ending index.
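
The inclusive-start, exclusive-end convention can be checked with a few lines of code. What follows is a minimal sketch, not RegexDemo itself: the class name LiteralMatchSketch and the report() helper are hypothetical, and the output format simply imitates the output shown above. It relies only on Pattern.compile(), Matcher.find(), Matcher.group(), Matcher.start(), and Matcher.end() from java.util.regex.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; not part of the article's RegexDemo listing.
class LiteralMatchSketch {
    // Builds one report line per match, imitating the output format shown above.
    static String report(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            // start() is inclusive; end() is one past the last matched character.
            sb.append("Found ").append(m.group())
              .append(" starting at index ").append(m.start())
              .append(" and ending at index ").append(m.end())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "applet";
        // prints "Found apple starting at index 0 and ending at index 5"
        System.out.print(report("apple", text));
        // Because the ending index is exclusive, substring(0, 5) is the match itself.
        System.out.println(text.substring(0, 5)); // prints "apple"
    }
}
```

Because the ending index is exclusive, the two indexes can be passed straight to String.substring(), and their difference (5 - 0) is the length of the match.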

Metacharacters

Although literal string regex constructs are useful, more powerful regex constructs combine literal characters with metacharacters. For example, in a.b, the period metacharacter (.) matches any single character (by default, any character except a line terminator) that appears between a and b. To see the period metacharacter in action, execute the following command line:

java RegexDemo .ox "The quick brown fox jumps over the lazy ox."


The command line above specifies .ox as the regex and "The quick brown fox jumps over the lazy ox." as the text. RegexDemo searches the text for three-character matches that consist of any single character followed by ox, and produces the following output:
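
A minimal sketch of the same search, written directly against java.util.regex, shows which substrings .ox selects. As before, this is not RegexDemo: the class name PeriodSketch and the report() helper are hypothetical, and the output format is assumed to follow the one RegexDemo used for the apple example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class; not part of the article's RegexDemo listing.
class PeriodSketch {
    // Builds one report line per match of regex within text.
    static String report(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            sb.append("Found ").append(m.group())
              .append(" starting at index ").append(m.start())
              .append(" and ending at index ").append(m.end())
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The period stands for exactly one character, so .ox matches "fox"
        // (indexes 16 to 19) and " ox" with its leading space (indexes 39 to 42).
        System.out.print(report(".ox", "The quick brown fox jumps over the lazy ox."));

        // Pattern.matches() requires the whole input to match the regex.
        System.out.println(Pattern.matches("a.b", "axb")); // prints "true"
        System.out.println(Pattern.matches("a.b", "ab"));  // prints "false": no character between a and b
    }
}
```

Note that the second match begins with a space: the period accepts any character in that position, including whitespace, so the final word ox is found together with the space that precedes it.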
