Wizard API updated!
Tim Boudreau has released a new version of the Swing Wizard library (version 0.997) that fixes the WizardException bug reported in JavaWorld's recent Open Source Java Project profile. The article's examples have been reworked to test out the new, improved WizardException. Thanks, Tim, for this helpful fix!
Open Source Java Projects: The Wizard API

Newsletter sign-up

Sign up for our technology specific newsletters.

Enterprise Java
View all newsletters

Email Address:

Matchmaking with regular expressions

Use the power of regular expressions to ease text parsing and processing

If you've programmed in Perl or any other language with built-in regular-expression capabilities, then you probably know how much easier regular expressions make text processing and pattern matching. If you're unfamiliar with the term, a regular expression is simply a string of characters that defines a pattern used to search for a matching string.

Many languages, including Perl, PHP, Python, JavaScript, and JScript, now support regular expressions for text processing, and some text editors use regular expressions for powerful search-and-replace functionality. What about Java? At the time of this writing, a Java Specification Request that includes a regular expression library for text processing has been approved; you can expect to see it in a future version of the JDK.

But what if you need a regular expression library now? Luckily, you can download the open source Jakarta ORO library from Apache.org. In this article, I'll first give you a short primer on regular expressions, and then I'll show you how to use regular expressions with the open source Jakarta-ORO API.

Regular expressions 101

Let's start simple. Suppose you want to search for a string with the word "cat" in it; your regular expression would simply be "cat". If your search is case-insensitive, the words "catalog", "Catherine", or "sophisticated" would also match:

Regular expression: cat
Matches: cat, catalog, Catherine, sophisticated


The period notation

Imagine you are playing Scrabble and need a three-letter word starting with the letter "t" and ending with the letter "n". Imagine also that you have an English dictionary and will search through its entire contents for a match using a regular expression. To form such a regular expression, you would use a wildcard notation -- the period (.) character. The regular expression would then be "t.n" and would match "tan", "Ten", "tin", and "ton"; it would also match "t#n", "tpn", and even "t n", as well as many other nonsensical words. This is because the period character matches everything, including the space, the tab character, and even line breaks:

Regular expression: t.n
Matches: tan, Ten, tin, ton, t n, t#n, tpn, etc.


The bracket notation

To solve the problem of the period's indiscriminate matches, you can specify characters you consider meaningful with the bracket ("[]") expression, so that only those characters would match the regular expression. Thus, "t[aeio]n" would just match "tan", "Ten", "tin", and "ton". "Toon" would not match because you can only match a single character within the bracket notation:

Regular expression: t[aeio]n
Matches: tan, Ten, tin, ton


The OR operator

If you want to match "toon" in addition to all the words matched in the previous section, you can use the "|" notation, which is basically an OR operator. To match "toon", use the regular expression "t(a|e|i|o|oo)n". You cannot use the bracket notation here because it will only match a single character. Instead, use parentheses -- "()". You can also use parentheses for groupings (more on that later):

1 | 2 | 3 |  Next >
Resources