Regular Expressions in Groovy (via Java)


/**
 * Regular Expression: [$"'\n\d/\\]
 * For Java, must escape the double quote, the \d, and the \\
 */
private final static String REG_EX_COOKBOOK_31_REGEX_STRING = "[$\"'\n\\d/\\\\]";

/**
 * Superset String to be used in various Matcher demonstrations, but its key
 * differentiating characteristic is that it does NOT begin with a match for
 * the regular expression (does not begin with $, ", ', new line, numeric
 * digit, forward slash, or backslash).
 */
private final static String SUPERSET_STRING = "regular\\expressions$can_be_2tons\"of'fun.";

/**
 * String that should always be a match (even an exact one) for the Regular
 * Expressions Cookbook Recipe 3.1 regular expression.
 */
private final static String EXACT_MATCH_STRING = "$";

/**
 * Superset String set up to start with a match for the Recipe 3.1 regular
 * expression pattern from Regular Expressions Cookbook.
 */
private final static String SUPERSET_STRING_STARTING_WITH_MATCH =
    EXACT_MATCH_STRING + SUPERSET_STRING;

/**
 * Demonstrate Matcher.matches().
 */
private static void demonstrateMatches()
{
    final Pattern regExCookbook31Pattern = obtainPatternForRegularExpressionsCookbookRecipe3_1();
    final String formatString = "%n%s%s%san EXACT match for regular expression %s";

    final Matcher matcher1 = regExCookbook31Pattern.matcher(SUPERSET_STRING);
    final boolean exactMatch1 = matcher1.matches();
    out.println(
        String.format(
            formatString,
            exactMatch1 ? "YES! " : "NO :( ",
            SUPERSET_STRING,
            exactMatch1 ? " IS " : " is NOT ",
            regExCookbook31Pattern.pattern()));

    final Matcher matcher2 = regExCookbook31Pattern.matcher(EXACT_MATCH_STRING);
    final boolean exactMatch2 = matcher2.matches();
    out.println(
        String.format(
            formatString,
            exactMatch2 ? "YES! " : "NO :( ",
            EXACT_MATCH_STRING,
            exactMatch2 ? " IS " : " is NOT ",
            regExCookbook31Pattern.pattern()));

    final Matcher matcher3 = regExCookbook31Pattern.matcher(SUPERSET_STRING_STARTING_WITH_MATCH);
    final boolean exactMatch3 = matcher3.matches();
    out.println(
        String.format(
            formatString,
            exactMatch3 ? "YES! " : "NO :( ",
            SUPERSET_STRING_STARTING_WITH_MATCH,
            exactMatch3 ? " IS " : " is NOT ",
            regExCookbook31Pattern.pattern()));
}

/**
 * Demonstrate Matcher.lookingAt().
 */
private static void demonstrateLookingAt()
{
    final Pattern regExCookbook31Pattern = obtainPatternForRegularExpressionsCookbookRecipe3_1();
    final String formatString = "%n%s%s%sbegin with a match for regular expression %s";

    final Matcher matcher1 = regExCookbook31Pattern.matcher(SUPERSET_STRING);
    final boolean portionMatch1 = matcher1.lookingAt();
    out.println(
        String.format(
            formatString,
            portionMatch1 ? "YES! " : "NO :( ",
            SUPERSET_STRING,
            portionMatch1 ? " DOES " : " does NOT ",
            regExCookbook31Pattern.pattern()));

    final Matcher matcher2 = regExCookbook31Pattern.matcher(EXACT_MATCH_STRING);
    final boolean portionMatch2 = matcher2.lookingAt();
    out.println(
        String.format(
            formatString,
            portionMatch2 ? "YES! " : "NO :( ",
            EXACT_MATCH_STRING,
            portionMatch2 ? " DOES " : " does NOT ",
            regExCookbook31Pattern.pattern()));

    final Matcher matcher3 = regExCookbook31Pattern.matcher(SUPERSET_STRING_STARTING_WITH_MATCH);
    final boolean portionMatch3 = matcher3.lookingAt();
    out.println(
        String.format(
            formatString,
            portionMatch3 ? "YES! " : "NO :( ",
            SUPERSET_STRING_STARTING_WITH_MATCH,
            portionMatch3 ? " DOES " : " does NOT ",
            regExCookbook31Pattern.pattern()));
}

/**
 * Apply Matcher.find() to determine the number of matches in the provided
 * sequences to the provided Pattern.
 */
private static void demonstrateFindToCountMatches()
{
    final Pattern regExCookbook31Pattern = obtainPatternForRegularExpressionsCookbookRecipe3_1();
    final String formatString = "%n%s contains %d matches for regular expression %s";

    final Matcher matcher1 = regExCookbook31Pattern.matcher(SUPERSET_STRING);
    int numMatches1 = 0;
    while (matcher1.find())
    {
        numMatches1++;
    }
    out.println(
        String.format(
            formatString,
            SUPERSET_STRING,
            numMatches1,
            regExCookbook31Pattern.pattern()));

    final Matcher matcher2 = regExCookbook31Pattern.matcher(EXACT_MATCH_STRING);
    int numMatches2 = 0;
    while (matcher2.find())
    {
        numMatches2++;
    }
    out.println(
        String.format(
            formatString,
            EXACT_MATCH_STRING,
            numMatches2,
            regExCookbook31Pattern.pattern()));

    final Matcher matcher3 = regExCookbook31Pattern.matcher(SUPERSET_STRING_STARTING_WITH_MATCH);
    int numMatches3 = 0;
    while (matcher3.find())
    {
        numMatches3++;
    }
    out.println(
        String.format(
            formatString,
            SUPERSET_STRING_STARTING_WITH_MATCH,
            numMatches3,
            regExCookbook31Pattern.pattern()));
}

When the code above is executed (as part of a class and after being invoked), the output appears as shown in the next screen snapshot.

This example confirms what the Javadoc states about the Matcher.matches() and Matcher.lookingAt() methods. In particular, we see that Matcher.matches() requires the entire provided String to match the regular expression, while Matcher.lookingAt() only verifies that the provided String begins with a portion matching the regular expression and does NOT require an exact match. The Matcher.find() method finds and allows action upon any and all matches. The Methods of the Matcher Class portion of the Java Tutorial provides another example of these two methods.
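For readers who want to try the three methods without the surrounding class, the differences can be condensed into a small standalone sketch. The class name MatcherModesSketch and its helper method names are made up for this illustration and are not part of the original listing; the regular expression and test strings are the same ones used above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical standalone class; the names are illustrative only.
final class MatcherModesSketch {
    // Same Recipe 3.1 character class as above, written as a Java literal.
    static final Pattern PATTERN = Pattern.compile("[$\"'\n\\d/\\\\]");

    // True only when the ENTIRE input matches the pattern.
    static boolean matchesEntirely(String input) {
        return PATTERN.matcher(input).matches();
    }

    // True when the input BEGINS with a match; the rest is ignored.
    static boolean startsWithMatch(String input) {
        return PATTERN.matcher(input).lookingAt();
    }

    // Counts every match by calling find() until it returns false.
    static int countMatches(String input) {
        Matcher matcher = PATTERN.matcher(input);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String superset = "regular\\expressions$can_be_2tons\"of'fun.";
        String[] inputs = {superset, "$", "$" + superset};
        for (String input : inputs) {
            System.out.printf("%s -> matches=%b, lookingAt=%b, count=%d%n",
                input, matchesEntirely(input), startsWithMatch(input), countMatches(input));
        }
    }
}
```

With these inputs, only "$" passes matches(), the two strings that start with "$" pass lookingAt(), and the match counts come out as 5, 1, and 6 respectively.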

Whew! It's been a long way getting here, but I've now covered enough Java handling of regular expressions to move on to what Groovy can do with regular expressions.

I mentioned previously that Groovy doesn't require the developer to explicitly instantiate a Pattern instance to get access to one. Instead this can be done implicitly by prefixing a String with the ~ character. Groovy's goodness doesn't stop there. Groovy also supports "easier" (certainly more concise) syntax for using Java's Matcher. This Groovy syntax is more Perl-like in nature than the Java API counterpart.

The Groovy =~ operator acts something like instantiating Java's Matcher while the ==~ operator acts more like Java's Matcher.matches() (exact match) method. However, it's better than that. The Matcher instance provided by =~ is implicitly evaluated as a boolean when used in a conditional statement. Looking at Groovy code makes it more obvious what's happening.
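Before turning to the Groovy code, it may help to see roughly what the two operators correspond to in plain Java. The sketch below is only an analogy: the class GroovyOperatorAnalogy and its method names are invented for illustration, and the truthiness method assumes that a Groovy Matcher used in a conditional is true when at least one match can be found, which is approximated here with find().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical analogy class; Groovy does not expose these methods.
final class GroovyOperatorAnalogy {
    // Roughly what `input =~ regex` produces: a Matcher over the input.
    static Matcher findOperator(String input, String regex) {
        return Pattern.compile(regex).matcher(input);
    }

    // Roughly what `input ==~ regex` produces: true only for a whole-input match.
    static boolean matchOperator(String input, String regex) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Assumption: a Matcher in a Groovy conditional is true if any match exists.
    static boolean truthiness(Matcher matcher) {
        return matcher.find();
    }

    public static void main(String[] args) {
        String regex = "[$\"'\n\\d/\\\\]";
        String superset = "regular\\expressions$can_be_2tons\"of'fun.";
        System.out.println(truthiness(findOperator(superset, regex))); // true
        System.out.println(matchOperator(superset, regex));            // false
        System.out.println(matchOperator("$", regex));                 // true
    }
}
```

The point of the analogy is that =~ answers "is there a match anywhere?" when used as a condition, while ==~ answers "does the whole string match?".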

NetBeans 6.9 includes Groovy support and it helps us to see that plopping the escape-character-ridden Java String used for the regular expression in the Java example above into Groovy code is actually not allowed. Here's what NetBeans 6.9 shows for such a case (note the red squiggly line and the error message; thanks NetBeans!).

The small snippet of code shown above in the NetBeans 6.9 editor window indicates that the dollar sign ($) needs to be escaped in Groovy when specifying the regular expression using double quotes. However, the next line shows that this is not necessary when the "slashy syntax" is used instead of the double quotes. For many developers who use regular expressions, the slashy syntax may be more appealing anyway, but the fact that it doesn't require escaping of $ and the fact that it isn't as confusing when there are double quotes naturally present in the regular expression are sweet morsels.

Using the slashy syntax helped avoid the need to escape the $ or the ", but how does one handle a slash in the regular expression when using slashy syntax? It turns out that NetBeans 6.9 warns us about that.

In response to the NetBeans flagged errors, I can fix the regular expression Pattern definitions to get the following Groovy code that creates Pattern instances with both quoted strings and slashy strings:

#!/usr/bin/env groovy
/*
 * Demonstrate how Groovy simplifies regular expression handling.
 */

// Setting up Pattern instances in Groovy

def patternQuoted = ~"[\$\"'\n\\d/\\\\]"
println "It's a Pattern: ${patternQuoted.class} (quoted) for regular expression ${patternQuoted.pattern()}"
def patternSlashy = ~/[$"'\n\d\/\\]/
println "It's a Pattern: ${patternSlashy.class} (slashy) for regular expression ${patternSlashy.pattern()}"

When the above Groovy script is executed, the output looks like that shown in the next screen snapshot.

The snapshot shows the general advantage in presentation of the slashy syntax as compared to the quoted syntax. The slashy syntax required far less escaping than the quoted syntax or the equivalent Java syntax, making the String closer to the original regular expression.

There is one downside evident from this. Note that the new line \n is left in place as two characters rather than being treated as the new line character in the slashy syntax. This can be addressed by using the syntax ${"\n"} in place of the "\n" in the slashy syntax String as shown next:

def patternQuoted = ~"[\$\"'\n\\d/\\\\]"
println "It's a Pattern: ${patternQuoted.class} (quoted) for regular expression ${patternQuoted.pattern()}"
def patternSlashy = ~/[$"'${"\n"}\d\/\\]/
println "It's a Pattern: ${patternSlashy.class} (slashy) for regular expression ${patternSlashy.pattern()}"

Being required to express the newline as ${"\n"} instead of simply "\n" is less than desirable. Fortunately, I don't need a match to a newline for the examples in this post and I generally don't need them in real life use either. Even when I might, I prefer this small price to buy the advantages of the slashy syntax.
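It is worth separating how the pattern string looks from how it matches. As far as I know, java.util.regex itself treats the two-character sequence \n as an escape for the newline character, so a pattern containing backslash-n should match the same input as a pattern containing an actual newline. The following Java sketch checks that; the class name NewlineEscapeSketch and its constant names are made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical class comparing the two ways a newline can appear in the pattern.
final class NewlineEscapeSketch {
    // Contains ONE actual newline character, as the quoted literal produces.
    static final Pattern WITH_NEWLINE_CHAR = Pattern.compile("[$\"'\n\\d/\\\\]");

    // Contains a backslash followed by 'n' (two characters), as the slashy string produces.
    static final Pattern WITH_BACKSLASH_N = Pattern.compile("[$\"'\\n\\d/\\\\]");

    public static void main(String[] args) {
        // The pattern text differs by one character...
        System.out.println(WITH_NEWLINE_CHAR.pattern().length());
        System.out.println(WITH_BACKSLASH_N.pattern().length());
        // ...but both patterns accept a newline, and neither accepts a plain 'n'.
        System.out.println(WITH_NEWLINE_CHAR.matcher("\n").matches());
        System.out.println(WITH_BACKSLASH_N.matcher("\n").matches());
        System.out.println(WITH_BACKSLASH_N.matcher("n").matches());
    }
}
```

If that holds in your environment, the ${"\n"} workaround matters mainly for how the pattern prints, not for what it matches.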

The next code listing demonstrates Matcher handling in Groovy:

// Setting up Matcher instances in Groovy

def findRegExCookbook31MatchesQuoted = "regular\\expressions\$can_be_2tons\"of'fun." =~ patternQuoted
println "It's a Matcher for Quoted Pattern!: ${findRegExCookbook31MatchesQuoted.class}"
println "\tNumber of matches (count): ${findRegExCookbook31MatchesQuoted.count}"
println "\tNumber of matches (size()): ${findRegExCookbook31MatchesQuoted.size()}"

def findRegExCookbook31MatchesSlashy = "regular\\expressions\$can_be_2tons\"of'fun." =~ patternSlashy
println "It's a Matcher for Slashy Pattern!: ${findRegExCookbook31MatchesSlashy.class}"
println "\tNumber of matches (count): ${findRegExCookbook31MatchesSlashy.count}"
println "\tNumber of matches (size()): ${findRegExCookbook31MatchesSlashy.size()}"

def findRegExCookbook31ExactMatchQuoted = "regular\\expressions\$can_be_2tons\"of'fun." ==~ patternQuoted
println "It's a Boolean for Quoted Pattern!: ${findRegExCookbook31ExactMatchQuoted.class}"
println "\t${findRegExCookbook31ExactMatchQuoted ? 'Exact Match!' : 'NOT Exact Match.'}"

def findRegExCookbook31ExactMatchSlashy = "regular\\expressions\$can_be_2tons\"of'fun." ==~ patternSlashy
println "It's a Boolean for Slashy Pattern!: ${findRegExCookbook31ExactMatchSlashy.class}"
println "\t${findRegExCookbook31ExactMatchSlashy ? 'Exact Match!' : 'NOT Exact Match.'}"

def findRegExCookbook31ExactMatchQuoted2 = '$' ==~ patternQuoted
println "It's a Boolean for Quoted Pattern!: ${findRegExCookbook31ExactMatchQuoted2.class}"
println "\t${findRegExCookbook31ExactMatchQuoted2 ? 'Exact Match!' : 'NOT Exact Match.'}"

def findRegExCookbook31ExactMatchSlashy2 = '$' ==~ patternSlashy
println "It's a Boolean for Slashy Pattern!: ${findRegExCookbook31ExactMatchSlashy2.class}"
println "\t${findRegExCookbook31ExactMatchSlashy2 ? 'Exact Match!' : 'NOT Exact Match.'}"

The output from running this, shown in the next screen snapshot, tells the tale.

This output confirms that the Groovy =~ operator provides a Matcher instance that is smarter than your average Matcher. That's because the Groovy Matcher (part of Groovy GDK) provides many additional utility methods including two used in the code above (getCount() used as count property and size() method). The GDK's Matcher.asBoolean() method is behind the magic that allows a Groovy Matcher to return a boolean in a conditional expression. I don't discuss it here, but the GDK does provide one small extension to the Pattern class as well.

Mr. Haki provides a nice overview of Groovy's treatment of regular expressions via Matchers in his post Groovy Goodness: Matchers for Regular Expressions. He similarly covers Groovy handling of Patterns in the post Groovy Goodness: Using Regular Expression Pattern Class.

Conclusion

The ability to use more natural-looking (at least by regular expression standards) regular expressions, the ability to use operators rather than APIs and method calls, and the extra "smarts" added to GDK's extensions of the Java RegEx library make using regular expressions in Groovy easier and more natural to the people who are probably most familiar with regular expressions: script writers. As is true with most of Groovy, Groovy's regular expression support is a reflection of Java's regular expression support. Generally speaking, anything one knows about regular expressions in Java (including the syntax supported in Java's flavor/dialect) can be applied when using regular expressions in Groovy. In several cases, though, Groovy makes it easier to use. With anything, but especially with regular expressions, easier is always better.

Additional Resources

I mentioned previously that there are numerous great online resources on Groovy's support for regular expressions. Some of them are listed here. I especially recommend the first listed online resource (Groovy: Don't Fear the RegExp) and the book I frequently cited in this post (Regular Expressions Cookbook).

Regular Expressions Cookbook

Groovy: Don’t Fear the RegExp

Groovy Regular Expressions

Groovy Tutorial 4 - Groovy Regular Expressions Basics

Groovy Goodness: Using Regular Expression Pattern Class

Groovy Goodness: Matchers for Regular Expressions

Big Collection of Regular Expressions (not specific to Groovy)

Finding Files by Name with Groovy

Online Regular Expression Test Page (uses java.util.regex)

RegexBuddy

RegexPal

Regular Expression Tool

Original Post Available at http://marxsoftware.blogspot.com/
