Matchmaking with regular expressions

Use the power of regular expressions to ease text parsing and processing


The output of the example is as follows:

        face: Arial, Serif
        size: +1
        color: red


More HTML processing

Let's continue with another HTML example. This time, imagine that your Web server has moved from widgets.acme.com to newserver.acme.com. You'll need to change the links on some of your Webpages from:

<a href="http://widgets.acme.com/interface.html#How_To_Buy">
<a href="http://widgets.acme.com/interface.html#How_To_Sell">
etc.


to

<a href="http://newserver.acme.com/interface.html#How_To_Buy">
<a href="http://newserver.acme.com/interface.html#How_To_Sell">
etc.


The regular expression to perform the search is shown in Figure 13.


Figure 13. Matches the link "http://widgets.acme.com/interface.html#(any anchor)"


When this regular expression matches, you can substitute the new link for the old one with the following expression:

<a href="http://newserver.acme.com/interface.html#$1">


Notice the $1 after the # character. Perl regular expression syntax uses $1, $2, and so forth to represent groups that have been matched and extracted. The substitution expression above appends whatever text was matched and extracted as Group 1 to the new link.

Now, back to Java. As usual, you must create your test string, the compiler object that turns the regular expression into a Pattern object, and a PatternMatcher object:

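        // Test string containing a link to the old server
        String link="<a href=\"http://widgets.acme.com/interface.html#How_To_Trade\">";
        // Dots are escaped so they match only literal dots; group 1 captures the anchor name
        String regexpForLink="<\\s*a\\s+href\\s*=\\s*\"http://widgets\\.acme\\.com/interface\\.html#([^\"]+)\">";
        // compile() throws MalformedPatternException if the expression is invalid
        PatternCompiler compiler=new Perl5Compiler();
        Pattern patternForLink=compiler.compile(regexpForLink,Perl5Compiler.CASE_INSENSITIVE_MASK);
        PatternMatcher matcher=new Perl5Matcher();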


Next, use the static method substitute() from the Util class in the com.oroinc.text.regex package to perform the substitution, and print out the resulting string:

        // $1 in the substitution is replaced by the text captured as group 1
        String result=Util.substitute(matcher,
                                      patternForLink,
                                      new Perl5Substitution(
                                        "<a href=\"http://newserver.acme.com/interface.html#$1\">"),
                                      link,
                                      Util.SUBSTITUTE_ALL);
        System.out.println(result);
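
For comparison, here is a minimal sketch of the same kind of substitution using the JDK's built-in java.util.regex package instead of Jakarta-ORO. This is a swapped-in library, not the one the article uses, and the class name LinkRewriteSketch and the rewrite() helper are hypothetical names chosen only for illustration. The replacement string uses a $1 group reference to carry the anchor name over to the new link.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; uses java.util.regex, not Jakarta-ORO.
class LinkRewriteSketch {
    // Dots are escaped so they match only literal dots.
    // Group 1 captures the anchor name that follows the # character.
    static final Pattern OLD_LINK = Pattern.compile(
        "<\\s*a\\s+href\\s*=\\s*\"http://widgets\\.acme\\.com/interface\\.html#([^\"]+)\">",
        Pattern.CASE_INSENSITIVE);

    // $1 in the replacement is filled in with whatever group 1 matched.
    static final String NEW_LINK =
        "<a href=\"http://newserver.acme.com/interface.html#$1\">";

    // Rewrites every old-server link found in the given text.
    static String rewrite(String html) {
        return OLD_LINK.matcher(html).replaceAll(NEW_LINK);
    }

    public static void main(String[] args) {
        String link = "<a href=\"http://widgets.acme.com/interface.html#How_To_Trade\">";
        // prints <a href="http://newserver.acme.com/interface.html#How_To_Trade">
        System.out.println(rewrite(link));
    }
}
```

Text that does not match the pattern passes through rewrite() unchanged, so the helper can be applied to a whole page as well as to a single tag.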


The syntax of the Util.substitute() method is as follows:

        public static String substitute(PatternMatcher matcher,
                                        Pattern pattern,
                                        Substitution sub,
                                        String input,
                                        int numSubs)


The first two parameters are the PatternMatcher and Pattern objects created earlier. The third parameter is a Substitution object that determines how the substitution is performed; here, a Perl5Substitution object lets you use a Perl 5-style substitution. The fourth parameter is the string on which to perform the substitution, and the last parameter specifies how many occurrences to replace: every occurrence of the pattern found (Util.SUBSTITUTE_ALL) or only a specified number of them.
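
To make the effect of limiting the number of substitutions concrete, the following sketch imitates that behavior with the JDK's java.util.regex package rather than Jakarta-ORO. The helper substituteAtMost() and the class name are hypothetical, and the sketch assumes that a limit means "replace only the first N matches, scanning left to right".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; uses java.util.regex, not Jakarta-ORO.
class LimitedSubstitutionSketch {
    // Replaces at most maxSubs matches, scanning left to right,
    // and copies the rest of the input through unchanged.
    static String substituteAtMost(Pattern pattern, String replacement,
                                   String input, int maxSubs) {
        Matcher m = pattern.matcher(input);
        StringBuffer out = new StringBuffer();
        int done = 0;
        while (done < maxSubs && m.find()) {
            // Appends the text before the match, then the replacement.
            m.appendReplacement(out, replacement);
            done++;
        }
        // Appends whatever follows the last replaced match.
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("widgets\\.acme\\.com");
        String input = "widgets.acme.com/a widgets.acme.com/b widgets.acme.com/c";
        // prints newserver.acme.com/a newserver.acme.com/b widgets.acme.com/c
        System.out.println(substituteAtMost(p, "newserver.acme.com", input, 2));
    }
}
```

Passing a limit at least as large as the number of matches replaces every occurrence in this sketch, which is the effect Util.SUBSTITUTE_ALL requests in the Jakarta-ORO call.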

About the author

Benedict Chng is a Sun-certified developer currently consulting in the Boston area. He hails from sunny and tropical Singapore and has been working in the software development field for close to four years. His current interests include writing applications for Palm devices and sightseeing in the New England region.

Express yourself

In this article, I've shown you the powerful features of regular expressions. When used appropriately, they can help a great deal in string extraction and text changes. I have also shown how you can incorporate regular expressions into your Java application using the open source Jakarta-ORO library. Now, it's up to you to decide whether the old string manipulation approach (using StringTokenizers, charAt, or substring) or a regular expression library, like Jakarta-ORO, works for you.
