After initializing the strings, instantiate the PatternCompiler object and create a Pattern object by using the PatternCompiler to compile the regular expression:
PatternCompiler compiler=new Perl5Compiler();
Pattern pattern=compiler.compile(regexp);
Now, create the PatternMatcher object and call the contains() method in the PatternMatcher interface to see if you have a match:
PatternMatcher matcher=new Perl5Matcher();
if (matcher.contains(logEntry,pattern)) {
MatchResult result=matcher.getMatch();
System.out.println("IP: "+result.group(1));
System.out.println("Timestamp: "+result.group(2));
}
The body of the if statement prints the matched groups using the MatchResult object returned by the PatternMatcher's getMatch() method. Since the logEntry string contains the pattern to be matched, you can expect the following output:
IP: 172.26.155.241
Timestamp: 26/Feb/2001:10:56:03 -0500
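If you don't have Jakarta ORO on your classpath, you can try the same idea with the JDK's own java.util.regex package. The following is a minimal sketch, not the article's ORO code: the log line and the regular expression are illustrative stand-ins (the originals are defined earlier in the article), and the class and method names are hypothetical. Matcher.find() plays the role of ORO's contains() here, since both search for the pattern anywhere in the input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogEntryDemo {
    // Assumed pattern: group 1 is a dotted-quad IP address,
    // group 2 is the timestamp between square brackets.
    static final Pattern LOG_PATTERN = Pattern.compile(
        "(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\s-\\s-\\s\\[([^\\]]+)\\]");

    // Returns {ip, timestamp}, or null when the line does not contain the pattern.
    static String[] parse(String logEntry) {
        Matcher m = LOG_PATTERN.matcher(logEntry);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2) };
    }

    public static void main(String[] args) {
        // Assumed sample line in Apache access-log style.
        String logEntry = "172.26.155.241 - - [26/Feb/2001:10:56:03 -0500] "
            + "\"GET /IsAlive.htm HTTP/1.0\" 200 15";
        String[] fields = parse(logEntry);
        if (fields != null) {
            System.out.println("IP: " + fields[0]);
            System.out.println("Timestamp: " + fields[1]);
        }
    }
}
```

Returning null for a non-matching line keeps the sketch short; in real code you might prefer an Optional or a small value class.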
Your next task is to churn through your company's HTML pages and perform an analysis of all of a font tag's attributes. The typical font tag in your HTML looks like this:
<font face="Arial, Serif" size="+2" color="red">
Your program will print out the attributes for every font tag encountered in the following format:
face=Arial, Serif
size=+2
color=red
In this case, I would suggest that you use two regular expressions. The first, shown in Figure 11, extracts face="Arial, Serif" size="+2" color="red" from the font tag:

Figure 11. Matches: The all-attribute part of the font tag
The second regular expression, shown in Figure 12, breaks down each individual attribute into a name-value pair:

Figure 12. Matches: Each individual attribute, broken down into a name-value pair
Applied to the font tag above, Figure 12 yields these name-value pairs:
face Arial, Serif
size +2
color red
Let's now discuss the code to achieve this. First, create the two regular expression strings and compile each into a Pattern object using the Perl5Compiler. Pass the Perl5Compiler.CASE_INSENSITIVE_MASK option when compiling so that the match is case-insensitive and tags such as <FONT SIZE="3"> are also recognized.
Next, create a Perl5Matcher object to perform matching:
String regexpForFontTag="<\\s*font\\s+([^>]*)\\s*>";
String regexpForFontAttrib="([a-z]+)\\s*=\\s*\"([^\"]+)\"";
PatternCompiler compiler=new Perl5Compiler();
Pattern patternForFontTag=compiler.compile(regexpForFontTag,Perl5Compiler.CASE_INSENSITIVE_MASK);
Pattern patternForFontAttrib=compiler.compile(regexpForFontAttrib,Perl5Compiler.CASE_INSENSITIVE_MASK);
PatternMatcher matcher=new Perl5Matcher();
Assume you have a variable called html of type String that represents a line in the HTML file. If the html string contains a font tag, contains() returns true, and you use the MatchResult object returned by the matcher's getMatch() method to get the first group, which holds all of the font attributes:
if (matcher.contains(html,patternForFontTag)) {
MatchResult result=matcher.getMatch();
String attribs=result.group(1);
PatternMatcherInput input=new PatternMatcherInput(attribs);
while (matcher.contains(input,patternForFontAttrib)) {
result=matcher.getMatch();
System.out.println(result.group(1)+"="+result.group(2));
}
}
The inner loop relies on a PatternMatcherInput object. As previously mentioned, this object lets you continue matching from where the last match was found in the string; thus, it's perfect for extracting the font tag's name-value pairs. You create the PatternMatcherInput by passing in the string to be matched, then use the matcher instance to extract each font attribute as it is encountered. To do so, repeatedly call the contains() method of the PatternMatcher object with the PatternMatcherInput object instead of a string. Each successful call advances a pointer within the PatternMatcherInput, so the next test starts where the previous match left off; the loop ends when contains() returns false.
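To experiment with the two-pass approach without Jakarta ORO, here is a minimal sketch using the JDK's java.util.regex package instead. It reuses the two regular expressions from above; the class and method names are hypothetical. A java.util.regex.Matcher remembers its position between find() calls, so it stands in for the PatternMatcherInput object here. Unlike the snippet above, which tests each line once, this sketch loops over every font tag it finds in the input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FontTagDemo {
    // First pass: capture everything between "<font" and ">" as group 1.
    static final Pattern FONT_TAG = Pattern.compile(
        "<\\s*font\\s+([^>]*)\\s*>", Pattern.CASE_INSENSITIVE);

    // Second pass: split the captured text into name="value" pairs.
    static final Pattern FONT_ATTRIB = Pattern.compile(
        "([a-z]+)\\s*=\\s*\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    // Returns one "name=value" string per attribute of every font tag in html.
    static List<String> fontAttributes(String html) {
        List<String> pairs = new ArrayList<>();
        Matcher tag = FONT_TAG.matcher(html);
        while (tag.find()) {
            Matcher attrib = FONT_ATTRIB.matcher(tag.group(1));
            // Each find() resumes after the previous match.
            while (attrib.find()) {
                pairs.add(attrib.group(1) + "=" + attrib.group(2));
            }
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<font face=\"Arial, Serif\" size=\"+2\" color=\"red\">";
        for (String pair : fontAttributes(html)) {
            System.out.println(pair);
        }
    }
}
```

Collecting the pairs in a list rather than printing them directly makes the method easier to reuse if you later want to tally attribute usage across many pages.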