Regular expression: t(a|e|i|o|oo)n
Matches: tan, ten, tin, ton, toon
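As a quick check of the alternation above, here is a minimal sketch. It assumes the JDK's built-in `java.util.regex` package rather than the ORO API, and the class and method names are hypothetical. Matching is case-sensitive by default, so a capitalized word such as "Ten" is rejected unless a case-insensitive flag is set.

```java
import java.util.regex.Pattern;

class AlternationDemo {
    // Pattern from the text: "t", one of the listed alternatives, then "n".
    static final Pattern WORD = Pattern.compile("t(a|e|i|o|oo)n");

    // True only if the entire word matches the pattern.
    static boolean matchesWord(String word) {
        return WORD.matcher(word).matches();
    }

    public static void main(String[] args) {
        for (String w : new String[] {"tan", "ten", "tin", "ton", "toon", "tun"}) {
            System.out.println(w + " -> " + matchesWord(w));
        }
    }
}
```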
Table 1 lists the quantifier notations, which specify how many times the notation immediately to the left of the quantifier should repeat:
| Notation | Number of Times |
|---|---|
| * | 0 or more times |
| + | 1 or more times |
| ? | 0 or 1 time |
| {n} | Exactly n times |
| {n,m} | n to m times |
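The quantifiers in Table 1 can be tried out with a short sketch like the following. It assumes the JDK's `java.util.regex` package (not the ORO API), and the `QuantifierDemo` class and its `full` helper are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Whole-string match of the input against the regular expression.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("ab*c", "ac"));         // * permits zero b's
        System.out.println(full("ab+c", "ac"));         // + requires at least one b
        System.out.println(full("ab?c", "abbc"));       // ? permits at most one b
        System.out.println(full("ab{2}c", "abbc"));     // {2} requires exactly two b's
        System.out.println(full("ab{1,3}c", "abbbbc")); // {1,3} permits one to three b's
    }
}
```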
Let's say you want to search for a social security number in a text file. The format for US social security numbers is 999-99-9999. The regular expression you would use to match this is shown in Figure 1. In regular expressions, the hyphen ("-") notation has special meaning inside square brackets; there it indicates a range, so "[0-9]" matches any digit from 0 to 9. To avoid any ambiguity, escape the "-" character with a backslash ("\") when matching the literal hyphens in a social security number.

Figure 1. Matches: All social security numbers of the form 123-12-1234
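The figure itself is not reproduced here, so the pattern in the sketch below is an assumption reconstructed from the description: three digits, an escaped hyphen, two digits, an escaped hyphen, four digits. The sketch uses the JDK's `java.util.regex` package rather than the ORO API, and `SsnDemo` is a hypothetical name. In a Java string literal, each regex backslash must itself be doubled.

```java
import java.util.regex.Pattern;

class SsnDemo {
    // Assumed pattern: 3 digits, literal hyphen, 2 digits, literal hyphen, 4 digits.
    static final Pattern SSN = Pattern.compile("[0-9]{3}\\-[0-9]{2}\\-[0-9]{4}");

    // True only if the whole string has the form 999-99-9999.
    static boolean isSsn(String s) {
        return SSN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSsn("123-12-1234"));
        System.out.println(isSsn("123121234"));
    }
}
```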
If, in your search, you wish to make the hyphen optional -- if, say, you consider both 999-99-9999 and 999999999 acceptable formats -- you can use the "?" quantifier notation. Figure 2 shows that regular expression:

Figure 2. Matches: All social security numbers of the forms 123-12-1234 and 123121234
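A sketch of the optional-hyphen variant follows. The pattern is an assumption based on the description (a "?" placed after each escaped hyphen), the code uses the JDK's `java.util.regex` package instead of the ORO API, and `OptionalHyphenDemo` is a hypothetical name.

```java
import java.util.regex.Pattern;

class OptionalHyphenDemo {
    // Assumed pattern: each literal hyphen is followed by "?", making it optional.
    static final Pattern SSN = Pattern.compile("[0-9]{3}\\-?[0-9]{2}\\-?[0-9]{4}");

    // True if the whole string is a social security number with or without hyphens.
    static boolean isSsn(String s) {
        return SSN.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isSsn("123-12-1234"));
        System.out.println(isSsn("123121234"));
    }
}
```

Because each hyphen is optional independently, a pattern written this way also accepts mixed forms such as 123-121234. If only the two uniform formats should pass, the two alternatives would need to be spelled out separately.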
Let's take a look at another example. One format for US car plate numbers consists of four numeric characters followed by two letters. The regular expression first comprises the numeric part, "[0-9]{4}", followed by the textual part, "[A-Z]{2}". Figure 3 shows the complete regular expression:

Figure 3. Matches: Typical US car plate numbers, such as 8836KV
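To show how such a pattern might be used to scan a longer text, here is a minimal sketch that joins the two parts quoted above, "[0-9]{4}" and "[A-Z]{2}", and collects every occurrence. It assumes the JDK's `java.util.regex` package rather than the ORO API; `PlateFinder` and `findPlates` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PlateFinder {
    // Four digits followed by two uppercase letters, as described in the text.
    static final Pattern PLATE = Pattern.compile("[0-9]{4}[A-Z]{2}");

    // Collects every substring of the text that matches the plate pattern.
    static List<String> findPlates(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PLATE.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findPlates("Seen: 8836KV, then 1204AB near exit 12."));
    }
}
```

Note that an unanchored search like this also finds the pattern inside longer runs of characters, so "12345AB" would yield "2345AB".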

Figure 4. Matches: All words except those that start with the letter X
Say you're trying to extract the birth month from a person's birthdate. The typical birthdate is in the following format: June 26, 1951. The regular expression to match the string would be like the one in Figure 5:

Figure 5. Matches: All dates with the format of Month DD, YYYY
The new "\s" notation is the whitespace notation and matches any single whitespace character, including spaces and tabs. If the string matches perfectly, how do you extract the month field? You simply put parentheses around the month field, creating a group, and later retrieve the value using the ORO API (discussed in a following section). The appropriate regular expression is in Figure 6:

Figure 6. Matches: All dates with the format Month DD, YYYY, and extracts Month field as Group 1
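Group extraction can be sketched as follows. The figure is not reproduced here, so the pattern is an assumption built from the description (a parenthesized month name, whitespace, a one- or two-digit day, a comma, whitespace, a four-digit year). The code uses the JDK's `java.util.regex` package as a stand-in for the ORO API, and `MonthExtractor` is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MonthExtractor {
    // Assumed pattern: the parentheses around the month name create group 1.
    static final Pattern DATE =
        Pattern.compile("([A-Za-z]+)\\s[0-9]{1,2},\\s[0-9]{4}");

    // Returns the month name, or null if the string is not in Month DD, YYYY form.
    static String extractMonth(String date) {
        Matcher m = DATE.matcher(date);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractMonth("June 26, 1951"));
    }
}
```

Group 0 always refers to the entire match; numbered groups start at 1 and follow the order of their opening parentheses.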
To make life easier, some shorthand notations for commonly used regular expressions have been created, as shown in Table 2:
| Notation | Equivalent Notation |
|---|---|
| \d | [0-9] |
| \D | [^0-9] |
| \w | [A-Za-z0-9_] |
| \W | [^A-Za-z0-9_] |
| \s | [ \t\n\r\f] |
| \S | [^ \t\n\r\f] |
To illustrate, we can use "\d" in place of every "[0-9]" we used before, such as in our social security number expressions. The revised regular expression is in Figure 7: