The final feature you can enable is treatment of the end of each input line. If it is important to know when the end of a line has been reached, you can tell the tokenizer to return an indication to that effect. You might think you could instead simply declare the ASCII linefeed character (0x0a) an ordinary character, but on platforms that end lines with just an ASCII carriage return (0x0d) you would never see end-of-line indications. Instead, the analyzer notes internally when an appropriate line-terminating character has been reached and then returns that indication to your class. This feature has the additional benefit of hiding a small piece of platform-dependent behavior.
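As a minimal sketch of how this looks in code, using the standard java.io.StreamTokenizer API (the sample input string and class name are my own):

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class EolDemo {
    public static void main(String[] args) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader("one two\nthree"));
        st.eolIsSignificant(true);  // ask the tokenizer to report ends of lines
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_EOL) {
                System.out.println("<EOL>");
            } else if (type == StreamTokenizer.TT_WORD) {
                System.out.println("W " + st.sval);
            }
        }
        // Prints: W one, W two, <EOL>, W three
    }
}
```

Because the tokenizer itself recognizes the line terminator, the same code reports <EOL> whether the input uses linefeeds, carriage returns, or both.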
I know this all sounds tremendously complicated, so to help you out I've put together a StreamTokenizer exerciser applet along the same lines as the StringTokenizer exerciser above. The source to the StreamTokenizer exerciser applet is here. This applet is much larger than the StringTokenizer exerciser, as it offers many additional capabilities. The applet appears on the page below; you may want to open a second browser window on this page so that you can keep the applet visible while rereading the text.
It's pretty straightforward to operate the applet. On the left-hand side is an input text area; type one or more lines
of text to be analyzed there. The middle box is where the list of tokens appears, each prefaced with
W for a word token, O for an ordinary-character token, Q for a quote token, N for a number token, and <EOL> or <EOF> for the end-of-line and end-of-file meta tokens. The third box shows how the ASCII
characters are divided into "O" ordinary, "W" word, and "B" blank (or whitespace) characters. To read this list, note that
each entry is applied in sequence starting with the first one and moving down. So the item "B[0, 32]" is read "The characters
with values 0 through 32 are treated as whitespace." The item "W[48, 122]" is read "The characters with values between 48
and 122 are treated as word characters." Later in the list you will see the item "O[91,96]," which means that characters 91
through 96 are treated as ordinary characters. Because this item is lower in the list than the word item above it, it overrides
that word item for characters in this range. These character ranges and the check boxes on the right-hand side are used only
if the check box labeled "custom syntax" is selected; even so, they are initialized to the default syntax that StreamTokenizer uses, so you can see the rules in effect even when a custom syntax is not selected. On the bottom
of the applet are three sets of boxes; you can use these to add new characters to the word, ordinary, and blank character
ranges. Finally, the command buttons in the middle row carry out exactly the functions their names describe.
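The same character-range rules, including the later-entry-wins precedence, can be built by hand with StreamTokenizer's syntax methods. A short sketch (the input string and printed W/O prefixes are my own, chosen to match the applet's labels):

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

public class SyntaxDemo {
    public static void main(String[] args) throws IOException {
        StreamTokenizer st = new StreamTokenizer(new StringReader("abc [def]"));
        st.resetSyntax();            // start with every character "ordinary"
        st.whitespaceChars(0, 32);   // B[0, 32]
        st.wordChars(48, 122);       // W[48, 122]
        st.ordinaryChars(91, 96);    // O[91, 96] -- applied later, so it overrides
                                     // the word setting for [ \ ] ^ _ `
        int type;
        while ((type = st.nextToken()) != StreamTokenizer.TT_EOF) {
            if (type == StreamTokenizer.TT_WORD) {
                System.out.println("W " + st.sval);
            } else {
                System.out.println("O " + (char) type);
            }
        }
        // Prints: W abc, O [, W def, O ]
    }
}
```

Note that each call simply overwrites the syntax-table entries for its range, which is why the order of the calls matters, just as the order of entries in the applet's list does.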
You may want to play around a bit with the applet. Here are some ideas to get you started.