| State | Input | Action | New state |
|---|---|---|---|
| idle | word character | push back character | accumulate |
| idle | ordinary character | return character | idle |
| idle | whitespace character | consume character | idle |
| accumulate | word character | add to current word | accumulate |
| accumulate | ordinary character | return current word, push back character | idle |
| accumulate | whitespace character | return current word, consume character | idle |
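
The transition table above can be sketched as a small scanner. This is only an illustration of the two-state mechanism, not the StreamTokenizer source: the class name `TableScanner`, the `tokenize` method, and the character classes (letters and digits as word characters, anything that is neither a word character nor whitespace as ordinary) are assumptions made for the example. Flushing the current word at end of input is also an assumption, since the table does not cover that case.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class TableScanner {
    // Hypothetical character classes chosen for this sketch.
    static boolean isWord(int c)  { return Character.isLetterOrDigit(c); }
    static boolean isSpace(int c) { return Character.isWhitespace(c); }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        boolean accumulating = false; // false = idle, true = accumulate
        try (PushbackReader in = new PushbackReader(new StringReader(input), 1)) {
            int c;
            while ((c = in.read()) != -1) {
                if (!accumulating) {
                    if (isWord(c)) {
                        in.unread(c);          // push back character
                        accumulating = true;   // idle -> accumulate
                    } else if (!isSpace(c)) {
                        tokens.add(String.valueOf((char) c)); // return character
                    }                          // whitespace: consume, stay idle
                } else if (isWord(c)) {
                    word.append((char) c);     // add to current word
                } else {
                    tokens.add(word.toString()); // return current word
                    word.setLength(0);
                    if (!isSpace(c)) {
                        in.unread(c);          // ordinary: push back character
                    }                          // whitespace: consume character
                    accumulating = false;      // accumulate -> idle
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (accumulating) {
            tokens.add(word.toString()); // assumption: end of input flushes the word
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("a+b cd")); // prints [a, +, b, cd]
    }
}
```

Pushing the character back, rather than handling it in place, keeps each table row a single action: the idle state decides what kind of token starts, and the accumulate state only decides when it ends.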
On top of this simple mechanism the StreamTokenizer class adds several heuristics. These include number processing, quoted string processing, comment processing, and end-of-line
processing.
The first example is number processing. Certain character sequences can be interpreted as representing a numerical value.
For example, the sequence of characters 1, 0, 0, ., and 0 adjacent to each other in the input stream represents the numerical
value 100.0. When all of the digit characters (0 through 9), the dot character (.), and the minus character (-) are specified
as being part of the word set, the StreamTokenizer class can be told to interpret the word it is about to return as a possible number. This mode is set by calling
the parseNumbers method on the tokenizer object that you instantiated (it is on by default). If the analyzer is in the accumulate state and
the next character would not be part of a number, the currently accumulated word is checked to see if it is a valid number. If it is valid, it is returned as a number,
and the scanner moves to the next appropriate state.
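
A short sketch of number processing with the real java.io.StreamTokenizer class follows. The class name `NumberDemo`, the `scan` helper, and the `WORD:`/`NUMBER:`/`CHAR:` labels are made up for the illustration. The call to parseNumbers is redundant with the default and is shown only to make the mode explicit.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class NumberDemo {
    // Labels each token with its type so the effect of number parsing is visible.
    static List<String> scan(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.parseNumbers(); // already the default; shown for clarity
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);   // numeric value is a double
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else {
                    out.add("CHAR:" + (char) st.ttype); // ordinary character
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD:width, NUMBER:100.0, NUMBER:-5.0]
        System.out.println(scan("width 100.0 -5"));
    }
}
```

Note that nval is always a double, so an input of -5 comes back as -5.0; callers that need integers have to convert the value themselves.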
The next example is quoted string processing. It is often desirable to pass a string that is surrounded by a quotation character
(typically the double quote (") or single quote (')) as a single token. The StreamTokenizer class allows you to specify any character as a quoting character. By default, the quoting characters are the single quote (') and double
quote ("). The state machine is modified to consume characters in the accumulate state until either another quote
character or an end-of-line character is processed. To allow you to quote the quote character itself, the analyzer treats a quote
character preceded by a backslash (\) inside a quotation as a word character.
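
As an illustration of quoted string processing, the sketch below runs the real StreamTokenizer over input containing both default quote characters and an escaped quote. The class name `QuoteDemo`, the `scan` helper, and the output labels are assumptions for the example. The quoteChar call repeats a default setting and is there only to show how a quoting character is specified.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class QuoteDemo {
    static List<String> scan(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.quoteChar('"'); // already a quote character by default; shown for clarity
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == '"' || st.ttype == '\'') {
                    // For a quoted string, ttype is the quote character itself
                    // and sval holds the body without the delimiters.
                    out.add("QUOTED:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // Input text: say "hello world" 'it\'s'
        // prints [WORD:say, QUOTED:hello world, QUOTED:it's]
        System.out.println(scan("say \"hello world\" 'it\\'s'"));
    }
}
```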
The third example is the processing of comments. Comments are generally considered to be text that is inserted into the input
stream for the human reader and is irrelevant to the machine consumer of the data. The StreamTokenizer class supports comments by discarding them from the input and never returning them. The default comment processing
uses the slash character (/) to mark the start of a comment and the end of the line to mark its end.
In many situations this is fine; however, at times you may want to process C-like comments. The class supports
this if you turn off generic comment processing and then enable processing of slash-star (/* ... */) comments,
slash-slash (// ...) comments, or both. For these methods to work, the slash character (/) must not be set as a comment character. As in the case of quotes, zero or more characters can be specified as comment characters.
When a comment character is encountered, the rest of the line is silently discarded.
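
The comment settings described above can be tried with the following sketch. It makes the slash an ordinary character, enables both C-style comment forms, and registers # as an additional comment character. The class name `CommentDemo`, the `scan` helper, the choice of # and the output labels are assumptions for the illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

class CommentDemo {
    static List<String> scan(String input) {
        StreamTokenizer st = new StreamTokenizer(new StringReader(input));
        st.ordinaryChar('/');        // turn off the generic '/' comment character
        st.slashStarComments(true);  // recognize /* ... */
        st.slashSlashComments(true); // recognize // ... to end of line
        st.commentChar('#');         // hypothetical extra comment character
        List<String> out = new ArrayList<>();
        try {
            while (st.nextToken() != StreamTokenizer.TT_EOF) {
                if (st.ttype == StreamTokenizer.TT_WORD) {
                    out.add("WORD:" + st.sval);
                } else if (st.ttype == StreamTokenizer.TT_NUMBER) {
                    out.add("NUMBER:" + st.nval);
                } else {
                    out.add("CHAR:" + (char) st.ttype); // e.g. a lone '/'
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [WORD:a, WORD:b, WORD:c, CHAR:/, WORD:d]
        System.out.println(scan("a /* skip */ b // rest\nc / d # tail"));
    }
}
```

With the slash made ordinary, a lone / that does not start a comment is returned as a single-character token, which is what an expression parser would typically want for a division operator.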