XMLPull offers two levels of access to the document data, letting you choose how much detail your program sees. When you call the next() method, the parser ignores a document's minor details and reports only the meatier components: elements and text. The next() method returns one of just four values:
START_TAG for an element's start tag
TEXT for character data content
END_TAG for an element's end tag
END_DOCUMENT for when you've reached the end of the document data
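The four event types above can be seen in a short loop. The following is only a minimal sketch, assuming an XMLPull implementation such as XPP3 or kXML is on the classpath; the class name NextEventsDemo and the sample document are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

// Hypothetical demo class; not part of the article's example code.
public class NextEventsDemo {

    /** Lists the events that next() reports for the given XML text. */
    public static List<String> events(String xml)
            throws XmlPullParserException, IOException {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        XmlPullParser parser = factory.newPullParser();
        parser.setInput(new StringReader(xml));

        List<String> seen = new ArrayList<>();
        int event = parser.next();
        while (event != XmlPullParser.END_DOCUMENT) {
            switch (event) {
                case XmlPullParser.START_TAG:
                    seen.add("START_TAG " + parser.getName());
                    break;
                case XmlPullParser.TEXT:
                    seen.add("TEXT " + parser.getText());
                    break;
                case XmlPullParser.END_TAG:
                    seen.add("END_TAG " + parser.getName());
                    break;
                default:
                    break;
            }
            event = parser.next();
        }
        seen.add("END_DOCUMENT");
        return seen;
    }

    public static void main(String[] args) throws Exception {
        // The comment in this sample document is skipped silently by next().
        String xml = "<trade><!-- note --><price>12.5</price></trade>";
        for (String event : events(xml)) {
            System.out.println(event);
        }
    }
}
```

Because the loop switches only on the values next() can return, the comment in the sample input never shows up in the result.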
In contrast, the nextToken() method provides more detailed access to the document structure, including components such as processing instructions, comments, and entity references. In fact, nextToken() gives a "full disclosure" view of the document: where next() silently skips the components it doesn't report, nextToken() reports everything.
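To make the difference concrete, here is a rough sketch that records the type of every token nextToken() reports. It assumes an XMLPull implementation is on the classpath and relies on the XmlPullParser.TYPES array of event-type names; the class name TokenViewDemo is invented for this illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

// Hypothetical demo class; not part of the article's example code.
public class TokenViewDemo {

    /** Returns the name of each token type that nextToken() reports. */
    public static List<String> tokenTypes(String xml)
            throws XmlPullParserException, IOException {
        XmlPullParser parser =
            XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));

        List<String> types = new ArrayList<>();
        int token = parser.nextToken();
        while (token != XmlPullParser.END_DOCUMENT) {
            // TYPES maps each event constant to its readable name.
            types.add(XmlPullParser.TYPES[token]);
            token = parser.nextToken();
        }
        return types;
    }

    public static void main(String[] args) throws Exception {
        // With nextToken(), the comment is reported instead of skipped.
        System.out.println(tokenTypes("<a><!-- note -->text</a>"));
    }
}
```

Running the same input through a next() loop would report the element and its text but not the comment.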
Why support full disclosure in a parser API? Reporting everything present in the input stream allows you to layer functionality.
For example, neither current XMLPull implementation supports document validation, but the nextToken() parse view of the document offers enough detail that validation could sit as a wrapper layer on top of the basic parsers.
With that approach, a single validation implementation would add validation support to every XMLPull implementation.
Layering represents a powerful feature. The original SAX interface did not report all the information needed for document validation, so parser writers had to build validation into the parser if they wanted to support it at all. That led to duplicated effort to implement validation within different parsers. Even now many SAX2 parsers do not support validation. In contrast, XMLPull's design avoids the problem completely.
Most XML applications need only the five basic event types: the four values next() returns, plus START_DOCUMENT, which is the parser's state before the first call to next(). Of the five, only START_TAG and TEXT warrant a closer look, as START_DOCUMENT, END_TAG, and END_DOCUMENT are self-explanatory.
START_TAG provides information from an element's start tag, including the element's attributes. The XmlPullParser interface defines three methods for accessing the element name information: getName() for the local name, along with getNamespace() and getPrefix() for namespace information. The interface also defines six methods for accessing attribute values: getAttributeValue(namespace, name) to retrieve an attribute value by name, along with getAttributeCount(), getAttributeName(index), getAttributeNamespace(index), getAttributePrefix(index), and getAttributeValue(index) for direct indexed access to attributes.
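As an illustration of the indexed attribute methods, the sketch below describes the first start tag in a document. It assumes an XMLPull implementation is on the classpath and uses the default (non-namespace-aware) parser; the class name StartTagDemo and the attribute names in the sample are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

// Hypothetical demo class; not part of the article's example code.
public class StartTagDemo {

    /** Returns "name attr=value ..." for the first start tag found. */
    public static String describeFirstTag(String xml)
            throws XmlPullParserException, IOException {
        XmlPullParser parser =
            XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));

        // Advance to the first start tag.
        int event = parser.next();
        while (event != XmlPullParser.START_TAG) {
            if (event == XmlPullParser.END_DOCUMENT) {
                throw new XmlPullParserException("No start tag found");
            }
            event = parser.next();
        }

        // Walk the attributes by index, in the order the parser holds them.
        StringBuilder description = new StringBuilder(parser.getName());
        for (int i = 0; i < parser.getAttributeCount(); i++) {
            description.append(' ')
                       .append(parser.getAttributeName(i))
                       .append('=')
                       .append(parser.getAttributeValue(i));
        }
        return description.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(
            describeFirstTag("<trade symbol=\"SUNW\" shares=\"100\"/>"));
    }
}
```

When you already know the attribute name, getAttributeValue(null, name) is the shorter route; indexed access is mainly useful when you need to enumerate whatever attributes are present.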
TEXT supplies character-data content information. You can access the character data in two ways. First, the getText() method returns the text as a string and hides the details. Second, the getTextCharacters(holder) method gives access to the raw characters (as with the characters(ch, start, length) handler call in the SAX2 interface). The latter method requires some explanation: it returns the character array directly, but the starting position in that array and the length of the character data come back as values in the int[2] array passed as the call parameter, with the start position at [0] and the number of characters at [1].
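The holder convention is easier to see in code. The sketch below runs without a parser: fakeGetTextCharacters is a hypothetical stand-in that only mimics the contract described above (with a real parser you would call parser.getTextCharacters(holder) instead), and the class name TextHolderDemo is likewise invented.

```java
// Hypothetical demo class; not part of the article's example code.
public class TextHolderDemo {

    /** Builds a String from a shared buffer plus a {start, length} holder. */
    public static String sliceToString(char[] buffer, int[] holder) {
        return new String(buffer, holder[0], holder[1]);
    }

    /**
     * Stand-in for getTextCharacters(holder): returns a larger buffer and
     * reports where the text sits inside it through the holder array.
     */
    public static char[] fakeGetTextCharacters(int[] holder) {
        char[] internal = "<price>12.5</price>".toCharArray();
        holder[0] = 7;  // start position of the text within the buffer
        holder[1] = 4;  // number of characters of text
        return internal;
    }

    public static void main(String[] args) {
        int[] holder = new int[2];
        char[] buffer = fakeGetTextCharacters(holder);
        // Only the slice described by the holder is the element's text.
        System.out.println(sliceToString(buffer, holder));
    }
}
```

The point of the design is that the returned array may be larger than the text itself, so code should read only the slice the holder describes and should not assume the text starts at index 0.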
That's all you need to know for most XMLPull uses. You'll find much more in the API, including access to the internal namespace stack, document text position, and element nesting depth, but you can dig into these details directly in the Javadocs if you're interested.
In Part 2, I included code for processing a financial-trade history document using the XPP2 pull-parser interface. Let's look at the changes required to bring that code up to XMLPull compatibility.
Fortunately, you'll need to substantially change only the PullWrapper class, since it has most of the parser-dependent code. Here's the new version:
import java.io.IOException;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

public class PullWrapper
{
    /** Parser in use. */
    protected XmlPullParser m_parser;

    /** Constructor. Builds the shared objects used for parsing. */
    public PullWrapper() throws XmlPullParserException {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        m_parser = factory.newPullParser();
    }

    /** Parse start of element from document. */
    protected void parseStartTag(String tag)
        throws IOException, XmlPullParserException {
        while (true) {
            // TEXT events are not handled here, so they are skipped.
            switch (m_parser.next()) {

                case XmlPullParser.START_TAG:
                    if (m_parser.getName().equals(tag)) {
                        return;
                    }
                    // Fall through for error handling.

                case XmlPullParser.END_TAG:
                case XmlPullParser.END_DOCUMENT:
                    throw new XmlPullParserException
                        ("Missing expected start tag " + tag);
            }
        }
    }

    /** Parse end of element from document. */
    protected String parseEndTag(String tag)
        throws IOException, XmlPullParserException {
        String text = null;
        while (true) {
            switch (m_parser.next()) {

                case XmlPullParser.TEXT:
                    // Remember the trimmed content to return at the end tag.
                    text = m_parser.getText().trim();
                    break;

                case XmlPullParser.END_TAG:
                    if (m_parser.getName().equals(tag)) {
                        return text;
                    }
                    // Fall through for error handling.

                case XmlPullParser.START_TAG:
                case XmlPullParser.END_DOCUMENT:
                    throw new XmlPullParserException
                        ("Missing expected end tag " + tag);
            }
        }
    }

    /** Parse element, returning content with white space trimmed. */
    protected String parseElementContent(String tag)
        throws IOException, XmlPullParserException {
        parseStartTag(tag);
        return parseEndTag(tag);
    }

    /** Get attribute value from current start tag. */
    protected String attributeValue(String name)
        throws IOException, XmlPullParserException {
        String value = m_parser.getAttributeValue(null, name);
        if (value == null) {
            throw new XmlPullParserException("Missing attribute " + name);
        } else {
            return value;
        }
    }
}
Not much has changed, except that the XPP2 interface used separate objects (XmlStartTag and XmlEndTag) to report information about a start or end tag, while the XMLPull common API makes the information directly available from the parser.
The only other necessary change: Remove the call to the parser's reset() method from the TradePullHandler class. When that's done, everything works as expected, and the example program can now use any XMLPull implementation (currently XPP3 and kXML, but more will be coming soon).
A new Java Community Process (JCP) specification request proposes a standard API for Java pull parsers. I can't yet say what will happen, because the project, JSR-173: Streaming API for XML, has just started, but the results will prove important for the long term.