The jaxp.jar and parser.jar libraries are 1.4 MB each. If you are running with limited memory (for example, in a J2ME (Java 2 Platform, Micro Edition) environment), or bandwidth is at a premium (for example, in an applet), using those large parsers might not be a viable solution.
Those libraries are large partly because they offer a lot of functionality, perhaps more than you require. They validate XML against DTDs (document type definitions), possibly schemas, and more. However, you might already know that your application will receive valid XML. You might also have decided that you need only the UTF-8 character set. In that case, all you really want is event-based processing of XML elements and translation of the standard XML entities; in other words, you want a nonvalidating parser.
Note: You can download this article's source code in Resources.
You could implement the SAX (Simple API for XML) interfaces with limited functionality, throwing an exception named NotImplemented whenever the parser encountered something you didn't need.
That way, you could undoubtedly develop something much smaller than the 1.4 MB jaxp.jar/parser.jar libraries. But you can cut down the code size even more by defining your own classes. In fact, the package we construct here will be considerably smaller than the jar file containing the SAX interface definitions.
Our quick-and-dirty parser is event-based like the SAX parser. Also like the SAX parser, it lets you implement an interface to catch and process events corresponding to attributes and start/end element tags. Hopefully, those of you who have used SAX will find this parser familiar.
Many people want XML's simple, self-describing textual data format. They want to easily pick out elements, attributes and their values, and elements' textual content. With that in mind, let's consider what functionality we need to preserve.
Our simple parsing package has just one class, QDParser, and one interface, DocHandler. The QDParser itself has one public static method, parse(DocHandler,Reader), which we will implement as a finite state machine.
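To make the shape of such a package concrete, here is a minimal sketch of a handler interface and a finite-state-machine parse method. It is not the article's actual source: the DocHandler method names, the QDParserSketch and RecordingHandler classes, and the three states are assumptions chosen for illustration, and the sketch deliberately ignores attributes, comments, and entities.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical callback interface; the method names are assumptions modeled on SAX.
interface DocHandler {
    void startElement(String tag);
    void endElement(String tag);
    void text(String str);
}

// Simplified stand-in for QDParser: understands only <tag>, </tag>, and text.
final class QDParserSketch {
    private enum State { TEXT, OPEN_TAG, CLOSE_TAG }

    private QDParserSketch() {}

    static void parse(DocHandler doc, Reader r) throws IOException {
        State state = State.TEXT;
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            char ch = (char) c;
            switch (state) {
                case TEXT:
                    if (ch == '<') {
                        // Report accumulated text, skipping whitespace-only runs.
                        if (sb.toString().trim().length() > 0) {
                            doc.text(sb.toString());
                        }
                        sb.setLength(0);
                        state = State.OPEN_TAG;
                    } else {
                        sb.append(ch);
                    }
                    break;
                case OPEN_TAG:
                    if (ch == '/' && sb.length() == 0) {
                        state = State.CLOSE_TAG;      // saw "</"
                    } else if (ch == '>') {
                        doc.startElement(sb.toString());
                        sb.setLength(0);
                        state = State.TEXT;
                    } else {
                        sb.append(ch);
                    }
                    break;
                case CLOSE_TAG:
                    if (ch == '>') {
                        doc.endElement(sb.toString());
                        sb.setLength(0);
                        state = State.TEXT;
                    } else {
                        sb.append(ch);
                    }
                    break;
            }
        }
    }
}

// Example handler that records each event as a string.
final class RecordingHandler implements DocHandler {
    final List<String> events = new ArrayList<>();
    public void startElement(String tag) { events.add("start:" + tag); }
    public void endElement(String tag) { events.add("end:" + tag); }
    public void text(String str) { events.add("text:" + str); }
}
```

Passing a RecordingHandler and a StringReader over `<a><b>hi</b></a>` to QDParserSketch.parse records the start, text, and end events in document order. A fuller parser would add states for attribute names and values, comments, and entity references.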
Our limited-functionality parser treats the document type declaration (<!DOCTYPE ...>) and processing instructions such as <?xml version="1.0"?> simply as comments: it skips over them, so it is neither confused by their presence nor makes any use of their content.
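One way to picture "treating them as comments" is a small routine that drops those constructs from the input. The sketch below works on a String rather than inside the parser's state machine, which is a simplification of my own; the MarkupSkipper class and its method names are hypothetical. It assumes the document has no CDATA sections and that no quoted string inside the DOCTYPE contains square brackets.

```java
// Hypothetical helper: removes <?...?>, <!--...-->, and <!DOCTYPE ...> from a document.
final class MarkupSkipper {
    private MarkupSkipper() {}

    static String strip(String xml) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < xml.length()) {
            if (xml.startsWith("<?", i)) {
                // Processing instruction or XML declaration: skip to the closing "?>".
                int end = xml.indexOf("?>", i + 2);
                i = (end < 0) ? xml.length() : end + 2;
            } else if (xml.startsWith("<!--", i)) {
                // Ordinary comment: skip to the closing "-->".
                int end = xml.indexOf("-->", i + 4);
                i = (end < 0) ? xml.length() : end + 3;
            } else if (xml.startsWith("<!DOCTYPE", i)) {
                i = skipDoctype(xml, i);
            } else {
                out.append(xml.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Returns the index just past the '>' that closes the DOCTYPE.
    // Bracket depth is tracked so that '>' inside an internal subset [...] is ignored.
    private static int skipDoctype(String xml, int start) {
        int depth = 0;
        for (int i = start; i < xml.length(); i++) {
            char ch = xml.charAt(i);
            if (ch == '[') {
                depth++;
            } else if (ch == ']') {
                depth--;
            } else if (ch == '>' && depth == 0) {
                return i + 1;
            }
        }
        return xml.length(); // unterminated DOCTYPE: skip the rest
    }
}
```

Inside a character-by-character state machine, the same idea becomes one or two extra states that consume input until the matching terminator and emit no events.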
Because we won't process DOCTYPE, our parser cannot read custom entity definitions. We will have only the standard ones available: &amp;, &lt;, &gt;, &apos;, and &quot;. If this is a problem, you can insert code to expand custom definitions, as the source code shows. Alternatively, you could preprocess the document, replacing custom entity references with their expanded text before handing the document to the QDParser.
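As a rough illustration of entity translation, the sketch below expands the five standard entities and, optionally, entries from a caller-supplied table of custom ones. The Entities class, the expand method, and the custom-map parameter are hypothetical names introduced here, not taken from the article's source; numeric character references such as &#65; are not handled, and unknown entities are left untouched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for translating entity references in text or attribute values.
final class Entities {
    private static final Map<String, String> STANDARD = new HashMap<>();
    static {
        STANDARD.put("amp", "&");
        STANDARD.put("lt", "<");
        STANDARD.put("gt", ">");
        STANDARD.put("apos", "'");
        STANDARD.put("quot", "\"");
    }

    private Entities() {}

    // Expands &name; references in a single left-to-right pass.
    // 'custom' may be null; standard entities take precedence over custom ones.
    static String expand(String s, Map<String, String> custom) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char ch = s.charAt(i);
            int semi = (ch == '&') ? s.indexOf(';', i + 1) : -1;
            if (semi > 0) {
                String name = s.substring(i + 1, semi);
                String value = STANDARD.get(name);
                if (value == null && custom != null) {
                    value = custom.get(name);
                }
                if (value != null) {
                    out.append(value);
                    i = semi + 1;   // continue after the ';'
                    continue;
                }
            }
            out.append(ch);         // ordinary character or unrecognized '&'
            i++;
        }
        return out.toString();
    }
}
```

Because the expansion is a single pass, the output of one replacement is never rescanned, so `&amp;lt;` becomes the literal text `&lt;` rather than `<`.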
Our parser also cannot support conditional sections; for example, <![INCLUDE[ ... ]]> or <![IGNORE[ ... ]]>. Without the ability to define custom entities in DOCTYPE, we don't really need this functionality anyway. If a document does contain such sections, we could process them before the data is sent to our limited-space application.