parser.jar libraries are 1.4 MB each. If you are running with limited memory (for example, in a Java 2 Platform, Micro Edition (J2ME) environment), or bandwidth is at a premium (for example, in an applet), those large parsers might not be a viable solution.
Those libraries are large partly because they offer a lot of functionality, perhaps more than you require. They validate documents against XML DTDs (document type definitions), possibly schemas, and more. However, you might already know that your application will receive valid XML. You might also have decided that you need only the UTF-8 character set. In that case, all you really want is event-based processing of XML elements and translation of the standard XML entities: a nonvalidating parser.
Note: You can download this article's source code in Resources.
You could implement the SAX (Simple API for XML) interfaces with limited functionality, throwing an exception named NotImplemented when you encounter something unnecessary. Undoubtedly, you could develop something much smaller than the 1.4 MB jaxp.jar/parser.jar libraries that way. But you can cut down the code size even more by defining your own classes. In fact, the package we construct here will be considerably smaller than the jar file containing the SAX interface definitions.
Our quick-and-dirty parser is event-based like the SAX parser. Also like the SAX parser, it lets you implement an interface to catch and process events corresponding to attributes and start/end element tags. Hopefully, those of you who have used SAX will find this parser familiar.
Many people want XML's simple, self-describing textual data format. They want to easily pick out elements, attributes and their values, and elements' textual content. With that in mind, let's consider what functionality we need to preserve.
Our simple parsing package has just one class, QDParser, and one interface, DocHandler. QDParser itself has one public static method, parse(DocHandler,Reader), which we will implement as a finite state machine.
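To make the finite-state-machine idea concrete, here is a minimal sketch. It is not the article's QDParser: the class name MiniParser, the state names, and the three DocHandler methods shown are assumptions chosen for illustration, and the sketch handles only elements, attributes, and text (no comments, entities, or error reporting).

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback interface: these method names are assumptions.
interface DocHandler {
    void startElement(String tag, Map<String, String> attrs);
    void endElement(String tag);
    void text(String str);
}

final class MiniParser {
    private enum State { TEXT, TAG_OPEN, START_NAME, END_NAME, IN_TAG, ATTR_NAME, ATTR_QUOTE, ATTR_VALUE, EMPTY_END }

    // Reads one character at a time; the current state decides what each character means.
    static void parse(DocHandler h, Reader r) throws IOException {
        State state = State.TEXT;
        StringBuilder buf = new StringBuilder();          // token being collected
        Map<String, String> attrs = new LinkedHashMap<>();
        String tag = "", attrName = "";
        char quote = '"';
        for (int c = r.read(); c != -1; c = r.read()) {
            char ch = (char) c;
            switch (state) {
                case TEXT:
                    if (ch != '<') { buf.append(ch); break; }
                    if (buf.length() > 0) h.text(buf.toString());
                    buf.setLength(0);
                    state = State.TAG_OPEN;
                    break;
                case TAG_OPEN:                            // just saw '<'
                    if (ch == '/') { state = State.END_NAME; break; }
                    buf.append(ch);
                    state = State.START_NAME;
                    break;
                case START_NAME:
                    if (ch != '>' && ch != '/' && !Character.isWhitespace(ch)) { buf.append(ch); break; }
                    tag = buf.toString();
                    buf.setLength(0);
                    // fall through: the delimiter is handled like any character inside a tag
                case IN_TAG:
                    if (ch == '>') { h.startElement(tag, attrs); attrs = new LinkedHashMap<>(); state = State.TEXT; }
                    else if (ch == '/') state = State.EMPTY_END;
                    else if (!Character.isWhitespace(ch)) { buf.append(ch); state = State.ATTR_NAME; }
                    else state = State.IN_TAG;
                    break;
                case ATTR_NAME:
                    if (ch == '=') { attrName = buf.toString().trim(); buf.setLength(0); state = State.ATTR_QUOTE; }
                    else buf.append(ch);
                    break;
                case ATTR_QUOTE:                          // skip whitespace until the opening quote
                    if (ch == '"' || ch == '\'') { quote = ch; state = State.ATTR_VALUE; }
                    break;
                case ATTR_VALUE:
                    if (ch == quote) { attrs.put(attrName, buf.toString()); buf.setLength(0); state = State.IN_TAG; }
                    else buf.append(ch);
                    break;
                case EMPTY_END:                           // saw '/', expect '>' to finish <tag/>
                    if (ch == '>') { h.startElement(tag, attrs); h.endElement(tag); attrs = new LinkedHashMap<>(); state = State.TEXT; }
                    break;
                case END_NAME:
                    if (ch == '>') { h.endElement(buf.toString().trim()); buf.setLength(0); state = State.TEXT; }
                    else buf.append(ch);
                    break;
            }
        }
    }
}

// Records events as strings so the call order is easy to inspect.
final class EventLog implements DocHandler {
    final List<String> events = new ArrayList<>();
    public void startElement(String tag, Map<String, String> attrs) { events.add("start " + tag + " " + attrs); }
    public void endElement(String tag) { events.add("end " + tag); }
    public void text(String str) { events.add("text " + str); }

    static List<String> run(String xml) {
        EventLog log = new EventLog();
        try {
            MiniParser.parse(log, new StringReader(xml));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return log.events;
    }
}
```

Calling EventLog.run("<a x=\"1\"><b/>hi</a>") yields the events start a {x=1}, start b {}, end b, text hi, end a, in that order. Keeping the whole parser in one switch statement is what keeps the code small: there is no tokenizer, no tree, and no buffering beyond the token currently being read.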
Our limited-functionality parser treats the DTD declaration <!DOCTYPE> and processing instructions such as <?xml version="1.0"?> simply as comments, so it is neither confused by their presence nor uses their content.
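One way to picture "treating them as comments" is a pass that simply drops those constructs. The sketch below does this as a separate string-based step rather than inside the parser's state machine, which is an assumption made to keep the example short. The class and method names are hypothetical. It also discards CDATA sections and does not account for a '>' inside a quoted value within a declaration, so it illustrates the idea only.

```java
final class SkipDeclarations {
    // Returns xml with comments, processing instructions, and <!...> declarations removed.
    static String strip(String xml) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < xml.length()) {
            if (xml.startsWith("<!--", i)) {
                i = skipPast(xml, "-->", i + 4);
            } else if (xml.startsWith("<?", i)) {
                i = skipPast(xml, "?>", i + 2);
            } else if (xml.startsWith("<!", i)) {
                i = skipDeclaration(xml, i + 2);
            } else {
                out.append(xml.charAt(i++));
            }
        }
        return out.toString();
    }

    // Index just past the next occurrence of end, or the end of input if it never appears.
    private static int skipPast(String xml, String end, int from) {
        int at = xml.indexOf(end, from);
        return at < 0 ? xml.length() : at + end.length();
    }

    // Skips a <!...> declaration, counting nested '<' so that an internal subset
    // such as <!DOCTYPE a [ <!ENTITY x "y"> ]> is skipped as one unit.
    private static int skipDeclaration(String xml, int from) {
        int depth = 1;
        int i = from;
        while (i < xml.length() && depth > 0) {
            char ch = xml.charAt(i++);
            if (ch == '<') depth++;
            else if (ch == '>') depth--;
        }
        return i;
    }
}
```

For example, SkipDeclarations.strip applied to a document that starts with an XML declaration and a DOCTYPE leaves only the element markup and text for the parser to see.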
Because we won't process DOCTYPE, our parser cannot read custom entity definitions. We will have only the standard ones available: &amp;, &lt;, &gt;, &apos;, and &quot;. If this is a problem, you can insert code to expand custom definitions, as the source code shows. Alternatively, you could preprocess the document, replacing custom entities with their expanded text before handing the document to the parser.
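Entity translation can be sketched as a single left-to-right pass over the text. The class name Entities and the idea of passing custom definitions in as a map are assumptions for illustration, not the approach taken in the article's source code. Numeric character references such as &#60; are not handled here.

```java
import java.util.HashMap;
import java.util.Map;

final class Entities {
    // The five entities predefined by the XML specification.
    private static final Map<String, String> STANDARD = new HashMap<>();
    static {
        STANDARD.put("amp", "&");
        STANDARD.put("lt", "<");
        STANDARD.put("gt", ">");
        STANDARD.put("apos", "'");
        STANDARD.put("quot", "\"");
    }

    // Replaces &name; references in one pass. Standard entities win over custom ones,
    // and unknown names are left untouched. Expanded text is never rescanned,
    // so "&amp;lt;" becomes "&lt;" rather than "<".
    static String expand(String text, Map<String, String> custom) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            char ch = text.charAt(i);
            int semi = ch == '&' ? text.indexOf(';', i) : -1;
            String value = null;
            if (semi > i) {
                String name = text.substring(i + 1, semi);
                value = STANDARD.containsKey(name) ? STANDARD.get(name) : custom.get(name);
            }
            if (value == null) {
                out.append(ch);
                i++;
            } else {
                out.append(value);
                i = semi + 1;
            }
        }
        return out.toString();
    }
}
```

With a custom map that defines co as Acme, expanding "a &lt; b &amp;&amp; c &gt; d &co;" gives "a < b && c > d Acme". Passing an empty map restricts the function to the five standard entities, which matches the limited behavior described above.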
Our parser also cannot support conditional sections; for example, <![INCLUDE[ ... ]]> or <![IGNORE[ ... ]]>. Without the ability to define custom entities in DOCTYPE, we don't really need this functionality anyway. We could process such sections, if any, before the data is sent to our limited-space