Java Tip 128: Create a quick-and-dirty XML parser

Parse valid XML using minimal code

XML is a popular data format for several reasons: it is human readable, self-describing, and portable. Unfortunately, many Java-based XML parsers are very large; for example, Sun Microsystems' jaxp.jar and parser.jar libraries are 1.4 MB each. If memory is limited, as in a J2ME (Java 2 Platform, Micro Edition) environment, or bandwidth is at a premium, as with an applet, those large parsers might not be a viable solution.

Those libraries are large partly because they offer a lot of functionality, perhaps more than you need. They validate XML against DTDs (document type definitions), possibly against schemas, and more. However, you may already know that your application will receive valid XML, and you may have already decided to support only the UTF-8 character set. In that case, what you really need is event-based processing of XML elements and translation of the standard XML entities: in other words, a nonvalidating parser.
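
Settling on UTF-8 up front means the parser never has to inspect the encoding declaration: the caller decodes bytes into characters and hands over a ready-made Reader. A minimal sketch, assuming a hypothetical helper class Utf8Input that is not part of the article's download:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not from the article): commits to UTF-8 instead of detecting the encoding. */
final class Utf8Input {
    private Utf8Input() {}

    /** Decodes raw bytes as UTF-8; any encoding="..." in the XML prolog is simply ignored. */
    static Reader open(InputStream in) {
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    /** Drains a Reader into a String, e.g. to inspect what a parser would see. */
    static String readAll(Reader r) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        try {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }
}
```

The Reader returned by open() is the kind of object that would be passed as the second argument to parse(DocHandler, Reader). Java versions before 7, which lack StandardCharsets, can pass the charset name "UTF-8" as a String instead.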

Note: You can download this article's source code in Resources.

Why not just use SAX?

You could implement SAX (Simple API for XML) interfaces with limited functionality, throwing an exception named NotImplemented whenever you encountered a feature you didn't need.
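
For comparison, here is what that route looks like for one small SAX interface, org.xml.sax.DTDHandler, where every callback simply rejects the feature. The name NotImplemented comes from the text; making it a subclass of SAXException, and the NoDtdHandler class itself, are assumptions for illustration.

```java
import org.xml.sax.DTDHandler;
import org.xml.sax.SAXException;

/** Hypothetical exception signalling an XML feature this cut-down implementation skips. */
class NotImplemented extends SAXException {
    NotImplemented(String feature) {
        super(feature + " is not supported by this parser");
    }
}

/** A SAX DTDHandler that refuses notation and unparsed-entity declarations outright. */
class NoDtdHandler implements DTDHandler {
    @Override
    public void notationDecl(String name, String publicId, String systemId)
            throws SAXException {
        throw new NotImplemented("notation declaration '" + name + "'");
    }

    @Override
    public void unparsedEntityDecl(String name, String publicId, String systemId,
                                   String notationName) throws SAXException {
        throw new NotImplemented("unparsed entity '" + name + "'");
    }
}
```

Repeating this for ContentHandler, ErrorHandler, EntityResolver, and XMLReader adds up to a fair amount of stub code before any parsing happens, which is the overhead that defining your own classes avoids.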

Such an implementation would undoubtedly be much smaller than the 1.4 MB jaxp.jar and parser.jar libraries. But you can cut the code size even further by defining your own classes instead. In fact, the package we construct here will be considerably smaller than the jar file containing the SAX interface definitions.

Our quick-and-dirty parser is event-based, like SAX. Also like SAX, it lets you implement an interface to catch and process events corresponding to attributes and start/end element tags. If you have used SAX, this parser should feel familiar.
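
To make this concrete, here is a sketch of such a handler. The interface name DocHandler comes from the article, but the method signatures below (including the Map of attributes) are assumptions; the downloadable source is authoritative. PathCollector is a hypothetical example that tracks the open elements so each attribute and text event can be reported with its element path.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** The article's interface name; these method signatures are assumptions. */
interface DocHandler {
    void startDocument() throws Exception;
    void endDocument() throws Exception;
    void startElement(String tag, Map<String, String> attributes) throws Exception;
    void endElement(String tag) throws Exception;
    void text(String content) throws Exception;
}

/** Hypothetical handler that reports each attribute and text run with its element path. */
class PathCollector implements DocHandler {
    private final Deque<String> open = new ArrayDeque<>();   // open elements, root first
    final List<String> results = new ArrayList<>();

    public void startDocument() {
        open.clear();
        results.clear();
    }

    public void endDocument() {
        if (!open.isEmpty()) throw new IllegalStateException("Unclosed: " + open);
    }

    public void startElement(String tag, Map<String, String> attributes) {
        open.addLast(tag);
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            results.add(path() + "@" + a.getKey() + "=" + a.getValue());
        }
    }

    public void endElement(String tag) {
        if (!tag.equals(open.peekLast())) {
            throw new IllegalStateException("Mismatched </" + tag + ">");
        }
        open.removeLast();
    }

    public void text(String content) {
        results.add(path() + "=" + content);
    }

    /** Joins the open-element stack into a path such as "order/item". */
    private String path() {
        return String.join("/", open);
    }
}
```

Because the parser only reports events, bookkeeping such as the element stack lives in the handler, just as it does with a SAX ContentHandler.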

Limit XML functionality

Many people want XML's simple, self-describing textual data format. They want to easily pick out elements, attributes and their values, and elements' textual content. With that in mind, let's consider what functionality we need to preserve.

Our simple parsing package has just one class, QDParser, and one interface, DocHandler. The QDParser itself has one public static method, parse(DocHandler,Reader), which we will implement as a finite state machine.
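
A finite state machine keeps such a parser small: each character read either extends the current token or moves the machine into a new state. The following is a minimal sketch, not the article's QDParser. The class name QDParserSketch, its four states, and the DocHandler signatures are assumptions, and, as in the article, the input is presumed to be well-formed XML.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Event callbacks: the name is the article's, these signatures are assumptions. */
interface DocHandler {
    void startDocument() throws Exception;
    void endDocument() throws Exception;
    void startElement(String tag, Map<String, String> attributes) throws Exception;
    void endElement(String tag) throws Exception;
    void text(String content) throws Exception;
}

/** Hypothetical sketch of a nonvalidating, event-based parser driven by a state machine. */
final class QDParserSketch {
    private enum State { TEXT, TAG, QUOTED, SKIP }

    private QDParserSketch() {}

    static void parse(DocHandler handler, Reader in) throws Exception {
        State state = State.TEXT;
        StringBuilder buf = new StringBuilder();
        char quote = 0;
        handler.startDocument();
        int c;
        while ((c = in.read()) != -1) {
            char ch = (char) c;
            switch (state) {
                case TEXT:                                  // character data between tags
                    if (ch == '<') { emitText(handler, buf); state = State.TAG; }
                    else buf.append(ch);
                    break;
                case TAG:                                   // inside <...>
                    if (buf.length() == 0 && (ch == '!' || ch == '?')) {
                        state = State.SKIP;                 // DOCTYPE, comments, PIs: ignored
                    } else if (ch == '"' || ch == '\'') {
                        quote = ch; buf.append(ch); state = State.QUOTED;
                    } else if (ch == '>') {
                        emitTag(handler, buf.toString()); buf.setLength(0); state = State.TEXT;
                    } else {
                        buf.append(ch);
                    }
                    break;
                case QUOTED:                                // attribute value: '>' is just data
                    buf.append(ch);
                    if (ch == quote) state = State.TAG;
                    break;
                case SKIP:                                  // naive: stops at the first '>'
                    if (ch == '>') state = State.TEXT;
                    break;
            }
        }
        if (state != State.TEXT) throw new IOException("Input ended inside markup");
        emitText(handler, buf);                             // trailing text, if any
        handler.endDocument();
    }

    /** Reports buffered text (whitespace-only runs are dropped) and clears the buffer. */
    private static void emitText(DocHandler h, StringBuilder buf) throws Exception {
        if (!buf.toString().trim().isEmpty()) h.text(decode(buf.toString()));
        buf.setLength(0);
    }

    /** Turns the raw text between '<' and '>' into start/end element events. */
    private static void emitTag(DocHandler h, String tag) throws Exception {
        if (tag.startsWith("/")) { h.endElement(tag.substring(1).trim()); return; }
        boolean empty = tag.endsWith("/");                  // <br/> = start + end
        tag = (empty ? tag.substring(0, tag.length() - 1) : tag).trim();
        int i = 0;
        while (i < tag.length() && !Character.isWhitespace(tag.charAt(i))) i++;
        String name = tag.substring(0, i);
        Map<String, String> attrs = new LinkedHashMap<>();
        int eq;
        while ((eq = tag.indexOf('=', i)) >= 0) {           // each name="value" pair
            String key = tag.substring(i, eq).trim();
            int open = eq + 1;
            while (Character.isWhitespace(tag.charAt(open))) open++;
            int close = tag.indexOf(tag.charAt(open), open + 1);   // matching quote
            attrs.put(key, decode(tag.substring(open + 1, close)));
            i = close + 1;
        }
        h.startElement(name, attrs);
        if (empty) h.endElement(name);
    }

    /** Predefined entities only; &amp; is replaced last so "&amp;lt;" yields "&lt;". */
    private static String decode(String s) {
        return s.replace("&lt;", "<").replace("&gt;", ">").replace("&quot;", "\"")
                .replace("&apos;", "'").replace("&amp;", "&");
    }
}
```

Collecting the whole tag into a buffer and splitting out attributes afterwards keeps the number of states low; the QUOTED state exists only so that a '>' inside an attribute value doesn't end the tag early. Dropping whitespace-only text suits data-oriented XML but would lose formatting in mixed content.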

Our limited-functionality parser treats the <!DOCTYPE> declaration, processing instructions, and the XML declaration (<?xml version="1.0"?>) simply as comments, so their presence won't confuse it, but it won't use their content either.
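
Treating these constructs as comments still requires knowing where each one ends, and a bare "skip to the next >" rule breaks on a comment such as <!-- a > b --> or a DOCTYPE with an internal subset. Below is a slightly more careful sketch using a hypothetical MarkupSkipper helper, called just after the opening '<' has been read: comments end at -->, processing instructions at ?>, and other <! declarations at the first > outside square brackets. It still ignores quoted strings, so a > or ] inside an entity value would confuse it.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical helper: consumes one <!...> or <?...?> construct, discarding its content.
 * Call it right after the opening '<' has been read.
 */
final class MarkupSkipper {
    private MarkupSkipper() {}

    static void skip(Reader in) throws IOException {
        StringBuilder seen = new StringBuilder();   // everything after '<' so far
        int depth = 0;                              // [...] nesting, e.g. a DOCTYPE internal subset
        int c;
        while ((c = in.read()) != -1) {
            seen.append((char) c);
            if (c == '[') {
                depth++;
            } else if (c == ']') {
                depth--;
            } else if (c == '>' && isEnd(seen, depth)) {
                return;
            }
        }
        throw new EOFException("Unterminated markup: <" + seen);
    }

    /** Decides whether the '>' just read closes the construct. */
    private static boolean isEnd(StringBuilder seen, int depth) {
        if (startsWith(seen, "!--")) {             // comment: needs the closing -->
            return seen.length() >= 6 && endsWith(seen, "-->");
        }
        if (startsWith(seen, "?")) {               // PI or XML declaration: needs ?>
            return seen.length() >= 3 && endsWith(seen, "?>");
        }
        return depth == 0;                          // DOCTYPE and other declarations
    }

    private static boolean startsWith(StringBuilder sb, String prefix) {
        return sb.length() >= prefix.length() && sb.substring(0, prefix.length()).equals(prefix);
    }

    private static boolean endsWith(StringBuilder sb, String suffix) {
        int n = sb.length();
        return n >= suffix.length() && sb.substring(n - suffix.length()).equals(suffix);
    }
}
```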

Because we won't process DOCTYPE, our parser cannot read custom entity definitions. Only the standard entities are available: &amp;, &lt;, &gt;, &apos;, and &quot;. If this is a problem, you can insert code to expand custom entities, as the source code shows. Alternatively, you could preprocess the document, replacing references to custom entities with their expanded text before handing the document to QDParser.
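
Entity translation can be done in a single left-to-right scan with a lookup table. In this sketch, the hypothetical Entities class knows the five predefined entities, and its custom map parameter is one place to plug in the expansions a DOCTYPE would otherwise have supplied. Unknown entities raise an exception rather than passing through silently; that choice is an assumption, not something the article prescribes.

```java
import java.util.Map;

/**
 * Hypothetical helper: expands the five predefined XML entities, plus any custom
 * ones the caller supplies in place of the DOCTYPE definitions the parser skips.
 */
final class Entities {
    private static final Map<String, String> STANDARD = Map.of(
            "amp", "&", "lt", "<", "gt", ">", "apos", "'", "quot", "\"");

    private Entities() {}

    static String decode(String s) {
        return decode(s, Map.of());
    }

    /** One pass over the input; each &name; is looked up, standard entities first. */
    static String decode(String s, Map<String, String> custom) {
        StringBuilder out = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int semi = (c == '&') ? s.indexOf(';', i) : -1;
            if (semi < 0) {                  // ordinary character (or a stray '&')
                out.append(c);
                i++;
                continue;
            }
            String name = s.substring(i + 1, semi);
            String value = STANDARD.containsKey(name) ? STANDARD.get(name) : custom.get(name);
            if (value == null) {
                throw new IllegalArgumentException("Unknown entity: &" + name + ";");
            }
            out.append(value);
            i = semi + 1;                    // resume after the ';'
        }
        return out.toString();
    }
}
```

Because each reference is replaced exactly once, the order of substitutions doesn't matter: &amp;lt; comes out as the literal text &lt;, never as <.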

Our parser also cannot support conditional sections, such as <![INCLUDE[ ... ]]> or <![IGNORE[ ... ]]>. Since it cannot read custom entity definitions from the DOCTYPE anyway, we don't really need this functionality. If such sections do occur, we could process them before the data is sent to our limited-space application.

Resources