Breaking news in XML

Despite the scanty turnout, the recent XTech 2000 show produced several important XML/Java-related announcements


Prescod noted that SAX programming requires that you write your own dispatch code when processing elements. When a SAX startElement event occurs, for example, you must write code like this:

   if (element.equals("shoe")) {
      ...
   } else if (element.equals("size")) {
      ...
   }


Furthermore, if the processing you do for an element depends on the current context, then you have to save your own state, as seen in this example:

  startElement(String element) {
    if (element.equals("title")) titleText = true;
    ...
  }
  characters(...) {
    if (titleText) {
       ++fontSize;  bold = true;
       ...
    } else {
       ...
    }
  }
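
To make the two fragments above concrete, here is a minimal runnable sketch using the standard SAX API that ships with Java. The class and method names (ManualDispatchDemo, TitleHandler, titlesOf) are hypothetical and chosen only for illustration. It shows both chores at once: the hand-written dispatch on element names and the hand-maintained state flag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of any library.
class ManualDispatchDemo {

    /** Collects the text of every title element, tracking state by hand. */
    static class TitleHandler extends DefaultHandler {
        final List<String> titles = new ArrayList<>();
        private boolean titleText = false;   // state the programmer must maintain
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Hand-written dispatch: one branch per element name.
            if (qName.equals("title")) {
                titleText = true;
                text.setLength(0);
            } else if (qName.equals("size")) {
                // element-specific processing would go here
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // What happens to character data depends on the saved state.
            if (titleText) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (qName.equals("title")) {
                titles.add(text.toString());
                titleText = false;   // forgetting this reset is an easy bug
            }
        }
    }

    /** Parses the given XML string and returns the text of its title elements. */
    static List<String> titlesOf(String xml) {
        try {
            TitleHandler handler = new TitleHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.titles;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<book><title>XML Basics</title><size>9</size><title>SAX</title></book>";
        System.out.println(titlesOf(xml));
    }
}
```

Every new element type adds a branch, and every new context adds another flag to set and clear, which is the bookkeeping Prescod was objecting to.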


One goal for EasySAX, then, was to eliminate such issues by allowing context-sensitive processing.
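
The article does not show EasySAX's own interface, so the following is only a sketch of what context-sensitive processing could look like in Java on top of plain SAX. It assumes a design in which the handler keeps a stack of open elements and looks up actions by path, so that a title directly under book and a title under chapter can be treated differently without any flags. All names here (PathDispatchDemo, PathHandler, on, textAt) are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; this is not EasySAX's API.
class PathDispatchDemo {

    /** Dispatches on the full path of an element instead of its bare name. */
    static class PathHandler extends DefaultHandler {
        private final Deque<String> open = new ArrayDeque<>();
        private final Map<String, Consumer<String>> rules = new HashMap<>();
        private final StringBuilder text = new StringBuilder();

        /** Registers an action to run when an element at the given path ends. */
        void on(String path, Consumer<String> action) {
            rules.put(path, action);
        }

        /** Builds a path such as /book/chapter/title from the open-element stack. */
        private String currentPath() {
            StringBuilder sb = new StringBuilder();
            Iterator<String> it = open.descendingIterator();   // root first
            while (it.hasNext()) {
                sb.append('/').append(it.next());
            }
            return sb.toString();
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            open.push(qName);
            text.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            // The context is the path itself, so no per-element flags are needed.
            Consumer<String> rule = rules.get(currentPath());
            if (rule != null) {
                rule.accept(text.toString());   // text of a leaf element
            }
            open.pop();
            text.setLength(0);
        }
    }

    /** Returns the text of every leaf element found at the given path. */
    static List<String> textAt(String xml, String path) {
        List<String> hits = new ArrayList<>();
        PathHandler handler = new PathHandler();
        handler.on(path, hits::add);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
        return hits;
    }

    public static void main(String[] args) {
        String xml = "<book><title>Java</title><chapter><title>SAX</title></chapter></book>";
        System.out.println(textAt(xml, "/book/title"));
        System.out.println(textAt(xml, "/book/chapter/title"));
    }
}
```

The sketch handles only the text of leaf elements. A fuller design would need wildcards and attribute access, but the path lookup is enough to show how context can replace hand-saved state.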

Another goal for EasySAX was to improve on the DOM mechanism by loading into memory only those parts of the tree that you visit, rather than the entire tree. That makes it possible to process huge data sets that would be difficult to handle any other way. To improve efficiency, however, Prescod's mechanism also lets the developer ask that an entire subtree be loaded into memory. (For more information, see the Resources section below.)
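
A rough Java analogue of the subtree idea, offered as a sketch rather than as Prescod's implementation: stream the document with SAX and build DOM nodes only for the subtrees you ask for, letting everything else pass by without being stored. The names SubtreeLoaderDemo, SubtreeHandler, and load are hypothetical, and selecting subtrees by element name is an assumption made for brevity.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not Prescod's mechanism.
class SubtreeLoaderDemo {

    /** Builds DOM subtrees only for elements with the wanted name. */
    static class SubtreeHandler extends DefaultHandler {
        private final String wanted;
        private final Document doc;
        final List<Element> subtrees = new ArrayList<>();
        private final Deque<Element> building = new ArrayDeque<>();

        SubtreeHandler(String wanted, Document doc) {
            this.wanted = wanted;
            this.doc = doc;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Outside a wanted subtree, unrelated elements are never stored.
            if (building.isEmpty() && !qName.equals(wanted)) {
                return;
            }
            Element e = doc.createElement(qName);
            for (int i = 0; i < atts.getLength(); i++) {
                e.setAttribute(atts.getQName(i), atts.getValue(i));
            }
            if (!building.isEmpty()) {
                building.peek().appendChild(e);
            }
            building.push(e);
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (!building.isEmpty()) {
                building.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if (!building.isEmpty()) {
                Element done = building.pop();
                if (building.isEmpty()) {
                    subtrees.add(done);   // a complete subtree is now in memory
                }
            }
        }
    }

    /** Streams the XML and returns in-memory subtrees for the wanted element name. */
    static List<Element> load(String xml, String wanted) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SubtreeHandler handler = new SubtreeHandler(wanted, doc);
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.subtrees;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<log><entry id='1'><msg>a</msg></entry><noise>zzz</noise>"
                + "<entry id='2'><msg>b</msg></entry></log>";
        for (Element entry : load(xml, "entry")) {
            System.out.println(entry.getAttribute("id") + ": " + entry.getTextContent());
        }
    }
}
```

In a real large-document scenario, each subtree would be processed and discarded as soon as it is complete instead of being collected in a list. The list is used here only to keep the example short.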

The combination of large-model processing, efficient in-memory representation, and context-sensitive processing capability creates an appealing mechanism for processing XML. There is every reason to think that some clever Java programmer will take the same basic idea and implement it in Java.

SML: Is a simpler XML a good idea?

One of the more controversial proposals was the concept that XML -- and the parsers that depend on it -- should be simplified by taking away some of the little-used ingredients, like notations, that cause parser-development headaches. The resulting slimmed-down XML would be called simple XML, or SML.

Software AG's Mike Champion, author Simon St. Laurent, and DocuVerse's Don Park led the pro-SML discussion. They stressed that simplifying XML would mean smaller, faster, and easier-to-develop parsers, which in turn would make it easier to embed XML processing in small devices. It would also make it easier to build plain-text filters using Perl scripts and the like, because the number of strange, seldom-used cases would decrease.

There was, however, no clear agreement on which ingredients to eliminate. The SML proposal calls for the elimination of attributes and CDATA sections, as well as processing instructions, comments, notations, DTDs, mixed content (text and elements), and external parsed entities. The so-called Common XML proposal, by contrast, aims at voluntary restrictions on which parts of XML you use. It leaves in attributes and mixed content, but in other respects calls for XML users to leave out the same XML constructs that SML disallows.
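
To get a feel for what the SML restrictions would mean in practice, the sketch below uses standard SAX callbacks to report which of the disallowed constructs a document uses. It is an illustration only, not a validator defined by either proposal. The names SmlSubsetCheck, Checker, and disallowedConstructs are hypothetical, and the sketch does not check for mixed content or external parsed entities.

```java
import java.io.StringReader;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.ext.DefaultHandler2;

// Hypothetical illustration class; not part of the SML or Common XML proposals.
class SmlSubsetCheck {

    /** Records each construct seen that the SML proposal would eliminate. */
    static class Checker extends DefaultHandler2 {
        final Set<String> found = new TreeSet<>();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if (atts.getLength() > 0) {
                found.add("attribute");
            }
        }

        @Override
        public void processingInstruction(String target, String data) {
            found.add("processing instruction");
        }

        @Override
        public void comment(char[] ch, int start, int length) {
            found.add("comment");
        }

        @Override
        public void startCDATA() {
            found.add("CDATA section");
        }

        @Override
        public void startDTD(String name, String publicId, String systemId) {
            found.add("DTD");
        }
    }

    /** Returns the names of the SML-disallowed constructs that appear in the XML. */
    static Set<String> disallowedConstructs(String xml) {
        try {
            Checker checker = new Checker();
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            // Comments, CDATA sections, and DTD boundaries are reported only
            // to a registered lexical handler.
            parser.setProperty("http://xml.org/sax/properties/lexical-handler", checker);
            parser.parse(new InputSource(new StringReader(xml)), checker);
            return checker.found;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(disallowedConstructs("<a><b>text</b></a>"));
        System.out.println(disallowedConstructs("<a x='1'><!-- note --><![CDATA[raw]]><?pi data?></a>"));
    }
}
```

Under the Common XML approach, the attribute check would simply be dropped, since that proposal leaves attributes in.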

As an opposing voice, Evan Lenz, a student at North Seattle Community College, advanced an interesting philosophical argument against the proposed simplification. He posited that XML's real power stems from the standardization it enjoys. Because so many parsers, utilities, and XML-based languages are coming online, and because XML erases the distinction between documents and data, a whole host of fascinating applications, including AI projects, are becoming possible. If that train is to stay on track, he argued, XML should not be simplified in any way.

Resources