As is the case in most interesting arguments, neither camp is entirely wrong. XML could benefit from simplification, most notably in the area of notations -- an SGML holdover for binary (multimedia) objects that is better handled with MIME standards.
On the other hand, anyone attempting to simplify the standard needs to ask a fundamental question: what is the minimal subset we must keep in order to preserve the tools we have already built, and those we still need to build, on top of XML? For example, the RELAX schema specification, discussed in more detail below, uses attributes. So perhaps attributes really need to be retained. The RELAX presentation also implied the need for mixed content, so perhaps that is necessary as well.
One questioner pointed out that CDATA sections require a lot of parser code, which makes XML processing difficult. But without CDATA sections, how could you put a line drawing into XML? Would you be forced to use graphic-authoring tools? More importantly, since the Extensible Stylesheet Language (XSL) uses CDATA sections for embedding processing scripts in a stylesheet, CDATA would appear to be necessary there as well.
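To make the line-drawing case concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree` parser. The document, element names, and drawing are invented for illustration. It shows a CDATA section carrying characters such as `<` and `&` verbatim, and then the same text carried with entity escapes instead.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Hypothetical document: the drawing contains '<', '>' and '&',
# which would be markup if they appeared outside a CDATA section.
DOC = """<figure>
  <caption>Two boxes</caption>
  <drawing><![CDATA[
+-----+      +-----+
|  A  | ---> |  B  |
+-----+      +-----+
  if a < b && b > 0
]]></drawing>
</figure>"""

root = ET.fromstring(DOC)
drawing = root.find("drawing").text  # the parser hands back the raw text

# The same content without CDATA: every '<', '>' and '&' is escaped.
escaped = escape(drawing)
roundtrip = ET.fromstring("<drawing>" + escaped + "</drawing>").text

print(drawing)
print(escaped)
```

Both forms parse to the same text, so the difference lies in how readable the source is for a human author, and in how much work the parser has to do.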
Similarly, without external parsed entities, how would you reference material from another document and include it inline? To be fair, SML is targeted at a world of pure data, rather than at defining a general-purpose standard useful for both data and documents, as XML is. For that purpose, then, perhaps external references are not required. (The RELAX schema standard described below doesn't include them, either, so perhaps they really are more trouble than they are worth.)
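For readers who have not used them, the following sketch shows an external parsed entity being pulled inline, using Python's standard `xml.parsers.expat` module. The document, the `chapter.xml` identifier, and the in-memory `EXTERNAL` dictionary (a stand-in for the file system) are assumptions made for illustration. A real application would have to resolve and fetch the system identifier itself.

```python
import xml.parsers.expat

# Main document: declares an external parsed entity and references it.
MAIN = b"""<?xml version="1.0"?>
<!DOCTYPE report [
  <!ENTITY chapter SYSTEM "chapter.xml">
]>
<report><title>Status</title>&chapter;</report>"""

# Hypothetical stand-in for the file system: system identifier -> bytes.
EXTERNAL = {"chapter.xml": b"<section>Text from another file.</section>"}

def parse_with_entities(main, external):
    """Return (element names in document order, concatenated character data)."""
    tags, text = [], []

    def install(p):
        # Use the same handlers for the main parser and any child parser.
        p.StartElementHandler = lambda name, attrs: tags.append(name)
        p.CharacterDataHandler = text.append
        p.ExternalEntityRefHandler = resolve

    def resolve(context, base, system_id, public_id):
        # Parse the referenced material with a child parser, inline.
        child = parser.ExternalEntityParserCreate(context)
        install(child)
        child.Parse(external[system_id], True)
        return 1  # a nonzero return tells expat the entity was handled

    parser = xml.parsers.expat.ParserCreate()
    install(parser)
    parser.Parse(main, True)
    return tags, "".join(text)

tags, text = parse_with_entities(MAIN, EXTERNAL)
print(tags)
print(text)
```

The callback and child parser are the kind of extra machinery that the simplification camp objects to, and a pure-data format can plausibly do without it.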
So, while some simplification seems like a good idea, it is not clear that we know exactly which simplifications are in order. For the time being, then, we should probably let things sit -- right after we get rid of notations.
(For a more detailed discussion on the advantages of schemata over DTDs, see the Sidebar below.)
However, the hoped-for W3C XML Schema standard remains in development. The industry players who are developing the standard have a long list of must-have features. The eventual result, by all accounts, will turn out to be something of a monolith. It will do what everyone says they need it to do, but it will take a lot of complex code to do it, and the standard is taking quite a while to take shape.
Meanwhile, a former member of the schema-standards team came up with a better way, the Regular Language description for XML, or RELAX. This Japanese standard is due to be submitted as a fast-track ISO proposal this summer. Makoto Murata, its author, took what appears in retrospect to be a simple idea: take the DTD, reformulate it in XML, take advantage of the structuring to provide context-sensitive definitions, and add the vitally important content validation.
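The general idea can be sketched in a few lines of Python: a content model written as XML instead of DTD syntax, plus a datatype check on element content that a DTD cannot express. The schema vocabulary below (`rule`, `child`, `type`) is invented for this illustration and is not RELAX syntax. The sketch also leaves out context-sensitive definitions.

```python
import xml.etree.ElementTree as ET

# Structural part as a DTD would state it: <!ELEMENT item (name, quantity)>
# Hypothetical XML form of the same rule, with datatypes added.
SCHEMA = """
<schema>
  <rule element="item">
    <child name="name" type="string"/>
    <child name="quantity" type="integer"/>
  </rule>
</schema>
"""

# Content checks keyed by the (hypothetical) type names used above.
CHECKS = {
    "string": lambda text: True,
    "integer": lambda text: text.strip().lstrip("-").isdigit(),
}

def validate(doc_xml, schema_xml=SCHEMA):
    """Return a list of error strings; an empty list means the document passed."""
    schema = ET.fromstring(schema_xml)
    root = ET.fromstring(doc_xml)
    rule = schema.find("rule[@element='%s']" % root.tag)
    if rule is None:
        return ["no rule for <%s>" % root.tag]
    expected = [(c.get("name"), c.get("type")) for c in rule.findall("child")]
    actual = list(root)
    # Structure check: the part a DTD already handles.
    if [e.tag for e in actual] != [name for name, _ in expected]:
        return ["<%s> children do not match the declared sequence" % root.tag]
    # Content check: the part a DTD cannot express.
    errors = []
    for element, (name, kind) in zip(actual, expected):
        if not CHECKS[kind](element.text or ""):
            errors.append("<%s> is not a valid %s" % (name, kind))
    return errors

good = validate("<item><name>bolt</name><quantity>12</quantity></item>")
bad = validate("<item><name>bolt</name><quantity>a dozen</quantity></item>")
print(good, bad)
```

Because the schema is itself XML, it can be read with the same parser as the documents it describes, which is part of the appeal of reformulating the DTD this way.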