The World Wide Web Consortium (W3C) was founded in 1994 to develop common protocols for the evolution of the World Wide Web. The W3C is the international standards organization that brought you HTML. Currently, the W3C is reviewing, among other technologies, XML (eXtensible Markup Language) 1.0, a "meta-grammar" that allows for Web automation and data interchange across multiple platforms and applications.
So why should you, as a Java developer, be concerned with this emerging technology? Well, Java and XML complement each other. Java provides platform independence; XML provides application independence. Java gives the consumer a choice of platforms; XML gives the consumer a choice of applications. XML furthers the cause of Java by furthering the cause of consumer freedom.
Java provides a platform-independent coding environment, and XML provides a similar universality in how it expresses and formats data. In essence, XML provides a grammar that can be used to create self-describing data file formats. Thus Java can be viewed as the universal Virtual Machine, and XML can be viewed as the universal Virtual Document. Java is a perfect architecture- and vendor-neutral language for processing these architecture- and vendor-neutral documents.
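To make "self-describing" a little more concrete, here is a minimal sketch, assuming a JDK that bundles the javax.xml.parsers DOM API. The order document and its element names are invented purely for illustration. The point is that the Java code knows nothing about orders in advance; it recovers the structure from the tags themselves.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SelfDescribingDemo {
    // Hypothetical sample document; the element names are assumptions
    // made up for this sketch, not part of any standard vocabulary.
    static final String SAMPLE =
        "<order id=\"42\">"
      + "<customer>Ada</customer>"
      + "<item sku=\"A-1\">Widget</item>"
      + "</order>";

    // Parses the XML text and returns one "path = text" line per leaf element.
    static String describe(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        StringBuilder out = new StringBuilder();
        walk(doc.getDocumentElement(), "", out);
        return out.toString();
    }

    // Recursively visits child elements; only elements with no element
    // children are reported, together with their trimmed text content.
    private static void walk(Element e, String prefix, StringBuilder out) {
        String path = prefix + "/" + e.getTagName();
        boolean hasChildElement = false;
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            Node n = kids.item(i);
            if (n instanceof Element) {
                hasChildElement = true;
                walk((Element) n, path, out);
            }
        }
        if (!hasChildElement) {
            out.append(path).append(" = ")
               .append(e.getTextContent().trim()).append('\n');
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.print(describe(SAMPLE));
    }
}
```

Running main on the sample prints the lines "/order/customer = Ada" and "/order/item = Widget". The same describe method would work unchanged on any other well-formed document, which is the sense in which the format describes itself.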
Now I'm no XML expert, but I have been doing my homework -- reading the XML 1.0 specification, corresponding with members of the Working Group, and mulling over many of the XML FAQs and tutorials that are available. XML is chock-full of bewildering new acronyms and specialized language. My goal for this article is not to teach you XML -- you can study up on your own with one of several good tutorials (see Resources). Rather, my aim is to cut through as much of this language as possible and explain the significance of XML to you, the Java programmer.
Before I begin, I must acknowledge that the connections between Java and XML have been admirably elucidated by John Bosak in his seminal paper: "XML, Java, and the future of the Web." I simply aim to add my perspective on this emerging technology.
XML is a simplified dialect of SGML (Standard Generalized Markup Language). For those of you unfamiliar with SGML, it is an international standard (ISO 8879) for defining descriptions of the structure and content of documents in electronic form. XML simplifies SGML by capturing roughly 80 percent of SGML's functionality with only about 20 percent of the complexity.
HTML, which is a description of the structure and content of a single type of document called a "Web page," is just one instance of what can be created with SGML. In other words, if HTML is a single knit sweater, SGML and XML are how-to books on knitting. By learning XML, you can create sweaters, socks, leg warmers, or any kind of knitted apparel you want!
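To stretch the knitting analogy into code, here is a rough sketch of what writing your own "pattern" can look like: a tiny document type declaration (DTD) for a made-up sweater vocabulary, checked by a validating parser. It assumes a JDK that bundles the javax.xml.parsers API with DTD validation support; the sweater, yarn, and size element names are hypothetical and exist only for this example.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

public class KnittingPatternDemo {
    // Hypothetical document type: a sweater must contain exactly one
    // yarn element followed by one size element.
    static final String DTD =
        "<!DOCTYPE sweater [\n"
      + "  <!ELEMENT sweater (yarn, size)>\n"
      + "  <!ELEMENT yarn (#PCDATA)>\n"
      + "  <!ELEMENT size (#PCDATA)>\n"
      + "]>\n";

    // Returns true if the body is well-formed and valid against DTD.
    static boolean isValid(String body) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(true); // turn on DTD validation
        DocumentBuilder builder = factory.newDocumentBuilder();
        // Validity problems are reported through error(), which does not
        // stop the parse by default, so the handler rethrows them.
        builder.setErrorHandler(new ErrorHandler() {
            public void warning(SAXParseException e) { }
            public void error(SAXParseException e) throws SAXException { throw e; }
            public void fatalError(SAXParseException e) throws SAXException { throw e; }
        });
        try {
            builder.parse(new InputSource(new StringReader(DTD + body)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Follows the pattern: yarn, then size.
        System.out.println(isValid("<sweater><yarn>wool</yarn><size>M</size></sweater>"));
        // Breaks the pattern: the yarn element is missing.
        System.out.println(isValid("<sweater><size>M</size></sweater>"));
    }
}
```

The first call prints true and the second prints false. Swapping in a different DTD (socks, leg warmers) changes what counts as a valid document without touching the Java code.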
As I noted earlier, XML is currently working its way through the W3C standards process. For more information on what this means for XML, see the sidebar "W3C reviews XML."
Let's look at each of these characteristics in more detail.