Practical XML Schema

A Java programmer's guide to XML Schema and Namespaces

XML has two main advantages: first, it offers a standard way of structuring data, and, second, we can specify the vocabulary the data uses. We can define the vocabulary (what elements and attributes an XML document can use) using either a document type definition (DTD) or the XML Schema language.

DTDs were inherited from SGML (Standard Generalized Markup Language), XML's predecessor, and, as such, are limited in their expressiveness. DTDs are for expressing a text document's structure, so all content is assumed to be text. The XML Schema language more closely resembles the way a database describes data.

Schemas let us define an element's type (string, integer, etc.) and apply much finer constraints (a positive integer, a string starting with an uppercase letter, etc.). DTDs enforce a strict ordering of elements; schemas have a more flexible range of options (elements can be optional as a group, in any order, in strict sequence, etc.). Finally, schemas are written in XML, whereas DTDs have their own syntax.

As you'll see in this article, schemas themselves are quite straightforward—I find them easier than DTDs as there is no extra syntax to remember. The difficulties arise in using XML Namespaces and in getting the Java parsers to validate XML against a schema.

In this article, I first cover the basics of XML Schema, then validate XML against a schema using several popular APIs, and finally cover some of the more powerful elements of the XML Schema language. But first, a short detour.

A detour via the W3C

XML, the XML Schema language, XML Namespaces, and a whole range of other standards (such as Cascading Style Sheets (CSS), HTML and XHTML, SOAP, and pretty much any standard that starts with an X) are defined by the World Wide Web Consortium, otherwise known as the W3C. A document is XML only if it conforms to the XML Recommendation issued by the W3C.

Various experts and interested parties gather under the umbrella of the W3C and, after much deliberation, issue a recommendation. Companies, individuals, and foundations such as Apache then write implementations of those recommendations.

This article's documents are a combination of these three recommendations:

  • XML 1.0
  • XML Namespaces
  • XML Schema

XML 1.0 or 1.1

XML exists in two versions: 1.0 defined in 1998 and 1.1 defined in 2004. XML 1.1 adds very little to 1.0: support for defining elements and attributes in languages such as Mongolian or Burmese, support for IBM mainframe end-of-line characters, and almost nothing else. For the vast majority of applications, these changes are not needed. Plus, a document declared as XML 1.1 will be rejected by a 1.0 parser. So stick with 1.0.

Well-formed and valid XML

For an application to accept an XML document, it must be both well formed and valid. These terms are defined in the XML 1.0 Recommendation, with XML Schema extending the meaning of valid.

To be well formed, an XML document must follow these rules:

  • The document must have exactly one root element.
  • Every element is either self-closing (like <this />) or has a closing tag.
  • Elements are nested properly (i.e., <this><and></this></and> is not allowed).
  • The document has no angle brackets that are not part of tags. Characters <, >, and & outside of tags are replaced by &lt;, &gt;, and &amp;.
  • Attribute values are quoted.

For the full formal details, see Resources.

When producing XML, remember to escape text fields that might contain special characters such as &. This is a common oversight.
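
As a rough illustration of that escaping step, here is a minimal sketch in Java. The class and method names (XmlEscaper, escapeText) are invented for this example and are not part of any library. It handles element text only; attribute values would also need their quote characters escaped, and in practice an XML-writing API normally does this work for you.

```java
final class XmlEscaper {

    // Replaces the three characters that must not appear raw in element text.
    // Hypothetical helper for illustration; it does not handle attribute quoting.
    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;");  break;
                case '>': out.append("&gt;");  break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints: <fullname>Fish &amp; Chips &lt;Ltd&gt;</fullname>
        System.out.println("<fullname>" + escapeText("Fish & Chips <Ltd>") + "</fullname>");
    }
}
```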

A document that is not well formed is not really XML and doesn't conform to the W3C's stipulations for an XML document. A parser will fail when given that document, even if validation is turned off.
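
The sketch below shows that behavior using the JAXP DocumentBuilder that ships with the JDK, with validation explicitly switched off. The class and method names (WellFormedCheck, isWellFormed) are made up for this example; treat it as one possible way to probe well-formedness rather than a recommended utility.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

final class WellFormedCheck {

    // Returns true if the parser accepts the text, false if it reports a parse error.
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false); // no DTD validation: only well-formedness is checked
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // The parser refuses documents that are not well formed.
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormed("<order><user/></order>"));   // well formed
        System.out.println(isWellFormed("<this><and></this></and>")); // badly nested
        System.out.println(isWellFormed("<a>Fish & chips</a>"));      // unescaped &
    }
}
```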

To be valid, a document must be well formed, it must have an associated DTD or schema, and it must comply with that DTD or schema. Ensuring a document is well formed is easy. In this article, we focus on ensuring our documents are valid.

Let's get right down to it. First, we're going to need an XML file to validate.

The XML document

Let's assume we have a client (say a terminal in a shop) that posts an XML order back to a server. The XML might look like this:

<?xml version="1.0" encoding="UTF-8"?>
<order>
    <user>
        <fullname>Bob Jones</fullname>
        <deliveryAddress>
            123 This road,
            That town,
            Bobsville
        </deliveryAddress>
    </user>
    <products>
        <product id="12345" quantity="1" />
        <product id="3232" quantity="3" />
    </products>
</order>



Save this document somewhere. We will use it later in this article to try out validation and some interesting schema rules.

The first line, <?xml version="1.0" encoding="UTF-8"?>, is the XML declaration, which is part of the prolog. It is optional in XML 1.0 and compulsory in XML 1.1. If it is absent, parsers assume we're using XML 1.0, but we like to be thorough.

The schema

For the server to validate our XML, we need a schema:

<?xml version="1.0"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
            elementFormDefault="qualified"
            xmlns="urn:nonstandard:test"
            targetNamespace="urn:nonstandard:test">

    <xsd:element name="order" type="Order" />

    <xsd:complexType name="Order">
        <xsd:all>
            <xsd:element name="user" type="User" minOccurs="1" maxOccurs="1" />
            <xsd:element name="products" type="Products" minOccurs="1" maxOccurs="1" />
        </xsd:all>
    </xsd:complexType>

    <xsd:complexType name="User">
        <xsd:all>
            <xsd:element name="deliveryAddress" type="xsd:string" />
            <xsd:element name="fullname">
                <xsd:simpleType>
                    <xsd:restriction base="xsd:string">
                        <xsd:maxLength value="30" />
                    </xsd:restriction>
                </xsd:simpleType>
            </xsd:element>
        </xsd:all>
    </xsd:complexType>

    <xsd:complexType name="Products">
        <xsd:sequence>
            <xsd:element name="product" type="Product" minOccurs="1" maxOccurs="unbounded" />
        </xsd:sequence>
    </xsd:complexType>

    <xsd:complexType name="Product">
        <xsd:attribute name="id" type="xsd:long" use="required" />
        <xsd:attribute name="quantity" type="xsd:positiveInteger" use="required" />
    </xsd:complexType>

</xsd:schema>


Save this schema as test.xsd in the same directory as the XML document. For the moment, ignore the root node's attributes and the fact that everything is prefixed with xsd.

The first entry after the root schema element is:

 <xsd:element name="order" type="Order" />  



This says our document will have an element called order of type Order. This element is a global declaration (with scope like a global variable). In fact, it is our only global element, so it will be the root element of any document that conforms to this schema.
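
To see a global declaration acting as the permitted root, here is a small sketch using the javax.xml.validation API from the JDK. It deliberately does not use test.xsd: the inline schema is a cut-down, hypothetical one with a single global element and no target namespace, so that namespaces can be left aside. The names GlobalElementDemo, SCHEMA, and isValid are assumptions made for this example.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

final class GlobalElementDemo {

    // Cut-down illustrative schema (not the article's test.xsd):
    // one global element "order" whose type requires a positive "quantity" attribute.
    static final String SCHEMA =
          "<xsd:schema xmlns:xsd='http://www.w3.org/2001/XMLSchema'>"
        + "  <xsd:element name='order' type='Order'/>"
        + "  <xsd:complexType name='Order'>"
        + "    <xsd:attribute name='quantity' type='xsd:positiveInteger' use='required'/>"
        + "  </xsd:complexType>"
        + "</xsd:schema>";

    // Returns true if the document validates against SCHEMA, false otherwise.
    static boolean isValid(String xml) {
        try {
            SchemaFactory factory = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            Schema schema = factory.newSchema(new StreamSource(new StringReader(SCHEMA)));
            Validator validator = schema.newValidator();
            // With no ErrorHandler set, validation errors are thrown as SAXException.
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<order quantity='3'/>")); // root matches the global element
        System.out.println(isValid("<item quantity='3'/>"));  // no global declaration for "item"
        System.out.println(isValid("<order quantity='0'/>")); // 0 is not a positiveInteger
    }
}
```

In this sketch only a document rooted at order can validate, because order is the only element declared at the top level of the schema.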

Resources