XML APIs for databases

Blend the power of XML and databases using custom SAX and DOM APIs

1 2 Page 2
Page 2 of 2
  • The generateSAXEventForColumn() method generates events for column data. A null value for a column in a database has a different meaning from a column with an empty string. We capture the difference by not firing any events for a column that has a null value. Another choice for representing a null value in a database is to use a binary attribute like isNull. With this option, a true value will be set for null data; otherwise it will be false.

  • The getTableMarker(), getRowMarker(), and getColumnMarker() methods return reasonable defaults for table, row, and column markers. Derived classes may override these to provide custom markups.

  • The getSelectorSQLStatement() method returns a "select * from <tableName>" string. Derived classes can override it to provide a different select query string to offer database-level filtering.

The JDBCSAXUtils convenience class provides two methods for creating a JDBCInputSource: it can be done from either a property file or a Property object. There's no need to supply a long list of parameters that describe a database to an application that uses either the SAX or DOM APIs for database. The class expects the user to supply a property file that contains entries for a database URL, a user name and password to connect to the database, a JDBC driver to establish a connection, and a table name. The code below demonstrates a typical property file:

# portfolio.prop
# JDBCSAXSource property file
URL=jdbc:odbc:db1
user=jw
password=jw-passwd
table=portfolio
driver=sun.jdbc.odbc.JdbcOdbcDriver

The story so far ...

We now have a simple parser that can generate appropriate SAX events for the information in a database table. It takes care of null data and offers some marker customization. While such functionality may be sufficient for some applications, the complete solution will consider additional functionality because:

  • The parser does not incorporate relational information. That can be solved by using a XPointer/XLink to set the reference to a foreign key in a table.

  • A text column in a database may contain marked-up data. A SAX parser for databases should parse those data as well and generate appropriate SAX events. If such functionality is important for an application, it could override generateSAXEventForColumn() and parse the content of the column and generate additional SAX events.

  • In databases, a table contains an unordered list of columns; the order in which columns are stored is not important. An XML DTD, on the other hand, does not have a way to describe an unordered collection of child elements.

    We can deal with this problem in a few ways. If the task is to convert a database into another XML document, say an HTML page, the XSLT stylesheet written for that purpose can create output in the correct order. We could also override the getSelectorSQLStatement() method to supply an explicit list of columns in the correct order.

  • It is desirable to present only a selected part of a table as a document based on some query. While XML tools can do the filtering, databases are better at it. The getSelectorSQLStatement() method can be overridden to return the appropriate select query string.

  • The parser uses the getString() method on the result-set object to obtain the string representation of the value in a column. This works fine for columns with text, numbers, and so on, but it does not work well with binary data. While binary data can be represented as text, that may not be suitable for certain tasks. The parser also does not deal with user-defined types available with SQL3/JDBC 2.0.

    We can solve both problems by overriding the generateSAXEventForColumn() method and providing a suitable implementation.

Implementing the DOM API for Databases

To build a DOM tree for a database table, we could iterate over rows and columns and build nodes for a tree as we visit them. Or we could employ another library, like Sun's JAXP implementation, that builds a DOM tree from a SAX event stream. The latter approach is simpler, and it requires less coding because it reuses an existing facility. To implement the DOM API using such an approach we need just a clever reuse of the SAX parser for databases we implemented earlier.

Class JDBCDOMParser implements the DOM API for Databases:

// JDBCDOMParser.java
package dbxml.dom;
import java.io.IOException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;
import com.sun.xml.tree.XmlDocumentBuilder;
import dbxml.sax.*;
public class JDBCDOMParser {
    public static Document createDocument(JDBCInputSource inputSource) 
        throws SAXException, IOException {
        XmlDocumentBuilder documentBuilder = new XmlDocumentBuilder();
        JDBCSAXParser saxParser = new JDBCSAXParser();
        documentBuilder.setParser(saxParser);
        saxParser.parse(inputSource);
        return documentBuilder.getDocument();
    }
}

The implementation for class JDBCDOMParser is simple. It uses the XmlDocumentBuilder class provided by JAXP to build a DOM document from a SAX event stream. The JDBCDOMParser has only one method: createDocument(), which takes a JDBC data source as the argument. The method creates a JDBCSAXParser and sets it as a parser for an XmlDocumentBuilder object. It then fires the parsing and returns the resulting document from the XmlDocumentBuilder object. Internally, the XmlDocumentBuilder object responds to the SAX events generated by the JDBCSAXParser object by building a DOM document.

Using the SAX API for Databases

We have already looked at one example of using the SAX API for Databases to implement a DOM API for Databases. Here we look at another example of using the SAX API for Databases. In this section, we look at using the SAX API for Databases to integrate with XT -- an XSLT processor written in Java. With such integration, we can apply an XSLT stylesheet directly to the virtual XML documents stored in a database.

We wrap the logic of creating a SAX source for a given database source and processing it with the given XSLT stylesheet to produce an output file in class JDBCXSLProcessor, which is based on com.jclark.xsl.sax.Driver from XT. The main method takes three arguments: a database property file, an XSLT stylesheet file, and an output file.

As we'll see below, we can use this approach to generate HTML pages directly without incurring the penalty of creating an intermediate XML file. Moreover, we'll look at how we can use the integration of the SAX API for databases and XT to convert a database into a nonvirtual XML document.

(To view the source code for JDBCXSLProcessor.java, click here.)

Generating HTML pages directly from a database using an XSLT stylesheet

Here we look at a simple stylesheet that formats a highly regular XML document represented by a database table. The database table is formatted as an HTML table. The generic stylesheet createTable.xsl can be used to transform any XML document with a tablelike structure. The stylesheet uses names of markers for columns as table headers.

(To view the stylesheet createTable.xsl, click here.)

Converting a database to XML with an XSLT stylesheet

While most XML applications work with the SAX or DOM API, we may need to obtain an XML document file in some situations. For example, we need an XML document to use a tool that does not work with either API. Here, I suggest a way to convert a database into an XML file. With this approach, we write an XSLT stylesheet to do the identity transformation. Using such a stylesheet with the SAX API for Databases will yield an XML document that represents a table in the database. I have provided a stylesheet -- identity.xsl -- for the identity transformation that works with the current implementation of XT.

identity.xsl
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|*">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>
</xsl:stylesheet>

Note that while an XSLT stylesheet easily performs the identity transformation, it is not very efficient as it goes through the complex general-purpose logic of applying the full stylesheet. This inefficiency can be a particular problem for database tables with large numbers of records. An alternative approach is to write an application to perform the identity transformation. Such an application would listen to SAX events and create XML elements and data in response to them, resulting in an XML document that represents the table in the database.

Using the DOM API for Databases

For most situations, the SAX API for Databases is more memory efficient than the DOM API for Databases. However, some applications need random access to the XML documents and therefore require the treelike structure that the DOM API for Databases offers.

Integrate the DOM API for Databases with the XQL processor

XML Query Language (XQL), a query language for XML documents, has a syntax similar that of to XPath patterns. Here, we integrate our DOM parser for database with GMD-IPSI's XQL Engine. With such integration, we can perform SQL-like queries on the XML document representing a database table.

As an example of integration, I provide a simple shell for querying the XML document obtained from a database table. The class JDBCXQLProcessor creates a shell-like environment, which takes the queries from the user and prints the resulting document. The method processQueries() can work with any Document object -- not just objects created by the JDBCDOMParser. It reads the System.in for the query string, performs the query, and prints the result on System.out. The main() method creates a JDBCInputSource object from its argument and uses it with JDBCDOMParser to obtain a Document object corresponding to the given database table.

(To view the code for JDBCXQLProcessor.java, click here.)

As a side note, writing a database-to-XML converter is a snap with XMLWriter and JDBCDOMParser. Just get a Document object from JDBCDOMParser and write it to the desired file with XMLWriter.write(Document).

Conclusion

In this article, we discussed using the XML APIs for database to tap the information residing in databases. With such APIs, we can avoid the costs associated with converting a database into XML documents and the cost of keeping them synchronized. We presented a Java implementation of SAX and DOM APIs for databases, which can work with any database with a JDBC driver. We then presented an integration of SAX API for Databases with XT. We illustrated the uses of such integration to create HTML pages directly from a database and to convert databases into XML documents. Finally, we integrated the DOM API for Databases with an XQL processor.

Learn more about this topic

1 2 Page 2
Page 2 of 2