Java XML and JSON: Document processing for Java SE, Part 1: SAXON and Jackson

Transforming and converting XML and JSON documents with SAXON and Jackson


XML and JSON are important to me, and I'm grateful to Apress for letting me write an entire book about them. In this article I will briefly introduce the second edition of my new book, Java XML and JSON. I'll also present two useful demos that I would have liked to include in the book if I'd had space for them.

First, I'll show you how to override Xalan, which is the standard XSLT implementation for Java 11, with an XSLT 2.0+ and XPath 2.0+-compatible alternative, in this case SAXON. Using SAXON for XSLT/XPath makes it much easier to access features such as grouping, which I'll also demonstrate. Next, I'll show you two ways to convert XML to JSON with Jackson: the first technique is data binding, the second is tree traversal.

Why XML and JSON?

Before XML arrived, I wrote software to import data stored in an undocumented binary format. I used a debugger to identify data field types, file offsets, and lengths. When XML came along, and then JSON, the technology greatly simplified my life.

The first edition of Java XML and JSON (June 2016) introduces XML and JSON, explores Java SE's own XML-oriented APIs, and covers external JSON-oriented APIs for Java SE. The second edition, recently published by Apress, offers new content and (hopefully) answers more questions about XML, JSON, Java SE's XML APIs, and various JSON APIs, including JSON-P. It's also updated for Java SE 11.

After writing the book I wrote two additional sections introducing useful features of SAXON and Jackson, respectively. I'll present those sections in this article. First, I'll take a minute to introduce the book and its contents.

Java XML and JSON, second edition

Ideally, you should read the second edition of Java XML and JSON before studying the additional content in this article. Even if you haven't read the book yet, you should know what it covers, because that information puts the additional sections in context.

The second edition of Java XML and JSON is organized into three parts, consisting of 12 chapters and an appendix:

  • Part 1: Exploring XML
    • Chapter 1: Introducing XML
    • Chapter 2: Parsing XML Documents with SAX
    • Chapter 3: Parsing and Creating XML Documents with DOM
    • Chapter 4: Parsing and Creating XML Documents with StAX
    • Chapter 5: Selecting Nodes with XPath
    • Chapter 6: Transforming XML Documents with XSLT
  • Part 2: Exploring JSON
    • Chapter 7: Introducing JSON
    • Chapter 8: Parsing and Creating JSON Objects with mJson
    • Chapter 9: Parsing and Creating JSON Objects with Gson
    • Chapter 10: Extracting JSON Values with JsonPath
    • Chapter 11: Processing JSON with Jackson
    • Chapter 12: Processing JSON with JSON-P
  • Part 3: Appendices
    • Appendix A: Answers to Exercises

Part 1 focuses on XML. Chapter 1 defines key terminology, presents XML language features (XML declaration, elements and attributes, character references and CDATA sections, namespaces, and comments and processing instructions), and covers XML document validation (via Document Type Definitions and schemas). The remaining five chapters explore Java SE's SAX, DOM, StAX, XPath, and XSLT APIs.

Part 2 focuses on JSON. Chapter 7 defines key terminology, tours JSON syntax, demonstrates JSON in a JavaScript context (because Java SE has yet to officially support JSON), and shows how to validate JSON objects (via the JSON Schema Validator online tool). The remaining five chapters explore the third-party mJson, Gson, JsonPath, and Jackson APIs, as well as Oracle's Java EE-oriented JSON-P API, which is also unofficially available for use in a Java SE context.

Each chapter ends with a set of exercises, including programming exercises, which are designed to reinforce the reader's understanding of the material. Answers are revealed in the book's appendix.

The new edition differs from its predecessor in some significant ways:

  • Chapter 2 shows the proper way to obtain an XML reader. The previous edition's approach is deprecated.
  • Chapter 3 also introduces the DOM's Load and Save, Range, and Traversal APIs.
  • Chapter 6 shows how to work with SAXON to move beyond XSLT/XPath 1.0.
  • Chapter 11 is a new (lengthy) chapter that explores Jackson.
  • Chapter 12 is a new (lengthy) chapter that explores JSON-P.

This edition also corrects minor errors in the previous edition's content, updates various figures, and adds numerous new exercises.

While I didn't have room for it in the second edition, a future edition of Java XML and JSON may cover YAML.

Addendum to Chapter 6: Transforming XML documents with XSLT

Move beyond XSLT/XPath 1.0 with SAXON

Java 11's XSLT implementation is based on the Apache Xalan Project, which supports only XSLT 1.0 and XPath 1.0. To access the later XSLT 2.0+ and XPath 2.0+ features, you need to override the Xalan implementation with an alternative such as SAXON.

Java XML and JSON, Chapter 6 shows how to override Xalan with SAXON, then verify that SAXON is being used. In the demo, I recommend inserting the following line at the beginning of an application's main() method, in order to use SAXON:

System.setProperty("javax.xml.transform.TransformerFactory",
                   "net.sf.saxon.TransformerFactoryImpl");

You don't actually need this method call because SAXON's TransformerFactory implementation is provided in a JAR file as a service that's loaded automatically when the JAR file is accessible via the classpath. However, if there were multiple TransformerFactory implementation JAR files on the classpath, and if the Java runtime chose a non-SAXON service as the transformer implementation, there could be a problem. Including the aforementioned method call would override that choice with SAXON.
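To see which implementation the lookup described above actually selected, you can ask the factory for its class name. The following is a minimal sketch rather than code from the book: the FactoryCheck class and its two helper methods are hypothetical names introduced for illustration, and the "net.sf.saxon." prefix test simply assumes the SAXON factory class name shown earlier.

```java
import javax.xml.transform.TransformerFactory;

// Hypothetical helper (not from the book): reports which
// TransformerFactory implementation the Java runtime selected.
class FactoryCheck
{
   // Returns the class name of the factory that JAXP's lookup chose.
   static String factoryClassName()
   {
      return TransformerFactory.newInstance().getClass().getName();
   }

   // Assumption: SAXON's factory classes live in a net.sf.saxon package,
   // as the system property value shown earlier suggests.
   static boolean isSaxon(String className)
   {
      return className.startsWith("net.sf.saxon.");
   }

   public static void main(String[] args)
   {
      String name = factoryClassName();
      System.out.printf("TransformerFactory class: %s%n", name);
      System.out.printf("SAXON in use: %b%n", isSaxon(name));
   }
}
```

If this reports a non-SAXON class even though the SAXON JAR file is on the classpath, setting the javax.xml.transform.TransformerFactory system property before the first call to TransformerFactory.newInstance() is the override discussed above.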

XSLT/XPath features: A demo

Chapter 6 presents two XSLTDemo applications, and a third application is available in the book's code archive. Listing 1, below, presents a fourth XSLTDemo application that highlights XSLT/XPath features.

Listing 1. XSLTDemo.java

import java.io.FileReader;
import java.io.IOException;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.FactoryConfigurationError;
import javax.xml.parsers.ParserConfigurationException;

import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

import javax.xml.transform.dom.DOMSource;

import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

import org.w3c.dom.Document;

import org.xml.sax.SAXException;

import static java.lang.System.*;

public class XSLTDemo
{
   public static void main(String[] args)
   {
      if (args.length != 2)
      {
         err.println("usage: java XSLTDemo xmlfile xslfile");
         return;
      }

      try
      {
         DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
         DocumentBuilder db = dbf.newDocumentBuilder();
         Document doc = db.parse(args[0]);
         TransformerFactory tf = TransformerFactory.newInstance();
         out.printf("TransformerFactory: %s%n", tf);
         FileReader fr = new FileReader(args[1]);
         StreamSource ssStyleSheet = new StreamSource(fr);
         Transformer t = tf.newTransformer(ssStyleSheet);
         Source source = new DOMSource(doc);
         Result result = new StreamResult(out);
         t.transform(source, result);
      }
      catch (IOException ioe)
      {
         err.printf("IOE: %s%n", ioe.toString());
      }
      catch (FactoryConfigurationError fce)
      {
         err.printf("FCE: %s%n", fce.toString());
      }
      catch (ParserConfigurationException pce)
      {
         err.printf("PCE: %s%n", pce.toString());
      }
      catch (SAXException saxe)
      {
         err.printf("SAXE: %s%n", saxe.toString());
      }
      catch (TransformerConfigurationException tce)
      {
         err.printf("TCE: %s%n", tce.toString());
      }
      catch (TransformerException te)
      {
         err.printf("TE: %s%n", te.toString());
      }
      catch (TransformerFactoryConfigurationError tfce)
      {
         err.printf("TFCE: %s%n", tfce.toString());
      }
   }
}

The code in Listing 1 is similar to Chapter 6's Listing 6-2, but there are some differences. First, Listing 1's main() method must be called with two command-line arguments: the first argument names the XML file; the second argument names the XSL file.

The second difference is that I don't set any output properties on the transformer. Specifically, I don't specify the output method or whether indentation is used. These tasks can be accomplished in the XSL file.
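For comparison, here is a rough sketch of what setting those output properties in Java code might look like. It is not Listing 6-2 from the book: the OutputPropsDemo class and its serialize() helper are hypothetical, and the sketch uses an identity transformer (no stylesheet) so that it runs with the JDK's default XSLT implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;

import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical demo (not from the book): sets output properties in code
// rather than through an xsl:output element in the stylesheet.
class OutputPropsDemo
{
   static String serialize(String xml)
   {
      try
      {
         // newTransformer() with no arguments yields an identity transformer.
         Transformer t = TransformerFactory.newInstance().newTransformer();
         t.setOutputProperty(OutputKeys.METHOD, "xml");
         t.setOutputProperty(OutputKeys.INDENT, "yes");
         t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
         StringWriter sw = new StringWriter();
         t.transform(new StreamSource(new StringReader(xml)),
                     new StreamResult(sw));
         return sw.toString();
      }
      catch (TransformerException te)
      {
         throw new IllegalStateException(te);
      }
   }

   public static void main(String[] args)
   {
      System.out.println(serialize("<books><book title=\"Book 1\"/></books>"));
   }
}
```

Moving these settings into an xsl:output element, as Listing 1 expects, keeps the Java code generic: the same application can then emit XML, HTML, or text depending only on the stylesheet passed to it.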

Compile Listing 1 as follows:

javac XSLTDemo.java

XSLT 2.0 example: Grouping nodes

XSLT 1.0 doesn't offer built-in support for grouping nodes. For example, you might want to transform the following XML document, which lists books with their authors:

<book title="Book 1">
  <author name="Author 1" />
  <author name="Author 2" />
</book>
<book title="Book 2">
  <author name="Author 1" />
</book>
<book title="Book 3">
  <author name="Author 2" />
  <author name="Author 3" />
</book>

into the following XML, which lists authors with their books:

<author name="Author 1">
  <book title="Book 1" />
  <book title="Book 2" />
</author>
<author name="Author 2">
  <book title="Book 1" />
  <book title="Book 3" />
</author>
<author name="Author 3">
  <book title="Book 3" />
</author>

While this transformation is possible in XSLT 1.0, it's awkward. XSLT 2.0's xsl:for-each-group element, by contrast, lets you take a set of nodes, group it by some criterion, and process each created group.

Let's explore this capability, starting with an XML document to process. Listing 2 presents the contents of a books.xml file that groups author names by book title.

Listing 2. books.xml (grouping by book title)

<?xml version="1.0"?>
<books>
   <book title="Securing Office 365: Masterminding MDM and Compliance in the Cloud">
     <author name="Matthew Katzer"/>
     <publisher name="Apress" isbn="978-1484242292" pubyear="2019"/>
   </book>
   <book title="Office 2019 For Dummies">
     <author name="Wallace Wang"/>
     <publisher name="For Dummies" isbn="978-1119513988" pubyear="2018"/>
   </book>
   <book title="Office 365: Migrating and Managing Your Business in the Cloud">
     <author name="Matthew Katzer"/>
     <author name="Don Crawford"/>
     <publisher name="Apress" isbn="978-1430265269" pubyear="2014"/>
   </book>
</books>

Listing 3 presents the contents of a books.xsl file that provides the XSL transformation to turn this document into one that groups book titles according to author names.

Listing 3. books.xsl (grouping by author name)

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">
  <xsl:output method="html" indent="yes"/>
  <xsl:template match="/books">
    <html>
      <head>
      </head>
      <body>
        <xsl:for-each-group select="book/author" group-by="@name">
          <xsl:sort select="@name"/>
          <author name="{@name}">
            <xsl:for-each select="current-group()">
              <xsl:sort select="../@title"/>
              <book title="{../@title}" />
            </xsl:for-each>
          </author>
        </xsl:for-each-group>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

The xsl:output element indicates that indented HTML output is required. The xsl:template element's match attribute matches the single books root element.

The xsl:for-each-group element selects a sequence of nodes and organizes them into groups. The select attribute is an XPath expression that identifies the elements to group. Here, it's told to select all author elements that belong to book elements. The group-by attribute groups together all elements having the same value for a grouping key, which happens to be the @name attribute of the author element. In essence, you end up with the following groups:

Group 1

Matthew Katzer
Matthew Katzer

Group 2

Wallace Wang

Group 3

Don Crawford

These groups appear in order of first occurrence rather than in alphabetical order of author names, so without sorting, author elements would be output with Matthew Katzer first and Don Crawford last. The xsl:sort select="@name" element ensures that author elements are output in sorted order.
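For readers who think more readily in Java than in XSLT, the grouping and sorting behavior described above can be approximated with ordinary collections. This is only an analogy under stated assumptions, not how SAXON implements xsl:for-each-group: the GroupingSketch class is hypothetical, the author/title pairs are copied by hand from Listing 2, and String's natural ordering stands in for whatever collation xsl:sort would use.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical analogy (not SAXON's implementation): mimics
// group-by="@name" followed by xsl:sort using Java collections.
class GroupingSketch
{
   // {author name, book title} pairs in the document order of Listing 2.
   static final String[][] PAIRS =
   {
      { "Matthew Katzer",
        "Securing Office 365: Masterminding MDM and Compliance in the Cloud" },
      { "Wallace Wang", "Office 2019 For Dummies" },
      { "Matthew Katzer",
        "Office 365: Migrating and Managing Your Business in the Cloud" },
      { "Don Crawford",
        "Office 365: Migrating and Managing Your Business in the Cloud" }
   };

   // Like group-by: one group per distinct name, in order of first appearance.
   static Map<String, List<String>> groupByAuthor()
   {
      Map<String, List<String>> groups = new LinkedHashMap<>();
      for (String[] pair : PAIRS)
         groups.computeIfAbsent(pair[0], k -> new ArrayList<>()).add(pair[1]);
      return groups;
   }

   // Like the two xsl:sort elements: groups ordered by author name,
   // and titles ordered within each group.
   static Map<String, List<String>> sortedByAuthor()
   {
      Map<String, List<String>> sorted = new TreeMap<>(groupByAuthor());
      sorted.values().forEach(titles -> titles.sort(null));
      return sorted;
   }

   public static void main(String[] args)
   {
      System.out.println("Unsorted groups: " + groupByAuthor().keySet());
      System.out.println("Sorted groups:   " + sortedByAuthor().keySet());
   }
}
```

The LinkedHashMap preserves the first-appearance order of the three groups, while the TreeMap plays the role of xsl:sort select="@name" by reordering them alphabetically.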

The <author name="{@name}"> construct outputs an <author> tag whose name attribute is taken from the first author element in the group, because that element is the context item inside xsl:for-each-group.
