New options for Java reporting

SpreadsheetML and WordprocessingML ease the creation of Word and Excel documents

Integrating Excel and Word documents in Java applications has always been difficult. One option for reading and writing Word and Excel documents is to use an OLE 2 (Object Linking and Embedding) compound document reader like Jakarta POI (Poor Obfuscation Implementation) from the Apache Jakarta Project. POI does not yet have a mature direct API for Word access. Also, POI's underlying structure and architecture is complex and difficult to understand.

Another option for writing an Excel file is to write the output in CSV (comma-separated value) format. You can also read a CSV file from a Java application. The problem with CSV is that it does not allow you to embed formulas and other formatting into your data. When reading data from a CSV file, you have the same limitations as when writing it, with the added hassle that the user must save her files in that format.

Another option is to use an appropriate JDBC (Java Database Connectivity) driver like the one offered by Vista Portal. Many commercially available libraries are also available for reading and writing Excel documents. For reading and writing Word documents, the story is different. The main choices are POI or libraries like JavaBean word processing that are not pure Java solutions. Thankfully, with the new XML formats in Office 2003, reading and writing Excel and Word documents just became much easier.

SpreadsheetML and WordprocessingML are two new formats available in Office 2003. SpreadsheetML is an XML format for saving Excel documents, and WordprocessingML saves Word documents. Microsoft published and made public the schemas for these two formats (see Resources). To save a document as SpreadsheetML and WordprocessingML, all you have to do is select .xml format in the Save As dialog of Word and Excel. Windows recognizes SpreadsheetML and WordprocessingML as Excel or Word documents. So, if you open a SpreadsheetML or WordprocessingML document, Windows will open it with Excel or Word.

In this article, I introduce the basic tags of SpreadsheetML and WordprocessingML. I also provide examples of how to use these elements in a Java application. The article's first section introduces WordprocessingML and its basic elements. The second section presents SpreadsheetML and examples of how to use it. The third section recommends a few applications for WordprocessingML and SpreadsheetML and also presents some final thoughts.

Note: You can download this article's complete source from Resources.


WordprocessingML is an XML format for reading and writing Word documents. To keep with tradition, let's start with a Hello World example:

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
            <w:t>Hello JavaWorld.</w:t> 
<w:t>This is a great reporting tool.</w:t>

Figure 1 shows the output of the XML code in Word.

Figure 1. Result of WordprocessingML code

The code above presents a Word document with the statement "Hello JavaWorld. This is a great reporting tool." Let's analyze the code's important elements:

  • <?mso-application progid="Word.Document"?> is a processing instruction that tells Windows to treat this XML file as a Word document. You must put it in the document for it to be recognized by Word and Windows. The text-related elements for WordprocessingML have the prefix w.
  • <wordDocument/>defines a Word document. You can see that w's namespace is defined here.
  • <body> is a container for text in the document.
  • <p> defines a paragraph.
  • <r> defines a run of other elements. Inside this tag, you should place tags like <t>, <br>, and so on. All elements inside a <r> element can have a common set of properties.
  • <t> is the element that encloses actual text.
  • <br> inserts breaks inside paragraphs. The <br> element has the Attribute type. The type can be page, column, and text-wrapping.

Those are some of WordprocessingML's basic text and text-structure elements. For more information on them, please refer to the appropriate schemas and manuals in Resources.

WordprocessingML also has numerous formatting elements. Usually, the text and formatting elements have a property element that you put inside them. A property element's name is the name of the element for which it is a property plus Pr. So, for example, <p> has a <pPr> property element. Inside a property element, you place <style> elements. In the style element, you define font, font size, bold, italic, underline, etc., for the property. Before you use a style, you must define one. Bellow is an example of how to define a paragraph style called JavaStyle:

    <w:style w:type="paragraph" w:styleId="JavaStyle" w:default="on"/>

Bellow is an example of how to use JavaStyle for the properties of a paragraph:

            <w:pStyle w:val="JavaStyle"/>
            <w:t>Applying styles.</w:t>

So, to use a style, you must create it inside the <styles> element. The <styles> element is outside the <body> element but inside the <wordDocument> element. Then, reference a style through its ID in a property element.

Creating a Java program to output or read WordprocessingML is not difficult. Java has great XML support. I discuss an example of how to create a SpreadsheetML document with a Java program in the next section.


SpreadsheetML is an XML format for reading and writing Excel documents; it's similar to WordprocessingML. Let's look at a quick example:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
   <Worksheet ss:Name="TestSheet">
<ss:Row ss:Index="1">
       <ss:Data ss:Type="Number">121</ss:Data>
<ss:Row ss:Index="2">
       <ss:Data ss:Type="String">121 is 11*11</ss:Data>

Figure 2 shows the example's output.

Figure 2. Result from SpreadsheetML code

As we can see, the SpreadsheetML code from the example generated an Excel document with 121 in cell A1 and 121 is 11*11 in cell A2. Now I go into detail about this document's elements:

  • <?mso-application progid="Excel.Sheet"?> is a processing instruction that tells Windows and Excel that this is a SpreadsheetML document and therefore Excel should handle it.
  • <Workbook> defines an Excel workbook. You must define the default namespace and the namespaces for the prefixes x and ss in this element.
  • <Worksheet> defines an Excel worksheet. With the Name attribute, you can name the worksheet. If there is no value for name, Excel will give the sheet a default name. Also, in this tag, all the attributes must use the prefix ss.
  • <ss:Table> defines a set of rows. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Row> defines an Excel row. In this element, you can use the Index attribute to specify what row you want to manipulate. If you don't use Index, Excel will treat a row element as the row indexed according to the order that Excel found it in the document. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Cell> defines an Excel cell. In this element, you can use the Index attribute, which works just like a row element's Index attribute. Another interesting attribute for this element is Formula, which you can use to specify a formula for the cell. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Data> holds the data for a cell. This element's main attribute is Type. With Type, you can specify the type of data the cell holds. The options for Type are String, Number, Boolean, DateTime, and Error. Make sure to use the prefix ss for this element and all its attributes.

Just like WordprocessingML, SpreadsheetML uses styles to format data. But many element properties, like height and width, can be manipulated with attributes as well as styles. The styles work similarly to the WordprocessingML styles. You define a style inside a <styles> element. Then you reference that style as an attribute of an element. Below is an example of how to define a style. For more information on how to create styles, consult Microsoft's documentation in Resources.

   <ss:Style ss:ID="JavaStyle">
       <Font ss:Size="12" ss:Bold="1"/>

The style JavaStyle will make an element's font bold and set the font's size to 12. Below is an example of how to use this style:

<ss:Cell ss:StyleID="JavaStyle">
    <ss:Data ss:Type="String">121 is 11*11</ss:Data>

Now that we understand SpreadsheetML's fundamentals, we can do a coding example. In the example I use dom4j as my XML library. So, if you want to use this example, make sure to have the appropriate jar files from dom4j in your classpath. The class JavaExcel writes the SpreadsheetML document from this example:

import org.dom4j.*;
import java.util.HashMap;
public class JavaExcel {
    private File outputFile = null;
    /** Creates a new instance of JavaExcel */
    public JavaExcel() {
    //This function writes the SpreadsheetML document to a file
    public void write(String file) throws Exception {
        outputFile = new File(file);
        Document doc = DocumentHelper.createDocument();
        HashMap pmap = new HashMap();
        FileWriter out = new FileWriter(outputFile);
    //This function creates the root of the SpreadsheetML and
    //populates it with necesary elements.
    private Element createElements() throws Exception {
        //Create all Elements
        Element workbook = this.createWorkbook();
        Element worksheet = this.createWorksheet("TestSheet");
        Element table = this.createTable();
        Element row1 = this.createRow("1");
        Element cellA1 = this.createCell();
        Element dataA1 = this.createData("Number","121");
        Element row2 = this.createRow("2");
        Element cellA2 = this.createCell();
        Element dataA2 = this.createData("String","121 is 11*11");
        return workbook;
    private Element createWorksheet(String name) {
        Element e = DocumentHelper.createElement("Worksheet");
        e.addAttribute("ss:Name", name);
        e.setQName(new QName("Worksheet",new Namespace(null,"urn:schemas-microsoft-com:office:spreadsheet")));
        return e;
    private Element createTable() {
        return DocumentHelper.createElement("ss:Table");
    private Element createRow(String index) {
        Element e = DocumentHelper.createElement("ss:Row");
        return e;
    private Element createCell() {
        return DocumentHelper.createElement("ss:Cell");
    private Element createData(String type, String data) {
        Element e = DocumentHelper.createElement("ss:Data");
        return e;
    private Element createWorkbook() throws Exception {
        Element root =  DocumentHelper.createElement("Workbook");
        //Set up the necessary namespaces
        root.add(new Namespace("x","urn:schemas-microsoft-com:office:excel"));
        root.add(new Namespace("ss","urn:schemas-microsoft-com:office:spreadsheet"));
        root.setQName(new QName("Workbook",new Namespace(null,"urn:schemas-microsoft-com:office:spreadsheet")));
        return root;
    public static void main(String[] args) {
           JavaExcel j = new JavaExcel();
       } catch(Exception e) {e.printStackTrace();}
1 2 Page 1
Page 1 of 2