New options for Java reporting

SpreadsheetML and WordprocessingML ease the creation of Word and Excel documents

Integrating Excel and Word documents in Java applications has always been difficult. One option for reading and writing Word and Excel documents is to use an OLE 2 (Object Linking and Embedding) compound document reader like Jakarta POI (Poor Obfuscation Implementation) from the Apache Jakarta Project. POI does not yet have a mature direct API for Word access. Also, POI's underlying structure and architecture is complex and difficult to understand.

Another option for writing an Excel file is to write the output in CSV (comma-separated value) format. You can also read a CSV file from a Java application. The problem with CSV is that it does not allow you to embed formulas and other formatting into your data. When reading data from a CSV file, you have the same limitations as when writing it, with the added hassle that the user must save her files in that format.

Another option is to use an appropriate JDBC (Java Database Connectivity) driver like the one offered by Vista Portal. Many commercially available libraries are also available for reading and writing Excel documents. For reading and writing Word documents, the story is different. The main choices are POI or libraries like JavaBean word processing that are not pure Java solutions. Thankfully, with the new XML formats in Office 2003, reading and writing Excel and Word documents just became much easier.

SpreadsheetML and WordprocessingML are two new formats available in Office 2003. SpreadsheetML is an XML format for saving Excel documents, and WordprocessingML saves Word documents. Microsoft published and made public the schemas for these two formats (see Resources). To save a document as SpreadsheetML and WordprocessingML, all you have to do is select .xml format in the Save As dialog of Word and Excel. Windows recognizes SpreadsheetML and WordprocessingML as Excel or Word documents. So, if you open a SpreadsheetML or WordprocessingML document, Windows will open it with Excel or Word.

In this article, I introduce the basic tags of SpreadsheetML and WordprocessingML. I also provide examples of how to use these elements in a Java application. The article's first section introduces WordprocessingML and its basic elements. The second section presents SpreadsheetML and examples of how to use it. The third section recommends a few applications for WordprocessingML and SpreadsheetML and also presents some final thoughts.

Note: You can download this article's complete source from Resources.

WordprocessingML

WordprocessingML is an XML format for reading and writing Word documents. To keep with tradition, let's start with a Hello World example:

<?xml version="1.0"?>
<?mso-application progid="Word.Document"?>
<w:wordDocument 
    xmlns:w="http://schemas.microsoft.com/office/word/2003/wordml">
<w:body>
    <w:p>
        <w:r>
            <w:t>Hello JavaWorld.</w:t> 
    <w:br/>
<w:t>This is a great reporting tool.</w:t>
        </w:r>
    </w:p>
</w:body>
</w:wordDocument>

Figure 1 shows the output of the XML code in Word.

Figure 1. Result of WordprocessingML code

The code above presents a Word document with the statement "Hello JavaWorld. This is a great reporting tool." Let's analyze the code's important elements:

  • <?mso-application progid="Word.Document"?> is a processing instruction that tells Windows to treat this XML file as a Word document. You must put it in the document for it to be recognized by Word and Windows. The text-related elements for WordprocessingML have the prefix w.
  • <wordDocument/>defines a Word document. You can see that w's namespace is defined here.
  • <body> is a container for text in the document.
  • <p> defines a paragraph.
  • <r> defines a run of other elements. Inside this tag, you should place tags like <t>, <br>, and so on. All elements inside a <r> element can have a common set of properties.
  • <t> is the element that encloses actual text.
  • <br> inserts breaks inside paragraphs. The <br> element has the Attribute type. The type can be page, column, and text-wrapping.

Those are some of WordprocessingML's basic text and text-structure elements. For more information on them, please refer to the appropriate schemas and manuals in Resources.

WordprocessingML also has numerous formatting elements. Usually, the text and formatting elements have a property element that you put inside them. A property element's name is the name of the element for which it is a property plus Pr. So, for example, <p> has a <pPr> property element. Inside a property element, you place <style> elements. In the style element, you define font, font size, bold, italic, underline, etc., for the property. Before you use a style, you must define one. Bellow is an example of how to define a paragraph style called JavaStyle:

<w:styles>
    <w:style w:type="paragraph" w:styleId="JavaStyle" w:default="on"/>
</w:styles>

Bellow is an example of how to use JavaStyle for the properties of a paragraph:

    <w:p>
        <w:pPr>
            <w:pStyle w:val="JavaStyle"/>
        </w:pPr>
        <w:r>
            <w:t>Applying styles.</w:t>
        </w:r>
    </w:p>

So, to use a style, you must create it inside the <styles> element. The <styles> element is outside the <body> element but inside the <wordDocument> element. Then, reference a style through its ID in a property element.

Creating a Java program to output or read WordprocessingML is not difficult. Java has great XML support. I discuss an example of how to create a SpreadsheetML document with a Java program in the next section.

SpreadsheetML

SpreadsheetML is an XML format for reading and writing Excel documents; it's similar to WordprocessingML. Let's look at a quick example:

<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:x="urn:schemas-microsoft-com:office:excel"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
   <Worksheet ss:Name="TestSheet">
      <ss:Table>
<ss:Row ss:Index="1">
   <ss:Cell>
       <ss:Data ss:Type="Number">121</ss:Data>
   </ss:Cell>
</ss:Row>
<ss:Row ss:Index="2">
   <ss:Cell>
       <ss:Data ss:Type="String">121 is 11*11</ss:Data>
   </ss:Cell>
</ss:Row>
       </ss:Table>
   </Worksheet>
</Workbook>

Figure 2 shows the example's output.

Figure 2. Result from SpreadsheetML code

As we can see, the SpreadsheetML code from the example generated an Excel document with 121 in cell A1 and 121 is 11*11 in cell A2. Now I go into detail about this document's elements:

  • <?mso-application progid="Excel.Sheet"?> is a processing instruction that tells Windows and Excel that this is a SpreadsheetML document and therefore Excel should handle it.
  • <Workbook> defines an Excel workbook. You must define the default namespace and the namespaces for the prefixes x and ss in this element.
  • <Worksheet> defines an Excel worksheet. With the Name attribute, you can name the worksheet. If there is no value for name, Excel will give the sheet a default name. Also, in this tag, all the attributes must use the prefix ss.
  • <ss:Table> defines a set of rows. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Row> defines an Excel row. In this element, you can use the Index attribute to specify what row you want to manipulate. If you don't use Index, Excel will treat a row element as the row indexed according to the order that Excel found it in the document. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Cell> defines an Excel cell. In this element, you can use the Index attribute, which works just like a row element's Index attribute. Another interesting attribute for this element is Formula, which you can use to specify a formula for the cell. Make sure to use the prefix ss for this element and all its attributes.
  • <ss:Data> holds the data for a cell. This element's main attribute is Type. With Type, you can specify the type of data the cell holds. The options for Type are String, Number, Boolean, DateTime, and Error. Make sure to use the prefix ss for this element and all its attributes.

Just like WordprocessingML, SpreadsheetML uses styles to format data. But many element properties, like height and width, can be manipulated with attributes as well as styles. The styles work similarly to the WordprocessingML styles. You define a style inside a <styles> element. Then you reference that style as an attribute of an element. Below is an example of how to define a style. For more information on how to create styles, consult Microsoft's documentation in Resources.

<ss:Styles>
   <ss:Style ss:ID="JavaStyle">
       <Font ss:Size="12" ss:Bold="1"/>
   </ss:Style>
</ss:Styles> 

The style JavaStyle will make an element's font bold and set the font's size to 12. Below is an example of how to use this style:

<ss:Cell ss:StyleID="JavaStyle">
    <ss:Data ss:Type="String">121 is 11*11</ss:Data>
</ss:Cell>

Now that we understand SpreadsheetML's fundamentals, we can do a coding example. In the example I use dom4j as my XML library. So, if you want to use this example, make sure to have the appropriate jar files from dom4j in your classpath. The class JavaExcel writes the SpreadsheetML document from this example:

import java.io.*;
import org.dom4j.*;
import java.util.HashMap;
public class JavaExcel {
    
    private File outputFile = null;
    
    /** Creates a new instance of JavaExcel */
    public JavaExcel() {
    }
    
    //This function writes the SpreadsheetML document to a file
    public void write(String file) throws Exception {
        outputFile = new File(file);
        Document doc = DocumentHelper.createDocument();
        HashMap pmap = new HashMap();
        pmap.put("progid","Excel.Sheet");
        doc.addProcessingInstruction("mso-application",pmap);
        doc.add(this.createElements());
        FileWriter out = new FileWriter(outputFile);
        doc.write(out);
        out.flush();
        out.close();
    }
    
    //This function creates the root of the SpreadsheetML and
    //populates it with necesary elements.
    private Element createElements() throws Exception {
        //Create all Elements
        Element workbook = this.createWorkbook();
        Element worksheet = this.createWorksheet("TestSheet");
        workbook.add(worksheet);
        Element table = this.createTable();
        worksheet.add(table);
        Element row1 = this.createRow("1");
        table.add(row1);
        Element cellA1 = this.createCell();
        row1.add(cellA1);
        Element dataA1 = this.createData("Number","121");
        cellA1.add(dataA1);
        Element row2 = this.createRow("2");
        table.add(row2);
        Element cellA2 = this.createCell();
        row2.add(cellA2);
        Element dataA2 = this.createData("String","121 is 11*11");
        cellA2.add(dataA2);
        return workbook;
    }
    
    private Element createWorksheet(String name) {
        Element e = DocumentHelper.createElement("Worksheet");
        e.addAttribute("ss:Name", name);
        e.setQName(new QName("Worksheet",new Namespace(null,"urn:schemas-microsoft-com:office:spreadsheet")));
        return e;
    }
    
    private Element createTable() {
        return DocumentHelper.createElement("ss:Table");
    }
    
    private Element createRow(String index) {
        Element e = DocumentHelper.createElement("ss:Row");
        e.addAttribute("ss:Index",index);
        return e;
    }
    
    private Element createCell() {
        return DocumentHelper.createElement("ss:Cell");
    }
    
    private Element createData(String type, String data) {
        Element e = DocumentHelper.createElement("ss:Data");
        e.addAttribute("ss:Type",type);
        e.setText(data);
        return e;
    }
    
    private Element createWorkbook() throws Exception {
        Element root =  DocumentHelper.createElement("Workbook");
        //Set up the necessary namespaces
        root.add(new Namespace("x","urn:schemas-microsoft-com:office:excel"));
        root.add(new Namespace("ss","urn:schemas-microsoft-com:office:spreadsheet"));
        root.setQName(new QName("Workbook",new Namespace(null,"urn:schemas-microsoft-com:office:spreadsheet")));
        return root;
    }
    
    public static void main(String[] args) {
       try{
           JavaExcel j = new JavaExcel();
           j.write("C:\\JavaExcel.xml");
       } catch(Exception e) {e.printStackTrace();}
    }
    
}

The method write() gets the root element for the document, inserts the appropriate processing instructions, and writes the XML document to a file. main() just initiates a new JavaExcel object and calls write() with a filename. getElements() does most of the work for this class. It creates the workbook, worksheet, table, rows, cells, and data elements. Then it adds each element to its appropriate parent. The methods getWorkbook(), getWorksheet(), getTable(), getRow(), getCell(), and getData() prepare an appropriate element and return it.

This example just populates two cells. Yet, it could do much more. It could add formulas to cells, populate cells from a database, change cell formatting, and so on. An improvement to this class would use dom4j's prefix and namespace functionality to avoid manually adding the prefix to the element name and its attributes. One more note: This example does not use the schemas provided by Microsoft. Using a schema makes the process of reading and writing XML files robust and safer. I didn't use the schemas for simplicity's sake. For more information on schemas and its applications see Resources.

Conclusion

Reading and writing Word and Excel documents with WordprocessingML and SpreadsheetML from a Java program is easy. If you use the schemas for these formats, you can easily detect errors in the documents you are creating and reading. Microsoft also provides a similar format for Visio documents called DatadiagramML. With this format, you can create and manipulate Visio documents with Java programs in the same way you create and manipulate Excel documents with SpreadsheetML.

There are numerous applications of WordprocessingML and SpreadsheetML in Java programs. The most obvious application is to create complex and professional reports in Word or Excel format. Another application is to read Word and Excel documents to populate databases. On the server side, these formats are also quite useful. For example, Web applications where users can upload Word documents for publication (resumes, publications, etc.) are easier to write with WordprocessingML. A WordprocessingML document is easier to parse than a regular Word file. WordprocessingML and SpreadsheetML are two exciting new developments for Java developers.

Hugo Troche is currently a software engineer living in Auburn, Alabama. He received his undergraduate degree in computer science from Auburn University (Magna Cum Laude) in 2002 and a master's in software engineering from Auburn in 2004. He is a software consultant and contractor specializing in Web services and n-tier business systems. He has worked in more than 14 n-tier J2EE business systems projects and in six Web services projects. Currently, he is working on a project to centralize reporting for a large human resources application. He has five years of experience developing Java applications. His areas of interests are design patterns, aspect-oriented design and patterns, software architecture, software process for small groups, software size estimation techniques, and software tools.

Learn more about this topic

Join the discussion
Be the first to comment on this article. Our Commenting Policies