Book excerpt: Converting XML to spreadsheet, and vice versa

Use Jakarta POI to generate Excel spreadsheets from XML documents

1 2 3 Page 2
Page 2 of 3

Converting an XML document to an Excel spreadsheet

In this section, we will show how to convert the example XML document in Listing 1 to an Excel document using the Apache POI HSSF API. Specifically, you will parse the example XML document, retrieve values from the document, and construct an Excel spreadsheet. The procedure to create a spreadsheet is as follows:

  1. Create an Excel spreadsheet workbook and an empty spreadsheet
  2. Define a cell style
  3. Set the spreadsheet column width
  4. Add a header row to the spreadsheet
  5. Parse the XML document
  6. Add statement columns to the spreadsheet
  7. Output the spreadsheet

You need to import the Apache POI HSSF package, org.apache.poi.hssf.usermodel. You can create an Excel workbook using a no-arguments constructor for HSSFWorkbook, as shown in Listing 2. You create a spreadsheet, represented with the HSSFSheet class, by using the createSheet(String sheetName) method of the HSSFWorkbook class.

Listing 2. Creating an Excel workbook and spreadsheet

HSSFWorkbook wb=new HSSFWorkbook();
HSSFSheet spreadSheet=wb.createSheet("spreadSheet");

You can represent a cell in a spreadsheet using the HSSFCell class. You set the cell style using the HSSFCellStyle class. To set the cell style in the example spreadsheet being generated, create a cell style object using the createCellStyle() method of the HSSFWorkbook class, as shown in Listing 3. The example cell style defines a cell border and is used for cells that represent totals for a column or subcolumn. You can set the border settings for an HSSFCellStyle object using the setter methods setBorderTop(short), setBorderLeft(short), setBorderBottom(short), and setBorderRight(short).

Listing 3. Setting the cell style

HSSFCellStyle cellStyle=wb.createCellStyle();
cellStyle.setBorderRight(HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderTop(HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderLeft(HSSFCellStyle.BORDER_MEDIUM);
cellStyle.setBorderBottom(HSSFCellStyle.BORDER_MEDIUM);

You can represent a border type with a short value, as shown in Listing 3. Table 1 lists some of the commonly used types of borders.

Table 1. Border types

ShortDescription
BORDER_DASH_DOTDash-dot border
BORDER_DASHEDDashed border
BORDER_DOUBLEDouble-line border
BORDER_MEDIUMMedium border
BORDER_NONENo border
BORDER_THICKThick border
BORDER_THINThin border

You can set the border color using the setter methods setBottomBorderColor(short color), setLeftBorderColor(short color), setRightBorderColor(short color), and setTopBorderColor(short color). You can represent spreadsheet color using the HSSFColor subclasses. For example, the class HSSFColor.BLUE represents the color blue. You can obtain a short value corresponding to a color using the field index. The following is an example of setting a color:

short blue= HSSFColor.BLUE.index;
cellStyle.setRightBorderColor(blue);

You can set background color and foreground color using the methods setFillBackgroundColor(short fg) and setFillForegroundColor(short bg). You can set text indentation using the setIndention(short indent) method. You can wrap cell text using the setWrapText(boolean wrapped) method. For example, you can set cell-style indentation to 4 and add text wrapping, as shown here:

cellStyle.setIndention((short)4);
cellStyle.setWrapText(true);

Further, you can add text rotation to cell text using the setRotation(short rotation) method. You specify rotation in degrees using values from -90 to +90. You can horizontally align cell text using the setAlignment(short) method. You represent cell alignment using a short value. Some of the commonly used cell alignment types are ALIGN_CENTER, ALIGN_RIGHT, ALIGN_LEFT, and ALIGN_FILL. You can set vertical alignment using the setVerticalAlignment(short align) method. Vertical alignment short values are VERTICAL_TOP, VERTICAL_CENTER, VERTICAL_BOTTOM, and VERTICAL_JUSTIFY. You define the spreadsheet font using the HSSFFont class. Listing 4 shows an example of creating an italicized font using font height 24 and font name Courier New. As shown in the listing, a font is created using the method createFont() of the HSSFWorkbook class.

Listing 4. Setting the font

HSSFFont font = wb.createFont();
   font.setFontHeightInPoints((short)24);
   font.setFontName("Courier New");
   font.setItalic(true);
   cellStyle.setFont(font);

A row in the spreadsheet created from the example XML document has cells corresponding to each of the elements in the stmt tag of the example XML document. You set the column width in a spreadsheet at column level using the HSSFSheet method setColumnWidth(short column, short width). For example, you specify the column width of the first column of a spreadsheet as shown here: spreadSheet.setColumnWidth((short)0, (short)(256*25));

A spreadsheet has a header row that specifies headers for the columns in the spreadsheet. Therefore, add a header row to the HSSFSheet class. A header row is just like any other row and is created using the createRow(int rowNumber) method, as shown in Listing 5.

You add column headers to the header row using the createCell(short) method, as shown in Listing 5. You set the cell value using the setCellValue(String) method. For example, add a column header for the Year 2005 column.

Listing 5. Adding the spreadsheet header row

HSSFRow row = spreadSheet.createRow(0);
HSSFCell cell = row.createCell((short) 0);
cell.setCellValue("Year 2005");

You can add the column header for the Year 2004 column similarly. You need to parse the example XML document using a DocumentBuilder to navigate an XML document and retrieve the values from the document. You need to create a DocumentBuilderFactory from which you will create a DocumentBuilder parser, as shown in Listing 6. Subsequently, parse the example XML document and obtain a Document object.

Listing 6. Parsing an XML document

DocumentBuilderFactory factory =DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.parse(xmlDocument);

You can obtain a node list that consists of stmt nodes from the Document object using the getElementsByTagName(String) method as shown in Listing 7. Each stmt node represents a column in a spreadsheet. The subelements in an stmt element represent the row values for a column. In the spreadsheet, add 11 rows corresponding to the subelements of an stmt element. For example, the following code shows how to add row 1: HSSFRow row1 = spreadSheet.createRow(1);

To construct a spreadsheet, iterate over the node list and add a column to the spreadsheet corresponding to each of the stmt nodes in the node list, as shown in Listing 7. You add a spreadsheet column using the HSSFRow object. The node list of stmt elements has two nodes corresponding to the two stmt elements in the example XML document. Using a switch statement, you'll add row labels and row values for two columns. For example, to add a row labeled Revenue, create a row label and create row cells for the two nodes in the stmt element node list, as shown in Listing 7. A column consists of cells corresponding to each of the elements in the stmt element. You create a cell using the createCell(short) method of the HSSFRow object, as shown in Listing 7. You set the cell value using the setCellValue(String) method.

Listing 7. Constructing a spreadsheet

NodeList nodeList = document.getElementsByTagName("stmt");
for (int i = 0; i < nodeList.getLength(); i++) {

switch(i){
case 0:
HSSFCell cell = row1.createCell((short) 0);
cell.setCellValue("Revenue ($)");
cell = row1.createCell((short) 1);

cell.setCellValue(((Element)
(nodeList.item(0))).
getElementsByTagName
("revenue").item(0).getFirstChild()
.getNodeValue());

break;
case 1:
HSSFCell cell = row1.createCell((short) 2);
cell.setCellValue(((Element)
(nodeList.item(1))).
getElementsByTagName("revenue").
item(0).getFirstChild().getNodeValue());
break;

}

}

The first cell in a row has index 0. Earlier in the section, you defined a cell style. The cell style is set at the cell level using the setCellStyle() method of the HSSFCell object, as shown here: cell.setCellStyle(cellStyle);

Similarly, you need to set row values for other cells in a column. HSSFSheet provides some methods to set different characteristics of a spreadsheet. Table 2 discusses some of these methods.

Table 2. HSSFSheet methods

Method nameDescription
setColumnBreak(short column)Sets a page break at the specified column
setDefaultColumnWidth(short width)

Sets the default column width, if the width is not specified at

the column level

setDefaultRowHeight(short height)

Sets the default row height, if the height is not specified

at the row level

setFitToPage(boolean b)Sets it to fit to the page
setHorizontallyCenter(boolean value)Sets the output to be horizontally centered
setMargin(short margin,double size)Sets the style sheet margin
setRowBreak(int row)Sets a page break at the specified row
setZoom(int numerator, int denominator)

Sets the zoom magnification for

the style sheet

To output the Excel workbook to an .xls file, create a FileOutputStream object, as shown in Listing 8. You can output the Excel workbook using the write(HSSFWorkbook) method, and you can close the FileOutputStream object using the close() method.

Listing 8. Outputting the Excel workbook

FileOutputStream output=new FileOutputStream(new File("IncomeStatements.xls"));
   wb.write(output);
   output.flush();
   output.close();

Listing 9 shows the Java application, XMLToExcel.java, used to convert an XML document to an Excel spreadsheet. The application consists of a method generateExcel(File) that generates an Excel spreadsheet from an XML document. In the generateExcel() method, an Excel workbook is created and a spreadsheet is added to the workbook. The cell style is added using an HSSFCellStyle object. An XML document is parsed, and the stmt element node list is iterated over to retrieve node values. A spreadsheet column is added corresponding to each of the stmt nodes in the example XML document. A column header value is set from the year element in an stmt element. A spreadsheet row is added corresponding to each of the subelements in an stmt element. A switch statement is used to set row values for a column. Subsequently, the Excel workbook is output using a FileOutputStream.

Listing 9. XMLToExcel.java

package com.apress.excel;

import org.apache.poi.hssf.usermodel.*;
import org.w3c.dom.*;
import java.io.*;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

public class XMLToExcel {
   public void generateExcel(File xmlDocument) {
      try {// Creating a Workbook
         HSSFWorkbook wb = new HSSFWorkbook();
         HSSFSheet spreadSheet = wb.createSheet("spreadSheet");

         spreadSheet.setColumnWidth((short) 0, (short) (256 * 25));
         spreadSheet.setColumnWidth((short) 1, (short) (256 * 25));
         // Parsing XML Document
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder builder = factory.newDocumentBuilder();
         Document document = builder.parse(xmlDocument);
         NodeList nodeList = document.getElementsByTagName("stmt");
         // Creating Rows
         HSSFRow row = spreadSheet.createRow(0);

         HSSFCell cell = row.createCell((short) 1);
         cell.setCellValue("Year 2005");
         cell = row.createCell((short) 2);
         cell.setCellValue("Year 2004");

         HSSFRow row1 = spreadSheet.createRow(1);
         HSSFRow row2 = spreadSheet.createRow(2);
         HSSFRow row3 = spreadSheet.createRow(3);
         HSSFRow row4 = spreadSheet.createRow(4);
         HSSFRow row5 = spreadSheet.createRow(5);
         HSSFRow row6 = spreadSheet.createRow(6);
         HSSFRow row7 = spreadSheet.createRow(7);
         HSSFRow row8 = spreadSheet.createRow(8);
         HSSFRow row9 = spreadSheet.createRow(9);
         HSSFRow row10 = spreadSheet.createRow(10);
         HSSFRow row11 = spreadSheet.createRow(11);

         for (int i = 0; i < nodeList.getLength(); i++) {
            HSSFCellStyle cellStyle = wb.createCellStyle();
            cellStyle.setBorderRight(HSSFCellStyle.BORDER_MEDIUM);
            cellStyle.setBorderTop(HSSFCellStyle.BORDER_MEDIUM);
            cellStyle.setBorderLeft(HSSFCellStyle.BORDER_MEDIUM);
            cellStyle.setBorderBottom(HSSFCellStyle.BORDER_MEDIUM);

            switch (i) {
            // Creating column1 (Row label) and column 2 (Year 2005 stmt)
            case 0:

               cell = row1.createCell((short) 0);
               cell.setCellValue("Revenue ($)");

               cell = row1.createCell((short) 1);
               cell.setCellValue(((Element) (nodeList.item(0)))
                  .getElementsByTagName("revenue").item(0)
                  .getFirstChild().getNodeValue());

            cell = row2.createCell((short) 0);
            cell.setCellValue("Cost of Revenue ($)");

            cell = row2.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("costofrevenue").item(0)
               .getFirstChild().getNodeValue());

            cell = row3.createCell((short) 0);
            cell.setCellValue("Research and Development ($)");

            cell = row3.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("researchdevelopment")
               .item(0).getFirstChild().getNodeValue());

            cell = row4.createCell((short) 0);
            cell.setCellValue("Sales and Marketing ($)");

            cell = row4.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("salesmarketing").item(0)
               .getFirstChild().getNodeValue());

            cell = row5.createCell((short) 0);
            cell.setCellValue("General and Administrative ($)");

            cell = row5.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("generaladmin").item(0)
               .getFirstChild().getNodeValue());

            cell = row6.createCell((short) 0);
            cell.setCellValue("Total Operating Expenses ($)");
            cell.setCellStyle(cellStyle);
            cell = row6.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("totaloperexpenses").item(0)
               .getFirstChild().getNodeValue());

            cell.setCellStyle(cellStyle);

            cell = row7.createCell((short) 0);
            cell.setCellValue("Operating Income ($)");

            cell = row7.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("operincome").item(0)
               .getFirstChild().getNodeValue());

            cell = row8.createCell((short) 0);
            cell.setCellValue("Investment Income ($)");

            cell = row8.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("invincome").item(0)
               .getFirstChild().getNodeValue());

            cell = row9.createCell((short) 0);
            cell.setCellValue("Income Before Taxes ($)");
            cell.setCellStyle(cellStyle);

            cell = row9.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("incbeforetaxes").item(0)
               .getFirstChild().getNodeValue());

            cell.setCellStyle(cellStyle);

            cell = row10.createCell((short) 0);
            cell.setCellValue("Taxes ($)");

            cell = row10.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("taxes").item(0)
               .getFirstChild().getNodeValue());

            cell = row11.createCell((short) 0);
            cell.setCellValue("Net Income ($)");
            cell.setCellStyle(cellStyle);

            cell = row11.createCell((short) 1);
            cell.setCellValue(((Element) (nodeList.item(0)))
               .getElementsByTagName("netincome").item(0)
               .getFirstChild().getNodeValue());

            cell.setCellStyle(cellStyle);

            break;
         // Creating column 3 (Year 2004 stmt)
         case 1:

            cell = row1.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("revenue").item(0)
               .getFirstChild().getNodeValue());

            cell = row2.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("costofrevenue").item(0)
               .getFirstChild().getNodeValue());

            cell = row3.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("researchdevelopment")
               .item(0).getFirstChild().getNodeValue());

            cell = row4.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("salesmarketing").item(0)
               .getFirstChild().getNodeValue());

            cell = row5.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("generaladmin").item(0)
               .getFirstChild().getNodeValue());

            cell = row6.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("totaloperexpenses").item(0)
               .getFirstChild().getNodeValue());

            cell.setCellStyle(cellStyle);

            cell = row7.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("operincome").item(0)
               .getFirstChild().getNodeValue());

            cell = row8.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("invincome").item(0)
               .getFirstChild().getNodeValue());

            cell = row9.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("incbeforetaxes").item(0)
               .getFirstChild().getNodeValue());

            cell.setCellStyle(cellStyle);

            cell = row10.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("taxes").item(0)
               .getFirstChild().getNodeValue());

            cell = row11.createCell((short) 2);
            cell.setCellValue(((Element) (nodeList.item(1)))
               .getElementsByTagName("netincome").item(0)
               .getFirstChild().getNodeValue());
            cell.setCellStyle(cellStyle);
            break;

         default:
            break;
         }

      }
         // Outputting to Excel spreadsheet
         FileOutputStream output = new FileOutputStream(new File("IncomeStatements.xls"));
         wb.write(output);
         output.flush();
         output.close();
      } catch (IOException e) {
         System.out.println("IOException " + e.getMessage());
      } catch (ParserConfigurationException e) {
         System.out
            .println("ParserConfigurationException " + e.getMessage());
      } catch (SAXException e) {
         System.out.println("SAXException " + e.getMessage());
      }

   }

   public static void main(String[] argv) {

      File xmlDocument = new File("incomestatements.xml");

      XMLToExcel excel = new XMLToExcel();
      excel.generateExcel(xmlDocument);
   }
}

You can run the XMLToExcel.java application in Eclipse. This generates an Excel spreadsheet. IncomeStatements.xls, the example spreadsheet generated from the example XML document, gets added to the Chapter11 project, as shown in Figure 3.

Figure 3. The Excel spreadsheet IncomeStatements.xls in the Chapter11 project

Figure 4 shows the Excel spreadsheet generated with the Apache POI HSSF API.

Figure 4. IncomeStatements.xls spreadsheet
1 2 3 Page 2
Page 2 of 3