Book excerpt: Converting XML to spreadsheet, and vice versa

Use Jakarta POI to generate Excel spreadsheets from XML documents

1 2 3 Page 3
Page 3 of 3

Converting an Excel spreadsheet to an XML document

In the previous section, you learned how to generate an Excel document from an XML document. In this section, you'll convert the Excel document to an XML document. The procedure to generate an XML document from a spreadsheet is as follows:

  1. Create an empty XML document using DocumentBuilder
  2. Add top-level stmt elements
  3. Create an HSSFSheet object from the Excel file
  4. Iterate over the spreadsheet and add subelements to the XML document
  5. Output the XML document using the Transformer API

You can use the Apache POI HSSF API to parse an Excel spreadsheet and retrieve cell values from the spreadsheet. As in the previous section, first you need to import the Apache POI package org.apache.poi.hssf.usermodel.

The root element of the XML document (Listing 1) that you will generate is incmstmts, and you'll add an stmt element corresponding to each of the columns of the Excel spreadsheet. Therefore, generate an XML document using a DocumentBuilder object as shown in Listing 10 and add the root element of the document. You can obtain the DocumentBuilder object from a DocumentBuilderFactory object, as shown in Listing 10.

Listing 10. Creating an XML document

DocumentBuilderFactory factory =DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document document = builder.newDocument();
Element rootElement=document.createElement("incmstmts");
document.appendChild(rootElement);

You can read the XLS spreadsheet that is to be converted to an XML document using an InputStream, as shown in Listing 11. Subsequently, obtain a workbook from the InputStream object and obtain the spreadsheet in the Excel workbook.

Listing 11. Obtaining spreadsheet

InputStream input=new FileInputStream(new File("IncomeStatements.xls") );
HSSFWorkbook workbook=new HSSFWorkbook(input);
HSSFSheet spreadsheet=workbook.getSheetAt(0);

To construct an XML document, add an stmt element for each of the columns in the spreadsheet. You also need to add an element, year, corresponding to the column header, to each of the stmt elements, as shown in Listing 12.

Listing 12. Adding stmt elements

Element stmtElement1 = document.createElement("stmt");
rootElement.appendChild(stmtElement1);
Element year1 = document.createElement("year");
stmtElement1.appendChild(year1);
year1.appendChild(document.createTextNode("2005"));

To add subelements to stmt elements, iterate over spreadsheet rows and, using a switch statement, retrieve row cell values; use these row cell values to create the subelements. Because the first row(corresponding to index 0)) is a header row, iterate from the second row. For example, to add a revenue element, retrieve the second spreadsheet row, which corresponds to index 1, using the getRow(int) method of the HSSFSheet class. You can retrieve a row cell value using the HSSFRow method getCell(short) and a row cell value using the method getStringCellValue(), as shown in Listing 13. You can obtain the number of rows in a spreadsheet using the HSSFSheet class method getLastRowNum().

Listing 13. Adding elements to the XML document

for (int i = 1; i <= spreadsheet.getLastRowNum(); i++) {
   switch (i) {
   case 1:
   HSSFRow row1 = spreadsheet.getRow(1);
   Element revenueElement1 =
   document.createElement("revenue");
   stmtElement1.appendChild(revenueElement1);
      revenueElement1.appendChild
       (document.createTextNode
       (row1.getCell((short)1).
      getStringCellValue()));
      Element revenueElement2 = document.createElement("revenue");
      stmtElement2.appendChild(revenueElement2);
      revenueElement2.appendChild
       (document.createTextNode
       (row1.getCell((short) 2).
      getStringCellValue()));

      break;
      }
}

Similarly, other cell values are retrieved from the spreadsheet and specified in the XML document. You can generate the XML document using the Transformer API. You obtain a Transformer object from a TransformerFactory object, as shown in Listing 14. You can output the XML document using the transform(DOMSource, StreamResult) method with a DOMSource object as input and a StreamResult object as output, as shown in Listing 14.

Listing 14. Outputting an XML document

TransformerFactory tFactory = TransformerFactory.newInstance();
Transformer transformer = tFactory.newTransformer();
DOMSource source = new DOMSource(document);
StreamResult result = new StreamResult(new File(System.out));
transformer.transform(source, result);

Listing 15 shows the Java application, ExcelToXML.java, used to convert an Excel spreadsheet to an XML document. The application consists of a method generateXML(File excelFile) that converts a spreadsheet to an XML document. An XML document is created using a DocumentBuilder object. A spreadsheet is parsed, and an XML document element, stmt, is added corresponding to each of the columns in the spreadsheet. Elements are added to the stmt element that corresponds to the rows in a column. The XML document is output using the Transformer API.

Listing 15. ExcelToXML.java

package com.apress.excel;

import org.apache.poi.hssf.usermodel.*;
import org.w3c.dom.*;
import java.io.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
public class ExcelToXML {
   public void generateXML(File excelFile) {
      try { //Initializing the XML document
         DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
         DocumentBuilder builder = factory.newDocumentBuilder();
         Document document = builder.newDocument();
         Element rootElement = document.createElement("incmstmts");
         document.appendChild(rootElement);
            //Creating top-level elements
         Element stmtElement1 = document.createElement("stmt");
         rootElement.appendChild(stmtElement1);

         Element stmtElement2 = document.createElement("stmt");
         rootElement.appendChild(stmtElement2);
            //Adding first subelements
         Element year1 = document.createElement("year");
         stmtElement1.appendChild(year1);

         year1.appendChild(document.createTextNode("2005"));

         Element year2 = document.createElement("year");
         stmtElement2.appendChild(year2);
         year2.appendChild(document.createTextNode("2004"));
            //Creating an HSSFSpreadsheet object from an Excel file
         InputStream input = new FileInputStream(excelFile);
         HSSFWorkbook workbook = new HSSFWorkbook(input);
         HSSFSheet spreadsheet = workbook.getSheetAt(0);

         for (int i = 1; i <= spreadsheet.getLastRowNum(); i++) {
            switch (i) {
         //Iterate over spreadsheet rows to create stmt element
         //subelements.
            case 1:
               HSSFRow row1 = spreadsheet.getRow(1);

         Element revenueElement1 = document.createElement("revenue");
            stmtElement1.appendChild(revenueElement1);

            revenueElement1.appendChild
            (document.createTextNode
            (row1.getCell((short) 1).
            getStringCellValue()));

         Element revenueElement2 = document.createElement("revenue");
         stmtElement2.appendChild(revenueElement2);

            revenueElement2.appendChild
            (document.createTextNode
            (row1.getCell((short) 2).
            getStringCellValue()));

         break;
         case 2:
            HSSFRow row2 = spreadsheet.getRow(2);

            Element costofrevenue1 = document.createElement("costofrevenue");
            stmtElement1.appendChild(costofrevenue1);
            costofrevenue1.appendChild
             (document.createTextNode
             (row2.getCell((short)1).
            getStringCellValue()));

         Element costofrevenue2 = document.createElement("costofrevenue");
            stmtElement2.appendChild(costofrevenue2);

            costofrevenue2.appendChild
            (document.createTextNode
            (row2.getCell((short) 2).
            getStringCellValue()));
            break;
         case 3:
            HSSFRow row3 = spreadsheet.getRow(3);

         Element researchdevelopment1 = document.createElement("researchdevelopment");
            stmtElement1.appendChild(researchdevelopment1);

            researchdevelopment1.appendChild
            (document.createTextNode
            (row3.getCell((short) 1).getStringCellValue()));

               Element researchdevelopment2 =document.createElement("researchdevelopment");
               stmtElement2.appendChild(researchdevelopment2);

               researchdevelopment2.appendChild
              (document.createTextNode
              (row3.getCell((short) 2).
               getStringCellValue()));
               break;
            case 4:
               HSSFRow row4 = spreadsheet.getRow(4);

               Element salesmarketing1 = document.createElement("salesmarketing");
               stmtElement1.appendChild(salesmarketing1);

            salesmarketing1.appendChild(document.createTextNode(row4.getCell((short) 1).getStringCellValue()));

               Element salesmarketing2 = document.createElement("salesmarketing");
               stmtElement2.appendChild(salesmarketing2);
               salesmarketing2.appendChild(document.createTextNode(row4.getCell((short) 2).getStringCellValue()));
               break;
            case 5:
               HSSFRow row5 = spreadsheet.getRow(5);

               Element generaladmin1 = document.createElement("generaladmin");
               stmtElement1.appendChild(generaladmin1);

               generaladmin1.appendChild(document.createTextNode(row5
                  .getCell((short) 1).getStringCellValue()));

               Element generaladmin2 = document.createElement("generaladmin");
               stmtElement2.appendChild(generaladmin2);

               generaladmin2.appendChild(document.createTextNode(row5
               .getCell((short) 2).getStringCellValue()));
               break;
            case 6:
               HSSFRow row6 = spreadsheet.getRow(6);

               Element totaloperexpenses1 = document.createElement("totaloperexpenses");
               stmtElement1.appendChild(totaloperexpenses1);

               totaloperexpenses1.appendChild(document.createTextNode(row6
                  .getCell((short) 1).getStringCellValue()));

               Element totaloperexpenses2 = document.createElement("totaloperexpenses");
               stmtElement2.appendChild(totaloperexpenses2);

               totaloperexpenses2.appendChild(document.createTextNode(row6
                  .getCell((short) 2).getStringCellValue()));
               break;
            case 7:
               HSSFRow row7 = spreadsheet.getRow(7);

            Element operincome1 = document.createElement("operincome");
               stmtElement1.appendChild(operincome1);

               operincome1.appendChild(document.createTextNode(row7
                  .getCell((short) 1).getStringCellValue()));

            Element operincome2 = document.createElement("operincome");
               stmtElement2.appendChild(operincome2);

               operincome2.appendChild
                (document.createTextNode
                (row7.getCell((short) 2).
               getStringCellValue()));
               break;
            case 8:
               HSSFRow row8 = spreadsheet.getRow(8);

            Element invincome1 = document.createElement("invincome");
               stmtElement1.appendChild(invincome1);

               invincome1.appendChild
                (document.createTextNode
                (row8.getCell((short) 1).
               getStringCellValue()));

            Element invincome2 = document.createElement("invincome");
            stmtElement2.appendChild(invincome2);

               invincome2.appendChild
                (document.createTextNode
                (row8.getCell((short) 2).
               getStringCellValue()));
               break;
            case 9:
               HSSFRow row9 = spreadsheet.getRow(9);

               Element incbeforetaxes1 = document.createElement("incbeforetaxes");
               stmtElement1.appendChild(incbeforetaxes1);

               incbeforetaxes1.appendChild
               (document.createTextNode
               (row9.getCell((short) 1).
               getStringCellValue()));

               Element incbeforetaxes2 =document.createElement("incbeforetaxes");
               stmtElement2.appendChild(incbeforetaxes2);

               incbeforetaxes2.appendChild
                (document.createTextNode
                (row9.getCell((short)2).
               getStringCellValue()));
               break;
            case 10:
               HSSFRow row10 = spreadsheet.getRow(10);

               Element taxes1 = document.createElement("taxes");
               stmtElement1.appendChild(taxes1);

               taxes1.appendChild(document.createTextNode(row10.getCell(
                   (short) 1).getStringCellValue()));

               Element taxes2 = document.createElement("taxes");
               stmtElement2.appendChild(taxes2);

               taxes2.appendChild(document.createTextNode(row10.getCell(
                   (short) 2).getStringCellValue()));
               break;

            case 11:
               HSSFRow row11 = spreadsheet.getRow(11);

            Element netincome1 = document.createElement("netincome");
               stmtElement1.appendChild(netincome1);

            netincome1.appendChild(document.createTextNode(row11
               .getCell((short) 1).getStringCellValue()));

            Element netincome2 = document.createElement("netincome");
               stmtElement2.appendChild(netincome2);

               netincome2.appendChild(document.createTextNode(row11
                  .getCell((short) 2).getStringCellValue()));
               break;
            default:
               break;
            }

         }

         TransformerFactory tFactory = TransformerFactory.newInstance();

         Transformer transformer = tFactory.newTransformer();
            //Add indentation to output
         transformer.setOutputProperty
         (OutputKeys.INDENT, "yes");
         transformer.setOutputProperty(
            "{http://xml.apache.org/xslt}indent-amount", "2");

         DOMSource source = new DOMSource(document);
         StreamResult result = new StreamResult(System.out);
         transformer.transform(source, result);
      } catch (IOException e) {
         System.out.println("IOException " + e.getMessage());
      } catch (ParserConfigurationException e) {
         System.out
            .println("ParserConfigurationException " + e.getMessage());
      } catch (TransformerConfigurationException e) {
         System.out.println("TransformerConfigurationException "+ e.getMessage());
      } catch (TransformerException e) {
         System.out.println("TransformerException " + e.getMessage());
      }
   }

   public static void main(String[] argv) {
      ExcelToXML excel = new ExcelToXML();
      File input = new File("IncomeStatements.xls");
      excel.generateXML(input);
   }
}

You can run the application ExcelToXML.java in Eclipse. Listing 11-16 shows the output from the ExcelToXML.java application.

Listing 16. Output from ExcelToXML.java

<?xml version="1.0" encoding="UTF-8"?>
<incmstmts>
<stmt>
<year>2005</year>
<revenue>11837</revenue>
<costofrevenue>2239</costofrevenue>
<researchdevelopment>1591</researchdevelopment>
<salesmarketing>2689</salesmarketing>
<generaladmin>661</generaladmin>
<totaloperexpenses>7180</totaloperexpenses>
<operincome>4657</operincome>
<invincome>480</invincome>
<incbeforetaxes>5137</incbeforetaxes>
<taxes>1484</taxes>
<netincome>3653</netincome>
</stmt>
<stmt>
<year>2004</year>
<revenue>10818</revenue>
<costofrevenue>1875</costofrevenue>
<researchdevelopment>1421</researchdevelopment>
<salesmarketing>2122</salesmarketing>
<generaladmin>651</generaladmin>
<totaloperexpenses>6069</totaloperexpenses>
<operincome>4749</operincome>
<invincome>420</invincome>
<incbeforetaxes>5169</incbeforetaxes>
<taxes>1706</taxes>
<netincome>3463</netincome>
</stmt>
</incmstmts>

Summary

The Apache POI API provides a useful mechanism for converting data between XML and spreadsheets. In this article, you learned how to convert an example XML document to an Excel spreadsheet and then convert the spreadsheet to an XML document. With XML being a universal format, there really is no limit to what you can do with it!

Ajay Vohra is senior architect at DataSynapse, currently working on virtualizing business-critical applications on a shared infrastructure. He has 15 years of experience in the software industry, working on diverse projects such as making X Windows Toolkit thread-safe, developing embedded software for an ATM network switch, and architecting Java EE business-critical applications. He is the founder and a principal member of NuBean. He has an MBA from the University of Michigan, Ann Arbor and a masters of science in computer science from Southern Illinois University, Carbondale.

Deepak Vohra is an independent consultant and a principal member of the NuBean.com software company. He has worked in the area of XML and Java programming for more than 5 years and has published numerous articles on related topics in various print and online trade journals. He has a masters of science in mechanical engineering from Southern Illinois University, Carbondale and has published original research papers in the area of fluidized bed combustion. Currently, he is working on an automated, Web-based Java EE development environment for NuBean.com.

Learn more about this topic

1 2 3 Page 3
Page 3 of 3