Navigate data with the Mapper framework

Build your own data mapping system with an interlingual approach

Most developers, at some point, have written software to move (and/or manipulate) data between two different data sources. Usually, the software that tackles this job is custom code specific to the data entities involved and the data itself. Compounding the problem, good data mapping software is typically very expensive for organizations with tight IT budgets, especially in today's market. The Mapper framework offers a simple and inexpensive (free) way for you to read from one data entity and write to another with minimal coding and maintenance.

In this article, I first explain the system's overall design and then demonstrate how the framework operates by mapping between a file and a database table. Using this example as a template, you'll be able to add other entities as your own specific requirements dictate and map data between them as easily as editing a few XML lines.

The framework

In Chapter 8.2 of the online book Survey of the State of the Art in Human Language Technology, Martin Kay explains that one algorithm for Machine Translation (MT), which translates text from one natural language (like English) to another, works by parsing the source text into a standard semantic form using the source language's grammar rules. It then applies the target language's grammar rules to the standard form to yield the desired translation. Of course, this is an oversimplification of how MT actually works, but this straightforward process, called the interlingual approach in MT, is the basis for the Mapper framework.

In contrast to the interlingual approach, another algorithm called the transfer approach translates texts with a separate translation module for each source-target language pair. Given n languages in a system, n(n-1) translation mechanisms are needed to translate each language to every other language, whereas the interlingual approach requires only 2n translation mechanisms: for each language, one into the standard form and one out of it. With five languages, for example, that's 10 mechanisms instead of 20. The Mapper framework applies the interlingual approach to mapping data entities in a system, thereby making the system maintainable and easily extendable. The following figure illustrates this approach.

The Mapper framework's interlingual approach

The framework's semantic representation of data, for simplicity's sake, is a HashMap called MapperRecord (of course, you can use XML as an alternative representation). In addition, the Entity interface represents each data entity:

public class MapperRecord extends java.util.HashMap {
}
public interface Entity {
   public static int READ = 0;
   public static int WRITE = 1;
   //Open the entity for reading or writing
   public void open() throws MapperException;
   //Close all reading and writing resources
   public void close() throws MapperException;
   //Read/translate record from data entity into a MapperRecord
   public MapperRecord readRecord() throws MapperException;
   //Write MapperRecord to the data entity
   public void writeRecord(MapperRecord record) throws MapperException;
}
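To make the contract concrete, here's a minimal, hypothetical in-memory implementation (ListEntity is not part of the framework; it's a sketch that repeats the MapperRecord, MapperException, and Entity types from above so it compiles on its own):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

//Supporting types repeated from the article so this sketch stands alone
class MapperRecord extends java.util.HashMap {
}
class MapperException extends Exception {
    public MapperException(String msg) { super(msg); }
}
interface Entity {
    public static int READ = 0;
    public static int WRITE = 1;
    public void open() throws MapperException;
    public void close() throws MapperException;
    public MapperRecord readRecord() throws MapperException;
    public void writeRecord(MapperRecord record) throws MapperException;
}

//Hypothetical in-memory entity illustrating the Entity contract
public class ListEntity implements Entity {
    private static final List STORE = new ArrayList(); //Shared backing store
    private Iterator cursor;
    private final int operation;

    //Two-parameter constructor mandated by rule 3 below
    public ListEntity(String entityAlias, int operation) {
        this.operation = operation; //Alias unused in this sketch
    }
    public void open() {
        if (operation == READ) cursor = STORE.iterator();
    }
    public void close() {
        cursor = null;
    }
    public MapperRecord readRecord() {
        //Return null at end of data, matching the framework's read loop
        return (cursor != null && cursor.hasNext())
            ? (MapperRecord) cursor.next() : null;
    }
    public void writeRecord(MapperRecord record) {
        STORE.add(record);
    }

    public static void main(String[] args) {
        ListEntity writer = new ListEntity("mem_map", Entity.WRITE);
        writer.open();
        MapperRecord rec = new MapperRecord();
        rec.put("id", "42");
        writer.writeRecord(rec);
        writer.close();

        ListEntity reader = new ListEntity("mem_map", Entity.READ);
        reader.open();
        System.out.println(reader.readRecord().get("id")); //prints 42
        reader.close();
    }
}
```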

You should apply the following rules to smoothly map between arbitrary data entities:

  1. Every Entity implementation creates a bidirectional map between the data entity it represents and a MapperRecord -- mandated by the Entity interface. It should know how to marshal (or translate) data from its data source into a MapperRecord, as well as write a MapperRecord's contents to its respective data store.
  2. The rules (or grammar) for mapping between the data entity and the MapperRecord object are placed in an XML file that the entity parses at runtime.
  3. Every Entity implementation has a two-parameter constructor: the name of the map to use and the operation to perform (Entity.READ or Entity.WRITE). All other object variables should be accessible via getter and setter methods. (I will clarify later why this rule is necessary):

    protected String fileMapName;
    protected int operation;
    protected String fileName;
    //Constructor that creates a file entity
    public FileEntity(String entityAlias, int operation) {
       this.fileMapName = entityAlias;
       this.operation = operation;
    }
    //Sets the filename
    public void setFileName(String fName) {
       this.fileName = fName;
    }
    //Gets the filename
    public String getFileName() {
       return this.fileName;
    }
    

Once all the framework's entities can successfully create and store MapperRecords based on XML metadata, you can effortlessly create execution paths to map data from one entity to another:

Entity readEntity = new FileEntity("from_map",Entity.READ);
readEntity.setFileName("/tmp/from.txt");
Entity writeEntity = new TableEntity("table_map",Entity.WRITE);
//Open entities for reading and writing
readEntity.open();
writeEntity.open();
//For each read record, write record to write entity
MapperRecord record;
while ((record = readEntity.readRecord()) != null) {
   if (record.isEmpty()) {
      continue;
   }
   writeEntity.writeRecord(record);
}
//Close entities
writeEntity.close();
readEntity.close();

The classic case

I originally designed this framework to reliably parse and create transaction-laden text files for exchange with business affiliates. Creating a custom Perl script for each affiliate's incoming (and outgoing) file formats is an arduous task for any development team, without even considering the testing and maintenance nightmares. As an alternative to Perl scripting, this reusable and extendable application pattern reduces the time spent on the development lifecycle's latter stages.

So let's start with the classic example of reading records from a text file and writing them to a database to show how well the design works. Creating the two entities, FileEntity and TableEntity, which implement the Entity interface, is fairly simple.

Parse and create any data file

The FileEntity class parses an XML file, like the following, to load different file formats into memory (using Apache's Xerces SAX parser):

<?xml version="1.0" encoding="UTF-8"?>
<!-- FileEntityList.xml -->
<filemaps>
  <map name="from_map" delimiter="," > <!-- comma-delimited file format -->
    <field name="id" />
    <field name="amount" />
    <field name="date" />
  </map>
  <map name="to_map" delimiter="|" > <!-- pipe-delimited file format -->
    <field name="date" />
    <field name="amount" />
    <field name="id" />
  </map>
  <map name="fixed_map"> <!-- fixed-length file format -->
      <field name="id" start="1" end="2" />
      <field name="amount" start="3" end="32" />
      <field name="date" start="33" end="62" />
  </map>
</filemaps>
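The article's source contains the SAX handler that loads this metadata into memory; a minimal sketch of what such a loader might look like follows. The class name FileMapLoader and the constant values are assumptions, but the resulting structure (a HashMap of maps, each holding an optional delimiter and an ordered field list) matches what the transformLine() method shown below expects:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

//Hypothetical SAX handler that loads <filemaps> metadata into the
//mapList structure used by transformLine() (constant values are assumptions)
public class FileMapLoader extends DefaultHandler {
    public static final String DELIMITER = "delimiter";
    public static final String FIELD_LIST = "fieldList";
    public static final String FIELD_NAME = "name";
    public static final String START = "start";
    public static final String END = "end";

    private final HashMap mapList = new HashMap(); //Keyed by map name
    private HashMap currentMap;

    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if ("map".equals(qName)) {
            currentMap = new HashMap();
            if (atts.getValue("delimiter") != null) {
                currentMap.put(DELIMITER, atts.getValue("delimiter"));
            }
            currentMap.put(FIELD_LIST, new ArrayList());
            mapList.put(atts.getValue("name"), currentMap);
        } else if ("field".equals(qName)) {
            HashMap field = new HashMap();
            field.put(FIELD_NAME, atts.getValue("name"));
            if (atts.getValue("start") != null) { //Fixed-length field
                field.put(START, atts.getValue("start"));
                field.put(END, atts.getValue("end"));
            }
            ((ArrayList) currentMap.get(FIELD_LIST)).add(field);
        }
    }

    //Parses the XML and returns the populated map list
    public HashMap load(String xml) throws Exception {
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), this);
        return mapList;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<filemaps><map name=\"from_map\" delimiter=\",\">"
            + "<field name=\"id\"/><field name=\"amount\"/><field name=\"date\"/>"
            + "</map></filemaps>";
        HashMap maps = new FileMapLoader().load(xml);
        HashMap fromMap = (HashMap) maps.get("from_map");
        System.out.println(fromMap.get(DELIMITER));                       //prints ,
        System.out.println(((ArrayList) fromMap.get(FIELD_LIST)).size()); //prints 3
    }
}
```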

The from_map map describes a comma-delimited file format while the fixed_map map describes a fixed-length file format. Armed with the from_map file format, the READ operation, and a filename, FileEntity's readRecord() method can marshal the file's comma-delimited records into a MapperRecord keyed by the field names in the XML file:

//Constructor that creates a file entity
public FileEntity(String entityAlias, int operation) {
  this.fileMapName = entityAlias;
  this.operation = operation;
}
//Sets filename
public void setFileName(String fName) {
  this.fileName = fName;
}
//Reads record from buffered reader and returns a MapperRecord
public MapperRecord readRecord() throws MapperException {
  //. . . 
  return (MapperRecord)transformLine(record);
  //. . . 
}
//Transforms record to MapperRecord based on xml specs
private MapperRecord transformLine(String record) {
  MapperRecord rec = new MapperRecord(); //Create empty mapper record
  HashMap map = (HashMap)mapList.get(fileMapName); //Get map
  ArrayList fieldList = (ArrayList)map.get(FIELD_LIST); //Get field list
   //If delimiter specified, then tokenize and place data in mapper record
  if (map.get(DELIMITER) != null) {
    StringTokenizer st = new StringTokenizer(record,(String)map.get(DELIMITER));
    for (Iterator fieldIterator = fieldList.iterator(); 
           fieldIterator.hasNext() && st.hasMoreElements(); ) {
      HashMap field = (HashMap)fieldIterator.next();
      rec.put(field.get(FIELD_NAME),(String)st.nextToken());
    }
  }
  //Have to parse record based on fixed lengths specified for each field
  else {
    for (Iterator i = fieldList.iterator(); i.hasNext(); ) {
      HashMap field = (HashMap)i.next();
      int start = (new Integer((String)field.get(START))).intValue();
      int end = (new Integer((String)field.get(END))).intValue();
      String str;
      try {
        str = record.substring(start-1, end);
      } catch (StringIndexOutOfBoundsException e) { //Reached end of record
        try { //Get remaining data
          str = record.substring(start-1);
        } catch (StringIndexOutOfBoundsException ex) {
          str = record;
        }
      }
      rec.put(field.get(FIELD_NAME),str);
    }
  }
  return rec;
}

If you create a Perl script for each file format your system handles, you should consider using this entity as an alternative to flat-file parsing. By placing the file format descriptions in an XML file, you can parse just about any file format -- comma delimited, pipe delimited, fixed length, and so on -- without writing any code (assuming the code in readRecord() can handle it). You'll save yourself from writing tedious custom code and you'll have the data records in a standard format to use with other business objects. Further, since the FileEntity object is a bidirectional map, it can also write data records in delimited or fixed-length format. The code for writeRecord() isn't shown above, but the code is just as straightforward; see this article's source code.
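For the delimited case, the heart of writeRecord() is essentially the inverse of transformLine(): walk the map's field list in order and join the MapperRecord's values with the delimiter. A minimal sketch of that line-building step, under the assumption that the field names have already been pulled from the XML map (the class and method names here are illustrative, not the article's actual source):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;

//Hypothetical sketch of the line-building step inside FileEntity.writeRecord()
public class DelimitedLineBuilder {
    //Joins the record's values, in field-list order, with the map's delimiter
    public static String buildLine(HashMap record, ArrayList fieldNames,
                                   String delimiter) {
        StringBuffer line = new StringBuffer();
        for (Iterator i = fieldNames.iterator(); i.hasNext(); ) {
            String value = (String) record.get((String) i.next());
            line.append(value == null ? "" : value); //Empty string for missing fields
            if (i.hasNext()) {
                line.append(delimiter);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        HashMap record = new HashMap();
        record.put("id", "7");
        record.put("amount", "19.95");
        record.put("date", "01152004");
        ArrayList fields = new ArrayList();
        fields.add("date");   //Field order from the to_map definition
        fields.add("amount");
        fields.add("id");
        System.out.println(buildLine(record, fields, "|")); //prints 01152004|19.95|7
    }
}
```

Because the field list drives the output order, the same MapperRecord can be written in the from_map, to_map, or fixed_map layout just by switching maps.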

Read from and write to any table

The TableEntity class also uses XML data mapping rules to place MapperRecords into a table, here called T_STAGING_TABLE, by matching the field names keyed in the MapperRecord object to the column names supplied:

<?xml version="1.0" encoding="UTF-8"?>
<!-- TableEntityList.xml -->
<tablemaps>
  <map name="table_map" table-name="T_STAGING_TABLE"
      driver="oracle.jdbc.driver.OracleDriver" 
      connect-string="jdbc:oracle:thin:user/pass@localhost:1521:DEV">
    <field name="id" column="MISC_CHAR1" column-type="String" />
    <field name="amount" column="MISC_NUM1" column-type="double" />
    <field name="date" column="MISC_DATE1" column-type="Date" date-format="MMddyyyy" />
  </map>
</tablemaps>

The writeRecord() method for TableEntity uses a JDBC PreparedStatement to write the MapperRecord to the specified table:

//Write record to table
public void writeRecord(MapperRecord record) throws MapperException {
  try {
    ArrayList fieldList = (ArrayList)getFieldList();
    PreparedStatement ps = (PreparedStatement)getPreparedStatement(fieldList);
    int i=1;
    for (Iterator fieldIterator=fieldList.iterator(); fieldIterator.hasNext(); i++) {
      HashMap field = (HashMap)fieldIterator.next();
      String recordString = (String)record.get((String)field.get(FIELD_NAME)); 
      if (recordString == null) {
        recordString = "";
      }
      if (((String)field.get(COLUMN_TYPE)).equals(STRING)) {
        ps.setString(i,recordString);
      } else if (((String)field.get(COLUMN_TYPE)).equals(LONG)) {
        ps.setLong(i,Long.parseLong(recordString));
      } else if (((String)field.get(COLUMN_TYPE)).equals(DOUBLE)) {
        ps.setDouble(i,Double.parseDouble(recordString));
      } else if (((String)field.get(COLUMN_TYPE)).equals(DATE)) {
        ps.setDate(i,new java.sql.Date((parse(recordString,
           (String)field.get(DATE_FORMAT))).getTime()));
      } else if (((String)field.get(COLUMN_TYPE)).equals(TIMESTAMP)) {
        ps.setTimestamp(i,new java.sql.Timestamp((parse(recordString,
           (String)field.get(DATE_FORMAT))).getTime()));
      }
    }
    ps.execute();
    conn.commit();
    ps.close();
  } catch (Exception e) {
    e.printStackTrace();
    throw new MapperException("error writing to entity: "+tableMapName);
  }
}
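The getPreparedStatement() helper referenced above isn't shown; presumably it assembles an INSERT statement from the map's table name and column list, with one bind variable per column. A hedged sketch of what that SQL assembly might look like (buildInsertSql and the "column" key are assumptions based on the XML map above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;

//Hypothetical sketch of the SQL assembly behind getPreparedStatement()
public class InsertSqlBuilder {
    public static String buildInsertSql(String tableName, ArrayList fieldList) {
        StringBuffer columns = new StringBuffer();
        StringBuffer markers = new StringBuffer();
        for (Iterator i = fieldList.iterator(); i.hasNext(); ) {
            HashMap field = (HashMap) i.next();
            columns.append((String) field.get("column")); //Column name from the XML map
            markers.append("?");                          //One bind variable per column
            if (i.hasNext()) {
                columns.append(",");
                markers.append(",");
            }
        }
        return "INSERT INTO " + tableName
            + " (" + columns + ") VALUES (" + markers + ")";
    }

    public static void main(String[] args) {
        ArrayList fields = new ArrayList();
        String[] cols = {"MISC_CHAR1", "MISC_NUM1", "MISC_DATE1"};
        for (int i = 0; i < cols.length; i++) {
            HashMap field = new HashMap();
            field.put("column", cols[i]);
            fields.add(field);
        }
        //prints INSERT INTO T_STAGING_TABLE (MISC_CHAR1,MISC_NUM1,MISC_DATE1) VALUES (?,?,?)
        System.out.println(buildInsertSql("T_STAGING_TABLE", fields));
    }
}
```

TableEntity would then pass this string to Connection.prepareStatement() and can cache the resulting PreparedStatement, since the column list never changes between records.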

Create mappings and manipulate data

The Mapper object is the module that ties the framework's entities together. Driven by its own XML configuration, it opens the proper entities, reads MapperRecords from the source entity, executes data-modifying tasks on each MapperRecord, and writes the MapperRecords to the target entity.

Data-modifying tasks are incorporated into this module so that you can manipulate MapperRecords before you write them to the target entity. For example, a convenient task for fields in a MapperRecord is to replace all occurrences of a string with another string:
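As an illustration, such a string-replacement task might look like the following (ReplaceTask is a hypothetical name; the framework's actual task interface may differ). It assumes the search string is non-empty and performs a literal, non-regex replacement across every field value in the record:

```java
import java.util.HashMap;
import java.util.Iterator;

//Hypothetical data-modifying task: replaces every occurrence of one
//string with another in all of a MapperRecord's field values
public class ReplaceTask {
    private final String from;
    private final String to;

    public ReplaceTask(String from, String to) {
        this.from = from;
        this.to = to;
    }

    public void execute(HashMap record) {
        if (from.length() == 0) { //Guard against an empty search string
            return;
        }
        for (Iterator i = record.keySet().iterator(); i.hasNext(); ) {
            Object key = i.next();
            String value = (String) record.get(key);
            if (value == null) {
                continue;
            }
            //Literal (non-regex) replace, one occurrence at a time
            StringBuffer sb = new StringBuffer();
            int pos = 0;
            int idx;
            while ((idx = value.indexOf(from, pos)) >= 0) {
                sb.append(value.substring(pos, idx)).append(to);
                pos = idx + from.length();
            }
            sb.append(value.substring(pos));
            record.put(key, sb.toString()); //Value update, not a structural change
        }
    }

    public static void main(String[] args) {
        HashMap record = new HashMap();
        record.put("date", "01/15/2004");
        new ReplaceTask("/", "").execute(record); //Strip slashes before loading
        System.out.println(record.get("date"));   //prints 01152004
    }
}
```

A task like this could run between readRecord() and writeRecord() in the Mapper's main loop, so the target entity never sees the unscrubbed data.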
