Use a RandomAccessFile to build a low-level database

How to extend a RandomAccessFile to support the storage and retrieval of arbitrary record data

As I searched JavaWorld's site for ideas for this month's Step by Step, I came across only a few articles covering low-level file access. Although high-level APIs such as JDBC give us the flexibility and power needed in large enterprise applications, many smaller applications call for a simpler, more elegant solution.

In this article, we will build an extension to the RandomAccessFile class that allows us to store and retrieve records. This "records file" will be equivalent to a persistent hashtable, allowing keyed objects to be stored and retrieved from file storage.

A primer on files and records

Before we jump headlong into the example, let's start with a basic backgrounder. We'll begin by defining some terms pertaining to files and records, then we'll briefly discuss class java.io.RandomAccessFile and platform-dependency.

Terminology

The following definitions are tuned to our example, rather than to traditional database terminology.

Record -- A collection of related data stored in a file. A record typically has multiple fields, each being a named and typed item of information.

Key -- An identifier for a record. Keys are usually unique.

File -- A sequential collection of data stored in some sort of stable storage such as a hard drive.

Nonsequential file access -- Allows data to be read from arbitrary locations in the file.

File pointer -- A number holding the position of the next byte of data to be read from a file.

Record pointer -- A record pointer is a file pointer that points to the location where a particular record begins.

Index -- A secondary means of accessing records in a file; that is, it maps keys to record pointers.

Heap -- A sequential file of unordered and variable-sized records. A heap requires some external indexing in order to meaningfully access the records.

Persistence -- Refers to storing an object or record for a certain length of time. This length of time is typically longer than the span of one process, so objects are usually persisted in files or databases.

Overview of class java.io.RandomAccessFile

Class RandomAccessFile is Java's way of providing nonsequential access to files. The class allows us to jump to a certain location in the file by using the seek() method. Once the file pointer has been positioned, data can be read from and written to the file using the DataInput and DataOutput interfaces. These interfaces allow us to read and write data in a platform-independent manner. Other handy methods in RandomAccessFile allow us to check and set the length of the file.
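To see the seek-then-read pattern in isolation, here is a minimal sketch; the class name SeekDemo and the header values (3 and 656) are mine, chosen only to echo the two file headers we'll define later:

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class SeekDemo {
    // Writes two headers nonsequentially, then jumps back and reads them.
    // Returns {numRecords, dataStart} so the round trip can be checked.
    public static long[] roundTrip(File f) throws IOException {
        RandomAccessFile raf = new RandomAccessFile(f, "rw");
        try {
            raf.seek(0);
            raf.writeInt(3);        // e.g., a record count at byte 0
            raf.seek(4);
            raf.writeLong(656L);    // e.g., a data start pointer at byte 4
            raf.seek(4);            // reposition anywhere, in any order
            long dataStart = raf.readLong();
            raf.seek(0);
            int numRecords = raf.readInt();
            return new long[] { numRecords, dataStart };
        } finally {
            raf.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("seekdemo", ".db");
        f.deleteOnExit();
        long[] headers = roundTrip(f);
        System.out.println(headers[0] + " " + headers[1]);   // prints "3 656"
    }
}
```

Note that the DataInput and DataOutput methods (readInt(), writeLong(), and so on) always operate at the current file pointer, which is why every access in the class we build below begins with a seek().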

Platform-dependent considerations

Modern databases rely on disk drives for storage. Data on a disk drive is stored in blocks, which are distributed across tracks and surfaces. The disk's seek time and rotational delay dictate how data can be most efficiently stored and retrieved. A typical database management system relies closely on the disk's attributes in order to streamline performance. Unfortunately (or fortunately, depending on your interest in low-level file I/O!), these parameters lie far from reach when using a high-level file API such as java.io. Given this fact, our example will disregard the optimizations that knowledge of the disk's parameters could provide.

Designing the RecordsFile example

Now we're ready to design our example. To start, I'll lay out some design requirements and goals, resolve issues of concurrent access, and specify the low-level file format. Before proceeding to the implementation, we'll also look at the main record operations and their corresponding algorithms.

Requirements and goals

Our main goal in this example is to use a RandomAccessFile to provide a way of storing and retrieving record data. We'll associate a key of type String with each record as a means of uniquely identifying it. The keys will be limited to a maximum length, although the record data will not be limited. For the purposes of this example, our records will consist of only one field -- a "blob" of binary data. The file code will not attempt to interpret the record data in any way.

As a second design goal, we'll require that the number of records our file supports not be fixed at creation time. We will allow the file to grow and shrink as records are inserted and removed. Because our index and record data will be stored in the same file, this requirement will cause us to add extra logic to dynamically increase the index space by reorganizing records.

Accessing data in a file is orders of magnitude slower than accessing data in memory. This means that the number of file accesses the database performs will be the determining performance factor. We'll require that our main database operations not depend on the number of records in the file. In other words, they'll be of constant order time with respect to file accesses.

As a final requirement, we'll assume our index is small enough to load into memory. This will make it easier for our implementation to fulfill the requirement that dictates access time. We'll mirror the index in a Hashtable, which provides immediate record header lookups.
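A bare Hashtable already provides the constant-time, key-to-header lookup this requirement calls for. In the sketch below (the class name is mine, and a Long file pointer stands in for the full record header we'll define later):

```java
import java.util.Hashtable;

public class IndexMirrorDemo {
    // The in-memory mirror of the index: key -> record header. Here a bare
    // Long file pointer stands in for the full record header.
    static Hashtable buildIndex() {
        Hashtable index = new Hashtable();
        index.put("alpha", new Long(656L));
        index.put("beta", new Long(912L));
        return index;
    }

    public static void main(String[] args) {
        Hashtable index = buildIndex();
        // One hash lookup replaces a linear scan of the on-disk index.
        Long ptr = (Long) index.get("beta");
        System.out.println(ptr);   // prints 912
    }
}
```

Once the index is mirrored this way, the file's index region only needs to be consulted at open time and kept up to date on writes.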

Code Correction

The code for this article has a bug that causes it to throw a NullPointerException in many possible cases. There is a routine named insureIndexSpace(int) in the abstract class BaseRecordsFile. The code is intended to move existing records to the end of the file if the index area needs to expand. After the "first" record's capacity is reset to its actual size, it is moved to the end. The dataStartPtr is then set to point to the second record in the file. Unfortunately, if there was free space in the first record, the new dataStartPtr will not point to a valid record, since it was incremented by the first record's length rather than its capacity. The modified Java source for BaseRecordsFile can be found in Resources.

from Ron Walkup

Senior Software Engineer

bioMerieux, Inc.

Synchronization and concurrent file access

For simplicity, we start by supporting only a single-thread model, in which file requests are prohibited from happening concurrently. We can accomplish this by synchronizing the public access methods of the BaseRecordsFile and RecordsFile classes. Note that you can relax this restriction to add support for concurrent reads and writes on nonconflicting records: You'll need to maintain a list of locked records and interleave reads and writes for concurrent requests.
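One way to maintain such a list of locked records is a small monitor object keyed by record key; the RecordLocks class below is a hypothetical sketch of mine, not part of the example's source:

```java
import java.util.HashSet;
import java.util.Set;

// A hypothetical per-record lock manager: requests block only when they
// target a key that another request is already using.
public class RecordLocks {
    private final Set lockedKeys = new HashSet();

    // Blocks until the given record is free, then claims it.
    public synchronized void acquire(String key) throws InterruptedException {
        while (lockedKeys.contains(key)) {
            wait();                 // another request holds this record
        }
        lockedKeys.add(key);
    }

    // Frees the record and wakes any waiting requests.
    public synchronized void release(String key) {
        lockedKeys.remove(key);
        notifyAll();
    }

    public synchronized boolean isLocked(String key) {
        return lockedKeys.contains(key);
    }
}
```

Each public access method would then acquire the target key on entry and release it in a finally block, letting operations on different records proceed in parallel while conflicting operations queue up.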

Details of the file format

We will now explicitly define the format of the records file. The file consists of three regions, each with its own format.

The records file format

The file headers region. This first region holds the two essential headers needed to access records in our file. The first header, called the data start pointer, is a long that points to the start of the record data. This value tells us the size of the index region. The second header, called the num records header, is an int that gives the number of records in the database. The headers region starts on the first byte of the file and extends for FILE_HEADERS_REGION_LENGTH bytes. We'll use readLong() and readInt() to read the headers, and writeLong() and writeInt() to write the headers.

The index region. Each entry in the index consists of a key and a record header. The index starts on the first byte after the file headers region and extends until the byte before the data start pointer. From this information, we can calculate a file pointer to the start of any of the n entries in the index. Entries have a fixed length -- the key data starts on the first byte in the index entry and extends MAX_KEY_LENGTH bytes. The corresponding record header for a given key follows immediately after the key in the index. The record header tells us where the data is located, how many bytes the record can hold, and how many bytes it is actually holding. Index entries in the file index are in no particular order and do not map to the order in which the records are stored in the file.

Record data region. The record data region starts on the location indicated by the data start pointer and extends to the end of the file. Records are positioned back-to-back in the file with no free space permitted between records. This part of the file consists of raw data with no header or key information. The database file ends on the last block of the last record in the file, so there is no extra space at the end of the file. The file grows and shrinks as records are added and deleted.

The size allocated to a record does not always correspond to the actual amount of data the record contains. The record can be thought of as a container -- it may be only partially full. Valid record data is positioned at the start of the record.
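The container idea boils down to one line of arithmetic: a record's free space is its capacity minus its count. The tiny stand-in class below is mine; the field names follow the RecordHeader class used in the example:

```java
// A record is a container that may be only partially full: dataCapacity is
// the allocated size, dataCount the bytes actually in use at the front.
public class RecordSlot {
    int dataCapacity;   // bytes allocated to the record in the file
    int dataCount;      // bytes currently holding valid data

    RecordSlot(int capacity, int count) {
        dataCapacity = capacity;
        dataCount = count;
    }

    int freeSpace() {
        return dataCapacity - dataCount;
    }

    public static void main(String[] args) {
        // An update that shrinks the data leaves slack at the end of the slot.
        RecordSlot slot = new RecordSlot(256, 100);
        System.out.println(slot.freeSpace());   // prints 156
    }
}
```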

Supported operations and their algorithms

The RecordsFile will support the following main operations:

  • Insert -- Adds a new record to the file

  • Read -- Reads a record from the file

  • Update -- Updates a record

  • Delete -- Deletes a record

  • Ensure capacity -- Grows the index region to accommodate new records

Before we step through the source code, let's go over the chosen algorithms for each of these operations:

Insert. This operation inserts a new record into the file. To insert, we:

  1. Make sure the key being inserted isn't already contained in the file
  2. Ensure the index region is large enough for the additional entry
  3. Find free space in the file large enough to hold the record
  4. Write the record data to the file
  5. Add the record header to the index

Read. This operation retrieves a requested record from the file based on a key. To retrieve a record, we:

  1. Use the index to map the given key to the record header
  2. Seek down to the start of the data (using the pointer to the record data stored in the header)
  3. Read the record's data from the file

Update. This operation updates an existing record with new data, replacing the old data with the new. The steps for our update vary, depending on the size of the new record data. If the new data fits into the existing record, we:

  1. Write the record data to the file, overwriting the previous data
  2. Update the attribute that holds the length of the data in the record's header

Otherwise, if the data is too big for the record, we:

  1. Perform a delete operation on the existing record
  2. Perform an insert of the new data

Delete. This operation removes a record from the file. To delete a record, we:

  1. Reclaim the space allocated to the record being removed by either shrinking the file, if the record is the last one in the file, or by adding its space to an adjacent record

  2. Remove the record's header from the index by replacing the entry being deleted with the last entry in the index; this ensures the index is always full, with no empty spaces between entries

Ensure capacity. This operation makes sure the index region is large enough to accommodate additional entries. In a loop, we move records from the front to the end of the file until there is sufficient space. To move one record we:

  1. Locate the record header of the first record in the file; note that this is the record with data at the top of the record data region -- not the record with the first header in the index

  2. Read the target record's data

  3. Grow the file by the size of the target record's data by using the setLength(long) method in RandomAccessFile

  4. Write the record data to the bottom of the file

  5. Update the data pointer in the record that was moved

  6. Update the global header that points to the first record's data

Implementation details -- stepping through the source code

We're now ready to get our hands dirty and work through the code for the example. You can download the complete source from Resources.

Note: You must use the Java 2 platform (formerly known as JDK 1.2) to compile the source.

Class BaseRecordsFile

BaseRecordsFile is an abstract class and is the main implementation of our example. It defines the main access methods as well as a slew of utility methods for manipulating records and index entries.

package hamner.db;

import java.io.*;

import java.util.*;

public abstract class BaseRecordsFile {

   // The database file.
   private RandomAccessFile file;

   // Current file pointer to the start of the record data.
   protected long dataStartPtr;

   // Total length in bytes of the global database headers.
   protected static final int FILE_HEADERS_REGION_LENGTH = 16;

   // Number of bytes in the record header.
   protected static final int RECORD_HEADER_LENGTH = 16;

   // The length of a key in the index.
   protected static final int MAX_KEY_LENGTH = 64;

   // The total length of one index entry - the key length plus the record header length.
   protected static final int INDEX_ENTRY_LENGTH = MAX_KEY_LENGTH + RECORD_HEADER_LENGTH;

   // File pointer to the num records header.
   protected static final long NUM_RECORDS_HEADER_LOCATION = 0;

   // File pointer to the data start pointer header.
   protected static final long DATA_START_HEADER_LOCATION = 4;

   /**
    * Creates a new database file, initializing the appropriate headers. Enough space is allocated in
    * the index for the specified initial size.
    */
   protected BaseRecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException {
      File f = new File(dbPath);
      if (f.exists()) {
         throw new RecordsFileException("Database already exists: " + dbPath);
      }
      file = new RandomAccessFile(f, "rw");
      dataStartPtr = indexPositionToKeyFp(initialSize); // Record data region starts where the
      setFileLength(dataStartPtr);                      // (initialSize+1)th index entry would start.
      writeNumRecordsHeader(0);
      writeDataStartPtrHeader(dataStartPtr);
   }

   /**
    * Opens an existing database file and initializes the dataStartPtr. The accessFlags
    * parameter can be "r" or "rw" -- as defined in RandomAccessFile.
    */
   protected BaseRecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException {
      File f = new File(dbPath);
      if (!f.exists()) {
         throw new RecordsFileException("Database not found: " + dbPath);
      }
      file = new RandomAccessFile(f, accessFlags);
      dataStartPtr = readDataStartHeader();
   }

   /**
    * Returns an Enumeration of the keys of all records in the database.
    */
   public abstract Enumeration enumerateKeys();

   /**
    * Returns the number of records in the database.
    */
   public abstract int getNumRecords();

   /**
    * Checks whether there is a record with the given key.
    */
   public abstract boolean recordExists(String key);

   /**
    * Maps a key to a record header.
    */
   protected abstract RecordHeader keyToRecordHeader(String key) throws RecordsFileException;

   /**
    * Locates space for a new record of dataLength size and initializes a RecordHeader.
    */
   protected abstract RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, IOException;

   /**
    * Returns the record to which the target file pointer belongs - meaning the specified location
    * in the file is part of the record data of the RecordHeader which is returned. Returns null if
    * the location is not part of a record. (O(n) mem accesses)
    */
   protected abstract RecordHeader getRecordAt(long targetFp) throws RecordsFileException;

   protected long getFileLength() throws IOException {
      return file.length();
   }

   protected void setFileLength(long l) throws IOException {
      file.setLength(l);
   }

   /**
    * Reads the number of records header from the file.
    */
   protected int readNumRecordsHeader() throws IOException {
      file.seek(NUM_RECORDS_HEADER_LOCATION);
      return file.readInt();
   }

   /**
    * Writes the number of records header to the file.
    */
   protected void writeNumRecordsHeader(int numRecords) throws IOException {
      file.seek(NUM_RECORDS_HEADER_LOCATION);
      file.writeInt(numRecords);
   }

   /**
    * Reads the data start pointer header from the file.
    */
   protected long readDataStartHeader() throws IOException {
      file.seek(DATA_START_HEADER_LOCATION);
      return file.readLong();
   }

   /**
    * Writes the data start pointer header to the file.
    */
   protected void writeDataStartPtrHeader(long dataStartPtr) throws IOException {
      file.seek(DATA_START_HEADER_LOCATION);
      file.writeLong(dataStartPtr);
   }

   /**
    * Returns a file pointer in the index pointing to the first byte
    * in the key located at the given index position.
    */
   protected long indexPositionToKeyFp(int pos) {
      return FILE_HEADERS_REGION_LENGTH + (INDEX_ENTRY_LENGTH * pos);
   }

   /**
    * Returns a file pointer in the index pointing to the first byte
    * in the record pointer located at the given index position.
    */
   long indexPositionToRecordHeaderFp(int pos) {
      return indexPositionToKeyFp(pos) + MAX_KEY_LENGTH;
   }

   /**
    * Reads the ith key from the index.
    */
   String readKeyFromIndex(int position) throws IOException {
      file.seek(indexPositionToKeyFp(position));
      return file.readUTF();
   }

   /**
    * Reads the ith record header from the index.
    */
   RecordHeader readRecordHeaderFromIndex(int position) throws IOException {
      file.seek(indexPositionToRecordHeaderFp(position));
      return RecordHeader.readHeader(file);
   }

   /**
    * Writes the ith record header to the index.
    */
   protected void writeRecordHeaderToIndex(RecordHeader header) throws IOException {
      file.seek(indexPositionToRecordHeaderFp(header.indexPosition));
      header.write(file);
   }

   /**
    * Appends an entry to the end of the index. Assumes that insureIndexSpace() has already been called.
    */
   protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOException, RecordsFileException {
      DbByteArrayOutputStream temp = new DbByteArrayOutputStream(MAX_KEY_LENGTH);
      (new DataOutputStream(temp)).writeUTF(key);
      if (temp.size() > MAX_KEY_LENGTH) {
         throw new RecordsFileException("Key is larger than permitted size of " + MAX_KEY_LENGTH + " bytes");
      }
      file.seek(indexPositionToKeyFp(currentNumRecords));
      temp.writeTo(file);
      file.seek(indexPositionToRecordHeaderFp(currentNumRecords));
      newRecord.write(file);
      newRecord.setIndexPosition(currentNumRecords);
      writeNumRecordsHeader(currentNumRecords + 1);
   }

   /**
    * Removes the record from the index. Replaces the target with the entry at the
    * end of the index.
    */
   protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOException, RecordsFileException {
      if (header.indexPosition != currentNumRecords - 1) {
         String lastKey = readKeyFromIndex(currentNumRecords - 1);
         RecordHeader last = keyToRecordHeader(lastKey);
         last.setIndexPosition(header.indexPosition);
         file.seek(indexPositionToKeyFp(last.indexPosition));
         file.writeUTF(lastKey);
         file.seek(indexPositionToRecordHeaderFp(last.indexPosition));
         last.write(file);
      }
      writeNumRecordsHeader(currentNumRecords - 1);
   }

   /**
    * Adds the given record to the database.
    */
   public synchronized void insertRecord(RecordWriter rw) throws RecordsFileException, IOException {
      String key = rw.getKey();
      if (recordExists(key)) {
         throw new RecordsFileException("Key exists: " + key);
      }
      insureIndexSpace(getNumRecords() + 1);
      RecordHeader newRecord = allocateRecord(key, rw.getDataLength());
      writeRecordData(newRecord, rw);
      addEntryToIndex(key, newRecord, getNumRecords());
   }

   /**
    * Updates an existing record. If the new contents do not fit in the original record,
    * then the update is handled by deleting the old record and adding the new.
    */
   public synchronized void updateRecord(RecordWriter rw) throws RecordsFileException, IOException {
      RecordHeader header = keyToRecordHeader(rw.getKey());
      if (rw.getDataLength() > header.dataCapacity) {
         deleteRecord(rw.getKey());
         insertRecord(rw);
      } else {
         writeRecordData(header, rw);
         writeRecordHeaderToIndex(header);
      }
   }

   /**
    * Reads a record.
    */
   public synchronized RecordReader readRecord(String key) throws RecordsFileException, IOException {
      byte[] data = readRecordData(key);
      return new RecordReader(key, data);
   }

   /**
    * Reads the data for the record with the given key.
    */
   protected byte[] readRecordData(String key) throws IOException, RecordsFileException {
      return readRecordData(keyToRecordHeader(key));
   }

   /**
    * Reads the record data for the given record header.
    */
   protected byte[] readRecordData(RecordHeader header) throws IOException {
      byte[] buf = new byte[header.dataCount];
      file.seek(header.dataPointer);
      file.readFully(buf);
      return buf;
   }

   /**
    * Updates the contents of the given record. A RecordsFileException is thrown if the new data does not
    * fit in the space allocated to the record. The header's data count is updated, but not
    * written to the file.
    */
   protected void writeRecordData(RecordHeader header, RecordWriter rw) throws IOException, RecordsFileException {
      if (rw.getDataLength() > header.dataCapacity) {
         throw new RecordsFileException("Record data does not fit");
      }
      header.dataCount = rw.getDataLength();
      file.seek(header.dataPointer);
      rw.writeTo((DataOutput)file);
   }

   /**
    * Updates the contents of the given record. A RecordsFileException is thrown if the new data does not
    * fit in the space allocated to the record. The header's data count is updated, but not
    * written to the file.
    */
   protected void writeRecordData(RecordHeader header, byte[] data) throws IOException, RecordsFileException {
      if (data.length > header.dataCapacity) {
         throw new RecordsFileException("Record data does not fit");
      }
      header.dataCount = data.length;
      file.seek(header.dataPointer);
      file.write(data, 0, data.length);
   }

   /**
    * Deletes a record.
    */
   public synchronized void deleteRecord(String key) throws RecordsFileException, IOException {
      RecordHeader delRec = keyToRecordHeader(key);
      int currentNumRecords = getNumRecords();
      if (getFileLength() == delRec.dataPointer + delRec.dataCapacity) {
         // shrink file since this is the last record in the file
         setFileLength(delRec.dataPointer);
      } else {
         RecordHeader previous = getRecordAt(delRec.dataPointer - 1);
         if (previous != null) {
            // append space of deleted record onto previous record
            previous.dataCapacity += delRec.dataCapacity;
            writeRecordHeaderToIndex(previous);
         } else {
            // target record is first in the file and is deleted by adding its space to
            // the second record.
            RecordHeader secondRecord = getRecordAt(delRec.dataPointer + (long)delRec.dataCapacity);
            byte[] data = readRecordData(secondRecord);
            secondRecord.dataPointer = delRec.dataPointer;
            secondRecord.dataCapacity += delRec.dataCapacity;
            writeRecordData(secondRecord, data);
            writeRecordHeaderToIndex(secondRecord);
         }
      }
      deleteEntryFromIndex(key, delRec, currentNumRecords);
   }

   // Checks to see if there is space for an additional index entry. If
   // not, space is created by moving records to the end of the file.
   protected void insureIndexSpace(int requiredNumRecords) throws RecordsFileException, IOException {
      int currentNumRecords = getNumRecords();
      long endIndexPtr = indexPositionToKeyFp(requiredNumRecords);
      if (endIndexPtr > getFileLength() && currentNumRecords == 0) {
         setFileLength(endIndexPtr);
         dataStartPtr = endIndexPtr;
         writeDataStartPtrHeader(dataStartPtr);
         return;
      }
      while (endIndexPtr > dataStartPtr) {
         RecordHeader first = getRecordAt(dataStartPtr);
         byte[] data = readRecordData(first);
         int oldCapacity = first.dataCapacity;  // remember the full size of the old slot
         first.dataPointer = getFileLength();
         first.dataCapacity = data.length;
         setFileLength(first.dataPointer + data.length);
         writeRecordData(first, data);
         writeRecordHeaderToIndex(first);
         dataStartPtr += oldCapacity;           // advance past the old slot's capacity, not just
         writeDataStartPtrHeader(dataStartPtr); // its data length (see the code correction note)
      }
   }

   /**
    * Closes the file.
    */
   public synchronized void close() throws IOException, RecordsFileException {
      try {
         file.close();
      } finally {
         file = null;
      }
   }
}

Figure 1. BaseRecordsFile (full source: http://www.javaworld.com/javaworld/jw-01-1999/step/hamner/db/BaseRecordsFile.java)

Because the code is fairly well documented, I won't step through the entire class; I'll only cover the more important segments.

private RandomAccessFile file;

All file access is done through an instance of RandomAccessFile. We declare it as private so we don't have to worry that subclasses will inadvertently corrupt our data format.

protected long dataStartPtr;

This is the current file pointer to the start of the record data. It mirrors the value of the data start pointer header in the file in order to reduce file accesses.

protected BaseRecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException {
   File f = new File(dbPath);
   if (f.exists()) {
      throw new RecordsFileException("Database already exists: " + dbPath);
   } 
   file = new RandomAccessFile(f, "rw");
   dataStartPtr = indexPositionToKeyFp(initialSize); 
   setFileLength(dataStartPtr);                     
   writeNumRecordsHeader(0);
   writeDataStartPtrHeader(dataStartPtr);
}

The first constructor for this class is used to create a new database file. After checking that the file doesn't already exist, the RandomAccessFile is created. The dataStartPtr is set to the byte where the (n+1)th index entry would be, meaning exactly enough space is allocated for the index region to hold n index entries. The two file headers are then written to the file using the writeNumRecordsHeader() and writeDataStartPtrHeader() methods.

protected BaseRecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException {
   File f = new File (dbPath);
   if(!f.exists()) {
      throw new RecordsFileException("Database not found: " + dbPath);
   } 
   file = new RandomAccessFile(f, accessFlags);
   dataStartPtr = readDataStartHeader();
}

The other constructor is used to open an existing database. The accessFlags parameter determines whether the file will be read-only (r) or read-write (rw). Once the file is opened, the dataStartPtr variable is initialized to the value held in the file.

protected int readNumRecordsHeader() throws IOException {
   file.seek(NUM_RECORDS_HEADER_LOCATION);
   return file.readInt();
}

The above method is used to fetch the num records header from the file. We first use seek() to position the file pointer, and then use readInt() to read the data. This method is typical of how most data access methods work in this class.

protected long indexPositionToKeyFp(int pos) {
   return FILE_HEADERS_REGION_LENGTH + (INDEX_ENTRY_LENGTH * pos);
}

The indexPositionToKeyFp() method maps a position in the file index to a file pointer pointing to the first byte of the key part of the index entry. Because the index entries are fixed length, it's an easy task to calculate this value; we just multiply the length of one entry by the position and then add FILE_HEADERS_REGION_LENGTH bytes to account for the headers region at the start of the file.

The indexPositionToRecordHeaderFp() method is identical to this method except it adds an offset to account for the length of a key. These two methods enable us to iterate over all the records in the file.
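To make the arithmetic concrete: with MAX_KEY_LENGTH = 64 and RECORD_HEADER_LENGTH = 16, each index entry occupies 80 bytes, so entry 3 begins at 16 + 3 * 80 = 256 and its record header at 256 + 64 = 320. This standalone sketch (the class name IndexMathDemo is mine) replays both mappings with the constants from the article:

```java
public class IndexMathDemo {
    static final int FILE_HEADERS_REGION_LENGTH = 16;
    static final int MAX_KEY_LENGTH = 64;
    static final int RECORD_HEADER_LENGTH = 16;
    static final int INDEX_ENTRY_LENGTH = MAX_KEY_LENGTH + RECORD_HEADER_LENGTH; // 80 bytes

    // File pointer to the key portion of the entry at index position pos.
    static long keyFp(int pos) {
        return FILE_HEADERS_REGION_LENGTH + (INDEX_ENTRY_LENGTH * pos);
    }

    // File pointer to the record header portion, just past the key.
    static long recordHeaderFp(int pos) {
        return keyFp(pos) + MAX_KEY_LENGTH;
    }

    public static void main(String[] args) {
        // Entry 0 starts right after the 16-byte headers region;
        // entry 3 starts at 16 + 3*80 = 256, its header at 256 + 64 = 320.
        System.out.println(keyFp(0) + " " + keyFp(3) + " " + recordHeaderFp(3));   // prints "16 256 320"
    }
}
```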

protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOException, RecordsFileException {
   DbByteArrayOutputStream temp = new DbByteArrayOutputStream(MAX_KEY_LENGTH);
   (new DataOutputStream(temp)).writeUTF(key);
   if (temp.size() > MAX_KEY_LENGTH) {
      throw new RecordsFileException("Key is larger than permitted size of " + MAX_KEY_LENGTH + " bytes");
   }
   file.seek(indexPositionToKeyFp(currentNumRecords));
   temp.writeTo(file); 
   file.seek(indexPositionToRecordHeaderFp(currentNumRecords));
   newRecord.write(file);
   newRecord.setIndexPosition(currentNumRecords);
   writeNumRecordsHeader(currentNumRecords+1);
}

The addEntryToIndex() method adds a new entry to the end of the index, and then updates the record count by calling writeNumRecordsHeader(). It assumes there is sufficient space in the index to hold the entry. The temp stream is used to ensure the key fits in the allowed space. A RecordsFileException is thrown if the key doesn't fit.

protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOException, RecordsFileException {
   if (header.indexPosition != currentNumRecords -1) {
      String lastKey = readKeyFromIndex(currentNumRecords-1);
      RecordHeader last  = keyToRecordHeader(lastKey);
      last.setIndexPosition(header.indexPosition);
      file.seek(indexPositionToKeyFp(last.indexPosition));
      file.writeUTF(lastKey);
      file.seek(indexPositionToRecordHeaderFp(last.indexPosition));
      last.write(file);
   }
   writeNumRecordsHeader(currentNumRecords-1);
}

The deleteEntryFromIndex() method removes an entry from the index by copying the last entry into the position of the entry being removed. After moving the entry, the record count header is decremented.

public synchronized void insertRecord(RecordWriter rw) throws RecordsFileException, IOException {
   String key = rw.getKey();
   if (recordExists(key)) {
      throw new RecordsFileException("Key exists: " + key);
   }
   insureIndexSpace(getNumRecords() + 1);
   RecordHeader newRecord = allocateRecord(key, rw.getDataLength());
   writeRecordData(newRecord, rw);
   addEntryToIndex(key, newRecord, getNumRecords());
}

We insert a record by growing the index space (if necessary) by calling insureIndexSpace() and then calling allocateRecord() to locate a position for the new record data. The data is then written to the file and the entry is added to the index.

public synchronized void updateRecord(RecordWriter rw) throws RecordsFileException, IOException { 
   RecordHeader header = keyToRecordHeader(rw.getKey());
   if (rw.getDataLength() > header.dataCapacity) {
      deleteRecord(rw.getKey());
      insertRecord(rw);
   } else {
      writeRecordData(header, rw);
      writeRecordHeaderToIndex(header);
   }
}

If the new record data fits in the original space allocated to the record, updateRecord() writes the new record data into that space, then updates the header. Otherwise, we must do the high-level, delete-and-insert operation to allocate enough space for the new data.

protected void writeRecordData(RecordHeader header, RecordWriter rw) throws IOException, RecordsFileException {
   if (rw.getDataLength() > header.dataCapacity) {
      throw new RecordsFileException ("Record data does not fit");
   }
   header.dataCount = rw.getDataLength();
   file.seek(header.dataPointer);
   rw.writeTo((DataOutput)file);
}

This method first checks that the new data fits in the old space and then positions the file pointer to the dataPointer field in the header. It then calls the writeTo() method of the RecordWriter in order to write the data to the file. There is also a writeRecordData() method, which takes a raw byte buffer instead of a RecordWriter.

public synchronized void deleteRecord(String key) throws RecordsFileException, IOException {
   RecordHeader delRec = keyToRecordHeader(key);
   int currentNumRecords = getNumRecords();
   if (getFileLength() == delRec.dataPointer + delRec.dataCapacity) {
      // shrink file since this is the last record in the file
      setFileLength(delRec.dataPointer);
   } else {
      RecordHeader previous = getRecordAt(delRec.dataPointer -1);
      if (previous != null) {
         // append space of deleted record onto previous record
         previous.dataCapacity += delRec.dataCapacity;
         writeRecordHeaderToIndex(previous);
      } else {
         // target record is first in the file and is deleted by adding its space to
         // the second record.
         RecordHeader secondRecord = getRecordAt(delRec.dataPointer + (long)delRec.dataCapacity);
         byte[] data = readRecordData(secondRecord);
         secondRecord.dataPointer = delRec.dataPointer;
         secondRecord.dataCapacity += delRec.dataCapacity;
         writeRecordData(secondRecord, data);
         writeRecordHeaderToIndex(secondRecord);
      }
   }
   deleteEntryFromIndex(key, delRec, currentNumRecords); 
}

We can delete a record in one of several ways. If the record is the last one in the file, we simply shrink the file by calling the setFileLength() method. Otherwise, we have two choices. If the record has a record before it in the file (which is found by calling getRecordAt()), we add its space onto the previous record's space; if the record is the first one in the file, its space is added onto the second record's space.

protected void insureIndexSpace(int requiredNumRecords) throws RecordsFileException, IOException {
   int currentNumRecords = getNumRecords();
   long endIndexPtr = indexPositionToKeyFp(requiredNumRecords);
   if (endIndexPtr > getFileLength() && currentNumRecords == 0) {
       setFileLength(endIndexPtr);
       dataStartPtr = endIndexPtr;
       writeDataStartPtrHeader(dataStartPtr);
       return;
   }
   while (endIndexPtr > dataStartPtr) {
      RecordHeader first = getRecordAt(dataStartPtr);
      byte[] data = readRecordData(first);
      first.dataPointer = getFileLength();
      first.dataCapacity = data.length;
      setFileLength(first.dataPointer + data.length);
      writeRecordData(first, data);
      writeRecordHeaderToIndex(first);
      dataStartPtr += first.dataCapacity;
      writeDataStartPtrHeader(dataStartPtr);  
   }
}

The insureIndexSpace() method is used to grow the index. If the file holds no records, the file is simply grown to the required size. Otherwise, in the while loop, the first record in the data region is read and rewritten at the end of the file, and dataStartPtr is advanced past its old space; this repeats until the index has enough room.

Class RecordsFile

RecordsFile extends BaseRecordsFile to provide an implementation for the abstract methods.

package hamner.db;
import java.io.*;
import java.util.*;
public class RecordsFile extends BaseRecordsFile {
  /**
   * Hashtable which holds the in-memory index. For efficiency, the entire index 
   * is cached in memory. The hashtable maps a key of type String to a RecordHeader.
   */
  protected Hashtable memIndex;    
  /**
   * Creates a new database file.  The initialSize parameter determines the 
   * amount of space which is allocated for the index.  The index can grow 
   * dynamically, but the parameter is provided to increase 
   * efficiency. 
   */
  public RecordsFile(String dbPath, int initialSize) throws IOException, RecordsFileException {
    super(dbPath, initialSize);
    memIndex = new Hashtable(initialSize);
  }
  /**
   * Opens an existing database and initializes the in-memory index. 
   */
  public RecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException {
    super(dbPath, accessFlags);
    int numRecords = readNumRecordsHeader();
    memIndex = new Hashtable(numRecords);
    for (int i = 0; i < numRecords; i++) {
      String key = readKeyFromIndex(i);
      RecordHeader header = readRecordHeaderFromIndex(i);
      header.setIndexPosition(i);
      memIndex.put(key, header);
    }
  }
  /**
   * Returns an enumeration of all the keys in the database.
   */
  public synchronized Enumeration enumerateKeys() {
    return memIndex.keys();
  }
  /**
   * Returns the current number of records in the database. 
   */
  public synchronized int getNumRecords() {
    return memIndex.size();
  }
  /**
   * Checks if there is a record belonging to the given key. 
   */
  public synchronized boolean recordExists(String key) {
    return memIndex.containsKey(key);
  }
  /**
   * Maps a key to a record header by looking it up in the in-memory index.
   */
  protected RecordHeader keyToRecordHeader(String key) throws RecordsFileException {
    RecordHeader h = (RecordHeader)memIndex.get(key);
    if (h==null) {
      throw new RecordsFileException("Key not found: " + key);
    } 
    return h;
  }
  /**
   * This method searches the file for free space and then returns a RecordHeader 
   * which uses the space. (O(n) memory accesses)
   */
  protected RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, IOException {
    // search for empty space
    RecordHeader newRecord = null;
    Enumeration e = memIndex.elements();
    while (e.hasMoreElements()) {
      RecordHeader next = (RecordHeader)e.nextElement();
      int free = next.getFreeSpace();
      if (dataLength <= free) {
        newRecord = next.split();
        writeRecordHeaderToIndex(next);
        break;
      }
    }
    if (newRecord == null) {
      // append record to end of file - grows file to allocate space
      long fp = getFileLength();
      setFileLength(fp + dataLength);
      newRecord = new RecordHeader(fp, dataLength);
    } 
    return newRecord;
  }
  /**
   * Returns the record to which the target file pointer belongs - meaning the specified location
   * in the file is part of the record data of the RecordHeader which is returned.  Returns null if 
   * the location is not part of a record. (O(n) mem accesses)
   */
  protected RecordHeader getRecordAt(long targetFp) throws RecordsFileException {
    Enumeration e = memIndex.elements();
    while (e.hasMoreElements()) {
      RecordHeader next = (RecordHeader) e.nextElement();
      if (targetFp >= next.dataPointer &&
          targetFp < next.dataPointer + (long)next.dataCapacity) {
        return next;
      }
    }
    return null;
  }
  /**
   * Closes the database. 
   */
  public synchronized void close() throws IOException, RecordsFileException {
    try {
      super.close();
    } finally {
      memIndex.clear();
      memIndex = null;
    }
  }
  /**
   * Adds the new record to the in-memory index and calls the superclass to add
   * the index entry to the file. 
   */
  protected void addEntryToIndex(String key, RecordHeader newRecord, int currentNumRecords) throws IOException, RecordsFileException {
    super.addEntryToIndex(key, newRecord, currentNumRecords);
    memIndex.put(key, newRecord);   
  }
  /**
   * Removes the record from the index. Replaces the target with the entry at the 
   * end of the index. 
   */
  protected void deleteEntryFromIndex(String key, RecordHeader header, int currentNumRecords) throws IOException, RecordsFileException {
    super.deleteEntryFromIndex(key, header, currentNumRecords);
    memIndex.remove(key);
  }
}

Figure 2. RecordsFile

As with the previous class, I won't discuss all the methods here because most of the code is straightforward and well documented.

protected Hashtable memIndex;

The instance variable memIndex is used to cache the entire file index in memory. The hashtable maps keys to record headers.

public RecordsFile(String dbPath, String accessFlags) throws IOException, RecordsFileException {
   super(dbPath, accessFlags);
   int numRecords = readNumRecordsHeader();
   memIndex = new Hashtable(numRecords);
   for (int i = 0; i < numRecords; i++) {
      String key = readKeyFromIndex(i);
      RecordHeader header = readRecordHeaderFromIndex(i);
      header.setIndexPosition(i);
      memIndex.put(key, header);
   }
}

This constructor initializes the file by calling the superclass constructor, then builds the in-memory index by iterating over the index entries, using the readKeyFromIndex() and readRecordHeaderFromIndex() methods to add each entry to the hashtable.

protected RecordHeader keyToRecordHeader(String key) throws RecordsFileException {
   RecordHeader h = (RecordHeader)memIndex.get(key);
   if (h==null) {
      throw new RecordsFileException("Key not found: " + key);
   } 
   return h;
}

This method maps a record's key to its RecordHeader. Because the lookup goes through the hashtable and requires no file accesses, it runs in constant time; that is, the time it takes to look up one record is not a function of the number of records in the file.

protected RecordHeader allocateRecord(String key, int dataLength) throws RecordsFileException, IOException {
   // search for empty space
   RecordHeader newRecord = null;    
   Enumeration e = memIndex.elements();
   while (e.hasMoreElements()) {
      RecordHeader next = (RecordHeader)e.nextElement();
   int free = next.getFreeSpace();
   if (dataLength <= free) {
         newRecord = next.split();
         writeRecordHeaderToIndex(next);
         break;
      }
   }
   if (newRecord == null) {
      // append record to end of file - grow file to allocate space
      long fp = getFileLength();
      setFileLength(fp + dataLength);
      newRecord = new RecordHeader(fp, dataLength);
   }
   return newRecord;
}

This method locates free space for a new record of size dataLength by iterating over all the entries in memIndex until adequate space is found. If no existing record has enough free space, the new record is appended to the end of the file. If an existing record does have enough free space, it is split into two parts: one for the existing record's data and one for the new record.

I need to point out here that this is not the optimal solution for this operation because it makes inserting an element an O(n) operation with respect to memory accesses. A more advanced implementation of the RecordsFile class would keep a sorted memory map so that finding free space would be an O(log n) operation. However, this implementation still holds true to our design requirement, since the number of file accesses required is not dependent on the number of records in the file.
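As a sketch of that improvement, a sorted map keyed by free-space size can serve allocation requests in O(log n). The class name and API below are illustrative, not part of the article's source, and a full version would also handle several records sharing the same amount of free space:

```java
import java.util.TreeMap;

// Hypothetical sketch: index free space by size so allocation becomes O(log n).
// Maps an amount of free space to the file pointer of the record that owns it.
public class FreeSpaceIndex {
  private final TreeMap<Integer, Long> bySize = new TreeMap<>();

  public void add(int freeSpace, long dataPointer) {
    bySize.put(freeSpace, dataPointer);
  }

  // Returns the pointer of a record with at least dataLength free bytes,
  // or null if none exists -- the caller would then append to the file.
  public Long allocate(int dataLength) {
    java.util.Map.Entry<Integer, Long> fit = bySize.ceilingEntry(dataLength);
    if (fit == null) {
      return null;
    }
    bySize.remove(fit.getKey());
    return fit.getValue();
  }

  public static void main(String[] args) {
    FreeSpaceIndex idx = new FreeSpaceIndex();
    idx.add(16, 100L);   // record at byte 100 has 16 free bytes
    idx.add(64, 300L);   // record at byte 300 has 64 free bytes
    System.out.println(idx.allocate(20));  // 300: only the 64-byte hole fits
    System.out.println(idx.allocate(128)); // null: no hole is large enough
  }
}
```

The ceilingEntry() call finds the smallest hole that still fits, which also keeps fragmentation down compared with a first-fit scan.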

protected RecordHeader getRecordAt(long targetFp) throws RecordsFileException {
   Enumeration e = memIndex.elements();
   while (e.hasMoreElements()) {
      RecordHeader next = (RecordHeader) e.nextElement();
      if (targetFp >= next.dataPointer && targetFp < next.dataPointer + (long)next.dataCapacity) {
         return next;
      }
   }
   return null;
}

This method locates the record that has data residing at the indicated position in the file. As with the last method, this implementation is O(n) with respect to memory accesses and leaves room for improvement.

Class RecordHeader

RecordHeader provides a wrapper to hold key information about a record.

package hamner.db;
import java.io.*;
public class RecordHeader {
  /**
   * File pointer to the first byte of record data (8 bytes).
   */
  protected long dataPointer; 
  /**
   * Actual number of bytes of data held in this record (4 bytes).
   */
  protected int dataCount;
  /**
   * Number of bytes of data that this record can hold (4 bytes).
   */
  protected int dataCapacity;  
  /**
   * Indicates this header's position in the file index.
   */
  protected int indexPosition;
  protected RecordHeader() {
  }
  protected RecordHeader(long dataPointer, int dataCapacity) {
    if (dataCapacity < 1) {
      throw new IllegalArgumentException("Bad record size: " + dataCapacity);
    }
    this.dataPointer = dataPointer;
    this.dataCapacity = dataCapacity;
    this.dataCount = 0;
  }
  protected int getIndexPosition() {
    return indexPosition;
  }
  protected void setIndexPosition(int indexPosition) {
    this.indexPosition = indexPosition;
  }
  protected int getDataCapacity() {
    return dataCapacity;
  }
  protected int getFreeSpace() {
    return dataCapacity - dataCount;
  }
  protected void read(DataInput in) throws IOException {
    dataPointer = in.readLong();
    dataCapacity = in.readInt();
    dataCount = in.readInt();
  }
  protected void write(DataOutput out) throws IOException {
    out.writeLong(dataPointer);
    out.writeInt(dataCapacity);
    out.writeInt(dataCount);
  }
  protected static RecordHeader readHeader(DataInput in) throws IOException {
    RecordHeader r = new RecordHeader();
    r.read(in);
    return r;
  }
  /**
   * Returns a new record header which occupies the free space of this record.
   * Shrinks this record size by the size of its free space.
   */
  protected RecordHeader split() throws RecordsFileException {
    long newFp = dataPointer + (long)dataCount;
    RecordHeader newRecord = new RecordHeader(newFp, getFreeSpace());
    dataCapacity = dataCount;
    return newRecord;
  }
}

Figure 3. RecordHeader

Here are the highlights of the source:

protected long dataPointer;

The dataPointer is a file pointer to the first byte of this record's data.

protected int dataCount;

The dataCount is used to keep track of the actual number of bytes of data held in this record. This is the number of valid bytes written and should not be confused with the dataCapacity.

protected int dataCapacity;  

The dataCapacity is the number of bytes of data this record can hold. This value will always be equal to or larger than the dataCount.

protected int indexPosition;

The index position is used to speed accesses to this header's entry in the file. Unlike the other fields in this class, this value is stored only in-memory and is not written to the file as part of the record header.

protected void write(DataOutput out) throws IOException {
   out.writeLong(dataPointer);
   out.writeInt(dataCapacity);
   out.writeInt(dataCount);
}

The write() method writes the header to the file.

protected void read(DataInput in) throws IOException {
   dataPointer = in.readLong();
   dataCapacity = in.readInt();
   dataCount = in.readInt();
}

This method reads and initializes the header fields from the file. Note that the fields are read in the same order in which they were written.

protected RecordHeader split() throws RecordsFileException {
   long newFp = dataPointer + (long)dataCount;
   RecordHeader newRecord = new RecordHeader(newFp, getFreeSpace());
   dataCapacity = dataCount;
   return newRecord;
}

The split() method returns a new record header, which occupies the free space of this record. It also shrinks this record's size by the size of its free space.
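To make the pointer arithmetic concrete, here is a stripped-down mirror of the split logic. It is an illustrative stand-in, not the article's RecordHeader itself:

```java
// Minimal mirror of RecordHeader's split arithmetic, for illustration only.
public class SplitDemo {
  long dataPointer;
  int dataCount;
  int dataCapacity;

  SplitDemo(long dataPointer, int dataCapacity) {
    this.dataPointer = dataPointer;
    this.dataCapacity = dataCapacity;
  }

  int getFreeSpace() { return dataCapacity - dataCount; }

  // The new header starts where this record's valid data ends and takes
  // over all the free space; this header then shrinks to its data count.
  SplitDemo split() {
    SplitDemo newRecord = new SplitDemo(dataPointer + (long) dataCount, getFreeSpace());
    dataCapacity = dataCount;
    return newRecord;
  }

  public static void main(String[] args) {
    // A record at byte 100 holding 60 valid bytes in a 100-byte slot.
    SplitDemo rec = new SplitDemo(100L, 100);
    rec.dataCount = 60;
    SplitDemo freed = rec.split();
    System.out.println(freed.dataPointer);  // 160: right after the valid data
    System.out.println(freed.dataCapacity); // 40: the old record's free space
    System.out.println(rec.dataCapacity);   // 60: shrunk to its data count
  }
}
```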

Class RecordWriter

RecordWriter is used to construct a new record and write it to the file as a byte buffer. To accomplish this, the user writes data into a DbByteArrayOutputStream, which is in turn written to the file. This design protects the file from being corrupted by the user, since the user is never allowed to write directly to the file. It also improves concurrency, since the user can build a record (often a time-consuming step due to object serialization) without blocking other users from reading and writing records.

package hamner.db;
import java.io.*;
public class RecordWriter {
  String key;
  DbByteArrayOutputStream out;
  ObjectOutputStream objOut;
  public RecordWriter(String key) {
    this.key = key;
    out = new DbByteArrayOutputStream();
  }
  public String getKey() {
    return key;
  }
  public OutputStream getOutputStream() {
    return out;
  }
  public ObjectOutputStream getObjectOutputStream() throws IOException {
    if (objOut == null) {
      objOut = new ObjectOutputStream(out);
    }
    return objOut;
  }
  public void writeObject(Object o) throws IOException {
    getObjectOutputStream().writeObject(o);
    getObjectOutputStream().flush();
  }
  /**
   * Returns the number of bytes in the data.
   */
  public int getDataLength() {
    return out.size();
  }
  /**
   *  Writes the data out to the stream without re-allocating the buffer.
   */
  public void writeTo(DataOutput str) throws IOException {
    out.writeTo(str);
  }
}

Figure 4. RecordWriter

Here are the code highlights for this class:

String key;
DbByteArrayOutputStream out;
ObjectOutputStream objOut;
public RecordWriter(String key) {
   this.key = key;
   out = new DbByteArrayOutputStream();
}

The constructor sets the key field and then creates a DbByteArrayOutputStream, which is used to buffer the data written to the record.
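Record data need not be serialized objects; any stream can be layered on this buffer. The following sketch writes typed fields through a DataOutputStream, using a plain ByteArrayOutputStream as a stand-in for the buffer a getOutputStream() call would return:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class PrimitiveFieldsDemo {
  public static void main(String[] args) throws IOException {
    // Stand-in for the record's buffered output stream: any OutputStream works.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream fields = new DataOutputStream(buffer);
    fields.writeInt(7);       // 4 bytes
    fields.writeLong(1234L);  // 8 bytes
    fields.writeUTF("hello"); // 2-byte length prefix + 5 bytes
    fields.flush();
    System.out.println(buffer.size()); // 19 bytes total
  }
}
```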

public ObjectOutputStream getObjectOutputStream() throws IOException {
   if (objOut == null) {
      objOut = new ObjectOutputStream(out);
   }
   return objOut;
}

This method returns an ObjectOutputStream that can be used to store serialized objects in the record.

public void writeObject(Object o) throws IOException {
   getObjectOutputStream().writeObject(o);
   getObjectOutputStream().flush();
}

This utility method serializes an object to the record.

public int getDataLength() {
   return out.size();
}

This method returns the number of bytes of data in this record. Note that it returns the correct value only if all buffered streams built on top of the output stream have been flushed.
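The effect of an unflushed buffer is easy to demonstrate with any buffered stream. In this sketch, a BufferedOutputStream stands in for a stream a user might layer on the record's output stream:

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class FlushDemo {
  public static void main(String[] args) throws IOException {
    ByteArrayOutputStream record = new ByteArrayOutputStream();
    // A buffered stream holds bytes back until it is flushed,
    // so the record's size() lags behind what has been written.
    BufferedOutputStream buffered = new BufferedOutputStream(record, 8192);
    buffered.write(new byte[10]);
    System.out.println(record.size()); // 0 -- bytes still sit in the buffer
    buffered.flush();
    System.out.println(record.size()); // 10 -- now the record sees them
  }
}
```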

public void writeTo(DataOutput str) throws IOException {
   out.writeTo(str);
}

This method writes the record data to the DataOutput stream. It uses the writeTo() method of class DbByteArrayOutputStream.

Class RecordReader

A RecordReader is used to read data from the record. It is a utility class that makes record data more accessible than it is in its raw byte buffer form.

package hamner.db;
import java.io.*;
public class RecordReader {
  String key;
  byte[] data;
  ByteArrayInputStream in;
  ObjectInputStream objIn;
  public RecordReader(String key, byte[] data) {
    this.key = key;
    this.data = data;
    in = new ByteArrayInputStream(data);
  }
  public String getKey() {
    return key;
  }
  public byte[] getData() {
    return data;
  }
  public InputStream getInputStream() throws IOException {
    return in;
  }
  public ObjectInputStream getObjectInputStream() throws IOException {
    if (objIn == null) {
      objIn = new ObjectInputStream(in);
    }
    return objIn;
  }
  /**
   * Reads the next object in the record using an ObjectInputStream.
   */
  public Object readObject() throws IOException, OptionalDataException, ClassNotFoundException {
    return getObjectInputStream().readObject();
  }
}

Figure 5. RecordReader

Here's a look at the code highlights:

String key;
byte[] data;
ByteArrayInputStream in;
ObjectInputStream objIn;

The RecordReader has four instance variables:

  • The record's key

  • A byte buffer that holds the record data

  • A ByteArrayInputStream for reading the record data

  • An ObjectInputStream used for reading serialized objects from the record

public RecordReader(String key, byte[] data) {
   this.key = key;
   this.data = data;
   in = new ByteArrayInputStream(data);
}

The constructor initializes the key and data fields and then creates a ByteArrayInputStream from the data buffer.

public InputStream getInputStream() throws IOException {
   return in;
}

This method returns an input stream for reading fields from the record.
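If the record's fields were written as primitives, this stream can be wrapped in a DataInputStream and the fields read back in the order they were written. This sketch builds the record buffer with plain java.io classes rather than the article's wrapper classes:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ReadFieldsDemo {
  public static void main(String[] args) throws IOException {
    // Build a record buffer the way a RecordWriter-style class might.
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buffer);
    out.writeInt(7);
    out.writeUTF("hello");
    out.flush();

    // Stand-in for the reader's input stream: wrap it for typed reads,
    // consuming fields in exactly the order they were written.
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    System.out.println(in.readInt()); // 7
    System.out.println(in.readUTF()); // hello
  }
}
```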

public ObjectInputStream getObjectInputStream() throws IOException {
   if (objIn == null) {
      objIn = new ObjectInputStream(in);
   }
   return objIn;
}

This method returns an ObjectInputStream for reading serialized objects. The user should use either this method or the getInputStream() method for reading data -- mixed access will produce undefined results.

public Object readObject() throws IOException, OptionalDataException, ClassNotFoundException {
   return getObjectInputStream().readObject();
}

This is a utility method to read the next serialized object from the objIn stream.

Class DbByteArrayOutputStream

DbByteArrayOutputStream is a small utility class that extends ByteArrayOutputStream in order to provide a way to write the buffer to a DataOutput without reallocating it.

package hamner.db;
import java.io.*;
/**
 * Extends ByteArrayOutputStream to provide a way of writing the buffer to
 * a DataOutput without re-allocating it.
 */
public class DbByteArrayOutputStream extends ByteArrayOutputStream {
  public DbByteArrayOutputStream() {
    super();
  }
  public DbByteArrayOutputStream(int size) {
    super(size);
  }
  /**
   * Writes the full contents of the buffer to a DataOutput stream.
   */
  public synchronized void writeTo(DataOutput dstr) throws IOException {
    byte[] data = super.buf;
    int l = super.size();
    dstr.write(data, 0, l);
  }
}

Figure 6. DbByteArrayOutputStream
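The copy being avoided here is real: ByteArrayOutputStream.toByteArray() allocates a fresh array on every call, which writeTo() sidesteps by handing the internal buffer directly to the DataOutput. A quick demonstration of the copying behavior:

```java
import java.io.ByteArrayOutputStream;

public class CopyDemo {
  public static void main(String[] args) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    out.write(42);
    // Each call to toByteArray() returns a brand-new copy of the buffer.
    byte[] first = out.toByteArray();
    byte[] second = out.toByteArray();
    System.out.println(first != second);       // true: two distinct arrays
    System.out.println(first[0] == second[0]); // true: identical contents
  }
}
```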

Class RecordsFileException

RecordsFileException is a simple exception class that indicates that an error has occurred while performing an operation on the records file.

package hamner.db;
public class RecordsFileException extends Exception {
  public RecordsFileException(String msg) {
    super(msg);
  }
}

Figure 7. RecordsFileException

Using the RecordsFile class

The following code creates a new RecordsFile and inserts a test record consisting of a serialized Date. The records file is created with an initial index capacity of 64 records. We can use this code in any application that needs to persistently store the last time a particular resource was accessed.

RecordsFile recordsFile = new RecordsFile("testDatabase.jdb", 64);
RecordWriter rw = new RecordWriter("foo.lastAccessTime");
rw.writeObject(new Date());
recordsFile.insertRecord(rw);

Later in our application, when we wish to retrieve the record, the code would be as follows:

RecordReader rr = recordsFile.readRecord("foo.lastAccessTime");
Date d = (Date)rr.readObject();
System.out.println("last access was at: " + d.toString());

With this small amount of code, we have accomplished a rather important task: we have inserted an arbitrarily sized object into our file, and then retrieved it again.

When a new access to the resource takes place at a later time, we can update the contents of the record as follows:

RecordWriter rw = new RecordWriter("foo.lastAccessTime");
rw.writeObject(new Date());
recordsFile.updateRecord(rw);

Finally, if we need to remove the record from the database, we can use the deleteRecord() method:

recordsFile.deleteRecord("foo.lastAccessTime");

To see this code in action, check out the TestRecords class provided in the source distribution for this example.

Possible additions and optimizations

You could improve on our example in several ways. To improve performance, you could restructure the operations that depend on iterating over the entire in-memory index by keeping a data structure that holds pointers to the free space in the record heap. Another optimization would be to add a "quick insert" that always appends the record to the end of the file. This approach would be useful when an entire database is being constructed from scratch; that is, when there is never free space between record data.

Other improvements to reduce the number of file accesses and, hence, improve performance include caching recently accessed records, and using a buffer to load and store index entries as a single unit, rather than calling separate reads and writes for each field in the entry.
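As one sketch of such a record cache, a java.util.LinkedHashMap in access order (a class added to the JDK after this article was written) yields a compact LRU policy; the class name and capacity below are illustrative, not part of the article's source:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch: an LRU cache for recently read record data.
// Keys are record keys; values are the raw record byte buffers.
public class RecordCache extends LinkedHashMap<String, byte[]> {
  private final int maxEntries;

  public RecordCache(int maxEntries) {
    // accessOrder = true orders entries least-recently-used first.
    super(16, 0.75f, true);
    this.maxEntries = maxEntries;
  }

  @Override
  protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
    // Evict the least-recently-used entry once the cache is over capacity.
    return size() > maxEntries;
  }

  public static void main(String[] args) {
    RecordCache cache = new RecordCache(2);
    cache.put("a", new byte[1]);
    cache.put("b", new byte[2]);
    cache.get("a");              // touch "a" so "b" becomes eldest
    cache.put("c", new byte[3]); // evicts "b"
    System.out.println(cache.containsKey("a")); // true
    System.out.println(cache.containsKey("b")); // false -- evicted
  }
}
```

A RecordsFile subclass could consult such a cache in readRecord() before touching the file at all.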

Conclusion

This example has shown that we can design and implement our own low-level file format to store records. Although it was designed as a means of illustrating the uses of the RandomAccessFile class, it has practical applications in a wide range of programming situations.

Here are a few possible examples that come to mind:

  • A persistent data cache: A RecordsFile could be used to cache arbitrary data in a file for applications such as Web browsers, Web servers, and proxy servers, among others.

  • A "properties" file for arbitrary objects: The Java properties file format, provided by class java.util.Properties is a very useful means of storing and retrieving name/value pairs. This example provides the same basic functionality, but allows the user to store binary data.

  • Any application where the data model is too simple to justify the use of a heavyweight database API such as JDBC.

Derek Hamner has extensive experience designing and implementing collaborative systems in Java. He is president of CSR Inc., a North Carolina startup that creates Internet classroom applications for distance learning and corporate training. Derek is also coauthor of Java Network Programming, Second Edition, which covers many related networking and I/O issues, including the Java 2 platform (formerly known as JDK 1.2), and is due for release January 1999.

Learn more about this topic

  • Download the complete source for this article as a zip file http://www.javaworld.com/jw-01-1999/step/jw-01-step.zip
  • Download the modified BaseRecordsFile.java http://www.javaworld.com/jw-01-1999/step/BaseRecordsFile.java
  • View the API documentation for class java.io.RandomAccessFile http://java.sun.com/products/jdk/1.2/docs/api/java/io/RandomAccessFile.html
  • Download the Java 2 platform (JDK 1.2) http://www.javasoft.com/products/jdk/1.2/
  • Read this primer on I/O streams in Java http://developer.java.sun.com/developer/technicalArticles/iostreams/
  • Read this intro to I/O from Sun's Java Tutorial http://www.javasoft.com/docs/books/tutorial/essential/io/index.html
  • Read all the other Step by Step columns http://www.javaworld.com/topicalindex/jw-ti-step.html