

Use search engine technology for object persistence

How a seemingly unrelated technology can help solve some typical problems

Java developers are often required to provide a simple persistence mechanism for their Java classes. Many of the problems related to that task fall into a gray area where property-file-based persistence is simply not enough, but database-oriented persistence (and the related object-relational mapping) is definitely overkill.

The typical solution is to create a simple datastore using object serialization or XML binding. Attractive as it is, this solution often does not scale well enough to handle even a few thousand objects, especially in terms of providing decent search performance.

So how would one create a datastore capable of persisting a significant quantity of JavaBeans, and provide speedy search and retrieval of those objects without resorting to a relational database or complex memory caching schemes?

In this article, I show you how to approach this problem from a different angle: by treating individual objects as documents being indexed by an Internet search engine. I demonstrate how to index individual attributes of standard JavaBeans using a popular third-party indexing/searching library and how to quickly retrieve those attributes from storage. The usual database API consisting of find, retrieve, store, and delete methods is provided.

As an example of a real-world application of this approach, I describe a Unix-like permission system (users and groups) for plain Java objects.

The API

Essentially, we are developing a library—something other programmers will hopefully use in their own projects. A good library starts with a good interface. It is imperative to design the interface (or interfaces) first, before a single line of "real" code is written. So what do we want our library to do?

Obviously, we want it to store and retrieve Java objects. Keep in mind that our objects will actually be persisted on disk and retrieved later, long after the program that created them is gone. We need something that helps us differentiate one object instance from another. Thus, all objects that flow through our library must have a unique object ID. It would also be convenient if we could immediately identify whether an object is compatible with our library. A simple way to achieve both goals is to have all storable objects implement this interface:

public interface StorableInterface {
    public String getObjectId();
    public void setObjectId(String id);
}
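As an illustration only, a JavaBean satisfying this contract might look like the following. The `Person` class and its `firstName`/`lastName` properties are hypothetical, and assigning a random `java.util.UUID` in the constructor is just one assumed way of guaranteeing a unique ID. The interface is repeated so the example compiles on its own.

```java
import java.util.UUID;

interface StorableInterface {
    String getObjectId();
    void setObjectId(String id);
}

// Hypothetical example bean; any class with getters and setters could play this role.
class Person implements StorableInterface {
    private String objectId;
    private String firstName;
    private String lastName;

    public Person() {
        // Assumption: every new instance receives a unique ID up front.
        this.objectId = UUID.randomUUID().toString();
    }

    public String getObjectId() { return objectId; }
    public void setObjectId(String id) { this.objectId = id; }

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}
```

Because the ID is an ordinary bean property, it can be stored and indexed just like `firstName` or `lastName`.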



Now, let's think about what our actual storage service will do. We use the following interface to define it:

public void store(StorableInterface object) throws StorageException;
public StorableInterface retrieve(String objectId) throws StorageException;
public StorableInterface[] find(String key, String value) throws StorageException;
public StorableInterface[] find(Class clazz) throws StorageException;
public StorableInterface[] find(String query) throws StorageException;
public void delete(String objectId) throws StorageException;
public void delete(String key, String value) throws StorageException;



As you can see, we provide the methods to:

  • Store a Storable object to disk.
  • Retrieve an object given its objectId.
  • Return an array of storable objects matching a given property name (key) and value (so we can execute queries like "firstName" = "Joe").
  • Return all objects of a given class.
  • Return all objects matching some free-form query. (We are cheating a little bit with this one. For now, we are just talking about queries like "firstName" = "Joe" and "lastName" = "Smith". If we later implement more complex functionality, a free-form query will accommodate it as well.)
  • Delete an object by objectId.
  • Delete several objects matching a given property name and value.
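
One way to handle the simple free-form queries described above is to reduce them to key/value pairs before they reach the search engine. The sketch below assumes a hypothetical grammar of `"key" = "value"` clauses joined by the word `and`. The `SimpleQueryParser` class and its error handling are illustrative, not part of the library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for queries such as: "firstName" = "Joe" and "lastName" = "Smith"
final class SimpleQueryParser {

    // One clause: a quoted key, an equals sign, a quoted value, with optional spaces.
    private static final Pattern CLAUSE =
            Pattern.compile("\\s*\"([^\"]+)\"\\s*=\\s*\"([^\"]*)\"\\s*");

    /** Returns the clauses as {key, value} pairs, in query order. */
    static List<String[]> parse(String query) {
        List<String[]> clauses = new ArrayList<>();
        // Naive split on the whole word "and": a value containing that word would break it.
        for (String part : query.split("\\band\\b")) {
            Matcher m = CLAUSE.matcher(part);
            if (!m.matches()) {
                throw new IllegalArgumentException("Cannot parse clause: " + part.trim());
            }
            clauses.add(new String[] { m.group(1), m.group(2) });
        }
        return clauses;
    }
}
```

Each resulting pair corresponds to one `find(key, value)` lookup; a conjunction is then the intersection of those results.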


We wrap (and possibly log) any exception thrown by the meat of our methods and rethrow it as a custom StorageException.
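A minimal sketch of this wrap-and-rethrow pattern follows, assuming `StorageException` is a checked exception that keeps the original cause. The `StorageSupport.loadRaw` helper, the one-file-per-object layout, and the use of `java.util.logging` are hypothetical stand-ins for whatever I/O and logging a real implementation performs.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.logging.Level;
import java.util.logging.Logger;

// Assumed shape of the custom exception: checked, and it preserves the underlying cause.
class StorageException extends Exception {
    StorageException(String message, Throwable cause) {
        super(message, cause);
    }
}

class StorageSupport {
    private static final Logger LOG = Logger.getLogger(StorageSupport.class.getName());

    // Hypothetical helper: reads the raw bytes stored for one object ID.
    static byte[] loadRaw(Path baseDir, String objectId) throws StorageException {
        try {
            return Files.readAllBytes(baseDir.resolve(objectId));
        } catch (IOException e) {
            // Log the low-level failure, then rethrow it as the library's own exception type.
            LOG.log(Level.WARNING, "Could not load object " + objectId, e);
            throw new StorageException("Could not load object " + objectId, e);
        }
    }
}
```

Keeping the cause lets callers catch a single exception type while still being able to inspect the underlying `IOException` when they need to.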

This API is simple but powerful. Many applications can benefit from having it available. Now let's see how we can actually implement such a service using search engine technology.

Lucene to the rescue

Fundamentally, the problem at hand is the storage of large quantities of arbitrary information and its subsequent retrieval. A technology has emerged to address exactly this need: the search engine. Probably one of the most visible aspects of today's computing is the remarkable interaction between a user and an Internet search engine such as Google. Suddenly, mountains of data are at your fingertips. The advance of search technology has certainly captured the mindshare of software developers, and numerous solutions have appeared for adding Internet-like search capabilities to everyday applications.

One of the most mature, successful, and celebrated search engine toolkits available to today's Java programmer is Jakarta Lucene. According to its Website, "Lucene is a high-performance, full-featured text search engine library written entirely in Java." I have used Lucene on several occasions and am continually amazed at the speed and accuracy of the implementation. Lucene deserves articles (and books!) of its own, so I won't discuss the details of its use here. Suffice it to say that Lucene allows programmers to index arbitrary content and later find and retrieve references to it.
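To make the idea concrete without depending on any particular Lucene version, here is a toy inverted index built only on the JDK. It is not Lucene and does none of its text analysis or ranking; it only shows the core structure a search engine relies on: each `name:value` term points to the set of object IDs that contain it, so a lookup is a map access rather than a scan of every stored object. `TinyIndex` and its methods are hypothetical names.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy inverted index: term ("field:value") -> IDs of the objects containing that term.
class TinyIndex {
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Index every field of one object under that object's ID.
    // Toy simplification: field name and value are joined into a single string key.
    void add(String objectId, Map<String, String> fields) {
        for (Map.Entry<String, String> f : fields.entrySet()) {
            String term = f.getKey() + ":" + f.getValue();
            postings.computeIfAbsent(term, t -> new HashSet<>()).add(objectId);
        }
    }

    // Exact-match lookup, analogous to find(key, value) in the API above.
    Set<String> find(String field, String value) {
        Set<String> ids = postings.get(field + ":" + value);
        return ids == null ? Collections.<String>emptySet() : Collections.unmodifiableSet(ids);
    }

    // Remove an object from every posting list, then drop lists that became empty.
    void remove(String objectId) {
        postings.values().forEach(ids -> ids.remove(objectId));
        postings.values().removeIf(Set::isEmpty);
    }
}
```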

If you want to index something with Lucene, first you need to create an IndexWriter:

// true = create a new index, overwriting any existing one at that path
IndexWriter writer = new IndexWriter("where_our_index_is_stored", new StandardAnalyzer(), true);



Then you need to create an instance of the org.apache.lucene.document.Document class, which represents something you want to index and search for later, be it a Webpage, a text document, or anything else:

 Document doc = new Document();



You populate your Document by adding some Fields to it. A Field represents a property of a Document being indexed. When creating a Field, you must supply a name and a value:

Field field = new Field("name", "value", true, true, true);
doc.add(field);
writer.addDocument(doc);



(Note: The Boolean values at the end of the constructor call control how Lucene treats our Field, but they are not relevant to our discussion.)
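Since each Field is just a name/value pair, a JavaBean maps onto a Document naturally: one Field per readable property. The JDK-only sketch below uses `java.beans.Introspector` to collect those pairs. The `BeanFlattener` name, the choice to skip null values, and the conversion via `String.valueOf` are assumptions; each resulting entry could then become one Lucene `Field`, as in the constructor call above.

```java
import java.beans.BeanInfo;
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: turns a bean's readable properties into name/value strings.
final class BeanFlattener {
    static Map<String, String> flatten(Object bean)
            throws IntrospectionException, IllegalAccessException, InvocationTargetException {
        Map<String, String> fields = new LinkedHashMap<>();
        // Stop at Object.class so the inherited getClass() is not reported as a "class" property.
        BeanInfo info = Introspector.getBeanInfo(bean.getClass(), Object.class);
        for (PropertyDescriptor pd : info.getPropertyDescriptors()) {
            Method getter = pd.getReadMethod();
            if (getter == null) {
                continue; // write-only property: nothing to index
            }
            Object value = getter.invoke(bean);
            if (value != null) {
                fields.put(pd.getName(), String.valueOf(value));
            }
        }
        return fields;
    }
}
```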

At this point, we are ready to search Lucene for our documents. The following simple code snippet illustrates the process:



