Introduction to Hibernate Search

Bring the power of Lucene to your database-backed applications

Page 2 of 7

Hibernate Search

Several factors make Hibernate and Lucene a natural fit. Both provide CRUD access to the underlying data storage. Both define an elementary unit of data: the Entity (persistence model class) in Hibernate/JPA, and the Document in Lucene. And the same programming concepts appear in both: deferred commit, filters, query expressions, and a query API are examples. To enable batch updates for better performance, the Hibernate/JPA persistence context defines a flush() method that synchronizes cached data changes with the back-end database. The close() method of the Lucene IndexWriter class plays a similar role: index changes buffered in memory are written to the index storage when the writer is closed. In both cases, deferring synchronization between memory and storage reduces I/O and network round trips.
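The deferred-write idea shared by the two libraries can be sketched in plain Java. The BufferedStore class below is hypothetical and uses neither the Hibernate nor the Lucene API; it only illustrates how changes can be collected in memory and pushed to storage in one batch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: this is neither Hibernate's nor Lucene's API.
class BufferedStore {
    private final List<String> pending = new ArrayList<>(); // changes held in memory
    private final List<String> storage = new ArrayList<>(); // stands in for the database or index
    private int writeCount = 0;                             // number of batch writes performed

    // Record a change without touching storage.
    void add(String record) {
        pending.add(record);
    }

    // Push all pending changes to storage in a single batch.
    void flush() {
        if (pending.isEmpty()) {
            return; // nothing to synchronize
        }
        storage.addAll(pending);
        pending.clear();
        writeCount++;
    }

    // Defensive copy of what has reached storage so far.
    List<String> stored() {
        return new ArrayList<>(storage);
    }

    int writeCount() {
        return writeCount;
    }
}

class BufferedStoreDemo {
    public static void main(String[] args) {
        BufferedStore store = new BufferedStore();
        store.add("resume-1");
        store.add("resume-2");
        // Both records are still only in memory at this point.
        System.out.println("stored before flush: " + store.stored().size());
        store.flush();
        System.out.println("stored after flush: " + store.stored().size()
                + ", batch writes: " + store.writeCount());
    }
}
```

The two add() calls cost one write to storage instead of two, which is the saving that a deferred flush or commit aims for.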

Despite these similarities, the differences between Hibernate/JPA and Lucene are just as clear. Hibernate/JPA promotes domain model-oriented programming: it encourages developers to build a rich domain-object graph that represents the complexity of a real-world business through object association, inheritance, polymorphism, composition, and collections. Lucene, by contrast, deals with a single, built-in data model, the Document class, which is too simple to describe those complex relationships.

The Hibernate team has recently introduced Hibernate Search as a higher-level, universal API that combines the strengths of both Hibernate/JPA and Lucene. Hibernate Search is an independent offering from the Hibernate team, and you must download it from the Hibernate Web site separately from the main Hibernate package. By mapping the application-specific persistence model classes to the Lucene Document class, Hibernate Search brings the power of Lucene full-text search to the persistence domain objects managed by Hibernate/JPA. The same persistence context (Hibernate Session/JPA EntityManager) is used for both domain-object persistence and Lucene indexing. Hibernate Search wraps Lucene indexing in the transaction contexts of Hibernate/JPA, and transparently manages the lifecycle of Lucene Document objects through the event handler mechanism of Hibernate Core. When auto-indexing is enabled, indexing becomes completely transparent to developers, which makes development with Hibernate Search much easier. Note that developers still need to learn the Lucene query expression syntax and query API in order to perform full-text searches against the persistence domain objects.
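The core mapping idea, turning selected fields of a domain object into a flat, Document-like set of name/value pairs, can be sketched with nothing but the Java standard library. Everything here is an assumption made for illustration: the @Searchable annotation, the Resume class, and the DocumentMapper helper are hypothetical, and a Map stands in for the Lucene Document. Hibernate Search uses its own annotations and real Lucene classes, none of which appear in this sketch.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical marker annotation, standing in for a search-mapping annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Searchable {}

// Hypothetical domain class; only the annotated fields are mapped.
class Resume {
    @Searchable String developerName;
    @Searchable String summary;
    byte[] wordFile; // not annotated, so it is left out of the document

    Resume(String developerName, String summary, byte[] wordFile) {
        this.developerName = developerName;
        this.summary = summary;
        this.wordFile = wordFile;
    }
}

class DocumentMapper {
    // Copy each annotated, non-null field into a flat name/value map.
    // The map is a stand-in for a Lucene Document, not the real class.
    static Map<String, String> toDocument(Object entity) {
        Map<String, String> doc = new TreeMap<>(); // sorted keys for stable output
        for (Field field : entity.getClass().getDeclaredFields()) {
            if (!field.isAnnotationPresent(Searchable.class)) {
                continue;
            }
            field.setAccessible(true);
            try {
                Object value = field.get(entity);
                if (value != null) {
                    doc.put(field.getName(), value.toString());
                }
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read field " + field.getName(), e);
            }
        }
        return doc;
    }
}

class DocumentMapperDemo {
    public static void main(String[] args) {
        Resume resume = new Resume("Jane Doe", "Java, Spring, Lucene", new byte[0]);
        // Prints the two annotated fields; wordFile is not part of the map.
        System.out.println(DocumentMapper.toDocument(resume));
    }
}
```

Because the mapping is driven by metadata on the class, application code never has to build the search document by hand, which is the property that lets indexing stay out of the developer's way.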

The easiest way to understand how Hibernate Search works in practice is through a sample application. In the following sections, you'll see an application designed on top of the latest Spring 2.5 application framework with annotation-driven configurations.

The sample application

Imagine that a startup IT consulting company has asked you to design and implement an application that maintains software developers' resumes in Microsoft Word format and offers Web-based keyword search over those resume files. You can download a sample Web project that partially implements these requirements from the Resources section below. In the rest of the article, you'll walk through it and see how it works.

The sample application uses Maven 2 as its build tool and MySQL as the back-end database. You need to download and install both to build and test the application. A Maven 2 POM file in the project root folder declares all of the application's external dependencies, including Hibernate, Hibernate EntityManager, Hibernate Search, Hibernate Annotations, the JPA interfaces, Lucene, Spring, and Apache POI. If you are using the Eclipse IDE, the Maven Eclipse plugin (invoked with mvn eclipse:eclipse) creates the Eclipse project files, so that the unzipped folder structure can be imported into the IDE as an already configured Web project. Running the plugin also triggers the download of the external JAR dependencies into the local Maven repository, which the project references. Some of the dependencies have to be installed manually into the local repository, as they are not yet available in the remote repository specified in the POM.
