Many Web applications exist to provide access to copious amounts of data stored in a relational database, but what's the easiest way to enable users to search through that data and find what they need? In this article, Dr. Xinyu Liu introduces Hibernate Search, which integrates the sophisticated search capabilities of Lucene with the familiar object-relational mapping framework of Hibernate.
Apache Lucene is a high-performance, extensible full-text search-engine library written in Java. At first, it may not be obvious why you'd need such a thing -- after all, your data is nicely filed away in a decent relational database. While an RDBMS can do a great job of providing transactional CRUD operations on data stored in a relational model, search functions defined in SQL are not always capable of meeting both the functional and non-functional requirements of your projects. There are a number of query types that RDBMSs in general do not support without vendor extensions:
- Fuzzy queries, in which "fuzzy" and "wuzzy" are considered matches
- Word stemming queries, which consider "take," "took," and "taken" to be identical
- Sound-like queries, which consider "cat" and "kat" to be identical
- Synonym queries, which consider "jump," "hop," and "leap" to be identical
- Queries on binary BLOB data types, such as PDF documents, Microsoft Word or Excel documents, or HTML and XML documents
More disappointingly, SQL search results are not ranked by match-relevance scores. The SQL standard is simply not intended for full-text querying.
Lucene's search capabilities, on the other hand, are far broader. Lucene handles all the queries just mentioned, and more; it also allows you to find text documents similar to other documents through its advanced term-vector query. For instance, you could search the content of a number of books to find one with content similar to that of Hibernate in Action. The analyzer architecture in Lucene leverages Java's built-in internationalization and localization capabilities, which makes full-text queries available for many languages. Lucene delivers outstanding performance through techniques such as the inverted index, which maps each term to the documents that contain it. The Apache Lucene Web site features a list of performance benchmarks that demonstrate how well Lucene performs and scales.
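To make the idea of a fuzzy match concrete, here is a minimal plain-Java sketch of edit-distance matching, the general idea behind fuzzy queries. It is not Lucene's implementation and uses no Lucene classes; the class name FuzzyMatchSketch, the method names, and the maxEdits threshold are all hypothetical and exist only for illustration.

```java
import java.util.Locale;

// Illustrative only: a plain-Java edit-distance check, not Lucene's own code.
final class FuzzyMatchSketch {

    /** Levenshtein distance: the minimum number of single-character edits turning a into b. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // building the first j characters of b from "" takes j insertions
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // reducing the first i characters of a to "" takes i deletions
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of allocating a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Hypothetical helper: two terms match if they are within maxEdits edits of each other. */
    static boolean fuzzyMatches(String term, String candidate, int maxEdits) {
        return editDistance(term.toLowerCase(Locale.ROOT), candidate.toLowerCase(Locale.ROOT)) <= maxEdits;
    }

    public static void main(String[] args) {
        System.out.println(fuzzyMatches("fuzzy", "wuzzy", 1));  // true: one substitution apart
        System.out.println(fuzzyMatches("fuzzy", "lucene", 1)); // false: far more than one edit apart
    }
}
```

With a threshold of one edit, "fuzzy" and "wuzzy" count as a match, which is the behavior described in the list of query types above. A real search engine performs this kind of comparison against indexed terms rather than against arbitrary strings.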
Note that some database vendors do implement full-text search functions in their products as SQL extensions. These proprietary functions are fairly easy to use, but they compromise the portability of your applications at the database level. Besides, their features do not match the search experience that Lucene offers, and under extreme conditions Lucene's performance is superior.
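The inverted index mentioned earlier is the main reason term lookups stay fast as the document collection grows. The following is a rough plain-Java sketch of the data structure, assuming a naive tokenizer that lowercases text and splits on non-word characters. The class InvertedIndexSketch, its methods, and the sample titles are hypothetical; Lucene's real index format and analyzers are considerably more sophisticated.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Illustrative only: a toy inverted index, not Lucene's index format.
final class InvertedIndexSketch {

    // Maps each term to the sorted ids of the documents that contain it.
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    /** Tokenizes the text naively and records docId under every term found. */
    void add(int docId, String text) {
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Looks up one term directly in the map; no document text is scanned at query time. */
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySortedSet());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "Hibernate in Action");              // sample titles, for illustration only
        index.add(2, "Lucene in Action");
        index.add(3, "Java Persistence with Hibernate");
        System.out.println(index.search("hibernate")); // [1, 3]
        System.out.println(index.search("action"));    // [1, 2]
        System.out.println(index.search("spring"));    // []
    }
}
```

The contrast with a SQL LIKE '%term%' predicate is that the work of scanning the text happens once, at indexing time; a query then becomes a single map lookup per term.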
Hibernate and the Java Persistence API
Hibernate is a high-performance, mature object-relational mapping (ORM) library. As a non-intrusive ORM solution, Hibernate provides object query APIs for plain old Java object (POJO) persistence model classes and automatic data binding between the object and relational representations of persistent data. In essence, it lets you focus on domain model-oriented programming.
The Java Persistence API (JPA) is the standard object-relational mapping and persistence management interface defined as part of Java EE 5, the latest version of the enterprise Java specification. Largely inspired by Hibernate, JPA emerged to replace the controversial entity bean programming model. JPA has an easy-to-use POJO programming style and an object query language (JPQL). One improvement of JPA over entity beans is that you do not need an EJB 3 container to run applications that use the API, because it supports both standalone (Java SE) and container-managed (Java EE) running modes. Popular JPA providers include Apache OpenJPA and Oracle TopLink, as well as Hibernate itself, which implements the JPA specification through the add-on Hibernate Annotations and Hibernate EntityManager modules. In this article, I'll use JPA/Hibernate as shorthand for the two working together.
This article introduces Hibernate Search through a sample application written in a POJO style with the latest Spring 2.5 annotations. Before you begin, you should have basic knowledge of Spring, Hibernate/JPA, and Lucene.