The Lucene search engine: Powerful, flexible, and free

Easily add searching to your application with Lucene

Don't let the low version number -- 0.04 as of August 2000 -- fool you. The Lucene search engine is a robust, powerful, and flexible search toolkit, ready to tackle many common search problems. And since it's now available under the more flexible LGPL open source license, the price (free!) is right too.

Doug Cutting, an experienced developer of text-search and retrieval tools, created Lucene. Cutting is the primary author of the V-Twin search engine (part of Apple's Copland operating system effort) and is currently a senior architect at Excite. He designed Lucene to make it easy to add indexing and search capability to a broad range of applications, including:

  • Searchable email: An email application could let users search archived messages and add new messages to the index as they arrive.
  • Online documentation search: A documentation reader -- CD-based, Web-based, or embedded within the application -- could let users search online documentation or archived publications.
  • Searchable Webpages: A Web browser or proxy server could build a personal search engine to index every Webpage a user has visited, allowing users to easily revisit pages.
  • Website search: A CGI program could let users search your Website.
  • Content search: An application could let the user search saved documents for specific content; this could be integrated into the Open Document dialog.
  • Version control and content management: A document management system could index documents, or document versions, so they can be easily retrieved.
  • News and wire service feeds: A news server or relay could index articles as they arrive.


Of course, many search engines could perform most of those functions, but few open source search tools offer Lucene's ease of use, rapid implementation, and flexibility.

I first used Lucene when developing Eyebrowse, an open source Java-based tool for cataloguing and browsing mailing lists. (See Resources for a link.) A core requirement for Eyebrowse was flexible message search and retrieval capability. It demanded an indexing and search component that would efficiently update the index base as new messages arrived, allow multiple users to search and update the index base concurrently, and scale to archives containing millions of messages.

Every other open source search engine I evaluated, including Swish-E, Glimpse, iSearch, and libibex, was poorly suited to Eyebrowse's requirements in some way, which would have made integration problematic, time-consuming, or both. With Lucene, I added indexing and searching to Eyebrowse in little more than half a day, from initial download to fully working code! That was less than one-tenth of the development time I had budgeted, and it yielded a more tightly integrated and feature-rich result than any other search tool I considered.

How search engines work

Creating and maintaining an inverted index, a structure that maps each word to the documents containing it, is the central problem when building an efficient keyword search engine. To index a document, you must first scan it to produce a list of postings. Postings describe occurrences of a word in a document; they generally include the word, a document ID, and possibly the location(s) or frequency of the word within the document.
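
To make the idea of postings concrete, here is a minimal sketch in plain Java of scanning documents into postings and collecting them in an inverted index. It is not Lucene's API or its on-disk format; the class name InvertedIndexSketch, the Posting fields, and the naive tokenizer (lowercase, split on anything that is not a letter or digit) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: plain Java, not Lucene's API or index format.
final class InvertedIndexSketch {

    // One posting: the document a word occurs in and the word positions there.
    static final class Posting {
        final int docId;
        final List<Integer> positions = new ArrayList<>();

        Posting(int docId) {
            this.docId = docId;
        }

        // Frequency of the word within this document.
        int frequency() {
            return positions.size();
        }
    }

    // The inverted index: word -> one posting per document containing that word.
    private final Map<String, List<Posting>> postingsByWord = new TreeMap<>();

    // Scan a document and record a posting for each distinct word it contains.
    void addDocument(int docId, String text) {
        // Hypothetical tokenizer: lowercase, split on non-alphanumeric runs.
        String[] tokens = text.toLowerCase().split("[^a-z0-9]+");
        Map<String, Posting> perDocument = new LinkedHashMap<>();
        int position = 0;
        for (String token : tokens) {
            if (token.isEmpty()) {
                continue; // split() yields an empty first token on leading punctuation
            }
            perDocument.computeIfAbsent(token, w -> new Posting(docId))
                       .positions.add(position);
            position++;
        }
        // Merge this document's postings into the index.
        for (Map.Entry<String, Posting> entry : perDocument.entrySet()) {
            postingsByWord.computeIfAbsent(entry.getKey(), w -> new ArrayList<>())
                          .add(entry.getValue());
        }
    }

    // Return the postings for a word, or an empty list if it was never indexed.
    List<Posting> lookup(String word) {
        List<Posting> postings = postingsByWord.get(word.toLowerCase());
        return postings == null ? Collections.<Posting>emptyList() : postings;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(1, "Lucene is a search toolkit");
        index.addDocument(2, "Search engines search an inverted index");
        for (Posting p : index.lookup("search")) {
            System.out.println("doc " + p.docId + ", frequency " + p.frequency()
                    + ", positions " + p.positions);
        }
    }
}
```

A keyword query then reduces to looking up the word and reading off its postings, rather than rescanning every document. A production engine would also need a real tokenizer, persistent storage, and a way to update the index efficiently as documents arrive, none of which this sketch attempts.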

Resources