|
|
Optimize with a SATA RAID Storage Solution
Range of capacities as low as $1250 per TB. Ideal if you currently rely on servers/disks/JBODs
Doug Cutting, an experienced developer of text-search and retrieval tools, created Lucene. Cutting is the primary author of the V-Twin search engine (part of Apple's Copland operating system effort) and is currently a senior architect at Excite. He designed Lucene to make it easy to add indexing and search capability to a broad range of applications, including:
Of course, many search engines could perform most of those functions, but few open source search tools offer Lucene's ease of use, rapid implementation, and flexibility.
I first used Lucene when developing Eyebrowse, an open source Java-based tool for cataloguing and browsing mailing lists. (See Resources for a link.) A core requirement for Eyebrowse was flexible message search and retrieval capability. It demanded an indexing and search component that would efficiently update the index base as new messages arrived, allow multiple users to search and update the index base concurrently, and scale to archives containing millions of messages.
Every other open source search engine I evaluated, including Swish-E, Glimpse, iSearch, and libibex, was poorly suited to Eyebrowse's requirements in some way. This would have made integration problematic and/or time-consuming. With Lucene, I added indexing and searching to Eyebrowse in little more than half a day, from initial download to fully working code! This was less than one-tenth of the development time I had budgeted, and yielded a more tightly integrated and feature-rich result than any other search tool I considered.
Creating and maintaining an inverted index is the central problem when building an efficient keyword search engine. To index a document, you must first scan it to produce a list of postings. Postings describe occurrences of a word in a document; they generally include the word, a document ID, and possibly the location(s) or frequency of the word within the document.