As a full-text search engine, Lucene needs little introduction. Lucene, an open source project hosted by Apache, aims to produce high-performance full-text indexing and search software. The Java Lucene product itself is a high-performance, high-capacity, full-text search tool used by popular Websites such as the Wikipedia online encyclopedia and TheServerSide.com, as well as in a great many Java applications. It is a fast, reliable tool that has proved its value in countless demanding production environments.
Although Lucene is well known for its full-text indexing, many developers are less aware that it also provides powerful complementary searching, filtering, and sorting features. Indeed, many searches combine a full-text search with filters on different fields or criteria. For example, you may want to search a database of books or articles using a full-text search, with the option of limiting the results to certain types of books. Traditionally, this type of criteria-based searching has been the realm of the relational database. However, Lucene offers numerous features that let you efficiently combine full-text searches with criteria-based searches and sorts.
The first step in any Lucene application is indexing your data. Lucene builds its own set of indexes from your data so that it can perform fast full-text searching, filtering, and sorting operations on it.
This is a fairly straightforward process. First of all, you need to create an IndexWriter object, which you use to create the Lucene index and write it to disk. Lucene is very flexible, and there are many options.
Here, we will limit ourselves to creating a simple index structure in the "index" directory:
Directory directory = FSDirectory.getDirectory("index", true);
Analyzer analyser = new StandardAnalyzer();
IndexWriter writer = new IndexWriter(directory, analyser, true);
Next, you need to index your data records. Each of your records needs to be indexed individually. When you index records in Lucene, you create a Document object for each record. For full-text indexing to work, you need to give Lucene some data that it can index. The simplest option is to write a method that writes a full-text description of your record (including everything you may wish to search on) and use this value as a searchable field. Here, we call this field "description."
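As a rough illustration of such a description method, here is a minimal sketch. The Book class, its fields, and the toDescription() method are hypothetical names invented for this example; they are not part of Lucene, and your own record type will differ.

```java
import java.util.List;

// Hypothetical record type, used only for illustration; not part of Lucene.
final class Book {
    final String title;
    final String author;
    final String summary;
    final List<String> keywords;

    Book(String title, String author, String summary, List<String> keywords) {
        this.title = title;
        this.author = author;
        this.summary = summary;
        this.keywords = keywords;
    }

    // Concatenates every value we may want to search on into one string,
    // which can then be indexed as the "description" field.
    String toDescription() {
        StringBuilder sb = new StringBuilder();
        append(sb, title);
        append(sb, author);
        append(sb, summary);
        if (keywords != null) {
            for (String keyword : keywords) {
                append(sb, keyword);
            }
        }
        return sb.toString();
    }

    // Adds a trimmed value, separated by a single space; skips null or blank values.
    private static void append(StringBuilder sb, String value) {
        if (value == null || value.trim().isEmpty()) {
            return;
        }
        if (sb.length() > 0) {
            sb.append(' ');
        }
        sb.append(value.trim());
    }
}
```

The string returned by toDescription() is what you would pass as the value of the "description" field. Skipping null and blank values keeps empty record fields from adding noise to the indexed text.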
You index a field by adding a new instance of the Field class to your document, as shown here:
Field field = new Field("field",
                        value,
                        Field.Store.NO,
                        Field.Index.TOKENIZED);
doc.add(field);
You have the option of specifying whether you want to store the value for future use (Field.Store.YES) or simply index it (Field.Store.NO). The latter option is useful for large values that you want to index, but do not need to retrieve later on.
The fourth parameter lets you indicate how you want to index the value. When you use Field.Index.TOKENIZED, the value will be analyzed, allowing Lucene to make better use of its powerful full-text indexing and search features. The downside, as we will see, is that you cannot sort results on tokenized fields.
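One way to picture the sorting limitation is that analysis turns a single value into several separate terms, so the document no longer has one obvious value to sort by. The following toy sketch illustrates the idea using only the standard Java library. The TokenizingSketch class is a hypothetical stand-in for this example: it is not Lucene's StandardAnalyzer, which applies its own rules (stop-word removal, for instance).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy illustration only: this is NOT how Lucene's analyzers are implemented.
final class TokenizingSketch {

    // Roughly what "tokenized" means: the value is split into several
    // lowercase terms, each of which is indexed separately.
    static List<String> tokenized(String value) {
        List<String> terms = new ArrayList<String>();
        // Split on any run of characters that are neither letters nor digits.
        for (String part : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) {
                terms.add(part);
            }
        }
        return terms;
    }

    // An untokenized value is kept whole, as a single term.
    static List<String> untokenized(String value) {
        List<String> terms = new ArrayList<String>();
        terms.add(value);
        return terms;
    }
}
```

For a value such as "The Quick, Brown Fox", tokenized() in this sketch returns four terms, whereas untokenized() returns the original string as one term. Only the second form gives each document a single, unambiguous sort key.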