A predictable response to this problem is the creation of vertical, information-specific search engines and data repositories. jCentral is one of the first vertical search services. It acts like a code-specific Alta Vista or, as Dirk Nicol, IBM Network Computing Software Division Webmaster and www.ibm.com/java Product Manager, describes it, a "crawler with attitude." Because jCentral is such new technology, Nicol encourages JavaWorld readers to try it out and send him feedback at nicold@us.ibm.com. "We created jCentral for one reason: to help Java developers," Nicol notes. "We are very anxious to continue to help Java developers write code."
If the data repositories of general-purpose search engines can become clogged with massive amounts of irrelevant data, can't the same happen to vertical, specific-purpose search engines? And if so, how can it be prevented? Nicol says the problem is prevented by the very nature of how jCentral (and other specific-purpose search engines) works. General-purpose search engines find and categorize information based on text keywords and the metatags of Web pages, which are controlled (and often abused) by Web page authors. In contrast, jCentral analyzes Java code as well as text comments; it therefore searches, weighs, and indexes the inherent attributes of the code rather than just the textual descriptions of it.
jCentral is both a global resource and an internal organization/intranet resource. Whereas the jCentral search tool at ibm.com/java is global, IBM itself uses a private version that allows in-house developers to search for Java materials within IBM's intranet. As a global resource, jCentral is currently free and open to the public. (To date, IBM has not announced plans to license jCentral's source code as a separate product to enterprise developers for internal use, or to bundle it with another product.)
jCentral employs a combination of automated and manual approaches to growing the repository. This combination offers a distinct advantage over other Java-development services, which take only a manual approach to compiling resources.
The automated component uses a bank of IBM-patented, 100 percent Java crawlers to search the Internet for Java materials, analyze and classify them, and add them to the jCentral repository. The repository currently houses approximately 150,000 items, including 40,000 applets and 60,000 pieces of Java code. jCentral has also added JavaWorld magazine's entire catalogue of articles and source code to the repository. The repository's growth rate is erratic and unpredictable, IBM notes, but averages several thousand additions per week.
The manual resource-compilation component of jCentral takes a more typical approach to growing a resource catalogue. Essentially, Java developers submit code for inclusion in the jCentral repository, and a small staff within the jCentral development team manually checks the code and approves it. So far, code-submission traffic has been modest. Because the pool of Java developers is limited, jCentral staff members don't expect to be swamped with submissions the way a general-audience search engine such as Yahoo! is.