Code-centric search tool strives to reduce Java development time

IBM's jCentral among the first of a new wave of targeted search engines

If you've spent hours, days, or even weeks or months searching for an obscure piece of Java-related information or a code example, you probably understand the frustration that such a quest involves. To find what you're looking for, you might try a number of Java informational sites, perhaps browsing manually through articles and archives, topic by topic. You scan the subject lines of scores of Usenet newsgroup articles. You peruse Java code directories, which contain dozens of code examples for just about everything but what you actually need. After plugging in as many keywords (and NOT-keywords) as you can think of, you slog through myriad pages of general-purpose search engine results. You even resort to printed materials: books, magazines, old notes -- anything that might offer solutions to your Java development problems. Sometimes, if you're lucky, you eventually find what you're looking for. But often you don't. Perhaps the most frustrating aspect of development is knowing that some piece of necessary information is out there, but not knowing how to find it.

Your efforts to find Java resources may now reap more rewards -- and require less time. IBM jCentral, announced and showcased at the recent JavaOne Java developer conference in San Francisco, is an information-specific search engine for Java resources. In other words, jCentral is a search tool that finds only Java resources. And it finds all types of Java resources, including source code, JavaBeans, applets, and Java-related newsgroup articles and Web sites.

The jCentral Power search

Once the jCentral technology finds code in applets, beans, source code files, and newsgroup articles, it extracts the salient features of the code for indexing purposes. For example, when crawling a Java applet, jCentral analyzes the embedding HTML page and the applet class file to obtain information about the applet, such as all of its invoked methods. The information is subsequently indexed so that users can issue queries to find, say, all the Java applets that make a network connection by invoking methods from the class, or all the applets that contain a particular button, or a slider bar. Developers can use this specific code-searching technique on class methods, strings, and other snippets of useful Java code.
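To make the indexing idea concrete, here is a minimal sketch of an inverted index that maps a code feature (a referenced class or an invoked method) to the applets that exhibit it. It is not IBM's implementation: the class name AppletFeatureIndex, its methods, and the sample applet names are all hypothetical, and the sketch assumes the features have already been extracted from the class files.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical sketch: an inverted index from code features to applets.
// Nothing here reflects jCentral's actual internals.
class AppletFeatureIndex {
    // feature (a referenced class or invoked method) -> names of applets using it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Record the features that were extracted from one applet.
    void add(String applet, List<String> features) {
        for (String feature : features) {
            postings.computeIfAbsent(feature, k -> new TreeSet<>()).add(applet);
        }
    }

    // All applets that exhibit the given feature, in sorted order.
    Set<String> find(String feature) {
        return postings.getOrDefault(feature, Collections.<String>emptySet());
    }

    public static void main(String[] args) {
        AppletFeatureIndex index = new AppletFeatureIndex();
        // Sample data invented for illustration.
        index.add("Calculator", Arrays.asList("java.awt.Button", "java.awt.GridBagLayout"));
        index.add("VolumeDial", Arrays.asList("java.awt.Scrollbar"));
        index.add("Clock", Arrays.asList("java.awt.Button"));
        System.out.println(index.find("java.awt.Button")); // prints [Calculator, Clock]
    }
}
```

Because the index is keyed by what the code does rather than by how its author describes it, a query such as "all applets that contain a button" reduces to a single map lookup.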

Because it is optimized for running Java-specific searches, jCentral represents an important new tool for the Java community. Internet development community leaders, such as the attendees of the seventh World Wide Web Consortium Conference in Australia, are voicing concerns about the growing ineffectiveness of monolithic search engines when used for specific purposes. As the Internet grows, the available information associated with any given keyword grows accordingly, which leads to general-purpose search engines becoming clogged with massive amounts of data -- data that is often irrelevant and useless to users. For instance, an English-only search for "Java" through Alta Vista (Digital Equipment Corp.'s popular general-purpose Internetwide search tool) uncovers more than 800,000 documents.

A predictable trend in response to this problem is to create vertical, information-specific search engines and data repositories. jCentral is one of the first vertical searching services. It acts like a code-specific Alta Vista -- or, as IBM Network Computing Software Division Webmaster and Product Manager Dirk Nicol describes it, jCentral is a "crawler with attitude." Since jCentral is such new technology, Nicol encourages JavaWorld readers to try out jCentral and send him feedback. "We created jCentral for one reason: to help Java developers," Nicol notes. "We are very anxious to continue to help Java developers write code."

Distinct approach focuses on code

If the data repositories of general-purpose search engines can become clogged with massive amounts of irrelevant data, can't this also happen with vertical, specific-purpose search engines, and if so, how can this be prevented? Nicol says that this problem is prevented by the very nature of how jCentral (and other specific-purpose search engines) works. General-purpose search engines find and categorize information based on text keywords and the metatags of Web pages, which are controlled (and often abused) by Web page authors. In contrast, jCentral analyzes Java code as well as text comments; thus it searches, weighs, and indexes the inherent attributes of code rather than just the textual descriptions of code.

jCentral is both a global resource and an internal organization/intranet resource. Whereas the public jCentral search tool is global, IBM itself uses a private version, allowing in-house developers to search for Java materials within IBM's intranet. jCentral as a global resource is currently free and available to the public. (To date, IBM has not announced plans to license source code as a separate product to internal enterprise developers or to bundle it with another product.)

jCentral employs a combination of automated and manual approaches to growing the repository. This combination offers a distinct advantage over other Java-development services, which take only a manual approach to compiling resources.

The automated component operates by using a bank of IBM-patented 100 percent Java-based crawlers to search the Internet for Java materials and then adding the materials to the jCentral repository, analyzing and classifying the materials in the process. The repository currently houses approximately 150,000 items, including 40,000 applets and 60,000 pieces of Java code. jCentral also has added JavaWorld magazine's entire catalogue of articles and source code to the repository. The repository's growth rate is erratic and unpredictable, IBM notes, but averages several thousand additions per week.

The manual resource-compilation component of jCentral involves a more typical approach to growing a resource catalogue. Essentially, Java developers submit code to be included in the jCentral repository, and a small staff within the jCentral development team manually checks the code and approves it for inclusion. So far, the level of code-submission traffic isn't high or overwhelming. The number of Java developers is limited, so jCentral staff members don't expect to be as swamped with submissions as a general-public search engine like Yahoo! is sure to be.

The jCentral code-approval staff doesn't judge the quality of the code submitted or provide editorial content. This encourages developers to write abstracts and descriptions about their own code, so that users will be inclined to learn more about it and put it to use. Once the code is approved, metadata, which is the descriptive information about the output data (rather than the output data itself), is added to the jCentral repository.

Visual maps, e-mail notification

An impressive feature of jCentral is its ability to provide a class hierarchy diagram of a Java bean or component, which is a visual map of the code. (See the example Class Navigator image below.) When clicked, each node in the diagram provides relevant code and descriptive information in the box at the bottom. (To view a diagram, click the Map button next to the search result abstract. A good example to try is the bean keyword frame, because it uses a lot of the JDK.)

jCentral also offers an automatic e-mail notification service, which initiates a persistent query and periodically sends subscribers new results of a single search. Developers can use this feature to search for resources they need for development or find new instances of their own source code posted by others on the Internet. Combining both immediate searching and notification for future search results eliminates the burden of running constant searches for the same thing. This comes in handy if, for example, you're always looking for a new and improved animation bean. And jCentral users needn't worry about receiving unsolicited advertisements as a result of submitting an e-mail address to this service -- IBM notes that jCentral is not a marketing tool, so information about subscribers will not be used for any purpose beyond notifying subscribers of search results.
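The persistent-query idea can be sketched in a few lines: the query remembers which results it has already reported, so each later run yields only what is new. This is an illustration under stated assumptions, not a description of how jCentral's notification service works; the class PersistentQuery, its poll method, and the sample titles are hypothetical, and the repository is stood in for by a plain list of titles.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical sketch of a persistent query: it remembers what it has
// already reported, so each poll yields only results that are new.
class PersistentQuery {
    private final String keyword;
    private final Set<String> alreadyReported = new HashSet<>();

    PersistentQuery(String keyword) {
        this.keyword = keyword.toLowerCase(Locale.ROOT);
    }

    // Returns the matching titles that no earlier poll has reported.
    List<String> poll(List<String> repositoryTitles) {
        List<String> fresh = new ArrayList<>();
        for (String title : repositoryTitles) {
            boolean matches = title.toLowerCase(Locale.ROOT).contains(keyword);
            // Set.add returns false when the title was already reported.
            if (matches && alreadyReported.add(title)) {
                fresh.add(title);
            }
        }
        return fresh;
    }

    public static void main(String[] args) {
        PersistentQuery query = new PersistentQuery("animation");
        // First run: one match.
        System.out.println(query.poll(Arrays.asList("Animation bean", "Calculator applet")));
        // prints [Animation bean]
        // Later run: the old match is suppressed, only the new one is reported.
        System.out.println(query.poll(Arrays.asList("Animation bean", "Fast animation bean")));
        // prints [Fast animation bean]
    }
}
```

In a real service each non-empty poll result would be mailed to the subscriber; the sketch only prints it.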

Avoid duplicate efforts; learn from others

jCentral lends itself to a plethora of purposes and potential uses. The most obvious use is to avoid duplicate efforts in code development. If someone has already created a code example or devised a tutorial that will save you time and energy, you might as well use it -- that is, if you can find it.

Finding appropriate examples and tutorials comes in handy when you're stuck on a line or section of code. For instance, if you need a good MD5 security algorithm or a specific invoked method (which typically is difficult to find), you can search for and evaluate such code using jCentral. Or, if you're developing Java for use with a database, you can run searches for examples that involve tier 3 (the back-end database), tier 2 (middleware), or tier 1 (the client). (For instance, a search on the class name would yield results related to connections to back-end databases.) Once you find code that will work for you, you can ask the author for permission to use it. You can also analyze how someone else developed a certain aspect of code to give you ideas on how to design your own. Just one successful use of jCentral can save you at least a couple of hours of development time.

Developers also can use jCentral to figure out if someone else is using their code. For example, Java developers at IBM periodically look up instances of code that use the IBM names class by searching for the IBM domain name with the reverse URL ( class). Prior to the release of jCentral, tracing authorized or unauthorized uses of your code proved difficult.

jCentral's impact extends well beyond the programming world; it has great potential for marketing purposes. Marketers and business developers can use jCentral to advertise their beans, applets, and other Java resources to the global Java community. In this way, jCentral can be used as a tool to build communities and facilitate commerce.

Because jCentral can find data beyond text, and it offers multitiered searching, you can search for a piece of code with specific attributes and properties. For example, you can find all applets with GridBagLayout, and then narrow those results using other specific parameters. jCentral also allows you to query specific parts of source code so that you can differentiate between source code and comments, such as when searching for author information embedded within code. You can also find an applet from a particular source or a specific domain; for example, if you are looking for a student who wrote a calculator applet, you can search for all calculator applets from all edu domains.
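The calculator-applet example amounts to applying one filter after another to the same result set. The following sketch shows that narrowing pattern only; the AppletEntry fields, the method names, and the example.edu and example.com hosts are assumptions made for illustration and do not describe jCentral's data model or query syntax.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of multitiered searching: each step narrows the
// previous step's results by one more attribute.
class NarrowingSearch {
    // Invented record of what an index might hold about one applet.
    static final class AppletEntry {
        final String name;
        final String host;
        final List<String> classesUsed;

        AppletEntry(String name, String host, String... classesUsed) {
            this.name = name;
            this.host = host;
            this.classesUsed = Arrays.asList(classesUsed);
        }
    }

    // Keep only the entries whose code references the given class.
    static List<AppletEntry> usingClass(List<AppletEntry> entries, String className) {
        List<AppletEntry> kept = new ArrayList<>();
        for (AppletEntry entry : entries) {
            if (entry.classesUsed.contains(className)) {
                kept.add(entry);
            }
        }
        return kept;
    }

    // Keep only the entries hosted under the given top-level domain.
    static List<AppletEntry> fromDomain(List<AppletEntry> entries, String topLevelDomain) {
        List<AppletEntry> kept = new ArrayList<>();
        for (AppletEntry entry : entries) {
            if (entry.host.endsWith("." + topLevelDomain)) {
                kept.add(entry);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<AppletEntry> all = Arrays.asList(
            new AppletEntry("Calculator", "cs.example.edu", "java.awt.GridBagLayout"),
            new AppletEntry("Ticker", "www.example.com", "java.awt.GridBagLayout"),
            new AppletEntry("Clock", "math.example.edu", "java.awt.FlowLayout"));

        // First tier: applets that use GridBagLayout. Second tier: edu hosts only.
        List<AppletEntry> hits = fromDomain(usingClass(all, "java.awt.GridBagLayout"), "edu");
        for (AppletEntry entry : hits) {
            System.out.println(entry.name); // prints Calculator
        }
    }
}
```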

jCentral offers flexible and robust search features. A single query can run across all the types of information. With a single search, you can find the variety of ways that, for example, a GridBagLayout can be used -- in applets, source code, FAQs, newsgroup articles, and Web sites. After conducting the search, jCentral passes the code through a profile engine that was itself written in Java. And there is at least one additional (if unusual) use of jCentral: Manager of the IBM Almaden Research Center Web Technologies Department Dan Ford uses the tool to periodically find information about jCentral itself on external Web sites; in this way, he can monitor the public use and perception of his company's product.

It started as an internal IBM tool...

How did jCentral come into existence? jCentral is a child product of Grand Central Station (GCS), an IBM research project involving general-purpose search technology designed to search the Web for data formats beyond HTML. Impressively, the variety of formats that GCS can search and analyze includes relational databases reached through SQL or ODBC, image and video formats like GIF or MPEG, spreadsheets, compressed files in TAR and ZIP formats, and programming languages.

jCentral is the jewel in the GCS crown -- "one of the best examples showcasing the possibilities of GCS," says Qi Lu, Ph.D., research staff member of the IBM Almaden Research Center Web Technologies Department. Since jCentral is based on GCS, it's portable to other languages, such as C/C++ or Perl, but to date GCS/jCentral developers have not announced any plans for other code-specific search tools.

jCentral originally was developed to help the 2,500 Java developers within IBM avoid duplicate searching and development efforts among themselves. After bridging search crawler technology developed by the Computer Science Department of the IBM Almaden Research Center with source code analysis and visual mapping technology from the Haifa Research Lab in Haifa, Israel, Dirk Nicol first put jCentral to use by crawling the IBM intranet for Java code and information resources. Thus IBM's developers, located in various countries, could easily find and reuse each other's code. From this intranet prototype, Nicol and his associates performed testing and bug-fixing and continuously evaluated feedback from internal beta testers. After this internal development period, Nicol, with IBM's blessing, expanded the goal of the jCentral project to include serving the public.
