An AI tool for the real world

Knowledge modeling with Protégé


Additional support for the new Semantic Web language, OWL (Web Ontology Language), is under construction. You can also store huge ontologies in relational database tables. Some Protégé users have created ontologies with several hundred thousand concepts.

Before you can use all of these services, you might need to install them from the plug-in Website. After you have done that, you will find additional file formats in the "Save in format..." dialog.

Compatibility with various formats allows you to reuse ontologies and domain models from other projects. In the online magazine example, you can reuse a computer science ontology someone else developed for the topical index. In fact, standard models for some domains already exist. Protégé projects can include other projects, so you can build complex domain models out of basic building blocks.

Slot widgets

Slot widgets are graphical components, such as text fields and combo boxes, placed in Protégé's instance forms to view and edit a slot value. Protégé has numerous built-in slot widgets, including a sophisticated widget that displays instances and their relationships in two-dimensional graphs. Protégé's plug-in library contains additional slot widgets for specific data types, such as calendar and date widgets, as well as components that display images, sounds, and videos.

Tab plug-ins

Tabs are GUI panels displayed in Protégé's main window. Protégé has several default tabs, including Figure 1's Classes tab, Figure 3's Instances tab, and Figure 4's Forms tab. You can enable and disable additional tabs in the Project/Configure... menu. In this menu, you can also activate tabs you download from the Protégé plug-in library. Some examples of additional tabs that currently exist follow:

Visualization tabs

  • Jambalaya provides a hierarchical ontology browser that allows for interactive editing of existing data. Its browser combines an advanced implementation of a hypertext navigation metaphor with animated panning and zooming motions over the nested graph, which provides continuous orientation and contextual cues for the user.
  • TGVizTab visualizes the classes from a model in interactive graphs, based on the popular TouchGraph library.
  • OntoViz provides a highly configurable graphical display of models in graphs similar to UML diagrams.

Project and file management tabs

  • BeanGenerator generates JavaBeans classes from a Protégé class model. The resulting beans can access Protégé domain models conveniently from your Java program, especially from intelligent software agents.
  • DataGenie enables Protégé to read arbitrary databases using the Java Database Connectivity (JDBC) interface. Generally, each database table becomes a class, and each attribute becomes a slot.
  • Prompt allows you to manage multiple domain models in Protégé, in particular to merge two models into one, to extract a part of a model, or to identify differences between a model's two versions.

Tabs for making queries and intelligent reasoning

  • The Query tab lets you query the knowledge model, for example, to retrieve all articles that have a certain topic.
  • PAL constraint and query tabs provide a powerful front end for editing and evaluating expressions in the PAL. The EZPAL plug-in facilitates the acquisition of PAL expressions.
  • JessTab connects Protégé to the Java Expert System Shell (JESS), which is useful for specifying complex constraints and for defining rules that derive new knowledge from existing knowledge.
  • Algernon performs forward and backward rule-based processing of Protégé knowledge bases and efficiently stores and retrieves information in ontologies and knowledge bases.
  • PrologTab is a SourceForge project that integrates a Prolog inference engine with Protégé knowledge bases.

These modules are where Protégé unfolds its full AI support. With some training, you can use these plug-ins to develop clever services. Using JessTab, for example, the online magazine software can automatically notify editors when a certain topic has not been covered for a long time. The system might also find authors who have written about related topics. Of course, you can implement these and similar services in pure Java, too. But Protégé lets you provide an astonishing number of features without coding. Many of these features are even accessible to nonprogrammers, such as your domain experts and customers.
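To make the "topic not covered for a long time" service concrete, here is a minimal pure-Java sketch of that check. It is not the JessTab rule itself. The ArticleInfo class, the publication date it carries, and the findStaleTopics method are assumptions made for illustration; none of them is part of Protégé or of the example project.

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class StaleTopicFinder {

    // Hypothetical article record: title, publication date, and topic names
    static final class ArticleInfo {
        final String title;
        final LocalDate published;
        final List<String> topics;

        ArticleInfo(String title, LocalDate published, List<String> topics) {
            this.title = title;
            this.published = published;
            this.topics = topics;
        }
    }

    // Returns the topics whose most recent article is older than maxAgeDays,
    // plus topics that no article covers at all, in the order of allTopics
    static List<String> findStaleTopics(Collection<String> allTopics,
                                        Collection<ArticleInfo> articles,
                                        LocalDate today, long maxAgeDays) {
        // Remember the latest publication date seen for each topic
        Map<String, LocalDate> lastCovered = new HashMap<>();
        for (ArticleInfo article : articles) {
            for (String topic : article.topics) {
                lastCovered.merge(topic, article.published,
                        (a, b) -> a.isAfter(b) ? a : b);
            }
        }
        List<String> stale = new ArrayList<>();
        for (String topic : allTopics) {
            LocalDate last = lastCovered.get(topic);
            if (last == null
                    || ChronoUnit.DAYS.between(last, today) > maxAgeDays) {
                stale.add(topic);
            }
        }
        return stale;
    }

    public static void main(String[] args) {
        List<ArticleInfo> articles = Arrays.asList(
                new ArticleInfo("Agents in Java", LocalDate.of(2003, 1, 10),
                        Arrays.asList("Java", "AI")),
                new ArticleInfo("Swing tips", LocalDate.of(2003, 6, 1),
                        Arrays.asList("Java")));
        List<String> stale = findStaleTopics(
                Arrays.asList("Java", "AI", "XML"), articles,
                LocalDate.of(2003, 6, 15), 90);
        System.out.println("Topics to notify editors about: " + stale);
    }
}
```

A rule engine expresses the same condition declaratively, which is what makes it accessible to people who do not write Java; the loop above only shows the logic such a rule would encode.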

The Protégé API

If the plug-ins listed above are still not sufficient for your needs, you can easily build your own extensions. Protégé has an open source API that you can use to implement or customize plug-ins and to access Protégé models from standalone programs. Here's how it works:

Write plug-ins

A Protégé plug-in is essentially a Java class that subclasses a certain Protégé base class. Let's assume you want to add a new tab that sends an email to all authors in your system who have written about a certain topic. This tab could consist of a list where you select the topic and a button to send the mail. To do so, you just need to subclass the Protégé API class AbstractTabWidget. Since this class derives from JPanel, you add the list component and the button directly into it. Then you only need to put your new classes into a jar file and put this jar file into Protégé's plugin folder. That's it! You can activate the tab the next time you start Protégé. For details and example code, please check Protégé's Website.

The plug-in mechanism means you can include almost any other piece of Java software in Protégé, so that Protégé and your own system can share models at runtime.
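The tab described above, a topic list plus a send button, can be sketched as follows. To stay runnable without protege.jar, this sketch subclasses JPanel directly; in a real plug-in you would extend AbstractTabWidget instead and build the same components there (check Protégé's documentation for the exact method to override). The class name TopicMailPanel, the onSend callback, and the helper methods are hypothetical, and the sketch sends no actual email.

```java
import java.awt.BorderLayout;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import javax.swing.JButton;
import javax.swing.JList;
import javax.swing.JPanel;
import javax.swing.JScrollPane;
import javax.swing.ListSelectionModel;

// Hypothetical stand-in for a Protégé tab: a real plug-in would extend
// AbstractTabWidget (itself derived from JPanel) rather than JPanel
class TopicMailPanel extends JPanel {
    private final JList<String> topicList;
    private final Consumer<String> onSend;

    // onSend is an assumed callback that receives the selected topic;
    // a real tab would look up the authors and send the mail there
    TopicMailPanel(List<String> topics, Consumer<String> onSend) {
        super(new BorderLayout());
        this.onSend = onSend;

        topicList = new JList<>(topics.toArray(new String[0]));
        topicList.setSelectionMode(ListSelectionModel.SINGLE_SELECTION);
        add(new JScrollPane(topicList), BorderLayout.CENTER);

        JButton sendButton = new JButton("Send mail to authors");
        sendButton.addActionListener(e -> sendToSelectedTopic());
        add(sendButton, BorderLayout.SOUTH);
    }

    // Selects a topic programmatically, as a user click in the list would
    void selectTopic(String topic) {
        topicList.setSelectedValue(topic, false);
    }

    // Invokes the callback for the selected topic; returns false if
    // nothing is selected
    boolean sendToSelectedTopic() {
        String topic = topicList.getSelectedValue();
        if (topic == null) {
            return false;
        }
        onSend.accept(topic);
        return true;
    }

    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        TopicMailPanel panel = new TopicMailPanel(
                Arrays.asList("AI", "Java", "XML"),
                topic -> log.add("Would mail authors writing about " + topic));
        panel.selectTopic("Java");
        panel.sendToSelectedTopic();
        System.out.println(log);
    }
}
```

Passing the send action in as a callback keeps the panel free of mail and model code, so the same UI can be tried out on its own before it is wired to a Protégé knowledge base.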

Access Protégé models from Java applications

If you would rather have a standalone application without Protégé, the following code shows how easy it is to build Protégé applications. Let's assume you wish to access the online magazine's articles with a Java application. The basic program just prints all of the articles' titles and their topics. Later, you might extend this functionality to create a list of articles in HTML from a servlet, but I'll keep this example simple for now.

To get started, you must first include the protege.jar in your classpath. Then you import the classes and interfaces from the edu.stanford.smi.protege.model package, which provide access to Protégé models and project files:

import edu.stanford.smi.protege.model.*; 
import java.util.*; 

The class Project represents Protégé projects, and you use its constructor to load an existing project file, such as the example project. When the project loads without errors, you access the domain model with the getKnowledgeBase() method:

public class ArticlePrinter {
    // Path to the Protégé project file to load
    private static final String PROJECT_FILE_NAME = "...";

    public static void main(String[] args) {
        // Protégé adds any problems it finds while loading to this collection
        Collection errors = new ArrayList();
        Project project = new Project(PROJECT_FILE_NAME, errors);
        if (errors.isEmpty()) {
            printArticles(project.getKnowledgeBase());
        }
        else {
            // Report every loading error instead of printing articles
            Iterator i = errors.iterator();
            while (i.hasNext()) {
                System.out.println("Error: " + i.next());
            }
        }
    }

Now you access the classes, slots, and instances from the model with the KnowledgeBase object. The Protégé class Cls represents classes, and you look up classes by their names. The getInstances() method delivers all instances of the given Protégé class. The getOwnSlotValue() and getOwnSlotValues() methods return a given slot's value or values for an instance:

    private static void printArticles(KnowledgeBase kb) {
        System.out.println("Articles:");
        // Look up the Protégé class by name and walk through its instances
        Cls articleCls = kb.getCls("Article");
        Iterator articles = articleCls.getInstances().iterator();
        while (articles.hasNext()) {
            Instance article = (Instance) articles.next();
            // "title" holds a single value
            String title = (String) article.getOwnSlotValue(kb.getSlot("title"));
            System.out.println("- " + title);
            // "topics" holds several values, each of them a Protégé class
            Collection topics = article.getOwnSlotValues(kb.getSlot("topics"));
            Iterator it = topics.iterator();
            while (it.hasNext()) {
                Cls topic = (Cls) it.next();
                System.out.println("   topic: " + topic.getName());
            }
        }
    }
}

Note that this example uses the generic Protégé API to access Protégé models. The Protégé class Article is stored as a Cls object, and its instances are stored as Instance objects. Some applications might require a closer mapping between the application and the Protégé model, so that articles are stored as instances of a Java class Article. Protégé provides several mechanisms that generate such classes. For example, you can export your model to UML and then generate Java classes with tools like Poseidon for UML. Or you can let Protégé generate the Java classes directly with the BeanGenerator.
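To illustrate what such a closer mapping means, the following hand-written sketch shows a plain Java class Article filled from generic slot values. It is not the output of the BeanGenerator or of any UML tool. The fromSlotValues method and the map of slot names to values are assumptions that stand in for a Protégé Instance so the example runs without protege.jar; only the slot names title and topics come from the example model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hand-written illustration of a typed domain class for the Article model
class Article {
    private final String title;
    private final List<String> topics;

    Article(String title, List<String> topics) {
        this.title = title;
        this.topics = Collections.unmodifiableList(new ArrayList<>(topics));
    }

    String getTitle() {
        return title;
    }

    List<String> getTopics() {
        return topics;
    }

    // Hypothetical converter: the map (slot name -> values) plays the role
    // of the generic, untyped view that an Instance offers
    static Article fromSlotValues(Map<String, List<Object>> slotValues) {
        List<Object> titles =
                slotValues.getOrDefault("title", Collections.emptyList());
        String title = titles.isEmpty() ? null : String.valueOf(titles.get(0));

        List<String> topics = new ArrayList<>();
        for (Object topic
                : slotValues.getOrDefault("topics", Collections.emptyList())) {
            topics.add(String.valueOf(topic));
        }
        return new Article(title, topics);
    }

    public static void main(String[] args) {
        Map<String, List<Object>> slots = new HashMap<>();
        slots.put("title", Arrays.<Object>asList("Knowledge modeling"));
        slots.put("topics", Arrays.<Object>asList("AI", "Java"));

        Article article = Article.fromSlotValues(slots);
        // Typed getters replace the casts needed with the generic API
        System.out.println(article.getTitle() + " " + article.getTopics());
    }
}
```

With a class like this, application code works with getTitle() and getTopics() and never sees slot names or casts, at the cost of regenerating or updating the class whenever the model changes.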

Protégé in the real world

This article has given you an idea of what Protégé can do for you. Protégé is a Java tool for building domain models. More and more software developers recognize that domain modeling is a crucial task in modern development methodologies. Recent approaches like the Model Driven Architecture (MDA) emphasize that such domain models should be designed and implemented at a high level of abstraction. In the MDA, you start with very general domain models that capture your domain concepts and business logic in an application-independent way. Then you translate these generic models into platform-specific implementations, such as plain Java classes, Enterprise JavaBeans, or .Net components.

One of the MDA's basic assumptions is that UML diagrams can be better maintained and reused than Java code. AI technology suggests that knowledge models (ontologies) can be even better maintained and reused than UML diagrams. Protégé helps you rapidly define such models and their semantics, and automatically generates the necessary GUI elements so your domain experts can conveniently enter their knowledge. From there, let Protégé generate other models and integrate them into your Java application. Welcome to the real world!

Holger Knublauch holds a PhD in computer science and has worked in the area of knowledge modeling and applied AI since 1993. For his thesis, he developed a Java extension for knowledge modeling, including an extensible modeling tool platform. He currently works as a post-doctoral research fellow at Stanford Medical Informatics. In this position, he is responsible for various Protégé platform features, including the UML back end and support for Semantic Web languages like the forthcoming W3C (World Wide Web Consortium) standard OWL.
