Use JGraph to create a Wikipedia browser

Get started with an open-source, Swing-based library for creating graphs

Jeroen van Bergen introduces JGraph, a popular open source library for creating graphs in Java applications. Creating a Wikipedia browser is an easy way to get started with JGraph's Swing-MVC programming model. Once you're familiar with JGraph you can consider the many other uses for graphs in your Java applications, including some that apply to Java application development.

Most application developers know that a picture can say more than a thousand words. A graphical representation of data, for instance, can be easier to grasp than the raw data itself. Graphs are used in many fields to convey the meaning of data in a compact form with high impact. The ability to display information in graph format is an essential component of many Java applications.

In this article I'll show you how to use JGraph, an open source, Swing-based library for creating graphs. You can use JGraph to create graphs for almost any kind of application, and it is applicable to both desktop and server-side Java applications. As an exercise, I'll walk you through the process of creating a small browser that shows a graph of a lemma and its links to other lemmata on Wikipedia. (A lemma is the word under which a set of related dictionary or encyclopedia entries appears.) When a user clicks on a lemma in the Wikipedia browser, that lemma is displayed along with its links. Once you've built the Wikipedia browser and have a good grasp of how JGraph works, I'll suggest some of the ways graphs could actually be useful to Java developers, including a graphing application to simplify the creation and management of XML configuration files.

The basic JGraph library is available for free under the LGPL or Mozilla Public License. You should download it now so that you can build the Wikipedia browser along with me. To install JGraph you'll simply need to add its jar file to your classpath. No additional libraries are needed to use JGraph. Note that this article assumes you are somewhat familiar with Swing programming, although you need not be an expert.

The Wikipedia browser

The actual implementation of the Wikipedia browser is simple, yet it lets a user explore the relationships between Wikipedia lemmata. The browser application can be broken down into the following subtasks:

  1. Read a set of configuration data.
  2. Retrieve the raw data.
  3. Extract the data to display.
  4. Represent the actual data in the form of a graph.
  5. Interact with the user to allow exploration.

In the first three steps you are laying the groundwork for the Wikipedia browser by setting up the data to be graphed. In the last two steps you create the graph and add its functionality. I'll go through all the steps in the sections that follow.

Read, retrieve, and extract the data

You will start by creating a Properties object to hold the configuration data. The object can easily be instantiated from a file as shown in Listing 1.

Listing 1. Instantiating the Properties object

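// Requires java.io.FileInputStream, java.io.IOException,
// java.io.InputStream, and java.util.Properties.
// Load the configuration from the file named by the first argument
Properties configuration = new Properties();
try (InputStream in = new FileInputStream(args[0])) {
    // try-with-resources closes the stream even if load() fails
    configuration.load(in);
} catch (IOException e) {
    // handle exception
}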
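
To make the configuration concrete, here is a minimal sketch that loads a sample configuration from an in-memory string rather than a file and builds the first URL from it. The keys url_prefix, proxyhost, proxyport, and ignore are the ones read by the listings in this article. The key first_lemma, the class name ConfigurationSketch, and all of the values are assumptions made for illustration, not the article's actual configuration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class ConfigurationSketch {

    // Sample configuration text. Only the key names url_prefix, proxyhost,
    // proxyport, and ignore appear in the article; first_lemma and every
    // value below are hypothetical.
    static final String SAMPLE =
            "url_prefix=https://en.wikipedia.org/wiki/\n"
          + "first_lemma=Graph_theory\n"
          + "ignore=Help:,Talk:,Special:\n";

    // Parses properties-format text into a Properties object.
    static Properties load(String text) {
        Properties configuration = new Properties();
        try {
            configuration.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return configuration;
    }

    // Joins the URL prefix and the name of the first lemma.
    static String firstUrl(Properties configuration) {
        return configuration.getProperty("url_prefix")
                + configuration.getProperty("first_lemma");
    }

    public static void main(String[] args) {
        Properties configuration = load(SAMPLE);
        System.out.println(firstUrl(configuration));
        // The proxy entries are optional; getProperty returns null when absent
        System.out.println(configuration.getProperty("proxyhost"));
    }
}
```

Because the proxy entries are left out of the sample, getProperty("proxyhost") returns null, which is the case the proxy check in the retrieval code guards against.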

Note that the configuration contains an optional description of a proxy server and two items describing the first URL to retrieve. One of these is a URL prefix that is used to create a URL from just the name of an article. The other item is the name of the first lemma to be retrieved. Once you know where to get the data, you can issue an HTTP GET request to do so. In Listing 2 I've used the Apache Jakarta Commons HttpClient to retrieve data from the Wikipedia server.

Listing 2. Using HttpClient to retrieve data

HttpClient httpClient = new HttpClient();

String urlPrefix = configuration.getProperty("url_prefix");
// The proxy is optional; configure it only when a host is given
if (configuration.getProperty("proxyhost") != null) {
    httpClient.getHostConfiguration().setProxy(
            configuration.getProperty("proxyhost"),
            Integer.parseInt(configuration.getProperty("proxyport")));
}
String url = urlPrefix + name;
GetMethod getMethod = new GetMethod(url);
try {
    System.out.println("Fetching URL " + url + "...");
    httpClient.executeMethod(getMethod);
} catch (IOException e) {
    // The request failed: free the connection and let the caller decide
    getMethod.releaseConnection();
    throw e;
}
StringBuilder content = new StringBuilder();
// Wikipedia serves its pages as UTF-8, so decode explicitly
try (BufferedReader responseReader = new BufferedReader(
        new InputStreamReader(getMethod.getResponseBodyAsStream(), "UTF-8"))) {
    String currentLine;
    while ((currentLine = responseReader.readLine()) != null) {
        // readLine() strips the line terminator, so put one back
        content.append(currentLine).append('\n');
    }
} catch (IOException e) {
    // handle exception
} finally {
    getMethod.releaseConnection();
}
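
Commons HttpClient 3.x has since reached end of life, so readers on Java 11 or later may prefer the HTTP client that ships with the JDK. The sketch below is an alternative to Listing 2, not the approach used in the rest of this article. The class name FetchSketch and its method names are hypothetical, and the lemma name is assumed to be URL-safe already (Wikipedia uses underscores instead of spaces).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ProxySelector;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class FetchSketch {

    // Builds the GET request for one lemma; no network traffic happens here.
    // URI.create rejects illegal characters such as spaces in the name.
    static HttpRequest buildRequest(String urlPrefix, String name) {
        return HttpRequest.newBuilder(URI.create(urlPrefix + name)).GET().build();
    }

    // Creates a client, routing through a proxy only when a host is given.
    static HttpClient buildClient(String proxyHost, int proxyPort) {
        HttpClient.Builder builder = HttpClient.newBuilder();
        if (proxyHost != null) {
            builder.proxy(ProxySelector.of(new InetSocketAddress(proxyHost, proxyPort)));
        }
        return builder.build();
    }

    // Sends the request and returns the response body as a single string.
    static String fetch(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

A caller would combine the three methods, for example fetch(buildClient(null, 0), buildRequest(urlPrefix, name)), and hand the returned string to the link extraction step.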

The content variable now holds the raw HTML of the Wikipedia lemma. Your next step is to extract the data you want. Because you are interested only in the links from one lemma to other lemmata, you need to isolate the links. Furthermore, not all links are interesting because some of them point to help pages, talk pages, or other internal Wikipedia resources. You'll need to filter these so that they don't clutter the view. The configuration holds a comma-separated list of strings that identify unwanted links, and the code in Listing 3 uses that list to filter them out.

Listing 3. Using Strings to filter out unwanted data

// Substrings that mark links to skip, such as help or talk pages
String[] ignoreParts = configuration.getProperty("ignore").split(",");
List<String> lemmaLinks = new ArrayList<String>();
// Here, lemma is the raw HTML of the page retrieved in Listing 2.
// All links to other lemmata start with "/wiki. Note the double quote:
// it distinguishes internal links from interwiki links.
int currentIndex = 0;
int match;
while ((match = lemma.indexOf("\"/wiki", currentIndex)) != -1) {
    // Skip the seven characters of "/wiki/ to reach the lemma name
    int start = match + 7;
    int end = lemma.indexOf("\"", start);
    if (end == -1) {
        break; // no closing quote: stop scanning
    }
    String lemmaLink = lemma.substring(start, end);
    currentIndex = end;
    boolean ignoreCurrentLink = false;
    for (String linkPart : ignoreParts) {
        if (lemmaLink.contains(linkPart)) {
            ignoreCurrentLink = true;
            break;
        }
    }
    if (!ignoreCurrentLink) {
        lemmaLinks.add(lemmaLink);
    }
}
return lemmaLinks;
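
To see the filter at work, the following sketch wraps the same scan in a method and runs it against a small, made-up HTML fragment. The class name LinkExtractionSketch, the method signature, the sample HTML, and the ignore strings are all assumptions for illustration; real Wikipedia markup is considerably more involved.

```java
import java.util.ArrayList;
import java.util.List;

class LinkExtractionSketch {

    // Returns the lemma names of all internal links in html, skipping
    // any link that contains one of the ignoreParts substrings.
    static List<String> extractLinks(String html, String[] ignoreParts) {
        List<String> lemmaLinks = new ArrayList<String>();
        int currentIndex = 0;
        int match;
        while ((match = html.indexOf("\"/wiki", currentIndex)) != -1) {
            int start = match + 7;              // skip "/wiki/
            int end = html.indexOf('"', start); // closing quote of the href
            if (end == -1) {
                break;
            }
            String lemmaLink = html.substring(start, end);
            currentIndex = end;
            boolean ignoreCurrentLink = false;
            for (String linkPart : ignoreParts) {
                if (lemmaLink.contains(linkPart)) {
                    ignoreCurrentLink = true;
                    break;
                }
            }
            if (!ignoreCurrentLink) {
                lemmaLinks.add(lemmaLink);
            }
        }
        return lemmaLinks;
    }

    public static void main(String[] args) {
        // Two internal links, one help page, and one interwiki link
        String html = "<a href=\"/wiki/Vertex\">v</a>"
                + "<a href=\"/wiki/Help:Contents\">help</a>"
                + "<a href=\"//de.wikipedia.org/wiki/Graph\">de</a>"
                + "<a href=\"/wiki/Edge\">e</a>";
        // prints [Vertex, Edge]
        System.out.println(extractLinks(html, new String[] {"Help:", "Talk:"}));
    }
}
```

The help page is dropped by the ignore list, and the interwiki link never matches because its href does not begin with a double quote followed directly by /wiki.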

Finally, you create a Lemma object to hold the data. This is a plain value object: it holds the lemma's data but cannot perform any actions on it. You will use the Lemma object as the basis for the graph. The Lemma object has the public interface shown in Listing 4.

Listing 4. The public interface of Lemma

public interface Lemma {
   String getName();
   List<String> getLinks();
}

Note that the interface represents the data you want to display. For now you want to display only the relationship between this lemma and other lemmata, so you're not interested in the actual text of the lemma. Not displaying the text makes it easier to show the relationships between a lemma and other lemmata. As an exercise, you might try adding another pane to this program that displays the textual content of the lemma.
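
One way to back this interface is a small immutable class, sketched below. The article does not show its implementation, so the class name SimpleLemma and its constructor are assumptions. The interface is repeated without the public modifier only so that the sketch compiles as a single file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Same methods as Listing 4; package-private here so one file suffices.
interface Lemma {
    String getName();
    List<String> getLinks();
}

// Hypothetical value object: it holds data and offers no behavior.
final class SimpleLemma implements Lemma {
    private final String name;
    private final List<String> links;

    SimpleLemma(String name, List<String> links) {
        this.name = name;
        // Copy the list and wrap it so that neither the caller nor
        // users of getLinks() can change the lemma afterwards
        this.links = Collections.unmodifiableList(new ArrayList<String>(links));
    }

    public String getName() {
        return name;
    }

    public List<String> getLinks() {
        return links;
    }
}
```

Keeping the object immutable means a lemma can be shared freely between the graph model and any other part of the program without defensive copying at each use.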
