The Gnutella file-sharing network and Java

Use the JTella API to easily develop applications that access Gnutella

I'll begin with some definitions. First, Internet file sharing is an activity performed by a community of connected users. The file-sharing system, at a minimum, allows users to share files, and to search for files within the community. Some file-sharing services offer additional capabilities, such as chatting. And some, like the notorious Napster, only allow users to share certain types of files, namely MP3s.

The Gnutella network supports sharing and searching of any file type, but does not offer any extra functionality, like chatting. Gnutella is a peer-to-peer system, with client software that also acts as a server -- software typically referred to as a servant.

Using Gnutella vs. the Internet

I will make a wild assumption that most of you are familiar with the Internet; specifically, with using an HTTP server to serve files to clients.

To publish files on a Website, you typically use FTP to transfer the documents to the Web server. Then, to make a document accessible to users, you might submit its URL to a search engine for crawling. Crawling means downloading the document, examining it for keywords and such, and creating a searchable index in a database. Now, when a user submits an appropriate query to the search engine, he or she will receive information about the published document and its location.

When publishing files to a file-sharing service, you typically interact only with the servant program, which can access the service. The servant is connected to Gnutella and continuously responds to search queries from the network, eliminating the need for an intermediary search engine. The documents remain on the user's computer; they are not transferred to a server in order to be published. With Gnutella, you share a document by copying the file to a shared directory and having the servant scan and index it. Since the file-sharing system publishes and indexes documents, the user has much less work to do. Also, the user has the option to share files for a limited amount of time; simply removing the Gnutella servant from the network will end the session.

Origin of Gnutella

It has been widely reported that Gnutella was created by a group of developers at Nullsoft, a subsidiary of America Online. Not surprisingly, AOL put an end to the project. Later, the Nullsoft client's protocol was reverse-engineered and a group of developers on the Internet collaborated to further develop the system. Eventually, those developers produced a number of clients, written in various programming languages and targeted at different operating systems.

Network structure

This is where I would discuss the network's structure and describe its topology -- but there is none! Each servant on the network is connected to at least one other servant on the network; servants can also have both inbound and outbound connections to each other, forming a cyclic connection. However, there is no fixed layout or pattern to the nodes on the network.

So how do messages navigate this unordered environment? Each Gnutella message contains a unique ID, which is used to route messages through the network. As a servant forwards messages to its connected servants, it caches each message's ID in memory, along with the connection the message arrived on. When a response arrives, the servant uses that ID to route the response back toward the original sender. Originally, message IDs were Windows Globally Unique Identifiers (GUIDs), which was an issue for other platforms. But since the GUID is just a 16-byte value, non-Windows code can generate its own unique value.
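To make the ID-based routing concrete, here is a minimal sketch of the bookkeeping a servant might do. The RoutingTable class, its method names, and the use of a plain string to name a connection are all hypothetical; they are not part of JTella or of the protocol specification. The sketch also treats an already-seen ID as a duplicate, which is one plausible way to cope with cyclic connections.

```java
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: remembers which connection each message ID arrived on. */
class RoutingTable {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Maps a message ID (rendered as a string so it can be a map key)
    // to the name of the connection the message arrived on.
    private final Map<String, String> origins = new HashMap<>();

    /** Creates a random 16-byte message ID without any Windows GUID API. */
    static byte[] newMessageId() {
        byte[] id = new byte[16];
        RANDOM.nextBytes(id);
        return id;
    }

    /** Records where a request came from; returns false if the ID was already seen. */
    boolean remember(byte[] messageId, String connectionName) {
        return origins.putIfAbsent(key(messageId), connectionName) == null;
    }

    /** Returns the connection a response should travel back on, or null if unknown. */
    String routeFor(byte[] messageId) {
        return origins.get(key(messageId));
    }

    private static String key(byte[] id) {
        return Arrays.toString(id);
    }
}
```

A servant would call remember() before forwarding a request and routeFor() when a response with the same ID comes back. A real servant would also need to expire old entries, which this sketch leaves out.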

The Gnutella protocol

Like many Internet technologies, Gnutella benefits from a protocol specification that is available to the public. The protocol provides a compatibility point that allows Gnutella servants to communicate across operating systems and programming languages. As long as a Gnutella servant implements the protocol, it can participate in the file-sharing network.

First, you connect to a servant on the network by accessing a well-known servant like gnutellahosts.com:6346, which is almost always connected to the network. Several Websites, like gnutellahosts.com, indicate a host/port where you can locate a Gnutella servant. Some of those servants are host caches, specialized software that returns a collection of hosts currently on the network. You can use this information to maintain a desired number of connections.

When a connection is made, the connecting servant sends the text string "GNUTELLA CONNECT/0.4\n\n". If the connection is accepted, the replying servant sends the text string "GNUTELLA OK\n\n". Notice the two newline characters; I wonder why it's not the usual "\r\n". Now you have a working connection; the rest of the protocol exchanges consist mostly of binary data.
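The handshake is simple enough to sketch in a few lines. The code below is not JTella's implementation; GnutellaHandshake and connect() are names I made up for illustration. It works on any pair of streams, so you can try it without a live servant.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch of the text handshake described above. */
class GnutellaHandshake {
    static final String CONNECT = "GNUTELLA CONNECT/0.4\n\n";
    static final String OK = "GNUTELLA OK\n\n";

    /** Sends the connect string; returns true only if the peer replies with the OK string. */
    static boolean connect(InputStream in, OutputStream out) {
        try {
            out.write(CONNECT.getBytes(StandardCharsets.US_ASCII));
            out.flush();

            byte[] expected = OK.getBytes(StandardCharsets.US_ASCII);
            byte[] reply = new byte[expected.length];
            int filled = 0;
            // A single read() may return fewer bytes than requested, so loop.
            while (filled < reply.length) {
                int n = in.read(reply, filled, reply.length - filled);
                if (n < 0) {
                    return false; // peer closed the stream before finishing its reply
                }
                filled += n;
            }
            return Arrays.equals(reply, expected);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a real connection you would pass the input and output streams of a java.net.Socket opened to the remote host and port.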

Most servants attempt to maintain connections to multiple servants. To facilitate this, the protocol provides a mechanism to discover connected servants: the PING message. A servant that sends the PING message will receive response messages known as PONG messages. (No, you have not fallen into a twilight zone of 1970s video games.)

The PONG messages contain a payload that identifies the host and port number of an active Gnutella servant; this allows a servant to maintain a cache of available servants. PING messages represent a significant amount of network traffic. One possible improvement would be for a servant, before disconnecting from the network, to send a disconnect message to servants that had previously received a PONG message.

Once the servant has established connections to the network, you will probably begin searching for files. The search message contains the search criteria and the preferred download speed, which is meant to prevent responses from servants on a slow connection. In practice, the user often sets the download speed incorrectly, effectively preventing the filtering of servants on slow connections.

When a servant responds to a search message, it includes all of the information needed to retrieve the file, including the IP address and the port on which the server is listening for connections.

The file is transferred with HTTP, so all you need to get a file is a GET request. The servants can even resume partial downloads if the GET request ends before the file transfer is complete.
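Resuming a partial download over HTTP usually relies on the standard Range request header. The sketch below builds such a request as a string. The ResumeRequest class, the example path, and the choice of HTTP/1.1 are my assumptions for illustration; the exact request line that Gnutella servants expect is defined by the protocol specification and is not reproduced here.

```java
/** Hypothetical sketch: builds an HTTP GET that can resume at a byte offset. */
class ResumeRequest {
    /**
     * Builds a GET request for path on host. If byteOffset is greater than
     * zero, a Range header asks the server to start sending at that byte.
     */
    static String build(String host, String path, long byteOffset) {
        if (byteOffset < 0) {
            throw new IllegalArgumentException("byteOffset must not be negative");
        }
        StringBuilder request = new StringBuilder();
        request.append("GET ").append(path).append(" HTTP/1.1\r\n");
        request.append("Host: ").append(host).append("\r\n");
        if (byteOffset > 0) {
            // Open-ended range: everything from byteOffset to the end of the file.
            request.append("Range: bytes=").append(byteOffset).append("-\r\n");
        }
        request.append("Connection: close\r\n");
        request.append("\r\n"); // a blank line ends the request headers
        return request.toString();
    }
}
```

For instance, if 1,024 bytes of a file are already on disk, ResumeRequest.build(host, path, 1024) asks only for the remainder.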

Finally, the protocol supports a specialized message, PUSH, for dealing with firewall issues. You can use it to request a file that was previously found through a search when the servant holding the file sits behind a firewall. The main idea is to communicate enough information that the servant behind the firewall can itself establish the connection to the servant requesting the file, thus sidestepping the firewall.

Future enhancements to the protocol

Future enhancements to the Gnutella protocol will be made in the areas of scalability and spam defense. As the network has grown, network traffic has increased, due to message broadcasts from individual servants. Some work has been done to create a set of guidelines for servants, in order to limit excessive message traffic without requiring a protocol modification.

Each Gnutella message contains two pieces of information that can help to address these issues. The first is Time to Live (TTL), a value that is set by the servant creating the message and is decremented each time the message is forwarded. When the TTL reaches a value less than one, the message should be dropped (that is, not forwarded when received). The second value is the Hop count, which starts at zero and is incremented each time the message is forwarded.

The guidelines for servants center on dropping messages that have large TTL and Hop values. A message with a large TTL may be spam; a message with a large Hop value has already flowed over a large part of the network.
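Here is a rough sketch of how a servant might apply the TTL and Hop rules when a message arrives. The MessageHeader class and the two limits of 7 are hypothetical values chosen for illustration; the guidelines themselves may recommend different thresholds.

```java
import java.util.Optional;

/** Hypothetical sketch of the TTL and Hop bookkeeping for one message. */
final class MessageHeader {
    // Assumed limits; messages above either one are treated as suspicious.
    static final int MAX_TTL = 7;
    static final int MAX_HOPS = 7;

    final int ttl;
    final int hops;

    MessageHeader(int ttl, int hops) {
        this.ttl = ttl;
        this.hops = hops;
    }

    /**
     * Returns the header to send onward, or an empty Optional if the
     * message should be dropped instead of forwarded.
     */
    Optional<MessageHeader> forwarded() {
        if (ttl > MAX_TTL || hops > MAX_HOPS) {
            return Optional.empty(); // suspiciously large values: drop
        }
        int nextTtl = ttl - 1;       // TTL is decremented on each forward
        if (nextTtl < 1) {
            return Optional.empty(); // TTL exhausted: drop
        }
        return Optional.of(new MessageHeader(nextTtl, hops + 1));
    }
}
```

A message created with a TTL of 3 and a Hop count of 0 would therefore leave the first forwarding servant with a TTL of 2 and a Hop count of 1.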

Defending against spam is much more difficult. Spam occurs on the network when a servant responds to all or most queries with a text message, instead of an actual file that the servant is serving. The text message usually advertises a Website, and since the spamming servant is responding to a search query, the message is also displayed on other clients that monitor searches.

XML protocol alternative

One could envision an alternative Gnutella-like protocol, based on XML, that could have a number of advantages over today's binary data protocol. For one, you could easily read messages, making it easier to develop software using the protocol. You could also validate messages with a validating XML parser, which would allow you to discard malformed messages. An XML protocol would also make byte-swapping unnecessary. Today's binary protocol uses little-endian byte ordering for some numeric values, while Java uses network (big-endian) order, so Java code must swap bytes for those values. One downside of an XML-based protocol is that messages would be larger than those on today's system.
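The byte-swapping chore is easy to demonstrate with the standard java.nio classes. The LittleEndian helper below is a hypothetical name of my own, not something from JTella; it only shows how a Java program can read and write a 32-bit little-endian value.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper for little-endian 32-bit values in a binary message. */
class LittleEndian {
    /** Reads a 32-bit little-endian integer starting at offset. */
    static int readInt(byte[] data, int offset) {
        return ByteBuffer.wrap(data, offset, 4)
                         .order(ByteOrder.LITTLE_ENDIAN)
                         .getInt();
    }

    /** Encodes value as four bytes, least significant byte first. */
    static byte[] writeInt(int value) {
        return ByteBuffer.allocate(4)
                         .order(ByteOrder.LITTLE_ENDIAN)
                         .putInt(value)
                         .array();
    }
}
```

If you already hold a value that was read in network order, Integer.reverseBytes() performs the same swap on a plain int.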

JTella API

I haven't yet discussed anything Java-related, so you may be wondering what this article is doing in JavaWorld. Well, this article introduces JTella, an API designed to enable fast and easy development of Java applications and tools that access the Gnutella network. JTella is still in an early stage of development (version 0.1), but it can already do a few things. First, of course, it can form and maintain connections to the network. Second, it offers a search-monitoring function that allows you to monitor searches received by a JTella servant. Third, it can send search queries over the network and process the results.

I'll now show you some code examples for using JTella. Two example applications are shown: one with code to monitor the search requests received, the other to send new search requests over the network. (See Resources for the source code.) Both examples accept two command-line parameters: the first provides the name of a host, the second provides the port used by the remote Gnutella servant. See Resources for several sources of this information.

Making a connection

The first step is forming some connections to the Gnutella network; in JTella, you use the NetworkConnection class. The following code excerpt shows the typical usage of the com.kenmccrary.jtella.NetworkConnection class.

      //-------------------------------------------------------------
      // Start a network connection and listen for successful connection
      //-------------------------------------------------------------
      NetworkConnection c = new NetworkConnection(args[0],
                                   Integer.decode(args[1]).intValue());
      c.addConnectionListener(new MonitorExample(c));
      c.start();

This code constructs a NetworkConnection, supplying the IP address or host name of the remote machine and the port on which its Gnutella servant is listening. Internally, the NetworkConnection attempts to open a socket to this machine; if that succeeds, the Gnutella protocol handshake is exchanged. The NetworkConnection treats this servant as a host cache and sends a PING to it.

A host cache is a servant that has accumulated the locations of many active servants, and supplies a number of PONG responses when PINGed. In this manner, you can easily find many servants' IP address/ports. The NetworkConnection then sequentially attempts to open connections until it successfully connects to remote servants. In the future, the NetworkConnection will be enhanced to concurrently open connections and to open a connection on demand, with a method provided in JTella.

The next line of code registers a callback for information on connection status. JTella can notify a class that implements the ConnectionListener interface of the number of connections currently open on the network. The example applications respond to the established connection by performing an operation on the network: one monitors searches, and the other sends out new search requests. The final line just starts a new Java thread behind the scenes.

Monitoring searches

After receiving the callback indicating that a connection has been established, the search monitor initiates a monitoring session. The code for that process is shown below:

    // When an active connection is made run the example
    if ( 1 == event.getConnectionInformation().getCount() &&
         !monitoring )
    {
      System.out.println("CONNECTED");
      SearchMonitorSession monitor = conn.getSearchMonitorSession(new TestReceiver());
      monitoring = true;
    }

The NetworkConnection constructed earlier can provide a monitor session when given a class that implements the callback interface, MessageReceiver. The session will then enable monitoring of all search requests sent to the JTella servant. In later versions of JTella, the "session" concept should be expanded to include pausing and closing functions.

The MessageReceiver callback interface provides a method for receiving the monitor results. The method is shown below:

    public void receiveSearch(SearchMessage searchMessage);

Each time the JTella servant receives a search query from the network, this method will be called. The SearchMessage provides information about the nature of the query, such as the search criteria. In the simple example provided, the criteria are merely printed to the console.

Searching

The code to implement a search request is very similar to monitoring all search queries. The code excerpt below shows an example -- again, after achieving a connection to the network.
