Dodge the traps hiding in the URLConnection class

The URLConnection class's generic design causes snags when posting to a URL

A pitfall is Java code that compiles fine but leads to erroneous, and sometimes disastrous, results. Avoiding pitfalls can save you hours of frustration. In this article, I will present a pitfall you might encounter when posting to a URL, and another that plagues Java beginners.

Pitfall 5: The hidden complexity of posting to a URL

As the Simple Object Access Protocol (SOAP) and other XML remote procedure calls (RPCs) continue to grow in popularity, posting to a URL will become a more common and more important operation -- it is the method for sending the SOAP or RPC request to the respective server.

While implementing a standalone SOAP server, I stumbled upon multiple pitfalls associated with posting to a URL, starting with the nonintuitive design of the URL-related classes and ending with specific usability pitfalls in the URLConnection class.

A simple HttpClient class would be the most direct way to perform an HTTP post operation on a URL, but after scanning the java.net package, you'll come up empty. Some open source HTTP clients are available, but I have not tested them. (If you have tested those clients, drop me an email regarding their utility and stability.) Interestingly, there is an HttpClient in the sun.net.www.http package that is shipped with the JDK (and used by HttpURLConnection), but it is not part of the public API. Instead, the java.net URL classes were designed to be extremely generic and take advantage of dynamic class-loading of both protocols and content handlers. Before we jump into the specific problems with posting, let's examine the overall structure of the classes we will use (either directly or indirectly).

This UML diagram of the URL-related classes in the java.net package illustrates the classes' interrelatedness. (The diagram was created with ArgoUML -- see Resources for a link.) For brevity's sake, the diagram shows only key methods and no data members.

Figure: URL classes (UML diagram of the URL-related classes in java.net)

Pitfall 5 centers on the main class: URLConnection. However, you cannot instantiate that class directly -- it is abstract. Instead, you will receive a reference to a specific subclass of URLConnection via the URL class.
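
You can see this directly. The sketch below asks a URL for its connection and prints the concrete class it receives. It reuses the article's test URL. openConnection() only creates the connection object and does not contact the server, so nothing needs to be listening. The printed class name depends on the JDK, and URI.toURL() assumes JDK 1.4 or later.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class WhichConnection {
    public static void main(String[] args) throws Exception {
        URL url = URI.create("http://localhost/cgi-bin/echocgi.exe").toURL();

        // URLConnection is abstract (and its constructor protected), so
        // "new URLConnection(url)" would not compile. openConnection() picks
        // the subclass for the URL's protocol; it does not connect yet.
        URLConnection con = url.openConnection();

        System.out.println("Received a : " + con.getClass().getName());
        System.out.println("HttpURLConnection? " + (con instanceof HttpURLConnection));
    }
}
```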

Admittedly, the figure above is complex. The general sequence of events works like this: A URL specifies the location of some content and the protocol needed to access it. The first time the URL class is used, a URLStreamHandlerFactory singleton is created. That factory generates a URLStreamHandler that understands the access protocol specified in the URL. The URLStreamHandler instantiates the appropriate URLConnection subclass, which opens a connection to the URL. When the content is requested, the URLConnection also instantiates the appropriate ContentHandler to interpret it.
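
The handler-to-connection chain is easier to follow with a concrete example. Here is a minimal sketch that plugs in its own URLStreamHandler for an invented echo protocol. The protocol, EchoHandler, and EchoConnection are hypothetical and exist only for illustration. Because the sketch passes the handler directly to the URL constructor, it bypasses the factory lookup described above. The remaining steps are unchanged: the handler builds the connection, and the connection supplies the content. The sketch assumes Java 9 or later for InputStream.readAllBytes(). The public URL constructors are deprecated as of JDK 20, but they still compile and work.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

public class HandlerChainDemo {
    // Hypothetical connection for an invented "echo" protocol: it serves a
    // string built from the URL instead of doing any real I/O.
    static class EchoConnection extends URLConnection {
        EchoConnection(URL url) { super(url); }

        @Override public void connect() { connected = true; }

        @Override public InputStream getInputStream() {
            connect();
            String body = "you asked for " + url.getPath();
            return new ByteArrayInputStream(body.getBytes(StandardCharsets.UTF_8));
        }
    }

    // Hypothetical handler: its only job is to build the right URLConnection.
    static class EchoHandler extends URLStreamHandler {
        @Override protected URLConnection openConnection(URL u) {
            return new EchoConnection(u);
        }
    }

    public static void main(String[] args) throws Exception {
        // Supplying the handler directly skips the factory/lookup step.
        URL url = new URL("echo", "example", -1, "/greeting", new EchoHandler());
        URLConnection con = url.openConnection();      // handler -> connection
        System.out.println(con.getClass().getSimpleName());
        try (InputStream in = con.getInputStream()) {  // connection -> content
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        }
    }
}
```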

So what is the problem? Because of the classes' overly generic design, they lack a clear conceptual model. In his book, The Design of Everyday Things (Doubleday, 1990), Donald Norman states that one of the primary principles of good design is a sound conceptual model that allows us to "predict the effects of our actions." Some problems with the URL classes' conceptual model include:

  • The URL class is conceptually overloaded. A URL is merely an abstraction for an address or an endpoint. A better design would feature URL subclasses that differentiate static resources from dynamic services. What is conceptually missing is a URLClient class that uses the URL as the endpoint to read from or write to.
  • The URL class is biased toward retrieving data from a URL. It has three methods that retrieve content from a URL (openStream() and two forms of getContent()), but only one, openConnection(), through which data can be written to a URL. That disparity would be better addressed by a URL subclass for static resources that has only a read operation, and a URL subclass for dynamic services that has both read and write methods. That design would provide a clean conceptual model.
  • Calling the protocol handlers "stream" handlers is confusing because their primary purpose is to generate (or build) a connection. A better model would emulate the Java API for XML Processing (JAXP), where a DocumentBuilderFactory produces a DocumentBuilder, which produces a Document. Applying that model to the URL classes would yield a URLConnectorFactory that generates a URLConnector that produces a URLConnection.
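
For comparison, here is what the JAXP chain used as a model in the last bullet looks like in practice. This is a minimal sketch that uses only the standard javax.xml.parsers API. The URLConnectorFactory and URLConnector types from the bullet are the author's proposal and do not exist in java.net, so they are not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class JaxpChainDemo {
    public static void main(String[] args) throws Exception {
        // The factory is obtained from a static method ...
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // ... the factory produces a builder ...
        DocumentBuilder builder = factory.newDocumentBuilder();
        // ... and the builder produces the product, a Document.
        byte[] xml = "<greeting>hello</greeting>".getBytes(StandardCharsets.UTF_8);
        Document doc = builder.parse(new ByteArrayInputStream(xml));

        System.out.println(doc.getDocumentElement().getTagName());      // greeting
        System.out.println(doc.getDocumentElement().getTextContent());  // hello
    }
}
```

Each type's name says what it makes, and no step has a hidden side effect on another. That is the predictability the bullet asks of the URL classes.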

Now you are ready to tackle the URLConnection class and attempt to post to a URL. The goal is to create a simple Java program that posts some text to a common gateway interface (CGI) program. To test the programs, I created a simple CGI program in C that echoes (in an HTML wrapper) whatever is passed to it. (See Resources to download the source code for any program in this article, including the CGI program.)

The URLConnection class has getOutputStream() and getInputStream() methods, just like the Socket class. Based on that similarity, you would expect that sending data to a URL would be as easy as writing data to a Socket. Armed with that information and an understanding of the HTTP protocol, we write the program in Listing 5.1, BadURLPost.java.

Listing 5.1 BadURLPost.java

package com.javaworld.jpitfalls.article3;
import java.net.*;
import java.io.*;
public class BadURLPost
{
    public static void main(String args[])
    {
        // get an HTTP connection to POST to
        if (args.length < 1)
        {
            System.out.println("USAGE: java com.javaworld.jpitfalls.article3.BadURLPost url");
            System.exit(1);
        }
        try
        {
            // get the url as a string
            String surl = args[0];
            URL url = new URL(surl);
            URLConnection con = url.openConnection();
            System.out.println("Received a : " + con.getClass().getName());
            con.setDoInput(true);
            con.setDoOutput(true);
            con.setUseCaches(false);
            String msg = "Hi HTTP SERVER! Just a quick hello!";
            con.setRequestProperty("CONTENT_LENGTH", "5"); // Not checked
            con.setRequestProperty("Stupid", "Nonsense");
            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();
            /*
            con.setRequestProperty("CONTENT_LENGTH", "" + msg.length());
            Illegal access error - can't reset method.
            */
            OutputStreamWriter osw = new OutputStreamWriter(os);
            osw.write(msg);
            osw.flush();
            osw.close();
            System.out.println("After flushing output stream. ");
            // any response?
            InputStreamReader isr = new InputStreamReader(is);
            BufferedReader br = new BufferedReader(isr);
            String line = null;
            while ( (line = br.readLine()) != null)
            {
                System.out.println("line: " + line);
            }
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

A run of Listing 5.1 produces:

E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost http://localhost/cgi-bin/echocgi.exe
Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an input stream...
Getting an output stream...
java.net.ProtocolException: Can't reset method: already connected
        at java.net.HttpURLConnection.setRequestMethod(HttpURLConnection.java:102)
        at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:349)
        at com.javaworld.jpitfalls.article2.BadURLPost.main(BadURLPost.java:38)

When we try to obtain the HttpURLConnection's output stream, the program informs us that it cannot reset the method because it is already connected. The message is puzzling: our code never sets a method, and the Javadoc gives no hint that obtaining a stream does. The message refers to the HTTP request method, which should be POST when we send data to the URL and GET when we retrieve data from it.

The getOutputStream() method causes the program to throw a ProtocolException with the error message "Can't reset method: already connected." The JDK source code reveals the cause: the getInputStream() method has the side effect of sending the request (whose default request method is GET) to the Web server. This is similar to a side effect in the ObjectInputStream and ObjectOutputStream constructors, detailed in my book, Java Pitfalls: Time Saving Solutions and Workarounds to Improve Programs (John Wiley & Sons, 2000).
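
The side effect is easy to reproduce on a current JDK. This sketch uses the JDK's built-in com.sun.net.httpserver.HttpServer as a local stand-in for the echo CGI program. The /echo path and the reply format (the request method the server saw) are invented for the demo, and the sketch assumes Java 9 or later. Like BadURLPost, it calls getInputStream() before getOutputStream(). Expect a ProtocolException on the second call. The exact message varies by JDK version and may differ from the one shown above.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.ProtocolException;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class InputFirstDemo {
    public static void main(String[] args) throws Exception {
        // Local stand-in for the echo CGI program: replies with the method it saw.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            byte[] reply = exchange.getRequestMethod().getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection con = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + port + "/echo").toURL().openConnection();
            con.setDoOutput(true);

            // Asking for the input stream first sends the request immediately.
            try (InputStream in = con.getInputStream()) {
                System.out.println("Server saw: "
                        + new String(in.readAllBytes(), StandardCharsets.UTF_8));
                try {
                    con.getOutputStream();   // too late: the request is already out
                    System.out.println("No exception");
                } catch (ProtocolException e) {
                    System.out.println("ProtocolException: " + e.getMessage());
                }
            }
        } finally {
            server.stop(0);
        }
    }
}
```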

The pitfall is the assumption that the getInputStream() and getOutputStream() methods behave just as they do for a Socket connection. Since the underlying mechanism for communicating with the Web server actually is a Socket, it is not an unreasonable assumption. A better implementation of HttpURLConnection would postpone the side effects until the first read from the input stream or the first write to the output stream. You could do that by creating an HttpInputStream and an HttpOutputStream, which would keep the Socket model intact. You could argue that HTTP is a stateless request/response protocol and that the Socket model does not fit. Nevertheless, the API should fit its conceptual model: if the API mirrors a Socket connection, it should behave like one. If it does not, you have stretched the bounds of abstraction too far.
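
The deferred-side-effect design can be sketched without any networking. LazyExchangeSketch below is hypothetical. It stands in for the proposed HttpInputStream and HttpOutputStream and is not a JDK API. A UnaryOperator plays the role of the HTTP round trip. Obtaining either stream has no side effect, and the request is sent only on the first read, so the BadURLPost ordering (input stream first) works.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.function.UnaryOperator;

public class LazyExchangeSketch {
    // Hypothetical transport: takes the request body, returns the response body.
    private final UnaryOperator<byte[]> transport;
    private final ByteArrayOutputStream requestBody = new ByteArrayOutputStream();
    private byte[] response;   // stays null until the request is actually sent

    LazyExchangeSketch(UnaryOperator<byte[]> transport) {
        this.transport = transport;
    }

    // No side effect: the caller just gets a buffer for the request body.
    OutputStream getOutputStream() {
        if (response != null) {
            throw new IllegalStateException("request already sent");
        }
        return requestBody;
    }

    // No side effect either: the request is sent on the first read, not here.
    InputStream getInputStream() {
        return new InputStream() {
            private InputStream delegate;

            @Override public int read() throws IOException {
                if (delegate == null) {
                    if (response == null) {
                        response = transport.apply(requestBody.toByteArray());
                    }
                    delegate = new ByteArrayInputStream(response);
                }
                return delegate.read();
            }
        };
    }

    public static void main(String[] args) throws IOException {
        LazyExchangeSketch ex = new LazyExchangeSketch(
                body -> ("echo: " + new String(body, StandardCharsets.UTF_8))
                        .getBytes(StandardCharsets.UTF_8));

        // Same order as BadURLPost: input stream first, then output stream.
        InputStream in = ex.getInputStream();
        OutputStream out = ex.getOutputStream();
        out.write("Hi HTTP SERVER!".getBytes(StandardCharsets.UTF_8));

        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));
    }
}
```

Running main prints echo: Hi HTTP SERVER!, because the request is not sent until readAllBytes() performs the first read.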

In addition to the error message, there are two problems with the above code:

  • The setRequestProperty() method's parameters are not checked, which we demonstrate by setting a property called Stupid with a value of Nonsense. Those properties go straight into the HTTP request, and because the method does not validate them (as it should), you must take extra care to ensure that the parameter names and values are correct. CONTENT_LENGTH, for instance, is the name of a CGI environment variable; the HTTP header is Content-Length.
  • As the commented-out code notes, it is also illegal to set a request property after obtaining an input or output stream. The documentation for URLConnection describes the sequence for setting up a connection, but it does not state that the sequence is mandatory.
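
Both problems can be demonstrated without a server. The sketch below uses a hypothetical offline protocol whose OfflineConnection does no I/O, so it exercises only the request-property bookkeeping that java.net.URLConnection itself implements. The made-up Stupid header is stored without complaint. Once the connection is marked as connected, setRequestProperty() throws IllegalStateException, as the current URLConnection Javadoc documents. HttpURLConnection on current JDKs adds a few checks of its own, but it still accepts a made-up header name.

```java
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;

public class RequestPropertyDemo {
    // Hypothetical connection: connect() only sets the flag, no network I/O.
    static class OfflineConnection extends URLConnection {
        OfflineConnection(URL url) { super(url); }
        @Override public void connect() { connected = true; }
    }

    public static void main(String[] args) throws Exception {
        URLStreamHandler handler = new URLStreamHandler() {
            @Override protected URLConnection openConnection(URL u) {
                return new OfflineConnection(u);
            }
        };
        URL url = new URL("offline", "localhost", -1, "/echo", handler);
        URLConnection con = url.openConnection();

        // Problem 1: names and values are stored without any check.
        con.setRequestProperty("CONTENT_LENGTH", "5");  // CGI variable name, not an HTTP header
        con.setRequestProperty("Stupid", "Nonsense");
        System.out.println("Stupid = " + con.getRequestProperty("Stupid"));

        // Problem 2: after connecting, the request can no longer be changed.
        con.connect();
        try {
            con.setRequestProperty("Content-Length", "35");
            System.out.println("Accepted after connect");
        } catch (IllegalStateException e) {
            System.out.println("IllegalStateException: " + e.getMessage());
        }
    }
}
```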

If we did not have the luxury of examining the source code (which should definitely not be a requirement for using an API), we would be reduced to trial and error, the absolute worst way to program. Neither the documentation nor the API of the HttpURLConnection class affords us any understanding of how the protocol is implemented, so we feebly attempt to reverse the order of the calls to getInputStream() and getOutputStream(). Listing 5.2, BadURLPost1.java, is an abbreviated version of that program.

Listing 5.2 BadURLPost1.java

package com.javaworld.jpitfalls.article3;
import java.net.*;
import java.io.*;
public class BadURLPost1
{
    public static void main(String args[])
    {
// ...
        try
        {
// ...
            System.out.println("Getting an output stream...");
            OutputStream os = con.getOutputStream();
            System.out.println("Getting an input stream...");
            InputStream is = con.getInputStream();
// ...
        } catch (Throwable t)
          {
            t.printStackTrace();
          }
    }
}

A run of Listing 5.2 produces:

E:\classes\com\javaworld\jpitfalls\article3>java -Djava.compiler=NONE com.javaworld.jpitfalls.article3.BadURLPost1 http://localhost/cgi-bin/echocgi.exe
Received a : sun.net.www.protocol.http.HttpURLConnection
Getting an output stream...
Getting an input stream...
After flushing output stream.
line: <HEAD>
line: <TITLE> Echo CGI program </TITLE>
line: </HEAD>
line: <BODY BGCOLOR='#ebebeb'><CENTER>
line: <H2> Echo </H2>
line: </CENTER>
line: No content! ERROR!
line: </BODY>
line: </HTML>

Although the program compiles and runs, the CGI program reports that no data was sent! Why? The side effects of getInputStream() bite us again, causing the POST request to be sent before anything is placed in the post's output buffer, thus sending an empty POST request.
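
The empty POST can be reproduced the same way. This sketch again uses a local com.sun.net.httpserver.HttpServer as a stand-in for echocgi.exe, with the path and reply format invented for the demo, and it assumes Java 9 or later. The server reports the method and the number of body bytes it received. On the JDK's HttpURLConnection, obtaining the output stream switches the default GET to POST. The request itself goes out when the input stream is requested, and it carries whatever has been written by then, which here is nothing.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;

public class EmptyPostDemo {
    public static void main(String[] args) throws Exception {
        // Local stand-in for the echo CGI program: reports method and body size.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            int size = exchange.getRequestBody().readAllBytes().length;
            byte[] reply = (exchange.getRequestMethod() + " with " + size + " bytes")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            HttpURLConnection con = (HttpURLConnection) URI
                    .create("http://127.0.0.1:" + port + "/echo").toURL().openConnection();
            con.setDoOutput(true);

            OutputStream os = con.getOutputStream();       // method becomes POST; nothing written yet
            try (InputStream is = con.getInputStream()) {  // sends the POST with an empty body
                System.out.println(new String(is.readAllBytes(), StandardCharsets.UTF_8));
            }
            // Anything written to os from here on never reaches the server.
        } finally {
            server.stop(0);
        }
    }
}
```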

After failing twice, we understand that getInputStream() is the key method that actually sends the request to the server. Therefore we must perform the operations in this order (open output, write, open input, read), as we do in Listing 5.3, GoodURLPost.java.
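
Independently of the author's Listing 5.3, here is a minimal sketch of the same open output, write, open input, read sequence on a current JDK. As before, it assumes Java 9 or later and a local com.sun.net.httpserver.HttpServer in place of the echo CGI program. The /echo path, the reply format, and the post() helper are hypothetical, and the Content-Type header is an addition that the earlier listings do not set.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class OutputFirstPost {
    // Hypothetical helper: open output, write, open input, read.
    static String post(URL url, String msg) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setDoOutput(true);
        con.setUseCaches(false);
        con.setRequestProperty("Content-Type", "text/plain; charset=UTF-8");

        // Steps 1 and 2: obtain the output stream and write the complete body.
        try (Writer w = new OutputStreamWriter(con.getOutputStream(), StandardCharsets.UTF_8)) {
            w.write(msg);
        }
        // Steps 3 and 4: only now obtain the input stream; this sends the request.
        try (InputStream in = con.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        // Local stand-in for the echo CGI program: replies "<method> <body>".
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/echo", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(),
                    StandardCharsets.UTF_8);
            byte[] reply = (exchange.getRequestMethod() + " " + body)
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, reply.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(reply);
            }
        });
        server.start();
        try {
            URL url = URI.create("http://127.0.0.1:" + server.getAddress().getPort()
                    + "/echo").toURL();
            System.out.println(post(url, "Hi HTTP SERVER! Just a quick hello!"));
        } finally {
            server.stop(0);
        }
    }
}
```

Running main should print POST Hi HTTP SERVER! Just a quick hello!. This confirms that the body arrived and that obtaining the output stream switched the request from GET to POST.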

Listing 5.3 GoodURLPost.java
