Take control of the servlet environment, Part 2

Alternatives to servlet session management

In Part 1 of this series, we introduced the Rudimental Servlet Extension Framework (RSEF) and delved into its bowels, exposing the potential power of intercepting communications between your servlets and the servlet engine. In Part 2, we will introduce a concrete implementation of one of the wrappers to show you how to extend and use the power of the framework. That example will allow you to take control of session management, facilitating a flexible plug-and-play mechanism. You can switch from client-stored sessions, to in-memory server sessions, to persistent database sessions without hacking your existing servlets. Each flavor solves a unique problem in the Web-based application genre.


What is a session?

The interaction between a Web browser and a Web server is stateless. The browser connects to the server, requests a single piece of information, then disconnects, at which point the server completely forgets about the transaction and awaits the next request. Sessions are traditionally used to create a state for Web-based communications. Essentially, they are dictionaries of name-value pairs that persist from one request to the next.

Behind the scenes, most servers store the session data in memory and map it to a respective browser via a special cookie. When the browser connects to the server for the first time, the server assigns it a unique identification code and tells the browser to save that code as a cookie. Any future requests from the browser include the cookie, which the server uses to look up any session data stored in memory.
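
As a rough illustration of that lookup, here is a minimal sketch in modern Java. The class and method names are hypothetical and belong to neither RSEF nor the servlet API; a real servlet engine also handles the cookie headers and session expiration, which are left out here.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; not part of RSEF or the servlet API.
class InMemorySessionStore {
    // Session ID (the cookie value) -> that browser's name-value pairs.
    private final Map<String, Map<String, Object>> sessions = new ConcurrentHashMap<>();

    // First visit: issue a new ID that the server would send back as a cookie.
    String createSession() {
        String id = UUID.randomUUID().toString();
        sessions.put(id, new HashMap<>());
        return id;
    }

    // Later visits: the cookie value finds the stored data, or null if unknown.
    Map<String, Object> lookup(String cookieValue) {
        return sessions.get(cookieValue);
    }
}

class InMemorySessionDemo {
    public static void main(String[] args) {
        InMemorySessionStore store = new InMemorySessionStore();
        String cookie = store.createSession();                 // first request
        store.lookup(cookie).put("user", "alice");
        System.out.println(store.lookup(cookie).get("user"));  // later request
    }
}
```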

Alternatives to the cookie include encoding the session ID into every URL on the page being served or identifying the browser by the client's IP address, but the first is too cumbersome and the second unreliable. URL encoding requires that your code rewrite each link, which is tedious and, if you are using a template system, impractical. The client's IP address isn't reliable because the client might sit behind a proxy that lets multiple machines share a single IP address on the outside network.

What is wrong with sessions?

Nothing is wrong with the concept of sessions, but the way the server handles them can cause problems. Storing session data in memory precludes effective load balancing across a farm of Web servers. If a browser were directed to a different server each time it connected, it would accumulate a separate session on every server it visited. And, of course, those sessions would not synchronize with each other, leading to complete pandemonium.
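
The divergence is easy to reproduce in miniature. The sketch below is a hypothetical model, not RSEF code: each WebServer object keeps its own private session memory, and two requests carrying the same cookie land on different servers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a two-server farm without sticky routing.
class WebServer {
    // Each server keeps its own private session memory.
    private final Map<String, Map<String, Object>> sessions = new HashMap<>();

    // Returns this server's copy of the session, creating it on first use.
    Map<String, Object> sessionFor(String sessionId) {
        return sessions.computeIfAbsent(sessionId, id -> new HashMap<>());
    }
}

class FarmDemo {
    public static void main(String[] args) {
        WebServer a = new WebServer();
        WebServer b = new WebServer();
        String cookie = "abc123"; // placeholder session ID

        // Request 1 lands on server A and stores a cart item.
        a.sessionFor(cookie).put("cart", "book");
        // Request 2 lands on server B, which has never seen this session.
        System.out.println(b.sessionFor(cookie).get("cart")); // prints "null"
    }
}
```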

The most common solution is to use a load-balancing mechanism that makes a browser sticky to a particular server. The load-balancing mechanism remembers which browsers visited which servers and ensures that they keep returning to the same place. Figure 1 illustrates the separate copies of each session across the farm of Web servers.

Figure 1. Session management can become futile

Making a client sticky to a single server turns the server into a single point of failure. If the server crashes, the client loses all its session data. If the server undergoes an excessive load, and thus responds slowly, the user experience degrades. In addition, it is entirely possible that a majority of abnormally active users will coincidentally be routed to a single server, overburdening that server more than its brethren.

An alternative

The obvious solution, to anybody familiar with multitier applications, is to move the sessions out of the Web servers and into a central point of reference. Then, no matter which Web server in the farm a browser connects to, it will receive the same session, in the state the previous request left it. This central point of reference could be a database (JDBC), a remote object (Remote Method Invocation/Enterprise JavaBeans), a naming server (JNDI), or even a cookie (assuming the browser can handle cookies). Figure 2 shows the session relocated from the Web server to the database.

Figure 2. Session management centralized

In Part 2, we will use a database for storing session data.

The database

To plug database-stored sessions into the RSEF, we have to implement a version of SessionWrapper that reads from and writes to a database.

As stated above, a session is just a dictionary of name-value pairs, so our database table needs to reflect that. The table also needs a way to remember which session each data pair belongs to. Here's what the table looks like:

SESSION_MASTER
--------------
SESSION_ID  CHAR(50)    KEY
NAME        CHAR(50)    KEY
VALUE       CHAR(1024)
DATE        DATETIME

(We assume that you have a basic knowledge of database theory, as the subject expands beyond the scope of this article. In addition, for purposes of brevity and simplicity, we will ignore the issues of char versus varchar versus clob or blob fields in relation to indexing, keys, etc.)

The DATE column time-stamps the session entries, thus accommodating expiration or garbage-collection services. You'll see how the following code examples handle this column, but it will not be part of the overall discussion.

Now that we have laid the foundation for storing session data, we need to code the logic for utilizing the database. This logic will be neatly packed into our first wrapper.

The wrapper

The concept of a wrapper is simple. RSEF allows us to intercept any references to the session object via the SessionWrapper. So, whenever a servlet needs to put data into a session, retrieve data from a session, or complete any function related to the session's data, we intercept the request and map it to our database table.

The first thing we need to do is create our wrapper class:

(Note: The following code examples use the SQLUtil tool discussed in "Clever Facade Makes JDBC Look Easy," Thomas Davis (JavaWorld, May 1999). The class name has changed to JdbcFacade since the original publication. The tool hides much of the code bloat required by the JDBC API; it also hides connection pooling behind the scenes. Regardless of whether you've read the article or not, you should be able to understand what the code is doing.)

package net.rudiment.servlet.session.database;
public class SessionWrapper extends net.rudiment.servlet.SessionWrapper
{
    private static final long expiration = net.rudiment.util.Times.oneDay;
    private JdbcFacadeFactory _factory;
    public SessionWrapper( RequestWrapper request, ResponseWrapper response,
                           HttpSession session, JdbcFacadeFactory factory )
    {
        super( request, response, session );
        this._factory = factory;
        load();
    }
}

We won't go into the details of the JdbcFacadeFactory; all you need to know is that it produces instances of JdbcFacade, which are used to communicate with the database. Since the factory is required in the constructor, your bootstrap servlet (see Part 1) must provide it. Here's how your SessionWrapperFactory, from the bootstrap, might look:

    new SessionWrapperFactory()
    {
        public SessionWrapper wrapSession(
            RequestWrapper request,
            ResponseWrapper response,
            HttpSession session )
        {
            return(
                new net.rudiment.servlet.session.database.SessionWrapper(
                    request,
                    response,
                    session,
                    new JdbcFacadeFactory()
                    {
                        public JdbcFacade getInstance() throws SQLException
                        {
                            return( new com.xyzzy.util.JdbcFacade() );
                        }
                    }
                )
            );
        }
    }

You may also disregard the expiration variable. It sets the age at which session data is treated as stale and can be garbage-collected, but that too is outside the scope of this article.

So what does that enigmatic load() method do? It loads all the session values from the database into memory. Traditionally, the data stored in a session is a bunch of strings. But, since the servlet API supports it, data placed into the session may be any arbitrary object. To store an object in the database, it must be serialized; likewise, when the data is retrieved, it must be deserialized. Since serialization also extends beyond the scope of this discussion, all of the magic disappears behind the Serialize object in the following code. All you need to know is that Serialize returns a string (not byte array) representation of any object. The wrapper passes each piece of data loaded from the database up to its superclass, which, for purposes of this article, we assume stores it in volatile memory. Here's the code:

    protected void load()
    {
        JdbcFacade util = null;
        try
        {
            util = this._factory.getInstance();
            util.setSQL( "select name, value from session_master " +
                         "where session_id = ? and date >  ?" );
            util.setString( 1, getId() );
            util.setDate( 2, new Date( System.currentTimeMillis() - expiration ) );
            ResultSet rset = util.executeQuery();
            while( rset.next() )
            {
                String name = rset.getString( "name" );
                Object obj = Serialize.objectFromString( rset.getString( "value" ) );
                if( obj != null )
                {
                    super.putValue( name, obj );
                }
            }
        }
        catch( SQLException e )
        {
            System.err.println( e );
        }
        finally
        {
            if( util != null )
            {
                util.close();
            }
        }
    }
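
The article keeps the Serialize helper out of view. One plausible way to get a string representation of an object, sketched here purely as an assumption, is Java serialization followed by Base64 encoding. StringSerializer and its method names are hypothetical and are not the RSEF helper; the sketch also needs Java 8 or later for java.util.Base64.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Base64;

// Hypothetical stand-in for the article's Serialize helper (assumed behavior).
class StringSerializer {
    // Java serialization followed by Base64, so the result fits a character column.
    static String objectToString(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    // Returns null when the text cannot be decoded, which suits the
    // null check that load() performs before calling putValue().
    static Object objectFromString(String text) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(text)))) {
            return in.readObject();
        } catch (IOException | ClassNotFoundException | IllegalArgumentException e) {
            return null;
        }
    }
}
```

Base64 inflates the serialized bytes by about one third, so under this scheme larger objects could outgrow a CHAR(1024) column.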

Why do we load them all at once, rather than wait for them to be requested? Isn't that a waste of time and memory? I'm glad you asked. From the get point of view, it does seem rather inefficient. The wrapper loads all of the data even though none of it might be requested, in which case it wastes the time required to load the data and the memory required to store it. But you need to step back and look at the whole picture. How does this appear from the set point of view? Suppose the wrapper went to the database on demand instead. If I put a piece of data into the session, the wrapper serializes the data and writes it to the database. If I then request that data within the same execution context of the servlet, the data must be loaded back out of the database and deserialized. That round trip becomes expensive as the number of puts and gets increases.

But, performance isn't the only downside. A nasty and obscure bug hides beneath the code. Take, for example, the following code that might appear somewhere in one of your servlets:

    Date date = new Date( yesterday );
    session.putValue( "date", date );
    date.setTime( tomorrow );

And then this code in another servlet:

    out.println( session.getValue( "date" ) );

Which value appears on the page: yesterday or tomorrow? The answer should be tomorrow. Sloppy though it is, the servlet can legally change the date object's state after placing it into the session. If our wrapper had immediately serialized that object and written it to the database, the retrieval code would have seen the value of yesterday. We'll admit it: an early draft of RSEF contained such a bug.
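
The difference can be shown without a database. In this hypothetical sketch, a serialization round trip stands in for the immediate database write, and two plain maps stand in for the two session strategies. The timestamps are placeholders and none of the names come from RSEF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class AliasingDemo {
    // Deep copy through Java serialization, standing in for a database round trip.
    static Object roundTrip(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        long yesterday = 1_000L; // placeholder timestamps
        long tomorrow = 2_000L;

        Map<String, Object> heldInMemory = new HashMap<>();  // stores the reference
        Map<String, Object> writtenAtOnce = new HashMap<>(); // stores a snapshot

        Date date = new Date(yesterday);
        heldInMemory.put("date", date);
        writtenAtOnce.put("date", roundTrip(date));
        date.setTime(tomorrow); // mutate after the put

        System.out.println(((Date) heldInMemory.get("date")).getTime());  // 2000
        System.out.println(((Date) writtenAtOnce.get("date")).getTime()); // 1000
    }
}
```

The in-memory map sees the later mutation because it holds the same object; the snapshot is frozen at the moment of the put.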

In the end, we don't override putValue or getValue. We only access the database twice: once to load all the data and once to store the data. But how in the world does the session object know when to write all the data back into the database? Certainly you wouldn't be so callous as to force the programmer to call a save() method at the end of each servlet? Of course not. save() does exist, but the programmer need not know about it. It is called at the end of the RSEF's version of the service() method in its HttpServlet class:

    try
    {
        ((SessionWrapper)wrappedRequest.getSession( true )).save();
    }
    catch( ClassCastException e )
    {
        e.printStackTrace();
    }

We catch ClassCastException because the session might not have been wrapped at all, in which case the cast to SessionWrapper fails.
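
The load-once, save-once pattern described above can be modeled in a few lines. This is a hypothetical sketch rather than RSEF code: a plain map stands in for the database table, and two counters record how often it is touched.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the load-once, save-once pattern; not RSEF code.
class WriteBehindSession {
    private final Map<String, Object> backingStore; // stands in for the database
    private final Map<String, Object> values = new HashMap<>();
    int storeReads = 0;
    int storeWrites = 0;

    WriteBehindSession(Map<String, Object> backingStore) {
        this.backingStore = backingStore;
        values.putAll(backingStore); // the single load, done at construction
        storeReads++;
    }

    // Puts and gets touch only the in-memory copy.
    void putValue(String name, Object value) { values.put(name, value); }
    Object getValue(String name) { return values.get(name); }

    // The single write, meant to be called once at the end of the request.
    void save() {
        backingStore.putAll(values);
        storeWrites++;
    }
}

class WriteBehindDemo {
    public static void main(String[] args) {
        Map<String, Object> database = new HashMap<>();
        WriteBehindSession session = new WriteBehindSession(database);
        session.putValue("a", "1");
        session.putValue("b", "2");
        session.getValue("a");
        session.save();
        System.out.println(session.storeReads + " read, " + session.storeWrites + " write");
    }
}
```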

The save() method simply iterates through all the data stored in the superclass and writes each entity to the database. Rather than keep track of which values are already in the database (an exercise for the reader), it simply attempts to insert each one, and upon failure, assumes a primary key constraint violation and reverts to an update:
