All aboard for more efficient Web applications

The Train architecture dynamically batches user requests to improve server performance

Train is a design architecture that enables you to group multiple user requests into a single database or network query. Creator Edward Salatovka demonstrates Train's performance-boosting characteristics via a load test against a live implementation.

Imagine a railroad station operating in such a way that each passenger who buys a ticket immediately gets a train dedicated only to him! This modus operandi is absurd in real life, but is widely accepted in the world of Web application servers and data access applications. The conventional paradigm implies that each user request receives its own thread and database connection. Each user request requires an immediate dedicated trip to the database or other network resource. Obviously, there should be a smarter way of handling external traffic than buying extra hardware. Let's explore a simple but overlooked way of increasing your application's productivity.

In this article, we employ a new approach where each interaction with the database or network resource occurs on behalf of multiple users rather than only one, where negative effects of high concurrency like timeouts and deadlocks are greatly reduced, and where heavy traffic performance regression is almost negligible.

To use this approach, a paradigm I call Train, we must create an adequate environment for running our application and perform proof-of-concept tests.

Build a sandbox

To illustrate the advantage of the proposed architecture, we are going to build two simple, functionally equivalent servlets. Both deliver the same HTML pages with data retrieved from a sample database. Each servlet represents a different implementation: one follows the conventional paradigm, the other the new one. To build our servlets, we need a Web application server, a database, and a load-test runner. You are free to use your software of choice; my pieces are Tomcat 4, JMeter, and IBM's DB2 Universal Database. Tomcat 4 and JMeter are free, open source applications. The choice of DB2 is just an attempt to imitate the commercial Web environment as closely as possible.

Populate the database with random content

Assuming the name of your database is "trdata," let's create the necessary schema:

 --schema.sql
connect to trdata;
create table trentry (ID integer not null , NAME char(25), DESCR varchar(128), views integer with default 0, constraint p_trentry  primary key (ID));

The simple Java application PropSamples populates table trentry with 250,000 rows of random content:

 

//PropSamples.java
package train;

import java.util.*;
import java.sql.*;

public class PropSamples {

  public static String GetString(int size, Random rand) {
    StringBuffer strBuff = new StringBuffer();
    for (int i = 0; i < size; i++) {
      char b = (char) (rand.nextInt(25) + 65);
      strBuff.append(b);
    }
    return strBuff.toString();
  }

  public void Process() throws SQLException {
    String sqlString = "insert into trentry values(?,?,?,?)";
    Connection connection = Util.getDBConnection();
    PreparedStatement stmt = connection.prepareStatement(sqlString);
    Random rand = new Random();
    for (int i = 1; i <= 250000; i++) {
      stmt.clearParameters();
      stmt.setInt(1, i);
      stmt.setString(2, GetString(25, rand));
      stmt.setString(3, GetString(128, rand));
      stmt.setInt(4, 0);
      stmt.execute();
      if (i%1000 == 0) {
        connection.commit();
        System.out.println(i + " rows committed");
      }
    }
  }

  public static void main(String[] args) throws SQLException{
    PropSamples propSamples = new PropSamples();
    propSamples.Process();
  }
}

This code is nothing to write home about. Class Util (available in the source code, which is downloadable from Resources) contains DB2-related specifics that should change depending on the environment.

Build a conventional servlet

Here you go—the servlet at its best: each user request is serviced by a separate lightweight thread, database connections could have been taken from the pool, there is clear separation between model and view, and so on. See our conventional ClassicServlet below:

 

// ClassicServlet.java
package train;

import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
import java.sql.*;

public class ClassicServlet extends HttpServlet {
  int mEntryLength = 250000;
  Random mRand;

  public void init() throws ServletException {
    mRand = new Random(System.currentTimeMillis());
  }

  public void doGet(HttpServletRequest request, HttpServletResponse response)
      throws ServletException, IOException {
    PrintWriter out = response.getWriter();
    Statement stmt = null;
    Statement stmtViews = null;
    Connection connection = null;
    String name = "";
    String descr = "";
    int views = 0;
    int id = 0;
    boolean isError = false;
    synchronized (mRand) {
      id = mRand.nextInt(mEntryLength) + 1; //IDs in trentry run from 1 to mEntryLength
    }
    try {
      connection = Util.getDBConnection();
      stmt = connection.createStatement();
      String sqlStr = "Select id, name, descr, views from trentry where id =" + id;
      ResultSet rs = stmt.executeQuery(sqlStr);
      while (rs.next()) { //retrieves data from DB
        name = rs.getString("NAME");
        descr = rs.getString("DESCR");
        id = rs.getInt("ID");
        views = rs.getInt("VIEWS");
      }
      stmtViews = connection.createStatement();
      String sqlViewStr = "update trentry set views = views+1 where id =" + id;
      stmtViews.executeUpdate(sqlViewStr); //updates number of the page views
    } catch (SQLException ex) {
      isError = true;
    } finally {
      try {
        if (stmt != null) {
          stmt.close();
        }
        if (stmtViews != null) {
          stmtViews.close();
        }
        if (connection != null) {
          connection.commit();
          connection.close();
        }
      } catch (SQLException ex1) {
        isError = true;
      }
    }
    if (isError) {
      out.println("<html>System error</html>");
    } else { //Delivers html page to browser
      out.println("<html><body>");
      out.println("<p>ID: " + id + "</p>");
      out.println("<p>Name: " + name + "</p>");
      out.println("<p>Description: " + descr + "</p>");
      out.println("<p>Views: " + views + "</p>");
      out.println("</body></html>");
    }
  }
}

I deployed the servlet, typed http://localhost:8080/train/classicservlet in my browser, and received the following page:

 ID: 178866
Name: XEIRVYPSFTNRXYEWQWSKOOPES
Description: JBGPGKSMDQKVXVPJCXKIMWLEWJABSGBNTOYRXRKUMDBWOYOCIAKDWGGEBHKIFONGSRBIBJIHSBNGEYIO
RKGFOVWYXYXXJKUBBLVBSKOKLFCHIGRUGROKESIJQFERWJTV
Views: 0

Not much is happening here, but this is a fair imitation of many routine commands in the e-commerce world. A servlet extracts some meaningless data from the database, updates the number of views, and spits out the result. Please note that we are generating random keys to retrieve random rows from the trentry table. A good database management system usually caches the data. As a result, a second select statement against the same entry in the table would execute much faster. Our trick with random keys will keep our data "cold" and our following performance measurements accurate.

Conventional servlet performance

So, how well does our servlet perform? Let's ask JMeter. I set up JMeter to simulate 50 concurrent users. They access the servlet six times with intervals of five seconds.

To make Tomcat work under heavy traffic, I had to modify the Tomcat configuration file server.xml. Changed parameters for the connector are: maxProcessors="150" and acceptCount="150".

Figure 1 is a snapshot of the JMeter graph result. It illustrates the results of our performance measurements.

Figure 1. Conventional servlet performance

The results are self-explanatory: the average request execution time is 2.2 seconds. (Admittedly, had we used precompiled statements and connection pooling, the result would have been a little better.) When I simulated just one user, the execution time was 70 milliseconds, roughly 30 times faster. Thus, you could conclude that heavy traffic causes significant regression in the conventional servlet's performance.

Train pattern implementation

Using our railroad station analogy, let's try to bring common sense to the servlet design. What if the user request (like a real passenger) must wait for a scheduled trip to the database and ride with other requests? That functionality can be easily achieved by using this JDBC (Java Database Connectivity) 2.0 code sequence:

 statement.addBatch();  //Load the first passenger
statement.addBatch(); // Load the second passenger
…
statement.executeBatch(); //Train is departing

This is the general technique; it decreases the number of trips to the database.

A more efficient approach would combine multiple SQL statements into one. It would reduce not only the number of trips to the database, but also the number of queries. For example, the two select statements "select * from trentry where id=333" and "select * from trentry where id=266" could be replaced with "select * from trentry where id in (333,266)". As a result, two or more user requests are fulfilled by one SQL statement inside one transaction scope.
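As a rough illustration of combining selects, the sketch below builds one parameterized statement for a whole batch of IDs. The class name InClauseBuilder and the method buildSelect are hypothetical and are not part of the article's source code; the sketch also assumes a current Java version (String.join requires Java 8 or later).

```java
import java.util.Collections;

public class InClauseBuilder {

  // Builds a single select with one "?" placeholder per ID in the batch.
  // The caller is expected to bind each ID with PreparedStatement.setInt.
  public static String buildSelect(int idCount) {
    if (idCount < 1) {
      throw new IllegalArgumentException("at least one id is required");
    }
    String placeholders = String.join(",", Collections.nCopies(idCount, "?"));
    return "select id, name, descr, views from trentry where id in (" + placeholders + ")";
  }

  public static void main(String[] args) {
    // Two user requests (IDs 333 and 266) would share this one statement.
    System.out.println(buildSelect(2));
  }
}
```

Because the rows of a combined result set come back in no guaranteed order, the code that reads it would have to match each row to its request by the ID column.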

In addition to the performance boost, another tremendous benefit results: this technique reduces deadlocks and timeouts! The fewer concurrent connections to the database we have, the fewer (depending on the isolation level) exclusive locks are acquired and fewer deadlocks and timeouts occur.

In some cases where it's impossible to combine several SQL statements into one (for instance, a plain update statement) we must use the statement.addBatch() technique.
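For the update case, a minimal sketch of the addBatch() technique might look like the following. The class BatchedViewUpdate and its method incrementViews are hypothetical names chosen for illustration; the sketch assumes the caller supplies an open connection to the trentry table and handles the commit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public class BatchedViewUpdate {

  // Queues one "views+1" update per ID and sends them all in a single trip.
  // Returns the per-statement update counts reported by the driver.
  public static int[] incrementViews(Connection connection, List<Integer> ids)
      throws SQLException {
    String sql = "update trentry set views = views+1 where id = ?";
    try (PreparedStatement stmt = connection.prepareStatement(sql)) {
      for (int id : ids) {
        stmt.setInt(1, id);
        stmt.addBatch();          // load one passenger
      }
      return stmt.executeBatch(); // the train departs
    }
  }
}
```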

The Train paradigm uses a combination of these two methods.

Code is better than word

The sequence diagram in Figure 2 and the list of steps below explain the design's basics.

Figure 2. Sequence diagram of the Train pattern
  • TrainServlet instantiates the Job object, sends the Job object to the Dispatcher, and is suspended
  • Dispatcher groups Jobs into batches
  • For each batch, Dispatcher creates an instance of the class Worker
  • After a given time period or after a specific number of Jobs are received in the batch, the Worker generates SQL statements and interacts with the database
  • Each SQL statement performs tasks related to all Jobs in the batch
  • Worker interrupts the TrainServlet's thread
  • TrainServlet delivers the result to the user
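
The grouping step in the list above can be sketched roughly as follows. This is not the article's Dispatcher; JobBatcher, addJob, and flush are hypothetical names, and the sketch assumes that a separate timer thread would call flush() when the departure interval elapses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class JobBatcher<T> {
  private final int maxBatchSize;
  private final Consumer<List<T>> worker;
  private List<T> current = new ArrayList<>();

  public JobBatcher(int maxBatchSize, Consumer<List<T>> worker) {
    this.maxBatchSize = maxBatchSize;
    this.worker = worker;
  }

  // Adds a job; a full batch departs immediately.
  public void addJob(T job) {
    List<T> full = null;
    synchronized (this) {
      current.add(job);
      if (current.size() >= maxBatchSize) {
        full = current;
        current = new ArrayList<>();
      }
    }
    if (full != null) {
      worker.accept(full); // runs outside the lock so new jobs can keep arriving
    }
  }

  // Sends whatever is waiting, even if the batch is not full.
  public void flush() {
    List<T> pending;
    synchronized (this) {
      if (current.isEmpty()) {
        return;
      }
      pending = current;
      current = new ArrayList<>();
    }
    worker.accept(pending);
  }

  public static void main(String[] args) {
    JobBatcher<String> batcher =
        new JobBatcher<>(3, batch -> System.out.println("Departing with " + batch));
    for (String id : new String[] {"333", "266", "12", "7"}) {
      batcher.addJob(id);
    }
    batcher.flush(); // stands in for the timer: the last job leaves in a smaller batch
  }
}
```

In this sketch the worker callback receives the whole batch at once, which is where a combined SQL statement for all jobs would be generated.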

Now we are going to dissect a real implementation of the suggested design. Let's look at the servlet that evolved from the conventional one:

 //TrainServlet.java
package train;
import javax.servlet.*;
import javax.servlet.http.*;
import java.io.*;
import java.util.*;
public class TrainServlet extends HttpServlet {
  int mEntryLength=250000;
  Dispatcher mDispatcher;
  Random mRand;
  public void init() throws ServletException {
    mRand = new Random(System.currentTimeMillis());
    mDispatcher = new Dispatcher();
    Thread ht = new Thread(mDispatcher); //Instantiate and execute in the separate thread.
    ht.start();
  }
  public void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
    int id=0;
    synchronized(mRand){
      id=mRand.nextInt(mEntryLength)+1; //IDs in trentry run from 1 to mEntryLength.
    }
    Job job = new Job(String.valueOf(id)); //Each concurrent request creates job instance.
    job.mJobThread = Thread.currentThread(); //Job should know the thread of the request.
    PrintWriter out = response.getWriter(); //Job should know the output stream of the browser.
    job.mOut = out;
    mDispatcher.AddJob(job); //Job is sent to the dispatcher.
   //Dispatcher is a container for all concurrent jobs.
    try {
        Thread.sleep(100000); //Let's wait until database interaction is finished.
        System.out.println("Error: Request is timed out"); //Too bad. 100 seconds was not enough.
        job.mHasFailed = true; //Report a system error instead of a page with empty fields.
    }
    catch (InterruptedException ex2) {
            //Success! Members of the Job instances are populated.
    }
    job.Marshall(); // Let's display the page in the browser.
 }
}

This new servlet resembles the ClassicServlet, but with a twist: the user request is wrapped in an instance of the Job class, and the interaction with the database is delegated to an instance of the Dispatcher class.
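
The sleep-and-interrupt handoff that the servlet relies on can be tried in isolation with a small sketch like the one below. InterruptHandoff, awaitWorker, and Result are hypothetical names standing in for the servlet, its wait, and the Job; the worker thread here only simulates the database work.

```java
public class InterruptHandoff {

  // Stand-in for the Job: the worker fills it in before waking the request thread.
  static class Result {
    volatile String name;
  }

  // Parks the calling thread until another thread interrupts it or the timeout expires.
  // Returns true when the interrupt arrived in time, false on timeout.
  public static boolean awaitWorker(long timeoutMillis) {
    try {
      Thread.sleep(timeoutMillis);
      return false;
    } catch (InterruptedException ex) {
      return true;
    }
  }

  public static void main(String[] args) {
    final Result result = new Result();
    final Thread requestThread = Thread.currentThread();
    Thread worker = new Thread(() -> {
      result.name = "loaded by worker"; // simulated database result
      requestThread.interrupt();        // wake the waiting request thread
    });
    worker.start();
    boolean finished = awaitWorker(5000);
    System.out.println(finished ? result.name : "timed out");
  }
}
```

If the interrupt arrives before the request thread reaches sleep(), the interrupt flag is already set and sleep() throws immediately, so the wake-up is not lost.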

The Job class is shown below:

 //Job.java
package train;
import java.io.*;
public class Job {
  String mName;
  String mDescr;
  int mViews;
  String mID;
  PrintWriter mOut;
  Thread mJobThread;
  boolean mHasFailed = false;
  //Sorry, no getters and setters to save space
  
  public Job(String id) {
    mID = id;
  }
 
  public void Marshall(){ // displays html page
   mOut.println("<html><body>");
   if(mHasFailed){
     mOut.println("System error");
   }else{
   mOut.println("<p>ID: "+mID+"</p>");
   mOut.println("<p>Name: "+mName+"</p>");
   mOut.println("<p>Description: "+mDescr+"</p>");
   mOut.println("<p>Views: "+mViews+"</p>");
   }
   mOut.println("</body></html>");
  }
}

Again, Job is a pure representation of the user's request. The Marshall() method displays information in the browser. Member mJobThread is the thread created by the Web application server to execute the request.

Class Dispatcher is important and simple:
