JDBC scripting, Part 1

Introducing JudoScript, a functional scripting language

JDBC (Java Database Connectivity) is a standard Java API for SQL database accesses. What does JDBC scripting mean? It means using JDBC for manipulating SQL databases in a way comparable to and beyond what native database clients do. JDBC scripting is an attractive, practical matter, because SQL databases are and will be the dominant data sources in any enterprise system. It is also important to note that all major RDBMS (relational database management system) vendors provide pure-Java JDBC drivers for their own products, so a capable JDBC scripting tool can easily take advantage of that capability. All you need is a JDBC URL and the JDBC driver jar file in the classpath. You can connect to multiple databases simultaneously for multisource, heterogeneous data processing.

Philosophically, JDBC scripting is part of a grandiose idea: using the Java platform, in addition to Java's traditional role as an object-oriented language, to regularly perform practical computing tasks. The JudoScript language was designed for this idea.

Today, enterprise software handles not only relational databases, but also richer data formats such as XML and SGML, abstract data types such as Enterprise JavaBeans (EJB), Web services, as well as old friends such as flat files and spreadsheets. Spreadsheets are so popular in business—they are touted as the language of business. Traditionally, and naturally, people use various tools for data querying, reporting, and processing. Wouldn't it be nice to have a unified tool for all data-processing needs? The benefits are obvious:

  • You can do any processing with any data sources of any format simultaneously
  • Correct solutions and results are faster to reach
  • You have less tools to learn, install, configure, and launch
  • A single, free tool is cheaper to own
  • The problem solving process is mentally less taxing because you can focus on the problem at hand without constant worries over environmental issues, context switching, and integration of heterogeneous components

Java looks like a candidate, but it falls short as a practical tool because as a system language, Java is designed for creating object-oriented software systems; it is too verbose and detail-oriented for practical uses. Various Java scripting languages may help; they can "script" Java components and are easier to code, but they are just as programming-centric and detail-oriented as Java or Pascal.

A good scripting tool should have abstract functional features for people to intuitively specify today's common computing tasks and, at the same time, not lose programming power. This is the design goal of the JudoScript language.

Pure SQL is so limited, simple query tools are weak in capability. JudoScript has many powerful data structures and is fully potent to script Java components. You can use JudoScript to easily collect data from database A, compute some intermediate result in memory, and finish the task with database B. If you use Oracle, for example, you will need to use PL/SQL, database links, and possibly temporary tables. If database B is not an Oracle database, things can quickly spin out of control. JudoScript can easily deal with flat files, spreadsheets, XML, SGML, EJBs, etc. It is the synergy of all these capabilities that makes JudoScript's JDBC scripting so cool and useful, especially for J2EE applications.

This two-part series introduces JDBC scripting with JudoScript and its related features. You will discover how much you can easily accomplish with nothing but JudoScript and the underlying Java platform. In Part 1, I discuss JudoScript's support for JDBC scripting.

Note: You can download the source code that accompanies this article from Resources.

Read the whole series on JDBC scripting:

JDBC support in JudoScript

JudoScript has dedicated syntactic support for running SQL. The data definition language (DDL) and data manipulation language (DML) statements can be run individually or in groups or batches. You can prepare parametized SQL statements and bind parameters to the prepared statement to run; you can call stored procedures with in-, out- and in-out parameters. Let's see a simple example:

connect to 'jdbc:oracle:thin:@dbsvr:1521:dbname', 'user', 'pass';
// Create table and insert a few rows.
executeSQL {
  CREATE TABLE emp(emp_no      INTEGER PRIMARY KEY,
                   first_name  VARCHAR(100),
                   last_name   VARCHAR(100),
                   birth_date  DATE,
                   salary      NUMBER);
  CREATE INDEX emp_fname ON emp(first_name);
  CREATE INDEX emp_lname ON emp(last_name);
  INSERT INTO emp(emp_no,first_name,last_name,birth_date,salary)
  VALUES(100, 'Jim', 'Billups', to_date('1954-1-3','yyyy-mm-dd'), 86500.0);
  INSERT INTO emp(emp_no,first_name,last_name,birth_date,salary)
  VALUES(101, 'Linda', 'Jordan', to_date('1980-7-24','yyyy-mm-dd'),45250.0);
}
// Query and print out rows.
executeQuery qry:
  SELECT emp_no, first_name, last_name, salary
    FROM emp
    WHERE salary < 50000
    ORDER BY salary ASC
;
while qry.next() {
  println '#', qry[1], '  ', qry.last_name, ', ', qry.first_name, ': ',
qry.salary;
}
disconnect(); // From database

The first thing to note is that this script is all about business! This program connects to the database, runs a number of SQL DDL and DML statements, and finally executes a query to print the results in the table. In the executeQuery statement, the qry variable is fundamentally a java.sql.ResultSet instance, so you can call its next() method. Columns in a row can be accessed via their names or indices (starting at 1). The next example demonstrates how values can be bound; it omits the connection parts:

// Prepare a SQL
prepare qry:
  SELECT emp_no, first_name, last_name, salary
    FROM emp
    WHERE salary < ?
    ORDER BY salary ASC
;
// Run the query
executeQuery qry with @1:number = 5000.0;
while qry.next() {
  println '#', qry[1], '  ', qry.last_name, ', ', qry.first_name, ': ',
qry.salary;
}

The @1:number syntax is used to bind parameter values. If no type is appended, it is assumed to be a string.

You can dynamically create SQL statements as well:

// Construct an IN list for the query
last_names = [ 'Olajuwon', 'Yao' ]; // an array
in_expr = last_names.toCsv(',', function(x){ return "'"+x+"'"; });
// Now run the query
exeucteQuery qry:
  SELECT * FROM emp WHERE last_name IN ( (* in_expr *) )
;

The (* *) syntax within SQL statements can take any expression. In the code above, we used Array's toCsv() method and supplied an anonymous function to quote the values. The resultant SQL statement is: SELECT * FROM emp WHERE last_name IN ( 'Olajuwon','Yao' ).

We have seen examples that dynamically construct SQL statements and run SQL directly. Another mode of running SQL is batch, either statically (by using executeBatch instead of executeSQL) or dynamically. I don't discuss batch running in detail in this series.

Next, we dive into the particulars of major JDBC scripting topics. Keep in mind that JDBC scripting (actually, JudoScript itself) is intended for everyone's use; because of the nature of this audience, I relate more to the underlying Java classes.

Database connections

You obtain a database connection like this:

connect mycon to 'jdbc:oracle:thin:@dbsvr:1521:dbname', 'user', 'pass';

The obtained connection is stored in the variable mycon. If the connection variable name is omitted, JudoScript uses a predefined global variable, $$con, which is typically used for single-connection programs. You can specify connection attributes like this:

connect mycon ( autoCommit=false ) to
  'jdbc:oracle:thin:@dbsvr:1521:dbname', 'user', 'pass';

How does JudoScript load the JDBC driver? JudoScript maintains a list of known JDBC driver class names and their corresponding URL prefixes. For instance, when JudoScript sees oracle in jdbc:oracle:..., it loads the Oracle JDBC driver class oracle.jdbc.driver.JdbcDriver. If a driver is absent from the list, it can be specified as a driver attribute or loaded via the old-fashioned Java style:

// JudoScript style
connect ( driver='my.db.jdbc.driver' ) to 'jdbc:mydb:...', '', '';
// Java style
(java::Class).forName('my.db.jdbc.driver');

The connection object is fundamentally a java.sql.Connection object, (or more precisely, a wrapper object of java.sql.Connection, so you can access all public methods of that Java class.) Sometimes you can obtain a connection, say, via a Java method call; it can be used just like one obtained through the connect statement. The use clause in all SQL execution statements designates which connection to use:

executeSQL use mydb { ... }

Once the database operations are complete, you should call the connection object's disconnect() method. The disconnect() system function calls the name-sake method of the global $$con object. A connection object also has many other methods, such as commit() and rollback().

Directly execute queries and query results

Individual DML statements, which include SELECT, INSERT, UPDATE, and DELETE, are run in these modes:

Table 1. Modes of running SQL

Running modeQueryUpdate
Run directlyexecuteQuery executeUpdate
Prepare and runprepare executeQuery prepare executeUpdate

Let's first look at how to run queries and process the results.

Run queries

The general syntax for executeQuery in BNF (Backus Naur Form) is:

  executeQuery variable [ ( attributes ) ] [ use variable ] :
    sql_statement ; [ with bind_list ; ] | executeQuery variable with bind_list ;

The syntax really includes two forms of the executeQuery statement; the second form runs a prepared statement stored in the variable, which I explain later. Both forms return result sets.

The use variable clause indicates which database connection to use, where the variable holds a connection object. sql_statement can be any query statement that returns a result set. SELECT is the most common query SQL statement, but is not the only one. Many JDBC drivers support their RDBMS's own commands that also return result sets, such as MySQL's show commands.

The query object (stored in the variable immediately following the executeQuery keyword) also represents the result set. It is a built-in object that has many methods. The most frequently used method is next(), which loops through the result set. For the query object's role as result set, columns in each row can be referenced either by indices (starting at 1) or by column names. JudoScript converts the returned values to JudoScript data types:

executeQuery qry:
  SELECT emp_no, first_name, last_name, birthday, salary
    FROM emp
    WHERE salary < 50000
    ORDER BY salary ASC
;
while qry.next() {
  println '#', qry[1], '  ', qry[2], ', ', qry[3],
          ' (', qry[4].fmtDate('yyyy-MM-dd'), '): ', qry.salary;
}

In the example above, the first column is an integer, the second and third are strings, the fourth is a date, and the last is a number. Because the fourth column is a date, we can call its fmtDate() method.

Beyond SQL

The power of JDBC scripting is not only its ability to run SQL statements, but also its synergy of SQL and computation. Suppose we have a Web server log stored in a table like this:

CREATE TABLE web_log (
  uri      VARCHAR(1500),
  referer  VARCHAR(1500),
  time     TIMESTAMP
);

We need to report the numbers of hits per day, per week, and per month. The daily and monthly reports may be possible with SQL if the RDBMS supports functions in the GROUP BY clause; the idea is to convert the time stamp into a string and use that in GROUP BY. But the weekly report is tricky. Let's see how JudoScript handles this:

 1: executeQuery qry: SELECT time from web_log;
2:
3: // Step 1. Collect weekly counts
4: counts = new Object;
5: while qry.next() {
6:   time = qry.time;
7:   token = time.year + '-week-';
8:   if time.weekOfYear < 10 {
9:     token += '0';  // Fill in a 0 for single digit numbers.
10:   }
11:   token += time.weekOfYear; // The week-token for the day.
12:   ++ counts.(token);
13: }
14:
15: // Step 2. Print out weekly counts
16: for wk in counts.keysSorted() {
17:   println wk, ':', counts.(wk) :>8; // Right-aligned, 8-digits
18: }

The keys to this simple solution are the built-in data structure Object and the Date value's features. Object is essentially a map; it stores name-value pairs. On lines 12 and 16, the .() operator gets and sets the value for a key. In lines 6-11, we construct a weekly token for the time and increment the count in the Object. If the key did not exist, the value would be null or, numerically, 0. On line 15, we obtain the keys sorted by natural text order (for this reason, we make all week numbers two digits in lines 8-10.) The result looks like this:

2004-week-06:    8438
2004-week-07:   21409
2004-week-08:   34940
2004-week-09:  128343
2004-week-10:   99827
2004-week-11:   78343
2004-week-12:   30968
2004-week-13:   44021

I further discuss the data structures and general programming techniques in Part 2.

Query result methods

The query object has some useful methods, as shown in Table 2.

Table 2. Methods of the result set object

1 2 Page
Join the discussion
Be the first to comment on this article. Our Commenting Policies
See more