Validation with Java and XML Schema, Part 1

Learn the value of data validation and why pure Java isn't the complete solution for handling it

As technologies have matured and APIs for Java and other languages have taken more of the burden of low-level coding off your hands (JMS, EJB, and XML are just a few recent examples), business logic has become a larger share of application code. With this increase in business logic comes a more detailed specification of which data are allowed.


For example, applications no longer just accept orders for shoes; they ensure that the shoe is of a valid size, in stock, and accurately priced. The business rules that must be applied even for a simple shoe store are extremely complex. Each user input, and each combination of inputs, must be validated; those inputs often produce computed data, which may also have to be validated before being passed on to another application component. With that added complexity, you spend more time writing validation methods. You ensure that a value is a number, a decimal, a dollar amount, that it's not negative, and on, and on, and on.

With servlets and JSP pages sending all submitted parameters as textual values (an array of Java Strings, to be exact), your application must convert those values to other data types at every step of handling user input. That converted data is most likely passed to session beans. The beans can ensure type safety (requiring an int, for example), but not the value range. So validation must occur again. Finally, business logic may need to be applied. (Does Dr. Martens make this boot in a size 10?) Only then can computation safely be performed, and results supplied to the user. If you're starting to feel overwhelmed, good! You are starting to see the importance of validation, and why this series might be right for you.

Coarse-grained vs. fine-grained validation

The first step in making your way through the "validation maze" is breaking the validation process into two distinct parts: coarse-grained validation and fine-grained validation. I'll look at both.

Coarse-grained validation is the process of ensuring that data meet the typing criteria for further action. Here, "typing criteria" means basic data constraints such as data type, range, and allowed values. These constraints are independent of other data, and do not require access to business logic. An example of coarse-grained validation is making sure that shoe sizes are positive numbers, smaller than 20, and either whole numbers or half sizes.
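As a rough sketch of how small this rule is in plain Java, the class below checks only a raw form value's type, range, and allowed values, with no reference to brand or inventory. The class and method names are hypothetical, and the bounds (greater than 0, less than 20, whole or half sizes) are taken directly from the example above.

```java
/** Hypothetical coarse-grained check for a raw shoe-size form value. */
final class ShoeSizeCheck {
    // Bounds from the example: sizes must be positive and smaller than 20
    private static final double MIN_EXCLUSIVE = 0.0;
    private static final double MAX_EXCLUSIVE = 20.0;

    private ShoeSizeCheck() {}

    /** Returns true if the raw value is a well-formed shoe size. */
    static boolean isValidShoeSize(String raw) {
        if (raw == null) {
            return false;                       // parameter missing entirely
        }
        double size;
        try {
            size = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return false;                       // data type check: not a number
        }
        if (!(size > MIN_EXCLUSIVE && size < MAX_EXCLUSIVE)) {
            return false;                       // range check (also rejects NaN)
        }
        // Allowed-values check: whole or half sizes only,
        // i.e. twice the size must be a whole number
        double doubled = size * 2;
        return doubled == Math.rint(doubled);
    }
}
```

Nothing here needs a database or knowledge of other fields, which is what makes it coarse-grained.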

Fine-grained validation is the process of applying business logic to values. It typically occurs after coarse-grained validation, and is the final step of preparation, before one either returns results to the user or passes derived values to other application components. An example of fine-grained validation is ensuring that the requested size (already in the correct format because of coarse-grained validation) is valid for the requested brand. V-Form inline skates are only available in whole sizes, so a request for a size 10 1/2 should cause an error. Because that requires interaction with some form of data store and business logic, it is fine-grained validation.
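For contrast, here is a minimal sketch of the fine-grained V-Form rule. It assumes the size has already passed coarse-grained validation, and it stands in an in-memory map for the data store a real application would query. The class name, method name, and the "ExampleBrand" entry are hypothetical; only the rule that V-Form skates come in whole sizes comes from the text.

```java
import java.util.Map;

/** Hypothetical fine-grained check: is this size offered by this brand? */
final class BrandSizeCheck {
    // Stand-in for a product catalog: does each brand offer half sizes?
    private static final Map<String, Boolean> HALF_SIZES_OFFERED = Map.of(
            "V-Form", false,          // from the text: whole sizes only
            "ExampleBrand", true);    // hypothetical brand for illustration

    private BrandSizeCheck() {}

    /** Assumes size already passed coarse-grained validation. */
    static boolean isSizeAvailable(String brand, double size) {
        if (brand == null) {
            return false;             // Map.of maps reject null lookups
        }
        Boolean halfSizes = HALF_SIZES_OFFERED.get(brand);
        if (halfSizes == null) {
            return false;             // unknown brand
        }
        boolean isHalfSize = size != Math.floor(size);
        return halfSizes || !isHalfSize;
    }
}
```

The answer depends on data outside the request itself, which is why this check belongs to the application rather than to a reusable validation utility.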

The fine-grained validation process is always application-specific and is not a reusable component, so it is beyond the scope of this series. However, coarse-grained validation can be utilized in all applications, and involves applying simple rules (data typing, range checking, and so on) to values. In this series, I will examine coarse-grained validation and supply a Java/XML-based solution for handling it.
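To make "reusable" concrete, the sketch below expresses coarse-grained constraints as data rather than code, so one class can check any numeric parameter. It is only an illustration, not the Java/XML-based solution this series develops. The class name and its fields are hypothetical, and it assumes all values are numeric with an inclusive range plus an optional set of allowed values.

```java
import java.util.Set;

/** Hypothetical coarse-grained rule: numeric type, inclusive range, allowed set. */
final class NumericRule {
    private final double min;
    private final double max;
    private final Set<Double> allowed;   // empty set means any value in range

    NumericRule(double min, double max, Set<Double> allowed) {
        this.min = min;
        this.max = max;
        this.allowed = allowed;
    }

    /** Applies type, range, and allowed-value checks to a raw textual value. */
    boolean accepts(String raw) {
        if (raw == null) {
            return false;
        }
        final double value;
        try {
            value = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return false;                // wrong data type
        }
        if (!(value >= min && value <= max)) {
            return false;                // out of range (or NaN)
        }
        return allowed.isEmpty() || allowed.contains(value);
    }
}
```

For example, `new NumericRule(1, 5, Set.of(1.0, 2.0, 3.0, 4.0, 5.0))` accepts "3" but rejects "3.5" and "6"; the same class, configured differently, could serve a quantity field in another application.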

Data: Ever present, ever problematic

If you're still not convinced of the need for this sort of utility, consider the fact that data has become the commodity in today's global marketplace. It is not applications, not technology, not even people that drive business -- it is raw data. The tasks of selecting a programming language, picking an application server, and building an application are all byproducts of the need to support data. Thus, those decisions may all later be revisited and changed. (Ever had to migrate from SAP or dBase to Oracle? Ever switched from NetDynamics to Lutris Enhydra?)

However, the fundamental commodity, data, never changes. Platforms change, software changes, but you never hear anyone say, "Well, let's just trash all that old customer data and start fresh." So the problem of constraining data is a fundamental one. It will always be part of any application, in any language. And data is always problematic because of problematic users. People type too fast, type too slow, make a silly mistake, or spill coffee on their keyboards -- the bottom line is that validation is essential to preserving accurate data, and therefore is essential to a good application. With that in mind, I'll show you how people are solving that common problem today.

Current solutions (and problems)

Since data validation is so important, you'd probably expect there to be plenty of solutions for the problem. In reality, most solutions for handling validation are clumsy and not at all reusable, and result in a lot of code applicable only in specific situations. Additionally, that code often gets intertwined with business logic and presentation logic, causing trouble with debugging and troubleshooting. Of course, the most common solution for data validation is to ignore it, which causes exceptions for the user. Obviously, none of those are good solutions, but understanding the problems they don't solve can help establish requirements for the solution built here.

A big hammer

The most common way to handle data validation (besides ignoring it) is also the most heavy-handed. It involves simply coding the validation directly into the servlet, class, or EJB that deals with the data. In this example, validation is performed as soon as a parameter is obtained from a servlet:

Inline validation in a servlet

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
public class ShoeServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {
        // Get the shoe size
        int shoeSize;
        try {
            shoeSize = Integer.parseInt(req.getParameter("shoeSize"));
        } catch (NumberFormatException e) {
            throw new IOException("Shoe size must be a number.");
        }
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new IOException("Invalid shoe size.");
        }
        // Get the brand
        String brand = req.getParameter("brand");
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new IOException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new IOException("Size not available in this brand.");
        }        
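        // Perform further processing
    }
    // Hypothetical placeholders for application-specific checks
    private boolean validBrand(String brand) {
        return brand != null && brand.length() > 0;
    }
    private boolean validSizeForBrand(int shoeSize, String brand) {
        return true;  // would consult a product catalog
    }
}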

This code is neither cleanly separated nor reusable. The specific parameter, shoeSize, was presumably obtained from a submitted HTML form. The parameter is converted to a numeric value (hopefully!), then compared to the maximum and minimum acceptable values. This example doesn't even check for half sizes. In an average case where four or more parameters are received, the servlet's validation portion alone could result in more than 100 lines of code. Now multiply that across 10 or 15 servlets. This approach results in a massive amount of code, often difficult to understand and poorly documented.

In addition to the code's lack of clarity, the business logic often mixes with the validation, making code modularization very difficult. In the following example, a session bean must not only perform its business task, but also ensure that the data are correctly formatted:

Inline validation in a session bean

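import java.rmi.RemoteException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
public class ShoeBean implements SessionBean {
    // Shoe is assumed to be the application's own value class
    public Shoe getShoe(int shoeSize, String brand) throws RemoteException {
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new RemoteException("Invalid shoe size.");
        }
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new RemoteException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new RemoteException("Size not available in this brand.");
        }
        // Perform business logic (omitted) and return the matching Shoe
        return null;
    }
    // Hypothetical placeholders for application-specific checks
    private boolean validBrand(String brand) { return brand != null; }
    private boolean validSizeForBrand(int shoeSize, String brand) { return true; }
    // Lifecycle callbacks required by the SessionBean interface
    public void ejbCreate() {}
    public void ejbRemove() {}
    public void ejbActivate() {}
    public void ejbPassivate() {}
    public void setSessionContext(SessionContext ctx) {}
}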

An obvious problem here is that the only way to inform the calling component of a problem is by throwing an Exception, usually a java.rmi.RemoteException in EJBs. That makes fielding the exception and responding to the user difficult, at best. Of course, each business component that uses the shoeSize variable must perform the same validation, which could be wedged between different blocks of business logic.
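The sketch below shows what that looks like from the caller's side. Because EJB classes are not needed to make the point, a hypothetical static getShoe() stands in for the remote call and, like the bean, reports bad input only as a RemoteException. The class name, method names, and user-facing messages are illustrative assumptions.

```java
import java.rmi.RemoteException;

/** Hypothetical client showing how a caller must field RemoteException. */
final class ShoeClient {
    // Stand-in for the remote getShoe() call: bad input surfaces
    // only as a RemoteException carrying a message string
    static String getShoe(int shoeSize, String brand) throws RemoteException {
        if (shoeSize <= 0 || shoeSize > 20) {
            throw new RemoteException("Invalid shoe size.");
        }
        return brand + " size " + shoeSize;
    }

    /** Returns a message for the user, or null if the call succeeded. */
    static String userMessage(int shoeSize, String brand) {
        try {
            getShoe(shoeSize, brand);
            return null;
        } catch (RemoteException e) {
            // The same exception type signals both bad input and real
            // network or server failures, so the caller is reduced to
            // matching message text to decide what to tell the user
            if ("Invalid shoe size.".equals(e.getMessage())) {
                return "Please choose a size between 1 and 20.";
            }
            return "The store is unavailable; please try again later.";
        }
    }
}
```

Matching on message strings is fragile: rewording the bean's message silently breaks the client's error handling.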

This sort of "big hammer" solution doesn't help you in reusability, code clarity, or even reporting problems to the user. This solution, the most common method for handling data validation issues, should be used only as an example of what not to do in your next project.

A smaller hammer

Over time, some developers have recognized the problems with the "big hammer" approach. As servlets have grown in popularity, handling textual parameters has been recognized as a problem worth solving. As a result, developers have written utility classes that parse parameters and convert them to specific data types. The most popular solution is Jason Hunter's com.oreilly.servlet.ParameterParser class, introduced in his O'Reilly book, Java Servlet Programming. (See Resources.) Hunter's class retrieves a request parameter by name, converts it to a specific data type, and returns it. A portion of that class is shown here:

The com.oreilly.servlet.ParameterParser class

package com.oreilly.servlet;
import java.io.*;
import javax.servlet.*;
public class ParameterParser {
    private ServletRequest req;
    public ParameterParser(ServletRequest req) {
        this.req = req;
    }
    public String getStringParameter(String name)
        throws ParameterNotFoundException {
        // Use getParameterValues() to avoid the once-deprecated getParameter()
        String[] values = req.getParameterValues(name);
        if (values == null)
            throw new ParameterNotFoundException(name + " not found");
        else if (values[0].length() == 0)
            throw new ParameterNotFoundException(name + " was empty");
        else
            return values[0];  // ignore multiple field values
    }
    public String getStringParameter(String name, String def) {
        try { return getStringParameter(name); }
        catch (Exception e) { return def; }
    }
    public int getIntParameter(String name)
        throws ParameterNotFoundException, NumberFormatException {
        return Integer.parseInt(getStringParameter(name));
    }
    public int getIntParameter(String name, int def) {
        try { return getIntParameter(name); }
        catch (Exception e) { return def; }
    }
    // Methods for other Java primitives
}

Two versions of the utility method are provided for each Java primitive data type. One returns the converted value or throws an exception if conversion fails, and another returns the converted value or returns a default if no conversion can occur. Using the ParameterParser class in a servlet significantly reduces the problems described above:
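To show the two-method pattern in isolation, here is a hypothetical, servlet-free analogue for double values. It is not part of Hunter's library: it reads from a Map shaped like the one ServletRequest.getParameterMap() returns, and it throws IllegalArgumentException in place of ParameterNotFoundException so that the example compiles without the com.oreilly.servlet package.

```java
import java.util.Map;

/** Hypothetical analogue of the strict/lenient ParameterParser pattern. */
final class MapParameterParser {
    private final Map<String, String[]> params;

    MapParameterParser(Map<String, String[]> params) {
        this.params = params;
    }

    /** Strict version: throws if the value is missing, empty, or not a number. */
    double getDoubleParameter(String name) {
        String[] values = params.get(name);
        if (values == null || values.length == 0 || values[0].isEmpty()) {
            throw new IllegalArgumentException(name + " not found or empty");
        }
        return Double.parseDouble(values[0]);  // may throw NumberFormatException
    }

    /** Lenient version: falls back to a default on any failure. */
    double getDoubleParameter(String name, double def) {
        try {
            return getDoubleParameter(name);
        } catch (IllegalArgumentException e) { // covers NumberFormatException too
            return def;
        }
    }
}
```

Note that the lenient version cannot tell the caller whether the value was missing or malformed; both simply yield the default.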

Using the com.oreilly.servlet.ParameterParser class in a servlet

import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;
import com.oreilly.servlet.ParameterParser;
public class ShoeServlet extends HttpServlet {
    public void doGet(HttpServletRequest req, HttpServletResponse res)
        throws ServletException, IOException {
        ParameterParser parser = new ParameterParser(req);
        // Get the shoe size
        int shoeSize = parser.getIntParameter("shoeSize", 0);
        // Ensure viable shoe size
        if ((shoeSize <= 0) || (shoeSize > 20)) {
            throw new IOException("Invalid shoe size.");
        }
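        // Get the brand; the two-argument form returns null if it is missing
        // or empty, whereas the one-argument form throws the checked
        // ParameterNotFoundException
        String brand = parser.getStringParameter("brand", null);
        // Ensure correct brand
        if (!validBrand(brand)) {
            throw new IOException("Invalid shoe brand.");
        }
        // Ensure correct size and brand
        if (!validSizeForBrand(shoeSize, brand)) {
            throw new IOException("Size not available in this brand.");
        }
        // Perform further processing
    }
    // Hypothetical placeholders for application-specific checks
    private boolean validBrand(String brand) {
        return brand != null;
    }
    private boolean validSizeForBrand(int shoeSize, String brand) {
        return true;  // would consult a product catalog
    }
}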

This is a better solution, but still clumsy; you can obtain the appropriate data type, but range checking is still a manual process. The class also offers no way to restrict a parameter to a fixed set of allowed values (such as only "true" or "false," rather than any textual value). Trying to implement that sort of logic in the ParameterParser class results in a clumsy API, with at least four different variations for each data type.
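A hypothetical sketch makes the API growth visible. Only the first two methods mirror ParameterParser's existing pattern; the range and allowed-value overloads are invented here to illustrate the problem, and the methods take raw strings rather than a request to keep the example self-contained.

```java
import java.util.Arrays;

/** Hypothetical illustration of overloads multiplying per data type. */
final class ExplodingParser {
    private ExplodingParser() {}

    // 1. Convert or throw
    static int getInt(String raw) {
        return Integer.parseInt(raw);
    }

    // 2. Convert or return a default
    static int getInt(String raw, int def) {
        try { return getInt(raw); } catch (NumberFormatException e) { return def; }
    }

    // 3. Convert and range-check, or throw
    static int getInt(String raw, int min, int max) {
        int v = getInt(raw);
        if (v < min || v > max) {
            throw new IllegalArgumentException(v + " not in [" + min + ", " + max + "]");
        }
        return v;
    }

    // 4. Convert and range-check, or return a default
    static int getInt(String raw, int min, int max, int def) {
        try {
            return getInt(raw, min, max);
        } catch (IllegalArgumentException e) {  // includes NumberFormatException
            return def;
        }
    }

    // 5. Convert and check against an allowed set, or throw; a sixth
    //    variant would add a default, and every other type needs the same
    static int getInt(String raw, int[] allowed) {
        int v = getInt(raw);
        if (Arrays.stream(allowed).noneMatch(a -> a == v)) {
            throw new IllegalArgumentException(v + " is not an allowed value");
        }
        return v;
    }
}
```

Multiplying these overloads by every primitive type, plus String, is what turns a handy utility into an unwieldy API.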
