Processing command line arguments in Java: Case closed

Facilitate command line argument processing for Java tools with a simple helper class

Many Java applications started from the command line take arguments to control their behavior. These arguments are available in the string array argument passed into the application's static main() method. Typically, there are two types of arguments: options (or switches) and actual data arguments. A Java application must process these arguments and perform two basic tasks:

  1. Check whether the syntax used is valid and supported
  2. Retrieve the actual data required for the application to perform its operations

Often, the code that performs these tasks is custom-made for each application and thus requires substantial effort both to create and to maintain, especially if the requirements go beyond simple cases with only one or two options. The Options class described in this article implements a generic approach to easily handle the most complex situations. The class allows for a simple definition of the required options and data arguments, and provides thorough syntax checks and easy access to the results of these checks. New Java 5 features like generics and typesafe enums were also used for this project.

Command line argument types

Over the years, I have written several Java tools that take command line arguments to control their behavior. Early on, I found it annoying to manually create and maintain the code for processing the various options. This led to the development of a prototype class to facilitate this task, but that class had its limitations: on close inspection, the number of possible varieties of command line arguments turned out to be significant. Eventually, I decided to develop a general solution to this problem.

In developing this solution, I had to solve two main problems:

  1. Identify all varieties in which command line options can occur
  2. Find a simple way to allow users to express these varieties when using the yet-to-be-developed class

Analysis of Problem 1 led to the following observations:

  • Command line options, unlike command line data arguments, start with a prefix that uniquely identifies them. Prefix examples include a dash (-) on Unix platforms for options like -a or a slash (/) on Windows platforms.
  • Options can either be simple switches (i.e., -a can be present or not) or take a value. An example is:

    java MyTool -a -b logfile.inp
    
  • Options that take a value can have different separators between the actual option key and the value. Such separators can be a blank space, a colon (:), or an equals sign (=):

    java MyTool -a -b logfile.inp
    java MyTool -a -b:logfile.inp
    java MyTool -a -b=logfile.inp
    
  • Options taking a value can add one more level of complexity. Consider the way Java supports the definition of environment properties as an example:

    java -Djava.library.path=/usr/lib ...
    
  • So, beyond the actual option key (D), the separator (=), and the option's actual value (/usr/lib), an additional parameter (java.library.path) can take on any number of values (in the above example, numerous environment properties can be specified using this syntax). In this article, this parameter is called "detail."
  • Options also have a multiplicity property: they can be required or optional, and the number of times they are allowed can also vary (such as exactly once, once or more, or other possibilities).
  • Data arguments are all command line arguments that do not start with a prefix. Here, the acceptable number of such data arguments can vary between a minimum and a maximum number (which are not necessarily the same). In addition, typically an application requires these data arguments to be last on the command line, but that doesn't always have to be the case. For example:

    java MyTool -a -b=logfile.inp data1 data2 data3    // All data at the end
    

    or

    java MyTool -a data1 data2 -b=logfile.inp data3    // Might be acceptable to an application
    
  • More complex applications can support more than one set of options:

    java MyTool -a -b datafile.inp
    java MyTool -k [-verbose] foo bar duh
    java MyTool -check -verify logfile.out
    
  • Finally, an application might elect to ignore any unknown options or might consider such options to be an error.
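The observation about data arguments can be sketched in a few lines. The following is a simplified illustration, not code from the Options class: the class name DataArgumentSketch and both method names are hypothetical, and it assumes that no option uses a blank separator (otherwise the value after the blank would be mistaken for a data argument).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: separates data arguments from options by prefix
// and checks their number against a minimum and a maximum.
class DataArgumentSketch {

    // Collects every argument that does not start with the prefix character
    static List<String> dataArguments(String[] args, char prefix) {
        List<String> data = new ArrayList<String>();
        for (String arg : args) {
            if (arg.isEmpty() || arg.charAt(0) != prefix) {
                data.add(arg);
            }
        }
        return data;
    }

    // True if the number of data arguments lies within [min, max]
    static boolean countAcceptable(List<String> data, int min, int max) {
        return data.size() >= min && data.size() <= max;
    }

    public static void main(String[] args) {
        String[] sample = { "-a", "data1", "data2", "-b=logfile.inp", "data3" };
        List<String> data = dataArguments(sample, '-');
        System.out.println(data);                        // prints [data1, data2, data3]
        System.out.println(countAcceptable(data, 1, 3)); // prints true
    }
}
```

Because the data arguments are collected regardless of position, this sketch accepts both orderings shown above; an application that insists on data arguments coming last would need an extra positional check.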

So, in devising a way to allow users to express all these varieties, I came up with the following general options form, which is used as the basis for this article:

<prefix><key>[[<detail>]<separator><value>]

This form must be combined with the multiplicity property as described above.
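To make the general form more concrete, here is a minimal sketch that splits a single argument into key, detail, and value with one regular expression. It is not the pattern the Options class uses internally. The class name GeneralFormSketch and the method parse are hypothetical, and the sketch assumes a dash prefix, a one-letter key, and an equals sign as the separator.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of <prefix><key>[[<detail>]<separator><value>].
// Assumptions: prefix '-', a one-letter key, '=' as separator.
class GeneralFormSketch {

    // Group 1: key, group 2: detail (may be empty), group 3: value
    private static final Pattern FORM =
        Pattern.compile("^-([A-Za-z])(?:([^=]*)=(.+))?$");

    // Returns {key, detail, value}, or null if the argument is not an option
    // in this simplified form. For a plain switch, detail and value are null.
    static String[] parse(String arg) {
        Matcher m = FORM.matcher(arg);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = parse("-Djava.library.path=/usr/lib");
        // prints key=D detail=java.library.path value=/usr/lib
        System.out.println("key=" + parts[0] + " detail=" + parts[1] + " value=" + parts[2]);
    }
}
```

A plain switch such as -a yields only a key, while -b=logfile.inp yields a key, an empty detail, and a value.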

Within the constraints of the general form of an option described above, the Options class described in this article is designed to be the general solution for any command line processing needs that a Java application might have.

The helper classes

The Options class, which is the core class for the solution described in this article, comes with two helper classes:

  1. OptionData: This class holds all the information for one specific option
  2. OptionSet: This class holds a set of options. Options itself can hold any number of such sets

Before describing the details of these classes, other important concepts of the Options class must be introduced.

Typesafe enums

The prefix, the separator, and the multiplicity property have been captured by enums, a feature provided for the first time by Java 5:

public enum Prefix {
  DASH('-'),
  SLASH('/');
  private char c;
  private Prefix(char c) {
    this.c = c;
  }
  char getName() {
    return c;
  }
}
public enum Separator {
  COLON(':'),
  EQUALS('='),
  BLANK(' '),
  NONE('D');
  private char c;
  private Separator(char c) {
    this.c = c;
  }
  char getName() {
    return c;
  }
}
public enum Multiplicity {
  ONCE,
  ONCE_OR_MORE,
  ZERO_OR_ONE,
  ZERO_OR_MORE;
}

Using enums has some advantages: increased type safety and tight, effortless control over the set of permissible values. Enums can also conveniently be used with genericized collections.
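As a brief illustration of enums in genericized collections, the sketch below uses java.util.EnumSet and java.util.EnumMap. The class EnumCollectionsSketch, the REQUIRED set, and the tally() method are made up for this example and are not part of the Options class; the enum is repeated locally only so that the example compiles on its own.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Hypothetical example of enums in generic collections
class EnumCollectionsSketch {

    // Local copy of the article's enum so the example is self-contained
    enum Multiplicity { ONCE, ONCE_OR_MORE, ZERO_OR_ONE, ZERO_OR_MORE }

    // The multiplicities that make an option mandatory
    static final Set<Multiplicity> REQUIRED =
        EnumSet.of(Multiplicity.ONCE, Multiplicity.ONCE_OR_MORE);

    // Counts how often each enum constant occurs in the given values
    static Map<Multiplicity, Integer> tally(Multiplicity... values) {
        Map<Multiplicity, Integer> counts =
            new EnumMap<Multiplicity, Integer>(Multiplicity.class);
        for (Multiplicity m : values) {
            Integer old = counts.get(m);
            counts.put(m, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(REQUIRED.contains(Multiplicity.ZERO_OR_ONE)); // prints false
        System.out.println(tally(Multiplicity.ONCE, Multiplicity.ONCE)); // prints {ONCE=2}
    }
}
```

The compiler rejects any attempt to put a value of another type into these collections, which is the type safety mentioned above.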

Note that the Prefix and Separator enums have their own constructors, allowing for the definition of an actual character representing this enum instance (versus the name used to refer to the particular enum instance). These characters can be retrieved using these enums' getName() methods, and the characters are used for the java.util.regex package's pattern syntax. This package is used to perform some of the syntax checks in the Options class, details of which will follow.

The Multiplicity enum currently supports four different values:

  1. ONCE: The option has to occur exactly once
  2. ONCE_OR_MORE: The option has to occur at least once
  3. ZERO_OR_ONE: The option can either be absent or present exactly once
  4. ZERO_OR_MORE: The option can either be absent or present any number of times

More definitions can easily be added should the need arise.
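How a multiplicity translates into a check can be sketched as follows. This is one plausible reading of the four definitions above, not the actual check inside the Options class; the class MultiplicityCheckSketch and the method isSatisfied() are hypothetical names.

```java
// Hypothetical check of an occurrence count against a multiplicity
class MultiplicityCheckSketch {

    // Local copy of the article's enum so the example is self-contained
    enum Multiplicity { ONCE, ONCE_OR_MORE, ZERO_OR_ONE, ZERO_OR_MORE }

    // count is the number of times the option was found (assumed >= 0)
    static boolean isSatisfied(Multiplicity m, int count) {
        switch (m) {
            case ONCE:         return count == 1;
            case ONCE_OR_MORE: return count >= 1;
            case ZERO_OR_ONE:  return count <= 1;
            case ZERO_OR_MORE: return true;
            default:
                throw new IllegalArgumentException("Unknown multiplicity: " + m);
        }
    }

    public static void main(String[] args) {
        System.out.println(isSatisfied(Multiplicity.ONCE, 2));         // prints false
        System.out.println(isSatisfied(Multiplicity.ZERO_OR_MORE, 0)); // prints true
    }
}
```

Adding a new definition would then mean adding one enum constant and one case to the switch.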

The OptionData class

The OptionData class is basically a data container: firstly, for the data describing the option itself, and secondly, for the actual data found on the command line for that option. This design is already reflected in the constructor:

OptionData(Options.Prefix prefix,
           String key,
           boolean detail,
           Options.Separator separator,
           boolean value,
           Options.Multiplicity multiplicity)

The key is used as the unique identifier for this option. Note that these arguments directly reflect the findings described earlier: a full option description must have at least a prefix, a key, and a multiplicity. Options taking a value also have a separator and might accept details. Note also that this constructor has package access, so applications cannot use it directly. Instead, options are added through the OptionSet class's addOption() methods. This design gives much better control over the combinations of arguments that can be used to create OptionData instances. For example, if this constructor were public, you could create an instance with detail set to true and value set to false, which makes no sense. Rather than having elaborate checks in the constructor itself, I decided to provide a controlled set of addOption() methods.

The constructor also creates an instance of java.util.regex.Pattern, which is used for this option's pattern-matching process. One example would be the pattern for an option taking a value, no details, and a nonblank separator:

pattern = java.util.regex.Pattern.compile(prefix.getName() + key + separator.getName() + "(.+)$");
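To see what such a pattern does with a concrete argument, consider the following sketch. It builds the pattern the same way as the line above and extracts the value from capturing group 1. The class ValuePatternSketch and its methods are hypothetical, and the use of Matcher.matches() is an assumption about how the pattern is applied. Note also that a key containing regex metacharacters would have to be escaped, which this sketch does not do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demonstration of a value-option pattern
class ValuePatternSketch {

    // Same construction as in the article: prefix + key + separator + "(.+)$"
    static Pattern build(char prefix, String key, char separator) {
        return Pattern.compile(prefix + key + separator + "(.+)$");
    }

    // Returns the option's value, or null if the argument does not match
    static String valueOf(Pattern pattern, String arg) {
        Matcher m = pattern.matcher(arg);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        Pattern p = build('-', "b", '=');
        System.out.println(valueOf(p, "-b=logfile.inp")); // prints logfile.inp
        System.out.println(valueOf(p, "-b:logfile.inp")); // prints null
    }
}
```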

The OptionData class, as already mentioned, also holds the results of the checks performed by the Options class. It provides the following public methods to access these results:

int getResultCount()
String getResultValue(int index)
String getResultDetail(int index)

The first method, getResultCount(), returns the number of times an option was found. This method design directly ties in with the multiplicity defined for the option. For options taking a value, this value can be retrieved using the getResultValue(int index) method, where the index can range between 0 and getResultCount() - 1. For value options that also accept details, these can be similarly accessed using the getResultDetail(int index) method.
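The typical access pattern is a loop from 0 to getResultCount() - 1. Since OptionData cannot be instantiated by applications, the sketch below uses a stand-in class, Results, that mimics only the three accessor methods; the stand-in, its add() method, and describe() are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration of iterating over an option's results
class ResultAccessSketch {

    // Stand-in for OptionData, exposing only the three result accessors
    static class Results {
        private final List<String> values = new ArrayList<String>();
        private final List<String> details = new ArrayList<String>();

        void add(String detail, String value) {
            details.add(detail);
            values.add(value);
        }
        int getResultCount() { return values.size(); }
        String getResultValue(int index) { return values.get(index); }
        String getResultDetail(int index) { return details.get(index); }
    }

    // Formats each occurrence of the option as "detail=value"
    static List<String> describe(Results r) {
        List<String> out = new ArrayList<String>();
        for (int i = 0; i < r.getResultCount(); i++) {
            out.add(r.getResultDetail(i) + "=" + r.getResultValue(i));
        }
        return out;
    }

    public static void main(String[] args) {
        Results d = new Results();
        d.add("java.library.path", "/usr/lib");
        d.add("user.language", "en");
        System.out.println(describe(d)); // prints [java.library.path=/usr/lib, user.language=en]
    }
}
```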

The OptionSet class

The OptionSet class is basically a container for a set of OptionData instances and also the data arguments found on the command line.

The constructor has the form:

OptionSet(Options.Prefix prefix,
          Options.Multiplicity defaultMultiplicity,
          String setName,
          int minData,
          int maxData)

Again, this constructor has package access. Option sets can only be created through the Options class's different addSet() methods. The default multiplicity for the options specified here can be overridden when adding an option to the set. The set name specified here is a unique identifier used to refer to the set. minData and maxData are the minimum and maximum number of acceptable data arguments for this set.

The public API for OptionSet contains the following methods:

General access methods:

String getSetName()
int getMinData()
int getMaxData()

Methods to add options:

OptionSet addOption(String key)
OptionSet addOption(String key, Multiplicity multiplicity)
OptionSet addOption(String key, Separator separator)
OptionSet addOption(String key, Separator separator, Multiplicity multiplicity)
OptionSet addOption(String key, boolean details, Separator separator)
OptionSet addOption(String key, boolean details, Separator separator, Multiplicity multiplicity)

Methods to access check result data:

java.util.ArrayList<OptionData> getOptionData()
OptionData getOption(String key)
boolean isSet(String key)
java.util.ArrayList<String> getData()
java.util.ArrayList<String> getUnmatched()

Note that the methods for adding options that take a Separator argument create an OptionData instance accepting a value. The addOption() methods return the set instance itself, which allows invocation chaining:

Options options = new Options(args);
options.addSet("MySet").addOption("a").addOption("b");

After the checks have been performed, their results are available through the remaining methods. getOptionData() returns a list of all OptionData instances, while getOption() allows direct access to a specific option. isSet(String key) is a convenience method that checks whether an option was found at least once on the command line. getData() provides access to the data arguments found, while getUnmatched() lists all options found on the command line for which no matching OptionData instances were found.

The Options class

Options is the core class with which applications will interact. It provides several constructors, all of which take, as their first argument, the command line argument string array that the main() method provides:

Options(String args[])
Options(String args[], int data)
Options(String args[], int defMinData, int defMaxData)
Options(String args[], Multiplicity defaultMultiplicity)
Options(String args[], Multiplicity defaultMultiplicity, int data)
Options(String args[], Multiplicity defaultMultiplicity, int defMinData, int defMaxData)
Options(String args[], Prefix prefix)
Options(String args[], Prefix prefix, int data)
Options(String args[], Prefix prefix, int defMinData, int defMaxData)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity, int data)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity, int defMinData, int defMaxData)

The first constructor in this list is the simplest one using all the default values, while the last one is the most generic.

Table 1: Arguments for the Options() constructors and their meaning

Argument | Description | Default
prefix | This constructor argument is the only place where a prefix can be specified. This value is passed on to any option set and any option created subsequently. The idea behind this approach is that within a given application, it proves unlikely that different prefixes will need to be used. | Prefix.DASH
defaultMultiplicity | This default multiplicity is passed to each option set and used as the default for options added to a set without specifying a multiplicity. Of course, this multiplicity can be overridden for each option added. | Multiplicity.ONCE
defMinData | defMinData is the default minimum number of supported data arguments passed to each option set, but it can of course be overridden when adding a set. | 0
defMaxData | defMaxData is the default maximum number of supported data arguments passed to each option set, but it can of course be overridden when adding a set. | 0
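One common way to implement such a family of constructors is to let the simpler ones delegate to the most generic one, filling in the defaults from Table 1. The sketch below shows only three of the twelve variants and is not the actual Options source. The class name ConstructorDefaultsSketch is hypothetical, and treating the single data argument as both minimum and maximum is an assumption.

```java
// Hypothetical illustration of constructor delegation with defaults
class ConstructorDefaultsSketch {

    // Local stand-ins for the article's enums
    enum Prefix { DASH, SLASH }
    enum Multiplicity { ONCE, ONCE_OR_MORE, ZERO_OR_ONE, ZERO_OR_MORE }

    final String[] args;
    final Prefix prefix;
    final Multiplicity defaultMultiplicity;
    final int defMinData;
    final int defMaxData;

    // Simplest form: all defaults from Table 1
    ConstructorDefaultsSketch(String[] args) {
        this(args, Prefix.DASH, Multiplicity.ONCE, 0, 0);
    }

    // Assumption: a single data count sets minimum and maximum alike
    ConstructorDefaultsSketch(String[] args, int data) {
        this(args, Prefix.DASH, Multiplicity.ONCE, data, data);
    }

    // Most generic form: every other constructor ends up here
    ConstructorDefaultsSketch(String[] args, Prefix prefix,
                              Multiplicity defaultMultiplicity,
                              int defMinData, int defMaxData) {
        if (args == null || prefix == null || defaultMultiplicity == null) {
            throw new IllegalArgumentException("Arguments must not be null");
        }
        if (defMinData < 0 || defMaxData < defMinData) {
            throw new IllegalArgumentException("Invalid data argument range");
        }
        this.args = args;
        this.prefix = prefix;
        this.defaultMultiplicity = defaultMultiplicity;
        this.defMinData = defMinData;
        this.defMaxData = defMaxData;
    }

    public static void main(String[] args) {
        ConstructorDefaultsSketch s = new ConstructorDefaultsSketch(args);
        System.out.println(s.prefix + " " + s.defaultMultiplicity); // prints DASH ONCE
    }
}
```

With this arrangement, validation and default handling live in one place, and each additional constructor is a one-line delegation.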