Processing command line arguments in Java: Case closed

Facilitate command line argument processing for Java tools with a simple helper class

Many Java applications started from the command line take arguments to control their behavior. These arguments are available in the string array argument passed into the application's static main() method. Typically, there are two types of arguments: options (or switches) and actual data arguments. A Java application must process these arguments and perform two basic tasks:

  1. Check whether the syntax used is valid and supported
  2. Retrieve the actual data required for the application to perform its operations

Often, the code that performs these tasks is custom-made for each application and thus requires substantial effort both to create and to maintain, especially if the requirements go beyond simple cases with only one or two options. The Options class described in this article implements a generic approach to easily handle the most complex situations. The class allows for a simple definition of the required options and data arguments, and provides thorough syntax checks and easy access to the results of these checks. New Java 5 features like generics and typesafe enums were also used for this project.

Command line argument types

Over the years, I have written several Java tools that take command line arguments to control their behavior. Early on, I found it annoying to manually create and maintain the code for processing the various options. This led to the development of a prototype class to facilitate this task, but that class admittedly had its limitations since, on close inspection, the number of possible different varieties for command line arguments turned out to be significant. Eventually, I decided to develop a general solution to this problem.

In developing this solution, I had to solve two main problems:

  1. Identify all varieties in which command line options can occur
  2. Find a simple way to allow users to express these varieties when using the yet-to-be-developed class

Analysis of Problem 1 led to the following observations:

  • Command line options, unlike command line data arguments, start with a prefix that uniquely identifies them. Prefix examples include a dash (-) on Unix platforms for options like -a or a slash (/) on Windows platforms.
  • Options can either be simple switches (i.e., -a can be present or not) or take a value. An example is:

    java MyTool -a -b logfile.inp
    
  • Options that take a value can have different separators between the actual option key and the value. Such separators can be a blank space, a colon (:), or an equals sign (=):

    java MyTool -a -b logfile.inp
    java MyTool -a -b:logfile.inp
    java MyTool -a -b=logfile.inp
    
  • Options taking a value can add one more level of complexity. Consider the way Java supports the definition of environment properties as an example:

    java -Djava.library.path=/usr/lib ...
    
  • So, beyond the actual option key (D), the separator (=), and the option's actual value (/usr/lib), an additional parameter (java.library.path) can take on any number of values (in the above example, numerous environment properties can be specified using this syntax). In this article, this parameter is called "detail."
  • Options also have a multiplicity property: they can be required or optional, and the number of times they are allowed can also vary (such as exactly once, once or more, or other possibilities).
  • Data arguments are all command line arguments that do not start with a prefix. Here, the acceptable number of such data arguments can vary between a minimum and a maximum number (which are not necessarily the same). In addition, typically an application requires these data arguments to be last on the command line, but that doesn't always have to be the case. For example:

    java MyTool -a -b=logfile.inp data1 data2 data3    // All data at the end
    

    or

    java MyTool -a data1 data2 -b=logfile.inp data3    // Might be acceptable to an application
    
  • More complex applications can support more than one set of options:

    java MyTool -a -b datafile.inp
    java MyTool -k [-verbose] foo bar duh
    java MyTool -check -verify logfile.out
    
  • Finally, an application might elect to ignore any unknown options or might consider such options to be an error.

So, in devising a way to allow users to express all these varieties, I came up with the following general options form, which is used as the basis for this article:

<prefix><key>[[<detail>]<separator><value>]

This form must be combined with the multiplicity property as described above.

Within the constraints of the general form of an option described above, the Options class described in this article is designed to be the general solution for any command line processing needs that a Java application might have.
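
To make the general form concrete, it can be approximated with a regular expression. The following is only a minimal sketch under stated assumptions, not the code used by the Options class: the class name OptionFormSketch and its parse() helper are hypothetical, the key is assumed to be known in advance, and the blank separator (where the value is the next command line argument) is not covered.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: matches one argument against
// <prefix><key>[[<detail>]<separator><value>] for a known key.
class OptionFormSketch {

    // Returns {key, detail, value}; detail and value are null when absent.
    // Returns null if the argument does not have the general form for this key.
    static String[] parse(String arg, char prefix, String key, char separator) {
        String sep = Pattern.quote(String.valueOf(separator));
        // The detail is matched lazily, so it ends at the first separator.
        Pattern p = Pattern.compile(
            Pattern.quote(prefix + key) + "(?:(.*?)" + sep + "(.+))?");
        Matcher m = p.matcher(arg);
        if (!m.matches()) {
            return null;
        }
        String detail = m.group(1);
        if (detail != null && detail.isEmpty()) {
            detail = null; // a value without a detail, as in -b=logfile.inp
        }
        return new String[] { key, detail, m.group(2) };
    }

    public static void main(String[] args) {
        // prints [D, java.library.path, /usr/lib]
        System.out.println(Arrays.toString(
            parse("-Djava.library.path=/usr/lib", '-', "D", '=')));
        // prints [a, null, null]
        System.out.println(Arrays.toString(parse("-a", '-', "a", '=')));
    }
}
```

Note that this sketch accepts a detail for every key. Whether a given option may carry a detail is a separate property of that option.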

The helper classes

The Options class, which is the core class for the solution described in this article, comes with two helper classes:

  1. OptionData: This class holds all the information for one specific option
  2. OptionSet: This class holds a set of options. Options itself can hold any number of such sets

Before describing the details of these classes, other important concepts of the Options class must be introduced.

Typesafe enums

The prefix, the separator, and the multiplicity property have been captured by enums, a feature provided for the first time by Java 5:

public enum Prefix {
  DASH('-'),
  SLASH('/');
  private char c;
  private Prefix(char c) {
    this.c = c;
  }
  char getName() {
    return c;
  }
}
public enum Separator {
  COLON(':'),
  EQUALS('='),
  BLANK(' '),
  NONE('D');
  private char c;
  private Separator(char c) {
    this.c = c;
  }
  char getName() {
    return c;
  }
}
public enum Multiplicity {
  ONCE,
  ONCE_OR_MORE,
  ZERO_OR_ONE,
  ZERO_OR_MORE;
}

Using enums has some advantages: increased type safety and tight, effortless control over the set of permissible values. Enums can also conveniently be used with genericized collections.

Note that the Prefix and Separator enums have their own constructors, allowing for the definition of an actual character representing this enum instance (versus the name used to refer to the particular enum instance). These characters can be retrieved using these enums' getName() methods, and the characters are used for the java.util.regex package's pattern syntax. This package is used to perform some of the syntax checks in the Options class, details of which will follow.

The Multiplicity enum currently supports four different values:

  1. ONCE: The option has to occur exactly once
  2. ONCE_OR_MORE: The option has to occur at least once
  3. ZERO_OR_ONE: The option can either be absent or present exactly once
  4. ZERO_OR_MORE: The option can either be absent or present any number of times

More definitions can easily be added should the need arise.
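
One way to turn these four definitions into a check is to give each enum value a minimum and a maximum occurrence count. This is a hypothetical sketch: the Multiplicity enum shown earlier carries no such fields, and the accepts() method and the MultiplicitySketch wrapper class are assumptions made for illustration.

```java
// Hypothetical variant of the Multiplicity enum with explicit bounds.
class MultiplicitySketch {

    enum Multiplicity {
        ONCE(1, 1),
        ONCE_OR_MORE(1, Integer.MAX_VALUE),
        ZERO_OR_ONE(0, 1),
        ZERO_OR_MORE(0, Integer.MAX_VALUE);

        private final int min;
        private final int max;

        Multiplicity(int min, int max) {
            this.min = min;
            this.max = max;
        }

        // True if an option found 'count' times satisfies this multiplicity.
        boolean accepts(int count) {
            return count >= min && count <= max;
        }
    }

    public static void main(String[] args) {
        System.out.println(Multiplicity.ZERO_OR_ONE.accepts(2)); // prints false
        System.out.println(Multiplicity.ONCE_OR_MORE.accepts(3)); // prints true
    }
}
```

With this layout, adding a further definition is a matter of adding one more enum constant with its bounds.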

The OptionData class

The OptionData class is basically a data container: firstly, for the data describing the option itself, and secondly, for the actual data found on the command line for that option. This design is already reflected in the constructor:

OptionData(Options.Prefix prefix,
           String key,
           boolean detail,
           Options.Separator separator,
           boolean value,
           Options.Multiplicity multiplicity)

The key is used as the unique identifier for this option. Note that these arguments directly reflect the findings described earlier: a full option description must have at least a prefix, a key, and multiplicity. Options taking a value also have a separator and might accept details. Note also that this constructor has package access, so applications cannot directly use it. Instead, options are added through class OptionSet's addOption() methods. This design gives much better control over the possible combinations of arguments used to create OptionData instances. For example, if this constructor were public, you could create an instance with detail set to true and value set to false, which is of course nonsense. Rather than having elaborate checks in the constructor itself, I decided to provide a controlled set of addOption() methods.

The constructor also creates an instance of java.util.regex.Pattern, which is used for this option's pattern-matching process. One example would be the pattern for an option taking a value, no details, and a nonblank separator:

pattern = java.util.regex.Pattern.compile(prefix.getName() + key + separator.getName() + "(.+)$");

The OptionData class, as already mentioned, also holds the results of the checks performed by the Options class. It provides the following public methods to access these results:

int getResultCount()
String getResultValue(int index)
String getResultDetail(int index)

The first method, getResultCount(), returns the number of times an option was found. This method design directly ties in with the multiplicity defined for the option. For options taking a value, this value can be retrieved using the getResultValue(int index) method, where the index can range between 0 and getResultCount() - 1. For value options that also accept details, these can be similarly accessed using the getResultDetail(int index) method.

The OptionSet class

The OptionSet class is basically a container for a set of OptionData instances and also the data arguments found on the command line.

The constructor has the form:

OptionSet(Options.Prefix prefix,
          Options.Multiplicity defaultMultiplicity,
          String setName,
          int minData,
          int maxData)

Again, this constructor has package access. Option sets can only be created through the Options class's different addSet() methods. The default multiplicity for the options specified here can be overridden when adding an option to the set. The set name specified here is a unique identifier used to refer to the set. minData and maxData are the minimum and maximum number of acceptable data arguments for this set.

The public API for OptionSet contains the following methods:

General access methods:

String getSetName()
int getMinData()
int getMaxData()

Methods to add options:

OptionSet addOption(String key)
OptionSet addOption(String key, Multiplicity multiplicity)
OptionSet addOption(String key, Separator separator)
OptionSet addOption(String key, Separator separator, Multiplicity multiplicity)
OptionSet addOption(String key, boolean details, Separator separator)
OptionSet addOption(String key, boolean details, Separator separator, Multiplicity multiplicity)

Methods to access check result data:

java.util.ArrayList<OptionData> getOptionData()
OptionData getOption(String key)
boolean isSet(String key)
java.util.ArrayList<String> getData()
java.util.ArrayList<String> getUnmatched()

Note that the methods for adding options that take a Separator argument create an OptionData instance accepting a value. The addOption() methods return the set instance itself, which allows invocation chaining:

Options options = new Options(args);
options.addSet("MySet").addOption("a").addOption("b");
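
The chaining works because each addOption() call returns the object it was invoked on. The following stand-alone sketch shows only that mechanism. ChainingSketch and getKeys() are hypothetical names, and the real OptionSet stores OptionData instances rather than plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for OptionSet, reduced to what chaining needs.
class ChainingSketch {
    private final List<String> keys = new ArrayList<>();

    // Returning 'this' is what makes addOption("a").addOption("b") possible.
    ChainingSketch addOption(String key) {
        keys.add(key);
        return this;
    }

    List<String> getKeys() {
        return keys;
    }

    public static void main(String[] args) {
        ChainingSketch set = new ChainingSketch().addOption("a").addOption("b");
        System.out.println(set.getKeys()); // prints [a, b]
    }
}
```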

After the checks have been performed, their results are available through the remaining methods. getOptionData() returns a list of all OptionData instances, while getOption() allows direct access to a specific option. isSet(String key) is a convenience method that checks whether an option was found at least once on the command line. getData() provides access to the data arguments found, while getUnmatched() lists all options found on the command line for which no matching OptionData instances were found.

The Options class

Options is the core class with which applications will interact. It provides several constructors, all of which take the command line argument string array that the main() method provides as the first argument:

Options(String args[])
Options(String args[], int data)
Options(String args[], int defMinData, int defMaxData)
Options(String args[], Multiplicity defaultMultiplicity)
Options(String args[], Multiplicity defaultMultiplicity, int data)
Options(String args[], Multiplicity defaultMultiplicity, int defMinData, int defMaxData)
Options(String args[], Prefix prefix)
Options(String args[], Prefix prefix, int data)
Options(String args[], Prefix prefix, int defMinData, int defMaxData)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity, int data)
Options(String args[], Prefix prefix, Multiplicity defaultMultiplicity, int defMinData, int defMaxData)

The first constructor in this list is the simplest one using all the default values, while the last one is the most generic.

Table 1: Arguments for the Options() constructors and their meaning

Value | Description | Default
prefix | This constructor argument is the only place where a prefix can be specified. This value is passed on to any option set and any option created subsequently. The idea behind this approach is that within a given application, it proves unlikely that different prefixes will need to be used. | Prefix.DASH
defaultMultiplicity | This default multiplicity is passed to each option set and used as the default for options added to a set without specifying a multiplicity. Of course, this multiplicity can be overridden for each option added. | Multiplicity.ONCE
defMinData | defMinData is the default minimum number of supported data arguments passed to each option set, but it can of course be overridden when adding a set. | 0
defMaxData | defMaxData is the default maximum number of supported data arguments passed to each option set, but it can of course be overridden when adding a set. | 0

In the constructors above, where only one integer argument is present (data), this value is used to set both defMinData and defMaxData to the same value. This means that the number of acceptable data arguments is fixed to exactly that number, and there is no acceptable range for that number.
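
The relationship between the single data argument and the defMinData/defMaxData pair can be pictured as simple constructor delegation. This is a sketch only: DataRangeSketch and acceptsDataCount() are hypothetical names, and the argument validation shown is an assumption rather than documented behavior of the Options class.

```java
// Hypothetical stand-in showing how one 'data' value fixes the range.
class DataRangeSketch {
    final int minData;
    final int maxData;

    // One integer: exactly that many data arguments are acceptable.
    DataRangeSketch(int data) {
        this(data, data);
    }

    // Two integers: any count between minData and maxData (inclusive) is acceptable.
    DataRangeSketch(int minData, int maxData) {
        if (minData < 0 || maxData < minData) {
            throw new IllegalArgumentException("Invalid data range");
        }
        this.minData = minData;
        this.maxData = maxData;
    }

    boolean acceptsDataCount(int count) {
        return count >= minData && count <= maxData;
    }

    public static void main(String[] args) {
        System.out.println(new DataRangeSketch(2).acceptsDataCount(3));    // prints false
        System.out.println(new DataRangeSketch(1, 3).acceptsDataCount(3)); // prints true
    }
}
```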

Adding an option set is possible through these methods:

OptionSet addSet(String setName)
OptionSet addSet(String setName, int data)
OptionSet addSet(String setName, int minData, int maxData)

Again, the newly created set is returned to allow for subsequent invocation chaining of the addOption() methods.

Option sets can be accessed through these methods:

OptionSet getSet()
OptionSet getSet(String setName)

Note one important concept here: a default OptionSet instance always exists and does not need to be created explicitly. This instance is available through the getSet() method and is useful for simpler applications that require only one set. In this case, setting up the Options instance could look like this:

Options options = new Options(args);
options.getSet().addOption("a").addOption("b");

Under the hood, this default set is of course based on a standard OptionSet instance with a name given by:

public final static String DEFAULT_SET = "DEFAULT_OPTION_SET";

Some convenience methods have been added to the Options class to simplify the creation of the same option for all known sets at the same time:

void addOptionAllSets(String key)
void addOptionAllSets(String key, Multiplicity multiplicity)
void addOptionAllSets(String key, Separator separator)
void addOptionAllSets(String key, Separator separator, Multiplicity multiplicity)
void addOptionAllSets(String key, boolean details, Separator separator)
void addOptionAllSets(String key, boolean details, Separator separator, Multiplicity multiplicity)

These methods correspond directly to the addOption() methods described earlier for the OptionSet class. One case where I found these methods useful was an optional verbosity option (-v), which had to be available for all sets of an application:

options.addOptionAllSets("v", Multiplicity.ZERO_OR_ONE);

Perform the checks

Performing the actual checks of the command line arguments against the specified options for all sets is obviously a core component of the Options class. The following check methods are available:

boolean check(String setName)
boolean check(String setName, boolean ignoreUnmatched, boolean requireDataLast)
boolean check()
boolean check(boolean ignoreUnmatched, boolean requireDataLast)

The first two methods check the specified option set, whereas the latter two check the default option set. The two Booleans have the following meanings.

Table 2: Arguments to the check() methods and their meanings

Value | Description | Default
ignoreUnmatched | Specifies whether command line options for which no corresponding OptionData instance was created are acceptable. Applications can choose to ignore such unmatched options or react with an error. | false
requireDataLast | Specifies whether the actual data arguments need to be the last arguments on the command line or whether they can be interspersed within the options. | true

Again, the introduction of these methods is based on the observations made early in the project about the requirements for a class such as Options.

Two more convenience methods are provided:

OptionSet getMatchingSet()
OptionSet getMatchingSet(boolean ignoreUnmatched, boolean requireDataLast)

These methods run the checks for each known OptionSet and return the first one that passes them.
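
The idea of returning the first set that passes can be sketched independently of the real classes. In the sketch below, each set is reduced to a name and a predicate standing in for its check. MatchingSetSketch is a hypothetical class, and trying the sets in the order in which they were added is an assumption.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical sketch: each set is a name plus a predicate standing in for check().
class MatchingSetSketch {
    // A LinkedHashMap keeps insertion order, so sets are tried in the order added.
    private final Map<String, Predicate<List<String>>> sets = new LinkedHashMap<>();

    MatchingSetSketch addSet(String name, Predicate<List<String>> check) {
        sets.put(name, check);
        return this;
    }

    // Returns the name of the first set whose check accepts the arguments, or null.
    String getMatchingSet(List<String> arguments) {
        for (Map.Entry<String, Predicate<List<String>>> entry : sets.entrySet()) {
            if (entry.getValue().test(arguments)) {
                return entry.getKey();
            }
        }
        return null;
    }

    public static void main(String[] argv) {
        MatchingSetSketch sketch = new MatchingSetSketch()
            .addSet("cset", a -> a.contains("-c"))
            .addSet("aset", a -> a.contains("-a"));
        System.out.println(sketch.getMatchingSet(Arrays.asList("-a", "data1"))); // prints aset
    }
}
```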

The last public method in the list is:

String getCheckErrors()

During the checks, the check() methods write all observed problems into a StringBuffer, the value of which can then be accessed through the getCheckErrors() method. This method proves useful for debugging purposes, but applications can also use it to tell their users about the problem with the provided input.

The actual check process consists of the following steps:

  1. Some trivial cases are caught: no options have been defined for the set to check, or no command line arguments have been provided.
  2. All command line arguments are processed in a loop. Using java.util.regex's pattern-matching capabilities, these arguments are compared with the known options, and, if a match is found, the value and the detail information are retrieved for options expecting such information. All this information is stored in the OptionData instance that matched the option.
  3. Any unmatched options are identified and stored in a list. In addition, the data arguments are identified and stored in another list.
  4. The multiplicity is checked for all the options based on the number of matches found for each one.
  5. The range of the data arguments is checked against the defined boundaries.
  6. If desired, data arguments can be checked to verify whether they are last on the command line.
  7. If desired, the presence of unmatched options is checked.

If all checks are successful, true is returned. If a check fails at any of the stages above, false is returned immediately, and a comment explaining the problem is written to the error log (which is accessible through the getCheckErrors() method).
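
Steps 2 through 7 can be sketched in a heavily simplified form. This is not the implementation of the Options class: the class CheckSketch is hypothetical, it handles only switch-style options with a dash prefix (no values or details, and no regular expressions), it expresses multiplicity as a plain minimum and maximum count, it omits the trivial cases of step 1, and it collects errors in a StringBuilder.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified check for switch-style options only.
class CheckSketch {
    final Map<String, int[]> limits = new LinkedHashMap<>();  // key -> {min, max}
    final Map<String, Integer> counts = new LinkedHashMap<>();
    final List<String> data = new ArrayList<>();
    final List<String> unmatched = new ArrayList<>();
    final StringBuilder errors = new StringBuilder();

    CheckSketch addOption(String key, int min, int max) {
        limits.put(key, new int[] { min, max });
        return this;
    }

    boolean check(String[] args, int minData, int maxData,
                  boolean ignoreUnmatched, boolean requireDataLast) {
        // Reset results so that check() can be called more than once.
        counts.clear();
        data.clear();
        unmatched.clear();
        errors.setLength(0);
        int lastOption = -1;
        int firstData = -1;
        // Steps 2 and 3: classify each argument as known option, unmatched option, or data.
        for (int i = 0; i < args.length; i++) {
            String arg = args[i];
            if (arg.startsWith("-")) {
                lastOption = i;
                String key = arg.substring(1);
                if (limits.containsKey(key)) {
                    Integer n = counts.get(key);
                    counts.put(key, n == null ? 1 : n + 1);
                } else {
                    unmatched.add(arg);
                }
            } else {
                if (firstData < 0) {
                    firstData = i;
                }
                data.add(arg);
            }
        }
        // Step 4: compare the number of matches with the allowed multiplicity.
        for (Map.Entry<String, int[]> e : limits.entrySet()) {
            Integer found = counts.get(e.getKey());
            int n = found == null ? 0 : found;
            if (n < e.getValue()[0] || n > e.getValue()[1]) {
                errors.append("Wrong number of occurrences for option -").append(e.getKey());
                return false;
            }
        }
        // Step 5: check the number of data arguments against the boundaries.
        if (data.size() < minData || data.size() > maxData) {
            errors.append("Wrong number of data arguments: ").append(data.size());
            return false;
        }
        // Step 6: data is last only if no option follows the first data argument.
        if (requireDataLast && firstData >= 0 && firstData < lastOption) {
            errors.append("Data arguments must come last");
            return false;
        }
        // Step 7: unmatched options are an error unless they are to be ignored.
        if (!ignoreUnmatched && !unmatched.isEmpty()) {
            errors.append("Unknown options: ").append(unmatched);
            return false;
        }
        return true;
    }

    public static void main(String[] argv) {
        CheckSketch sketch = new CheckSketch().addOption("a", 0, 1);
        boolean ok = sketch.check(new String[] { "in", "-a", "out" }, 2, 2, false, true);
        System.out.println(ok + " " + sketch.errors); // prints false Data arguments must come last
    }
}
```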

Examples

The following examples are designed to demonstrate the use of the Options class, ranging from a simple case of an application requiring just one option set to a complex case, with many different option sets and multiplicities for the options.

Example 1: A simple case

The first example is a simple case that demonstrates how quickly a tool can leverage the capabilities of the Options class.

The command line syntax for this example looks like this:

java Example1 [-a] [-log=<logfile>] <inpfile> <outfile>

I used the standard syntax here, which denotes optional items (like [-a]) with square brackets.

The code to handle these options can look like this:

Options opt = new Options(args, 2);
opt.getSet().addOption("a", Multiplicity.ZERO_OR_ONE);
opt.getSet().addOption("log", Separator.EQUALS, Multiplicity.ZERO_OR_ONE);
if (!opt.check()) {
  // Print usage hints
  System.exit(1);
}
// Normal processing
if (opt.getSet().isSet("a")) {
  // React to option -a
}
if (opt.getSet().isSet("log")) {
  // React to option -log
  String logfile = opt.getSet().getOption("log").getResultValue(0);
}
...
String inpfile = opt.getSet().getData().get(0);
String outfile = opt.getSet().getData().get(1);
...

The Options instance is created, specifying that exactly two data arguments are required. After that, the two options are added with the multiplicity of ZERO_OR_ONE, which corresponds to the square brackets in the syntax description. The checks are run by invoking check(), and if the checks are not successful, a usage description can be written.

Using Options.getSet().isSet(), you can easily check whether the options in square brackets have been specified, and the program can react accordingly. If -log was specified, that option's value is available from the OptionData instance's getResultValue() method.

The data arguments can be accessed using the getData() method on the default option set.

Actually, the code above can be further simplified by specifying a different default multiplicity directly in Options's constructor and by using invocation chaining for the options definition:

Options opt = new Options(args, Multiplicity.ZERO_OR_ONE, 2);
opt.getSet().addOption("a").addOption("log", Separator.EQUALS);

Example 2: A more complex case

This more complex example demonstrates using several OptionSet instances, different option multiplicities, and option details.

The command line syntax looks like this:

java Example2 -c [-v] [-D<detail>=<value> [...]] data1 data2
java Example2 -a [-v] [-check] data1 [data2] [data3]
java Example2 -d [-v] -k <kval> -t <tval> data1 data2 [data3] [data4]

So this tool has three main modes of operation, which are chosen by a (mandatory) option (either -c, -a, or -d).

The code could look like this:

Options opt = new Options(args, 2);
opt.addSet("cset")
   .addOption("c")
   .addOption("D", true, Separator.EQUALS, Multiplicity.ZERO_OR_MORE);
opt.addSet("aset", 1, 3)
   .addOption("a")
   .addOption("check", Multiplicity.ZERO_OR_ONE);
opt.addSet("dset", 2, 4)
   .addOption("d")
   .addOption("k", Separator.BLANK)
   .addOption("t", Separator.BLANK);
opt.addOptionAllSets("v", Multiplicity.ZERO_OR_ONE);
OptionSet set = opt.getMatchingSet();
if (set == null) {
  // Print usage hints
  System.exit(1);
}

Note how simple it is to capture this complex set of options!

The evaluation section could look like this (where System.out.println() calls have been inserted for clarity):

// This can be used for ALL sets since we added it using addOptionAllSets()
if (set.isSet("v")) {
  System.out.println("v is set");
}
// Evaluate the different option sets
if (set.getSetName().equals("cset")) {
  for (String d : set.getData())
    System.out.println(d);
  OptionData d = set.getOption("D");
  for (int i = 0; i < d.getResultCount(); i++) {
    System.out.println("D detail " + i + " : " + d.getResultDetail(i));
    System.out.println("D value  " + i + " : " + d.getResultValue(i));
  }
} else if (set.getSetName().equals("aset")) {
  for (String d : set.getData())
    System.out.println(d);
  if (set.isSet("check"))
    System.out.println("check is set");
} else {                               // We _know_ it has to be the third set now
  for (String d : set.getData())
    System.out.println(d);
  System.out.println(set.getOption("k").getResultValue(0));
  System.out.println(set.getOption("t").getResultValue(0));
}

Even this relatively complex example can be handled easily with the Options class, and one particular benefit becomes clear here: no check code is required at the application level, since the Options class handles it. All relevant result data is accessible through a simple and convenient set of methods.

Example 3: A really complex case

For the third example, I decided to retrofit the Options class into the URLManager package. This package contains the three Java command line tools URLManage, URLCheck, and URLPublish, each of which takes a large set of options. The most complex case is URLManage, whose usage description looks like this:

Create a new entry in the DB:                 java URLManage [-v] -c <dbprop> <url> <desc> <context>
                                              java URLManage [-v] -bc <dbprop> <urlfile>
Update the description of an entry in the DB: java URLManage [-v] -u <dbprop> <url> <desc>
Delete an entry from the DB:                  java URLManage [-v] -d <dbprop> <url>
Select URL entries from the DB:               java URLManage [-v] -s <dbprop> <pattern>
                                              java URLManage [-v] -sa <dbprop>
Select contexts from the DB:                  java URLManage [-v] -con <dbprop>
Init the tables in the DB:                    java URLManage [-v] -init <dbprop>
Delete the tables from the DB:                java URLManage [-v] -drop <dbprop>
Add the URL to a specific context:            java URLManage [-v] -ac <dbprop> <url> <context>
                                              java URLManage [-v] -bac <dbprop> <confile>
Remove the URL from a specific context:       java URLManage [-v] -rc <dbprop> <url> <context>

It turns out that the Options class can be used to handle these option sets with limited coding effort; the code resembles Example 2:

...
ml.options.Options options = new ml.options.Options(args, 1);
options.addSet("create", 4).addOption("c");
options.addSet("createBatch", 2).addOption("bc");
options.addSet("update", 3).addOption("u");
options.addSet("delete", 2).addOption("d");
options.addSet("select", 2).addOption("s");
options.addSet("addURL", 3).addOption("ac");
options.addSet("addURLBatch", 2).addOption("bac");
options.addSet("removeURL", 3).addOption("rc");
options.addSet("selectAll").addOption("sa");
options.addSet("contexts").addOption("con");
options.addSet("initTables").addOption("init");
options.addSet("deleteTables").addOption("drop");
options.addOptionAllSets("v", ml.options.Options.Multiplicity.ZERO_OR_ONE);
ml.options.OptionSet optionSet = options.getMatchingSet();
...

Conclusion

This article describes a Java class that allows for the convenient processing of command line options for Java programs. The structure is flexible enough to handle even complex situations, while at the same time offering an API that allows for the definition of acceptable command line syntax with limited coding effort. The Options class provides all the checking algorithms required to ensure that acceptable sets of command line arguments are identified, which relieves application programmers of having to hand-code the same algorithms time and again. This class can add a lot of value to every Java application requiring command line options. If some capability is missing, I'd of course appreciate feedback.

Dr. Matthias Laux is a senior engineer for Sun Microsystems working in the Global SAP-Sun Competence Center in Walldorf, Germany. His main interests are Java and J2EE technology, architecture, and programming, as well as Web services and XML technology in general, databases, and performance and benchmarking. Although he also has a background in aerospace engineering and HPC/parallel programming, today his languages of choice are Java and Perl. He is a certified Solaris Administrator, Java Programmer, and Java Enterprise Architect.
