Log4j delivers control over logging

The open source log4j API for Java offers fast, efficient log services

1 2 Page 2
Page 2 of 2
log4j.rootCategory=DEBUG, A1
log4j.appender.A1=org.log4j.FileAppender
log4j.appender.A1.File=System.out
log4j.appender.A1.layout=org.log4j.PatternLayout
# Print the date in ISO 8601 format
log4j.appender.A1.layout.ConversionPattern=%d [%t] %-5p %c - %m%n
# Print only messages of priority WARN or above in the package com.foo.
log4j.category.com.foo=WARN

The output of MyApp configured with this file is shown below:

2000-09-07 14:07:41,508 [main] INFO  MyApp - Entering application.
2000-09-07 14:07:41,529 [main] INFO  MyApp - Exiting application.

As the category com.foo.Bar does not have an assigned priority, it inherits its priority from com.foo, which was set to WARN in the configuration file. The log statement from the Bar.doIt() method has the priority DEBUG, lower than the category priority WARN. Consequently, doIt()'s log request is suppressed.

Next, we see another configuration file that uses multiple appenders:

log4j.rootCategory=debug, stdout, R
log4j.appender.stdout=org.log4j.FileAppender
log4j.appender.stdout.File=System.out
log4j.appender.stdout.layout=org.log4j.PatternLayout
# Pattern to output the caller's file name and line number.
log4j.appender.stdout.layout.ConversionPattern=%5p [%t] (%F:%L) - %m%n
log4j.appender.R=org.log4j.RollingFileAppender
log4j.appender.R.File=example.log
log4j.appender.R.MaxFileSize=100KB
# Keep one backup file
log4j.appender.R.MaxBackupIndex=1
log4j.appender.R.layout=org.log4j.PatternLayout
log4j.appender.R.layout.ConversionPattern=%p %t %c - %m%n

Calling the enhanced MyApp with the that configuration file will output the following on the console:

 INFO [main] (MyApp2.java:12) - Entering application.
DEBUG [main] (Bar.java:8) - Doing it again!
 INFO [main] (MyApp2.java:15) - Exiting application.

In addition, as the root category has been allocated a second appender, output will also be directed to the example.log file. That file will be rolled over when it reaches 100 KB. When rollover occurs, the old version of example.log automatically moves to example.log.1.

Note that to obtain those different logging behaviors, we did not need to recompile code. We could just as easily have logged to a Unix Syslog daemon and redirected all com.foo output to an NT Event logger. Similarly, we could have forwarded logging events to a remote log4j server, which would log according to local server policy -- for example, by logging to a local file as well as forwarding the log event to a second log4j server.

Nested diagnostic contexts

Most real-world systems must deal with multiple clients simultaneously. In a typical multithreaded implementation of such a system, different threads will handle different clients. Considering that, logging is especially well suited to tracing and debugging complex distributed applications. A common approach to differentiate the logging output of one client from another is to instantiate a new separate category for each client. However, that approach promotes category proliferation and increases the management overhead of logging.

In a lighter technique, you can uniquely stamp each log request initiated from the same client interaction, an approach described by Neil Harrison (see Resources). To uniquely stamp each request, the user pushes contextual information into the Nested Diagnostic Context (NDC). The NDC class is shown below:

  public class NDC {
    // Used when printing the diagnostic
    public static String get();
    // Remove the top of the context from the NDC.
    public static String pop();
    // Add diagnostic context for the current thread.
    public static void push(String message);
    // Remove the diagnostic context for this thread.
    public static void remove();
  }

The NDC is managed per thread as a stack of contextual information. Note that all of org.log4j.NDC's methods are static. Assuming that NDC printing is turned on, every time a log request is made, the appropriate log4j component will include the entire NDC stack for the current thread in the log output. That is done without the intervention of the user, who is responsible only for placing the correct information in the NDC by using the push() and pop() methods at a few well-defined points in the code. In contrast, the per-client category approach commands extensive changes in the code.

To illustrate that point, let us take the example of a servlet delivering content to numerous clients. The servlet can build the NDC at the very beginning of the request before executing other code. The contextual information can be the client's host name and other information inherent to the request, typically information contained in cookies. Hence, even if the servlet is serving multiple clients simultaneously, the logs initiated by the same code (that is, belonging to the same category) can still be distinguished because each client request will have a different NDC stack. Contrast that with the complexity of passing a freshly instantiated category to all code exercised during the client's request.

Nevertheless, some sophisticated applications, such as virtual hosting Web servers, must log differently, depending on the virtual host context and the software component issuing the request. The most recent log4j release supports multiple hierarchy trees, an enhancement that allows each virtual host to possess its own copy of the category hierarchy.

Performance

One of the often-cited arguments against logging stems from its computational cost. That is a legitimate concern, as even moderately sized applications can generate thousands of log requests. Much effort was spent measuring and tweaking logging performance. Log4j claims to be fast and flexible: speed first, flexibility second.

Nevertheless, the user should be aware of the following performance issues:

  1. Logging performance when logging is turned off.

    When logging is turned off entirely or for a set of priorities, the cost of a log request is a method invocation plus an integer comparison. The method invocation involves the hidden cost of parameter construction.

    For some category cat, writing

       cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
    

    incurs the cost of constructing the message parameter that is converting both integer i and entry[i] to a String and concatenating intermediate strings, regardless of whether the message will be logged or not.

    If one is worried about speed, then write:

       if(cat.isDebugEnabled() {
          cat.debug("Entry number: " + i + " is " + String.valueOf(entry[i]));
       }
    

    That will not incur the cost of parameter construction if debugging is disabled. On the other hand, if the category is debug enabled, it will twice incur the cost of evaluating whether the category is enabled or not: once in debugEnabled() and once in debug(). That represents an insignificant overhead because evaluating a category takes about 1 percent of the time it takes to actually log.

    In log4j, logging requests are made to instances of the Category class. Because Category is a class and not an interface, the cost of method invocation is arguably lower but at the cost of some flexibility.

    On a 233 MHz Pentium II machine, the typical cost of logging when logging is turned off is in the range of 5 to 50 nanoseconds.

  2. The performance of deciding whether to log or not to log when logging is turned on.

    Under that scenario, the question centers on the performance cost of walking the category hierarchy. When logging is turned on, log4j still needs to compare the priority of the log request with the priority of the request category. However, categories may not have an assigned priority; they can inherit them from the category hierarchy. Thus, to discover its priority, the category may need to search its ancestors.

    A serious effort has been made to make that hierarchy walk as fast as possible. For example, child categories link only to their existing ancestors. In the BasicConfigurator example shown earlier, the category named com.foo.Bar links directly to the root category, thereby circumventing the nonexistent com or com.foo categories, significantly improving the speed of the walk, especially in sparse hierarchies.

    The typical cost of walking the hierarchy is in the range of 5 to 15 microseconds, again on a 233 MHz Pentium II machine.

  3. Actual logging.

    Another performance issue stems from the cost of formatting the log output and sending it to its target destination. Here again, a serious effort was made to make layouts (formatters) perform as quickly as possible. The same is true for appenders. The typical cost of actually logging ranges from about 100 to 300 microseconds.

Although log4j has a many features, its first design goal was speed. Some log4j components have been rewritten many times to improve performance. Nevertheless, contributors frequently come up with new optimizations.

Sun's upcoming logging API

Sun has initiated JSR47 in the Java Community Process (JCP) to define a logging API for Java. The resulting API will require JDK version 1.4 and should soon become ready for public review.

The JSR47 API and log4j are quite similar at the architectural level. The JSR47 API will support a hierarchical namespace, one of the central features of log4j. On the other hand, log4j possesses many useful features missing in JSR47. Given the momentum behind log4j, in my partisan opinion, JSR47 is likely to be obsolete by the time it launches. Log4j is written by the people who use it, not by a closed committee.

By the way, log4j already provides support for bridging into the JSR47 API when the latter becomes available.

Let me also mention JLog, another logging API available from IBM's alphaWorks. JLog was previously named the RAS Toolkit until IBM renamed it. On the alphaWorks site JLog has the label "Logging Toolkit for Java," almost the same label that log4j had on alphaWorks before it migrated to http://log4j.org. JLog, while a good logging package, should not be confused with log4j.

Conclusion

Log4j is a popular logging package written in Java. One of its distinctive features includes the notion of inheritance in categories. Using a category hierarchy, it is possible to control which log statements are output at arbitrary granularity, thus reducing the volume of logged output and minimizing the cost of logging.

One of the advantages of the log4j API is its manageability. Once the log statements have been inserted into the code, they can be controlled with configuration files. Moreover, they can be selectively enabled or disabled, and sent to different and multiple output targets in user-chosen formats. Further, the log4j package is designed so that log statements can remain in shipped code without incurring a heavy performance cost.

Log4j is the result of a collective effort. My special thanks go to all the authors who have contributed to the project. Without exception, the best features in the package have all originated in the user community.

Learn more about this topic

1 2 Page 2
Page 2 of 2