Protecting corporate internal networks from hackers, thieves, or high load traffic is a common concern. A typical security measure consists of placing an intermediary Web server, known as an HTTP(S) proxy server, between the Internet and the internal network for controlling access. Such an intermediary Web server forwards HTTP(S) requests from clients to other servers, making those requests look like they originated from the proxy server and vice versa (reverse proxy).
Malicious users and excessive traffic load are just two good reasons for controlling access to internal servers. More generally, they are two application scenarios of the well-known structural design pattern Proxy, or Surrogate, whose intent, according to the Gang of Four (GoF), is to "provide a surrogate or placeholder for another object to control access to it." Obviously, such a surrogate adds a cost in terms of complexity. Complex systems are more difficult to understand, and any modification is made harder by the added complexity. So, why should we pay such a complexity fee? Typical reasons are:
- Security: You will eventually have to control access to resources (Protection Proxy). Typically, the proxy server is placed outside the main corporate firewall in the so-called demilitarized zone, while proxied servers are placed inside the firewall in the so-called militarized zone. Furthermore, you will eventually have to provide clients with different levels of access (Protection Access Proxy). For instance, in banking applications, clients may have different levels of access to financial reports.
- Scalability: As traffic load increases, you will eventually have to add more application servers to maintain good performance, or you might need some kind of automatic failover in case one of the nodes breaks.
- Audit: You will eventually have to count the number of accesses or trace access details to requested references (Smart Reference Proxy). For instance, this approach is common in banner advertising applications.
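The security and audit roles listed above can be sketched in a few lines of Java. This is only an illustration of the GoF Proxy pattern, not PippoProxy code: the names ReportSource, InternalReportSource, and GuardedReportSource, as well as the numeric access levels, are hypothetical.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Subject: what clients see, whether they talk to the real object or to a proxy. */
interface ReportSource {
    String fetch(String reportId, int clientLevel);
}

/** Real subject: stands in for the internal server that holds the reports. */
class InternalReportSource implements ReportSource {
    @Override
    public String fetch(String reportId, int clientLevel) {
        return "contents of " + reportId;
    }
}

/** Proxy: same interface, but counts and checks every request before delegating. */
class GuardedReportSource implements ReportSource {
    private final ReportSource target;
    private final int requiredLevel;
    private final AtomicInteger accessCount = new AtomicInteger();

    GuardedReportSource(ReportSource target, int requiredLevel) {
        this.target = target;
        this.requiredLevel = requiredLevel;
    }

    @Override
    public String fetch(String reportId, int clientLevel) {
        accessCount.incrementAndGet();       // audit: every attempt is counted
        if (clientLevel < requiredLevel) {   // security: protection check
            throw new SecurityException("access denied to " + reportId);
        }
        return target.fetch(reportId, clientLevel);
    }

    int accessCount() {
        return accessCount.get();
    }
}

class ProxyPatternDemo {
    public static void main(String[] args) {
        GuardedReportSource proxy = new GuardedReportSource(new InternalReportSource(), 2);
        System.out.println(proxy.fetch("q3-report.pdf", 3));
        try {
            proxy.fetch("q3-report.pdf", 1);
        } catch (SecurityException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("attempts: " + proxy.accessCount());
    }
}
```

Because the proxy and the real subject share one interface, clients do not need to know which of the two they are calling; that is the property an HTTP proxy exploits at the network level.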
When such complexity is necessary in an application, PippoProxy, a Java HTTP proxy designed and implemented for Tomcat, can be used in place of standard Apache-Tomcat solutions. This article presents the rationale behind the development of PippoProxy, the need for this type of proxy, and its advantages over more traditional proxies. In addition, PippoProxy's typical deployment scenarios and comparison benchmarks to more traditional solutions are presented.
Typical Apache-Tomcat proxy configurations
The standard Apache-Tomcat proxy configuration places an Apache (proxy) HTTP server before the Tomcat application servers in a "neutral zone" between the company's private network and the Internet (or some other outside public network) for secure access to the company's private data. This proxy server also acts as a load balancer and as a server of static content. Figure 1 shows such a configuration scenario.
To connect Apache to Tomcat, you can choose one of the standard connectors. For production deployment, mod_jk is the best choice (see Tomcat FAQ and "Fronting Tomcat" for further details). In particular, the mod_jk connector is said to provide approximately double the performance of mod_proxy for several reasons, including a persistent connection pool to Tomcat and a custom optimized protocol named AJP (see the Apache Jakarta Tomcat Connector). For a step-by-step explanation of how to connect an array of Tomcats to Apache using such a connector, see "High Availability Tomcat" (JavaWorld, December 2004).
Limitations of Apache-Tomcat
In typical Apache-Tomcat configurations, static content lives with the proxy server and is typically served without processing by filters or security constraints (Figure 1). This architecture proves inadequate for those security-conscious environments that deliver documents from internal servers to external customers in a controlled manner according to specific business rules inherently bound to the application itself and its lifecycle.
The following section considers application scenarios where such inadequacies are evident and where the adoption of a Tomcat-embeddable HTTP proxy has clear advantages.
Sample application: Managing financial reports at MegaBank
Consider a banking application at MegaBank, a large financial institution, where customers may have different levels of access to financial reports (PDF files, for instance) or other documents such as research produced by financial advisors regarding the companies they are considering investing in. These documents are typically provided by a content management system (CMS) that deploys them in an internal Web server (not Tomcat, in general), for which our initial Web application acts as a service consumer. For example, a user request to access a particular report is processed according to the user profile and other business rules before the document is delivered from the internal server. Moreover, such a CMS and its Web server typically live in a more internal security layer than our Web application. Figure 2 shows such a scenario.
In the standard Apache-Tomcat configuration, the Web server is responsible for proxying the documents after applying business logic. Since the internal Web server will not necessarily be Tomcat, the proxy must use the mod_proxy module. Besides the performance penalties related to mod_proxy, this solution has the main disadvantage of not complying with standard security policies, since the proxy server must pass through two firewalls (see Figure 3).
To block out malicious requests for internal resources protected by security constraints, all business rules used by Tomcat to filter requests to Apache should be replicated (using another programming language) by Apache itself. Thus, the whole setup is difficult to manage. In addition to the correct forwarding of HTTP headers up and down the chain, Apache must also include additional modules (e.g., mod_rewrite and mod_auth) to implement such rules that must remain consistent with the rest of the system. For further details see "URL Rewriting Guide." In particular, this setup violates two design-level principles:
- Once and only once: This is a principle within agile development methods, such as extreme programming, that strives to eliminate code and data duplication. Generally, if you find yourself duplicating a code fragment or data structure, you should instead create abstractions or use indirection to remove the duplication. Applying the same philosophy at the enterprise level, this principle means you should not allow different modules to perform the same logical work.
- Law of Demeter: The simple version of this guideline is "only talk to your immediate friends." Bringing this philosophy from the object-oriented design level to the enterprise level, applications should talk only to those applications on the functional levels immediately above or below them.
The above limitations of typical Apache-Tomcat configurations are related to a quite complex application scenario. Simple scenarios, on the other hand, might also suffer from lack of resources when, for instance, for simple or internal Websites, no Apache Web server is available for proxy use. In addition, in the case of static content (or quasi-static content, as in the MegaBank example), caching mechanisms are important for boosting performance and reducing the offered load to internal Web servers not built for production use.
To summarize, application scenarios are common where the adoption of a Tomcat-embeddable HTTP proxy has clear advantages. Enter PippoProxy.
PippoProxy is a 100 percent pure Java HTTP proxy designed/implemented for Tomcat that can be used instead of standard Apache-Tomcat solutions. Technically, it is implemented as a servlet and requires:
- J2SE 1.4.1 or newer
- Apache Ant 1.6.2 or newer
- Apache Tomcat 5.0.x or newer
PippoProxy is deployable in one of two modes: it can be plugged into any existing Web application, acting as a service provider, or serve as a standalone Web application.
In the first deployment scenario, classes responsible for handling business logic may use PippoProxy on demand. For instance, in our MegaBank example, a user request to access a particular report may be processed according to a user profile and other business rules, and eventually forwarded to PippoProxy.
In particular, let's assume a front end that uses some kind of Model-View-Controller (MVC) framework, whose servlet acts as controller running under http://[domain]:[port]/[context]/servlet/*.[extension]. Such a servlet (or the classes handling its actions in MVC frameworks such as Struts) receives requests from clients, decides whether they have the required authorization, sets a suitable request/session attribute to some value, and forwards the request to PippoProxy, running, for example, under http://[domain]:[port]/[context]/proxy/.
PippoProxy checks the attribute, fetches the required resource from the internal server (or its cache, if it is static), and returns the resource to the client (see Figure 4). This way, malicious users attempting to directly request resources under security constraints without the required authorization fail since their HTTP request/session has no authorization attribute set.
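The attribute check described above can be sketched as follows. To keep the example free of servlet dependencies, the session is modeled as a plain Map; the class LoginAttributeGate is hypothetical, and the rule that any non-null value counts as authorization is an assumption, since the exact comparison PippoProxy performs is not spelled out here. The two constructor arguments mirror the ENABLE_SESSION_ATTR_KEY_FOR_LOGIN and SESSION_ATTR_KEY_FOR_LOGIN parameters.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the check made before a resource is proxied. */
class LoginAttributeGate {
    private final boolean checkEnabled;  // role of ENABLE_SESSION_ATTR_KEY_FOR_LOGIN
    private final String attributeKey;   // role of SESSION_ATTR_KEY_FOR_LOGIN

    LoginAttributeGate(boolean checkEnabled, String attributeKey) {
        this.checkEnabled = checkEnabled;
        this.attributeKey = attributeKey;
    }

    /** Assumption: any non-null value stored under the key means "authorized". */
    boolean mayProxy(Map<String, Object> sessionAttributes) {
        if (!checkEnabled) {
            return true;                 // check switched off: everything is proxied
        }
        return sessionAttributes.get(attributeKey) != null;
    }
}

class LoginAttributeGateDemo {
    public static void main(String[] args) {
        LoginAttributeGate gate = new LoginAttributeGate(true, "my_attr_for_proxy");

        // A direct request that never went through the controller.
        Map<String, Object> anonymous = new HashMap<>();

        // A request whose session was marked by the controller after its business rules passed.
        Map<String, Object> authorized = new HashMap<>();
        authorized.put("my_attr_for_proxy", Boolean.TRUE);

        System.out.println("anonymous:  " + gate.mayProxy(anonymous));
        System.out.println("authorized: " + gate.mayProxy(authorized));
    }
}
```

In a real servlet the map lookup would be a call such as session.getAttribute(key); the decision logic stays the same.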
To maximize performance, PippoProxy manages a persistent (configurable) connection pool to the internal server, avoiding the opening and closing of connections for each request. In the case of static content, the performance is further improved with an efficient caching mechanism that uses a hierarchical structure both in memory and in the filesystem. Such a caching structure consists of a chain composed of a first node for the memory cache, followed by another node for the filesystem cache. PippoProxy's caching mechanism implements the well-known GoF behavioral design pattern Chain of Responsibility (see Figure 5).
PippoProxy's main servlet asks the first node for a resource. If the node has the requested resource, it returns it; otherwise, the node passes the request along the chain to the second node. If the second node has the requested resource, it returns it; otherwise, the second node fetches the resource from the internal Web server.
The memory and filesystem caches can each be configured to have a maximum size (in MB). The cache uses an LRU (least recently used) replacement strategy to decide which resource to force-pass to successive nodes or remove (last node only). Also, the cache is structured as an exclusive cache hierarchy, meaning that the contents of the memory and filesystem nodes are exclusive (eliminating redundant copies). See PippoProxy documentation for further details about implementation. Also see the discussion on Ephemeral Cache Item in Java Enterprise Design Patterns for a sample of an LRU cache in Java.
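The chain just described can be sketched in plain Java. This is not PippoProxy's implementation: the class CacheNode is hypothetical, capacity is counted in entries rather than MB, the "filesystem" node is simply a second in-memory map, and two behaviors are assumptions made to keep the hierarchy exclusive (a resource found further down the chain moves up to the first node, and newly fetched resources enter at the first node).

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** One link in the chain: an LRU store that knows its successor. */
class CacheNode {
    private final int capacity;                     // simplification: entries, not MB
    private final CacheNode next;                   // null on the last node
    private final Function<String, String> origin;  // consulted by the last node only
    // accessOrder=true makes iteration order run from least to most recently used.
    private final LinkedHashMap<String, String> lru = new LinkedHashMap<>(16, 0.75f, true);

    CacheNode(int capacity, CacheNode next, Function<String, String> origin) {
        this.capacity = capacity;
        this.next = next;
        this.origin = origin;
    }

    /** Entry point: look here first, then pass the request along the chain. */
    String get(String key) {
        String value = lru.get(key);                // a hit also refreshes the LRU order
        if (value == null) {
            value = (next != null) ? next.take(key) : origin.apply(key);
            put(key, value);                        // assumption: the resource moves up to this node
        }
        return value;
    }

    /** Like get(), but removes a hit so the caller holds the only copy (exclusive hierarchy). */
    private String take(String key) {
        String value = lru.remove(key);
        if (value != null) {
            return value;
        }
        return (next != null) ? next.take(key) : origin.apply(key);
    }

    /** Stores an entry; on overflow the LRU entry is force-passed on, or dropped by the last node. */
    private void put(String key, String value) {
        lru.put(key, value);
        if (lru.size() > capacity) {
            Iterator<Map.Entry<String, String>> it = lru.entrySet().iterator();
            Map.Entry<String, String> eldest = it.next();
            String oldKey = eldest.getKey();
            String oldValue = eldest.getValue();
            it.remove();
            if (next != null) {
                next.put(oldKey, oldValue);
            }
        }
    }

    boolean holds(String key) {
        return lru.containsKey(key);                // does not touch the LRU order
    }
}

class CacheChainDemo {
    public static void main(String[] args) {
        int[] fetches = {0};
        CacheNode disk = new CacheNode(2, null, key -> { fetches[0]++; return "body of " + key; });
        CacheNode memory = new CacheNode(1, disk, null);

        memory.get("a");   // fetched from the origin, kept in memory
        memory.get("b");   // fetched from the origin; "a" is pushed down to the disk node
        memory.get("a");   // found in the disk node, moved back up; "b" is pushed down
        System.out.println("origin fetches: " + fetches[0]);  // 2: the third request never reached the origin
    }
}
```

Removing an entry from the lower node when it is served is what keeps the two levels exclusive; a hierarchy that left the copy in place would be inclusive and would spend filesystem space on resources already held in memory.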
The following sections show how to install, configure, and deploy PippoProxy.
For the impatient
If you don't already have Tomcat or Ant, download recent copies and install them. Then download PippoProxy and unpack it in a directory (e.g., /usr/local/pippoproxy). Edit the _ant.properties file: have deploy_local point to the local Tomcat Web applications directory (e.g., /usr/local/tomcat/webapps), and set application_name to the name of the Web context under which PippoProxy works (e.g., pp). Now the command line ant deploy deploys PippoProxy under the local Tomcat, producing the output shown in Figure 6.
To test your installation, go to http://localhost:8080/pp/lp/ and you should see a well-known Website.
PippoProxy can also be deployed as a standard J2EE application. As a result, the related web.xml deployment descriptor must contain a servlet element for specifying the servlet name and setting other servlet-specific properties, for example:

    <servlet>
        <servlet-name>PippoProxyServlet</servlet-name>
        <servlet-class>org.pippo.proxy.WebCachedProxyServlet</servlet-class>
        <init-param>
            <param-name>ENABLE_SESSION_ATTR_KEY_FOR_LOGIN</param-name>
            <param-value>true</param-value>
        </init-param>
        ... ... ...
        <load-on-startup>1</load-on-startup>
    </servlet>

The deployment descriptor also must have a servlet-mapping element for mapping it to one or more URL patterns according to the Servlet specification:

    <servlet-mapping>
        <servlet-name>PippoProxyServlet</servlet-name>
        <url-pattern>/lp/*</url-pattern>
    </servlet-mapping>
You can complete this customization by either editing your web.xml deployment descriptor or using PippoProxy's Ant scripts. PippoProxy's Ant scripts have all you need if you want to deploy PippoProxy as a standalone Web application. Thus, you should need to edit your web.xml only when you integrate PippoProxy into an existing Web application.
To edit your deployment descriptor, place PippoProxy's Java classes into your J2EE application's WEB-INF/lib directory as a JAR according to the Servlet specification and then edit the web.xml deployment descriptor. For packaging PippoProxy's classes into a JAR, you can use the jarPkg Ant target after editing the _ant.properties file:

    #
    # Application name.
    #
    application_name=pp

    #
    # Local temp directory to do stuff.
    #
    outdir=build

For example, with the previous instance of _ant.properties, ant jarPkg will package PippoProxy's Java classes into the JAR /usr/local/pippoproxy/build/dist/pp.jar and the HTTP client library used by PippoProxy into the JAR /usr/local/pippoproxy/build/dist/HTTPClient.jar. These JARs should be copied into your J2EE application's WEB-INF/lib directory. Next, edit the web.xml deployment descriptor, setting all parameters, whose semantics are described in the next section.
If you want to deploy PippoProxy as a standalone Web application (running under http://[domain]:[port]/pippo/, for example), edit the _ant.properties file (as above), setting the property deploy_local as below:

    #
    # Local Tomcat webapps.
    #
    deploy_local=C:/java/tomcat/5.0/Tomcat 5.0/webapps

Then configure the _proxy.properties file (see next section) and launch ant deploy to generate PippoProxy's WAR file and copy it into the Tomcat Web applications directory automatically.
As a further deployment mode, you can allocate a complete Tomcat server as a proxy server using PippoProxy. In this mode, PippoProxy will proxy all URLs matching the pattern http://[domain]:[port]/* (for further details, see PippoProxy documentation).
This section describes the parameters used by PippoProxy, whether you configure them manually in your J2EE application's deployment descriptor or edit them in the _proxy.properties file:
ENABLE_SESSION_ATTR_KEY_FOR_LOGIN: True for checking the value of a session attribute before proxying (e.g., true).
SESSION_ATTR_KEY_FOR_LOGIN: The session attribute, if any, to check before proxying (e.g., my_attr_for_proxy).
CACHE_ENABLED: True (static content only) for enabling PippoProxy's cache (e.g., true).
CACHE_TIMEOUT: Lifetime (in milliseconds) of a resource before being removed from the cache (e.g., 3600000).
CACHE_MAX_MEMORY_SIZE: The maximum size (in MB) of memory cache (e.g., 10).
CACHE_MAX_DISK_SIZE: The maximum size (in MB) of filesystem cache (e.g., 50).
CACHE_PATH_DIR: The absolute path where resources are stored (e.g., /usr/local/pippoproxy/cache).
REMOTE_HOST: The remote host to proxy (e.g., cocoon.apache.org).
REMOTE_PORT: The remote port to proxy (e.g., 80).
IS_ROOT: True for making the application server act as a proxy server (e.g., true); in this mode, PippoProxy handles all http://[domain]:[port]/* URLs. See PippoProxy documentation for further details.
LOCAL_PREFIX: The local prefix under which PippoProxy runs (e.g., by setting /lp, PippoProxy handles all client requests matching the URL pattern http://[domain]:[port]/[context]/lp/*).
REMOTE_PREFIX: The remote prefix to proxy (e.g., by setting /2.1, a client request for http://[domain]:[port]/[context]/lp/index.html will map to http://[REMOTE_HOST]:[REMOTE_PORT]/2.1/index.html).
NOT_ALLOWED_HEADERS: The pipe-separated list of HTTP headers to block (e.g., Content-Encoding|Content-Type).
PROXY_ENABLED: True if a further proxy server is available for reaching the internal server to proxy (e.g., true).
PROXY_HOST: The host of the further proxy, if any, used for reaching the internal server to proxy (e.g., my_company.proxy).
PROXY_PORT: The port of the further proxy, if any, used for reaching the internal server to proxy (e.g., 80).
PROTOCOL: Only http currently supported.
INIT_CONNECTION: The initial number of HTTP connections (e.g., 10).
MAX_CONNECTION: The maximum number of HTTP connections in the pool (e.g., 15).
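To make the LOCAL_PREFIX, REMOTE_PREFIX, and NOT_ALLOWED_HEADERS descriptions concrete, here is a minimal sketch of the URL mapping and header filtering they imply. The class ProxyRules and its method names are hypothetical, not PippoProxy's API. Matching header names without regard to case is an assumption (HTTP header names are case-insensitive), and the sketch does not say whether the blocked headers apply to requests, responses, or both.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical holder for a few of the parameters described above. */
class ProxyRules {
    private final String localPrefix;
    private final String remoteHost;
    private final int remotePort;
    private final String remotePrefix;
    private final Set<String> blockedHeaders = new HashSet<>();

    ProxyRules(String localPrefix, String remoteHost, int remotePort,
               String remotePrefix, String notAllowedHeaders) {
        this.localPrefix = localPrefix;
        this.remoteHost = remoteHost;
        this.remotePort = remotePort;
        this.remotePrefix = remotePrefix;
        // NOT_ALLOWED_HEADERS is a pipe-separated list; "|" must be escaped in the regex.
        for (String name : Arrays.asList(notAllowedHeaders.split("\\|"))) {
            blockedHeaders.add(name.trim().toLowerCase(Locale.ROOT));
        }
    }

    /** Maps a path relative to the Web context (e.g., /lp/index.html) to the remote URL. */
    String toRemoteUrl(String pathAfterContext) {
        boolean matches = pathAfterContext.equals(localPrefix)
                || pathAfterContext.startsWith(localPrefix + "/");
        if (!matches) {
            throw new IllegalArgumentException("not under " + localPrefix + ": " + pathAfterContext);
        }
        String rest = pathAfterContext.substring(localPrefix.length());
        return "http://" + remoteHost + ":" + remotePort + remotePrefix + rest;
    }

    /** Returns a copy of the headers without the blocked ones. */
    Map<String, String> filterHeaders(Map<String, String> headers) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> header : headers.entrySet()) {
            if (!blockedHeaders.contains(header.getKey().toLowerCase(Locale.ROOT))) {
                kept.put(header.getKey(), header.getValue());
            }
        }
        return kept;
    }
}

class ProxyRulesDemo {
    public static void main(String[] args) {
        ProxyRules rules = new ProxyRules("/lp", "cocoon.apache.org", 80, "/2.1",
                "Content-Encoding|Content-Type");
        // prints http://cocoon.apache.org:80/2.1/index.html
        System.out.println(rules.toRemoteUrl("/lp/index.html"));

        Map<String, String> headers = new LinkedHashMap<>();
        headers.put("Content-Type", "text/html");
        headers.put("Cache-Control", "no-cache");
        System.out.println(rules.filterHeaders(headers).keySet());
    }
}
```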
This section compares the performance of PippoProxy deployed to Tomcat (5.5.4) and a standard Apache(2.0.49)-Tomcat(5.5.4) pair connected with mod_jk2 in the same machine.
Note: mod_jk2 has been configured so that the required resources are not forwarded by Apache to Tomcat, but handled locally by Apache itself (
- Intel 730 MHz Pentium III
- Single processor
- 256 MB RAM
- Linux 2.6.8
The tests attempt to simulate user scenarios where the same set of static resources is requested at a high repetition rate, hence, where the efficiency of the cache directly affects overall performance. Two file sizes have been used, 13 KB and 128 KB, and the same resource is requested 1,000 and 500 times, respectively.
As shown in Figure 7, the performance gap relative to the standard proxy increases as the file size increases. This is due to the fact that the PippoProxy cache works on the well-known principle of temporal locality, where programs tend to reuse data and instructions they have recently used (see Resources for more information).
For small files, mod_jk and PippoProxy performances are quite similar, since the file initial-access time is a substantial portion of the total time for handling requests. But for larger files, differences among diverse caching behaviors are more evident. Though elapsed time for first calls is quite comparable, after 500 calls, on average, PippoProxy is five times faster than mod_jk. See Figure 8, where PippoProxy's throughput increases as the file size increases, while mod_jk throughput decreases.
This comparison shows how PippoProxy's built-in cache speeds up the delivery of static content for medium to large file sizes, without any penalty on the delivery of dynamic content, since the latter is managed in both cases by the supporting servlet engine.
This article introduced PippoProxy, a one-to-one replacement for standard Apache-Tomcat proxy solutions, and illustrated application scenarios in which this Tomcat-embeddable HTTP proxy offers clear advantages over standard Apache-Tomcat proxy solutions. Regardless of the deployment scenario, in case of static (or quasi-static) content, PippoProxy caching proves to be efficient; its performance is five times that of the traditional mod_jk-based proxy. Finally, PippoProxy does not require any optional modules or even a Web server for connecting to Tomcat, and configuration and deployment are easy when using Apache Ant.
I'd like to thank my manager Marco Fillo at Virgilio for his support while preparing this article and Vanessa di Lima for her invaluable assistance.
Learn more about this topic
- PippoProxy homepage
- Tomcat homepage
- Proxy Support How-To
- Ant homepage
- HTTPClient homepage
- Design Patterns: Elements of Reusable Object-Oriented Software, E. Gamma, R. Helm, R. Johnson, J. Vlissides (Addison-Wesley Professional, 1995; ISBN 0201633612)
- For Ephemeral Cache Item, see "Chapter 7, Concurrency Patterns" in Java Enterprise Design Patterns, Volume 3, Mark Grand (John Wiley & Sons, 2002; ISBN 0471333158)
- "High Availability Tomcat," Graham King (JavaWorld, December 2004)
- Best practices for fronting Tomcat with Apache or IIS: "Fronting Tomcat," Mladen Turk
- Tomcat FAQ about connectors
- The Apache Jakarta Tomcat Connector
- "URL Rewriting Guide," Ralf S. Engelschall (December 1997)
- Visit Wikipedia for the Law of Demeter
- Visit Wikipedia for the principle of locality
- For an introduction to extreme programming, read the bible by Kent Beck: Extreme Programming Explained (Addison-Wesley Professional, 1999; ISBN 0201616416)
- For more articles on performance, browse the Performance Tuning section of JavaWorld's Topical Index
- For more articles on Java development tools, browse the Development Tools section of JavaWorld's Topical Index
- For more articles on design patterns, browse the Design Patterns section of JavaWorld's Topical Index