Protecting corporate internal networks from hackers, thieves, or high load traffic is a common concern. A typical security measure consists of placing an intermediary Web server, known as an HTTP(S) proxy server, between the Internet and the internal network for controlling access. Such an intermediary Web server forwards HTTP(S) requests from clients to other servers, making those requests look like they originated from the proxy server and vice versa (reverse proxy).
Malicious users or excessive traffic load are just two good reasons for controlling access to internal servers. More generally, they are two application scenarios of the well-known structural design pattern Proxy, or Surrogate, whose intent, according to the Gang of Four (GoF), is to "provide a surrogate or placeholder for another object to control access to it." Obviously, such a surrogate adds a cost in terms of complexity. Complex systems are more difficult to understand, and any modification is made harder by added complexity. So, why should we pay such a complexity fee? Typical reasons are:
- Security: You will eventually have to control access to resources (Protection Proxy). Typically, the proxy server is placed outside the main corporate firewall in the so-called demilitarized zone, while proxied servers are placed inside the firewall in the so-called militarized zone. Furthermore, you will eventually have to provide clients with different levels of access (Protection Access Proxy). For instance, in banking applications, clients may have different levels of access to financial reports.
- Scalability: As traffic load increases, you will eventually have to add more application servers to maintain good performance, or you might need some kind of automatic failover in case one of the nodes breaks.
- Audit: You will eventually have to count the number of accesses or trace access details to requested references (Smart Reference Proxy). For instance, this approach is common in banner advertising applications.
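The reasons above can be illustrated with a minimal Java sketch of the GoF Proxy pattern, combining access control (Protection Proxy) with access counting (Smart Reference Proxy). All names here (ReportService, RealReportService, GuardedReportProxy, the user "alice") are hypothetical and chosen only for illustration; they are not part of PippoProxy.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Subject interface shared by the real object and its surrogate.
interface ReportService {
    String fetch(String user, String reportId);
}

// Real subject: stands in for the internal server that holds the reports.
class RealReportService implements ReportService {
    public String fetch(String user, String reportId) {
        return "contents of " + reportId;
    }
}

// Proxy: controls access (protection) and counts accesses (smart reference).
class GuardedReportProxy implements ReportService {
    private final ReportService target;
    private final Set<String> allowedUsers;
    private final AtomicInteger accessCount = new AtomicInteger();

    GuardedReportProxy(ReportService target, Set<String> allowedUsers) {
        this.target = target;
        this.allowedUsers = allowedUsers;
    }

    public String fetch(String user, String reportId) {
        accessCount.incrementAndGet();              // audit every attempt
        if (!allowedUsers.contains(user)) {
            throw new SecurityException("access denied for " + user);
        }
        return target.fetch(user, reportId);        // delegate to the real subject
    }

    int accessCount() {
        return accessCount.get();
    }
}

class ProxyPatternDemo {
    public static void main(String[] args) {
        GuardedReportProxy proxy = new GuardedReportProxy(
                new RealReportService(), new HashSet<>(Arrays.asList("alice")));
        System.out.println(proxy.fetch("alice", "q3-report"));
        System.out.println("attempts so far: " + proxy.accessCount());
    }
}
```

Because the proxy implements the same interface as the real subject, callers do not need to know whether they hold the surrogate or the real object; that is also where the extra complexity comes from.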
When such complexity is necessary in an application, PippoProxy, a Java HTTP proxy designed and implemented for Tomcat, can be used in place of standard Apache-Tomcat solutions. This article presents the rationale behind the development of PippoProxy, the need for this type of proxy, and its advantages over more traditional proxies. In addition, PippoProxy's typical deployment scenarios and comparison benchmarks to more traditional solutions are presented.
Typical Apache-Tomcat proxy configurations
The standard Apache-Tomcat proxy configuration places an Apache (proxy) HTTP server before the Tomcat application servers in a "neutral zone" between the company's private network and the Internet (or some other outside public network) for secure access to the company's private data. This proxy server also acts as a load balancer and as a server of static content. Figure 1 shows such a configuration scenario.
To connect Apache to Tomcat, you can choose one of the standard connectors. For production deployment, mod_jk is the best choice (see Tomcat FAQ and "Fronting Tomcat" for further details). In particular, the mod_jk connector is said to provide approximately double the performance of mod_proxy for several reasons, including a persistent connection pool to Tomcat and a custom optimized protocol named AJP (see the Apache Jakarta Tomcat Connector). For a step-by-step explanation of how to connect an array of Tomcats to Apache using such a connector, see "High Availability Tomcat" (JavaWorld, December 2004).
Limitations of Apache-Tomcat
In typical Apache-Tomcat configurations, static content lives with the proxy server and is typically served without processing by filters or security constraints (Figure 1). This architecture proves inadequate for those security-conscious environments that deliver documents from internal servers to external customers in a controlled manner according to specific business rules inherently bound to the application itself and its lifecycle.
The following section considers application scenarios where such inadequacies are evident and where the adoption of a Tomcat-embeddable HTTP proxy has clear advantages.
Sample application: Managing financial reports at MegaBank
Consider a banking application at MegaBank, a large financial institution, where customers may have different levels of access to financial reports (PDF files, for instance) or other documents such as research produced by financial advisors regarding the companies they are considering investing in. These documents are typically provided by a content management system (CMS) that deploys them in an internal Web server (not Tomcat, in general), for which our initial Web application acts as a service consumer. For example, a user request to access a particular report is processed according to the user profile and other business rules before the document is delivered from the internal server. Moreover, such a CMS and its Web server typically live in a more internal security layer than our Web application. Figure 2 shows such a scenario.
In the standard Apache-Tomcat configuration, the Web server is responsible for proxying the documents after applying business logic. Since the internal Web server will not necessarily be Tomcat, the proxy must use the mod_proxy module. Besides the performance penalties related to mod_proxy, this solution has the main disadvantage of not complying with standard security policies, since the proxy server must pass through two firewalls (see Figure 3).
To block out malicious requests for internal resources protected by security constraints, all business rules used by Tomcat to filter requests to Apache should be replicated (using another programming language) by Apache itself. Thus, the whole setup is difficult to manage. In addition to the correct forwarding of HTTP headers up and down the chain, Apache must also include additional modules (e.g., mod_rewrite and mod_auth) to implement such rules that must remain consistent with the rest of the system. For further details see "URL Rewriting Guide." In particular, this setup violates two design-level principles:
- Once and only once: This is a principle within agile development methods, such as extreme programming, that strives to eliminate code and data duplication. Generally, if you find yourself duplicating a code fragment or data structure, you should instead create abstractions or use indirection to remove the duplication. Applying the same philosophy at the enterprise level, this principle means you should not allow different modules to perform the same logical work.
- Law of Demeter: The simple version of this guideline is "only talk to your immediate friends." Bringing this philosophy from the object-oriented design level to the enterprise level, applications should talk only to those applications on the functional levels immediately above or below them.
The above limitations of typical Apache-Tomcat configurations are related to a quite complex application scenario. Simple scenarios, on the other hand, might also suffer from lack of resources when, for instance, for simple or internal Websites, no Apache Web server is available for proxy use. In addition, in the case of static content (or quasi-static content, as in the MegaBank example), caching mechanisms are important for boosting performance and reducing the offered load to internal Web servers not built for production use.
To summarize, application scenarios are common where the adoption of a Tomcat-embeddable HTTP proxy has clear advantages. Enter PippoProxy.
PippoProxy is a 100 percent pure Java HTTP proxy designed/implemented for Tomcat that can be used instead of standard Apache-Tomcat solutions. Technically, it is implemented as a servlet and requires:
- J2SE 1.4.1 or newer
- Apache Ant 1.6.2 or newer
- Apache Tomcat 5.0.x or newer
PippoProxy is deployable in one of two modes: it can be plugged into any existing Web application, acting as a service provider, or serve as a standalone Web application.
In the first deployment scenario, classes responsible for handling business logic may use PippoProxy on demand. For instance, in our MegaBank example, a user request to access a particular report may be processed according to a user profile and other business rules, and eventually forwarded to PippoProxy.
In particular, let's assume a front end that uses some kind of Model-View-Controller (MVC) framework, whose servlet acts as controller running under http://[domain]:[port]/[context]/servlet/*.[extension]. Such a servlet (or the classes handling its actions in MVC frameworks such as Struts) receives requests from clients, decides whether they have the required authorization, sets a suitable request/session attribute to some value, and forwards the request to PippoProxy, running, for example, under http://[domain]:[port]/[context]/proxy/.
PippoProxy checks the attribute, fetches the required resource from the internal server (or its cache, if it is static), and returns the resource to the client (see Figure 4). This way, malicious users attempting to directly request resources under security constraints without the required authorization fail since their HTTP request/session has no authorization attribute set.
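The attribute handshake described above can be modeled in plain Java, with a Map standing in for the HTTP session so the sketch runs without a servlet container. The attribute name "proxy.authorized" and the 200/403 status codes are assumptions made for illustration; the text only says that unauthorized requests fail, and the real attribute key depends on how PippoProxy is configured.

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java model of the flow; a Map stands in for the HTTP session.
class AuthorizationGate {
    // Hypothetical attribute name, not taken from PippoProxy.
    static final String AUTH_ATTR = "proxy.authorized";

    // Controller side: apply business rules, then mark the session.
    static void controller(Map<String, Object> session, boolean userMayReadReport) {
        if (userMayReadReport) {
            session.put(AUTH_ATTR, Boolean.TRUE);
        }
    }

    // Proxy side: serve only when the controller has set the attribute.
    // The status codes are an assumption about how a failure is reported.
    static int proxyStatus(Map<String, Object> session) {
        return Boolean.TRUE.equals(session.get(AUTH_ATTR)) ? 200 : 403;
    }

    public static void main(String[] args) {
        Map<String, Object> viaController = new HashMap<>();
        controller(viaController, true);
        System.out.println(proxyStatus(viaController));   // request routed through the controller

        Map<String, Object> direct = new HashMap<>();     // request that skipped the controller
        System.out.println(proxyStatus(direct));
    }
}
```

In a real deployment, the controller would presumably set the attribute on the servlet request or session and forward with a request dispatcher; the point of the sketch is only that the authorization decision is made once, in the application, and the proxy merely checks its result.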
To maximize performance, PippoProxy manages a persistent (configurable) connection pool to the internal server, avoiding the opening and closing of connections for each request. In the case of static content, performance is further improved by an efficient caching mechanism that uses a hierarchical structure both in memory and in the filesystem. This caching structure is a chain composed of a first node for the memory cache, followed by a second node for the filesystem cache. PippoProxy's caching mechanism implements the well-known GoF behavioral design pattern Chain of Responsibility (see Figure 5).
PippoProxy's main servlet asks the first node for a resource. If the node has the requested resource, it returns it; otherwise, the node passes the request along the chain to the second node. If the second node has the requested resource, it returns it; otherwise, the second node fetches the resource from the internal Web server.
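The lookup just described can be sketched as a two-node Chain of Responsibility. This is a simplified illustration, not PippoProxy's actual classes: the names CacheNode, ForwardingNode, and LastNode are hypothetical, plain maps stand in for the memory and filesystem stores, and a function stands in for the HTTP fetch from the internal server. Where a freshly fetched resource ends up being stored is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// One link in the chain: answer from local storage or pass the request on.
abstract class CacheNode {
    final Map<String, String> store = new HashMap<>();

    abstract String get(String path);
}

// First node (e.g., memory): on a miss, delegate to the next node.
class ForwardingNode extends CacheNode {
    private final CacheNode next;

    ForwardingNode(CacheNode next) {
        this.next = next;
    }

    String get(String path) {
        String hit = store.get(path);
        return hit != null ? hit : next.get(path);
    }
}

// Last node (e.g., filesystem): on a miss, go to the internal server.
class LastNode extends CacheNode {
    private final Function<String, String> origin;   // stands in for an HTTP fetch

    LastNode(Function<String, String> origin) {
        this.origin = origin;
    }

    String get(String path) {
        String hit = store.get(path);
        return hit != null ? hit : origin.apply(path);
    }
}

class ChainDemo {
    public static void main(String[] args) {
        LastNode filesystem = new LastNode(path -> "fetched " + path);
        ForwardingNode memory = new ForwardingNode(filesystem);
        memory.store.put("/logo.png", "from memory");
        filesystem.store.put("/report.pdf", "from filesystem");

        System.out.println(memory.get("/logo.png"));
        System.out.println(memory.get("/report.pdf"));
        System.out.println(memory.get("/other.html"));
    }
}
```

The caller only ever talks to the first node, so adding or removing a cache level does not change the servlet code that asks for a resource.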
The memory and filesystem caches can each be configured with a maximum size (in MB). Each cache uses an LRU (least recently used) replacement strategy to decide which resource to force-pass to the successive node or, in the last node only, to remove. Also, the cache is structured as an exclusive cache hierarchy, meaning that the contents of the memory and filesystem nodes are exclusive (eliminating redundant copies). See PippoProxy documentation for further details about implementation. Also see the discussion on Ephemeral Cache Item in Java Enterprise Design Patterns for a sample of an LRU cache in Java.
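A minimal sketch of LRU replacement with demotion, assuming a capacity counted in entries rather than megabytes and a hypothetical class name LruNode; it is not PippoProxy's implementation. It relies on the access-order mode of java.util.LinkedHashMap, which keeps the least recently used entry first. Two further assumptions: a hit in a lower node stays where it is, and callers only insert a key after a miss, which is what keeps the levels exclusive.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

// LRU node whose evicted entry is moved to the next node instead of being copied.
class LruNode {
    private final int capacity;    // assumption: entry count, not megabytes
    private final LruNode next;    // null for the last node
    // accessOrder = true: iteration starts at the least recently used entry.
    private final LinkedHashMap<String, String> entries =
            new LinkedHashMap<>(16, 0.75f, true);

    LruNode(int capacity, LruNode next) {
        this.capacity = capacity;
        this.next = next;
    }

    void put(String key, String value) {
        entries.put(key, value);
        if (entries.size() > capacity) {
            Iterator<Map.Entry<String, String>> it = entries.entrySet().iterator();
            Map.Entry<String, String> eldest = it.next();
            String evictedKey = eldest.getKey();
            String evictedValue = eldest.getValue();
            it.remove();
            if (next != null) {
                next.put(evictedKey, evictedValue);   // force-pass to the successive node
            }
            // Last node: the evicted entry is simply dropped.
        }
    }

    // Assumption: a hit in a lower node is returned without being promoted.
    String get(String key) {
        String value = entries.get(key);              // also marks the entry as recently used
        if (value == null && next != null) {
            return next.get(key);
        }
        return value;
    }

    boolean holds(String key) {
        return entries.containsKey(key);              // does not affect LRU order
    }
}

class LruDemo {
    public static void main(String[] args) {
        LruNode filesystem = new LruNode(2, null);
        LruNode memory = new LruNode(2, filesystem);
        memory.put("a", "A");
        memory.put("b", "B");
        memory.put("c", "C");   // "a" is least recently used and moves down
        System.out.println(memory.holds("a") + " " + filesystem.holds("a"));
    }
}
```

Because an evicted entry is removed from one node before it is added to the next, no resource is held at two levels at once, which is the property the exclusive hierarchy is meant to provide.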
The following sections show how to install, configure, and deploy PippoProxy.
For the impatient
If you don't already have Tomcat or Ant, download the recent copies and install them. Then download PippoProxy and unpack it in a directory (e.g., /usr/local/pippoproxy). Edit the _ant.properties file, have deploy_local point to the local Tomcat Web applications (e.g., /usr/local/tomcat/webapps), and set application_name to the name of the Web context under which PippoProxy works (e.g., pp). Now the command line ant deploy deploys PippoProxy under the local Tomcat, producing the output shown in Figure 6.
To test your installation, go to http://localhost:8080/pp/lp/ and you should see a well-known Website.
PippoProxy can also be deployed as a standard J2EE application. As a result, the related web.xml deployment descriptor must contain a servlet element for specifying the servlet name and setting other servlet-specific properties, for example:
<servlet>
    <servlet-name>PippoProxyServlet</servlet-name>
    <servlet-class>org.pippo.proxy.WebCachedProxyServlet</servlet-class>
    <init-param>
        <param-name>ENABLE_SESSION_ATTR_KEY_FOR_LOGIN</param-name>
        <param-value>true</param-value>
    </init-param>
    ... ... ...
    <load-on-startup>1</load-on-startup>
</servlet>
The deployment descriptor also must have a servlet-mapping element for mapping it to one or more URL patterns according to the Servlet specification:
<servlet-mapping>
    <servlet-name>PippoProxyServlet</servlet-name>
    <url-pattern>/lp/*</url-pattern>
</servlet-mapping>