An ounce of prevention: Avoid J2EE data layer bottlenecks

Best practices for tackling data bottlenecks within J2EE environments

J2EE application servers provide scalable performance for a wide range of applications. However, the general-purpose nature of J2EE, which aims to address the needs of every enterprise, also limits its ability to provide a best-of-breed solution for mission-critical applications. In particular, data-intensive applications expose a serious data bottleneck in all J2EE server architectures.

A recent survey of 360 J2EE users found that 57 percent of application performance and availability issues can be traced to inefficient data access problems, and only 42 percent of applications perform as planned during initial deployment. Not surprisingly, the survey went on to state that Java applications fail to meet user expectations 60 percent of the time. Worse yet, a 2004 survey conducted by Forrester Research found that more than two-thirds of respondents discovered application performance problems only when a user called the help desk.

Typically, J2EE servers convert every request for persistent data into one or more SQL statements. For applications with complex object models and heavy request volumes, this approach creates inevitable problems, as illustrated in Figure 1.

Figure 1. J2EE application server bottleneck.

This article defines the three most common causes of application data bottlenecks and offers a proactive approach for eliminating them. It also illustrates the architecture using a real-world J2EE application with a data services layer that has been deployed globally and is now providing high performance 24-7.

Indicators of potential application data bottlenecks

The two application characteristics that most frequently contribute to data bottlenecks are:

  1. The number of data objects, which drives the complexity of the object-relational mapping (mapping the entity bean to a persistent store)
  2. The peak transaction rate, which drives the volume of database requests

Applications in jeopardy of experiencing serious problems have one of the following three requirements:

  • Model-intensive requirements: As the size of the application object model grows, the difficulty of defining an efficient object-relational mapping increases. Highly efficient mapping is necessary to prevent mapping bottlenecks.
  • Transaction-intensive requirements: As the application request volume grows, the database will become a bottleneck based on the sheer volume of database queries. Intelligent caching is needed to prevent query bottlenecks.
  • Data-intensive requirements: For applications with both complex object models and high request volumes, eliminating data bottlenecks requires a more systemic approach. A data services layer is data access infrastructure software that integrates mapping and caching transparently within a J2EE server to eliminate data bottlenecks for data-intensive applications.

The 50/50 rule of thumb

While enterprise applications are complex and may perform poorly for a variety of reasons, a good rule of thumb for predicting data bottlenecks is the 50/50 rule. J2EE applications that have more than 50 data classes and/or more than 50 transactions per second during peak times are much more likely to experience serious data bottlenecks. Figure 2 illustrates how to assess your application using the 50/50 rule for data bottlenecks.

Figure 2. The 50/50 rule of thumb.

Many basic applications have a low risk of data bottlenecks, and the performance provided by a standard J2EE server is adequate. Model-intensive applications are characterized by numerous data classes and/or complex relationships between objects. Transaction-intensive applications currently have or are anticipated to have a high rate of requests as demand grows. Data-intensive applications suffer from both a large number of data classes and a high peak transaction rate. The rest of this article addresses best practices for addressing the data bottlenecks created by these types of applications within a J2EE environment.
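
The four categories can be expressed as a small classification routine. The following is a minimal sketch, assuming the thresholds are applied as strict "more than 50" comparisons as the rule is worded above; the names FiftyFiftyRule, BottleneckRisk, and assess are hypothetical and exist only for illustration.

```java
/** Risk categories from the 50/50 rule of thumb (names are illustrative). */
enum BottleneckRisk { BASIC, MODEL_INTENSIVE, TRANSACTION_INTENSIVE, DATA_INTENSIVE }

final class FiftyFiftyRule {
    // Thresholds from the rule of thumb: more than 50 data classes,
    // more than 50 transactions per second at peak.
    static final int DATA_CLASS_THRESHOLD = 50;
    static final int PEAK_TPS_THRESHOLD = 50;

    private FiftyFiftyRule() {}

    /** Classifies an application by its object model size and peak load. */
    static BottleneckRisk assess(int dataClasses, int peakTransactionsPerSecond) {
        boolean manyClasses = dataClasses > DATA_CLASS_THRESHOLD;
        boolean highLoad = peakTransactionsPerSecond > PEAK_TPS_THRESHOLD;
        if (manyClasses && highLoad) return BottleneckRisk.DATA_INTENSIVE;
        if (manyClasses) return BottleneckRisk.MODEL_INTENSIVE;
        if (highLoad) return BottleneckRisk.TRANSACTION_INTENSIVE;
        return BottleneckRisk.BASIC;
    }
}
```

For example, an application with 120 data classes and a peak of 10 transactions per second would be classified as model-intensive, while the same model at 200 transactions per second would be data-intensive.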

Best practices for model-intensive applications

For applications with a small number of data classes, it is relatively easy to hand-tune the mapping of application components to a relational database. For larger object models, however, hand-coding the data class implementations or specifying the mapping through descriptors becomes increasingly unmanageable.

In general, any application whose object model has 20 or more data objects is likely to experience performance issues related to object-relational mapping (ORM). Inefficient mappings cause the application to bottleneck on time-consuming database queries. Several approaches address ORM bottlenecks:

  • Use lazy loading: One common mistake with data objects is to fetch entire objects or object hierarchies from the database when an object slice or hierarchy node is sufficient for the application's needs. Loading data only when absolutely needed is one way to reduce the overall database traffic.
  • Understand J2EE mapping limitations: J2EE servers offer features for automating the mapping between entity beans and relational data. These automated mappings save considerable development time. Sometimes, however, simplistic object-relational mappings may perform poorly and make accessing existing data difficult.
  • Use stored procedures: One way to optimize database performance for complex data is to take advantage of the database's stored procedure facilities. The data classes then map directly to a set of stored procedures for reading and writing the data.
  • Use native database APIs: JDBC (Java Database Connectivity) APIs have known inefficiencies relative to native C database libraries, such as Oracle Call Interface (OCI). In addition, native APIs expose tuning parameters and functionality that are unavailable through the generic JDBC layer offered by a J2EE server.
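
Of the approaches above, lazy loading is the easiest to illustrate in isolation. The following is a minimal sketch in plain Java rather than any particular J2EE server's mechanism; Lazy and Portfolio are hypothetical names, and the Supplier stands in for whatever query would fetch the related rows.

```java
import java.util.List;
import java.util.function.Supplier;

/** Holds a value that is fetched on first use and then reused. */
final class Lazy<T> {
    private final Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!loaded) {
            value = loader.get(); // the only place the loader runs
            loaded = true;
        }
        return value;
    }

    synchronized boolean isLoaded() {
        return loaded;
    }
}

/** Hypothetical data object: the name is loaded eagerly, the positions on demand. */
final class Portfolio {
    private final String name;
    private final Lazy<List<String>> positions;

    Portfolio(String name, Supplier<List<String>> positionLoader) {
        this.name = name;
        this.positions = new Lazy<>(positionLoader);
    }

    String getName() {
        return name;
    }

    List<String> getPositions() {
        return positions.get();
    }

    boolean positionsLoaded() {
        return positions.isLoaded();
    }
}
```

Code that only lists portfolio names never triggers the position query; the loader runs once, on the first call to getPositions().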

Best practices for transaction-intensive applications

For applications that must support many requests per second, data bottlenecks are almost guaranteed to occur. This is particularly true for applications with distributed Web-based deployment and subsecond response requirements.

The standard J2EE architecture, which produces one or more SQL queries for each data object access, quickly saturates the database server in a transaction-intensive application. These bottlenecks can be addressed by increasing database capacity through replication or by reducing database requests using caching inside the J2EE server:

  • Database replication: Replicating the database is one way to break up a flood of requests so they can be managed more efficiently. It is also an effective way to boost performance in a particular region. However, this can be an expensive approach, requiring additional servers and database licenses.
  • Object caching within a J2EE server: Caching frequently used objects in-memory within the J2EE server reduces the database's load and improves response time. Some J2EE servers include limited caching for CMP (container-managed persistence) beans; however, this may not adequately address the performance issues, as described later in this article.
  • Add-on object caching: Many stand-alone caching products are available, for example, products supporting the JCache API. However, these caches lack tight integration with the J2EE persistent object lifecycle, leaving each developer to manage the cache's data integrity manually.

Figure 3 illustrates the two common integrity errors in architectures that separate the data layer from the caching functionality. Each of these errors can cause the cache to serve inaccurate information to the application.

Figure 3. Two cache errors.

The first error is to miss local data changes—typically this occurs because the cache is not integrated with the persistent object lifecycle. For example, a transaction commit that affects the number of seats available on a flight may not cause the affected seat information to update in the local cache. The second cache error is to miss changes to data made by other servers in the cluster. For example, when an application server cancels a flight, other caches in the cluster may not update, causing cache queries for seat availability to produce inaccurate results.
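
One way to avoid both errors is to route every commit through a single path that writes the database, updates the local cache, and notifies the other caches. The following is a minimal in-process sketch of that idea, not the behavior of any specific product; SeatDatabase, SyncBus, and SeatCacheNode are hypothetical names, and the bus is a stand-in for real cluster messaging, which would also need delivery guarantees that this sketch does not provide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Stand-in for the shared database: flight number -> seats available. */
final class SeatDatabase {
    private final Map<String, Integer> rows = new ConcurrentHashMap<>();

    void write(String flight, int seats) {
        rows.put(flight, seats);
    }

    Integer read(String flight) {
        return rows.get(flight);
    }
}

/** In-process stand-in for cluster messaging between cache nodes. */
final class SyncBus {
    private final List<SeatCacheNode> nodes = new ArrayList<>();

    void join(SeatCacheNode node) {
        nodes.add(node);
    }

    void publish(SeatCacheNode sender, String flight, int seats) {
        for (SeatCacheNode node : nodes) {
            if (node != sender) {
                node.applyRemoteChange(flight, seats);
            }
        }
    }
}

/** One server's cache, tied to the commit path. */
final class SeatCacheNode {
    private final Map<String, Integer> cache = new ConcurrentHashMap<>();
    private final SeatDatabase db;
    private final SyncBus bus;

    SeatCacheNode(SeatDatabase db, SyncBus bus) {
        this.db = db;
        this.bus = bus;
        bus.join(this);
    }

    /** Read-through lookup: only a cache miss reaches the database. */
    Integer seatsAvailable(String flight) {
        return cache.computeIfAbsent(flight, db::read);
    }

    /** Commit path: database write, local update, and cluster sync together. */
    void commit(String flight, int seats) {
        db.write(flight, seats);
        cache.put(flight, seats);         // avoids error 1: missed local change
        bus.publish(this, flight, seats); // avoids error 2: missed change on other servers
    }

    /** Applies changed state received from another node. */
    void applyRemoteChange(String flight, int seats) {
        cache.put(flight, seats);
    }
}
```

Because the sync message carries the changed state rather than only an invalidation, the receiving node can answer the next seat availability query from memory without going back to the database.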

Best practices for data-intensive applications

The toughest data bottlenecks occur in applications that have both complex object models and high transaction loads. The 50/50 rule described above can help architects and developers predict data bottlenecks.

For data-intensive applications, neither efficient mapping nor object caching alone can provide acceptable results. These applications require a more integrated data services layer solution that combines an efficient object-relational mapping layer with an intelligent object cache that is linked to the J2EE persistent object lifecycle.

Market-leading J2EE servers, such as WebLogic and WebSphere, offer some caching features, but these capabilities are limited in terms of the kinds of objects that can be cached and the level of integrity provided across a cache cluster. The following table describes some limitations of the caching provided within typical J2EE servers.

Cache function | J2EE caching | Intelligent caching | Benefit
Object access | Objects accessible only within transactions | Objects accessible across transactions | Easier to access cached objects
Object query | By reference and primary key only | By reference, primary key, attribute, relationship, and indexed queries | Cache queries speed performance
Object model | Only caches simple objects, no relationships | Caches complete object model, including relationships | No "impedance mismatch" with cache
Clustering | Only sends invalidation sync messages | Sends changed state in sync messages | Clustered caches have complete state
Completeness | No sync messages sent for inserts, relationship changes | All cache changes are replicated across caches | Clustered caches are consistent with database
Coherency | No guaranteed messaging, can lose sync messages | Guaranteed sync messaging | Clustered caches have guaranteed integrity
Portability | Proprietary-caching capabilities not portable across J2EE servers | Supports WebLogic, WebSphere, Java, C++, .Net | Caches can sync across multiple platforms
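
The "Object query" row is worth a concrete illustration. The following is a minimal sketch of a cache that answers both primary-key lookups and attribute queries from memory by maintaining a secondary index; it is an assumption about how such a feature could be built, not a description of any vendor's implementation, and Position and IndexedPositionCache are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical cached data object. */
final class Position {
    final long id;
    final String portfolio;
    final String symbol;

    Position(long id, String portfolio, String symbol) {
        this.id = id;
        this.portfolio = portfolio;
        this.symbol = symbol;
    }
}

/** Cache that serves primary-key lookups and attribute queries from memory. */
final class IndexedPositionCache {
    private final Map<Long, Position> byId = new HashMap<>();
    private final Map<String, List<Position>> byPortfolio = new HashMap<>();

    /** Inserts or replaces a position and keeps the attribute index consistent. */
    void put(Position p) {
        Position old = byId.put(p.id, p);
        if (old != null) {
            byPortfolio.get(old.portfolio).remove(old); // drop the stale index entry
        }
        byPortfolio.computeIfAbsent(p.portfolio, k -> new ArrayList<>()).add(p);
    }

    /** Primary-key lookup. */
    Position findById(long id) {
        return byId.get(id);
    }

    /** Attribute query answered from the index; returns a defensive copy. */
    List<Position> findByPortfolio(String portfolio) {
        List<Position> hits =
                byPortfolio.getOrDefault(portfolio, Collections.<Position>emptyList());
        return new ArrayList<>(hits);
    }
}
```

A cache limited to primary-key lookups would have to send a query such as "all positions in this portfolio" to the database; with the index, the same query is served locally.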

For data-intensive applications, the data access layer must be developed to support the performance, scalability, and availability requirements of the business processes, or a bottleneck results. Standard J2EE servers fail to meet these requirements out of the box, so data-intensive applications require either significant hand-coding to get around the J2EE framework's limitations or the purchase of a third-party data services product that integrates mapping and caching transparently within the J2EE framework.

Data services layers for data-intensive J2EE applications

Over the last two years, several software companies have introduced data services layers that incorporate object-relational mapping with intelligent caching capabilities and transparently integrate with J2EE servers. These products can address both model-intensive and transaction-intensive applications, but are particularly suited to solve the most serious data bottlenecks created by data-intensive applications. Examples include EdgeXtend from Persistence Software and TopLink from Oracle.

Data services products offer a "best-of-breed" antidote to the bottlenecks inherent in the J2EE architecture. A data services layer's capabilities include:

  • Model-driven development: A data services layer should use a model-driven approach to generate data objects that map to relational data. The data services layer tools should integrate with data dictionaries and with UML (Unified Modeling Language) tools.
  • Seamless J2EE integration: Generated entity EJBs (Enterprise JavaBeans) should integrate seamlessly within the J2EE server, for example, by using the JCA (Java Connector Architecture) transaction interface to integrate with the persistent object lifecycle.
  • Transparent caching: All data objects implement transparent caching, with caching rules specified through a configuration file.
  • Guaranteed synchronization: Cache data changes propagate reliably to other caches, enabling scalability, failover, and high availability using JMS (Java Message Service), Tibco, or MQ.
  • Rule-based alert propagation: Cache data changes may propagate to other servers, devices, or users based on propagation rules.
  • Cross-platform: A data services layer should support multiple J2EE servers and be capable of running in a pure Java environment. Ideally, the data services layer should support multiple application platforms and object types, including C++, plain old Java objects (POJOs), J2EE objects (entity beans), and C# assemblies.

Figure 4 shows how a data services layer fits into a standard J2EE architecture. Note that the API for the data services layer can be either entity beans or plain old Java objects, so that presence of the data services layer is transparent to the rest of the application.
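
The transparency described here can be sketched with plain old Java objects: the application codes against an interface, and caching is added behind it. This is a minimal illustration under that assumption, not the API of any particular data services product; SecurityRepository, DatabaseSecurityRepository, and CachingSecurityRepository are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical data access API seen by the rest of the application. */
interface SecurityRepository {
    String findName(String ticker);
}

/** Stand-in for the mapping layer that would issue SQL against the database. */
final class DatabaseSecurityRepository implements SecurityRepository {
    private final Map<String, String> table;
    private int queryCount;

    DatabaseSecurityRepository(Map<String, String> table) {
        this.table = table;
    }

    @Override
    public String findName(String ticker) {
        queryCount++; // each call represents one database round trip
        return table.get(ticker);
    }

    int queryCount() {
        return queryCount;
    }
}

/** Adds caching behind the same interface, so callers do not change. */
final class CachingSecurityRepository implements SecurityRepository {
    private final SecurityRepository delegate;
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    CachingSecurityRepository(SecurityRepository delegate) {
        this.delegate = delegate;
    }

    @Override
    public String findName(String ticker) {
        // A miss falls through to the delegate; a null result is not cached.
        return cache.computeIfAbsent(ticker, delegate::findName);
    }
}
```

Application code holds only a SecurityRepository reference, so swapping the plain implementation for the caching one requires no change to the callers.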

Figure 4. J2EE data services layer.

Data services layer application example

To illustrate the benefits offered by a data services layer, consider a financial portfolio management application with the following requirements:

  • Complex business entities: Portfolios, positions, and security data, whose cardinality is in hundreds, thousands, and tens of thousands, respectively
  • Real-time position updates: Any change to a model portfolio causes automatic changes to its child portfolios, each of which is subject to a set of business rules
  • Large numbers of users: Hundreds of portfolio managers use the application to actively manage the portfolios and/or to model "what-if" scenarios

To meet these business requirements, any Java server-based solution must provide:

  • Near real-time performance guarantees
  • Guaranteed high levels of business data integrity and 24-7 operation
  • Ability to scale along with increased business loads

In building the portfolio management application, a best practices data services layer first provides a model-driven approach for defining the object model and specifying how it should map to the database for maximum efficiency. Using native database APIs and database-specific performance features will help optimize the mapping performance.

For example, data services layer tools can accept object-modeling input from a data dictionary, Rational Rose or Eclipse, and then use that information to generate standard entity beans that run transparently inside the J2EE server.

On deployment of the portfolio management application, a best practices data services layer caches local copies of frequently accessed business data, assuring maximum throughput rates, and cooperates in synchronizing data changes with other caches, ensuring that the data's local copy is synchronized across all server instances. The data services layer effectively provides a data virtualization model for the J2EE architectures, greatly reducing the complexity of scaling a J2EE application, while simultaneously enhancing its robustness and integrity.

For distributed applications, a data services layer can provide even greater value. Each geographically distributed, independent server instance uses a local cache as its data layer. The local cache handles read access in these remote sites, greatly accelerating performance, while writes still write back to the primary database, providing centralized data management.

Summary and conclusions

Under the J2EE specification, the container manages the mapping between Java components and the underlying database schema. This approach provides a clean, managed component architecture, but has the inherent limitation that each data access operation results in one or more SQL requests to the database, and often in physical disk I/O. The 50/50 rule of thumb gives architects and developers an easy way to assess the potential for data bottlenecks.

Even for market-leading application servers such as WebLogic and WebSphere, the standard J2EE architecture leads to data bottlenecks which can only be addressed with significant hand-coding or the adoption of a third-party data services layer. A best practices approach for a data services layer must fit transparently within J2EE and integrate mapping and caching with J2EE transactions.

Christopher Keene is CEO of Persistence Software. He has appeared as a featured expert on real-time data convergence in numerous publications and conferences. Keene is a frequent presenter on such topics as the future of software development, real-time information management, and data convergence through the virtual data layer. Keene earned an MBA from the Wharton School and holds a bachelor's degree with honors in mathematics from Stanford University.
