Should you go with JMS?

Why JMS isn't always the best solution for distributed system development

Distributed system development is growing rapidly as software developers build systems that must keep up with the ever-increasing requirements imposed by e-business. But never before have the design and implementation of a message-processing layer within a distributed system been as complex as they are today. This is mostly due to the dramatic increase in potential functionality enabled by standards like Java Message Service (JMS) that connect many vendors' technologies in a single system. In addition, the proliferation of the Internet has given rise to new, expansive user bases and has made available several protocols for communication within a distributed system. Such protocols include CORBA IIOP (Internet Inter-ORB Protocol), Microsoft DCOM (Distributed Component Object Model), and Java RMI (Remote Method Invocation).

The natural evolution of these protocols has led to the introduction of message-oriented middleware (MOM), which allows for looser coupling within distributed systems by abstracting translation, security, and the underlying communications protocols from clients and servers. Middleware solutions include SOAP (Simple Object Access Protocol) and JMS. Proprietary, middle-layer transaction processing has existed since the early days of COBOL (Common Business Oriented Language), but it wasn't very complex because of early messaging technologies' limitations.

With the advent of standards like JMS, developers can now connect numerous technologies. Distributed-system design decisions are more difficult, and their implications on data integrity and distribution are critical to system success or failure.

A pervasive and tacit assumption is that introducing technology is an asset, while its liabilities are often ignored. Failing to account for those liabilities often results in a system that is unnecessarily complicated, over-engineered, or both. A basic understanding of JMS and its inherent (system-independent) qualities, followed by a careful analysis in relation to specific distributed-system scenarios, can indicate how well JMS might satisfy system requirements versus merely altering existing problems or even introducing new ones.

JMS overview

JMS, introduced by Sun Microsystems in 1999 as part of the Java 2 Platform, Enterprise Edition (J2EE) specification, is a set of standards that describe the foundations for a message-processing middleware layer. JMS allows systems to communicate synchronously or asynchronously via both point-to-point and publish-subscribe models. Today, several vendors, such as BEA Systems, Hewlett-Packard, IBM, Macromedia, and Oracle, provide JMS implementations, thereby allowing JMS to interact with multiple vendor technologies.
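The difference between the two delivery models can be sketched without a JMS provider. The class below is a plain in-memory illustration, not the JMS API: the class and method names are hypothetical, and the round-robin choice of receiver is an assumption made for the example (how a provider distributes messages among competing receivers is up to the implementation).

```java
import java.util.ArrayList;
import java.util.List;

// In-memory illustration only; none of these names come from the JMS API.
final class MessagingModels {
    /** Point-to-point: each message is consumed by exactly one receiver (round-robin assumed). */
    static List<List<String>> pointToPoint(List<String> messages, int receivers) {
        List<List<String>> inboxes = emptyInboxes(receivers);
        for (int i = 0; i < messages.size(); i++) {
            inboxes.get(i % receivers).add(messages.get(i));
        }
        return inboxes;
    }

    /** Publish-subscribe: every subscriber receives its own copy of every message. */
    static List<List<String>> publishSubscribe(List<String> messages, int subscribers) {
        List<List<String>> inboxes = emptyInboxes(subscribers);
        for (List<String> inbox : inboxes) {
            inbox.addAll(messages);
        }
        return inboxes;
    }

    private static List<List<String>> emptyInboxes(int count) {
        List<List<String>> inboxes = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            inboxes.add(new ArrayList<>());
        }
        return inboxes;
    }
}
```

With three messages and two consumers, the point-to-point version splits the messages between the consumers, while the publish-subscribe version gives each consumer all three.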

Figure 1 shows a simple JMS-based system with an outgoing queue populated with messages for clients to process, and an incoming queue, which collects the client processing results for insertion into a database.

Figure 1. JMS translates database rows into messages for distribution
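The data flow in Figure 1 can be approximated in a few lines. This is a minimal sketch that uses standard java.util.concurrent queues as stand-ins for the JMS queues; the class name, the string-based message format, and the rule that an empty payload counts as a failure are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Stand-in for the Figure 1 flow; these are in-memory queues, not JMS destinations.
final class TwoQueueFlow {
    /** Pushes rows through outgoing -> client -> incoming and returns what the database would store. */
    static List<String> run(List<String> pendingRows) {
        // Server side: database rows become messages on the outgoing queue.
        BlockingQueue<String> outgoing = new LinkedBlockingQueue<>(pendingRows);
        BlockingQueue<String> incoming = new LinkedBlockingQueue<>();

        // Client side: take each message, process it, and report a status.
        String job;
        while ((job = outgoing.poll()) != null) {
            String status = job.isEmpty() ? "failed" : "success"; // hypothetical processing rule
            incoming.add(job + "=" + status);
        }

        // Server side: collect the results for insertion into the database.
        List<String> stored = new ArrayList<>();
        incoming.drainTo(stored);
        return stored;
    }
}
```

In a real deployment the client loop would run on separate machines and the queues would live on a JMS server; the sketch only shows the direction in which data moves.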

As mentioned above, MOM (like JMS) allows looser coupling within distributed systems by abstracting translation, security, and the underlying communications protocols from the clients and servers. One of the message-processing layer's main assets is that, because it introduces this abstraction layer, the implementation of either the client or server can change, sometimes radically, without affecting other system components.

Two specific scenarios

In this section, I present two distributed systems that are potential candidates for JMS and explain each system's goals and why the systems are JMS candidates.

Scenario 1

The first candidate is a distributed encoding system (shown in Figure 2). This system has a set of N clients that retrieve encoding jobs from a central database server. The clients then execute the actual transformation (encoding) from digital master to encoded files, and finish by reporting their post-processing status (e.g., success/failed) back to the central database server.

Figure 2. Scenario 1

The types of encoding (e.g., text, audio, or video) or transformations (e.g., .pdf to .xml, .wav to .mp3, .avi to .qt) do not matter. It is important to understand that encoding is CPU-intensive and requires distributed processing across multiple clients to scale.

At a glance, this system is a potential JMS candidate because:

  • Processing must be distributed as it is extremely processor (CPU) intensive
  • It may be problematic, from a system performance standpoint, to connect multiple clients directly to a single database server

Scenario 2

The second JMS candidate system is a global registration system for an Internet portal. Global registration handles requests for new user creation (registration), login, and authentication.

Figure 3. Scenario 2

Specific registration information (e.g., name, address, favorite color) and user-authentication methods (e.g., server-side user objects, HTTP cookies) are unimportant. However, it is important that this system scale to handle millions of users, and usage patterns are difficult, if not impossible, to predict. (During a televised ESPN World Cup game the announcer says, "Log in and vote in our online poll. We'll present the results at the end of the show." All of a sudden, 500,000 users log in within a three-minute interval. 3 minutes = 180 seconds; 500,000 user logins/180 seconds = 2,778 user logins/sec.)
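The same back-of-the-envelope arithmetic can be kept in code for capacity planning. This is a small sketch with hypothetical names; the per-server capacity passed to serversNeeded is an assumed figure, not a measurement.

```java
// Capacity-planning arithmetic; class and method names are hypothetical.
final class LoginRate {
    /** Average arrivals per second over the window, rounded to the nearest whole login. */
    static long perSecond(long logins, long windowSeconds) {
        return Math.round((double) logins / windowSeconds);
    }

    /** Servers required to keep up with the rate, given an assumed per-server capacity. */
    static long serversNeeded(long ratePerSecond, long capacityPerServer) {
        return (ratePerSecond + capacityPerServer - 1) / capacityPerServer; // ceiling division
    }
}
```

LoginRate.perSecond(500_000, 180) returns 2778. If each server were assumed to handle 250 logins per second, LoginRate.serversNeeded(2778, 250) would return 12.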

This system is a potential JMS candidate for the following reasons:

  • The system must be distributed to scale the transaction volume
  • Transactions are atomic (e.g., login), so they are stateless and therefore candidates for distribution

The two systems are architecturally alike. Several client machines extract data from a central database server (possibly replicated out to M read-only database servers), execute some logic on the client, and then report the status back to the central database server. One system delivers encoded files to a filesystem over UNC/FTP; the other delivers HTML content to Web browsers over HTTP. Both systems are distributed.

This is as far as many engineers go with their analyses before applying JMS. In the rest of this article, I explain that, although these systems share many characteristics, the appropriateness of JMS becomes clearer and more divergent as we examine each system's requirements, including system performance, data distribution, and scalability.

System analysis: To integrate or not to integrate

JMS has intrinsic, system-independent qualities. Some of these qualities (pros denoted by +, cons denoted by -) that apply to both systems include:

  • (+) JMS is a set of standards implemented by multiple vendors; therefore, you avoid the dreaded vendor lock-in problem.
  • (+) JMS allows for abstraction (via a generic API) between client and server; you can change a database schema or platform without changing the application layer (implicit here are other potential system changes, isolated from one another by the messaging layer).
  • (+)/(-) JMS can help a system scale (a pro). The con is that any system that scales with JMS can scale without it.
  • (-) JMS is complicated. It's an entirely new layer with a new set of servers. Software rollout management, server monitoring, and security are just a few of the nontrivial problems associated with a JMS rollout. Costs should also be considered.
  • (-) Vendors do not always interpret and therefore implement standards exactly the same way, so differences exist between various implementations.
  • (-) With JMS, you need more system checks and balances. You not only introduce a new layer, you also introduce asynchronous data distribution and acknowledgement, which has the added complexity of asynchronous notification.
  • (-) JMS offers no way to report on, update, or monitor queued messages without custom software.

JMS also has system-dependent qualities. The appropriateness of JMS depends on how well these qualities map to the problem set you're trying to solve. Some of these qualities and how they relate to the two systems of interest follow:

Caching

Caching is a primary consideration for capacity planning within any distributed system. JMS has many features that allow its use as a caching technology (mainly that it's distributed, that it works synchronously or asynchronously, and that it exchanges data as objects in messages). Therefore, an existing JMS installation can be leveraged as a caching infrastructure if required.

When considering the encoding system, caching is generally not useful to increase overall system performance, as most file transformations execute once and move to a hosting facility or SAN (storage area network), and there is little content overlap between customers. Global registration is a prime candidate for a user-information cache, as users usually log in, browse, and then log out. Login creates a user's cache entry, and this object provides subsequent user authentication while the user is on the site.
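The login, browse, and logout life cycle described above maps onto a very small cache interface. The sketch below uses a ConcurrentHashMap on a single machine purely to show that life cycle; the class name, method names, and string-valued profile are assumptions, and a distributed cache (JMS-backed or otherwise) would need replication and expiry on top of this.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Single-node illustration of the user-information cache; all names are hypothetical.
final class UserCache {
    private final Map<String, String> entries = new ConcurrentHashMap<>(); // userId -> cached profile

    /** Login creates the user's cache entry. */
    void login(String userId, String profile) {
        entries.put(userId, profile);
    }

    /** Subsequent authentication checks hit the cache instead of the database. */
    boolean isAuthenticated(String userId) {
        return entries.containsKey(userId);
    }

    /** Logout removes the entry. */
    void logout(String userId) {
        entries.remove(userId);
    }
}
```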

Processing order

Within the global registration system, there is no scheduling and/or order for transaction processing. Pseudo-random users enter the system at pseudo-random intervals upon login, browse content (and are therefore authenticated when they access restricted content and/or applications), and then log out.

Within the encoding system, processing is ordered. Content batches into groups for delivery depending on the availability of removable storage (e.g., DLT Solutions or Network Appliance storage). Content is not delivered until the batch is complete, so batches must execute in order (although transforms within a batch can potentially be unordered). Implementation of priority queues within JMS to preserve processing order is possible, but maintaining this order of message batches between multiple JMS servers and multiple queues becomes quite complicated. A relational database server with support for transactions is a more suitable technology for managing this workflow.
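The ordering rule itself (batches delivered strictly in sequence, transforms within a batch finishing in any order) is easy to state in code, which helps show what any implementation has to track. This is a single-process sketch with hypothetical names; it assumes batches are numbered consecutively from 1 and that every batch is registered before its transforms complete. It is not a JMS priority queue and does not address coordination across multiple servers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Tracks which batches may be delivered; names and numbering scheme are assumptions.
final class BatchSequencer {
    private final Map<Integer, Integer> remaining = new HashMap<>(); // batch -> transforms still pending
    private int nextToDeliver = 1;

    void register(int batch, int transformCount) {
        remaining.put(batch, transformCount);
    }

    /** Records one finished transform and returns the batches that became deliverable, in order. */
    List<Integer> complete(int batch) {
        remaining.merge(batch, -1, Integer::sum);
        List<Integer> deliverable = new ArrayList<>();
        // A batch is released only when it is finished AND every earlier batch has been released.
        while (Integer.valueOf(0).equals(remaining.get(nextToDeliver))) {
            remaining.remove(nextToDeliver);
            deliverable.add(nextToDeliver++);
        }
        return deliverable;
    }
}
```

If batch 2 finishes before batch 1, complete returns an empty list until the last transform of batch 1 is recorded, at which point it returns both batches in order. Keeping this state consistent across several servers is the part that a transactional database handles well.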

Security

Security is not part of the JMS specification. The security problem is not necessarily changed with a JMS-based implementation (if you have a security requirement pre-JMS, you will have a similar security requirement post-JMS). Knowing this, it's important to understand how JMS might relate to existing infrastructure security.

In general, the more technology you use, the more vulnerable your system becomes to hackers and security violations. Because the global registration application server is Web-facing, security flaws discovered in your vendors' JMS implementation and published in Internet news groups quickly become security liabilities for your site. Also, because JMS is a generic API, it's more prone to security breaches than a proprietary system that uses an unpublished API.

While you can leverage your existing firewall and IP-based network security to protect your back-end (read: not Web-facing—pun intended) application and database servers from security violations, there is a significant security risk created by exposing JMS application servers directly to the Internet.

The encoding system generally exists on a single network that is isolated from the Internet. So, there's nothing inherent in this system's network topology that bears on JMS security, and the encoding system has far fewer security requirements because it is not Web-facing.

Scalability

Because the global registration system is subject to the whims of a large and capriciously clicking user base, the system's scalability requirements warrant JMS. JMS will not only help scale the system, it will also queue transactions, which is a big help when user requests flood the system.
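To get a feel for what queuing transactions means during a flood, the following sketch estimates how deep a queue grows when arrivals exceed processing capacity. The class name and the capacity figures are assumptions for illustration; real arrival patterns are far less regular.

```java
// Deterministic backlog estimate; names and capacity figures are hypothetical.
final class BurstBacklog {
    /** Returns the peak queue depth for per-second arrivals and a fixed per-second service capacity. */
    static long peakDepth(long[] arrivalsPerSecond, long capacityPerSecond) {
        long depth = 0;
        long peak = 0;
        for (long arrivals : arrivalsPerSecond) {
            // Whatever cannot be processed this second stays queued for the next one.
            depth = Math.max(0, depth + arrivals - capacityPerSecond);
            peak = Math.max(peak, depth);
        }
        return peak;
    }
}
```

With 2,778 logins arriving every second for 180 seconds and an assumed capacity of 2,000 logins per second, the backlog grows by 778 per second and peaks at 140,040 queued requests. With an assumed capacity of 3,000 per second, the queue never builds up.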

Because the distributed encoding system has carefully regulated data traffic (as it's presumably a self-contained system), the system's scalability requirements are not as formidable. For distributed encoding, you can connect your clients (on the order of 100) directly to your database and throttle their traffic to balance encoding throughput with database server performance.
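One simple way to throttle direct database traffic is to cap the number of queries in flight. The sketch below does this with a standard java.util.concurrent.Semaphore; the class name, the Supplier-based query abstraction, and the choice of limit are assumptions, and the same effect could be achieved with a bounded connection pool.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Caps concurrent database work; names are hypothetical.
final class DbThrottle {
    private final Semaphore permits;

    DbThrottle(int maxConcurrentQueries) {
        this.permits = new Semaphore(maxConcurrentQueries);
    }

    /** Runs the query while holding a permit, so at most maxConcurrentQueries run at once. */
    <T> T run(Supplier<T> query) {
        permits.acquireUninterruptibly(); // blocks while the limit is reached
        try {
            return query.get();
        } finally {
            permits.release();
        }
    }

    int available() {
        return permits.availablePermits();
    }
}
```

Tuning the limit is how you would balance encoding throughput against database server load.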

Performance

The introduction of a single JMS server can change performance issues rather than solve them. For this reason, a JMS system should be designed with multiple JMS servers (and therefore multiple queues). Figure 4 shows why performance problems are altered instead of solved. It illustrates the processing layers required for a generic data server to respond to client-connection requests:

Figure 4. Data access on a server

Data exchange between client and server is a two-part process, whether the server is a database server or a JMS server:

  1. Data access
  2. Thread and socket management, pooling, and caching

A JMS server and a database server look exactly the same at this level (Figure 4). Both handle socket connections, thread management, and access to the server's data.

With only one JMS server, potential performance problems simply move from the database server to the JMS server. In addition to possible performance degradation associated with context switching within your database server, performance problems are now potentially greater due to JVM performance issues within your JMS server.

A single JMS server adds significant complexity to your system and might also introduce performance problems related to the connection of multiple clients to a single server. The impact of multiple JMS servers on your system design and data flow can mean the difference between a successful and a failed system rollout.
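One consequence of running several JMS servers is that every producer and consumer must agree on which server holds which messages. A minimal sketch of one such rule, hash-based routing on a message key, follows; the class name and the routing scheme are assumptions, and real deployments often rely on vendor-specific clustering instead.

```java
// Hash-based routing across several servers; names and scheme are hypothetical.
final class QueueRouter {
    /** Maps a message key to a server index in the range [0, serverCount). */
    static int serverFor(String messageKey, int serverCount) {
        // floorMod keeps the index non-negative even when hashCode() is negative.
        return Math.floorMod(messageKey.hashCode(), serverCount);
    }
}
```

The same key always maps to the same server, which keeps related messages together, but changing the number of servers remaps most keys. That is one example of how the server count feeds back into system design and data flow.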
