Use a distributed cache to cluster your Spring remoting services

Add automatic discovery and clustering to Spring remoting

As enterprise computing enters the brave new world of service-oriented architecture (SOA), it becomes more and more important to seek new ways to describe, publish, and discover your services. The Web services-based approach does not offer automatic service discovery and often is too complex. New, lightweight development frameworks warrant new, lightweight approaches to service publishing.

Over the past several years, the Spring Framework has emerged as the de facto standard for developing simple, flexible, easy-to-configure J2EE applications. At the heart of Spring lies the Inversion of Control (IoC) principle. According to IoC, an application must be developed as a set of simple JavaBeans (or plain old Java objects, POJOs), with a lightweight IoC container wiring them together and setting the dependencies.

In Spring's case, the container is configured via a set of bean definitions, typically captured in XML context files:

 <bean id="MyServiceBean" class="mypackage.MyServiceImpl">
   <property name="otherService" ref="OtherServiceBean"/>
</bean> 

Then, when the client code needs to use MyService, as defined in this Spring context, you do something like:

 MyServiceInterface service = (MyServiceInterface)context.getBean("MyServiceBean");
service.doSomething(); 

In addition to IoC, Spring provides literally hundreds of other services, coding conveniences, and "hooks" into standard APIs that ease the development of a modern Java server-side application. Whether your application uses heavy-lifting J2EE APIs such as Enterprise JavaBeans (EJB), Java Message Service (JMS), or Java Management Extensions (JMX), or utilizes one of the popular Model-View-Controller frameworks for building a Web interface, Spring offers you something to simplify your development efforts.

As the Spring Framework matures, more and more people are using it as a foundation for their large-scale enterprise projects. Spring has passed the test of development scalability and can be used as a sort of "component glue" to put together complex distributed systems.

Any nontrivial enterprise application combines many diverse components: gateways to legacy and enterprise resource planning systems, third-party systems, Web/presentation/persistence tiers, etc. It is not unusual for an e-commerce site that began as a simple Web application to eventually grow to contain hundreds of subapplications and subsystems, and face a situation where the complexity starts inhibiting further growth. Often the solution is to break the monolithic application into a few coarsely-grained services and release them on the network.

Whether your application was designed as an integration point for dispersed services or has morphed into one, the task of managing all distributed components and their configuration quickly becomes a time-consuming and expensive one. If your application components are developed using Spring, you can use Spring remoting to expose your Spring-managed beans to remote clients via a multitude of protocols. Using Spring, making your application distributed is as simple as making a few changes in your Spring context files.

The simplest (and most recommended) approach to Java-to-Java remoting in Spring is through HTTP remoting. For example, after registering your Spring dispatcher servlet in web.xml, the following context piece exposes MyService for public consumption:

 <bean name="/MyRemoteService" class="org.springframework.remoting.httpinvoker.HttpInvokerServiceExporter">
  <property name="service" ref="MyServiceBean"/>
  <property name="serviceInterface" value="mypackage.MyServiceInterface"/>
</bean> 

As you can see, the actual service is injected into this bean definition and thus made available for the remote calls.

On the client, the context definition reads:

 <bean id="MyServiceBean"
      class="org.springframework.remoting.httpinvoker.HttpInvokerProxyFactoryBean">
      <property name="serviceUrl"
         value="http://somehost:8080/webapp-context/some-mapping-for-spring-servlet/MyRemoteService" />
      <property name="serviceInterface"
         value="mypackage.MyServiceInterface" />
   </bean> 

By the magic of Spring, the client-side code (obtain the service from the context and invoke its methods) doesn't change, and the remote method invocation occurs just as the local one did before.

In addition to HTTP remoting, Spring supports several other remoting protocols out of the box, including other HTTP-based solutions (Web services, Hessian, and Burlap) and heavier ones like Remote Method Invocation (RMI).

Configure and deploy URL-based remoting services

Deploying your services via HTTP-based remoting has several distinct advantages, one of which is that, compared with straight RMI or EJB-based solutions, you have far fewer configuration issues to worry about. Anyone who has tried to work through a nontrivial JNDI (Java Naming and Directory Interface) configuration (several load-balanced or clustered J2EE containers from different vendors or even different versions of the same container) can attest to that.

If you base your distributed components on Spring remoting, defining a service on your network is simple. All you need to know is the service URL pointing to the server, port, Web application, context path, and name of the Spring bean implementing this service.

URLs are plain-text strings, and plain text is your friend. At the same time, defining a service via a URL makes the definition somewhat brittle. All the individual portions of the URL listed in the previous paragraph are subject to change, and change they will. Network topology (and network administrators) change, load-balanced server farms replace single servers, Web applications deploy onto different containers under different names, holes are punched and closed in inter-network firewalls, and so on.

In addition, those brittle URLs must be stored in Spring context files on every client that could possibly access the service. When they change, all the clients must be updated. And one more thing: As your newly-forged service progresses from development to staging to production, the URL pointing to the service must change to reflect the environment the service is in.

Finally we arrive at the problem definition: Spring's ability to easily expose individual Spring-managed beans as remotely-accessible services is great. It would be even better if all we needed to define (or access) a service was the service name, with all the details about service location hidden from the clients.

Cache service descriptions for auto-discovery and failover

The obvious solution to this problem would employ some kind of naming service to provide dynamic, real-time (or almost real-time) resolution of service name to service location(s). Indeed, I once built such a system using the JmDNS library to register Spring remoting services in the Zeroconf namespace (Zero Configuration Networking, a technology also known as Apple Rendezvous).

The problem with the DNS-based approach is that updates to the service definitions are never real-time or transactional. A failed server still appears in the service list until all kinds of timeouts and "keep-alive" games are played. What we need is the ability to quickly publish and alter the lists of URLs implementing our services and make those changes happen simultaneously (read: transactionally) across our entire network.

The systems that satisfy these requirements are available. They are various implementations of a distributed cache. The easiest way for a Java programmer to visualize a cache is to think of it as an implementation of the java.util.Map interface. You can put something in there using a key, and then you can get it out later using the same key. A distributed cache ensures that the same key-value mapping exists in all the copies of the same Map on every server participating in the cache and updates the caches everywhere in lockstep.

A good implementation of a distributed cache solves our problem. We associate a service name with one or more URLs pointing to the place(s) on the network where this service is implemented. Then, we store the name=(list of URLs) associations in a distributed cache and update them accordingly as the network situation changes (servers come online and are removed, servers crash, etc.). The clients to our services participate in the distributed cache as well and, as such, always have access to the current information about the individual service implementations' locations.
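
To make the idea concrete, here is a purely illustrative sketch (the class name and URLs are made up) that models the name=(list of URLs) association with a plain java.util.Map. In the real setup, the distributed cache plays the role of this Map and keeps its contents identical on every participating node:

 import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ServiceRegistrySketch {
   public static void main(String[] args) {
      // In the real system, this Map is the distributed cache shared by all nodes
      Map registry = new HashMap();

      // Associate a service name with the URLs currently implementing it
      Set urls = new HashSet();
      urls.add("http://server-a:8080/webapp/remoting/MyRemoteService");
      urls.add("http://server-b:8080/webapp/remoting/MyRemoteService");
      registry.put("MyRemoteService", urls);

      // A client needs only the service name to find the current locations
      Set locations = (Set) registry.get("MyRemoteService");
      System.out.println("MyRemoteService is available at: " + locations);
   }
} 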

As an added bonus, we can introduce a simple load-balancing/failover solution in this scenario. If a client knows that a certain service is associated with several service URLs, it can pick one of them at random and provide crude but effective load-balancing across the several servers serving those URLs. And, if a remote call fails, a client can simply mark that URL as "bad" and pick the next one, thus providing failover as well. Because the list of service URLs is stored in the distributed cache, the fact that Server A went bad is communicated to the other clients as well.

Distributed caches are already used in conventional J2EE applications, where they provide the backbone for server clustering. For example, if you have a distributed, clustered Web application, a distributed cache provides session replication among your cluster's members. Though highly reliable, this kind of J2EE clustering is a serious bottleneck: session data changes quickly, and the overhead of updating all the cluster members and failing over in case of a failure is great. Clustered Web applications with session replication are typically several times less scalable than shared-nothing, load-balancer-based solutions.

Distributed caching works for our scenario due to the small amount of data being cached. Instead of thousands of session objects typical for distributed session replication, we have only a small list of services and the URLs implementing them. In addition, updates to our list happen infrequently. A distributed cache with such a small list may scale well to numerous member servers and clients.

For the rest of this article, let's look at a real-life implementation of our "service description caching" algorithm.

Use Spring and JBoss Cache for service description caching

JBoss Application Server is probably the most successful (and the most controversial) open source J2EE project today. Love it or hate it, JBoss Application Server occupies a well-deserved spot on the list of top deployed servers, and its modular nature makes it very developer-friendly.

The JBoss distribution packs many ready-to-go services. One of interest to us is JBoss Cache. This cache implementation provides high-performance caching of arbitrary Java objects both locally and across the network. JBoss Cache has many configuration options and features, and I encourage you to learn more about it to see how it may fit into your next project.

The features that make JBoss Cache attractive for our project are:

  • It provides high-quality, transactional replication of Java objects
  • It can run as part of the JBoss server or standalone
  • It is already available "inside" JBoss as an MBean (managed bean)
  • It can use either UDP multicast or "normal" TCP connections

The network foundation for JBoss Cache is the JGroups library. JGroups provides network communication between cluster members and can work over either UDP multicast (for dynamic auto-discovery of cache members) or TCP/IP (for working off a fixed list of server names/addresses).
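
You normally never touch JGroups directly, because JBoss Cache drives it from its configuration file, but the following rough sketch illustrates the underlying mechanism: members join a named group, and JGroups (over UDP multicast by default) handles discovery and messaging. The cluster name here is made up, and the API details may vary between JGroups versions:

 import org.jgroups.JChannel;

public class JGroupsSketch {
   public static void main(String[] args) throws Exception {
      // The default constructor uses the default protocol stack (UDP multicast),
      // so members on the same subnet can discover each other automatically
      JChannel channel = new JChannel();
      channel.connect("service-registry-cluster"); // hypothetical group name
      System.out.println("joined group, current view: " + channel.getView());
      channel.close();
   }
} 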

For this article, I show how to use JBoss Cache to store the definitions of our services and provide dynamic, automatic service discovery.

Note: See Resources to download a zipped file containing an Eclipse project for a Web application that exposes a service via Spring remoting and uses JBoss Cache to share the service descriptions with a client application (set of JUnit tests). All of the code discussed below can be found there.

To begin, we introduce a custom class, AutoDiscoveredServiceExporter, which extends the standard Spring HttpInvokerServiceExporter, to expose our TestService for remoting:

 <bean name="/TestService" class="app.service.AutoDiscoveredServiceExporter">
  <property name="service" ref="TestService"/>
  <property name="serviceInterface" value="app.service.TestServiceInterface"/>
</bean> 

There is really nothing worth mentioning in this class. We basically use it to mark the Spring remoting services as exposed in our special way.
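
For reference, a marker subclass of this kind can be as small as the following sketch (the actual class in the download may differ slightly):

 package app.service;

import org.springframework.remoting.httpinvoker.HttpInvokerServiceExporter;

// The class adds no behavior of its own; it exists only as a marker type so the
// publisher can locate all such exporters in the Spring context by type.
public class AutoDiscoveredServiceExporter extends HttpInvokerServiceExporter {
} 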

Next, the server-side cache configuration. JBoss already comes with a cache implementation, and we can use the Spring built-in JMX proxy to bring the cache into the Spring context:

 <bean id="CustomTreeCacheMBean" class="org.springframework.jmx.access.MBeanProxyFactoryBean">
    <property name="objectName">
      <value>jboss.cache:service=CustomTreeCache</value>
    </property>
    <property name="proxyInterface">
      <value>org.jboss.cache.TreeCacheMBean</value>
    </property>
</bean> 

This creates a CustomTreeCacheMBean in the server-side Spring context. Through the magic of Spring's JMX proxying, this bean implements the methods of the org.jboss.cache.TreeCacheMBean interface. For this to deploy on the JBoss server, just drop the provided custom-cache-service.xml file into your server's deploy directory.

To simplify our code, we introduce a simple CacheServiceInterface:

 public interface CacheServiceInterface {
    public void put(String path, Object key, Object value) throws Exception;
    public Object get(String path, Object key) throws Exception;
 } 

JBoss Cache is a tree-like structure, which is why we need the path parameter.

This interface's server-side implementation references the cache MBean as follows:

 <bean id="CacheService" class="app.service.JBossCacheServiceImpl">
   <property name="cacheMBean" ref="CustomTreeCacheMBean"/>
</bean> 

The JBossCacheServiceImpl bean simply delegates calls from the cache service to the cache MBean.
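
A minimal sketch of that delegation might look like the following, assuming the TreeCacheMBean proxy is injected through the cacheMBean property shown above (the actual class in the download may differ in its details):

 package app.service;

import org.jboss.cache.TreeCacheMBean;

public class JBossCacheServiceImpl implements CacheServiceInterface {

   private TreeCacheMBean cacheMBean;

   // Injected by Spring via the cacheMBean property
   public void setCacheMBean(TreeCacheMBean cacheMBean) {
      this.cacheMBean = cacheMBean;
   }

   // JBoss Cache addresses data by a tree path plus a key within that node
   public void put(String path, Object key, Object value) throws Exception {
      cacheMBean.put(path, key, value);
   }

   public Object get(String path, Object key) throws Exception {
      return cacheMBean.get(path, key);
   }
} 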

And, as the last of the server-side Spring beans, we need a ServicePublisher that watches the Spring container's lifecycle and publishes or removes the service definitions (URLs) to and from our cache:

 <bean id="ServicePublisher" class="app.service.ServicePublisher">
   <property name="cache" ref="CacheService"/>
</bean> 

This code illustrates how ServicePublisher reacts when the Spring context is refreshed (when, for example, an application has just been deployed):

 

 private void contextRefreshed() throws Exception {
   logger.info("context refreshed");

   String[] names = context
         .getBeanNamesForType(AutoDiscoveredServiceExporter.class);
   logger.info("exporting services:" + names.length);
   for (int i = 0; i < names.length; i++) {
      String serviceUrl = makeUrl(names[i]);
      try {
         Set services = (Set) cache.get(SERVICE_PREFIX + names[i], SERVICE_KEY);
         if (services == null)
            services = new HashSet();
         services.add(serviceUrl);
         cache.put(SERVICE_PREFIX + names[i], SERVICE_KEY, services);
         logger.info("added:" + serviceUrl);
      } catch (Exception ex) {
         logger.error("exception adding service:", ex);
      }
   }
} 

As you can see, the publisher simply iterates over the services exported via our AutoDiscoveredServiceExporter and adds their definitions (URLs) to the cache. Our cache is designed such that the path (or cache region) contains the name of the service, whose URL list is stored as a Set object under a static key in this region. Making the service name part of the path is important for the JBoss Cache implementation because it creates and releases the transactional locks based on the path. This way, the updates made for Service A will not interfere with updates made for Service B because they are mapped to different paths: /some/prefix/serviceA/key=(list of URLs) and /some/prefix/serviceB/key=(list of URLs).

The code for removing service definitions (when context is closed) is similar.
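
A rough sketch of that removal path, mirroring the refresh code above (the exact method in the download may differ), would be:

 private void contextClosed() throws Exception {
   logger.info("context closed");

   String[] names = context
         .getBeanNamesForType(AutoDiscoveredServiceExporter.class);
   for (int i = 0; i < names.length; i++) {
      String serviceUrl = makeUrl(names[i]);
      try {
         Set services = (Set) cache.get(SERVICE_PREFIX + names[i], SERVICE_KEY);
         if (services != null) {
            // Drop this server's URL and publish the shortened list to all members
            services.remove(serviceUrl);
            cache.put(SERVICE_PREFIX + names[i], SERVICE_KEY, services);
            logger.info("removed:" + serviceUrl);
         }
      } catch (Exception ex) {
         logger.error("exception removing service:", ex);
      }
   }
} 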

Now, let's move to the client side. We need to have a cache implementation to share with the server-side one:

 <bean id="LocalCacheService" class="app.auto.LocalJBossCacheServiceImpl">
</bean> 

The LocalJBossCacheServiceImpl holds a reference to a standalone copy of JBoss Cache configured from the same custom-cache-service.xml file we used on the server:

  public LocalJBossCacheServiceImpl() throws Exception {
      super();
      cache = new TreeCache();
      PropertyConfigurator config = new PropertyConfigurator();
      config.configure(cache, "app/context/custom-cache-service.xml");
   } 

This cache definition file includes the configuration for the JGroups layer, allowing all the cache members, clients and servers alike, to find each other via UDP multicast.

The LocalJBossCacheServiceImpl also implements our CacheServiceInterface and provides caching services to our last Spring bean, AutoDiscoveredService. This bean extends the standard Spring HttpInvokerProxyFactoryBean but is configured differently:

  <bean id="TestService"
      class="app.auto.AutoDiscoveredService">
      <property name="serviceInterface"
         value="app.service.TestServiceInterface" />
      <property name="cache" ref="LocalCacheService"/>
   </bean> 

First and foremost, a URL is no longer present. The AutoDiscoveredService automatically discovers any of our special Spring remoting services exposed on our network under the TestService name. For that discovery to happen, this bean obtains the list of URLs from our distributed cache:

  private List getServiceUrls() throws Exception {
      Set services = (Set) cache.get(ServicePublisher.SERVICE_PREFIX
            + beanName, ServicePublisher.SERVICE_KEY);
      if (services == null)
         return null;
      ArrayList results = new ArrayList(services);
      Collections.shuffle(results);
      logger.info("shuffled:" + results);
      return results;
   } 

The Collections.shuffle call randomly rearranges the list of URLs associated with this service so the client method invocation is load-balanced between them. The actual remote call is as follows:

 

 public Object invoke(MethodInvocation arg0) throws Throwable {
   List urls = getServiceUrls();
   if (urls != null)
      for (Iterator allUrls = urls.iterator(); allUrls.hasNext();) {
         String serviceUrl = null;
         try {
            serviceUrl = (String) allUrls.next();
            super.setServiceUrl(serviceUrl);
            logger.info("going to:" + serviceUrl);
            return super.invoke(arg0);
         } catch (Throwable problem) {
            if (problem instanceof IOException
                  || problem instanceof RemoteAccessException) {
               logger.warn("got error accessing:" + super.getServiceUrl(), problem);
               removeFailedService(serviceUrl);
            } else {
               throw problem;
            }
         }
      }
   throw new IllegalStateException("No services configured for name:" + beanName);
} 

As you can see, if a remote call results in a thrown exception (a RemoteAccessException for Spring problems or an IOException for general I/O problems like a network outage), the client code deals with the problem and then tries the next URL from the list, thus providing transparent failover of this service. If a call fails because of some other kind of exception, that exception is rethrown to be handled by the client.

The removeFailedService() method below simply removes the failed URL from the list and updates the distributed cache, making this information available to all other clients simultaneously:

  private void removeFailedService(String url) {
      try {
         logger.info("removing failed service:" + url);
         Set services = (Set) cache.get(ServicePublisher.SERVICE_PREFIX
               + beanName, ServicePublisher.SERVICE_KEY);
         if (services != null) {
            services.remove(url);
            cache.put(ServicePublisher.SERVICE_PREFIX + beanName, ServicePublisher.SERVICE_KEY,
                  services);
            logger.info("removed failed service at:" + url);
         }
      } catch (Exception e) {
         logger.warn("failed to remove failed service:" + url, e);
      }
   } 

If you build and deploy the example Web application to more than one JBoss server and run the provided LoopingAutoDiscoveredRemoteServiceTest, you can see how the incoming requests are load-balanced between the members of your Spring cluster. You can also stop and restart any of the servers, and the calls are dynamically routed to the remaining server(s). If you crash a server (by pulling the network cable out or sending kill -9 on Linux), you will see an exception logged on the client console, but all the requests will be served without interruption by failing over to the other server.

Conclusion

In this article, we looked at how to cluster network services provided by Spring remoting. In addition, you learned how to simplify the deployment of complex multitier applications by defining the individual services by their names only and relying on automatic discovery to bind every service to the appropriate URL.

Mikhail Garber is a Dallas-based information technology professional with more than 14 years of experience in enterprise software development. Garber specializes in Java/J2EE, databases, messaging, and open source solutions. His services have been employed by such organizations as Mary Kay Cosmetics, Boeing Defense and Space, Verizon Wireless, the US government, Lockheed Martin, Sabre/Travelocity, and many others.

