Stability patterns applied in a RESTful architecture

Five ways to stabilize network behavior in distributed systems

Reliability in distributed systems is determined by the weakest component, so even a minor malfunction in an internal component could bring your entire system down. Learn how stability patterns anticipate the hot spots of distributed network behavior, then see them applied to RESTful transactions in Jersey and RESTEasy.

Implementing highly available and reliable distributed systems means learning to expect the unexpected. Sooner or later, if you're running a large software system, you will face incidents in the production environment. Typically you will find two different types of bugs. The first type is related to functional issues such as calculation errors or errors in handling and interpreting data. These bugs are easy to reproduce and will usually be detected and fixed before a software system goes into production.

The second type of bug is more challenging because it only becomes activated under specific infrastructure conditions. These bugs are harder to identify and reproduce and will typically not be found during testing; instead, you'll most likely encounter them in the production environment. Better testing and software quality assurance techniques like code reviews and automated tests will increase your chance of finding and eliminating these bugs; they won't, however, ensure that your code is bug free.

At worst, bugs in your code could trigger a cascade of errors within the system, potentially leading to a serious system failure. This is especially true for distributed systems where services are shared between other services and clients.

Stabilizing network behavior in distributed systems

The number one hot spot for serious system failure is network communication. Unfortunately, architects and designers of distributed systems are often incorrect in their assumptions about network behavior. Twenty years ago, L. Peter Deutsch and others at Sun Microsystems documented the Fallacies of Distributed Computing, which are still pervasive:

  1. The network is reliable
  2. Latency is zero
  3. Bandwidth is infinite
  4. The network is secure
  5. Topology doesn't change
  6. There is one administrator
  7. Transport cost is zero
  8. The network is homogeneous

Many developers today rely on RESTful systems to address the challenges of network communication in distributed systems. An important characteristic of REST is that it doesn't hide the limitations of network communication behind high-level RPC stubs. But RESTful interfaces and endpoints alone don't ensure that the system is inherently stable; you still have to do more.

In this article I introduce four stability patterns that address common failures in distributed systems. My examples focus on RESTful endpoints, but the patterns can also be applied to other communication endpoints. The patterns demonstrated here were introduced by Michael Nygard in his book, Release It! Design and Deploy Production-Ready Software. The sample code and demos are my own.

Download the source code for "Stability patterns applied in a RESTful architecture" (October 2014). Created by Gregor Roth for JavaWorld.

Stability patterns applied

Stability patterns are used to promote resiliency in distributed systems, using what we know about the hot-spots of network behavior to protect our systems against failure. The patterns I introduce in this article are designed to protect distributed systems against common failures in network communication, where integration points such as sockets, remote procedure calls, and database calls (that is, remote calls that are hidden by the database driver) are the first risk to system stability. Using these patterns can prevent a distributed system from shutting down just because some part of the system has failed.

The Webshop demo

In general, online electronic payment systems do not have data about new customers. Instead, these systems often perform an external online credit-score check based on a new user's address data. The Webshop demo application determines which payment methods (such as credit card, PayPal account, pre-payment, or payment per invoice) will be accepted based on the user's credit score.

This demo addresses a key scenario: What happens if a credit check fails? Should the order be rejected? In most cases a payment system will fall back on accepting only more reliable payment methods. Handling this external component failure is both a technology and a business decision; it requires weighing the tradeoff between losing orders and the possibility of a payment default.

Figure 1 shows a system overview of the Webshop demo.

Figure 1. A flow diagram of the electronic payment system

To determine the payment method, the Webshop application internally uses a payment service. The payment service provides functionality to retrieve payment information and to determine the accepted payment methods for a given user. In this example the services are implemented in a RESTful way, meaning that HTTP methods such as GET or POST are used explicitly and service resources are addressed by URIs. This approach is also reflected by the JAX-RS 2.0 annotations in the code examples. The JAX-RS 2.0 specification defines a REST binding for Java and is part of the Java Platform, Enterprise Edition.

Listing 1. Determining the payment methods


@Singleton
@Path("/")
public class PaymentService {
    // ...
    private final PaymentDao paymentDao;
    private final URI creditScoreURI;
    private final Client restClient;   // JAX-RS 2.0 client (initialization omitted)
    private final static Function<Score, ImmutableSet<PaymentMethod>> SCORE_TO_PAYMENTMETHOD = score -> {
        switch (score) {
            case POSITIVE:
                return ImmutableSet.of(CREDITCARD, PAYPAL, PREPAYMENT, INVOICE);
            case NEGATIVE:
                return ImmutableSet.of(PREPAYMENT);
            default:
                return ImmutableSet.of(CREDITCARD, PAYPAL, PREPAYMENT);
        }
    };
    @Path("paymentmethods")
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public ImmutableSet<PaymentMethod> getPaymentMethods(@QueryParam("addr") String address) {
        Score score = Score.NEUTRAL;
        try {
            ImmutableList<Payment> payments = paymentDao.getPayments(address, 50);
            score = payments.isEmpty()
                         ? restClient.target(creditScoreURI).queryParam("addr", address).request().get(Score.class)
                         : (payments.stream().filter(payment -> payment.isDelayed()).count() >= 1) ? Score.NEGATIVE : Score.POSITIVE;
        } catch (RuntimeException rt) {
            LOG.fine("error occurred by calculating score. Fallback to " + score + " " + rt.toString());
        }
        return SCORE_TO_PAYMENTMETHOD.apply(score);
    }
    @Path("payments")
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public ImmutableList<Payment> getPayments(@QueryParam("userid") String userid) {
       // ...
    }
    // ...
}

The getPaymentMethods() method in Listing 1 is bound to the URI path segment paymentmethods, which results in a URI such as http://myserver/paymentservice/paymentmethods. The @GET annotation specifies that the annotated method will be executed when an HTTP GET request is received for the given URI. The Webshop application calls getPaymentMethods() to determine a user's reliability score, which is based on his or her credit history. If no history data is available, a credit-score service will be called. In the case of exceptions on integration points, the system is designed to downgrade its getPaymentMethods() functionality, even if this means accepting a less reliable payment method from an unknown or less trusted customer. If the internal paymentDao query or the creditScoreURI call fails, getPaymentMethods() returns default payment methods.

Now let's see how we could apply four common stability patterns to address potentially destabilizing errors in the external credit-score component.

'Use Timeouts' pattern

One of the simplest and most effective stability patterns is to use proper timeouts. Sockets programming is the fundamental technology for enabling software to communicate on a TCP/IP network. Essentially, the Sockets API defines two types of timeouts (both appear in the plain-socket sketch after this list):

  1. The connection timeout denotes the maximum time elapsed before the connection is established or an error occurs.
  2. The socket timeout defines the maximum period of inactivity between two consecutive data packets arriving on the client side after a connection has been established.
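
The following minimal plain-socket sketch (the host and timeout values are illustrative only) shows where the two timeouts live in the Sockets API: connect() takes the connection timeout, while setSoTimeout() sets the socket (read) timeout.

import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class SocketTimeoutDemo {
    public static void main(String[] args) throws IOException {
        try (Socket socket = new Socket()) {
            // connection timeout: maximum time to establish the TCP connection
            socket.connect(new InetSocketAddress("example.org", 80), 2000);
            // socket (read) timeout: maximum period of inactivity between two data packets
            socket.setSoTimeout(2000);
            socket.getOutputStream().write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            InputStream in = socket.getInputStream();
            // read() throws a SocketTimeoutException if no data arrives within two seconds
            System.out.println("first byte: " + in.read());
        }
    }
}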

In Listing 1, I used the JAX-RS 2.0 client interface to call the credit-score service, but what is a reasonable timeout period? The answer depends on your JAX-RS provider. The current version of Jersey, for example, uses HttpURLConnection. By default Jersey sets a connection and socket timeout of 0 milliseconds, meaning that the timeout is infinite. If you don't think that's bad news, think again.

Consider that the JAX-RS client will be processed within a server/servlet engine, which uses a worker thread pool to handle incoming HTTP requests. If we're using the classic blocking request-handling approach, the getPaymentMethods() method from Listing 1 will be called via an exclusive thread of the pool, and that dedicated thread stays bound to the request for the entire processing procedure. If the internally called credit-score service (addressed by the creditScoreURI) responds very slowly, all of the worker pool threads will eventually be suspended. Now suppose another method of the payment service, such as getPayments(), is called. That request won't be handled because all the threads are waiting for the credit-score response. In the worst case, a badly behaving credit-score service could take down all of our payment service functions.

Implementing timeouts: Thread pools vs. connection pooling

Reasonable timeouts are fundamental for availability, but the JAX-RS 2.0 client interface doesn't define a standard way to set them. Instead, you have to use vendor-specific properties. In the code below I've set custom properties for Jersey.


        restClient = ClientBuilder.newClient();
        restClient.property(ClientProperties.CONNECT_TIMEOUT, 2000); // jersey specific
        restClient.property(ClientProperties.READ_TIMEOUT,    2000); // jersey specific

In contrast to Jersey, RESTEasy uses the Apache HttpClient by default, which is much more efficient than using HttpURLConnection. The Apache HttpClient supports connection pooling. Connection pooling ensures that after performing an HTTP transaction the connection will be reused for further HTTP transactions, assuming the connection is identified as a persistent connection. This approach saves the overhead of establishing new TCP/IP connections, which is significant. It's not uncommon in a high-load system for hundreds or thousands of outgoing HTTP transactions per second to be performed by a single HTTP client instance.

In order to use Apache HttpClient in Jersey, you would need to set an ApacheConnectorProvider, as shown in Listing 2. Note that the timeout is set within the request-config definition.

Listing 2. Using Apache HttpClient in Jersey


    ClientConfig clientConfig = new ClientConfig();                          // jersey specific
    clientConfig.connectorProvider(new ApacheConnectorProvider());           // jersey specific
    RequestConfig reqConfig = RequestConfig.custom()                         // apache HttpClient specific
                                           .setConnectTimeout(2000)
                                           .setSocketTimeout(2000)
                                           .setConnectionRequestTimeout(200)
                                           .build();
    clientConfig.property(ApacheClientProperties.REQUEST_CONFIG, reqConfig); // jersey specific
    restClient = ClientBuilder.newClient(clientConfig);

Note that the connection pool-specific connection request timeout is also set in the example above. The connection request timeout denotes the maximum time to wait for a connection to be leased from HttpClient's internal connection pool. By default this timeout is infinite, which means that the connection-request call blocks until a connection becomes free. The effect is the same as it would be with infinite connection/socket timeouts.

As an alternative to Jersey, you can set these timeouts with RESTEasy by configuring the underlying Apache HttpClient directly, as shown in Listing 3.

Listing 3. Connection request timeout in RESTEasy


    RequestConfig reqConfig = RequestConfig.custom()   // apache HttpClient specific
                                           .setConnectTimeout(2000)
                                           .setSocketTimeout(2000)
                                           .setConnectionRequestTimeout(200)
                                           .build();
    CloseableHttpClient httpClient = HttpClientBuilder.create()
                                                      .setDefaultRequestConfig(reqConfig)
                                                      .build();
    Client restClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient, true)).build();  // RESTEasy specific

For the Timeouts pattern I've demonstrated implementations based on RESTEasy and Jersey, two RESTful Web services frameworks implementing JAX-RS 2.0. I also demonstrated two approaches to timeouts, based on whether your JAX-RS 2.0 provider uses standard thread pools or connection pooling to manage external requests.

'Circuit Breaker' pattern

While timeouts limit system resource consumption, the Circuit Breaker pattern is more proactive. A circuit breaker detects failures and prevents the application from trying to perform an action that is doomed to fail. In contrast to a retry mechanism such as HttpClient's retry handler, which addresses transient errors, the Circuit Breaker pattern addresses persistent errors.

You can use Circuit Breaker to save client-side resources for calls that are doomed to fail, as well as to save resources on the server side. If the server is in an erroneous state, such as an overloaded state, it is generally not a good idea to add extra load to the server.

Figure 2. A state diagram of the Circuit Breaker pattern

A circuit breaker decorates and monitors a protected function call. Depending on its current state, the call will be executed or rejected. In general a circuit breaker implements three states: closed, open, and half-open:

  • In the closed state the call is executed and transaction metrics are recorded. These metrics are necessary to implement a health policy.
  • If the system's health deteriorates, the circuit breaker switches to the open state. In this state all calls are rejected immediately, without any call being made to the protected function. The purpose of the open state is to give the server side time to recover and rectify the problem.
  • When the circuit breaker enters the open state, a timeout timer is started. When this timer expires, the circuit breaker switches to the half-open state. In the half-open state calls are occasionally executed to check whether the problem has been fixed. If it has, the circuit breaker switches back to the closed state.

Circuit Breaker on the client side

Figure 3 shows how to implement a circuit breaker using the JAX-RS filter interfaces. Note that there are several places where the query could be intercepted; an interceptor interface of the underlying HttpClient, for instance, would also be a suitable place to integrate a circuit breaker.

Figure 3. Implementing Circuit Breaker using the JAX-RS filter interfaces

On the client-side, set a circuit breaker filter by calling the register method of the JAX-RS client interface:


client.register(new ClientCircutBreakerFilter());

The circuit breaker filter implements a pre-execution as well as a post-execution method. Within the pre-execution method the filter checks whether the request execution is allowed. A dedicated circuit breaker instance is used for each target host in order to avoid side effects between hosts. If the call is allowed, an HTTP transaction is opened to maintain the metrics. This transaction metric object is closed within the post-execution method by assigning the result to the transaction. Only a 5xx response status is interpreted as an error.

Listing 4. Pre- and post-execution methods in the Circuit Breaker pattern


public class ClientCircutBreakerFilter implements ClientRequestFilter, ClientResponseFilter  {
    // ..
    @Override
    public void filter(ClientRequestContext requestContext) throws IOException, CircuitOpenedException {
        String scope = requestContext.getUri().getHost();
        if (!circuitBreakerRegistry.get(scope).isRequestAllowed()) {
            throw new CircuitOpenedException("circuit is open");
        }
        Transaction transaction = metricsRegistry.transactions(scope).openTransaction();
        requestContext.setProperty(TRANSACTION, transaction);
    }
    @Override
    public void filter(ClientRequestContext requestContext, ClientResponseContext responseContext) throws IOException {
        boolean isFailed = (responseContext.getStatus() >= 500);
        Transaction.close(requestContext.getProperty(TRANSACTION), isFailed);
    }
}

Implementing a system health policy

Based on the recorded transactions from Listing 4, a circuit breaker system health policy (HealthPolicy) implementation can derive metrics such as the error rate (the ratio of failed to total transactions). Typically, the health logic should also consider exceptional conditions. For instance, the health policy could ignore the error rate in cases where the request rate is very low.

Listing 5. Health policy logic


public class ErrorRateBasedHealthPolicy implements HealthPolicy  {
    // ...
    @Override
    public boolean isHealthy(String scope) {
        Transactions recorded =  metricsRegistry.transactions(scope).ofLast(Duration.ofMinutes(60));
        return ! ((recorded.size() > thresholdMinReqPerMin) &&      // check threshold reached?
                  (recorded.failed().size() == recorded.size()) &&  // every call failed?
                  (...                                        ));   // client connection pool limit almost reached?
    }
}

If the health policy reports the scope as unhealthy, the circuit breaker enters the open state and later the half-open state. In this simplified example, 2 percent of the calls are passed through in the half-open state to check whether the server is back in a normal state.

Listing 6. Health policy response


public class CircuitBreaker {
    private final AtomicReference<CircuitBreakerState> state = new AtomicReference<>(new ClosedState());
    private final String scope;
    // ..
    public boolean isRequestAllowed() {
        return state.get().isRequestAllowed();
    }
    private final class ClosedState implements CircuitBreakerState {
        @Override
        public boolean isRequestAllowed() {
            return (policy.isHealthy(scope)) ? true
                                             : changeState(new OpenState()).isRequestAllowed();
        }
    }
    private final class OpenState implements CircuitBreakerState {
        private final Instant exitDate = Instant.now().plus(openStateTimeout);
        @Override
        public boolean isRequestAllowed() {
            return (Instant.now().isAfter(exitDate)) ? changeState(new HalfOpenState()).isRequestAllowed()
                                                     : false;
        }
    }
    private final class HalfOpenState implements CircuitBreakerState {
        private double chance = 0.02;  // 2% will be passed through
        @Override
        public boolean isRequestAllowed() {
            return (policy.isHealthy(scope)) ? changeState(new ClosedState()).isRequestAllowed()
                                             : (random.nextDouble() <= chance);
        }
    }
    // ..
}
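
The CircuitBreakerState interface and the changeState() helper are elided in Listing 6. A minimal sketch of what they might look like (an assumption on my part, not shown in the original listing) follows:

interface CircuitBreakerState {
    boolean isRequestAllowed();
}

// inside CircuitBreaker: swap the current state and return the new one,
// so the caller can immediately re-evaluate isRequestAllowed()
private CircuitBreakerState changeState(CircuitBreakerState newState) {
    state.set(newState);
    return newState;
}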

Circuit Breaker on the server side

The Circuit Breaker pattern can also be implemented on the server side. The scope of the server-side filter is the target operation instead of the target host. If processing of the target operation is erroneous, calls are immediately rejected with an error status. Using a server-side filter ensures that an erroneous operation is not allowed to consume too many resources.

In the case of the getPaymentMethods() implementation from Listing 1, the credit-score service is called internally via the creditScoreURI. If the credit-score service responds very slowly (and no appropriate timeout is set), its calls will implicitly consume all available threads of the servlet engine's thread pool. Other remote operations of the payment service, such as getPayments(), would no longer be callable, even though the getPayments() implementation never queries the credit-score service.

Listing 7. A server-side circuit breaker filter


@Provider
public class ContainerCircutBreakerFilter implements ContainerRequestFilter, ContainerResponseFilter {
    //..
    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String scope = resourceInfo.getResourceClass().getName() + "#" + resourceInfo.getResourceMethod().getName();
        if (!circuitBreakerRegistry.get(scope).isRequestAllowed()) {
            throw new CircuitOpenedException("circuit is open");
        }
        Transaction transaction = metricsRegistry.transactions(scope).openTransaction();
        requestContext.setProperty(TRANSACTION, transaction);
    }
    //..
}

Note that in contrast to the client-side HealthPolicy, the server-side example uses an OverloadBasedHealthPolicy. Here, an operation is considered erroneous when all threads of the worker pool are busy, more than 80 percent of the threads are consumed by the dedicated operation, and the operation's median latency exceeds the slow-transaction threshold. The overload-based health policy is shown below.

Listing 8. Server-side OverloadBasedHealthPolicy


public class OverloadBasedHealthPolicy implements HealthPolicy  {
    private final Environment environment;
    //...
    @Override
    public boolean isHealthy(String scope) {
        // [1] all servlet container threads busy?
        Threadpool pool = environment.getThreadpoolUsage();
        if (pool.getCurrentThreadsBusy() >= pool.getMaxThreads()) {
            TransactionMetrics metrics = metricsRegistry.transactions(scope);
            // [2] more than 80% currently consumed by this operation?
            if (metrics.running().size() > (pool.getMaxThreads() * 0.8)) {
                // [3] is the 50th percentile higher than the slow threshold?
                Duration current50percentile = metrics.ofLast(Duration.ofMinutes(3)).percentile(50);
                if (thresholdSlowTransaction.minus(current50percentile).isNegative()) {
                    return false;
                }
            }
        }
        return true;
    }
}

'Handshaking' pattern

The Circuit Breaker pattern is an all-or-nothing approach. Depending on the quality and granularity of the recorded metrics, an alternative is to detect an overload situation in advance. If an impending overload is detected, the client can be signaled to reduce its requests. In the Handshaking pattern a server communicates with the client in order to control its own workload.

One approach to the Handshaking pattern is for the server to provide regular system-health updates via a load balancer. The load balancer could poll a health-check URI such as http://myserver/paymentservice/~health to decide which servers requests should be forwarded to. For security reasons health-check pages are generally not exposed to the public internet, so the scope of health checks is limited to company-internal communication.
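
As a rough illustration of such a health-check endpoint, the sketch below exposes the HealthPolicy from the earlier listings as a resource a load balancer could poll. The class name, path, and wiring are hypothetical; they are not part of the original demo.

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.core.Response;

@Path("~health")
public class HealthCheckResource {
    private final HealthPolicy policy;   // e.g. the OverloadBasedHealthPolicy from Listing 8
    private final String scope;

    public HealthCheckResource(HealthPolicy policy, String scope) {
        this.policy = policy;
        this.scope = scope;
    }

    @GET
    public Response checkHealth() {
        // 200 tells the load balancer to keep routing traffic; 503 asks it to back off
        return policy.isHealthy(scope) ? Response.ok("UP").build()
                                       : Response.status(Response.Status.SERVICE_UNAVAILABLE).build();
    }
}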

An alternative to this pull approach is a server-push approach that adds a flow-control header to the response. This enables the server to control its load on a per-client basis, in which case the client must be identifiable. In Listing 9 I've added a proprietary client-ID request header as well as a proprietary flow-control response header.

Listing 9. Flow control header for a Handshaking filter


@Provider
public class HandshakingFilter implements ContainerRequestFilter, ContainerResponseFilter {
    // ...
    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String clientId = requestContext.getHeaderString("X-Client-Id");
        requestContext.setProperty(TRANSACTION, metricsRegistry.transactions(clientId).openTransaction());
    }
    @Override
    public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) throws IOException {
        String clientId = requestContext.getHeaderString("X-Client-Id");
        if (flowController.isVeryHighRequestRate(clientId)) {
            responseContext.getHeaders().add("X-FlowControl-Request", "reduce");
        }
        Transaction.close(requestContext.getProperty(TRANSACTION), responseContext.getStatus() >= 500);
    }
}

In this example the server signals the client to reduce its requests if a given metric threshold is exceeded. The metrics are recorded per client ID, which enables us to grant a specific client a quota of resources. Typically, the client responds to a reduce directive by switching off functionality such as pre-fetching or suggestion features that require backend requests.
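
On the client side, a filter could send the client ID and watch for the flow-control directive. The sketch below is a hypothetical counterpart to Listing 9 (the header names match the listing, but the class and the reduce-mode flag are illustrative assumptions); the application would consult isReduceRequested() before issuing optional backend requests.

import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;
import javax.ws.rs.client.ClientRequestContext;
import javax.ws.rs.client.ClientRequestFilter;
import javax.ws.rs.client.ClientResponseContext;
import javax.ws.rs.client.ClientResponseFilter;

public class FlowControlClientFilter implements ClientRequestFilter, ClientResponseFilter {
    private final AtomicBoolean reduceRequested = new AtomicBoolean(false);

    @Override
    public void filter(ClientRequestContext requestContext) throws IOException {
        // identify the client so the server can record per-client metrics
        requestContext.getHeaders().putSingle("X-Client-Id", "webshop-mobile");
    }

    @Override
    public void filter(ClientRequestContext requestContext, ClientResponseContext responseContext) throws IOException {
        // remember whether the server asked us to reduce the request rate
        reduceRequested.set("reduce".equals(responseContext.getHeaderString("X-FlowControl-Request")));
    }

    public boolean isReduceRequested() {
        return reduceRequested.get();
    }
}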

'Bulkheads' pattern

In the industrial world, a bulkhead is used to partition a ship or aircraft into sections, so that sections can be sealed off if there is a hull breach. You can use bulkheads similarly in software systems, to partition your system and protect it against cascading errors. Essentially, a bulkhead assigns limited resources to specific (groups of) clients, applications, operations, client endpoints, and so on.

Bulkheads in a RESTful system

There are a number of ways to set up bulkheads, or system partitions, as I will demonstrate below.

Resources-per-client is an approach to the Bulkheads pattern that sets up individual clusters for specific clients. For instance, Figure 4 is a diagram of a new, mobile version of the Webshop application. Partitioning the mobile WebshopApp ensures that an influx of mobile status requests will not have negative side-effects for the original Webshop application. Any system failures caused by new incoming requests to the mobile app would be limited to the mobile channel.

Figure 4. Mobile WebshopApp

Resources-per-application, illustrated in Figure 5, assigns exclusive resources to each application. For instance, the payment service uses not only the credit-score service but also an exchange-rate service. If both services were installed within the same container, a badly behaving credit-score service could take down the exchange-rate service. From the bulkheads point of view it is preferable to run each application within its own (servlet) container, thus protecting the services from each other.

Figure 5. Partitioning applications

The drawback of this approach is that a dedicated resource pool adds considerable resource overhead. Virtualization can help to reduce this overhead, however.

Resources-per-operation is a much more fine-grained approach that assigns individual system resources to (remote) operations. For instance, if the payment service's getPaymentMethods() operation runs into trouble, the getPayments() operation can still be handled. Such resource management would typically be done within a servlet container. Netflix's Hystrix framework is an example of a library that supports fine-grained bulkheads, as sketched below.
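
As a rough sketch of the resources-per-operation idea (not part of the original demo; the class and group names are illustrative), a Hystrix command could wrap the credit-score call so that it runs in its own bounded thread pool and falls back to a neutral score on failure:

import com.netflix.hystrix.HystrixCommand;
import com.netflix.hystrix.HystrixCommandGroupKey;
import javax.ws.rs.client.WebTarget;

public class CreditScoreCommand extends HystrixCommand<Score> {
    private final WebTarget creditScoreTarget;
    private final String address;

    public CreditScoreCommand(WebTarget creditScoreTarget, String address) {
        // commands in the same group share a dedicated, bounded thread pool
        super(HystrixCommandGroupKey.Factory.asKey("CreditScoreService"));
        this.creditScoreTarget = creditScoreTarget;
        this.address = address;
    }

    @Override
    protected Score run() {
        return creditScoreTarget.queryParam("addr", address).request().get(Score.class);
    }

    @Override
    protected Score getFallback() {
        return Score.NEUTRAL;   // downgrade instead of failing the whole payment-method lookup
    }
}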

Resources-per-endpoint manages resources for dedicated client endpoints. For instance, you could use individual client instances for each service endpoint in the electronic payment system, as illustrated in Figure 6.

Figure 6. Partitioning endpoints

In this example the Apache HttpClient uses a maximum of 20 network connections by default. A single HTTP transaction consumes exactly one connection. With the classic blocking approach, the maximum number of connections also caps the number of threads that can be blocked in the HttpClient instance at the same time. In the example below, each client consumes at most 30 connections and, by implication, 30 threads.

Listing 10. Bulkhead: Controlling resource usage at system endpoints


    // ...
    CloseableHttpClient httpClient = HttpClientBuilder.create()
                                                      .setMaxConnTotal(30)
                                                      .setMaxConnPerRoute(30)
                                                      .setDefaultRequestConfig(reqConfig)
                                                      .build();
    Client addrScoreClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient, true)).build();// RESTEasy specific
    CloseableHttpClient httpClient2 = HttpClientBuilder.create()
                                                       .setMaxConnTotal(30)
                                                       .setMaxConnPerRoute(30)
                                                       .setDefaultRequestConfig(reqConfig)
                                                       .build();
    Client exchangeRateClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient2, true)).build();// RESTEasy specific

Another approach to this implementation of the Bulkheads pattern is to use different maxConnPerRoute and maxConnTotal values. The maxConnPerRoute setting limits the number of connections to a particular host. Instead of using two client instances, you could use a single client instance and limit the number of connections per target host, as sketched below. In that case you would need to keep a close eye on your thread pools: if your servlet container used 300 worker threads, for instance, the configuration of the internally used clients would need to account for the maximum number of available threads.
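
The following fragment is a rough sketch of that single-client alternative (the limits are illustrative, and reqConfig is the request configuration from Listing 3): the shared pool is capped overall, while each target host, such as the credit-score or exchange-rate service, gets at most 30 connections.

    CloseableHttpClient httpClient = HttpClientBuilder.create()
                                                      .setMaxConnTotal(60)       // overall cap for the shared pool
                                                      .setMaxConnPerRoute(30)    // cap per target host
                                                      .setDefaultRequestConfig(reqConfig)
                                                      .build();
    Client sharedClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient, true)).build(); // RESTEasy specific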

Stability patterns in Java 8: Non-blocking, asynchronous calls

Thread usage has played an important part in the patterns and examples so far, mainly because hanging threads are so often the culprit in unresponsive systems. It isn't unusual for a serious system failure to be caused by an exhausted thread pool where all the threads are hanging in blocking calls and waiting for slow responses.

Java 8 gave us an alternative to programming around threads with its support for lambda expressions. Lambda expressions make asynchronous, non-blocking programming in Java much easier by enabling a more reactive approach to distributed computing.

A key principle of reactive programming is to be event-driven, which means the program flow is determined by events. Instead of calling blocking methods and waiting until the response returns, the event-driven approach defines code that reacts to events such as a "response received" event. Suspended threads waiting for responses are no longer necessary. The program is a composition of handler code that reacts to events.

In Listing 11, the thenCompose(), exceptionally(), thenApply(), and whenComplete() methods are reactive. The method arguments are Java 8 functions that will be processed asynchronously only if a specific event (such as "processing completed" or "error occurred") happens.

Listing 11 shows a fully non-blocking, asynchronous implementation of the original payment-method call from Listing 1. When a request is received, the database is called asynchronously, which means the getPaymentMethodsAsync() method call returns immediately without waiting for the database query response. When the database response is received, the function passed to thenCompose() will be processed. That function either calls the credit-score service asynchronously or returns the score based on the user's prior payment history. The score will then be mapped to the supported payment methods.

Listing 11. Getting the payment methods asynchronously


@Singleton
@Path("/")
public class AsyncPaymentService {
    // ...
    private final PaymentDao paymentDao;
    private final URI creditScoreURI;
    public AsyncPaymentService() {
        ClientConfig clientConfig = new ClientConfig();                    // jersey specific
        clientConfig.connectorProvider(new GrizzlyConnectorProvider());    // jersey specific
        // ...
        // use extended client (JAX-RS 2.0 client does not support CompletableFuture)
        restClient = Java8Client.newClient(ClientBuilder.newClient(clientConfig));
        // ...
        restClient.register(new ClientCircutBreakerFilter());
    }
    @Path("paymentmethods")
    @GET
    @Produces(MediaType.APPLICATION_JSON)
    public void getPaymentMethodsAsync(@QueryParam("addr") String address, @Suspended AsyncResponse resp) {
        paymentDao.getPaymentsAsync(address, 50)      // returns a CompletableFuture<ImmutableList<Payment>>
           .thenCompose(pmts -> pmts.isEmpty()        // function will be processed if paymentDao result is received
              ? restClient.target(creditScoreURI).queryParam("addr", address).request().async().get(Score.class) // internal async http call
              : CompletableFuture.completedFuture((pmts.stream().filter(pmt -> pmt.isDelayed()).count() >= 1) ? Score.NEGATIVE : Score.POSITIVE))
           .exceptionally(error -> Score.NEUTRAL)     // function will be processed if any error occurs
           .thenApply(SCORE_TO_PAYMENTMETHOD)         // function will be processed if score is determined and maps it to payment methods
           .whenComplete(ResultConsumer.write(resp)); // writes result/error into async response
    }
    // ...
}

Note that in this implementation, request handling is no longer bound to a thread that has to wait for a response. Does this mean that the stability patterns are no longer necessary in the reactive style? No, you should still implement them.

The non-blocking style requires that no blocking code is executed within the call path. For instance, if the PaymentDao has a bug that causes blocking behavior under some circumstances, the non-blocking contract will be broken and the call path will become blocking; a worker-pool thread will be implicitly bound to the call path. Furthermore, even though threads are no longer the bottleneck, other resources such as connection/response management will become the next bottleneck.
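
One way to guard against such accidental blocking (a sketch under my own assumptions, not part of the original demo) is to isolate potentially blocking DAO code on a dedicated, bounded executor so it cannot stall the request-handling threads. The helper below bridges the blocking getPayments() call from Listing 1 to the CompletableFuture-based signature that Listing 11 expects.

    // sketch: run the potentially blocking DAO call on its own bounded executor (names illustrative)
    private final ExecutorService daoExecutor = Executors.newFixedThreadPool(10);

    public CompletableFuture<ImmutableList<Payment>> getPaymentsAsync(String address, int limit) {
        return CompletableFuture.supplyAsync(() -> paymentDao.getPayments(address, limit), daoExecutor);
    }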

In conclusion

The stability patterns I've introduced in this article describe best practices to prevent cascading failures in distributed systems. Even if a component fails, the system is able to continue its intended operation, often in a downgraded mode.

My examples are for an application architecture that uses RESTful endpoints, but the patterns can be applied to other communication endpoints. For instance, most systems include database clients, which also have to be considered. Nor have I covered all of the stability-related patterns. In production environments, server processes such as the servlet container process should be monitored by supervisors. A supervisor tracks the health of the container process and will restart it if the process is close to crashing. In many cases it is better to restart a service than to keep it alive; an erroneous, almost non-responsive service node is generally much worse than a removed, dead node.
