Stability patterns applied in a RESTful architecture

Five ways to stabilize network behavior in distributed systems

Page 3 of 3

Resources-per-application, illustrated in Figure 5, applies the Bulkheads pattern by giving each application exclusive resources. For instance, the payment service uses not only the credit-score service but also an exchange-rate service. If both services were deployed within the same container, a misbehaving credit-score service could tear down the exchange-rate service. From the bulkhead point of view it is preferable to run each application within an individual (servlet) container, thus protecting the services from each other.


Figure 5. Partitioning applications

The drawback of this approach is that a dedicated resource pool adds considerable resource overhead. Virtualization can help to reduce this overhead, however.

Resources-per-operation is a much more fine-grained approach that assigns individual system resources to (remote) operations. For instance, if the payment service's getAcceptedPaymentMethods() operation runs into trouble, the getPayments() operation can still be handled. Such resource management would typically be done within a servlet container. Netflix's Hystrix framework is an example of a system that supports fine-grained bulkheads.
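A per-operation bulkhead can be sketched with little more than a Semaphore per operation. The following is a simplified, hand-rolled version of what a framework like Hystrix provides in its semaphore-isolation mode; the class and method names are illustrative, not from the payment system:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Semaphore;

// Each remote operation gets its own permit pool, so a slow
// getAcceptedPaymentMethods() cannot starve getPayments().
public class OperationBulkhead {
    private final Semaphore permits;

    public OperationBulkhead(int maxConcurrent) {
        this.permits = new Semaphore(maxConcurrent);
    }

    // Runs the task only if a permit is free; otherwise fails fast
    // with the fallback instead of queueing up behind a slow operation.
    public <T> T execute(Callable<T> task, T fallback) {
        if (!permits.tryAcquire()) {
            return fallback;          // bulkhead full -> degrade immediately
        }
        try {
            return task.call();
        } catch (Exception e) {
            return fallback;          // operation failed -> degrade
        } finally {
            permits.release();
        }
    }

    public static void main(String[] args) {
        // one bulkhead per operation, each with its own resource budget
        OperationBulkhead acceptedMethods = new OperationBulkhead(2);
        OperationBulkhead payments = new OperationBulkhead(2);

        String r1 = acceptedMethods.execute(() -> "CREDITCARD,PAYPAL", "PREPAYMENT");
        String r2 = payments.execute(() -> "3 payments", "unavailable");
        System.out.println(r1 + " | " + r2);
    }
}
```

Because the two operations draw from separate permit pools, exhausting one pool leaves the other operation fully serviceable.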

Resources-per-endpoint manages resources for dedicated client endpoints. For instance, you could use individual client instances for each service endpoint in the electronic payment system, as illustrated in Figure 6.


Figure 6. Partitioning endpoints

In this example the Apache HttpClient uses at most 20 network connections by default. A single HTTP transaction consumes exactly one connection. With the classic blocking approach, the maximum number of connections equals the maximum number of threads used by the HttpClient instance. In the example below, each client will consume at most 30 connections and, by implication, at most 30 threads.

Listing 10. Bulkhead: Controlling resource usage at system endpoints

    // ...
    CloseableHttpClient httpClient = HttpClientBuilder.create()
                                                      .setMaxConnTotal(30)  // bulkhead for the credit-score endpoint
                                                      .build();
    Client addrScoreClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient, true)).build();    // RESTEasy specific

    CloseableHttpClient httpClient2 = HttpClientBuilder.create()
                                                       .setMaxConnTotal(30)  // bulkhead for the exchange-rate endpoint
                                                       .build();
    Client exchangeRateClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient2, true)).build(); // RESTEasy specific

Another approach to this implementation of the Bulkhead pattern would be to use different maxConnPerRoute and maxConnTotal values. The maxConnPerRoute setting limits connections for a particular host. Instead of using two client instances, you could use a single client instance, thus limiting the number of connections per target host. In that case you would need to keep a close eye on your thread pools. For instance, if your server container used 300 worker threads, the configuration of the internally used clients would need to consider the maximum available threads.
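The single-client alternative can be sketched with Apache HttpClient 4.x as follows; the limit values of 10 per route and 20 total are illustrative, not taken from the article's example:

```java
// One shared client, bounded per target host. With these limits, at most
// 20 of the container's worker threads can block in outbound HTTP calls,
// and no single host can monopolize more than 10 connections.
CloseableHttpClient sharedClient = HttpClientBuilder.create()
        .setMaxConnPerRoute(10)   // at most 10 connections to any one host
        .setMaxConnTotal(20)      // hard cap across all routes
        .build();
```

In a container with 300 worker threads, these caps guarantee that the remaining 280 threads stay available for request handling even if every outbound route hangs.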

Stability patterns in Java 8: Non-blocking, asynchronous calls

Thread usage has played an important part in the patterns and examples so far, mainly because hanging threads are so often the culprit in unresponsive systems. It isn't unusual for a serious system failure to be caused by an exhausted thread pool where all the threads are hanging in blocking calls and waiting for slow responses.
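This failure mode is easy to reproduce. The sketch below (pool size and timeout are illustrative) fills a small pool with "slow remote calls" and shows that a fast task can then no longer be served in time:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Demonstrates an exhausted thread pool: all workers hang in blocking
// calls, so even a trivial task cannot complete within its deadline.
public class PoolExhaustion {

    // returns true if a fast task cannot complete while the pool is blocked
    static boolean poolExhausted() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        CountDownLatch hang = new CountDownLatch(1);
        // two "slow remote calls" occupy both worker threads
        for (int i = 0; i < 2; i++) {
            pool.submit(() -> { hang.await(); return null; });
        }
        // a third, fast task is queued behind them
        Future<String> fast = pool.submit(() -> "pong");
        try {
            fast.get(100, TimeUnit.MILLISECONDS);
            return false;                 // pool still healthy
        } catch (TimeoutException e) {
            return true;                  // pool exhausted
        } finally {
            hang.countDown();             // release the hanging "calls"
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(poolExhausted() ? "pool exhausted" : "pool healthy");
    }
}
```

Scaled up to a servlet container, the same mechanics turn a few slow downstream services into a completely unresponsive system.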

Java 8 gave us an alternative to programming around threads with its support for lambda expressions. Lambda expressions make asynchronous, non-blocking programming in Java much easier by enabling a more reactive approach to distributed computing.

A key principle of reactive programming is to be event-driven, which means the program flow is determined by events. Instead of calling blocking methods and waiting until the response returns, the event-driven approach defines code that reacts to events such as a "response received" event. Suspended threads waiting for responses are no longer necessary. The program is a composition of handler code that reacts to events.
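The event-driven style described above can be sketched with CompletableFuture; the method and message names here are illustrative stand-ins for a real remote call:

```java
import java.util.concurrent.CompletableFuture;

// Minimal sketch of the event-driven style: no thread waits for the
// response; the handlers run when the future completes.
public class EventDriven {

    static CompletableFuture<String> fetchScoreAsync() {
        // stands in for a non-blocking remote call; returns immediately
        return CompletableFuture.supplyAsync(() -> "POSITIVE");
    }

    public static void main(String[] args) {
        String result = fetchScoreAsync()
            .thenApply(score -> "payment methods for score " + score) // reacts to "response received"
            .exceptionally(error -> "PREPAYMENT only")                // reacts to "error occurred"
            .join();  // join() only here, to print the demo result
        System.out.println(result);
    }
}
```

The caller composes handlers and returns at once; only the final join() in this demo blocks, and a real server handler would instead hand the future's result to an AsyncResponse.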

In Listing 11, the thenCompose(), exceptionally(), thenApply(), and whenComplete() methods are reactive. The method arguments are Java 8 functions that will be processed asynchronously only if a specific event (such as "processing completed" or "error occurred") happens.

Listing 11 shows a fully non-blocking, asynchronous implementation of the original payment method call from Listing 1. In this case, if a request is received, the database will be called in an asynchronous manner, which means the getPaymentMethodsAsync() method call returns immediately without waiting for the database query response. When the database response is received, the function of the thenCompose() method will be processed. That function either calls the credit-score service asynchronously or returns the score based on the user's prior payment history. The score will then be mapped to the supported payment methods.

Listing 11. Getting the payment methods asynchronously

public class AsyncPaymentService {
    // ...
    private final PaymentDao paymentDao;
    private final URI creditScoreURI;
    private final Java8Client restClient;

    public AsyncPaymentService() {
        ClientConfig clientConfig = new ClientConfig();                    // Jersey specific
        clientConfig.connectorProvider(new GrizzlyConnectorProvider());   // Jersey specific
        // ...
        // use extended client (JAX-RS 2.0 client does not support CompletableFuture)
        restClient = Java8Client.newClient(ClientBuilder.newClient(clientConfig));
        // ...
        restClient.register(new ClientCircuitBreakerFilter());
    }

    public void getPaymentMethodsAsync(@QueryParam("addr") String address, @Suspended AsyncResponse resp) {
        paymentDao.getPaymentsAsync(address, 50)      // returns a CompletableFuture<ImmutableList<Payment>>
           .thenCompose(pmts -> pmts.isEmpty()        // function will be processed if paymentDao result is received
              ? restClient.target(creditScoreURI).queryParam("addr", address).request().async().get(Score.class) // internal async http call
              : CompletableFuture.completedFuture((pmts.stream().filter(pmt -> pmt.isDelayed()).count() > 1) ? Score.NEGATIVE : Score.POSITIVE))
           .exceptionally(error -> Score.NEUTRAL)     // function will be processed if any error occurs
           .thenApply(SCORE_TO_PAYMENTMETHOD)         // maps the determined score to payment methods
           .whenComplete(ResultConsumer.write(resp)); // writes result/error into async response
    }
    // ...
}

Note that in this implementation, request handling is no longer bound to a thread that has to wait for a response. Does this mean that the stability patterns are no longer necessary in the reactive style? No: you should still implement them.

The non-blocking style requires that no blocking code is executed within the call path. For instance, if the PaymentDao has a bug that causes blocking behavior under some circumstances, the non-blocking contract will be broken and the call path will become blocking. A worker pool thread will be implicitly bound to the call path. Furthermore, even though threads are no longer the bottleneck, other resources, such as connection and response management, will become the next bottleneck.
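When a dependency is known to block, one common mitigation is to isolate it on a dedicated executor so it cannot capture the caller's worker threads. The sketch below uses a hypothetical blocking DAO call; names and pool size are illustrative:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: a known-blocking dependency is confined to its own small pool,
// acting as a bulkhead inside an otherwise non-blocking call path.
public class BlockingIsolation {

    // dedicated pool for the blocking dependency
    private static final ExecutorService DAO_POOL = Executors.newFixedThreadPool(4);

    // stands in for a DAO call that blocks (e.g., a synchronous JDBC query)
    static String blockingDaoCall() {
        try {
            Thread.sleep(50);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "payments";
    }

    static CompletableFuture<String> getPaymentsAsync() {
        // supplyAsync with an explicit executor keeps the blocking call off
        // the caller's threads and off the common fork-join pool
        return CompletableFuture.supplyAsync(BlockingIsolation::blockingDaoCall, DAO_POOL);
    }

    public static void main(String[] args) {
        System.out.println(getPaymentsAsync().join());
        DAO_POOL.shutdown();
    }
}
```

If the DAO hangs, at most four threads are lost; the rest of the system stays responsive, and the pool size itself acts as a bulkhead limit.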

In conclusion

The stability patterns I've introduced in this article describe best practices to prevent cascading failures in distributed systems. Even if a component fails, the system is able to continue its intended operation, often in a degraded mode.

My examples are for an application architecture that uses RESTful endpoints, but the patterns can be applied to other communication endpoints as well. For instance, most systems include database clients, which also have to be considered. I also haven't covered all of the stability-related patterns. In production environments, server processes such as the servlet container process should be monitored by supervisors. A supervisor tracks the health of the container process and restarts it if the process is close to crashing. In many cases it is better to restart a service than to keep it alive; an erroneous, barely responsive service node is generally much worse than a removed, dead one.

