Stability patterns applied in a RESTful architecture

Five ways to stabilize network behavior in distributed systems


Listing 3. Connection request timeout in RESTEasy


    RequestConfig reqConfig = RequestConfig.custom()   // apache HttpClient specific
                                           .setConnectTimeout(2000)
                                           .setSocketTimeout(2000)
                                           .setConnectionRequestTimeout(200)
                                           .build();
    CloseableHttpClient httpClient = HttpClientBuilder.create()
                                                      .setDefaultRequestConfig(reqConfig)
                                                      .build();
    Client restClient = new ResteasyClientBuilder().httpEngine(new ApacheHttpClient4Engine(httpClient, true)).build();  // RESTEasy specific

For the Timeouts pattern I've demonstrated implementations based on RESTEasy and Jersey, two RESTful Web services frameworks implementing JAX-RS 2.0. I also demonstrated two approaches to timeouts, based on whether your JAX-RS 2.0 provider uses standard thread pools or connection pooling to manage external requests.
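
For reference, a minimal sketch of the property-based approach with a Jersey client might look like the following; the timeout values are illustrative and the article's earlier Jersey listing may differ:


    Client restClient = ClientBuilder.newClient()                                       // javax.ws.rs.client.ClientBuilder
                                     .property(ClientProperties.CONNECT_TIMEOUT, 2000)  // Jersey specific, milliseconds to establish the connection
                                     .property(ClientProperties.READ_TIMEOUT, 2000);    // org.glassfish.jersey.client.ClientProperties, wait for response data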

'Circuit Breaker' pattern

While timeouts limit system resource consumption, the Circuit Breaker pattern is more proactive. A circuit breaker detects failures and prevents the application from trying to perform an action that is doomed to fail. In contrast to the HttpClient Retry pattern, the Circuit Breaker pattern addresses persistent errors.

You can use a circuit breaker to save client-side resources on calls that are doomed to fail, and to spare resources on the server side as well. If the server is in an erroneous state, such as being overloaded, adding extra load to it is rarely a good idea.


Figure 2. A state diagram of the Circuit Breaker pattern

A circuit breaker decorates and monitors a protected function call. Depending on the current state, the call is either executed or rejected. In general a circuit breaker implements three states: closed, open, and half-open:

  • In the closed state the call is executed and transaction metrics are recorded. These metrics are needed to implement a health policy.
  • If the system's health deteriorates, the circuit breaker switches to the open state. In this state all calls are rejected immediately, without the protected call being made. The purpose of the open state is to give the server side time to recover and rectify the problem.
  • When the circuit breaker enters the open state, a timeout timer is started. When this timer expires, the circuit breaker switches to the half-open state. In the half-open state calls are occasionally executed to check whether the problem has been fixed. If it has, the state switches back to closed.

Circuit Breaker on the client side

Figure 3 shows how to implement a circuit breaker using the JAX-RS filter interfaces. Note that there are several places where the request could be intercepted. For instance, an interceptor interface of the underlying HttpClient would also be a suitable place to integrate a circuit breaker.


Figure 3. Implementing Circuit Breaker using the JAX-RS filter interfaces

On the client side, set a circuit breaker filter by calling the register method of the JAX-RS client interface:


client.register(new ClientCircutBreakerFilter());

The circuit breaker filter implements a pre-execution as well as a post-execution method. In the pre-execution method the system checks whether the request execution is allowed. A dedicated circuit breaker instance is used for each target host in order to avoid side effects. If the call is allowed, the HTTP transaction is recorded in order to maintain the metrics. This transaction metric object is closed in the post-execution method by assigning the result to the transaction. Only a 5xx status response is interpreted as an error.

Listing 4. Pre- and post-execution methods in the Circuit Breaker pattern


public class ClientCircutBreakerFilter implements ClientRequestFilter, ClientResponseFilter  {
    // ..
    @Override
    public void filter(ClientRequestContext requestContext) throws IOException, CircuitOpenedException {
        String scope = requestContext.getUri().getHost();
        if (!circuitBreakerRegistry.get(scope).isRequestAllowed()) {
            throw new CircuitOpenedException("circuit is open");
        }
        Transaction transaction = metricsRegistry.transactions(scope).openTransaction();
        requestContext.setProperty(TRANSACTION, transaction);
    }
    @Override
    public void filter(ClientRequestContext requestContext, ClientResponseContext responseContext) throws IOException {
        boolean isFailed = (responseContext.getStatus() >= 500);
        Transaction.close(requestContext.getProperty(TRANSACTION), isFailed);
    }
}
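
From a caller's perspective, a rejected request surfaces as an exception. The following usage sketch is an assumption about how a caller might react; paymentServiceURI is a placeholder, and it assumes the JAX-RS runtime wraps the filter's CircuitOpenedException into a javax.ws.rs.ProcessingException:


    try {
        Response response = client.target(paymentServiceURI)   // placeholder URI
                                  .request()
                                  .get();
        // ... process the response as usual
    } catch (ProcessingException pe) {
        if (pe.getCause() instanceof CircuitOpenedException) {
            // circuit is open: fail fast, e.g. return a cached or default result
        }
    }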

Implementing a system health policy

Based on the recorded transactions from Listing 4, a circuit breaker health policy (HealthPolicy) implementation can derive metrics such as the totalRate/errorRate ratio. Typically, the health logic should also handle edge cases; for instance, the policy could ignore the totalRate/errorRate ratio when the overall request rate is very low.

Listing 5. Health policy logic


public class ErrorRateBasedHealthPolicy implements HealthPolicy  {
    // ...
    @Override
    public boolean isHealthy(String scope) {
        Transactions recorded =  metricsRegistry.transactions(scope).ofLast(Duration.ofMinutes(60));
        return ! ((recorded.size() > thresholdMinReqPerMin) &&      // check threshold reached?
                  (recorded.failed().size() == recorded.size()) &&  // every call failed?
                  (...                                        ));   // client connection pool limit almost reached?
    }
}

If the health policy returns false, the circuit breaker enters the open state and, later, the half-open state. In this simplified example 2 percent of calls are passed through to check whether the server is back in a normal state.

Listing 6. Health policy response


public class CircuitBreaker {
    private final AtomicReference<CircuitBreakerState> state = new AtomicReference<>(new ClosedState());
    private final String scope;
    // ..
    public boolean isRequestAllowed() {
        return state.get().isRequestAllowed();
    }
    private final class ClosedState implements CircuitBreakerState {
        @Override
        public boolean isRequestAllowed() {
            return (policy.isHealthy(scope)) ? true
                                             : changeState(new OpenState()).isRequestAllowed();
        }
    }
    private final class OpenState implements CircuitBreakerState {
        private final Instant exitDate = Instant.now().plus(openStateTimeout);
        @Override
        public boolean isRequestAllowed() {
            return (Instant.now().isAfter(exitDate)) ? changeState(new HalfOpenState()).isRequestAllowed()
                                                     : false;
        }
    }
    private final class HalfOpenState implements CircuitBreakerState {
        private double chance = 0.02;  // 2% will be passed through
        @Override
        public boolean isRequestAllowed() {
            return (policy.isHealthy(scope)) ? changeState(new ClosedState()).isRequestAllowed()
                                             : (random.nextDouble() <= chance);
        }
    }
    // ..
}
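
Listing 6 omits the changeState helper. A minimal sketch of it, assuming that the last concurrent transition simply wins, could look like this:


    private CircuitBreakerState changeState(CircuitBreakerState newState) {
        state.set(newState);   // publish the new state; concurrent transitions take the last write
        return newState;       // returned so that isRequestAllowed() can be chained on the new state
    }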

Circuit Breaker on the server side

The Circuit Breaker pattern can also be implemented on the server side. The scope of the server-side filter is the target operation instead of the target host. If the target operation-processing is erroneous, calls will be immediately rejected with an error status. Using a server-side filter ensures that an erroneous operation will not be allowed to consume too many resources.

In the case of the getPaymentMethods() implementation from Listing 1, the credit-score service is called internally using the creditScoreURI. If the credit-score service responded very slowly (and no appropriate timeout was set), the slow calls would eventually consume all available threads of the servlet engine's thread pool. Other remote operations of the payment service, such as getPayments(), would no longer be callable, even though the getPayments() implementation never queries the credit-score service.

Listing 7. A server-side circuit breaker filter


@Provider
public class ContainerCircutBreakerFilter implements ContainerRequestFilter, ContainerResponseFilter {
    //..
    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String scope = resourceInfo.getResourceClass().getName() + "#" + resourceInfo.getResourceMethod().getName();   // scope is the target operation (class#method)
        if (!circuitBreakerRegistry.get(scope).isRequestAllowed()) {
            throw new CircuitOpenedException("circuit is open");
        }
        Transaction transaction = metricsRegistry.transactions(scope).openTransaction();
        requestContext.setProperty(TRANSACTION, transaction);
    }
    //..
}

Note that in contrast to the client-side HealthPolicy, the server-side example uses an OverloadBasedHealthPolicy. Here an operation is considered erroneous when all threads of the worker pool are active, more than 80 percent of those threads are consumed by this particular operation, and the slow-latency threshold is exceeded. The overload-based health policy is shown below.

Listing 8. Server-side OverloadBasedHealthPolicy


public class OverloadBasedHealthPolicy implements HealthPolicy  {
    private final Environment environment;
    //...
    @Override
    public boolean isHealthy(String scope) {
        // [1] all servlet container threads busy?
        Threadpool pool = environment.getThreadpoolUsage();
        if (pool.getCurrentThreadsBusy() >= pool.getMaxThreads()) {
            TransactionMetrics metrics = metricsRegistry.transactions(scope);
            // [2] more than 80% currently consumed by this operation?
            if (metrics.running().size() > (pool.getMaxThreads() * 0.8)) {
                // [3] is 50percentile higher than slow threshold?
                Duration current50percentile = metrics.ofLast(Duration.ofMinutes(3)).percentile(50);
                if (thresholdSlowTransaction.minus(current50percentile).isNegative()) {
                    return false;
                }
            }
        }
        return true;
    }
}

'Handshaking' pattern

The Circuit Breaker pattern is an all-or-nothing approach. Depending on the quality and granularity of the recorded metrics, an alternative is to detect an overload situation in advance. If an impending overload is detected, the client can be signaled to reduce its request rate. In the Handshaking pattern the server communicates with the client in order to control its own workload.

One approach to the Handshaking pattern is for the server to provide regular system health updates to a load balancer. The load balancer can use a health-check URI such as http://myserver/paymentservice/~health to decide which servers requests should be forwarded to. For security reasons, health-check pages are generally not exposed to the public internet, so the scope of health checks is limited to company-internal communication.
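
A minimal sketch of such a health-check resource is shown below. The healthIndicator object is a placeholder for whatever aggregated health logic the service uses; returning a 503 status is a common way to tell the load balancer to take the node out of rotation:


    @Path("/~health")
    public class HealthCheckResource {
        // ...
        @GET
        public Response checkHealth() {
            if (healthIndicator.isHealthy()) {                                 // placeholder for the service's health logic
                return Response.ok("UP").build();
            }
            return Response.status(Response.Status.SERVICE_UNAVAILABLE)        // 503 takes the node out of rotation
                           .entity("DOWN")
                           .build();
        }
    }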

An alternative to this pull approach is a server push approach, in which the server adds a flow-control header to the response. This enables the server to control its load on a per-client basis. In this case the client must be identifiable. In Listing 9 I've added a proprietary client ID request header as well as a proprietary flow-control response header.

Listing 9. Flow control header for a Handshaking filter


@Provider
public class HandshakingFilter implements ContainerRequestFilter, ContainerResponseFilter {
    // ...
    @Override
    public void filter(ContainerRequestContext requestContext) throws IOException {
        String clientId = requestContext.getHeaderString("X-Client-Id");
        requestContext.setProperty(TRANSACTION, metricsRegistry.transactions(clientId).openTransaction());
    }
    @Override
    public void filter(ContainerRequestContext requestContext, ContainerResponseContext responseContext) throws IOException {
        String clientId = requestContext.getHeaderString("X-Client-Id");
        if (flowController.isVeryHighRequestRate(clientId)) {
            responseContext.getHeaders().add("X-FlowControl-Request", "reduce");
        }
        Transaction.close(requestContext.getProperty(TRANSACTION), responseContext.getStatus() >= 500);
    }
}

In this example the server signals the client to reduce its requests if a given metric threshold is exceeded. The metrics are recorded per client ID, which makes it possible to grant a specific client a quota of resources. Typically, the client responds to a reduce directive by switching off functionality such as pre-fetching or suggestion features that require backend requests.
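
On the client side, a ClientResponseFilter is one possible place to react to the flow-control header. The following sketch is an assumption rather than part of the article's implementation; the prefetcher hook and the client ID value are hypothetical:


    public class FlowControlClientFilter implements ClientRequestFilter, ClientResponseFilter {
        // ...
        @Override
        public void filter(ClientRequestContext requestContext) throws IOException {
            requestContext.getHeaders().putSingle("X-Client-Id", "WebshopApp");   // identify this client (hypothetical ID)
        }
        @Override
        public void filter(ClientRequestContext requestContext, ClientResponseContext responseContext) throws IOException {
            if ("reduce".equals(responseContext.getHeaderString("X-FlowControl-Request"))) {
                prefetcher.pause();   // hypothetical hook: temporarily switch off optional backend traffic
            }
        }
    }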

'Bulkheads' pattern

In the industrial world, a bulkhead is used to partition a ship or aircraft into sections, so that sections can be sealed off if there is a hull breach. You can use bulkheads similarly in software systems, to partition your system and protect it against cascading errors. Essentially, a bulkhead assigns limited resources to specific (groups of) clients, applications, operations, client endpoints, and so on.
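
As a minimal in-process illustration of the idea (not one of the partitioning options discussed below), a bulkhead can be as simple as a semaphore that caps the number of concurrent calls a single partition may consume:


    public class Bulkhead {
        private final Semaphore permits;   // java.util.concurrent.Semaphore

        public Bulkhead(int maxConcurrentCalls) {
            this.permits = new Semaphore(maxConcurrentCalls);
        }

        public <T> T execute(Callable<T> call) throws Exception {
            if (!permits.tryAcquire()) {
                throw new IllegalStateException("bulkhead limit reached");   // reject instead of queueing
            }
            try {
                return call.call();
            } finally {
                permits.release();
            }
        }
    }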

Bulkheads in a RESTful system

There are a number of ways to set up bulkheads, or system partitions, as I will demonstrate below.

Resources-per-client is an approach to the Bulkheads pattern that sets up individual clusters for specific clients. For instance, Figure 4 is a diagram of a new, mobile version of the Webshop application. Partitioning the mobile WebshopApp ensures that an influx of mobile status requests will not have negative side-effects for the original Webshop application. Any system failures caused by new incoming requests to the mobile app would be limited to the mobile channel.


Figure 4. Mobile WebshopApp
