Building cloud-ready, multicore-friendly applications, Part 2: Mechanics of the cloud

Orient yourself and your applications in the cloud

In the first half of this article you learned the four attributes that your code must have to take advantage of multicore computers and cloud service platforms. But once deployed to the cloud, what makes your applications soar? Appistry's Guerry Semones brings the cloud down to earth with this overview of the mechanics of scalability, reliability, load balancing, and more, in cloud computing's distributed environments.

In the first half of this article you learned about four important attributes your code needs in order to run most effectively on multicore computers, or in multi-computer environments like the cloud:

  • Atomicity
  • Statelessness
  • Idempotence
  • Parallelism

But how exactly do these features help you take advantage of cloud platforms? Applications in the cloud inherit capabilities from the underlying cloud architecture -- capabilities like scaling out horizontally, scaling up across multiple cores, availability, reliability, manageability, load balancing, and command and control. I touched on these benefits in my previous article; in this one, I'll explain how cloud platforms deliver these benefits to your code.

First, let's make sure we have a shared understanding of what exactly is meant by a cloud platform; then we'll be able to talk about the benefits of cloud computing to architects and developers.

What is a cloud platform?

First, you need to orient yourself in the cloud. Figure 1 categorizes different cloud technologies into simple architectural layers. The breakdown is not perfect, as some products may touch more than one layer, but it's a fine starting point.

Figure 1. Layers of a cloud platform

The infrastructure-as-a-service cloud

Infrastructure-oriented cloud architectures, including infrastructure-as-a-service (IAAS) offerings, provide access to virtualized, on-demand computing resources. Amazon EC2 is a well-known example of this approach. The user can request that Linux and Windows virtual machine instances be created on the fly and billed based on actual usage. The cloud infrastructure allows the user to manage virtual machines (and associated resources, like IP addresses) and their configurations. With EC2, clients do not know where the machines are physically located or what kind of hardware is being used. This is what makes the service cloud-like.

Cloud platforms vs. platform-as-a-service (PAAS)

Platform-oriented approaches to the cloud, including platform-as-a-service (PAAS) and cloud application platforms, run atop an underlying cloud infrastructure. Cloud platforms abstract applications away from the cloud infrastructure and provide supporting services and functionality to those applications. The distinction between cloud infrastructure and cloud platforms is a critical one for architects and developers to understand.

Salesforce's Force.com and Google's App Engine (GAE) both typify the PAAS approach. Google App Engine users are concerned only with the application they are creating to run on the platform. To deliver an application, they simply package it and deploy it to GAE. The deployment happens in a single step, and the end user does not know whether the application is being run on one virtual machine or 10 at any particular moment. In addition, the application can take advantage of special services provided by the GAE platform, such as authentication or data access.

Cloud application platforms, like their PAAS cousins, allow the developer to focus solely on the application deployed on the platform. Likewise, cloud application platforms offer the same or similar benefits described briefly for GAE above, such as virtualizing your application across the infrastructure, simplifying deployment, or providing special services. A key difference between some cloud application platforms and their PAAS cousins is portability across cloud infrastructures. For example, you can only deploy GAE applications on Google's services, whereas cloud application platforms like Appistry CloudIQ Platform allow for in-house private cloud deployment, as well as deployment on public cloud infrastructures. Among other differences, PAAS solutions often restrict tool choices, whereas typical cloud application platforms allow you flexibility in the choice of implementation languages, IDEs, and tools.

Ideally, you should not have to care about the underlying cloud infrastructure that runs your code. Likewise, you should not have to write application code that implements scalability, reliability, and other cloud and distributed computing features that a cloud platform could provide. Your focus should be on the business logic that brings your added value, while the cloud platform virtualizes your application, manages its lifecycle, and spreads it across the underlying cloud infrastructure. Cloud platforms take your code -- which is ideally atomic, stateless (where possible), idempotent, and parallelizable -- and do the heavy distributed computing and multicore lifting, giving you benefits that are otherwise hard to achieve on your own.

Scaling out, scaling up, and scaling down gracefully

Cloud platforms horizontally scale out your application by running it across many servers, or workers. When transaction loads are high or you anticipate the need for more throughput, you can add more workers. When loads drop, workers can be shut down (offering green dividends by reducing power use) or shunted over to another application that needs the workers now.
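Deciding when to add or remove workers is itself simple arithmetic once the work is stateless, because any worker can take any request. The following is a minimal sketch of one hypothetical scaling policy. It assumes you know, or have measured, roughly how many requests per second one worker can handle. The function name and the `min_workers`/`max_workers` bounds are illustrative, not any particular platform's API.

```python
import math


def workers_needed(requests_per_sec: float, per_worker_capacity: float,
                   min_workers: int = 1, max_workers: int = 100) -> int:
    """Hypothetical scaling policy: how many workers should be running right now?"""
    if per_worker_capacity <= 0:
        raise ValueError("per_worker_capacity must be positive")
    # Enough workers to absorb the load, rounding up so no worker is overloaded.
    wanted = math.ceil(requests_per_sec / per_worker_capacity)
    # Clamp to the pool's bounds: keep a floor for availability, a ceiling for cost.
    return max(min_workers, min(max_workers, wanted))


if __name__ == "__main__":
    for load in (950, 120, 0):
        print(load, "req/s ->", workers_needed(load, per_worker_capacity=100), "workers")
```

As load falls from 950 to 0 requests per second, the target drops from 10 workers to the floor of 1. The workers freed up by that drop are the ones a platform could power off or hand to another application.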

Why should you care if you're a developer? If you have provided the cloud platform with a well-designed application, the cloud platform should be able to scale your application for you. Therefore, you don't have to write the scalability code. In most cloud platforms, your code doesn't know it's in the cloud, much less being scaled out.

What about scaling up across multiple cores to utilize all the available processing power? The same principles apply. If your code follows the principles outlined in Part 1, then the cloud platform can automatically scale the execution of your code across whatever cores are available without you having to use any special language primitives or tools. How well a given platform does this varies from one cloud platform to another.
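The sketch below shows why statelessness makes this possible. It is an illustration of what a platform might do internally, not any platform's actual mechanism. `price_with_tax` is a hypothetical business function whose result depends only on its input. Because of that, the runner can fan its calls out across however many workers the executor provides without changing the function. The example uses `ThreadPoolExecutor`. For CPU-bound Python work you would hand the same `run_all` a `concurrent.futures.ProcessPoolExecutor` so each core gets its own interpreter, and `price_with_tax` would not change.

```python
import os
from concurrent.futures import Executor, ThreadPoolExecutor


def price_with_tax(order: dict) -> float:
    """Hypothetical stateless, atomic business logic: output depends only on input."""
    return round(order["amount"] * (1 + order["tax_rate"]), 2)


def run_all(executor: Executor, fn, items):
    """Platform-side runner: spread independent calls over the executor's workers.

    Executor.map() preserves input order, so results line up with the requests.
    """
    return list(executor.map(fn, items))


if __name__ == "__main__":
    orders = [{"amount": 100.0, "tax_rate": 0.08}, {"amount": 20.0, "tax_rate": 0.1}]
    # Size the pool to the available cores; the business function is unaware of it.
    with ThreadPoolExecutor(max_workers=os.cpu_count() or 1) as pool:
        print(run_all(pool, price_with_tax, orders))
```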

If you run stateless, atomic code on a cloud platform your application should gain the resilience and ability to scale up and down gracefully. If you need more resources, you can add more nodes, and scale out horizontally; if your cloud platform utilizes multicore efficiently, you get to scale up across cores. If one or more nodes die, availability ensures that new work will get done, and reliability ensures that in-flight work has a chance to complete. Either way, you can scale down with a degree of grace, even in the face of hardware failures.

Availability

Cloud platforms distribute your code across the cloud in different ways. Some platforms put all of your code on every worker and can execute your code on any of those workers at any given time. Other platforms specify workers for given tasks or roles. Sometimes all of a transaction will occur on one worker. Other platforms may optionally distribute even the execution of a single transaction. Regardless of the model, cloud platforms make your application code highly available by distributing and managing it across multiple workers.

When your code is atomic and stateless in nature, it can then reside wherever the cloud platform puts it in the cloud. In an ideal setup, the code can execute anywhere without you or the code having to think about it. At its root, this means that you automatically have high availability. If a given compute node dies, who cares? The other nodes have the code and can fulfill transactions.
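This sketch illustrates the "who cares?" point. Assume every worker holds the same stateless code. A dispatcher can then send each request to any live worker and simply skip nodes that have died. The `Dispatcher` class and its round-robin policy are hypothetical stand-ins for whatever routing a real platform uses.

```python
import itertools


class Dispatcher:
    """Hypothetical round-robin dispatcher over interchangeable workers."""

    def __init__(self, worker_names):
        self.alive = {name: True for name in worker_names}
        self._cycle = itertools.cycle(worker_names)

    def mark_dead(self, name):
        # In a real platform this would come from heartbeat or health-check failures.
        self.alive[name] = False

    def pick(self):
        # Try each worker at most once per call; any live one can serve the request,
        # because every worker has the same code and keeps no per-client state.
        for _ in range(len(self.alive)):
            name = next(self._cycle)
            if self.alive[name]:
                return name
        raise RuntimeError("no live workers")
```

If worker `b` dies, requests keep flowing to `a` and `c` with no change to the application code.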

Reliability

What do I mean by reliability? Say you request code to execute, and something bad happens. If your code is reliable, the requested work still gets done; at the very least, the environment does its best to complete it instead of just giving up -- or, worse, losing the work entirely.

There are a number of models for attaining reliable execution in cloud platform environments. If the cloud platform is designed to provide reliability to your code, then you'll likely be allowed to declaratively configure (outside your code) how you want reliability to behave at runtime. Without a cloud platform that virtualizes and watches over your application, writing reliable, distributed applications from the ground up is a lot of work to do yourself.

Figure 2 illustrates one reliability model that directly shows the benefits of atomic, stateless, and idempotent code. Say you've requested that your code execute in the cloud, and a failure occurs. Perhaps the worker doing the work suffers a power supply failure. The cloud platform detects the loss of work, and, depending on packaging-time configuration, retries that work on a different worker instead of returning the failure immediately to the requester. The cloud platform then retries that work until success is achieved, or until some configured threshold is met and failure is returned.

Figure 2. Cloud platform retries failed work reliably.
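The retry model in Figure 2 can be sketched in a few lines, assuming the task is idempotent and safe to run again. `execute_with_retry`, `WorkerFailed`, and the idea of modeling workers as callables are all hypothetical. `max_attempts` plays the role of the threshold you would set at packaging time, outside the application code.

```python
class WorkerFailed(Exception):
    """Simulates losing a worker (for example, a power supply failure) mid-task."""


def execute_with_retry(task, payload, workers, max_attempts=3):
    """Hypothetical platform-side retry loop.

    `workers` is an ordered list of callables standing in for machines; each takes
    (task, payload). `max_attempts` is the configured retry threshold.
    """
    last_error = None
    for worker in workers[:max_attempts]:
        try:
            return worker(task, payload)
        except WorkerFailed as err:
            # The work was lost on this worker; try it again on the next one.
            last_error = err
    raise RuntimeError("retry threshold reached; reporting failure to the requester") from last_error
```

Only infrastructure failures (`WorkerFailed`) trigger a retry here. An exception raised by the task itself propagates immediately, since rerunning a deterministic bug on another machine would only fail again.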

If your code has the attributes of atomicity, statelessness, and idempotence, you have far more options for achieving reliability, especially if the environment provides that functionality for you. Without these attributes, your options narrow. For example, consider atomicity in the reliability model just discussed. If the executed code bundles multiple non-atomic steps, the complexity of retrying those steps goes way up. Likewise, if the code is a long-running series of steps rather than stand-alone atomic steps, a retry must rerun the entire series when a failure happens, instead of just picking up at the step that failed.
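To make the contrast concrete, here is a minimal sketch of work broken into atomic steps whose results are checkpointed. A retry then resumes at the step that failed instead of rerunning the whole series. `run_steps` is hypothetical, and the `checkpoints` dict stands in for what would need to be durable, shared storage in a real deployment.

```python
def run_steps(steps, initial, checkpoints):
    """Run named atomic steps in order, reusing results recorded by earlier attempts.

    steps:       list of (name, fn) pairs; each fn takes the previous step's output.
    checkpoints: hypothetical durable record of finished steps (name -> output).
    """
    value = initial
    for name, step in steps:
        if name in checkpoints:
            # Finished on an earlier attempt: reuse its result instead of redoing it.
            value = checkpoints[name]
            continue
        value = step(value)
        # Record the result before moving on, so a later failure won't repeat this step.
        checkpoints[name] = value
    return value
```

If the second of three steps fails, the retry skips the first step and starts at the second, as in the test below.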

Another approach to reliability in the cloud (besides retries and other approaches already discussed) is to execute duplicates of the same task in parallel. The task that completes first is accepted by the client, or the results of both are analyzed and one is chosen. This is illustrated in Figure 3.

Figure 3. A cloud platform executes the same task twice in parallel; the first to complete wins.
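The "first to complete wins" model in Figure 3 can be sketched with Python's standard `concurrent.futures`. Each zero-argument callable in `copies` represents the same task dispatched to a different worker. `first_successful` is a hypothetical name. The approach only makes sense for idempotent tasks, because every copy may run to completion. `cancel_futures` requires Python 3.9 or later.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_successful(copies):
    """Run duplicate copies of one task in parallel; return the first result that succeeds."""
    pool = ThreadPoolExecutor(max_workers=len(copies))
    futures = [pool.submit(copy) for copy in copies]
    errors = []
    try:
        for fut in as_completed(futures):
            try:
                return fut.result()
            except Exception as err:
                # This copy failed (say, its node was lost); keep waiting for the others.
                errors.append(err)
        raise RuntimeError(f"all {len(copies)} copies failed") from errors[-1]
    finally:
        # Don't wait for slower copies. Threads that have already started cannot be
        # killed, so they finish in the background and their results are discarded.
        pool.shutdown(wait=False, cancel_futures=True)
```

The alternative mentioned above, comparing the results of both copies and choosing one, would instead wait for every future and pass the results to a selection function.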

Of course, not all code is idempotent and repeatable, often because it affects state of some sort. In such a case, the cloud platform needs to be able to deal with that, preferably in an application-configurable manner. You'll see some possible solutions later in this article.
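Ahead of that discussion, here is one widely used technique, shown purely as an illustration and not necessarily what a given platform does. The caller attaches a unique request ID, and the service remembers which IDs it has already applied. A retried request then returns the original result instead of changing state twice. `Ledger` and `credit` are hypothetical. In a real distributed system, the record of applied IDs would have to be durable and shared, and the check-and-update would have to be atomic.

```python
class Ledger:
    """Hypothetical non-idempotent operation made safe to retry with request IDs."""

    def __init__(self):
        self.balance = 0
        self._applied = {}  # request_id -> result returned the first time

    def credit(self, request_id: str, amount: int) -> int:
        if request_id in self._applied:
            # A retry of work that already succeeded: don't credit the account twice.
            return self._applied[request_id]
        self.balance += amount
        self._applied[request_id] = self.balance
        return self.balance
```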

Manageability

Even as developers, we are affected by how difficult or easy it is to deploy and manage code in the runtime environment. When the runtime environment, even in development and testing, is distributed across multiple servers, the complexity and time to manage the application goes up dramatically. Cloud platforms take this into account -- more often than not because the developers that are creating and maintaining the cloud platform are affected by the same complexities!

Some cloud platforms allow you to code and test your application on one box rather than many, and some cloud application platforms allow you to develop most or all of your applications outside the cloud platform with your normal development and testing tools. (This is not true for many platform-as-a-service environments.)

Beyond this point, there are varying levels of difficulty in deploying and managing your application on the various cloud platforms. The worst-case scenario arises all too often, where you must manually deploy to each server or virtual machine directly, as illustrated in Figure 4.

Figure 4. Manually managing individual servers or virtual machines in the cloud.

(In the discussion that follows, I'll be focusing on those feature sets that I consider easiest to deal with. Your mileage will vary based on the cloud platform you choose.)

Imagine that you have some code ready to run. Typically, you will package the application in some way, bundling with it configuration information that tells the cloud platform how you want the application managed. Next, you will deploy that application into the cloud platform with a single command. Some (but not all) cloud platforms will automatically distribute your application to all of its workers (or some workers, depending on the platform's model), and get your application up and running, as shown in Figure 5. You're done -- now use your client and access your cloud application.

Figure 5. Managing a cloud of servers or virtual machines as a single entity, and with a fine degree of granularity.

Subsequent versions of your code are handled the same way. You will usually repackage the code and redeploy it, probably with some mechanism for package versioning. The cloud platform will update the code for you.
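The package-then-deploy flow can be modeled in miniature. In this sketch, `Package`, `Worker`, and `deploy` are hypothetical and do not represent any real platform's API. A package carries its code's name and version along with management settings such as a retry threshold or target roles. A single `deploy` call pushes it to every eligible worker, and redeploying a new version is the same call.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Package:
    """Hypothetical application package: code identity plus management settings."""
    name: str
    version: str
    max_retries: int = 3  # read by the platform at runtime, never by the app code
    roles: tuple = ()     # empty means "run on every worker"


@dataclass
class Worker:
    name: str
    roles: set = field(default_factory=set)
    installed: dict = field(default_factory=dict)  # package name -> version


def deploy(package: Package, workers: list) -> list:
    """One call distributes the package to all eligible workers; returns their names."""
    targets = [w for w in workers if not package.roles or w.roles & set(package.roles)]
    for w in targets:
        # Installing a newer version simply replaces the old entry on that worker.
        w.installed[package.name] = package.version
    return [w.name for w in targets]
```

The `roles` field mirrors the two distribution models described under "Availability": an empty tuple puts the code on every worker, while a non-empty one limits it to workers assigned to those roles.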

Another difficulty arises from managing things at the granularity of a virtual machine image. In typical infrastructure-as-a-service cloud environments, you create a virtual machine image and populate that image with services, applications, and configurations. If something changes in your application's configuration, you must revise the virtual machine image, a time-consuming and error-prone process. The term image sprawl aptly describes the growing pool of images that results from this model.
