This installment of Open source Java projects introduces Java developers to Docker Swarm. You'll learn why so many enterprise shops have adopted container-managed development via Docker, and why clustering is an important technique for working with Docker containers. You'll also find out how two popular Docker clustering technologies--Amazon ECS and Docker Swarm--compare, and get a quick guide to choosing the right solution for your shop or project. The introduction concludes with a hands-on demonstration of using Docker Swarm to develop and manage a two-node enterprise cluster.
What's the deal with Docker?
Docker is an open platform for building, shipping, and running distributed applications. Dockerized applications can run locally on a developer's machine, and they can be deployed to production across a cloud-based infrastructure. Docker lends itself to rapid development and enables continuous integration and continuous deployment like almost no other technology does. Because of these features, it's a platform that every developer should know how to use.
It's essential to understand that Docker is a containerization technology, not a virtualization technology. Whereas a virtual machine contains a complete operating system and is managed by a heavyweight process called a hypervisor, a container is designed to be very lightweight and self-contained. Each server runs a daemon process called a Docker engine that runs containers and translates operating system calls inside the container into native calls on the host operating system. A container, which is analogous to a virtual machine, only much smaller, hosts your application, runtime environment, and a barebones operating system. Containers typically run on virtual machines. Whereas a virtual machine can take minutes to start up, a container can do it in seconds.
Figure 1 illustrates the difference between a container and a virtual machine.
Docker containers are self-contained, which means that they include everything that they need to run your application. For example, for a web application running in Tomcat, the container would include:
- A WAR file
- Tomcat
- The JVM
- The base operating system
Figure 2 shows the architecture of a web app inside a Docker container.
In the case of Docker, each virtual machine runs a daemon process called the Docker engine. You build your application, such as your WAR file, and then create a corresponding Dockerfile. A Dockerfile is a text file that describes how to build a Docker image, which is a binary file containing everything needed to run the application. As an example, you could build a Dockerfile from a Tomcat base image containing a base Linux OS, Java runtime, and Tomcat. After instructing Docker to copy a WAR file to Tomcat's webapps directory, the Dockerfile would be compiled into a Docker image consisting of the base OS, JVM, Tomcat, and your WAR file. You can run the Docker image locally, but you will ultimately publish it to a Docker repository, like DockerHub.
While a Docker image is a binary version of your container, a runtime instance of a Docker image is called a Docker container. Docker containers are run by your Docker engine. The machine that runs your Docker engine is called the Docker host; this could be your local laptop or a cloud platform, depending on the scale of your application.
The basics in this section provide a foundation for understanding why clustering is an important addition to your Docker toolkit. See my introduction to Docker for more.
Most developers getting started with Docker will write a Dockerfile, build an image from it, and run the resulting container locally on a laptop. But there's more to container-managed development than running individual Docker containers locally. Docker's superpower is its ability to dynamically scale containers up or down. In production, this means running Docker in a cluster across a host of machines or virtual machines.
Various Docker clustering technologies are available, but the two most popular are Amazon EC2 Container Service (ECS) and Docker Swarm.
Amazon's Docker clustering technology leverages Amazon Web Services (AWS) to create a cluster of virtual machines that can run Docker containers. An ECS cluster consists of managed ECS instances, which are EC2 instances with a Docker engine and an ECS agent. ECS uses an autoscaling group to expand and contract the number of instances based on CloudWatch policies. For example, when the average CPU usage of the ECS instances is too high, you can request ECS to start more instances, up to the maximum number of instances defined in the autoscaling group.
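The scaling rule described above can be sketched as a small decision function. This is a toy model only, not the AWS API: the class name, the method, and the threshold values are all hypothetical, and a real CloudWatch policy would be configured in AWS rather than written in application code.

```java
// Toy model of a CPU-based scaling decision; not the AWS API.
final class AutoscalingSketch {

    /**
     * Returns the instance count the group should move to.
     * The thresholds are hypothetical values chosen for illustration.
     */
    static int desiredInstances(int current, double avgCpuPercent,
                                double scaleOutAbove, double scaleInBelow,
                                int min, int max) {
        int desired = current;
        if (avgCpuPercent > scaleOutAbove) {
            desired = current + 1;   // CPU too high: add one instance
        } else if (avgCpuPercent < scaleInBelow) {
            desired = current - 1;   // CPU low: remove one instance
        }
        // Never leave the bounds defined by the autoscaling group.
        return Math.max(min, Math.min(max, desired));
    }

    public static void main(String[] args) {
        // Group bounded to 2..4 instances; scale out above 75% CPU, in below 25%.
        System.out.println(desiredInstances(2, 90.0, 75.0, 25.0, 2, 4)); // 3
        System.out.println(desiredInstances(4, 90.0, 75.0, 25.0, 2, 4)); // 4, capped at max
        System.out.println(desiredInstances(2, 10.0, 75.0, 25.0, 2, 4)); // 2, held at min
    }
}
```

The clamp on the last line of the method is the part that corresponds to "up to the maximum number of instances defined in the autoscaling group."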
Docker containers are managed by an ECS service and configured by the amount of compute capacity (CPU) and RAM that the container needs to run. The ECS service has an associated Elastic Load Balancer (ELB). As it starts and stops Docker containers, the ECS service registers and deregisters those containers with the ELB. Once you've set up the rules for your cluster, Amazon ECS ensures that you have the desired number of containers running and those containers are all accessible through the ELB. Figure 3 shows a high-level view of Amazon ECS.
It is important to distinguish between ECS instances and tasks. The ECS cluster manages your ECS instances, which are special EC2 instances that run in an autoscaling group. The ECS service manages the tasks, which can contain one or more Docker containers, and which run on the cluster. An ELB sits in front of the ECS instances that are running your Docker containers and distributing load to your Docker containers. The relationship between ECS tasks and Docker containers is that a task definition tells the ECS service which Docker containers to run and the configuration of those containers. The ECS service runs the task, which starts the Docker containers.
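To make the relationship between task definitions, the service, and the load balancer more concrete, here is a minimal sketch in plain Java. Every class and method name is hypothetical and nothing here calls AWS; it only models the bookkeeping described above, where the service starts or stops tasks to reach a desired count and registers or deregisters them with the ELB as it goes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy model of an ECS-style service; all names are hypothetical.
final class ServiceSketch {

    /** Describes what a task runs: one or more container images. */
    static final class TaskDefinition {
        final String family;
        final List<String> containerImages;

        TaskDefinition(String family, List<String> containerImages) {
            this.family = family;
            this.containerImages = containerImages;
        }
    }

    /** Stand-in for the ELB: just the set of registered task ids. */
    static final class LoadBalancer {
        final Set<String> targets = new LinkedHashSet<>();
    }

    /** Keeps the desired number of tasks running and mirrors them in the ELB. */
    static final class Service {
        private final TaskDefinition definition;
        private final LoadBalancer elb;
        private final List<String> runningTasks = new ArrayList<>();
        private int nextId = 1;

        Service(TaskDefinition definition, LoadBalancer elb) {
            this.definition = definition;
            this.elb = elb;
        }

        void reconcile(int desiredCount) {
            while (runningTasks.size() < desiredCount) {   // start missing tasks
                String id = definition.family + "-" + nextId++;
                runningTasks.add(id);
                elb.targets.add(id);                       // register with the ELB
            }
            while (runningTasks.size() > desiredCount) {   // stop surplus tasks
                String id = runningTasks.remove(runningTasks.size() - 1);
                elb.targets.remove(id);                    // deregister from the ELB
            }
        }

        /** Each running task starts every container in its definition. */
        int containerCount() {
            return runningTasks.size() * definition.containerImages.size();
        }
    }

    public static void main(String[] args) {
        LoadBalancer elb = new LoadBalancer();
        Service service = new Service(new TaskDefinition("web", List.of("nginx")), elb);
        service.reconcile(3);
        System.out.println(elb.targets);   // [web-1, web-2, web-3]
        service.reconcile(1);
        System.out.println(elb.targets);   // [web-1]
    }
}
```

Note that the sketch leaves out the ECS instances themselves; in the real service, tasks are placed on instances in the cluster, and the cluster is scaled separately by the autoscaling group.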
Docker's native clustering technology, Docker Swarm, allows you to run multiple Docker containers across a cluster of virtual machines. Docker Swarm defines a manager container, running on a virtual machine, that manages the environment, deploys containers to the various agents, and reports the container status and deployment information for the cluster.
When running a Docker Swarm, the manager is the primary interface into Docker. Agents are "docker machines" running on virtual machines that register themselves with the manager and run Docker containers. When the client sends a request to the manager to start a container, the manager finds an available agent to run it. It uses a least-utilized algorithm to ensure that the agent running the fewest containers will run the newly requested container. Figure 4 shows a sample Docker Swarm configuration, which you'll develop in the next section.
The manager process knows about all the active agents and the containers running on those agents. When the agent virtual machines start up, they register themselves with the manager and are then available to run Docker containers. The example in Figure 4 has two agents (Agent1 and Agent2) that are registered with the manager. Each agent is running two Nginx containers.
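The registration and placement behavior just described can be modeled in a few lines of Java. This is a sketch under stated assumptions, not Docker Swarm's own code: the class and method names are hypothetical, the only placement rule is "fewest containers wins" as described above, and ties go to whichever agent registered first. A real scheduler may weigh other factors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model of the manager/agent relationship; not Docker Swarm's actual code.
final class SwarmManagerSketch {
    // Agent name -> images of the containers it runs, in registration order.
    private final Map<String, List<String>> agents = new LinkedHashMap<>();

    /** Called when an agent machine starts up and announces itself. */
    void register(String agentName) {
        agents.putIfAbsent(agentName, new ArrayList<>());
    }

    /** Places a container on the agent currently running the fewest containers. */
    String startContainer(String image) {
        String chosen = null;
        for (Map.Entry<String, List<String>> entry : agents.entrySet()) {
            // Strict "<" means the earliest-registered agent wins a tie.
            if (chosen == null || entry.getValue().size() < agents.get(chosen).size()) {
                chosen = entry.getKey();
            }
        }
        if (chosen == null) {
            throw new IllegalStateException("no agents registered");
        }
        agents.get(chosen).add(image);
        return chosen;
    }

    int containerCount(String agentName) {
        return agents.get(agentName).size();
    }

    public static void main(String[] args) {
        SwarmManagerSketch manager = new SwarmManagerSketch();
        manager.register("agent1");
        manager.register("agent2");
        for (int i = 0; i < 4; i++) {
            System.out.println("nginx -> " + manager.startContainer("nginx"));
        }
    }
}
```

Starting four Nginx containers against two registered agents alternates between them, so each agent ends up with two containers, the same layout as Figure 4.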
Docker Swarm vs Amazon ECS
This article features Docker Swarm, but it's useful to compare container technologies. Whereas Amazon ECS offers a well-developed turnkey solution, Docker Swarm gives you the freedom to configure more of your own infrastructure. As an example, Amazon ECS manages both containers and load balancers, while in Docker Swarm you would configure a load balancing solution such as Cisco LocalDirector, F5 BIG-IP, or an Apache or Nginx software process.
If you're already running your app in AWS, then ECS makes it much easier to run and manage Docker containers than an external solution would. As an AWS developer, you're probably already leveraging autoscaling groups, ELBs, virtual private clouds (VPC), identity and access management (IAM) roles and policies, and so forth. ECS integrates well with all of them, so it's the way to go. But if you aren't running in AWS, then Docker Swarm's tight integration with the Docker tools makes it a great choice.
Getting Started with Docker Swarm
In the previous section you saw a sample architecture for a two-node Docker Swarm cluster. Now you'll develop that cluster using two Nginx Docker container instances. Nginx is a popular web server, publicly available as a Docker image on DockerHub. Because this article is focused on Docker Swarm, I wanted to use a Docker container that is quick and easy to start and straightforward to test. You are free to use any Docker container you wish, but for illustrative purposes I chose Nginx for this example.
My introduction to Docker includes a guide to setting up Docker in your development environment. If you've installed and set up the Docker Toolbox, then it includes everything that you need to run Docker Swarm. See Docker's official documentation for further setup instructions.
Docker Swarm on the command line
If you've previously used Docker, then you're familiar with using the docker command line to start and stop containers. When using Docker Swarm, you'll trade docker for docker-machine. Docker Machine is defined as follows in the Docker documentation:
Docker Machine is a tool that lets you install Docker Engine on virtual hosts, and manage the hosts with docker-machine commands. You can use Machine to create Docker hosts on your local Mac or Windows box, on your company network, in your data center, or on cloud providers like AWS or Digital Ocean. Using docker-machine commands, you can start, inspect, stop, and restart a managed host, upgrade the Docker client and daemon, and configure a Docker client to talk to your host.
If you've installed Docker then your installation already includes Docker Machine. To get started with Docker Swarm, start Docker and open a terminal on your computer. Execute the following docker-machine ls command to list all the VMs on your local machine:
$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default   *        virtualbox   Running   tcp://192.168.99.100:2376
If you've only run Docker from your local machine, then you should have the default Docker virtual machine running with an IP address of 192.168.99.100. To conserve resources on your local machine you can stop this virtual machine by executing: docker-machine stop default.
Create a swarm
A Docker swarm consists of two or more virtual machines running Docker instances. For this demo, we'll create three new virtual machines: manager, agent1, and agent2. Create your virtual machines using the docker-machine create command:
$ docker-machine create -d virtualbox manager
$ docker-machine create -d virtualbox agent1
$ docker-machine create -d virtualbox agent2
The docker-machine create command creates a new "machine." Passing it the -d argument lets you specify the driver to use to create the machine. Running locally, that should be virtualbox. The first machine created is the manager, which will host the manager process. The last two machines, agent1 and agent2, are the agent machines that will host the agent processes.
At this point, you've created the virtual machines but you haven't created the actual Swarm manager or agents. To view the virtual machines and their state, execute the docker-machine ls command: