Jul 12, 2016 12:41 PM PT

Clustering with Docker Swarm

Build a two-node enterprise cluster with Docker Swarm

This installment of Open source Java projects introduces Java developers to Docker Swarm. You'll learn why so many enterprise shops have adopted container-managed development via Docker, and why clustering is an important technique for working with Docker containers. You'll also find out how two popular Docker clustering technologies, Amazon ECS and Docker Swarm, compare, and get a quick guide to choosing the right solution for your shop or project. The introduction concludes with a hands-on demonstration of using Docker Swarm to develop and manage a two-node enterprise cluster.

What's the deal with Docker?

Docker is an open platform for building, shipping, and running distributed applications. Dockerized applications can run locally on a developer's machine, and they can be deployed to production across a cloud-based infrastructure. Docker lends itself to rapid development and enables continuous integration and continuous deployment like almost no other technology does. Because of these features, it's a platform that every developer should know how to use.

It's essential to understand that Docker is a containerization technology, not a virtualization technology. Whereas a virtual machine contains a complete operating system and is managed by a heavyweight process called a hypervisor, a container is designed to be lightweight and self-contained. Each server runs a daemon process called a Docker engine, which runs containers and translates operating system calls inside a container into native calls on the host operating system. A container is analogous to a virtual machine, only much smaller: it hosts your application, its runtime environment, and a barebones operating system. Containers typically run on virtual machines, but whereas a virtual machine can take minutes to start up, a container can do it in seconds.

Figure 1 illustrates the difference between a container and a virtual machine.

Figure 1. Docker vs a virtual machine

Docker containers are self-contained, which means that they include everything that they need to run your application. For example, for a web application running in Tomcat, the container would include:

  • A WAR file
  • Tomcat
  • JVM
  • The base operating system

Figure 2 shows the architecture of a web app inside a Docker container.

Figure 2. A Tomcat app running in a container

In the case of Docker, each virtual machine runs a daemon process called the Docker engine. You build your application, such as your WAR file, and then create a corresponding Dockerfile. A Dockerfile is a text file that describes how to build a Docker image, which is a binary file containing everything needed to run the application. As an example, you could build a Dockerfile from a Tomcat base image containing a base Linux OS, Java runtime, and Tomcat. After instructing Docker to copy a WAR file to Tomcat's webapps directory, the Dockerfile would be compiled into a Docker image consisting of the base OS, JVM, Tomcat, and your WAR file. You can run the Docker image locally, but you will ultimately publish it to a Docker repository, like DockerHub.
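As a sketch, a Dockerfile for the Tomcat scenario just described might look like the following; the tomcat:8-jre8 tag and the myapp.war file name are placeholders for your own choices, not values from this article's example:

```dockerfile
# Start from an official Tomcat base image (tag is illustrative):
# it already contains a base Linux OS, a Java runtime, and Tomcat.
FROM tomcat:8-jre8

# Copy the application WAR into Tomcat's webapps directory so Tomcat
# deploys it on startup. "myapp.war" is a placeholder artifact name.
COPY myapp.war /usr/local/tomcat/webapps/

# Tomcat listens on port 8080 by default.
EXPOSE 8080
```

You would compile this into an image with a command like docker build -t myapp . and then run or publish the resulting image.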

While a Docker Image is a binary version of your container, a runtime instance of a Docker Image is called a Docker container. Docker containers are run by your Docker engine. The machine that runs your Docker engine is called the Docker host; this could be your local laptop or a cloud platform, depending on the scale of your application.

The basics in this section provide a foundation for understanding why clustering is an important addition to your Docker toolkit. See my introduction to Docker for more.

Clustering Docker

Most developers getting started with Docker build a Dockerfile and run it locally on a laptop. But there's more to container-managed development than running individual Docker containers locally. Docker's superpower is its ability to dynamically scale containers up or down, and in production that means running Docker in a cluster across a host of machines or virtual machines.

Various Docker clustering technologies are available, but the two most popular are Amazon EC2 Container Service (ECS) and Docker Swarm.

Amazon ECS

Amazon's Docker clustering technology leverages Amazon Web Services (AWS) to create a cluster of virtual machines that can run Docker containers. An ECS cluster consists of managed ECS instances, which are EC2 instances with a Docker engine and an ECS agent. ECS uses an autoscaling group to expand and contract the number of instances based on CloudWatch policies. For example, when the average CPU usage of the ECS instances is too high, you can request ECS to start more instances, up to the maximum number of instances defined in the autoscaling group.

Docker containers are managed by an ECS service and configured by the amount of compute capacity (CPU) and RAM that the container needs to run. The ECS service has an associated Elastic Load Balancer (ELB). As it starts and stops Docker containers, the ECS service registers and deregisters those containers with the ELB. Once you've set up the rules for your cluster, Amazon ECS ensures that you have the desired number of containers running and those containers are all accessible through the ELB. Figure 3 shows a high-level view of Amazon ECS.

Figure 3. High-level ECS overview

It is important to distinguish between ECS instances and tasks. The ECS cluster manages your ECS instances, which are special EC2 instances that run in an autoscaling group. The ECS service manages the tasks, which can contain one or more Docker containers and which run on the cluster. An ELB sits in front of the ECS instances that run your Docker containers and distributes load to them. The relationship between ECS tasks and Docker containers is straightforward: a task definition tells the ECS service which Docker containers to run and how to configure them, and the ECS service runs the task, which starts the Docker containers.
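The task definition that ties these pieces together is a JSON document registered with ECS. A minimal single-container sketch might look like the following; the family name, image, and resource values are illustrative, not taken from a real deployment:

```json
{
  "family": "webapp-task",
  "containerDefinitions": [
    {
      "name": "webapp",
      "image": "nginx:latest",
      "cpu": 256,
      "memory": 512,
      "portMappings": [
        { "containerPort": 80, "hostPort": 80 }
      ],
      "essential": true
    }
  ]
}
```

The ECS service references a task definition like this one and keeps the desired number of copies of it running on the cluster.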

Docker Swarm

Docker Swarm, Docker's native clustering technology, allows you to run multiple Docker containers across a cluster of virtual machines. Docker Swarm defines a manager container, running on a virtual machine, that manages the environment, deploys containers to the various agents, and reports container status and deployment information for the cluster.

When running a Docker Swarm, the manager is the primary interface into Docker. Agents are "docker machines" running on virtual machines that register themselves with the manager and run Docker containers. When the client sends a request to the manager to start a container, the manager finds an available agent to run it, using a least-utilized algorithm: the agent running the fewest containers receives the newly requested container. Figure 4 shows a sample Docker Swarm configuration, which you'll develop in the next section.

Figure 4. A Docker Swarm configuration

The manager process knows about all the active agents and the containers running on those agents. When the agent virtual machines start up, they register themselves with the manager and are then available to run Docker containers. The example in Figure 4 has two agents (Agent1 and Agent2) that are registered with the manager. Each agent is running two Nginx containers.
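The least-utilized placement described above is easy to model: among the registered agents, pick the one running the fewest containers. Here is a toy sketch in shell; the agent names and container counts are made up, and a real Swarm manager of course queries each agent's Docker engine rather than taking counts as arguments:

```shell
#!/bin/sh
# Toy model of least-utilized placement: given "name:count" pairs,
# print the name of the agent running the fewest containers.
# Ties go to the earlier-listed agent because of the strict "-lt" test.
pick_agent() {
  best=""
  best_count=999999
  for entry in "$@"; do
    name=${entry%%:*}
    count=${entry##*:}
    if [ "$count" -lt "$best_count" ]; then
      best=$name
      best_count=$count
    fi
  done
  echo "$best"
}

pick_agent agent1:2 agent2:1   # prints "agent2"
```

In the Figure 4 scenario, once both agents hold two containers each, the next request would land back on the first-registered agent.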

Docker Swarm vs Amazon ECS

This article features Docker Swarm, but it's useful to compare container technologies. Whereas Amazon ECS offers a well-developed turnkey solution, Docker Swarm gives you the freedom to configure more of your own infrastructure. For example, Amazon ECS manages both containers and load balancers, while in Docker Swarm you would configure your own load-balancing solution, such as Cisco LocalDirector, F5 BigIP, or an Apache or Nginx software process.

If you're already running your app in AWS, then ECS makes it much easier to run and manage Docker containers than an external solution would. As an AWS developer, you're probably already leveraging autoscaling groups, ELBs, virtual private clouds (VPC), identity and access management (IAM) roles and policies, and so forth. ECS integrates well with all of them, so it's the way to go. But if you aren't running in AWS, then Docker Swarm's tight integration with the Docker tools makes it a great choice.

Getting Started with Docker Swarm

In the previous section you saw a sample architecture for a two-node Docker Swarm cluster. Now you'll develop that cluster using two Nginx Docker container instances. Nginx is a popular web server, publicly available as a Docker image on DockerHub. Because this article is focused on Docker Swarm, I wanted a container that is quick and easy to start and straightforward to test. You are free to use any Docker container you wish, but for illustrative purposes I chose Nginx.

My introduction to Docker includes a guide to setting up Docker in your development environment. If you've installed and set up the Docker Toolbox, you have everything you need to run Docker Swarm. See Docker's official documentation for further setup instructions.

Docker Swarm on the command line

If you've previously used Docker, then you're familiar with using the docker command line to start and stop containers. When using Docker Swarm, you'll add docker-machine alongside docker. Docker Machine is defined as follows in the Docker documentation:

Docker Machine is a tool that lets you install Docker Engine on virtual hosts, and manage the hosts with docker-machine commands. You can use Machine to create Docker hosts on your local Mac or Windows box, on your company network, in your data center, or on cloud providers like AWS or Digital Ocean. Using docker-machine commands, you can start, inspect, stop, and restart a managed host, upgrade the Docker client and daemon, and configure a Docker client to talk to your host.

If you've installed Docker then your installation already includes Docker Machine. To get started with Docker Swarm, start Docker and open a terminal on your computer. Execute the following docker-machine ls command to list all the VMs on your local machine:


$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default   *        virtualbox   Running   tcp://192.168.99.100:2376

If you've only run Docker from your local machine, then you should have the default Docker virtual machine running with an IP address of 192.168.99.100. To conserve resources on your local machine you can stop this virtual machine by executing: docker-machine stop default.

Create a swarm

A Docker swarm consists of two or more virtual machines running Docker instances. For this demo, we'll create three new virtual machines: manager, agent1, and agent2. Create your virtual machines using the docker-machine create command:

$ docker-machine create -d virtualbox manager
$ docker-machine create -d virtualbox agent1
$ docker-machine create -d virtualbox agent2

The docker-machine create command creates a new "machine." Passing it the -d argument lets you specify the driver to use to create the machine. Running locally, that should be virtualbox. The first machine created is the manager, which will host the manager process. The last two machines, agent1 and agent2, are the agent machines that will host the agent processes.

At this point, you've created the virtual machines but you haven't created the actual Swarm manager or agents. To view the virtual machines and their state execute the docker-machine ls command:



$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
agent1    -        virtualbox   Running   tcp://192.168.99.101:2376           v1.11.1
agent2    -        virtualbox   Running   tcp://192.168.99.102:2376           v1.11.1
default   -        virtualbox   Stopped                                       Unknown
manager   *        virtualbox   Running   tcp://192.168.99.100:2376           v1.11.1

You now have three running machines: manager, agent1, and agent2, as well as one stopped machine: default. Note the asterisk in the manager's ACTIVE column: it means that all Docker commands will be sent to the manager. You'll learn how to change the active machine a little later, when you set up the environment.

Create a discovery token

Next you'll need to obtain a Swarm discovery token. The discovery token is a unique identifier for your Swarm cluster. You'll use it to start your manager and to connect your agents to your manager. This discovery token should only be used in test environments; production deployments are a little more complex. Create a discovery token by running the Swarm container and passing it the create command, as follows:



$ docker run --rm swarm create
Unable to find image 'swarm:latest' locally
latest: Pulling from library/swarm
eada7ab697d2: Pull complete
afaf40cb2366: Pull complete
7495da266907: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:12e3f7bdb86682733adf5351543487f581e1ccede5d85e1d5e0a7a62dcc88116
Status: Downloaded newer image for swarm:latest
7c14cbf2a86ecd490a7ea7ae4b795a6b

Notes about this command:

  • The docker run command launches the specified Docker image, which in this case is swarm, or more specifically swarm:latest.
  • The command that you've passed to the Swarm container is create, which is defined on the container and tells the Swarm application to connect to the DockerHub discovery service and retrieve a unique Swarm ID, the discovery token.
  • The --rm argument tells Docker to automatically remove the container when it exits. This command can be read: run the latest version of the swarm container, execute the create command, and, when it completes, remove the swarm container from the local machine.
  • The last line in the output is the discovery token, which in this example is 7c14cbf2a86ecd490a7ea7ae4b795a6b.

Save your discovery token in a safe place: you'll need it for the next step.
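In practice it's convenient to capture the discovery token in a shell variable so that later commands can reference it instead of a pasted hex string. A small sketch follows; the docker invocation is commented out, and the sample value is the token from the output above:

```shell
# With Docker running, capture the token directly:
# TOKEN=$(docker run --rm swarm create)

# Sample token from the output above, for illustration:
TOKEN=7c14cbf2a86ecd490a7ea7ae4b795a6b

# Later commands can then use token://$TOKEN instead of the raw string.
echo "token://$TOKEN"   # prints "token://7c14cbf2a86ecd490a7ea7ae4b795a6b"
```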

Run the Swarm manager and agents

Next you'll start the Swarm manager and create agents to join your Swarm cluster. Both activities are accomplished by launching the swarm container and passing different arguments. The manager is already the "active" machine. Create the Swarm cluster manager with the following command:



$ docker run -d -p 3376:3376 -t -v ~/.docker/machine/machines/manager:/certs:ro swarm manage \
    -H 0.0.0.0:3376 \
    --tlsverify \
    --tlscacert=/certs/ca.pem \
    --tlscert=/certs/server.pem \
    --tlskey=/certs/server-key.pem \
    token://7c14cbf2a86ecd490a7ea7ae4b795a6b

This command runs the swarm container with the following configuration:

  • -d (or --detach): run the swarm container in background and print its container ID after it starts
  • -t: allocate a pseudo-TTY terminal output
  • -p: map port 3376 in the swarm container to port 3376 on the Docker host (here, the manager virtual machine). This is the port you'll point your docker client at later
  • -v: mount the local volume (~/.docker/machine/machines/manager) on the container at the specified location (/certs) with read-only access (ro)

Depending on your version of Docker and your operating system, your certificates directory may be in a different location; this was the only problem that I ran into when starting the Swarm manager. On a Mac, Docker creates a .docker folder in your home directory, and when you created your manager virtual machine, its configuration was written under machine/machines/manager. If Swarm complains at startup that it cannot find files such as server.pem or server-key.pem, locate those files in your own Docker configuration and update your volume mounting accordingly. The remaining arguments are:

  • manage: you've seen that the swarm Docker container has a create argument that starts the Swarm container, connects to DockerHub to obtain a discovery token, and then exits. Here you can see the manage argument in use. The manage argument tells the swarm container to run in "manage" mode, which essentially means that it will start your Swarm manager process.
  • -H: The -H argument tells Swarm which host and port to bind to; 0.0.0.0:3376 binds all network interfaces on port 3376.
  • tls: The various tls arguments tell the Swarm manager where to find certificate files for TLS (HTTPS) communication.
  • token: The token argument references the discovery token that we created earlier. (Be sure to use the token that you created and not the one that I created for this example!)
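If you want to confirm the certificate paths before starting the manager, a small shell check like the following can help; the path passed at the bottom is the macOS default described above, so adjust it for your environment:

```shell
#!/bin/sh
# Report which of the TLS files the swarm manager needs are present
# in a given directory, before mounting it into the container.
check_certs() {
  dir=$1
  for f in ca.pem server.pem server-key.pem; do
    if [ -f "$dir/$f" ]; then
      echo "found: $f"
    else
      echo "missing: $f"
    fi
  done
}

# macOS/Linux default location for the "manager" machine (an assumption;
# your Docker version may place the files elsewhere).
check_certs "$HOME/.docker/machine/machines/manager"
```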

Once you have the Swarm manager running, your next step is to start your agents and tell them to join the cluster. Accomplish this by running the Swarm container in "join" mode. Before you do that, you need to tell your local docker command-line client to send commands to the "agent1" machine that you created earlier. Do so with the command:

$ eval $(docker-machine env agent1)

This command tells the docker client to send all docker commands to the Docker engine running on the "agent1" machine. Now run the following command to start agent1:



$ docker run -d swarm join --addr=$(docker-machine ip agent1):2376 token://7c14cbf2a86ecd490a7ea7ae4b795a6b
Unable to find image 'swarm:latest' locally
latest: Pulling from library/swarm
eada7ab697d2: Pull complete
afaf40cb2366: Pull complete
7495da266907: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:12e3f7bdb86682733adf5351543487f581e1ccede5d85e1d5e0a7a62dcc88116
Status: Downloaded newer image for swarm:latest
99c5ec703dc3230fcf769eb13e639079803ee36c33447a0290a2fb7ffe5e7952

This command, similar to the previous one, tells Docker to run the swarm container, but this time in "join" mode. You'll tell it to run in detached mode (-d) and pass it two arguments:

  • --addr: The address and port of the agent, which is used to advertise the presence of the agent to the manager.
  • token: the discovery token that we created earlier and used to start the manager.

With agent1 running, execute the same command to start agent2:



$ eval $(docker-machine env agent2)
$ docker run -d swarm join --addr=$(docker-machine ip agent2):2376 token://7c14cbf2a86ecd490a7ea7ae4b795a6b
Unable to find image 'swarm:latest' locally
latest: Pulling from library/swarm
eada7ab697d2: Pull complete
afaf40cb2366: Pull complete
7495da266907: Pull complete
a3ed95caeb02: Pull complete
Digest: sha256:12e3f7bdb86682733adf5351543487f581e1ccede5d85e1d5e0a7a62dcc88116
Status: Downloaded newer image for swarm:latest
0b16ee511399c27d849c6a6c628822375c27755b14719b5295c9038f97ede72a

At this point, the manager and both agents should be running. Next you'll configure the docker client to connect to the Docker Swarm manager and retrieve information about the environments. First, set your DOCKER_HOST environment variable to point to the manager Docker machine:

$ export DOCKER_HOST=$(docker-machine ip manager):3376

You can retrieve the manager's IP address by executing the docker-machine ip command, passing it the name of the machine whose IP address you want (manager). On Windows, use the SET command to set DOCKER_HOST instead. With the DOCKER_HOST environment variable set, you can retrieve information about your Swarm cluster by executing the docker info command:



$ docker info
Containers: 2
 Running: 2
 Paused: 0
 Stopped: 0
Images: 2
Server Version: swarm/1.2.2
Role: primary
Strategy: spread
Filters: health, port, containerslots, dependency, affinity, constraint
Nodes: 2
 agent1: 192.168.99.101:2376
  - ID: RDNQ:VD3I:AZPE:LSWW:7NND:XV7C:KHGH:5KR5:MZHG:4I7H:7RMU:XGQG
  - Status: Healthy
  - Containers: 1
  - Reserved CPUs: 0 / 1
  - Reserved Memory: 0 B / 1.021 GiB
  - Labels: executiondriver=, kernelversion=4.4.8-boot2docker, operatingsystem=Boot2Docker 1.11.1 (TCL 7.0); HEAD : 7954f54 - Wed Apr 27 16:36:45 UTC 2016, provider=virtualbox, storagedriver=aufs
  - Error: (none)
  - UpdatedAt: 2016-05-22T19:03:35Z
  - ServerVersion: 1.11.1
 agent2: 192.168.99.102:2376
  - ID: DXN7:FLLA:RMDW:HSPS:WT74:YM2I:CM3G:QBY7:FR7G:4WEO:LJ72:XB6L
  - Status: Healthy
  - Containers: 1
  - Reserved CPUs: 0 / 1
  - Reserved Memory: 0 B / 1.021 GiB
  - Labels: executiondriver=, kernelversion=4.4.8-boot2docker, operatingsystem=Boot2Docker 1.11.1 (TCL 7.0); HEAD : 7954f54 - Wed Apr 27 16:36:45 UTC 2016, provider=virtualbox, storagedriver=aufs
  - Error: (none)
  - UpdatedAt: 2016-05-22T19:03:32Z
  - ServerVersion: 1.11.1
Plugins:
 Volume:
 Network:
Kernel Version: 4.4.8-boot2docker
Operating System: linux
Architecture: amd64
CPUs: 2
Total Memory: 2.042 GiB
Name: 77d61b0fe67f
Docker Root Dir:
Debug mode (client): false
Debug mode (server): false
WARNING: No kernel memory limit support

From this output you can see that you're running two containers (the swarm containers running in "join" mode, one per agent) and that both agents are healthy. Note that these are Swarm's own infrastructure containers, not your custom application containers, such as Nginx. You can view running custom containers by executing the docker ps command:



$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

As expected, no custom containers are running.

Running containers in Docker Swarm

Thus far you've created three Docker machines and created a discovery token for your cluster. You've started one instance of the swarm container in "manage" mode and two instances of the swarm container in "join" mode. You have a running cluster, but it's not yet running any containers. In this section you'll start a Nginx container and connect to it. Begin with the following command:



$ docker run -d -p 80:80 nginx
cc6d627873f7b33f910129fafdcc5c544048cc864ef5433e667afc9a88632931

In this example, you run an nginx container (omitting a tag defaults to nginx:latest) in detached mode and bind port 80 in the container to port 80 on the Docker host. Use the docker client to view your running container:



$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                NAMES
cc6d627873f7        nginx               "nginx -g 'daemon off"   28 seconds ago      Up 27 seconds       192.168.99.101:80->80/tcp, 443/tcp   agent1/goofy_bassi

The Swarm manager deployed the Nginx container to agent1. You can connect to it by opening a browser to the following URL: http://192.168.99.101/.

If successful you should see a website similar to the interface in Figure 5.

Figure 5. A running Nginx container

To complete this example, let's start up a second Nginx container and review its deployment:


$ docker run -d -p 80:80 nginx

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED              STATUS              PORTS                                NAMES
737d5d37d5a6        nginx               "nginx -g 'daemon off"   About a minute ago   Up About a minute   192.168.99.102:80->80/tcp, 443/tcp   agent2/condescending_galileo
cc6d627873f7        nginx               "nginx -g 'daemon off"   3 minutes ago        Up 3 minutes        192.168.99.101:80->80/tcp, 443/tcp   agent1/goofy_bassi

As you can see, Swarm deployed the first Nginx container to agent1 and the second to agent2. Its placement algorithm searches for the agent running the fewest containers and deploys the newly requested container there. You can connect to the new instance at the following URL: http://192.168.99.102/.

You now have a Swarm cluster with two agents running two Nginx containers. Verify that you can access both. When you're finished, you can clean up your environment by stopping containers as you normally would, with the docker stop command:



$ docker stop 737
737

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                                NAMES
cc6d627873f7        nginx               "nginx -g 'daemon off"   6 minutes ago       Up 6 minutes        192.168.99.101:80->80/tcp, 443/tcp   agent1/goofy_bassi

$ docker stop cc6
cc6

$ docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES

Both Nginx containers are now stopped. You can stop Docker Swarm using the docker-machine stop command:



$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM   DOCKER    ERRORS
agent1    -        virtualbox   Running   tcp://192.168.99.101:2376           v1.11.1
agent2    -        virtualbox   Running   tcp://192.168.99.102:2376           v1.11.1
default   -        virtualbox   Stopped                                       Unknown
manager   -        virtualbox   Running   tcp://192.168.99.100:2376           v1.11.1

$ docker-machine stop agent1
Stopping "agent1"...
Machine "agent1" was stopped.

$ docker-machine stop agent2
Stopping "agent2"...
Machine "agent2" was stopped.

$ docker-machine stop manager
Stopping "manager"...
Machine "manager" was stopped.

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL   SWARM   DOCKER    ERRORS
agent1    -        virtualbox   Stopped                 Unknown
agent2    -        virtualbox   Stopped                 Unknown
default   -        virtualbox   Stopped                 Unknown
manager   -        virtualbox   Stopped                 Unknown

At this point the containers have been stopped, as have the Docker Swarm machines. You can restart them whenever you wish, using the discovery token that you created earlier.

Conclusion

This article provided an overview of Docker Swarm and demonstrated how to create a Swarm cluster on your local machine. It began with a primer on Docker itself, compared the two most popular Docker clustering technologies, Amazon ECS and Docker Swarm, and then demonstrated, step by step, how to create a Docker Swarm cluster. Along the way we discussed Swarm discovery tokens, which uniquely identify a cluster; managers, which manage the cluster and deploy containers to agents; and agents, which run the containers. At this point you should understand Docker Swarm in enough detail to set up and run a local environment.

From here, I recommend that you review the online documentation about running Docker Swarm in production. Production deployment could warrant a full article in and of itself, but the core concepts are similar: run a virtual machine with the Docker engine, run a discovery backend (Docker's example uses Consul), start the manager by running the swarm container in "manage" mode, start agents (or nodes, as the documentation calls them) by running swarm containers in "join" mode, and deploy containers through the manager, which in turn distributes them to the agents in the cluster.