Deploying Docker Containers on a Vagrant CoreOS Cluster with fleet

This is the second in a series of posts about CoreOS and Docker. You can read the first part here.

tl;dr

In this post I'll show you:

  1. How to run a three-node CoreOS cluster in Vagrant.
  2. How to run several services across a cluster with fleetctl.
  3. How fleet handles nodes joining and leaving the cluster.

Effectively we'll be re-creating what you see in this video, but in Vagrant: Cluster-Level Container Deployment with fleet.

Preparation

This post assumes that you understand the concepts in the previous post and that you've at least worked through the three steps in its Steps section, i.e. you have Vagrant, VirtualBox and fleetctl installed, and you've cloned the CoreOS Vagrant repository and set up the user-data file.

Let's cd to the coreos-vagrant repository that you cloned, re-export the tunnel environment variable (you've probably changed shell sessions since then) and increase the number of instances that we want Vagrant to create. We'll also destroy any old VMs, just to be sure we're starting afresh.

$ export FLEETCTL_TUNNEL=127.0.0.1:2222
$ export NUM_INSTANCES=3
$ cd <somewhere>/coreos-vagrant
$ vagrant destroy

The Power of CoreOS: etcd, fleet and cloud-config

In the previous post we got CoreOS running in a single Vagrant VM and ran a Docker container from the public registry on it. But we weren't really taking advantage of what CoreOS has to offer; we could have done that on Ubuntu and not bothered with fleet and systemd at all.

In this post I'll walk you through running a three-node cluster (with Vagrant) in order to demonstrate where etcd and fleet really shine. But first, let's step back a little to explain some of the tools we're using in more detail.

etcd and fleet

CoreOS has some cool features to help you run pieces of software across a cluster. First, there is etcd, a highly-available key-value store for shared configuration and service discovery. CoreOS, like some other flavours of Linux, uses systemd to declare, start and stop services. The CoreOS team has added a tool called fleet (not to be confused with substack's Fleet, which performs a similar function), which ties systemd and etcd together into a distributed init system. In essence, this allows us to declare what we want to run across a cluster, spin up a bunch of hosts, and have fleet and etcd sort it out for us.

Each machine in the cluster registers itself with etcd and discovers its peers; fleet watches the discovery directory in etcd's key-value store and reacts when machines join or leave the cluster.
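
You won't need to drive etcd by hand in this post, but once the cluster is up (we'll create it shortly) you can see the shared key-value store in action from inside any node. A quick sanity check looks something like this, with /message just an arbitrary example key:

$ vagrant ssh core-01
core@core-01 ~ $ etcdctl set /message "hello from core-01"
hello from core-01
core@core-01 ~ $ exit
$ vagrant ssh core-02
core@core-02 ~ $ etcdctl get /message      # the value is visible cluster-wide, not just on the node that set it
hello from core-01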

cloud-config

cloud-init is a specification for config files that handle early initialisation of cloud instances. CoreOS supports a subset of this spec, as well as several CoreOS-specific extensions. When we use cloud-init to bootstrap our CoreOS instances, we write what's called a cloud-config file. In the previous post, the user-data file we renamed was the cloud-config file.
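
If you peek inside user-data you'll see the cloud-config that the Vagrantfile feeds to each VM. Trimmed and with the comments removed, it looks roughly like this; the exact contents depend on the version of the coreos-vagrant repository you cloned:

$ cat user-data
#cloud-config

coreos:
  etcd:
    discovery: https://discovery.etcd.io/<your token>
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start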

Setting up a cluster of three nodes

Let's generate a new etcd discovery token for our new cluster and put it into the user-data cloud-config file:

$ DISCOVERY_TOKEN=`curl -s https://discovery.etcd.io/new` && perl -i -p -e "s@discovery: https://discovery.etcd.io/\w+@discovery: $DISCOVERY_TOKEN@g" user-data

Note: don't re-use discovery tokens; they won't work. All boxes in your cluster need to use the same one, but if you later restart them with that same token they will try to register again, which also won't work. If etcd complains about this, generate a new token for your user-data file as above. For the sake of simplicity I won't go into how to avoid these issues in this post.
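
Once the VMs have booted (which we'll do next), you can check what the discovery service has recorded for your token by querying the URL directly; it returns a JSON document listing the machines that have registered under it:

$ curl -s $DISCOVERY_TOKEN      # one entry per machine that has registered under this token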

Now start up our cluster:

$ vagrant up
...
$ fleetctl list-machines
MACHINE     IP           METADATA  
acc6acb1... 172.17.8.101 -  
41ee561d... 172.17.8.102 -  
174291fb... 172.17.8.103 -  

If you see three machines then all is well.
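
If you want to poke around on one of the nodes, fleetctl can open an SSH session for you over the tunnel, using a machine ID (or ID prefix) from list-machines. Your IDs will differ from mine:

$ fleetctl ssh acc6acb1
core@core-01 ~ $ docker ps     # nothing running yet
core@core-01 ~ $ exit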

Contrived services

In order to demonstrate three services running across a cluster I've created the three most contrived services imaginable, because I'm lazy. We'll be pulling these from the public Docker registry: lukebond/contrived-service-1, lukebond/contrived-service-2 and lukebond/contrived-service-3. They are nothing more than web pages displaying "Hello World {1,2,3}", served with harp.js. But use your imagination and you can picture a more useful cluster: perhaps a few database nodes, a bunch of API servers and a load balancer. You get the idea. I'll leave that as an exercise for the reader, as the principles are the same as what I'm outlining here.

I've created the following three unit files that you can grab from GitHub like so: $ git clone https://github.com/lukebond/blog-coreos-docker-2. Or you can copy and paste them from here:

$ cat service1/contrived-service-1.service
[Unit]
Description=Contrived Service 1  
Requires=docker.service  
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name=contrived-service-1 -p 80:9000 lukebond/contrived-service-1  
ExecStartPost=/usr/bin/etcdctl set /domains/contrived-service-1/%H:%i running  
ExecStop=/usr/bin/docker stop contrived-service-1  
ExecStopPost=/usr/bin/etcdctl rm /domains/contrived-service-1/%H:%i

[X-Fleet]
X-Conflicts=contrived-service-*.service

$ cat service2/contrived-service-2.service
[Unit]
Description=Contrived Service 2  
Requires=docker.service  
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name=contrived-service-2 -p 80:9000 lukebond/contrived-service-2  
ExecStartPost=/usr/bin/etcdctl set /domains/contrived-service-2/%H:%i running  
ExecStop=/usr/bin/docker stop contrived-service-2  
ExecStopPost=/usr/bin/etcdctl rm /domains/contrived-service-2/%H:%i

[X-Fleet]
X-Conflicts=contrived-service-*.service

$ cat service3/contrived-service-3.service
[Unit]
Description=Contrived Service 3  
Requires=docker.service  
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --rm --name=contrived-service-3 -p 80:9000 lukebond/contrived-service-3  
ExecStartPost=/usr/bin/etcdctl set /domains/contrived-service-3/%H:%i running  
ExecStop=/usr/bin/docker stop contrived-service-3  
ExecStopPost=/usr/bin/etcdctl rm /domains/contrived-service-3/%H:%i

[X-Fleet]
X-Conflicts=contrived-service-*.service  

The [X-Fleet] section at the end of each file is a special CoreOS-specific extension to the systemd unit file syntax. It tells fleet that any given instance of a contrived service (1, 2 or 3) cannot run on the same box as any other; i.e. one service per box. I've done this in order to demonstrate something later on; we'll come back to it.

The use of --rm in the docker run command tells Docker to remove the container once it has finished running. This stops exited containers from piling up on the hosts, and, since we also use --name, it means a leftover container can't block the next start with a name clash.
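
The ExecStartPost and ExecStopPost lines, meanwhile, publish and remove a simple "this service is running here" key in etcd (%H is the hostname and %i the instance name, which is empty for these non-templated units). Once the services are running in the next section, you can see those keys from any node:

$ fleetctl ssh acc6acb1                           # any machine will do; etcd is shared across the cluster
core@core-01 ~ $ etcdctl ls --recursive /domains  # one key per running contrived service, named after the host it landed on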

Cluster-level container deployment with fleet

Now we're going to submit our unit files to the cluster:

$ fleetctl submit service1/contrived-service-1.service service2/contrived-service-2.service service3/contrived-service-3.service
$ fleetctl list-units
UNIT                         LOAD  ACTIVE  SUB  DESC                 MACHINE  
contrived-service-1.service  -     -       -    Contrived Service 1  -  
contrived-service-2.service  -     -       -    Contrived Service 2  -  
contrived-service-3.service  -     -       -    Contrived Service 3  -  
$ fleetctl start contrived-service-{1,2,3}.service
Job contrived-service-1.service scheduled to acc6acb1.../172.17.8.101  
Job contrived-service-2.service scheduled to 41ee561d.../172.17.8.102  
Job contrived-service-3.service scheduled to 174291fb.../172.17.8.103
$ fleetctl list-units
UNIT                         LOAD    ACTIVE  SUB      DESC                 MACHINE  
contrived-service-1.service  loaded  active  running  Contrived Service 1  acc6acb1.../172.17.8.101  
contrived-service-2.service  loaded  active  running  Contrived Service 2  41ee561d.../172.17.8.102  
contrived-service-3.service  loaded  active  running  Contrived Service 3  174291fb.../172.17.8.103

fleetctl is telling us that the units loaded and started successfully and are now running, one service per node. Great! Remember that the images may take a little while to download.
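
If a unit doesn't reach the running state, fleetctl status is the quickest way to find out why:

$ fleetctl status contrived-service-1.service     # runs systemctl status for the unit on whichever machine it landed on, via SSH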

If you want to see the logs of a particular service/unit, use this:

$ fleetctl journal contrived-service-1.service
-- Logs begin at Sun 2014-04-20 17:25:41 UTC, end at Sun 2014-04-20 19:06:06 UTC. --
Apr 20 18:45:46 core-02 systemd[1]: Starting Contrived Service 1...  
Apr 20 18:45:46 core-02 docker[3232]: Unable to find image 'lukebond/contrived-service-1' locally  
Apr 20 18:45:46 core-02 docker[3232]: Pulling repository lukebond/contrived-service-1  
Apr 20 18:45:46 core-02 systemd[1]: Started Contrived Service 1.  
Apr 20 18:45:46 core-02 etcdctl[3233]: running  
Apr 20 18:46:04 core-02 docker[3232]: ------------  
Apr 20 18:46:04 core-02 docker[3232]: Harp v0.12.1 – Chloi Inc. 2012–2014  
Apr 20 18:46:04 core-02 docker[3232]: Your server is listening at http://localhost:9000/  
Apr 20 18:46:04 core-02 docker[3232]: Press Ctl+C to stop the server  
Apr 20 18:46:04 core-02 docker[3232]: ------------  

Now head on over to your web browser and enter 172.17.8.101 (or 172.17.8.102 or 172.17.8.103) and you should see a (hopelessly minimal) website!
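
You can also check from the command line; each node serves its own "Hello World" page on port 80, thanks to the -p 80:9000 mapping in the unit files:

$ curl -s http://172.17.8.101/ | grep -i hello    # and likewise for .102 and .103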

You can now add and remove units and boxes, and as long as they're all using the same etcd discovery token they'll find each other and fleet will know about them.
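
For completeness, fleetctl also has the inverse operations (don't run these now if you want to follow the rest of the walkthrough exactly):

$ fleetctl stop contrived-service-3.service       # stop the unit without removing it from the cluster
$ fleetctl destroy contrived-service-3.service    # stop it and remove it from the cluster entirely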

The cool part

Now increase the number of instances by one:

$ export NUM_INSTANCES=4
$ vagrant up
...
$ fleetctl list-machines
MACHINE         IP              METADATA  
8b4f3bb1...     172.17.8.102    -  
cc924daf...     172.17.8.103    -  
6b709591...     172.17.8.101    -  
66895783...     172.17.8.104    -  
$ fleetctl list-units
UNIT                            LOAD    ACTIVE  SUB     DESC                    MACHINE  
contrived-service-1.service     loaded  active  running Contrived Service 1     8b4f3bb1.../172.17.8.102  
contrived-service-2.service     loaded  active  running Contrived Service 2     6b709591.../172.17.8.101  
contrived-service-3.service     loaded  active  running Contrived Service 3     cc924daf.../172.17.8.103

Notice that despite there being a new instance, nothing is running on it; list-units returns the same result. This is because fleet has already satisfied the topology requirements we gave it; adding a VM didn't change that.

We're now going to kill the box with IP 172.17.8.102 ("core-02", as it is known to Vagrant). Don't kill "core-01", because that's the one whose SSH tunnel we're using for fleetctl.

$ vagrant halt core-02
==> core-02: Attempting graceful shutdown of VM...
$ fleetctl list-machines
MACHINE         IP              METADATA  
cc924daf...     172.17.8.103    -  
6b709591...     172.17.8.101    -  
66895783...     172.17.8.104    -  
$ fleetctl list-units
UNIT                            LOAD    ACTIVE  SUB     DESC                   MACHINE  
contrived-service-1.service     loaded  active  running Contrived Service 1    66895783.../172.17.8.104  
contrived-service-2.service     loaded  active  running Contrived Service 2    6b709591.../172.17.8.101  
contrived-service-3.service     loaded  active  running Contrived Service 3    cc924daf.../172.17.8.103  

Wait, what happened there? Notice that "contrived-service-1" is now running on 172.17.8.104 (core-04). We didn't tell it to move, so how did that happen? fleet watches the etcd directory where machines register for discovery, so it's informed when machines in the cluster go down or come up. Previously, we'd told fleet that we wanted three distinct services running and that they couldn't share a box. When we terminated core-02 and core-04 was spare, fleet had no choice but to move contrived-service-1 onto core-04 in order to satisfy our topology requirements and constraints. Cool, right?
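
You can confirm this by asking for the unit's journal again; fleetctl now fetches it from core-04, where the unit was rescheduled, and you'll see it pulling the image and starting up afresh there:

$ fleetctl journal contrived-service-1.service    # the logs now come from core-04 rather than core-02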

Notes for non-Vagrant environments

It's very simple to apply this to EC2 (or some other service) instead of Vagrant. There are two main considerations.

SSH keys

In order to be able to use fleetctl remotely (as we've done) you need to have your public key on the remote box. Vagrant sorts this out for you, but on EC2 (or similar) add the following section at the bottom of your cloud-config user-data file:

ssh_authorized_keys:  
  - <paste the entire contents of your ~/.ssh/id_rsa.pub file on one line>

Note that the indentation is important: leave the rest of the file intact, and the first line of this addition (ssh_authorized_keys:) has no indentation.

Now set your tunnel environment variable to the public host of your remote machine, port 22 (not 2222):

$ export FLEETCTL_TUNNEL=x.x.x.x:22
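
fleetctl's tunnel uses your local ssh-agent, so make sure the private key matching the public key you added to cloud-config is loaded (the path below is just an example):

$ ssh-add ~/.ssh/id_rsa        # or whichever key pairs with the public key in your cloud-config
$ fleetctl list-machines       # should now list your remote machines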

etcd timeouts

The network latency between nodes of a cluster running on something like EC2 will be much higher than between Vagrant VMs on your laptop. The default etcd timeout values are tuned for local use, so you will get lots of timeouts on EC2. Add the following lines to the end of the etcd: section of your cloud-config user-data file, just after peer-addr: and before units::

    peer-election-timeout: 500
    peer-heartbeat-interval: 100

The indentation level should be the same as that of peer-addr:. You can read more about this here.
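
Putting it together, the etcd: block of your cloud-config would then look something like this (assuming the Vagrant-style layout shown earlier, with comments trimmed; your discovery URL will differ):

$ cat user-data
...
  etcd:
    discovery: https://discovery.etcd.io/<your token>
    addr: $public_ipv4:4001
    peer-addr: $public_ipv4:7001
    peer-election-timeout: 500
    peer-heartbeat-interval: 100
  units:
...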

Summary

I've shown you how to get a working cluster of CoreOS nodes running in Vagrant, and how to use fleetctl to deploy containers across the cluster and view their logs.

You now have a working understanding of the basics of CoreOS and have had a good chance to evaluate its strengths. If you run a clustered server environment that needs service discovery, you now know enough to start evaluating CoreOS as an alternative to whatever you're currently using.

I hope you've enjoyed this, and that you now know enough about CoreOS and Docker to evaluate them for use in your own projects.

Where can you take this from here?

The natural progression after this, if you're interested in using CoreOS and Docker in your project, is to sign up for some private repositories on docker.io and pull your private images from there. If you're using a CI system such as Strider CD or Jenkins, you can configure your jobs to build Docker images on green and push them to private Docker repositories. You could then use docker.io's webhooks to call back to a server that instructs fleet to update the Docker images running on your cluster by pulling them again from the registry.
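
There's no magic to that update step; at its simplest it's a pull on the relevant machine followed by a restart of the unit, which your webhook handler could script. A rough sketch, using the Vagrant names from this post (on EC2 you'd use plain ssh instead):

$ vagrant ssh core-04 -c "docker pull lukebond/contrived-service-1"   # core-04 is where the unit ended up in our walkthrough
$ fleetctl stop contrived-service-1.service
$ fleetctl start contrived-service-1.service                          # restarts against the freshly pulled image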

You might even look into using some of the tools from flynn.io to start building up your own PaaS-like environment for your deployments. If the holy grail for you is Heroku-style git-push-to-deploy, it wouldn't take a lot of work to build a rudimentary version of that on top of what you've learned here.

Acknowledgements

Special thanks to @iancrowther, @louisgarman and @marrkmoudy for testing this out for me.

Let me know if you find any mistakes or if you have any feedback.