Getting Started with CoreOS and Docker using Vagrant

This is the first in a series of posts about CoreOS and Docker.

tl;dr

In this post I'll show you:

  1. What CoreOS is.
  2. How to bring up a Vagrant VM running CoreOS.
  3. How to use fleetctl to submit jobs (i.e. run Docker containers) to a CoreOS cluster and other basic administration.

This series of articles is basically fleshing out (and spelling out) what is shown in this introductory video to fleet: Cluster-Level Container Deployment with fleet, but using Vagrant instead of EC2.

If you're new to Docker and need a quick primer, try Nuno Job's excellent tutorial.

Why CoreOS?

Docker is a lot of fun; I like it. But for Docker to reach the stage where it's widely used in production I believe it needs a little sugar on top; something that unlocks the full potential of Docker without requiring one to build a PaaS on top of it. After all, to run a set of services painlessly across a cluster you need nice solutions to the problems of service and peer discovery, configuration management and deployment, among others. Docker alone won't give you this.

There are a number of interesting projects out there today that build on Docker, such as Dokku, Flynn and Deis. CoreOS is another, yet it's different in that it isn't trying to be a PaaS. CoreOS is Linux stripped down and ready for truly massive-scale deployments, with a few cool PaaS-like features.

What is CoreOS?

Most Linux distributions come with a tonne of stuff that your average software service deployment isn't going to need. CoreOS strips this to the core, allowing you to build up your lean stack how you want it, using Docker.

CoreOS provides etcd, a highly-available, distributed key-value store used for discovery and configuration management, and fleet, a service that leverages etcd to provide a distributed init system: effectively systemd at the cluster level. With CoreOS, you think about what you want to do at the cluster level, rather than just the host level.

Whilst CoreOS, etcd and fleet don't bring much to the problem domain that is truly new (see Apache ZooKeeper, doozerd and Substack's Fleet), they work together nicely, along with systemd and Docker, making a cohesive whole that is more than the sum of its parts.

Steps

I'm only going to give instructions for OS X, since that's what I'm currently using, but everything here will either directly work on Linux or can be easily translated.

1. Install dependencies:

  • VirtualBox - I used version 4.3.10.
  • Vagrant - I used version 1.5.2.
  • fleetctl - the CLI client for fleet; install it like so:
$ wget https://github.com/coreos/fleet/releases/download/v0.3.2/fleet-v0.3.2-darwin-amd64.zip && unzip fleet-v0.3.2-darwin-amd64.zip
$ sudo cp fleet-v0.3.2-darwin-amd64/fleetctl /usr/local/bin/

Alternatively, you can use homebrew:

$ brew update
$ brew install fleetctl

Note: Given that CoreOS is in active development, new versions of tools like fleetctl are released regularly, likewise with CoreOS itself and the Vagrantfile. Ensure that the version of the client tools you're using matches the version of CoreOS to which you're connecting.

2. Clone CoreOS’s Vagrantfile repository:

$ git clone https://github.com/coreos/coreos-vagrant/
$ cd coreos-vagrant
$ DISCOVERY_TOKEN=`curl -s https://discovery.etcd.io/new` && perl -p -e "s@#discovery: https://discovery.etcd.io/<token>@discovery: $DISCOVERY_TOKEN@g" user-data.sample > user-data
$ export NUM_INSTANCES=1

In the penultimate step we're renaming user-data.sample to user-data and substituting in a newly-generated discovery token. We need to rename the file in order for the Vagrantfile provisioning code to pick up the configuration inside that instructs CoreOS to start up etcd and fleet, which we need. The discovery token is a unique identifier for a cluster, so that all nodes know they are in the same cluster. The last step is to tell Vagrant how many instances to spin up. 1 is the default, but I want to be specific in case you're jumping between this post and the next post in the series :)
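If you're curious what that perl one-liner is actually doing, here's a minimal sketch of the substitution run against a cut-down stand-in for user-data.sample (the real file contains more configuration) and a made-up token in place of the one curl fetches:

```shell
# Stand-in for user-data.sample: just the commented-out discovery line
# that the perl substitution targets.
cat > user-data.sample <<'EOF'
#cloud-config
coreos:
  etcd:
    #discovery: https://discovery.etcd.io/<token>
EOF

# A made-up token; normally this comes from: curl -s https://discovery.etcd.io/new
DISCOVERY_TOKEN="https://discovery.etcd.io/example0123456789abcdef"

# Same substitution as in step 2: uncomment the discovery line and
# splice in the real token.
perl -p -e "s@#discovery: https://discovery.etcd.io/<token>@discovery: $DISCOVERY_TOKEN@g" user-data.sample > user-data

grep "discovery:" user-data
```

The result is a user-data file with an active `discovery:` key, which is what tells each etcd node which cluster it belongs to.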

3. Start the VM

You should now be able to bring up the Vagrant VM and SSH into it:

$ vagrant up

Vagrant tips: You can SSH into the VM with vagrant ssh, and exit the SSH session again with Ctrl-d. This leaves the VM running; just vagrant ssh again to get back in. If you modify the Vagrantfile you can use vagrant reload --provision to have Vagrant reload the config and restart the VM. Use vagrant halt to terminate the VM (like powering off) and vagrant destroy to blow it away, losing all data inside. For further information on how to use Vagrant, see vagrant --help.

You now have a base installation of CoreOS running that you can play with.

Running Docker containers on CoreOS in Vagrant

Now I'm going to show you how to get a pre-built Docker image from the public registry up and running on CoreOS. We're going to run an image of dillinger.io, an HTML5 Markdown editor, simply because it makes for an easy example.

On your own machine, create a systemd unit file (a config file that tells systemd how to start and stop a service) for our dillinger.io service called dillinger.service:

$ cat dillinger.service
[Unit]
Description=dillinger.io  
Requires=docker.service  
After=docker.service

[Service]
ExecStart=/usr/bin/docker run -p 3000:8080 dscape/dillinger  

This basically tells systemd how to run the container: using docker run, giving it an image name from the public registry and telling Docker to bind port 8080 in the container to port 3000 on the host machine (or VM in this case).
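The unit file above only defines ExecStart, which is enough for this demo. As a sketch of where you might take it, a slightly fuller unit could name the container and stop it cleanly (note: the container name "dillinger" and the ExecStop line are my additions, not part of the original example, and depending on your Docker version the flag may be -name rather than --name):

```ini
[Unit]
Description=dillinger.io
Requires=docker.service
After=docker.service

[Service]
ExecStart=/usr/bin/docker run --name dillinger -p 3000:8080 dscape/dillinger
ExecStop=/usr/bin/docker stop dillinger
```

Naming the container gives ExecStop a stable handle to stop, rather than leaving systemd to kill the docker client process.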

First we'll set an environment variable that will tell fleetctl how to speak to your Vagrant VM over SSH, and then test it:

$ export FLEETCTL_TUNNEL=127.0.0.1:2222
$ ssh-add ~/.vagrant.d/insecure_private_key
$ fleetctl list-machines
MACHINE      IP            METADATA  
c9f8bd2f...  172.17.8.101  -  
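Port 2222 is Vagrant's default SSH forward for the first VM, but it can differ if the port is already taken. If you'd rather not hard-code it, you can derive the tunnel address from vagrant ssh-config. The sketch below uses canned output so it's self-contained; on your machine, replace the function body with a real call to vagrant ssh-config:

```shell
# Canned stand-in for `vagrant ssh-config` output; the HostName and
# Port lines are the two fields we care about.
ssh_config() {
  cat <<'EOF'
Host core-01
  HostName 127.0.0.1
  Port 2222
EOF
}

host=$(ssh_config | awk '/HostName/ { print $2 }')
port=$(ssh_config | awk '/Port/ { print $2 }')
export FLEETCTL_TUNNEL="$host:$port"
echo "$FLEETCTL_TUNNEL"
```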

Now type the following commands (the ones starting with '$'!):

$ fleetctl submit dillinger.service
$ fleetctl list-units
UNIT                    LOAD    ACTIVE   SUB    DESC           MACHINE  
dillinger.service       -       -        -      dillinger.io   -  
$ fleetctl start dillinger.service
Job dillinger.service scheduled to c6d23f21.../172.17.8.101  
$ fleetctl list-units
UNIT                    LOAD    ACTIVE  SUB     DESC           MACHINE  
dillinger.service       loaded  active  running dillinger.io   c6d23f21.../172.17.8.101  
$ fleetctl journal dillinger.service
-- Logs begin at Sat 2014-04-19 22:45:43 UTC, end at Sat 2014-04-19 22:48:53 UTC. --
Apr 19 22:47:12 core-01 systemd[1]: Starting dillinger.io...  
Apr 19 22:47:12 core-01 systemd[1]: Started dillinger.io.  
Apr 19 22:47:12 core-01 docker[3136]: Unable to find image 'dscape/dillinger' locally  
Apr 19 22:47:12 core-01 docker[3136]: Pulling repository dscape/dillinger  

fleetctl list-units is useful for listing the submitted units. It will tell you their state and which machine in the cluster they're running on. fleetctl journal dillinger.service tails the recent logs of the container. You can use fleetctl journal -f dillinger.service to follow the logs. As you can see, it is pulling the Docker images down from the registry. This may take quite a while because it has to pull the Ubuntu base image, so be patient.
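Because fleetctl list-units emits plain columnar text, it's easy to script against, e.g. to wait for a unit to reach the running state. In this sketch the function returns canned output copied from the session above, so it runs anywhere; on a live cluster you'd replace its body with a real call to fleetctl list-units:

```shell
# Canned stand-in for `fleetctl list-units` output.
list_units() {
  cat <<'EOF'
UNIT                 LOAD    ACTIVE  SUB     DESC          MACHINE
dillinger.service    loaded  active  running dillinger.io  c6d23f21.../172.17.8.101
EOF
}

# The SUB column (4th field) holds the fine-grained state.
state=$(list_units | awk '$1 == "dillinger.service" { print $4 }')
echo "dillinger.service SUB state: $state"
```

Wrapped in a loop with a sleep, this is a crude but serviceable "wait until deployed" check.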

In the meantime, let's SSH into the VM and learn a few diagnostic tools:

$ vagrant ssh
$ docker ps -a
CONTAINER ID        IMAGE                     COMMAND                CREATED             STATUS              PORTS                    NAMES  
018a1bbbaadd        dscape/dillinger:latest   /bin/sh -c 'forever    7 seconds ago       Up 7 seconds        0.0.0.0:3000->8080/tcp   agitated_babbage  
$ sudo netstat -tulpn
Active Internet connections (only servers)  
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name  
tcp6       0      0 :::22                   :::*                    LISTEN      1/systemd  
tcp6       0      0 :::3000                 :::*                    LISTEN      3125/docker  
tcp6       0      0 :::7001                 :::*                    LISTEN      3064/etcd  
tcp6       0      0 :::4001                 :::*                    LISTEN      3064/etcd  

docker ps -a tells us that the container was started, that it's still running, and that port 8080 in the container is mapped to port 3000 on the host, just as we specified in the unit file. So far so good. netstat -tulpn lists all the processes currently listening on ports (sudo allows us to see the last column, the process ID/name), and we can see that something run by docker is listening on port 3000. Sounds promising. Head on over to http://172.17.8.101:3000 in your browser and hopefully you will see dillinger.io running. Cool, eh?

If it isn't working, review the above steps, checking the logs, docker ps and netstat and check everything thoroughly. If docker ps -a returns nothing and the logs show no errors, it's probably just still downloading the images.

Summary

We've started up a simple CoreOS VM running one service and have learned a few basic administrative and diagnostic tools along the way. You may want to use this as a development sandbox for playing with Docker or building Docker images (since Docker doesn't run natively on OS X).

In the next post we'll spin up multiple VMs in a cluster and start to see what etcd and fleet can do.

Let me know if you find any mistakes or if you have any feedback.