Setting Up Docker and Buildbot

One of the newest players in the field of increasing server density and utilization is a piece of software called Docker. The idea is solid. Rather than automate the creation and management of resource-heavy virtual machines, Docker automates the management of lightweight Linux Containers (LXC containers), which allow for process and resource isolation backed by kernel guarantees on a single system. This means you have one system kernel, instead of dozens, and you don’t have to waste resources on duplicated pieces of code like system libraries, daemons, and other things that every server will always load into memory.

The ability to create a uniform and repeatable software environment inside of a container is worthy of attention, since it directly relates to getting both development and continuous integration environments running cleanly across a variety of systems.

It’s a problem I’m having right now: I need a continuous integration setup that can poll the master branch of a git repository and trigger a build on Windows, OS X, Linux, and Android. I have limited physical resources, but at least one multicore machine with 8GB of RAM running a 64-bit host OS.

Getting Started with Docker

Without further ado, here’s how I got going with Docker, after getting a clean 64-bit Ubuntu 12.04.3 system installed inside of VirtualBox.

Purge the old kernel (3.2):
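
Something along these lines should do it, though the exact 3.2 package names depend on the point release (list them with dpkg -l first):

sudo apt-get purge linux-image-3.2.0-* linux-headers-3.2.0-*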

Install the new kernel (3.8):
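
On 12.04 the 3.8 kernel comes from the LTS backport packages, so presumably something like:

# install the raring (3.8) backport kernel and headers, then reboot into it
sudo apt-get update
sudo apt-get install linux-image-generic-lts-raring linux-headers-generic-lts-raring
sudo reboot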

Install Docker (instructions from here):
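
For Docker 0.6.x this meant adding the docker.io APT repository and then installing the lxc-docker package, roughly:

# add the Docker repository key and package source, then install lxc-docker
sudo sh -c "wget -qO- https://get.docker.io/gpg | apt-key add -"
sudo sh -c "echo deb http://get.docker.io/ubuntu docker main > /etc/apt/sources.list.d/docker.list"
sudo apt-get update
sudo apt-get install lxc-docker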

That last command spits out an interesting list of dependencies, which I’m capturing here in case I need to look up the associated manpages later:

$ sudo apt-get install lxc-docker
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  aufs-tools bridge-utils cgroup-lite cloud-utils debootstrap euca2ools libapparmor1 libyaml-0-2 lxc lxc-docker-0.6.1
  python-boto python-m2crypto python-paramiko python-yaml
Suggested packages:
  btrfs-tools lvm2 qemu-user-static
The following NEW packages will be installed:
  aufs-tools bridge-utils cgroup-lite cloud-utils debootstrap euca2ools libapparmor1 libyaml-0-2 lxc lxc-docker
  lxc-docker-0.6.1 python-boto python-m2crypto python-paramiko python-yaml
0 upgraded, 15 newly installed, 0 to remove and 0 not upgraded.
Need to get 3,817 kB of archives.
After this operation, 20.6 MB of additional disk space will be used.
Do you want to continue [Y/n]? 

With Docker installed, but before running the “Hello World” Docker example, I took a snapshot of the virtual machine. Now that I think about it, though, that’s the last snapshot I’ll need, since Docker is itself a snapshottable container organizer.

$ sudo docker run -i -t ubuntu /bin/bash
[sudo] password for nuket: 
Unable to find image 'ubuntu' (tag: latest) locally
Pulling repository ubuntu
8dbd9e392a96: Download complete
b750fe79269d: Download complete
27cf78414709: Download complete
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
WARNING: IPv4 forwarding is disabled.
root@e8c30f41da03:/# 

No More sudo

I got a little tired of typing sudo in front of everything, so I used the instructions here to add a docker group to the system and restart the daemon, so that members of that group can use it without sudo.

# Add the docker group
sudo groupadd docker

# Add the ubuntu user to the docker group
# You may have to logout and log back in again for
# this to take effect
sudo gpasswd -a ubuntu docker

# Restart the docker daemon
sudo service docker restart

Then I logged out of my desktop session and back in, to pick up the new group membership.

Getting Buildbot Installed

Someone beat me to it and uploaded Dockerfiles describing both a buildbot-master and a buildbot-slave configuration; a docker search for “buildbot” turns them up:

Found 6 results matching your query ("buildbot")
NAME                             DESCRIPTION
mzdaniel/buildbot                
mdaniel/buildbot                 
ehazlett/buildbot-master         Buildbot Master See full description for available environment variables to customize.
ehazlett/buildbot-slave          
ehazlett/buildbot                Standalone buildbot with master/slave.  See full description for available environment variables.
mzdaniel/buildbot-tutorial     

Pull buildbot-master

docker pull ehazlett/buildbot-master

According to the Docker Index entry for buildbot-master, there are a handful of environment variables that can be passed into docker run. (The one sticking point for me is that, at the moment, you have to pass these environment variables in on the command line; I’m guessing they’ll eventually make it possible to read them from a file.)

CONFIG_URL: URL to buildbot config (overrides all other vars)
PROJECT_NAME: Name of project (shown in UI)
PROJECT_URL: URL of project (shown in UI)
REPO_PATH: Path to code repo (buildbot watches -- i.e. git://github.com:ehazlett/shipyard.git)
TEST_CMD: Command to run as test
BUILDBOT_USER: Buildbot username (UI)
BUILDBOT_PASS: Buildbot password (UI)
BUILDSLAVE_USER: Buildbot slave username
BUILDSLAVE_PASS: Buildbot slave password

Start buildbot-master

The documentation isn’t super-clear on how to pass these multiple environment variables into the docker container, but it looks something like this:

$ docker run -e="foo=bar" -e="bar=baz" -i -t ubuntu /bin/bash
WARNING: Docker detected local DNS server on resolv.conf. Using default external servers: [8.8.8.8 8.8.4.4]
root@1f357c1e17b4:/# echo $foo
bar
root@1f357c1e17b4:/# echo $bar
baz

For the time being, I’ll just run the container with its default parameters.

CID=$(docker run -d ehazlett/buildbot-master)

But I’m also curious as to how I’m supposed to communicate with it. So I inspect the docker configuration for the buildbot-master:
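
docker inspect works on images as well as containers, and the image configuration should show the exposed ports (8010 and 9989 in this case):

docker inspect ehazlett/buildbot-master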

So what happens when you run the container? You have to find out how the Buildbot ports inside the container are mapped onto ports on your host system.
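
The exposed container ports get bound to random high-numbered ports on the host, and docker ps shows the mapping in its PORTS column:

docker ps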

Port 9989 is the communications port for the Buildbot systems to talk to one another. Port 8010 is the Buildbot web interface, which you can open in a browser, like so:

[Screenshot: the Buildbot web interface on port 8010 (buildbot-running)]

Of course, you can also access this from outside the VM, in the topmost (non-virtual) host OS:

[Screenshot: the Buildbot web interface opened from the host OS in Chrome (buildbot-running-chrome)]

Docker Subnetwork

It’s also not entirely clear from the basic Docker instructions that Docker creates an internal private network, NATed to the host. When you run ifconfig on the Docker host, you’ll see the bridge interface it sets up:
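
The addresses will vary, but the interface to look for is the docker0 bridge (172.17.42.1/16 was the default at the time):

# on the Docker host
ifconfig docker0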

And if you’ve attached to a Docker container and run ip addr show, you’ll see the container’s end of that network:
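
Inside the container, eth0 gets an address from the docker0 subnet:

# inside the container
ip addr show eth0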

Which you’ll also see if you run docker inspect $CID, which returns useful runtime information about the specific container instance:
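
The NetworkSettings section of the output is the interesting part here: it holds the container’s private IP address, the gateway (the docker0 address), and the port mappings:

docker inspect $CID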

So now the question is how to get the buildbot-slaves on other VMs to talk to the buildbot-master, and how to configure the buildbot-master itself. I’m also considering getting an instance of CoreOS running, as it seems to have a mechanism for handling global configuration within a cluster, which would be one way to provide master.cfg to the buildbot-master.

Updates to this post to follow.

Update: Easier Overview

The easier way to see your container runtime configurations at a glance is to use the docker ps command (Duh!). Particularly nice is the port-mapping list in the rightmost column:
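
That is, just:

docker ps    # add -a to include stopped containers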

Update: Configuring the Buildbot Master Itself

You can just jump into the container and edit the master.cfg, like so:
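
There was no docker exec at this point, so with the LXC backend the usual trick was lxc-attach against the full container ID; a sketch (the master.cfg location inside the ehazlett/buildbot-master image is an assumption):

# get a shell inside the running container; the full ID is in the docker inspect output
sudo lxc-attach -n <full-container-id> -- /bin/bash
# inside the container: locate and edit master.cfg, then ask buildbot to reload it
find / -name master.cfg
buildbot reconfig <path-to-master-basedir>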

Update: Getting the list of all Containers

It’s not entirely intuitive, but each time you docker run an Image, you get a new Container, and these Containers don’t generally show up as you might expect. (I was wondering how the docker diff command was supposed to work, for instance.)

You have to use the docker ps -a command to see all of the Container IDs, which you can then start and stop.

In other words, using docker run image-name creates the new Container. But for subsequent calls, you should use docker start container-id.

This also clarifies why there’s a docker rm and a docker rmi command.
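
In shorthand, the lifecycle looks like this:

docker run ehazlett/buildbot-master   # creates and starts a brand-new Container from the Image
docker ps -a                          # lists all Containers, including stopped ones
docker start <container-id>           # restarts an existing Container
docker stop <container-id>            # stops it again
docker rm <container-id>              # removes a Container
docker rmi <image-id>                 # removes an Image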

An easy way to remove unused Docker containers

Update: Using cpp to make Dockerfiles uniform

This section could also be called “The horror, the horror”. Following the best practices list mentioned here, I decided to create a uniform set of include files to pull into my Dockerfiles, which I then generate using the C preprocessor (pretty much because it’s cheap and available).

So the idea I had was to put common Dockerfile instructions into separate files. I’m guessing the Docker devs might build an INCLUDE instruction into the Dockerfile syntax at some point. The benefit of doing this is that you can take advantage of the Docker image cache, which stores incremental versions of your built images based on the instructions and base image used to create them. In other words, you don’t have to keep rebuilding the common parts of various Docker images. You’re also less likely to mistype common lines across files, which would otherwise defeat the cache and introduce inconsistencies.

Dockerfile.ubuntu
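
The original contents aren’t reproduced here; a hypothetical fragment holding the shared base-image lines might be:

FROM ubuntu
RUN apt-get update
RUN apt-get upgrade -y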

Dockerfile.run
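
And a hypothetical fragment for shared setup that every image wants at run time (pure illustration; an sshd happens to fit the sshfs trick further down):

RUN apt-get install -y openssh-server
EXPOSE 22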

In a clean subdirectory, below where the .ubuntu and .run files are located:

Dockerfile.in
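
The per-image input file then pulls those fragments in with cpp-style includes and adds whatever is specific to the image (the buildbot line is just a stand-in):

#include "../Dockerfile.ubuntu"
#include "../Dockerfile.run"
RUN apt-get install -y buildbot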

To create the custom Dockerfile:
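
Run cpp over it; -P suppresses the linemarker lines. (One catch: ordinary # comments can’t be used inside the fragments, because cpp treats them as directives.)

cpp -P -o Dockerfile Dockerfile.in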

Which generates something like:
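
Given the hypothetical fragments above, the output is simply their concatenation plus the image-specific lines:

FROM ubuntu
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y openssh-server
EXPOSE 22
RUN apt-get install -y buildbot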

Then just docker build . like usual.

Update: Now with github!

I’ve created a repository on github to play around with includable Dockerfiles.

The github repository currently has a few images in it, which are related to one another in a tree that looks like this:

The idea, then, is to eliminate redundant text by including Dockerfiles in other Dockerfiles, and to organize this hierarchically, such that images further down the hierarchy are just combinations of their parents + some differentiation.

apt-get install package caching using squid-deb-proxy

If you’re actively developing Docker images, one thing that slows you down a lot and puts considerable load on Ubuntu’s mirror network is the redundant downloading of software packages.

To speed up your builds and save bandwidth, install squid-deb-proxy and squid-deb-proxy-client on the Docker container host (in my case, the outermost Ubuntu VM):
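
Presumably just:

sudo apt-get install squid-deb-proxy squid-deb-proxy-client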

And make sure you add ppa.launchpad.net (and any other PPA archive sources) to /etc/squid-deb-proxy/mirror-dstdomain.acl:
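
The acl file is just a list of allowed domains, so appending the PPA host is enough:

echo "ppa.launchpad.net" | sudo tee -a /etc/squid-deb-proxy/mirror-dstdomain.acl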

And restart the proxy:
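
Which should just be:

sudo service squid-deb-proxy restart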

In your Dockerfile, set the proxy configuration file to point to the default route address (which is where squid-deb-proxy will be running):
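
A line like the following does it; 172.17.42.1 is the default docker0 / default-route address mentioned earlier, and 8000 is squid-deb-proxy’s default port, so adjust both if your setup differs:

RUN echo 'Acquire::http::Proxy "http://172.17.42.1:8000";' > /etc/apt/apt.conf.d/30proxy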

Once the caching is set up, you can monitor accesses via the logfiles in /var/log/squid-deb-proxy.
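
For example:

tail -f /var/log/squid-deb-proxy/access.log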

The first time you build an image, the log file has lots of cache misses:

The second time, you’ll see a number of hits (TCP_REFRESH_UNMODIFIED), saving you bandwidth and time:

Update: Using sshfs to mount Docker container folders

Your Docker containers might not actually have any editors installed. One easy way around this is to mount folders from the Docker container on the host using the user-mode SSHFS, and edit them from outside:
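
This assumes the container is running an SSH daemon; the container address and path below are placeholders (the real IP comes from docker inspect):

# mount a folder from the container on the host, edit away, then unmount
mkdir -p ~/buildbot-master
sshfs root@172.17.0.2:/path/to/buildbot/master ~/buildbot-master
fusermount -u ~/buildbot-master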

Using Selenium WebDriver with PhantomJS on a shared host

I’m currently trying to set up a cronjob to do end-to-end autotesting of the OAuth code I wrote for Tandem Exchange, which occasionally fails these days because the OAuth provider systems are unavailable or very, very slow (which amounts to the same thing). This is actually one of the biggest problems in web authentication: once you start relying on OAuth providers to authenticate users on your website, you inherit a critical dependency on their servers always being reachable. (In our case, I’ve been seeing stupid downtimes from QQ, one of the largest Chinese social networks.) And it’s not something you can test with mock objects.

Set up a virtualenv:
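
The directory name is just a placeholder:

virtualenv ~/venv-selenium
source ~/venv-selenium/bin/activate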

Install Selenium in the virtualenv:
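
Which is simply:

pip install selenium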

Download and untar the PhantomJS executable somewhere on the PATH:
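
The version and URL here are illustrative; grab whichever Linux binary tarball is current:

wget https://bitbucket.org/ariya/phantomjs/downloads/phantomjs-1.9.2-linux-x86_64.tar.bz2
tar xjf phantomjs-1.9.2-linux-x86_64.tar.bz2
cp phantomjs-1.9.2-linux-x86_64/bin/phantomjs ~/bin/   # ~/bin is assumed to be on the PATH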

During the first attempts at using PhantomJS, I had problems with SSL-based addresses always returning a .current_url of “about:blank”, which you have to fix using a somewhat obscure flag in the webdriver.PhantomJS() constructor.

The fix looks like this, in Python:
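
A minimal sketch, assuming the usual service_args route for passing PhantomJS command-line flags through the Python bindings (the URL is a placeholder):

from selenium import webdriver

# --ignore-ssl-errors keeps SSL pages from coming back as "about:blank"
driver = webdriver.PhantomJS(service_args=['--ignore-ssl-errors=true'])
driver.get('https://example.com/oauth/login')
print(driver.current_url)   # no longer "about:blank"
driver.quit()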

And when the unit test runs (via nosetests):

For the most part the Python-based selenium module works, but it is pretty verbose, as it sticks to the original Webdriver API very closely. There are higher-level abstractions, such as Splinter, but I get the impression that making sure it starts PhantomJS properly will be an ordeal in and of itself. I’ve gotten Facebook and Google OAuth testing working headlessly, which is pretty cool, but the next step of getting the QZone OAuth test working is getting jammed up on the fact that QZone is behaving exactly as users see it behave (which is to say, problematically).

But then, that’s the objective of the unit testing, to reveal the source of the latest OAuth problems.

I’ve also noticed that QZone is pretty slow not only from the shared-hosting server we’re using, but also from where I’m currently located, so from two different points on the globe. I can’t help but wonder how much the Great Firewall of China is slowing things down.

libevent, gcov, lcov, and OS X

Getting a sense of code coverage for open-source projects is necessary due diligence for any dependent project. On OS X, it’s also a little more work. While doing some of my own research, I wanted to see how well tested the libevent library was, but wasn’t finding much info online. So here’s a log of what I needed to do to get this information on a Mac.

First things first (and this should apply for many more open-source projects), after I checked out the latest code from github, I added an option to the configure.ac Autoconf source file to insert the necessary profiling code calls for code coverage:
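
A minimal sketch of the kind of option this is, though the exact wording in configure.ac may differ:

AC_ARG_ENABLE([coverage],
    AS_HELP_STRING([--enable-coverage], [build with code coverage instrumentation]),
    [enable_coverage=$enableval], [enable_coverage=no])
if test "x$enable_coverage" = "xyes"; then
    CFLAGS="$CFLAGS -g -O0 -fprofile-arcs -ftest-coverage"
fi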

With that added, I reran autogen.sh, which pulls in configure.ac and regenerates the configure script, and then I ran ./configure --enable-coverage.

Then I ran make and specified clang as the C compiler instead of old-school gcc. Besides better code quality and error reports, only clang generated the coverage code; the Apple version of gcc did not, which led to some initial confusion.
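
So the build boiled down to something like:

./autogen.sh
./configure --enable-coverage
make CC=clang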

Once the build was complete, I ran the regression tests with:
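
Which, for libevent, is the verify target:

make verify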

Unfortunately, the following error occurred:

So I tried running test/regress directly and saw:

Oops.

It turns out that Apple uses the profile_rt library to handle code coverage instead of the older gcov library, which is why the _llvm_gcda_start_file function symbol was missing. So I linked against the libprofile_rt.dylib library by specifying LDFLAGS=-lprofile_rt on the make command line:
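
Something like:

make CC=clang LDFLAGS=-lprofile_rt verify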

Rerunning make verify produced output indicating which of the event notification subsystems were available and being tested on the system:

Once the regression tests finished, the coverage data became available in the form of *.gcno and *.gcda files in the test/ folder and in the .libs/ folder.

Running lcov generated easier-to-interpret HTML files:
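
The usual two-step lcov run, writing the report into the html/ directory that I open below:

lcov --capture --directory . --output-file libevent.info
genhtml libevent.info --output-directory html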

Once lcov finished, all I had to do was open up html/index.html and see how much code the coverage tests executed. (And panic? In this case, 14.5% coverage seems pretty low!)

Here’s what the lcov summary looks like:

[Screenshot: lcov coverage summary for libevent (lcov-libevent)]

I reran the test/regress command to see if that would help, and it did push the coverage rate to 20%, but I need more insight into how the coverage tests are laid out to see what else I can do. It is not clear how well the coverage tools work on multiplatform libraries like libevent, which have configurably included backend code that may or may not run on the platform under test. In those cases, entire sections of code can be safely ignored. But it is unclear whether code coverage tools in general are aware of the preprocessor conditions that were used to build a piece of software (nor would I trust most coverage tools to apply those rules to the source correctly, especially if the code is written in C++).

In any case, like I said in a previous entry, coverage ultimately is not proof of correct behavior, but it is a good start to see what parts of your code may need more attention for quality assurance tests.