Monitoring container health in Rackspace Carina

Edited Feb 3, 2016. @everett_toews submitted a pull request that greatly simplified some of my scripts. I've updated this post to reflect his suggestions.

Disclaimer: Carina is currently in beta (as of February 2, 2016) and should not be considered production ready. That hasn't stopped me, but you should evaluate all technology platforms and vendors to make the best selection for your business. I'm happy to work with your team to review potential technologies - just get in touch with me.

What's a Carina?

If you haven't heard of Rackspace Carina, you should check it out. Carina is a container runtime that allows you to deploy Docker containers onto bare-metal machines.

Under the hood, Carina provides Docker Swarm-as-a-Service, which means you can use existing Docker CLI tools (like docker-compose) to interact with the Carina runtime.
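
For example, once your environment variables point the Docker CLI at a Carina cluster, the standard tooling just works. A quick sanity check:

# 'docker info' against the Swarm endpoint reports the nodes (segments) in your cluster
docker info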

The exciting part for me is being able to run containers without the performance penalty of a virtual-machine hypervisor. It's the way containers should be run. Don't believe me? Go ask Google or Joyent.

I'm also excited because I don't really want to manage the underlying physical hosts and Docker Swarm machinery - Carina handles that for you.

(Aside: I'm really not a fan of Docker Swarm. Kubernetes provides a significantly better platform, not to mention it's production-ready. I'll be writing more about that in future posts.)

Once you have created a free Carina account and run one of the many great example tutorials, the next question is: How do I monitor the health of my containers?

In a traditional environment, you might use host-level monitoring tools like cAdvisor, sysdig, or New Relic. With Carina, however, you don't have access to the underlying host (and for good reason: security!).

Docker (v1.5+) provides access to data from the underlying cgroups that manage our containers. You can get these metrics via the docker stats command, which pulls data from the Docker API.
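
For example, with your shell pointed at the cluster (CONTAINER_NAME is a placeholder for one of your running containers):

# Stream live CPU, memory, and network stats for a running container
docker stats CONTAINER_NAME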

Carina exposes the Docker API via /var/run/docker.sock, which can be mounted as a volume into our containers. This is how we're going to monitor the health of our containers.
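
If you're curious what the raw data looks like, you can hit the Stats API straight through the socket from inside a container that has it mounted. A minimal sketch, assuming curl 7.40+ (for --unix-socket) and a real ID in place of CONTAINER_ID:

# The Stats API streams a JSON blob of cgroup metrics roughly once per second
curl --unix-socket /var/run/docker.sock http://localhost/containers/CONTAINER_ID/stats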

Monitoring Carina

Feel free to skip straight to the GitHub Repo to try this out.

From a high-level perspective, here's the architecture that we will be using to monitor the health of our containers:

Docker Stats API -> Monitoring Agent -> Graphing Engine

There are numerous logging/metrics vendors, many of which now provide monitoring agents that export data from the Docker Stats API directly into their system. For this project, I needed to export to a custom statsd/graphite system, so this tutorial will be using those platforms. You can absolutely swap out my monitoring agent for one from your vendor.

The monitoring agent I'm using is based on this Docker image. It's a Node.js application that reads the stream of data from the Docker Stats API, converts it into StatsD metrics, and sends them to a specified StatsD instance.
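
StatsD's wire format is just name:value|type sent over UDP, and statsd stores gauges in Graphite under stats.gauges.*. The metric names below are illustrative (memory.usage is the one we'll graph later), but the agent's gauges look roughly like this:

# Illustrative gauge datagrams - exact metric names may differ
docker.my-container.memory.usage:52428800|g
docker.my-container.cpu.usagePercent:3.2|g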

Prerequisites

If you're following along at home, you'll need to have these things:

  • A Rackspace Carina cluster
  • The docker and docker-compose CLIs installed on your local machine
  • The Rackspace Carina CLI installed on your local machine (see the sketch after this list)
  • A local copy of this repo
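
If your shell isn't pointed at your cluster yet, the carina CLI can download your credentials and set the Docker environment variables for you. A minimal sketch, assuming a cluster named mycluster:

# Download cluster credentials and point the Docker CLI at the Swarm endpoint
carina credentials mycluster
eval "$(carina env mycluster)"
docker info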

If you're feeling lazy

git clone https://github.com/rosskukulinski/docker-swarm-monitoring.git && cd docker-swarm-monitoring  
./launch.sh

Step-by-step walkthrough

To monitor the health of our containers, we need the following:

  • Containers to monitor
  • A StatsD/Graphite endpoint
  • A monitoring agent on each segment sending metrics

To keep this demonstration simple, I've included a docker-compose.yml template that launches a StatsD/Graphite container on the first segment in your cluster. Here's the graphite container definition in docker-compose.yml:

graphite:  
  image: hopsoft/graphite-statsd
  restart: always
  environment:
    - constraint:node==*n1
  ports:
    - 80:80
    - SEGMENT_SERVICE_IP:8125:8125/udp # servicenet of n1, statsd rx on udp

As you can see, we run the graphite container on a particular segment (n1), binding port 80 for HTTP access and port 8125/udp for StatsD input.

Now let's take a look at the monitoring agent:

monitoring-agent:  
  image: rosskukulinski/docker-swarm-monitoring
  environment:
    - STATSD_HOST=SEGMENT_SERVICE_IP
    - STATSD_PORT=8125
    - STATSD_PREFIX=docker.
    - affinity:container!=*monitoring-agent*
  restart: always
  volumes:
    - '/var/run/docker.sock:/var/run/docker.sock'

Here I'm using an automatically built image from my docker-swarm-stats GitHub repository. We point the agent's StatsD host and port at the graphite container, and mount docker.sock into the monitoring container so that it has access to the Docker Stats API.

Finally, we're using the Docker Swarm affinity feature to force monitoring-agents to run on different hosts. More on that later!

Before we can launch the monitoring containers, we need to get two pieces of information:

  1. The ServiceNet (internal network) IP of the first segment in the cluster
  2. The public IP of the first segment in the cluster

We can get this information by running these commands:

SEGMENT_PUBLIC_IP=$(docker run --rm --net=host --env constraint:node==*n1 racknet/ip public)  
SEGMENT_SERVICE_IP=$(docker run --rm --net=host --env constraint:node==*n1 racknet/ip service)  
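
It's worth sanity-checking both values before moving on:

# Both should print non-empty IP addresses
echo "public: ${SEGMENT_PUBLIC_IP}  service: ${SEGMENT_SERVICE_IP}"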

Now we need to update the docker-compose.yml file with the ServiceNet IP (the public IP is only used later, to open the Graphite UI in your browser). You can manually find/replace if you'd like, or use this next command to insert the information about your cluster and create a new docker-compose file named monitoring.yml.

sed 's/SEGMENT_SERVICE_IP/'${SEGMENT_SERVICE_IP}'/g' docker-compose.yml > monitoring.yml  
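
You can confirm the substitution took before launching anything:

# Your ServiceNet IP should appear where the placeholder used to be
grep -n "${SEGMENT_SERVICE_IP}" monitoring.yml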

Launch the monitoring containers

docker-compose -p monitor -f monitoring.yml up -d  
docker-compose -p monitor -f monitoring.yml scale monitoring-agent=$(docker info | grep Nodes | awk '{print $2}')  

The first command launches the StatsD/Graphite container and a single monitoring agent. The second command finds out how many segments are in your cluster, then scales the monitoring agent so that there is one per segment.

Note: If you ever increase the number of segments, you'll need to re-run the scale command.
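
You can check what's running with docker-compose; Swarm prefixes each container's name with the node it landed on, so you can confirm the agents are spread one per segment:

# Lists the statsd/graphite container plus each monitoring-agent replica
docker-compose -p monitor -f monitoring.yml ps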

We ensure that a monitoring agent lands on every segment via the affinity environment variable in the docker-compose.yml - Swarm won't schedule an agent onto a node that already has a container matching *monitoring-agent*, so scaling the service to the node count spreads one agent onto each segment:

environment:  
  - affinity:container!=*monitoring-agent*

Now that the monitoring system is up, we can open the Graphite UI.

Open Graphite UI

Using your web browser, navigate to the public IP of the first segment in your cluster to access the Graphite dashboard: http://$SEGMENT_PUBLIC_IP

You can get that IP by running:

echo $SEGMENT_PUBLIC_IP  

Once you have the Graphite user interface up, let's graph some container metrics!

  1. In the Graphite Composer window, click 'Graph Data', then click 'Add'
  2. Enter aliasByNode(stats.gauges.docker.*.memory.usage,3) and press OK

You should now see the memory usage of the containers in your cluster! You might need to adjust the time window of the graph to see the data better; click the clock icon and enter a smaller time-scale, e.g. 10 minutes.
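
If you'd rather skip the Composer, Graphite's render API accepts the same target expression directly. For example, to pull the last 10 minutes of memory data as a PNG:

# Renders the memory-usage graph for the last 10 minutes
curl -o memory.png "http://${SEGMENT_PUBLIC_IP}/render?target=aliasByNode(stats.gauges.docker.*.memory.usage,3)&from=-10min"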

Summary

Currently this monitoring agent only sends metrics for CPU and memory utilization, but there is a lot more data you might want to look at. You can certainly customize my agent to grab the metrics you want (I'm always happy to accept PRs). Alternatively, you can look into logging/monitoring vendors; Logentries, for example, has a decent out-of-the-box Docker dashboard and monitoring agent.

While this post has focused on the Carina platform, the mechanism by which we export metrics from the Docker Stats API should work with other Docker Swarm platforms.

Happy monitoring!

You can contact me at ross(@)kukulinski.com or on Twitter @RossKukulinski.

Ross Kukulinski

My name is Ross. I teach the world Kubernetes and Node.js through consulting, conference speaking, and training courses.

Philadelphia, PA