September 3, 2014

HOWTO: Dealing with Data in Docker: A Dockerized Jenkins and Sonatype Nexus Example


My first job title may have been "Software Developer in Test," and my dissertation may have been about model-based testing, but I'm afraid I cannot escape my true self - a CM guy. For whatever reason, I am passionate about Configuration Management. On every side project I've ever worked on, the first things I want up and running are a code repo and a continuous integration build; THEN (and only then) can the coding begin.

I recently shared my enthusiasm for Docker, and I mentioned in that post that I would come back and share an example of data management with Docker. Here is that example, in the form of some work I recently did to get Jenkins and Sonatype Nexus working in their own Docker containers yet talking to one another and scripted in a stable, repeatable way.

If you're in a hurry, you can find a public Gist of my scripts for this example right here.


Here are the requirements for what I consider to be a "scripted, stable, repeatable" setup:
  • Dependency only on publicly available Docker images
  • Persistent data across restarts of containers
  • Ability to completely destroy and re-create containers and images in a scripted way
Given those requirements, we could start down the path of creating a Dockerfile for each of these containers. This would be OK, but there are some aspects of Jenkins and Nexus (e.g., user account setup, security configuration, tool installation, etc.) that are difficult to cleanly script in a Dockerfile.
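For illustration, a Dockerfile for the Jenkins side might start out something like this (purely a sketch, NOT the approach used in this post):

# Hypothetical sketch only - tool installs script cleanly enough...
FROM jenkins
USER root
RUN apt-get update && apt-get install -y maven
# ...but user accounts, credentials, and security configuration have no
# equally clean RUN-able equivalent, which is what pushed me away from
# this approach.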

Alternatively, we could commit updated images after making changes by hand, then export entire images as tar archives that could be re-imported into a docker installation at any time. The downside here (as I perceive it) is that we would not be using volumes, so every backup or restore means bringing down the container, saving the entire image, and so forth.
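For reference, that commit/save/load cycle would look something like this (the image and file names here are placeholders):

# Snapshot a hand-configured container as a new image
docker commit jenkins my-jenkins-snapshot

# Export the entire image to a tar archive, and re-import it later
docker save -o my-jenkins-snapshot.tar my-jenkins-snapshot
docker load -i my-jenkins-snapshot.tar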

For the time being, I started down the path of finding usable images for Jenkins and Nexus on DockerHub. I found these two:
  • jenkins (the official Jenkins image)
  • conceptnotfound/sonatype-nexus

I liked these two in particular because they expose data volumes and ports in a clean way. Now we can start to script the process. What we eventually want is to launch a container from each of these images, loading its data from Docker volumes.

There is also a pattern in Docker of using data-only containers (see "Creating and Mounting Data Volume Containers" on this page). The idea of a data-only container is that a container can be started with one or more data volumes attached, and that containers which depend on those data volumes can use the --volumes-from option of docker run to share them. This gives us the ability to mount the volumes for purposes of backup and restore without impacting any other running containers that may be mounting the same volumes. Because volumes can be mounted to any number of containers at the same time, I used a single data volume container for this example.
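Stripped to its essentials, the pattern looks like this (the name mydata and the /data path are just placeholders):

# A data-only container: it exits immediately, but its volume persists
docker run --name mydata -v /data busybox true

# Other containers share that volume via --volumes-from
docker run --rm --volumes-from mydata busybox ls /data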

So now we want three containers for our example: rdata (for all persistent data ... rdata because I'm using it for "research" ...), jenkins (for Jenkins), and nexus (for Nexus). Ignoring the rdata container for now, here is a startup script for Jenkins and Nexus:

#!/bin/bash

# Start nexus
docker stop nexus
docker rm nexus
docker run -d --name nexus --volumes-from rdata -p 8081:8081 conceptnotfound/sonatype-nexus

# Start jenkins
docker stop jenkins
docker rm jenkins
docker run -d --name jenkins --volumes-from rdata --link nexus:nexus -p 8080:8080 -u root jenkins

Some notes about what is actually happening here:
  • For each container, stop any existing running instance and remove it from my local registry of containers.
  • Start nexus first, using the volumes from rdata, and mapping port 8081 of the container to port 8081 on my docker host.
  • Start jenkins next, using the same volumes from rdata, adding a host file entry for the nexus container, mapping port 8080 of the container to port 8080 on my host, and running as root.
  • Jenkins is run as root to simplify file sharing. By default, the container uses a "jenkins" user to run the WAR, but this apparently requires a user with the same uid to exist on the docker host for sharing to work. I don't believe there are many security issues here, as many/most other containers run as root, which is Docker's default.
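Once the startup script has run, a quick sanity check looks like this (run on the docker host; if you're using boot2docker, substitute the VM's IP for localhost):

# Both containers should be listed as Up
docker ps

# Both services should answer on their mapped ports
curl -I http://localhost:8080/
curl -I http://localhost:8081/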
Now let's take a look at the load script, which initializes the volumes of the rdata container. Of course, this needs to run before the startup script shown above, since the jenkins and nexus containers mount their volumes from rdata:

#!/bin/bash

archive=$1

if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit 1
fi

echo "Loading from $archive"

# Clean up any old stuff here
docker rm rdata

# Start new data container
echo "Starting new rdata container"
docker run -d --name rdata -v /var/jenkins_home -v /nexus busybox true

# Copy in files from old data container
echo "Restoring old files into new rdata"
docker rm restore
docker run --name restore --volumes-from rdata -v "$(pwd)":/backup busybox tar xvf "/backup/$archive"
docker rm restore


What's happening here?

  • We take the name of a tar archive as an input argument.
  • We remove any existing rdata container (note: the container won't be running, but removing it forces removal of its current volumes as well).
  • We start the new rdata container, with volumes mapped to the Jenkins and Nexus images' exposed data locations. /var/jenkins_home and /nexus are consistent with the locations exposed by the jenkins and sonatype-nexus images we are using. Note: the data locations are EMPTY at this point.
  • Also note that the rdata container runs the command "true", which is essentially a no-op that exits cleanly. The rdata container will exit, but as long as it is not removed (i.e., deleted), its volumes will still be available for mounting via --volumes-from.
  • We create a new container, "restore", using the volumes from rdata and a new volume mapped to our current directory on the docker host. This container runs a tar command that extracts the given tar file's contents. This depends on the archive containing data under the relative paths var/jenkins_home and nexus, which land in the mounted volumes because tar runs from the container's root directory.
  • We remove the "restore" container.
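To spot-check that the restore actually populated the volumes, a throwaway container works well:

# List the restored Jenkins home through a disposable container
docker run --rm --volumes-from rdata busybox ls /var/jenkins_home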
The missing piece here is a script to create the archive used by the load script. Here is that backup script:

#!/bin/bash

archive=$1

if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit 1
fi

if [ -e "$archive" ]; then
  echo "File $archive already exists. Please choose another name."
  exit 1
fi

local=`pwd`
echo "Backing up to $archive"

echo "Exporting data from current rdata container"
docker rm backup
docker run --name backup --volumes-from rdata -v "$local":/backup busybox tar cvf "/backup/$archive" var/jenkins_home nexus
docker rm backup

Some notes about this process:
  • We take in a name for the forthcoming tar archive.
  • We start a new container using the volumes from the current rdata container and a backup volume mapped to the docker host (rdata does not have to be running, and in fact, it won't be if we used the load script from above).
  • We create a tar file from the contents of the var/jenkins_home and nexus directories.
  • We remove the backup container.
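A quick way to confirm a backup looks sane before trusting it is to list the archive's contents (backup.tar stands in for whatever name you passed to the script):

# Expect entries under var/jenkins_home/ and nexus/
tar tvf backup.tar | head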
With these three scripts in hand, we can string them together to do sequences like a clean "restart" of the current containers, or a "restore" of the containers with data from a previous backup.

Restart:

#!/bin/bash

now=`date +%s`
arch="backup.$now.tar"

./backup.sh "$arch"
./load.sh "$arch"
rm "$arch"
./start.sh

Restore:

#!/bin/bash

archive=$1

if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit 1
fi

echo "Restoring from $archive"

./load.sh "$archive"
./start.sh

Again, here is a public GitHub Gist of all of these scripts, which is probably much easier to read than the code formatting here on the blog. With these scripts, I have been routinely working with a Dockerized Jenkins and Nexus setup for some time, taking backups and restoring from them regularly. As usual with this sort of thing, your own mileage may vary.
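If you want backups on a schedule, a cron entry along these lines does the trick (the directory here is hypothetical; point it wherever these scripts live, and note that % must be escaped in crontab):

# Nightly timestamped backup at 2am
0 2 * * * cd /home/me/docker-scripts && ./backup.sh backup.$(date +\%s).tar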

Admittedly, I am relatively new to Docker. It's possible I'm doing something "wrong" here. If you see anything along those lines, or if you'd like for me to clarify anything, let me know in the comments!

Comments:

  1. Nice article!
    Did you try to use Liquibase for database migration?

    Reply:
      Have taken a "hello world"-level look at Liquibase and Flyway. From what I can tell, they wouldn't have much effect on the need to persist data in containers, though; but maybe I'm missing something?

  2. Hi,

    Try the helicopterizer for Backup and Restore for Docker Container in the Cloud Providers.

    https://github.com/frekele/helicopterizer

    help us with your code, make a PR. :)
