My first job title may have been "Software Developer in Test," and my dissertation may be about "Model-based testing," but I'm afraid I cannot escape my true self - a CM guy. For whatever reason, I am passionate about Configuration Management. On every side project I've ever worked on, the first things I want up and running are a code repo and a continuous integration build; THEN (and only then) can the coding begin.
I recently shared my enthusiasm for Docker, and I mentioned in that post that I would come back and share an example of data management with Docker. Here is that example, in the form of some work I recently did to get Jenkins and Sonatype Nexus working in their own Docker containers yet talking to one another and scripted in a stable, repeatable way.
If you're in a hurry, you can find a public Gist of my scripts for this example right here.
Here are the requirements for what I consider to be a "scripted, stable, repeatable" setup:
- Dependency only on publicly available docker containers
- Persistent data across restarts of containers
- Ability to completely destroy and re-create containers and images in a scripted way
Alternatively, we could commit updated images after making changes by hand. We could then export entire images as tar archives that could be re-imported into a docker installation at any time. The downside here (as I perceive it) is that we would not be using volumes, so doing backups or restores means bringing the container down, saving the entire image, and so forth.
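For reference, that image-based workflow would look roughly like the following (the container and image names here are purely illustrative; this is not the approach I ended up using):

```bash
# Bake any hand-made changes into a new image
docker commit jenkins my-jenkins-snapshot

# Export the entire image as a tar archive...
docker save -o my-jenkins-snapshot.tar my-jenkins-snapshot

# ...and re-import it on any docker installation later
docker load -i my-jenkins-snapshot.tar
```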
For the time being, I started down the path of finding usable images for Jenkins and Nexus on DockerHub. I found these two:
- jenkins - what appear to be official Docker images of Jenkins releases
- conceptnotfound/sonatype-nexus - a fairly simple image with Nexus installed
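If you want to follow along, you can pull both images up front (docker run will also pull them automatically on first use):

```bash
# Grab both images from DockerHub
docker pull jenkins
docker pull conceptnotfound/sonatype-nexus
```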
I liked these two setups in particular because they exposed data volumes and ports in a clean way. Now we can start to script the process. What we eventually want to do is to launch an instance of each of these containers which load their data from Docker volumes.
There is also a pattern in Docker of using data-only containers (see "Creating and Mounting Data Volume Containers" on this page). The idea of a data-only container is that a container can be started with one or more data volumes attached, and that containers which depend on those data volumes can use the --volumes-from option of docker run to share them. This gives us the ability to mount the volumes for purposes of backup and restore without impacting any other running containers which may be mounting the same volumes. Because volumes can be mounted by any number of containers at the same time, I used a single data volume container for this example.
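In its simplest form the pattern looks something like this; the container name and path here are just for illustration:

```bash
# A data-only container: it exits immediately, but its volume lives on
docker run -d --name mydata -v /some/data busybox true

# Any other container can mount that volume via --volumes-from
docker run --rm --volumes-from mydata busybox ls /some/data
```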
So now we want three containers for our example: rdata (for all persistent data ... rdata because I'm using it for "research" ...), jenkins (for Jenkins), and nexus (for Nexus). Ignoring the rdata container for now, here is a startup script for Jenkins and Nexus:
```bash
#!/bin/bash

# Start nexus
docker stop nexus
docker rm nexus
docker run -d --name nexus --volumes-from rdata -p 8081:8081 conceptnotfound/sonatype-nexus

# Start jenkins
docker stop jenkins
docker rm jenkins
docker run -d --name jenkins --volumes-from rdata --link nexus:nexus -p 8080:8080 -u root jenkins
```
Some notes about what is actually happening here:
- For each container, stop any existing running instance and remove it from my local registry of containers.
- Start nexus first, using the volumes from rdata, and mapping port 8081 of the container to port 8081 on my docker host
- Start jenkins next, using the same volumes from rdata, adding a host file entry for the nexus container, mapping port 8080 to 8080 on my host, and running as root
- Jenkins is run as root to simplify file sharing. By default, the container runs the war as a "jenkins" user, but sharing files then apparently requires a user with the same uid to exist on the docker host. I don't believe there are many security issues here, as many/most other containers run as root, which is Docker's default.
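As a quick sanity check after running the startup script, both containers should show up in docker ps and both UIs should answer on the mapped host ports (the exact URL paths to check depend on the images):

```bash
# Both containers should be listed as running
docker ps

# Jenkins and Nexus should answer on the mapped host ports
curl -I http://localhost:8080/
curl -I http://localhost:8081/
```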
Now let's take a look at the load script, which initializes the volumes of the rdata container. Of course, this needs to run before the startup script shown above, since the jenkins and nexus containers mount their volumes from the rdata container:
```bash
#!/bin/bash

archive=$1
if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit
fi

echo "Loading from $archive"

# Clean up any old stuff here
docker rm rdata

# Start new data container
echo "Starting new rdata container"
docker run -d --name rdata -v /var/jenkins_home -v /nexus busybox true

# Copy in files from old data container
echo "Restoring old files into new rdata"
docker rm restore
docker run --name restore --volumes-from rdata -v $(pwd):/backup busybox tar xvf /backup/$archive
docker rm restore
```
What's happening here?
- We take the name of a tar archive as an input argument.
- We remove any existing rdata container (note: the container won't be running, but removing it forces removal of its current volumes as well).
- Start the new rdata container, with volumes mapped to Jenkins and nexus' exposed data locations. /var/jenkins_home and /nexus are consistent with the locations exposed by the jenkins and sonatype-nexus images we are using. Note: the data locations are EMPTY at this point.
- Also note that the rdata container is running the command "true", which is essentially meant as a no-op that will exit cleanly. The rdata container will exit, but as long as the container is not removed (i.e. deleted), its volumes will still be available for mounting via --volumes-from.
- Create a new container "restore" using the volumes from rdata and a new volume mapped to our current directory on the docker host. This container runs a tar command which will extract the given tar file's contents. This depends on the tar file containing data for var/jenkins_home and nexus folders.
- Remove the "restore" container
The missing piece here is a script to create the archive used by the load script. Here is that backup script:
```bash
#!/bin/bash

archive=$1
if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit
fi

if [ -e "$archive" ]; then
  echo "File $archive already exists. Please choose another name"
  exit
fi

local=`pwd`

echo "Backing up to $archive"
echo "Exporting data from current rdata container"
docker rm backup
docker run --name backup --volumes-from rdata -v $local:/backup busybox tar cvf /backup/$archive var/jenkins_home nexus
docker rm backup
```
Some notes about this process:
- We take in a name for the forthcoming tar archive
- We start a new container using the volumes from the current rdata container and a backup volume mapped to the docker host (rdata does not have to be running, and in fact, it won't be if we used the load script from above)
- We create a tar file from the contents of the jenkins_home and nexus directories.
- We remove the backup container
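Since the archive lands in the current directory on the docker host, it's easy to double-check what went into it (the file name here is just an example):

```bash
# Confirm both data trees made it into the archive
tar tvf backup.tar | head
```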
With these three scripts in hand, we can string them together to do sequences like a clean "restart" of the current containers, or a "restore" of the containers with data from a previous backup.
Restart:
```bash
#!/bin/bash

now=`date +%s`
arch="backup.$now.tar"

./backup.sh $arch
./load.sh $arch
rm $arch
./start.sh
```
Restore:
```bash
#!/bin/bash

archive=$1
if [ -z "$archive" ]; then
  echo "No archive provided."
  echo "Usage: $0 archive.tar"
  exit
fi

echo "Restoring from $archive"
./load.sh $archive
./start.sh
```
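Usage is just the name of an archive produced earlier, for example one created by the restart script above (the timestamp here is made up):

```bash
./restore.sh backup.1428000000.tar
```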
Again, here is a public GitHub Gist of all of these scripts that is probably much easier to read than the code formatting here on the blog. With these scripts, I have been routinely working with a Dockerized Jenkins and Nexus setup for some time, taking backups and restoring from them regularly. As usual with this sort of thing, your own mileage may vary.
Admittedly, I am relatively new to Docker. It's possible I'm doing something "wrong" here. If you see anything along those lines, or if you'd like for me to clarify anything, let me know in the comments!
Nice article!
Did you try to use Liquibase for database migration?

Have taken a "hello world"-level look at Liquibase and Flyway. From what I can tell, they wouldn't have much effect on the need to persist data in containers, though. But maybe I'm missing something?

Hi,
Try helicopterizer for backup and restore of Docker containers on cloud providers.
https://github.com/frekele/helicopterizer
Help us with your code, make a PR. :)