At a very high level, Docker is a virtualization technology built on Linux Containers (LXC). The goal of these containers is similar to that of any virtual machine - to provide "machine within a machine" functionality. Docker differs significantly from existing technologies, however, in that it does not virtualize the entire hardware stack. This comes at the cost of weaker isolation between containers, but the sharing that this tradeoff enables yields huge performance and scalability gains. For a nice comparison of Docker's features against more traditional VMs, I found this SO post helpful.
At this point, I am personally a little light on the internal details of Docker, but I have gained some familiarity with its features. Here are five features that have me sold on this technology after only about a month of exposure to it.
- Docker containers extend "infrastructure as code" to the entire environment a service runs in. Everything in the container is now subject to configuration management (e.g., versioning, archiving, etc.) and testing in a way that wasn't really feasible with traditional VMs. The description of a container can be written into a beautiful thing called a Dockerfile, so installing tools, deploying wars, migrating data, and other tasks that remain rather tedious with alternative VMs are fairly straightforward with Docker.
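To make that concrete, here is a minimal sketch of a Dockerfile for serving a static site - the image name, package, and paths are just illustrative, but every line is plain text that can live in version control next to the application code:

```dockerfile
# Start from a known base image (tag pins the version for reproducibility)
FROM ubuntu:14.04

# Install the one tool this container needs
RUN apt-get update && apt-get install -y nginx

# Copy the site content from the repo into the image (path is illustrative)
COPY site/ /usr/share/nginx/html/

# Document the port the service listens on
EXPOSE 80

# Run nginx in the foreground so the container stays alive
CMD ["nginx", "-g", "daemon off;"]
```

Reviewing a change to this file in a pull request is reviewing a change to the infrastructure itself.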
- A Dockerfile is built into an image, and only the image is then run, so there is a clean separation between describing a container and running one. As an example, check out this Dockerfile for the official MongoDB container. In a fairly straightforward DSL, the file says to start FROM a base Ubuntu image, RUN some commands to install MongoDB, expose a data VOLUME for the data being stored, EXPOSE some ports that can be mapped to the host, and finally give a CMD to run when the container starts. Everything but the CMD line is executed when the container is "built" into an image. When the container is actually run, the CMD is executed, and the container keeps running until that command finishes or the container is stopped. The Dockerfile only describes what should be pre-installed, exposed, and executed. At runtime, we can specify port mappings from host to guest (allowing us to run the same image many times on separate ports), choose the data volumes to be used (allowing us to manage data separately), and even provide an alternative CMD, e.g., if we want to perform maintenance on an image rather than run its default command. This separation of image properties from runtime properties goes a long way toward stable portability.
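The build/run split might look like this at the command line (image and container names are made up for illustration; these commands need a running Docker daemon):

```shell
# Build the image once; the FROM/RUN/COPY steps execute here, not at runtime.
docker build -t myorg/mysite .

# Run the same image twice on different host ports -- the port mapping is
# a runtime decision, not something baked into the image.
docker run -d -p 8080:80 --name site-a myorg/mysite
docker run -d -p 8081:80 --name site-b myorg/mysite

# Override the default CMD, e.g., to get a shell for maintenance.
docker run -it myorg/mysite /bin/bash
```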
- Docker data volumes are also nice. When I first began to explore Docker, I was fairly skeptical of how data could be managed from within containers. If the containers are truly portable, is a container stuck with all of its data as well? While you can capture data as part of a container (and this is the default), that would become bulky and tedious at any real scale. The container itself would no longer be truly "lightweight" if we had to drag around and version its accompanying data as well. Thankfully, volumes allow data to be managed separately from containers and coupled only where absolutely necessary. Volumes can also be mounted and shared among containers, so that data maintenance operations can be isolated from user-facing containers like webservers (read: no downtime for creating a backup of a database). I'll probably be back to write a full blog post on how I've used a data-only container to manage data for a CI setup with Jenkins and Nexus. It kind of blew my mind ....
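The data-only container pattern can be sketched roughly like this (container names and the volume path are assumptions; the path matches the official jenkins image's default home, but check the image you actually use):

```shell
# Create a data-only container; it exits immediately and never needs to
# run again -- it exists only to own the volume.
docker run --name jenkins-data -v /var/jenkins_home busybox true

# Run Jenkins with that container's volume mounted into it.
docker run -d --volumes-from jenkins-data -p 8080:8080 jenkins

# A throwaway container can back up the same volume while Jenkins keeps
# serving -- no downtime for the backup.
docker run --rm --volumes-from jenkins-data -v $(pwd):/backup busybox \
    tar czf /backup/jenkins-home.tar.gz /var/jenkins_home
```

The Jenkins container itself stays disposable: it can be stopped, upgraded, and replaced while the data container keeps the state.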
- The public Docker registry is also cool. There we can find base images for open-source operating systems, as well as more complex images with tools pre-configured (e.g., the MongoDB example above). My only complaint here is that I still don't understand what happens if I pull someone else's image but want to override or add to their settings. (I'm certain this is just ignorance on my part, but I also haven't really seen it discussed.) Almost all of the custom images I've created use the ubuntu or busybox image as a starting point, and this is very helpful. I've also pulled stable images for Jenkins, Nexus, and MongoDB so far, and I'm sure more will follow.
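One answer to the override question, as far as I can tell, is that any pulled image can itself be used as a base: a short Dockerfile can layer new settings on top of it without touching the original. A hedged sketch (the tag and config path are assumptions - the real mongo image may lay things out differently):

```dockerfile
# Extend the official image rather than modifying it
FROM mongo:2.6

# Layer our own configuration on top of the base image's filesystem
COPY mongod.conf /etc/mongod.conf

# Replace the base image's default CMD with one that uses our config
CMD ["mongod", "--config", "/etc/mongod.conf"]
```

Since later Dockerfile lines win, the derived image keeps everything from the base except what it explicitly overrides.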
- Docker promotes - but does not require - the use of containers for "micro-services." When writing code, we treat modularity and separation of duties/functions as a good thing. From what I have seen, Docker enables the same principles at the container level. As a basic example, a webapp that requires a database does not require the database to run in the same container. (In fact, most traditional production-quality server setups already have at least this separation in place.) What if we took this to the extreme that every service my app requires (e.g., Apache, Tomcat, the DB, cron jobs, etc.) should be separately containerized and managed? Wouldn't we impose that same requirement on any properly designed code? If infrastructure is now code, I would see this decoupling as a good thing, with many of the same benefits as proper modularity and decoupling in software (e.g., easier, lower-impact maintenance).
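The webapp/database split above might look something like this (the `myorg/webapp` image and the `db` alias are hypothetical; `--link` is the wiring mechanism in the Docker of this era):

```shell
# The database runs in its own container...
docker run -d --name appdb mongo:2.6

# ...and the web tier in another, wired to the database by a link, which
# exposes the db container to the webapp under the hostname "db".
docker run -d --name appweb --link appdb:db -p 8080:8080 myorg/webapp
```

Either container can now be upgraded or restarted without rebuilding the other, which is exactly the low-impact maintenance argument.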
There are plenty of other benefits I can see to Docker, especially in the test automation space. I hope to be exploring these at work, in research, and for fun over the next few months. I'm sure I'll have more to yack about along the same lines soon!