June 17, 2016

HOWTO: CloudFormation and Masterless Puppet on the Baseball Workbench Project

Within days of my successful dissertation defense in February, I started Baseball Workbench, a side project around a self-service tool for advanced baseball analytics, to have some fun and sharpen my development skills.

One area I've put way too much effort into so far is the automated creation and configuration of AWS resources for the project. That setup is now completely automated, using a combination of:
  • AWS CloudFormation
  • CloudFormation's support for cloud-init
  • r10k
  • hiera
  • puppet apply (AKA, local Puppet runs, without a Puppet master)
  • Custom "role" and "profile" Puppet classes
  • A custom "superbuilds" Puppet module for configuring my CI server
  • A custom Puppet module "aws_ec2_facts" for converting EC2 tags into Puppet facts
In this post, I walk through the details of each of these components in turn. Hopefully this combination of implementation choices is interesting to you, and the concepts should translate to similar approaches as well.

AWS CloudFormation

In general, CloudFormation is a hosted service for creating and doing what I think of as "AWS-level" configuration of AWS resources (tagging, applying IAM roles, applying security groups, etc.). CloudFormation service calls are driven by a JSON file, called a "template", and templates can take in "parameters".

For the Baseball Workbench project, I adapted a few publicly available CloudFormation templates from AWS to create AWS resources like a Virtual Private Cloud (VPC), an Internet Gateway, an ECS Cluster, Security Groups, and other resources. You can view the full template on my "Standard AWS" GitHub repo over here.

The most interesting of these resources for this walkthrough are a couple of EC2 instances:

  • BuildServer - an EC2 instance for running code builds.
  • ProxyServer - an EC2 instance for proxying requests to services
I am also using an Auto Scaling group of Docker hosts through the AWS EC2 Container Service behind the ProxyServer. To date, I haven't Puppet-ized the setup of those EC2 instances, so I'll omit them from the discussion here. You can still see the full list of shell commands (triggered through cloud-init) in the template above, and I may post an update once these are converted to Puppet steps as well.

My CloudFormation template takes in parameters for Instance Types (one each for BuildServer, ProxyServer, and DockerServer), EC2 keypair, and Base64-encoded Hieradata. I have a couple of small wrapper scripts which generate a Keypair and Base64-encode a local hieradata file prior to calling CloudFormation "create".
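As a sketch, such a wrapper might look like the following. This is not the actual script from the repo; the script name, template filename, and parameter names are illustrative assumptions, and running it requires AWS credentials.

```shell
#!/bin/bash
# create-stack.sh (hypothetical sketch; all names are illustrative)
set -euo pipefail

STACK_NAME="${1:-standard-aws}"
HIERADATA_FILE="${2:-custom.yaml}"

# Generate a fresh EC2 keypair for the stack
aws ec2 create-key-pair --key-name "${STACK_NAME}-key" \
  --query 'KeyMaterial' --output text > "${STACK_NAME}-key.pem"
chmod 600 "${STACK_NAME}-key.pem"

# Base64-encode the local hieradata file so it can travel as a
# plain-string CloudFormation parameter (decoded again on the instance)
HIERADATA_B64="$(base64 -w0 "${HIERADATA_FILE}")"

aws cloudformation create-stack \
  --stack-name "${STACK_NAME}" \
  --template-body file://template.json \
  --capabilities CAPABILITY_IAM \
  --parameters \
    ParameterKey=KeyName,ParameterValue="${STACK_NAME}-key" \
    ParameterKey=Hieradata,ParameterValue="${HIERADATA_B64}"
```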

CloudFormation's Support for cloud-init

The CloudFormation template, then, creates resources and applies "AWS-level" configurations to them. This is not quite sufficient for complete instance configuration. As considered so far, we can create, tag, and configure the properties of EC2 instances; but we cannot, for example, install and configure Nginx on the ProxyServer instance. We will use Puppet for those instance configuration steps, so the problem is reduced to installing and configuring Puppet, then "applying" Puppet classes to configure instances. I use AWS EC2's support for cloud-init to perform these steps.

AWS's support for cloud-init allows us to create files, run shell commands, start services, and more. In fact, it could be used in place of more full-featured instance configuration tools like Puppet; but in my experience, cloud-init lacks the finer-grained control of the more advanced tools, and would require scripting most instance configuration steps in bash. That just isn't feasible for anything but the most basic instance configurations, such as our use case here of using cloud-init to trigger installation and execution of Puppet code on instance boot.

In particular, CloudFormation exposes a configuration property for cloud-init metadata when defining EC2 resources, and I use this property to specify that I want my instances to apply their latest relevant Puppet code on boot. I also specify a "UserData" script to the EC2 instance, which explicitly triggers cloud-init on boot.

Consider the configuration of the BuildServer resource as an example:
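The full resource definition is in the template on GitHub; the condensed JSON sketch below shows the relevant structure. The GitHub path, region, and values are illustrative placeholders, and line numbers cited in the notes that follow refer to the full template in the repo, not to this sketch.

```json
"BuildServer": {
  "Type": "AWS::EC2::Instance",
  "Metadata": {
    "AWS::CloudFormation::Init": {
      "config": {
        "sources": {
          "/root": "https://github.com/<your-github-user>/standard-aws/tarball/master"
        },
        "files": {
          "/root/init/datadir/custom.yaml": {
            "content": { "Ref": "Hieradata" },
            "encoding": "base64"
          }
        },
        "commands": {
          "01_init": { "command": "bash /root/init/init.sh", "cwd": "/root/init" }
        }
      }
    }
  },
  "Properties": {
    "InstanceType": { "Ref": "BuildInstanceType" },
    "KeyName": { "Ref": "KeyName" },
    "UserData": { "Fn::Base64": { "Fn::Join": ["", [
      "#!/bin/bash\n",
      "yum update -y aws-cfn-bootstrap\n",
      "/opt/aws/bin/cfn-init -v --region us-east-1 ",
      "--stack ", { "Ref": "AWS::StackName" }, " --resource BuildServer\n"
    ]]}}
  }
}
```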

A few things related to cloud-init configuration here:

  • "Metadata" is a top-level property of the resource
  • There is a specific Metadata property, AWS::CloudFormation::Init, and within that, a "config" property for specifying the cloud-init metadata directly.
  • In the "UserData" property on line 43, I update and trigger the cfn-init script, which is CloudFormation's wrapper script for cloud-init (installed by default on Amazon Linux base images)
  • The "UserData" text requires hard-coding the region and logical ID (BuildServer in this case) of the CloudFormation resource
Within that config property (starts on line 6), I am using cloud-init to do three things:
  • Use the "sources" block to stage the contents of my standard-aws repository from Github, expanded into root's home directory /root (lines 7-9)
  • Use the "files" block to create a file at /root/init/datadir/custom.yaml, whose contents are the decoded form of the Base64-encoded "Hieradata" parameter provided from a local file by my wrapper script at launch (lines 10-18)
  • Use the "commands" block to trigger a script init.sh that was staged from the standard-aws repository by the sources block (lines 19-23)
The real guts of the Puppet installation and triggering of Puppet code, then, are in the init.sh script from the standard-aws repo (and a corresponding update.sh for actual triggering of a local puppet run):
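In sketch form, the pair of scripts looks roughly like this. Package names, paths, and ordering are simplified assumptions rather than a verbatim copy of the repo scripts, and line numbers cited in the notes below refer to the originals.

```shell
#!/bin/bash
# init.sh (sketch) -- run once by cloud-init from /root/init
set -e

# Install Puppet, Git, and Rubygems from the distro repos,
# plus hiera, r10k, and eyaml support from rubygems
yum install -y puppet3 git rubygems
gem install r10k hiera hiera-eyaml

# Hand off to the script that performs the actual Puppet run
bash /root/init/update.sh
```

```shell
#!/bin/bash
# update.sh (sketch) -- can be re-run (or cron'd) to re-apply config
set -e
cd /root/init

# Fetch the latest modules listed in the Puppetfile into ./modules
r10k puppetfile install

# Masterless Puppet run: static site.pp and hiera config from the repo,
# modules from r10k plus custom role/profile classes in ./site
puppet apply \
  --parser future \
  --hiera_config=/root/init/hiera.yaml \
  --modulepath=/root/init/modules:/root/init/site \
  /root/init/site.pp
```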

A few things to note from these scripts, many of which are discussed in greater detail in their respective sections below:
  • I install Puppet, Git, Rubygems, and hiera from their respective repositories. The examples in this walkthrough do not yet use the "eyaml" hieradata backend that is also installed.
  • The running of Puppet is isolated to a separate script that we can call independently (we could also set up a cron job to call this regularly, if desired).
  • r10k and Puppet are called from the working directory /root/init.
  • r10k is called before each Puppet run to obtain the latest modules, using a Puppetfile staged from the standard-aws repo
  • The Puppet local apply command gets its site.pp manifest and hieradata configuration from static files staged from the standard-aws repo
  • Puppet gets its modules from two sources: the "modules" directory populated by r10k, and the "site" directory staged from the standard-aws repo


r10k

r10k is a tool for managing Puppet modules. In the setup for Baseball Workbench, my use of r10k is relatively trivial: I use r10k to check out Puppet code from the Puppet Forge and custom GitHub repositories.

I install r10k from rubygems (line 10 of init.sh above) and run the r10k "puppetfile install" command to check out modules and place them into a local "modules" directory, which is its default behavior.

r10k uses the Puppetfile format for specifying Puppet modules. My Puppetfile currently includes a number of community-supported modules for installing and configuring tools, including Jenkins, Packer, Docker, R, Nginx, Consul, and others, as well as two custom modules I will discuss in more detail below: superbuilds and aws_ec2_facts. You can view my full Puppetfile in the standard-aws repository.
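An abbreviated, illustrative Puppetfile in this style might look like the following; the exact module list and GitHub URLs are in the repo, and the names here are examples rather than the actual contents.

```ruby
# Puppetfile (abbreviated sketch; module choices are illustrative)
forge 'https://forgeapi.puppetlabs.com'

# Community-supported modules from the Puppet Forge
mod 'rtyler/jenkins'
mod 'garethr/docker'
mod 'jfryman/nginx'

# Custom modules pulled straight from GitHub
mod 'superbuilds',
  :git => 'https://github.com/<your-github-user>/superbuilds.git'
mod 'aws_ec2_facts',
  :git => 'https://github.com/<your-github-user>/aws_ec2_facts.git'
```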

Importantly, I decided not to use r10k to manage hieradata, or to create Puppet environments. I elaborate on the hiera setup below. As for environments, I simply have no need for separate environments at the moment.


hiera

Hiera is a tool for providing variables to Puppet modules. Hiera allows Puppet code to avoid "hard-coding" environment-specific values. For the Baseball Workbench, I use hiera variables (called "hieradata") to provide parameters for specific Jenkins "seed jobs" on the BuildServer. I also use hieradata to configure the ProxyServer for specific endpoints.

I install hiera during the init.sh script, and configure Puppet to use a hiera.yaml configuration file (staged from the standard-aws repo) during its "apply" command in update.sh. Here is the current hiera.yaml and site.pp used by the Baseball Workbench project:
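A condensed sketch of the two files follows; the real versions are in the standard-aws repo, and the details here are reconstructed from the configuration notes below rather than copied verbatim.

```yaml
# hiera.yaml (sketch)
:backends:
  - yaml
:yaml:
  :datadir: /root/init/datadir
:hierarchy:
  - custom
  - "hosts/%{::aws_cloudformation_logical_id}"
  - common
```

```puppet
# site.pp (sketch)
# Avoid "virtual package" warnings from some modules' package resources
Package { allow_virtual => true }

# Look up the "classes" hiera value (concatenated across the hierarchy)
# and apply each listed class to this node
hiera_include('classes')
```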

Walking through the configuration:
  • Hiera is configured to use a "yaml" backend, which is available by default.
  • The yaml backend is pointed to a source directory ("datadir") of /root/init/datadir, which has most of its contents staged from the standard-aws repo.
  • Recall from the cloud-init configuration that an additional file custom.yaml is staged into the datadir, with its contents passed in as a CloudFormation parameter at stack creation time.
  • The hierarchy for lookups of hiera variables is configured to prefer custom.yaml entries over entries from a "host-specific" yaml, which are preferred over entries from a common.yaml.
  • The host-specific yaml files are in a "hosts" subdirectory of datadir, with the name of the host-specific yaml provided by the "aws_cloudformation_logical_id" custom fact, which I elaborate on below.
To provide a more complete example of the hierarchy, the common.yaml file applies a single "base" profile class to every node, whereas the buildserver.yaml and proxyserver.yaml files in the hosts subdirectory provide role classes and other variables which are specific to the configuration of each host. The custom.yaml at the highest level provides values only known at runtime.
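The layering described above might look like this in practice (file contents are illustrative, not the actual hieradata):

```yaml
# common.yaml -- applied to every node
classes:
  - profile::base

# hosts/buildserver.yaml -- only nodes whose logical ID is BuildServer
classes:
  - role::buildserver
jenkins_seed_job_repo: <set per project>

# custom.yaml -- values only known at stack creation time,
# passed in through the Hieradata CloudFormation parameter
project_git_url: https://github.com/<your-github-user>/baseball-workbench.git
```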

I chose to include the custom.yaml level in the hierarchy for a couple of reasons:
  • I want to be able to reuse the standard-aws repo and its CloudFormation stack across any number of side projects (i.e., not just Baseball Workbench)
  • I want to eventually be able to provide secret values at stack creation time, through the Hieradata parameter of my CloudFormation template, rather than committing these values to a repository in advance.
As an alternative to providing the custom.yaml contents through a CloudFormation parameter, I could have chosen to use r10k to check out the datadir contents from a repository (potentially, a repository specific to a side project like Baseball Workbench). As of the time of my implementation in February 2016, however, there were some significant drawbacks to this approach that led me in my current direction:
  • The yaml backend for hiera only supported specification of a single datadir, making it tricky for r10k to check out yaml files from multiple locations (e.g., common files from the standard-aws repo, and custom files from a Baseball Workbench repo).
  • Use of r10k to stage hieradata appears to require use of environments, and I have no need for environments as far as Puppet environments are typically concerned.
Another important point in my use of hiera is that I am using hiera as a "node classifier" for Puppet. This is accomplished by the hiera_include("classes") line in the site.pp manifest being applied by the Puppet command. This causes Puppet to look at a special hiera value "classes", and use the contents of that variable to determine which Puppet classes to apply to each EC2 instance.

The "classes" entry is a list, and Puppet first concatenates all "classes" entries from the hierarchy, then attempts to apply a class corresponding to each entry, assuming that every entry in the list is the name of a Puppet class. Importantly, because we have host-specific hieradata in our hierarchy, the classes applied to a given instance can vary, because the concatenated "classes" list comes from common.yaml, custom.yaml, and a host-specific yaml. You can read more about the use of hiera as a node classifier here.

Alternatively, I could have chosen (and in the long run, may indeed choose) to not use hiera as a node classifier, and instead to use host-specific manifests when making the "puppet apply" call. This would not have much effect overall on my use of hieradata for custom variables. Switching to this model would be a matter of committing host-specific Puppet manifests instead of host-specific classes entries in the host-specific yaml files. This alternative would also have some limitations - classes could only come from the Puppet manifest, and could not be concatenated from multiple sources - but would also enforce use of the roles/profiles pattern discussed below.

puppet apply

We've finally made it to another star of the show: local ("Masterless") Puppet runs with puppet apply!

Up to this point we have:
  • Installed Puppet
  • Checked out modules via r10k
  • Staged environment and host-specific configuration files for hiera, including the list of classes to apply to each node
It's now time to actually run Puppet on our instances. This happens in line 8 of the update.sh script from above. Let's review the call to "puppet apply" in detail, which specifies:
  • A very generic site.pp manifest is being applied. This can be generic because we are using hiera as a node classifier.
  • A generic hiera.yaml, discussed in detail above, is also being used as the hiera configuration.
  • The Puppet modulepath, which is a list of directories where Puppet modules will be found, is set to include the "modules" directory populated by r10k and the "site" directory staged from the standard-aws repo. The "site" directory contains custom role and profile Puppet classes which I elaborate on below.
  • Use of the "future parser" option for Puppet, which I found necessary for some Puppet features I wanted to use within classes.
Referring back to the site.pp shown when discussing hieradata, I found the "virtual packages" block necessary to avoid warnings for the way some of my Puppet modules were loaded. The only significant content of the site.pp is the "hiera_include" (discussed above) which sets up use of the hiera variable "classes" for the application of Puppet classes.

Role and Profile Classes

The idea of using "roles" and "profiles" is that they provide two layers of abstraction for organizing Puppet classes. In terms of implementation, they are simply Puppet classes. A role is specific to the type of instance being configured (e.g., BuildServer or ProxyServer), and refers to one or more profile classes. A profile is finer-grained, performing the configuration of a common component (e.g., NTP, user accounts, etc.) and refers more directly to resources from Puppet modules.

For the Baseball Workbench Project, I currently have the following roles and profiles (you can browse them in detail on Github):

  • Role proxyserver
    • Profile proxy
  • Role buildserver
    • Profile builds
    • Profile consul

Within each profile, I include the Puppet resources needed for configuration of the instance. The "proxy" profile configures Nginx and Nginx Template, the "builds" profile configures Jenkins and build tools (including a reference to the "superbuilds" module discussed below), and so on. There is also a profile "base" which is applied through the common.yaml "classes" hiera variable to every instance. This profile configures NTP and the use of the aws_ec2_facts puppet module.
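In sketch form, the pattern looks like this; class bodies are abbreviated and illustrative, not the actual classes from the site directory:

```puppet
# site/role/manifests/buildserver.pp
# A role is just a class that composes profiles for one type of instance
class role::buildserver {
  include profile::builds
  include profile::consul
}

# site/profile/manifests/base.pp
# Applied to every node via the common.yaml "classes" entry
class profile::base {
  include ::ntp            # community NTP module from the Forge
  include ::aws_ec2_facts  # custom facts module, discussed below
}
```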

In the long run, I question the value of including everything in Profile classes. The real value of Profile classes seems to be shared configuration components - otherwise, we are only adding a layer of abstraction through which we have to debug and maintain. In the future, I plan to refactor my roles and profiles setup to improve on this.

The Superbuilds Module

The vast majority of instance configuration I currently need for the BuildServer and ProxyServer was provided by community-supported Puppet modules from the Puppet Forge. In a couple of cases, I found it necessary to develop my own Puppet modules beneath the role and profile layers specific to the standard-aws project.

The first such example was a Puppet module I called "superbuilds", which you can view on Github. Inside this module, I install a number of tools using existing Puppet modules: jenkins, docker, packer, R, and NodeJS in particular.

In the long run, the Superbuilds module may be more appropriate as one or more Profile-level classes referenced from a Role class. The latest version of Jenkins (Jenkins 2!) also simplifies the configuration of Jenkins, which was a major driver for the existence of this separate module in the first place.

The aws_ec2_facts Module

The second Puppet module I developed for the Baseball Workbench was a quite simple one related to the creation of facts for Puppet and Hiera. The "facter" tool installed as part of the Puppet 3 installation on each EC2 instance in my stack is responsible for maintaining facts for Puppet and Hiera. I can run "facter -p" with sudo on my instances to see the list of available facts that facter obtains by default. Unfortunately, there are no facts related to EC2 tags.

AWS supports "tags" on EC2 instances, as well as on most created resources across AWS services. As discussed above, I have referenced at least one such tag in my hiera hierarchy - the logical ID of the EC2 instance in the CloudFormation stack. To make this available to Puppet and Hiera requires writing a bit of Ruby code to define Custom Facts for facter. Below is the custom code in my current aws_ec2_facts module:
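The actual facts.rb is in the module on GitHub; the Ruby sketch below reconstructs the approach from the notes that follow, so the exact command strings and fact set are illustrative.

```ruby
# lib/facter/facts.rb (sketch; command strings are illustrative)

# The instance metadata service gives us AZ/instance ID without credentials
Facter.add(:aws_availability_zone) do
  setcode do
    Facter::Core::Execution.exec(
      'curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone')
  end
end

# The region is the availability zone minus its trailing letter
Facter.add(:aws_region) do
  setcode { Facter.value(:aws_availability_zone).to_s.chomp[0..-2] }
end

# EC2 tags require the AWS CLI (and an instance role allowing DescribeTags)
Facter.add(:aws_cloudformation_logical_id) do
  setcode do
    instance_id = Facter::Core::Execution.exec(
      'curl -s http://169.254.169.254/latest/meta-data/instance-id')
    region = Facter.value(:aws_region)
    Facter::Core::Execution.exec(
      "aws ec2 describe-tags --region #{region} " \
      "--filters Name=resource-id,Values=#{instance_id} " \
      "Name=key,Values=aws:cloudformation:logical-id " \
      "--query 'Tags[0].Value' --output text")
  end
end
```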

Let's review some details of this code, which lives in a file facts.rb in the "lib/facter" subdirectory of the aws_ec2_facts module.

  • At a high level, the Facter object has methods to use for adding fact values.
  • For each fact, I am constructing a command-line call for which the standard output will be the value of the fact. I am using Facter::Core::Execution.exec to execute the command-line call.
  • I use a combination of curl calls to AWS "instance metadata" and calls to the more powerful AWS CLI to get the values of facts
  • The complete list of custom facts in the current implementation is:
    • AWS availability zone
    • AWS region
    • CloudFormation logical ID
    • CloudFormation stack name
    • Private IPs of all EC2 instances in the same stack
I use my common.yaml "classes" hiera value to add this Puppet class to every EC2 instance in my CloudFormation stack. I then have all of the facts above available for reference from Puppet code and Hiera configurations (as well as the output of facter -p).


Hopefully this has been a helpful (if lengthy) overview of my particular implementation of masterless Puppet on an AWS CloudFormation stack. I have inserted quite a bit of commentary on design decisions and potential changes, and I welcome any feedback you may have.

I hope to follow up this particular post with some additional details from the Baseball Workbench stack. If you find any of this interesting, I invite you to "Watch" the various repos (especially standard-aws and the Puppet modules) on Github.

This post also requires a Hat Tip to @tokynet, who taught this "Java guy" 99% of what I know about Puppet. He also presented on the use of AWS tags within hiera hierarchy at the 2015 Spring Puppet Camp in DC.

