Category: puppet

On the importance of not being smart

I just started a new job, which brings interesting reflections. My new colleagues are incredibly smart (possibly even smarter than the bunch I just left, but that’s still open for discussion). Smart people tend to be attracted to smart problems. For example, one of my first tasks was to add configuration so that our Elasticsearch cluster sends its logs to Logstash. Logstash itself is backed by another Elasticsearch cluster, so of course, you need to make sure that this ES cluster does not start sending logs to itself.

All this is managed by Puppet. A very much simplified interface is:

class elasticsearch (
    $enable_log_shipping = false,
    $logstash_server = undef,
) {
    [...]
}

Now if you want to be clever, you could simplify this interface and assume that if no logstash server is defined, we do not want to ship logs:

class elasticsearch (
    $logstash_server = undef,
) {
    $enable_log_shipping = $logstash_server ? {
        undef   => false,
        default => true,
    }
    [...]
}

I would argue that this second version is less expressive, but that’s beside the point. The interesting issue we now have is “how do we configure this in hiera”.

A simplified hiera hierarchy could be:

:hierarchy:
  - "roles/%{::role}"
  - "common"

And of course, we’d like log shipping to be enabled by default, but disabled for role == 'logstash'. This raises a few questions:

  • How do I represent an undef value in hiera?
  • Does hiera stop resolution when it finds an undef value? In other words, are a non-existent value and a value explicitly defined as undef the same thing or not?

If you can answer both of the questions above without help from Google, congratulations, you are part of the “knowledgeable” category (disclaimer: I’m not). If the questions above pique your curiosity and you know you will not be able to sleep this evening without the answer, congratulations, you are part of the “smart” category.

Let’s highlight a few more difficulties:

YAML has a concept of a language-independent null type. It explicitly states that “a mapping entry with some key and a null value is valid and different from not having that key in the mapping”, which is different from how Puppet treats undef (“The undef value is usually useful for testing whether a variable has been set. It can also be used as the value of a resource attribute, which can let you un-set any value inherited from a resource default and cause the attribute to be unmanaged”).

Now, if I set the following hiera configuration:

# roles/logstash.yaml
elasticsearch::logstash_server: !!null

# common.yaml
elasticsearch::logstash_server: logstash.example.net

What will happen? Can I assume that log shipping will be disabled on the nodes having role == 'logstash'? What happens if I change logstash.yaml to:

elasticsearch::logstash_server: undef
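
The two possible answers to the !!null case can be modeled in a few lines of Python. This is a toy model, not Hiera's implementation: the `lookup` function, the `MISSING` sentinel and the `null_stops_resolution` flag are all made up for illustration, and the sketch deliberately does not say which policy Hiera really follows.

```python
# Toy model of a hierarchical lookup; this is NOT Hiera's implementation.
MISSING = object()  # sentinel: the key is absent from a level


def lookup(key, levels, null_stops_resolution):
    """Return the first value found for key, walking levels in order.

    levels is a list of dicts, most specific first. None stands for a YAML null.
    """
    for data in levels:
        value = data.get(key, MISSING)
        if value is MISSING:
            continue  # key not defined at this level, try the next one
        if value is None and not null_stops_resolution:
            continue  # policy B: an explicit null is treated like a missing key
        return value
    return None  # nothing found anywhere


key = "elasticsearch::logstash_server"
role_logstash = {key: None}                # roles/logstash.yaml, explicit null
common = {key: "logstash.example.net"}     # common.yaml
levels = [role_logstash, common]

# Policy A: null is a real value, so the logstash role gets no server.
policy_a = lookup(key, levels, null_stops_resolution=True)
# Policy B: null falls through to common, so the cluster ships logs to itself.
policy_b = lookup(key, levels, null_stops_resolution=False)
print(policy_a, policy_b)
```

The same data gives opposite results depending on a rule that the person writing the YAML has to remember.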

Now let’s compare this to the first proposed interface. The configuration would look like:

# roles/logstash.yaml
elasticsearch::enable_log_shipping: false

# common.yaml
elasticsearch::enable_log_shipping: true
elasticsearch::logstash_server: logstash.example.net

A bit more obvious, don’t you think?

Now the actual point of this post:

Smart people are fascinated by smart and / or tricky questions. As soon as this question was raised with my coworkers, they wanted to have the answer (you noticed I have not told you yet?). As smart people, we are attracted to doing the smart thing. It is fun. It is how we learn. It is how we get even smarter. And this is how we screw up 6 months from now, when we still have the smart but have lost the knowledgeable.

Scaling Graphite

Graphite is a great tool to collect metrics on anything, and it performs well by itself. We collect over 300K metrics per second on a single virtual server. But for the hardware we have, that’s close to the limit, and we start to lose metrics from time to time. So, weekend exercise: how to scale Graphite to multiple servers.

The goals are:

  • increase the number of servers to allow more scalability
  • keep a single interface to consult metrics
  • keep a single interface to publish metrics
  • keep things as simple as possible (but no more)
  • have everything deployed with Puppet

The result looks more or less like this (based on PlantUML):
[Figure: graphite-components]

There is already good documentation, which I will not repeat. But I was not able to find a complete code example. My implementation is expressed as a few Vagrant VMs and is available on GitHub.

Still to be done

  • Adding more servers to the Graphite cluster means moving existing data around; some scripts need to be integrated for that
  • Scripts to clean up corrupted data need to be added to the puppet-carbon module
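
To get a feeling for how much data has to move when a server is added, here is a minimal Python sketch of a consistent-hash ring. Consistent hashing is one of the relay methods carbon-relay offers, but this is not carbon's actual ring code, and whether a given cluster distributes metrics this way is an assumption. The server names and metric names are made up.

```python
import bisect
import hashlib


def _hash(value):
    # md5 is used only to spread keys evenly, not for security.
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each node is placed at several positions to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (_hash(f"{node}:{i}"), node))

    def node_for(self, metric):
        position = _hash(metric)
        # First ring entry at or after the metric's position, wrapping around.
        index = bisect.bisect_left(self._ring, (position, ""))
        return self._ring[index % len(self._ring)][1]


metrics = [f"servers.web{n}.cpu.load" for n in range(1000)]
ring = HashRing(["graphite-a", "graphite-b"])
before = {m: ring.node_for(m) for m in metrics}

ring.add("graphite-c")
after = {m: ring.node_for(m) for m in metrics}

moved = [m for m in metrics if before[m] != after[m]]
print(f"{len(moved)} of {len(metrics)} metrics need their data moved")
```

With this kind of ring, only the metrics that now belong to the new server change owner, and those are the ones a migration script would have to copy.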

Automation of Rundeck

Rundeck is usually thought of as a tool to automate your infrastructure. As we know, it is turtles all the way down, so the obvious question is: how do we automate Rundeck itself?

Why would we want to automate Rundeck? What exactly do we want to automate? I’ll present my use case; let me know if it applies to you as well:

Scalability
We manage a bit less than a hundred different applications. We manage deployment and some administrative tasks with Rundeck. Those tasks are similar between applications. We don’t want to configure 80 different Rundeck jobs for each task; instead we want a generic definition of those jobs that we instantiate for each application, taking care of the specificities if needed.
Reproducibility
Everything deployed to the production environment should first be tested and validated somewhere else. To ensure that what you deploy to production is the same as what you tested in the test environment, you need a way to version your jobs and have a promotion workflow from test to production.
Coherence
With multiple projects, if the jobs for each application are managed separately, they will start to diverge. We need to ensure that the procedures implemented as Rundeck jobs are coherent and that common code is properly factored out.
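
The "generic definition, instantiated per application" idea can be sketched in Python. Everything here is hypothetical: the field names are invented for the example and do not follow Rundeck's real job definition format, and the application names are placeholders.

```python
import copy

# Hypothetical generic job; the field names are invented for this sketch
# and do not follow Rundeck's real job definition format.
GENERIC_DEPLOY_JOB = {
    "name": "deploy-{app}",
    "project": "{app}",
    "steps": ["stop {app}", "install {app} {version}", "start {app}"],
}


def instantiate(template, app, overrides=None):
    """Build one concrete job from the shared template."""
    job = copy.deepcopy(template)  # never mutate the shared template
    job["name"] = job["name"].format(app=app)
    job["project"] = job["project"].format(app=app)
    # {version} is kept as a placeholder: it would be filled in at run time.
    job["steps"] = [s.format(app=app, version="{version}") for s in job["steps"]]
    # Application-specific settings replace the generic ones key by key.
    job.update(overrides or {})
    return job


applications = {
    "billing": None,  # plain instance of the generic job
    "search": {"steps": ["install search {version}", "reindex search"]},
}
jobs = [instantiate(GENERIC_DEPLOY_JOB, app, o) for app, o in applications.items()]
for job in jobs:
    print(job["name"], job["steps"])
```

The common procedure lives in one place, and each application only declares where it differs, which is what keeps the jobs from diverging.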

What I want is to manage my Rundeck configuration in the same way I manage my applications and my Puppet modules, with the same build, release, versioning and promotion mechanism. For that, I need simple Puppet resources that will allow me to create projects and jobs. Then, I will be able to control them from a mix of role classes and hiera.

What is missing?

Creating projects is done with the rd-project command line tool. Configuring the project is done through the Web interface, or by editing the project.properties file. Project definitions are reloaded mostly automagically, so everything is fine here. I have a basic implementation of a Puppet resource to manage Rundeck projects on GitHub.

Managing jobs is a bit harder. You can create jobs through the rd-jobs tool, but you don’t get direct access to the job definition once it is created. You can reload a job with the same name and it will override the job definition. This makes it harder to correctly manage modifications to jobs, or job removal. I have a buggy, half-working implementation of jobs, also on GitHub.

What do I want in the future?

I’ll keep working on my Puppet module to better handle the quirks of Rundeck. It already works for the basic use cases, but needs some love to manage more complex workflows.

Modifications to Rundeck itself could make the job much easier, for example by storing all project and job definitions directly in files. While the Rundeck code base is of reasonable size and quite readable, I probably won’t have time to dig into it right now. Any help is welcome on that side …

My grievances with hiera-gpg

Hiera-gpg seems to be the standard way to store encrypted data in hiera. Craig Dunn has a pretty good article on how to deploy and use it. Storing sensitive data encrypted is a pretty good idea, and using asymmetric crypto is even better. But still I am frustrated.

First, let’s have a look at the problem space. In our case we have the following requirements:

  1. all configuration should be stored in a version control system – VCS (SVN / Git / …)
  2. sensitive data should be stored and transmitted securely
  3. only our puppet master should be able to decrypt data
  4. anybody should be able to encrypt data
  5. authorization should be handled only by the VCS

hiera-gpg fulfills the first three requirements, but falls just short on 4 and 5.

To encrypt data, no problem: you create a new yaml file, encrypt it, and publish it to your VCS. Anyone can access the puppet master’s public key, so anyone can encrypt data. As long as you have access to the VCS, you can publish this encrypted file.

Let’s face it, most of the time we don’t publish a brand new configuration, we modify an existing one. To modify it, we must have the plaintext version. We could store the plaintext somewhere, but that defeats the whole purpose of encrypting it. We could also encrypt data with our own public key to be able to retrieve the plaintext, but that defeats requirement #3. Furthermore, we work in teams. We could share our private key with our team, but key revocation becomes an issue when people leave the company. And even more, we work with multiple teams, with diverse responsibilities. No team should have the full knowledge of all our private information.

How can we do better …

Instead of encrypting the whole file, we can encrypt individual properties. Each property can then be modified by anyone, but read by no one. When you modify a property, you know its new value, but not its old value. We could even mix encrypted and non-encrypted properties. The format would look like this:

---
- database: 
    username: dbuser
    password: ENC(XXXXXXXX)

Anyone can modify this password, but nobody can decrypt it.
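
Until that extension exists, the lookup side of the idea can be sketched in Python. This is only an illustration of the ENC(...) convention proposed above: `resolve` and `fake_decrypt` are hypothetical names, and `fake_decrypt` uses base64, which is an encoding and not encryption. A real backend would decrypt with the puppet master’s private key at that point.

```python
import base64
import re

# Values of the form ENC(payload) are treated as encrypted properties.
ENC_PATTERN = re.compile(r"^ENC\((.*)\)$")


def fake_decrypt(payload):
    # Placeholder ONLY: base64 is an encoding, not encryption. A real backend
    # would decrypt with the puppet master's private key here.
    return base64.b64decode(payload).decode("utf-8")


def resolve(value, decrypt=fake_decrypt):
    """Recursively replace ENC(...) strings, leaving everything else alone."""
    if isinstance(value, dict):
        return {k: resolve(v, decrypt) for k, v in value.items()}
    if isinstance(value, list):
        return [resolve(v, decrypt) for v in value]
    if isinstance(value, str):
        match = ENC_PATTERN.match(value)
        if match:
            return decrypt(match.group(1))
    return value  # plain strings, numbers, booleans, None


# Same shape as the YAML above, already parsed into Python structures.
data = [{"database": {
    "username": "dbuser",
    "password": "ENC(" + base64.b64encode(b"s3cret").decode("ascii") + ")",
}}]
resolved = resolve(data)
print(resolved[0]["database"]["username"])
```

Because each property is handled on its own, replacing one password means re-encrypting one value, without ever needing the plaintext of the others.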

Now I just need time to learn Ruby and write a hiera extension.