Category: metrics

Scaling Graphite

Graphite is a great tool to collect metrics on anything. Graphite has great performances by itself. We collect over 300K metrics per second on a single virtual server. But for the hardware we have, that’s close to the limit, we start to loose metrics from time to time. So, weekend exercise : how to scale Graphite to multiple servers.

The goal is :

  • increase the number of servers to allow more scalability
  • keep a single interface to consult metrics
  • keep a single interface to publish metrics
  • keep things as simple as possible (but no more)
  • have everything deployed with Puppet

The result looks more or less like this (based on PlantUML):

There is already good documentation, I will not repeat it. But I was not able to find a complete code example. My implementation is expressed as a few Vagrant VMs and is available on GitHub.

Still to be done

  • Adding more servers to the Graphite cluster means moving existing data around, some scripts need to be integrated for that
  • scripts to clean up corrupted data need to be added to the puppet-carbon module

Helpful sources