Scaling Graphite

Graphite is a great tool for collecting metrics on just about anything, and it performs well on its own. We collect over 300K metrics per second on a single virtual server. But with the hardware we have, that’s close to the limit, and we start to lose metrics from time to time. So, weekend exercise: how to scale Graphite to multiple servers.

The goals are:

  • increase the number of servers to allow more scalability
  • keep a single interface to consult metrics
  • keep a single interface to publish metrics
  • keep things as simple as possible (but no simpler)
  • have everything deployed with Puppet
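
To make the "single interface to publish metrics" goal concrete, here is a minimal sketch of what a client does when there is only one endpoint to talk to. It uses carbon's plaintext protocol (one "path value timestamp" line per metric, port 2003 by default). The hostname graphite.example.com and the helper names format_metric and send_metric are assumptions for illustration, not part of my setup.

```python
import socket
import time


def format_metric(path, value, timestamp=None):
    """Build one line of carbon's plaintext protocol: '<path> <value> <timestamp>\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(path, value, host="graphite.example.com", port=2003, timestamp=None):
    """Send a single metric over TCP to the one publishing endpoint.

    The host is a hypothetical name; clients never need to know how many
    carbon servers sit behind it.
    """
    line = format_metric(path, value, timestamp)
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling send_metric("servers.web01.load", 0.42) would publish one data point. The client code stays the same however many servers are added behind that address.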

The result looks more or less like this (based on PlantUML):
[Figure: graphite-components]

Good documentation already exists, so I will not repeat it. What I could not find was a complete code example. My implementation takes the form of a few Vagrant VMs and is available on GitHub.

Still to be done

  • Adding more servers to the Graphite cluster means moving existing data around; scripts for that still need to be integrated
  • Scripts to clean up corrupted data need to be added to the puppet-carbon module
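
To see why adding a server means moving data, here is a rough sketch. It assumes metrics are spread over the carbon servers with a consistent-hashing scheme. The HashRing class is a simplified, generic ring written for illustration, not carbon's own implementation, and the server names carbon-a, carbon-b and carbon-c are made up.

```python
import bisect
import hashlib


class HashRing:
    """Simplified consistent-hash ring (illustrative, not carbon's own code)."""

    def __init__(self, nodes, replicas=100):
        self.replicas = replicas
        self._ring = []  # sorted list of (position, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _position(key):
        # Map any string to a large integer position on the ring.
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        # Each node gets several points on the ring to even out the load.
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._position(f"{node}:{i}"), node))

    def get_node(self, metric):
        # A metric belongs to the first node point at or after its position,
        # wrapping around to the start of the ring if needed.
        idx = bisect.bisect(self._ring, (self._position(metric), ""))
        if idx == len(self._ring):
            idx = 0
        return self._ring[idx][1]


metrics = [f"servers.web{i:03d}.cpu.load" for i in range(1000)]

ring = HashRing(["carbon-a", "carbon-b"])
before = {m: ring.get_node(m) for m in metrics}

ring.add_node("carbon-c")
after = {m: ring.get_node(m) for m in metrics}

# Metrics whose owner changed: their existing data files sit on the old server.
moved = [m for m in metrics if before[m] != after[m]]
print(f"{len(moved)} of {len(metrics)} metrics now map to a different server")
```

Every metric in moved would now be written to the new server while its history stays on the old one, which is the data that has to be copied over. With consistent hashing only a fraction of the metrics is affected, but that fraction still has to be handled.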
