zag metrics

Zag is a fast, scalable Node.js application for aggregating and visualizing both real-time and historical metrics.

Setup

Quick start guide

zag-standalone uses LevelDB as the default backend.

$ npm install -g zag-standalone
$ start-zag metrics.db
zag-web listening on 0.0.0.0:8875
zag-daemon pool: ["127.0.0.1:8876"]

The first time you run it, wait a minute for the key list to populate with the first data points. Next, point your browser to the address the script printed.

Now you're ready to start sending metrics data using zag-agent.

$ npm install zag-agent

then:

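// Connect to the daemon pool printed by start-zag.
var agent = require('zag-agent')(["127.0.0.1:8876"])
// Record 1000 random values in the histogram "foo|bar".
for (var i = 0; i < 1000; i++) {
  agent.histogram("foo|bar", Math.random() * 100)
}

// Record an event on the counter "demo".
agent.counter("demo")

// Close the agent after 1.5 seconds.
setTimeout(function() {
  agent.close()
}, 1500)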

It can take up to 1 minute for non-live metrics to appear.

Scaling up

The standalone helper will only get you so far. In production, the load should be spread across multiple daemons, which rules out LevelDB. Use Postgres instead:

  1. Set up a Postgres database.
  2. Run the setup script in zag-backend-pg. This will create the tables and indices.
  3. Set up some zag-daemon processes. The daemons are responsible for aggregating and monitoring the metrics and saving them to the database.
  4. Set up a zag-web process. zag-web handles data visualization and manages ancillary data such as tags and dashboards.
  5. Start sending data to the daemons using zag-agent in your application.
  6. Explore the data in zag-web's interface.

Zooming and panning

Zoom, zoom, revert, revert

To zoom a graph in the X or Y direction, just click and drag a range. Double-clicking will revert the zoom.

To pan a graph up and down or side-to-side, shift-click and drag.

Zooming and panning

Graph types

Counter

A counter measures the number of events over an interval.

Stacked counters

Histogram subkeys

Histogram

A histogram measures the distribution of a value. Fields measured:

  • mean - Mean value
  • p10, p75, p95, p99 - Percentiles
  • median - 50th percentile
  • max - Maximum value
  • std_dev - Standard deviation
  • count - Total number of values recorded
  • llquantize - Log/linear quantile data for heat maps
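
As a rough illustration of the scalar fields listed above, here is a minimal sketch that computes them from an array of recorded values. It is not zag's implementation: the function names `percentile` and `summarize` are hypothetical, the nearest-rank percentile method and the population standard deviation are assumptions, and `llquantize` is left out.

```javascript
// Hypothetical helper: nearest-rank percentile on an already sorted array.
// (An assumption; the real aggregation may use a different method.)
function percentile(sorted, p) {
  var rank = Math.ceil((p / 100) * sorted.length)
  return sorted[Math.max(rank, 1) - 1]
}

// Hypothetical helper: summarizes a non-empty array of numbers using the
// field names from the list above (llquantize is omitted).
function summarize(values) {
  var sorted = values.slice().sort(function (a, b) { return a - b })
  var count = sorted.length
  var sum = sorted.reduce(function (acc, v) { return acc + v }, 0)
  var mean = sum / count
  // Population variance (divides by count); an assumption.
  var variance = sorted.reduce(function (acc, v) {
    return acc + (v - mean) * (v - mean)
  }, 0) / count

  return {
    mean: mean,
    p10: percentile(sorted, 10),
    median: percentile(sorted, 50),
    p75: percentile(sorted, 75),
    p95: percentile(sorted, 95),
    p99: percentile(sorted, 99),
    max: sorted[count - 1],
    std_dev: Math.sqrt(variance),
    count: count
  }
}

var summary = summarize([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
console.log(summary.mean)   // 5.5
console.log(summary.median) // 5
console.log(summary.max)    // 10
```

With only a handful of values, the high percentiles collapse onto the maximum, so distributions are easier to read once a key has received many points.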

Zooming a heat map

Metrics keys

Key structure

Metrics keys may include the characters in the class [ \w/._()+:-] (space, word characters, and / . _ ( ) + : -), as well as > and | as separators. Metrics sent with invalid keys are silently ignored.
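
Because invalid keys are dropped without any error, it can help to check keys before sending them. The following is a minimal sketch, not zag's own validation code: the name `isValidKey` is hypothetical, and the regular expression is simply the character class above with the two separators added.

```javascript
// Hypothetical helper, not part of zag: mirrors the documented character
// class [ \w/._()+:-] plus the separators > and |.
var VALID_KEY = /^[ \w\/._()+:>|-]+$/

function isValidKey(key) {
  return typeof key === "string" && VALID_KEY.test(key)
}

console.log(isValidKey("foo|bar"))       // true
console.log(isValidKey("api>users|get")) // true
console.log(isValidKey("foo#bar"))       // false, "#" is not allowed
```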

In A|B, A is an aggregate key, which means that every point sent to A|B, A|C, etc. is also automatically forwarded to A.

In A>B, A has no points associated with it; it is only a scope in the tree widget.
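
To make the difference between the two separators concrete, this sketch lists the keys that would receive a point sent to a given key. The function name `receivingKeys` is hypothetical, and the handling of keys with more than one | (forwarding at every level) is an assumption that the description above does not confirm.

```javascript
// Hypothetical helper: returns the key itself plus every aggregate key
// formed by cutting the key at a "|" separator. ">" never forwards points.
function receivingKeys(key) {
  var keys = [key]
  var idx = key.lastIndexOf("|")
  while (idx !== -1) {
    key = key.slice(0, idx)
    keys.push(key)
    idx = key.lastIndexOf("|")
  }
  return keys
}

console.log(receivingKeys("A|B")) // [ 'A|B', 'A' ]
console.log(receivingKeys("A>B")) // [ 'A>B' ]
```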

Navigate thousands of keys

Expand/collapse key tree

Key tree

Clicking on a triangle expands/collapses the keys.

Shift-clicking on a triangle graphs all of the key's children.

To create a new key, just start sending it data. Within 2 minutes of receiving the first data point, the key will appear in the tree.

Overlay keys

Overlay by subkey

Shift-click on a leaf to overlay it on the current graph.

Filter keys

Intervals

Change interval

By default, the interval is relative: if the last 6 hours are plotted and you reload the graph, it will again show the last 6 hours. To link to a fixed range, go to ≡/Permalink.
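
The distinction between a relative and a fixed range can be sketched as follows. This is an illustration only: the `resolveRange` function and the shape of the range objects are hypothetical and do not describe how zag-web stores intervals or permalinks.

```javascript
// Hypothetical helper: turns a range description into absolute timestamps
// (milliseconds since the epoch).
function resolveRange(range, now) {
  if (range.relative) {
    // Relative: always ends at "now", so it moves on every reload.
    return { start: now - range.durationMs, end: now }
  }
  // Fixed: the same start and end no matter when it is loaded.
  return { start: range.start, end: range.end }
}

var SIX_HOURS = 6 * 60 * 60 * 1000
var relative = { relative: true, durationMs: SIX_HOURS }
var fixed = { relative: false, start: 1393400000000, end: 1393421600000 }

console.log(resolveRange(relative, Date.now())) // changes with the clock
console.log(resolveRange(fixed, Date.now()))    // always the same range
```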

Intervals selected from the "minutes" row of the range selector will be live.

Live sub-second metrics

Deltas

Set the downsample interval manually

A "delta" is the downsample interval of each plotted point.

Data is stored in 1-minute chunks, but when the plotted interval is large it is often useful to plot larger samples.

Any "delta" smaller than 1 minute will result in a live chart.
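
As an illustration of what a larger delta means for a counter, this sketch merges 1-minute points into wider buckets by summing their counts. It is a simplified model, not zag's downsampling code: the names `downsample` and `isLive` and the `{ ts, count }` point shape are hypothetical, and histogram fields would need a different merge than a plain sum.

```javascript
var MINUTE_MS = 60 * 1000

// Hypothetical helper: sums counter points into buckets of deltaMs,
// keyed by the bucket's start time.
function downsample(points, deltaMs) {
  var buckets = {}
  points.forEach(function (p) {
    var start = Math.floor(p.ts / deltaMs) * deltaMs
    buckets[start] = (buckets[start] || 0) + p.count
  })
  return Object.keys(buckets)
    .map(Number)
    .sort(function (a, b) { return a - b })
    .map(function (start) {
      return { ts: start, count: buckets[start] }
    })
}

// Per the rule above, a delta under one minute means a live chart.
function isLive(deltaMs) {
  return deltaMs < MINUTE_MS
}

var minutes = [
  { ts: 0 * MINUTE_MS, count: 1 },
  { ts: 1 * MINUTE_MS, count: 2 },
  { ts: 2 * MINUTE_MS, count: 3 },
  { ts: 5 * MINUTE_MS, count: 4 },
  { ts: 6 * MINUTE_MS, count: 5 }
]
var fiveMinute = downsample(minutes, 5 * MINUTE_MS)
console.log(fiveMinute)         // two buckets with counts 6 and 9
console.log(isLive(10 * 1000))  // true
```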

Dashboards

A basic dashboard

Use dashboards to get a bird's-eye view of related graphs.

Select a dashboard

Tags

A tag marks an event globally, across all metrics. For example, you may mark server releases so that changes in metrics can be correlated with them after the fact.

Each tag contains the following fields:

  • id - Unique identifier for the tag, in the form of {time}_{random}.
  • ts - Timestamp marking the tag's location.
  • label - Some text specific to this tag describing the event that it marks.
  • color - A hex color, e.g. #f00.
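
For reference, a tag with these four fields might look like the object built below. This is a sketch only: `makeTag` is a hypothetical helper, and the exact format of the random part of the id, as well as the use of the tag's own timestamp as the {time} part, are assumptions.

```javascript
// Hypothetical helper: builds an object with the four documented fields.
function makeTag(ts, label, color) {
  // Short random suffix; the real format of {random} is not specified.
  var random = Math.random().toString(36).slice(2, 10)
  return {
    id: ts + "_" + random,
    ts: ts,
    label: label,
    color: color
  }
}

var tag = makeTag(1393429946985, "release 0.12.4", "#0000ff")
console.log(tag)
```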

Tags

New tag

To start using tags, you need to create some 'tagtypes' using the admin script:

$ zag/bin/admin.js tagtype list
$ zag/bin/admin.js tagtype create '#00ff00' 'Server deploy'
$ zag/bin/admin.js tagtype create '#0000ff' 'Client release'

Tags can be created either with the admin script:

$ zag/bin/admin.js tag create '#0000ff' 'release 0.12.4' 1393429946985

or in the web UI. To create a tag in the web UI, right-click on the X axis of a graph. Left-click a tag on the axis to modify it.