Metrics
Sentry supports the emission of pre-aggregated metrics information via the statsd
envelope item. This emission bypasses relay-side sampling and assumes a client side
aggregator. In the product these are referred to as "custom metrics" and internally
the system sometimes carries the name "ddm" (delightful developer
metrics). The functionality sits on top of the
generics metrics platform which is also used for span and transaction level
metrics.
These metrics are considered "trace seeking" which means that they should attempt to establish a relationship to the current span or trace.
Introduction
Custom metrics are sent into the custom
namespace in the generic metrics platform.
The features exposed from the generic metrics platform to the SDKs are all
metric types (counters, distributions, gauges and sets). SDKs
are required to perform local aggregation unless they perform on-shot emissions.
Aggregator Behavior
Metrics are aggregated into a process wide aggregator which is required to flush in intervals that are reasonable for the SDK. These aggregations are lossless (unless the metric type is already lossy which for instance applies to sets and gauges) and are required to support tags. The following general guidelines exist:
- 10 Second Bucketing: SDKs are required to bucket into 10 second intervals (rollup in seconds) which is the current lower bound of metric accuracy.
- Flush Shift: SDKs are required to shift the flush interval by
random() * rollup_in_seconds
. That shift is determined once per startup to create jittering. - Force flush: an SDK is required to perform force flushing ahead of scheduled time if the memory pressure is too high. There is no rule for this other than that SDKs should be tracking abstract aggregation complexity (eg: a counter only carries a single float, whereas a distribution is a float per emission).
- Background Aggregation: SDKs are encouraged to defer flushing and aggregation
into a background thread. For SDKs where
fork()
is to be expected, the background thread needs to ensure it restarts after fork (eg: python). - Tagging: metrics are tagged by key value pairs, where each key can have more than one value. Both keys and values are limited to strings, but the SDK can expose other types if it has a reasonable way to stringify these.
In abstract terms the aggregator is recommended to look a bit like this:
class Aggregator:
def __init__(self):
self.buckets = {}
def get_bucket(self, ts):
return self.buckets.setdefault(ts // 10, {})
def add(self, type, key, value, unit, tags):
mri = "%s:%s" % (type, key)
Normalization
Normalization shall be done with unicode support if possible.
- Keys and Tag Keys: regex
[^a-zA-Z0-9_/.-]+"
with replacement character_
. - Tag Values: regex
[^\w\d_:/@\.{}\[\]$-]+
with no replacement character.
Units
SDKs should permit any string unit, but some of them are known to the system and
will in the future permit recalculations. They are documented as part of
relay: MetricUnit
.
Trace Seeking
When a metric is emitted it needs to find the current trace (trace seeking), most specifically the current span. From there two things shall be happening unless they are disabled:
- Common tag attaching: the tags
transaction
,release
andenvironment
shall be automatically added to the metric. - Span local aggregation: the metric shall be attached to the current span as a local summary (a form of gauge). For more information see below.
API
API wise the SDKs are required to expose static metric functions which are to be
defined in a metrics
module. This also documents the aggregation state for
each metric type. They are constructed with an initial value, have an add
method to add another value and a serialize
method that returns the values for
the final emission as array.
Common Attributes
The following attributes are common for all metrics. These should be emitted via keyword arguments, some configuration struct or whatever is reasonable in the target language.
key
: the name of the metric. SDKs are required to normalize this name.value
: the value to be emitted, this is specific by metric type.unit
: the unit for the metric, see below.tags
: a dictionary or list of tuples of tags and their values.stacklevel
: a utility attribute for SDKs where n stack level shall be ignored for code location recording. In languages that have better facilities this can be removed.
Span Aggregation
On a span every metric is aggregated as a gauge. For each metric the following attributes are retained per span:
min
: the minimum value observed.max
: the maximum value observed.count
: the number of emissions.sum
: the sum of all values.tags
: the metric tags and their values.
The local aggregator behaves as such:
class LocalAggregator:
def add(self, type, value, value, unit, tags):
# Assumes sorted tags
bucket_key = ("%s:%s@%s" % (ty, key, unit), tags)
old = self._measurements.get(bucket_key)
if old is not None:
v_min, v_max, v_count, v_sum = old
v_min = min(v_min, value)
v_max = max(v_max, value)
v_count += 1
v_sum += value
else:
v_min = v_max = v_sum = value
v_count = 1
self._measurements[bucket_key] = (v_min, v_max, v_count, v_sum)
Counters
Counters (c
) have a single method for emission:
incr(key, value: int | float = 1.0, unit = "none", ...)
: increments the given key by one in the bucket.
Aggregation state:
class Counter:
def __init__(self, initial):
self.value = initial
def add(self, value):
self.value += value
def serialize(self):
return [self.value]
Span aggregation: the value is added to the span local aggregator.
Distributions
A distribution (d
) holds all values observed. There are multiple methods that
shall exist on an SDK to support common use cases. The basic way to emit a
distribution is the low level distribution
function, higher level methods
depend on the facilities in the language.
distribution(key, value: int | float, unit = "none", ...)
: emits a single value into a distribution.
For languages with context managers (eg: with
in Python):
timing(key, ...)
: this shall measure the time of a code block and emit the measured time as distribution with a time unit (second
,millisecond
, etc.). The default unit shall besecond
in current SDKs as we are not yet normalizing values on the server.
For languages with lambda callback patterns (eg: timed(() => { ... })
):
timing(key, ..., callback)
: this shall measure the time of the invoked lambda and emit the measured time as distribution with a time unit (second
,millisecond
, etc.). The default unit shall besecond
in current SDKs as we are not yet normalizing values on the server.
For languages with decorators:
@timing(key, ...)
or@timed(key, ...)
: Can be used to annotate a function so that every time it is invoked, a timing is emitted.
Aggregation state:
class Distribution:
def __init__(self, initial):
self.values = [initial]
def add(self, value):
self.values.append(value)
def serialize(self):
return self.values
Span aggregation: the value is added to the span local aggregator.
Gauges
Gauges (g
) are generally discouraged. They are often also called "summaries" as they
are similar in nature to distributions, but somewhat lossy. They are however emitted
per span when metric summaries are enabled. As they are hard to aggregate across
different emitters they can be tricky to use properly in practice.
They are emitted by a single function called gauge
:
gauge(key, value: int | float, unit = "none", ...)
: emits a single value into the gauge.
Aggregation state:
class Gauge:
def __init__(self, initial):
self.last = initial
self.min = initial
self.max = initial
self.sum = initial
self.count = 1
def add(self, value):
self.last = value
self.min = min(self.min, value)
self.max = max(self.max, value)
self.sum += value
self.count += 1
def serialize(self):
return (
self.last,
self.min,
self.max,
self.sum,
self.count,
)
Span aggregation: the value is added to the span local aggregator.
Sets
Sets (s
) are used to count unique members. SDKs are supposed to accept integers and
strings at least, but they are encouraged to use CRC32 to fold them into a 32bit integer.
As such the aggregation state is partially lossy.
They are emitted with a single method:
set(key, value: int | string, unit = "none", ...)
: marks a single value to be in the set.
Aggregation state:
class Set:
def __init__(self, initial):
self.value = {initial}
def add(self, value):
self.value.add(value)
def serialize(self):
def _hash(x):
if isinstance(x, str):
return crc32(x.encode("utf-8")) & 0xFFFFFFFF
return int(x)
return [_hash(value) for value in self.value]
Span aggregation: the value 1
is added to the span local aggregator for new members.
Serialization
Metrics are emitted in regular flush intervals (every 10 seconds, with a random offset that
is detemined once per startup) as envelope items. The item type is statsd
and the serialization
format is statsd compatible. Unlike normal statsd, more than one value can be emitted separated
by a colon. These values are the results of calling serialize
or similar on the aggregation
state.
The format is documented as part of relay: relay_metrics.
Additionally metric summaries retained on the spans shall be emitted as _metric_summaries
on the metrics as documented by this RFC: RFC-0123.
Meta Data
Additionally SDKs are encouraged to capture location information once per metric. At metric
emission time the system shall be collecting which code location emitted a metric and retain it.
For that the same format as for the frame
on the stacktrace
shall be used. stacklevel
number of frames shall be ignored.
The meta data is emitted in JSON as envelope item called metric_meta
:
{
"timestamp": 1701743645,
"mapping": {
"encoded_mri": [{
"type": "location",
"filename": "..."
}]
}
}
Today only code mappings are supported. Each key is an encoded MRI (eg: c:custom/foo@none
), each
value in the mapping array has a key called type
which today can only be "location"
. All other
keys are the same format as the frame
on the stacktrace. Today
SDKs should send code locations at least once a day, and they can round the timestamp.
Reference Implementation
The Python SDK holds a reference implementation in the metrics.py
module for the full
feature set: metrics.py.