histogram_percentile()
For an input stream that contains a fixed-bucket histogram, histogram_percentile() calculates the specified percentile of the data in the histogram.
Syntax
histogram_percentile(pct=<percentile> [,bucket_name='<upper_bound>'][,infinity_bucket_name='<infinity_value>'])
Parameter definitions
Parameter | Type | Default value | Description |
---|---|---|---|
<percentile> | number | - | Required. Percentile value in the range 0 < pct <= 100 |
<upper_bound> | string | le | Optional. Name of a dimension that specifies the upper bound of a histogram bucket. |
<infinity_value> | string | Inf | Optional. Value of the <upper_bound> dimension for the last bucket, which has no finite upper bound and counts all data points. |
Return value
Returns a stream that contains one metric time series (MTS) for each histogram in the input stream. Each output MTS has the metadata of its input histogram, except for the <upper_bound> dimension. The value of each data point is the specified percentile, computed from that histogram.
The histogram input stream
The input to histogram_percentile() is one or more histograms generated by a host or application. Each histogram is represented by a group of MTS: each bucket is a single MTS in the group, and each MTS contains a single data point.
All buckets for a histogram have the same metric name, metric type, and dimensions, except for the <upper_bound> dimension, which has the same name in every bucket but a different value. The values of this dimension are the upper bounds of the buckets in the histogram.
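To make the grouping rule concrete, here is a minimal Python sketch (Python stands in for SignalFlow here, because this grouping happens inside the function). It assumes a hypothetical in-memory representation in which each MTS is a dict with a metric name, a dict of dimensions, and one value. group_histograms() is an illustrative helper, not part of SignalFlow, and for brevity it keys on the metric name and dimensions only, not the metric type.

```python
from collections import defaultdict


def group_histograms(mts_list, bucket_name="le"):
    """Group bucket MTS into histograms.

    Hypothetical model: each MTS is a dict with a metric name, a dict of
    dimensions, and one value. MTS that share the metric name and every
    dimension except bucket_name belong to the same histogram.
    """
    histograms = defaultdict(dict)
    for mts in mts_list:
        dims = mts["dimensions"]
        # Identity of the histogram: metric name plus all non-bucket dimensions.
        key = (mts["metric"],) + tuple(
            sorted((k, v) for k, v in dims.items() if k != bucket_name)
        )
        # Within a histogram, each bucket is keyed by its upper bound.
        histograms[key][dims[bucket_name]] = mts["value"]
    return dict(histograms)


mts_list = [
    {"metric": "request_latency_bucket",
     "dimensions": {"job": "job1", "container": "container1", "le": "100"}, "value": 10},
    {"metric": "request_latency_bucket",
     "dimensions": {"job": "job1", "container": "container1", "le": "Inf"}, "value": 50},
    {"metric": "request_latency_bucket",
     "dimensions": {"job": "job1", "container": "container2", "le": "100"}, "value": 4},
]
histograms = group_histograms(mts_list)
for key, buckets in histograms.items():
    print(key, buckets)
```

The two containers produce two separate histograms, even though their MTS share the metric name and the job dimension.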
For example, a histogram might have three buckets, each of which counts the requests that have a latency less than or equal to a specified value.
In the input stream, this histogram is a group of three MTS. All the MTS have the same metric name, metric type, and dimension names, but each MTS has a different value for the upper bound of its histogram bucket. By default, the name of this dimension is le, but you can override it by using the bucket_name parameter.
The input stream for this histogram has these metrics and dimensions:
- Metric name: request_latency_bucket
- Metric value: Number of requests that have a latency less than or equal to the <upper_bound> value
- Dimensions:
  - job: The name of the job that generated the histogram
  - container: The name of the container in which the job is running
  - le: The dimension that specifies the upper bound of a histogram bucket
The following table represents this input stream:
Metric name | Metric type | Metric value | Dimensions |
---|---|---|---|
request_latency_bucket | counter | 10 | job: job1, container: container1, le: 100 |
request_latency_bucket | counter | 30 | job: job1, container: container1, le: 500 |
request_latency_bucket | counter | 50 | job: job1, container: container1, le: Inf |
- The first row is the MTS/data point for the 10 requests that have a latency less than or equal to 100.
- The second row is the MTS/data point for the 30 requests that have a latency less than or equal to 500 (10 at or below 100, and 20 between 100 and 500).
- The last row is the MTS/data point for all 50 requests. The default infinity bucket value that the function looks for is Inf, but you can override it by using the infinity_bucket_name parameter.
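The cumulative counts in these bullets (10, 30, and 50) can be turned back into the number of requests in each individual bucket by subtracting the count of the bucket below. The following Python sketch does that subtraction. per_bucket_counts() is a hypothetical helper, and it assumes that bucket bounds arrive as strings, as dimension values do.

```python
def per_bucket_counts(cumulative, infinity_bucket_name="Inf"):
    """Convert cumulative counts {upper_bound: count} into per-bucket counts.

    Upper bounds are strings because dimension values are strings; the
    infinity bucket sorts after every finite bound.
    """
    def sort_key(bound):
        return float("inf") if bound == infinity_bucket_name else float(bound)

    counts = {}
    previous = 0
    for bound in sorted(cumulative, key=sort_key):
        # Requests in this bucket only = cumulative count minus the one below it.
        counts[bound] = cumulative[bound] - previous
        previous = cumulative[bound]
    return counts


print(per_bucket_counts({"100": 10, "500": 30, "Inf": 50}))
# {'100': 10, '500': 20, 'Inf': 20}
```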
Calculate the percentile
The following SignalFlow program calculates the 50th percentile (the median) of requests in the input stream:
```
data('request_latency_bucket').histogram_percentile(pct=50, bucket_name='le').publish()
```
The function calculates the percentile using linear interpolation.
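The following Python sketch shows one way to do that interpolation on the example histogram, with cumulative counts of 10, 30, and 50. It is not the SignalFlow implementation. estimate_percentile() is a hypothetical helper, and in the style of the Prometheus histogram_quantile() function, it assumes that the first bucket starts at 0 and that the target rank is pct/100 of the infinity bucket's count. For the median, the rank is 25. That rank falls in the bucket from 100 to 500, 15 of its 20 requests in, so the estimate is 100 + 0.75 * (500 - 100) = 400.

```python
import math


def estimate_percentile(cumulative, pct, infinity_bucket_name="Inf"):
    """Estimate a percentile from cumulative bucket counts by linear interpolation.

    Assumptions for this sketch: the first bucket's lower bound is 0, and the
    target rank is pct/100 of the infinity bucket's count (all data points).
    """
    if not 0 < pct <= 100:
        raise ValueError("pct must satisfy 0 < pct <= 100")

    def sort_key(bound):
        return float("inf") if bound == infinity_bucket_name else float(bound)

    total = cumulative[infinity_bucket_name]
    if total == 0:
        return math.nan  # Empty histogram: no percentile to report.
    rank = pct / 100 * total

    lower_bound, lower_count = 0.0, 0
    for bound in sorted(cumulative, key=sort_key):
        count = cumulative[bound]
        if count >= rank:
            if bound == infinity_bucket_name:
                # No finite upper bound to interpolate toward, so return the
                # lower bound of the infinity bucket.
                return lower_bound
            upper_bound = float(bound)
            # Position of the rank inside this bucket, between 0 and 1.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + fraction * (upper_bound - lower_bound)
        lower_bound, lower_count = float(bound), count


example = {"100": 10, "500": 30, "Inf": 50}
print(estimate_percentile(example, 50))  # 400.0
print(estimate_percentile(example, 90))  # 500.0
```

With pct=90, the rank is 45, which lies above the 30 requests at or below 500, so the sketch returns the lower bound of the Inf bucket, 500.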
histogram_percentile() considerations
- If the percentile falls within the last bucket of the histogram, whose bucket value is Inf, the function returns the lower bound of that bucket. In the example, histogram_percentile(pct=90, bucket_name='le') returns 500: the 90th percentile of 50 requests is the 45th request, which lies above the 30 requests that have a latency less than or equal to 500.
- The result of histogram_percentile() might be inaccurate if more than 10K MTS match the data() block that generates the input to histogram_percentile(). Above 10K MTS, the function samples its input, so some buckets might be missing from the input stream.
Apply histogram_percentile() to an aggregated stream
To apply histogram_percentile() to an aggregated stream, the dimension named by bucket_name must be in the group-by key of the aggregation so that the bucket boundaries are preserved. sum() is the only aggregation that makes sense before you call histogram_percentile(), because sum() adds up the bucket counts across all the entities that you want to aggregate.
Consider the histogram from the previous example. To calculate the 90th percentile of request latency for each job, across all of the containers that run it, use the following SignalFlow program:
```
data('request_latency_bucket').sum(by=['job', 'le']).histogram_percentile(pct=90, bucket_name='le').publish()
```
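Why sum() and le belong together can be seen in a small Python model of the aggregation. sum_by() is a hypothetical stand-in for sum(by=...), and the second container's counts are made-up data. Summing by job and le adds matching buckets across containers, so the result is still a valid cumulative histogram. Summing by job alone collapses every bucket into one total that is not a count of anything meaningful.

```python
from collections import defaultdict


def sum_by(mts_list, keys):
    """Hypothetical model of sum(by=keys): add values of MTS that share those dimensions."""
    totals = defaultdict(int)
    for mts in mts_list:
        group = tuple(mts["dimensions"][k] for k in keys)
        totals[group] += mts["value"]
    return dict(totals)


# Cumulative bucket counts for two containers running job1 (hypothetical data).
containers = {
    "container1": {"100": 10, "500": 30, "Inf": 50},
    "container2": {"100": 5, "500": 10, "Inf": 20},
}
mts_list = [
    {"dimensions": {"job": "job1", "container": c, "le": le}, "value": v}
    for c, buckets in containers.items()
    for le, v in buckets.items()
]

print(sum_by(mts_list, ["job", "le"]))
# {('job1', '100'): 15, ('job1', '500'): 40, ('job1', 'Inf'): 70}
print(sum_by(mts_list, ["job"]))
# {('job1',): 125}
```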
Apply histogram_percentile() over time
For example, to calculate the median request latency for the histogram in the previous example over the last 10 minutes, use the following SignalFlow program:
```
data('request_latency_bucket').sum(over='10m').histogram_percentile(pct=50, bucket_name='le').publish()
```
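A similar Python model shows what sum(over='10m') does to each bucket. It assumes that each data point of a counter MTS is the number of requests reported in that interval, and that the window includes points with timestamps in (now - 10 minutes, now]. The exact window boundaries in SignalFlow are an assumption here, and sum_over() and the data are hypothetical. Each bucket is summed independently, so the windowed counts still form a cumulative histogram.

```python
def sum_over(points, window, now):
    """Hypothetical model of sum(over=window) for one MTS.

    points is a list of (timestamp_seconds, value) pairs; only points with
    now - window < timestamp <= now are added.
    """
    return sum(value for ts, value in points if now - window < ts <= now)


# Hypothetical per-interval counts for each bucket MTS: (timestamp, value).
bucket_points = {
    "100": [(60, 2), (120, 3), (660, 5)],
    "500": [(60, 4), (120, 6), (660, 10)],
    "Inf": [(60, 7), (120, 9), (660, 14)],
}

# A 10-minute window ending at t=660 s covers the points at 120 s and 660 s.
windowed = {le: sum_over(pts, window=600, now=660) for le, pts in bucket_points.items()}
print(windowed)  # {'100': 8, '500': 16, 'Inf': 23}
```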
Apply histogram_percentile() to an aggregated stream over time
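You can use both types of aggregation together. For example, the following program calculates the 90th percentile of request latency for each job over the last 10 minutes, using the default bucket_name of le: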
```
data('request_latency_bucket').sum(by=['job', 'le']).sum(over='10m').histogram_percentile(pct=90).publish()
```