`histogram_percentile()`

For an input stream that contains a fixed-bucket histogram, histogram_percentile() calculates the specified percentile of the data in the histogram.

Syntax

histogram_percentile(pct=<percentile> [,bucket_name='<upper_bound>'][,infinity_bucket_name='<infinity_value>'])

Parameter definitions

Parameter	Type	Default value	Description
`<percentile>`	number	-	Required. Percentile value in the range `0` < `pct` <= `100`
`<upper_bound>`	string	`le`	Optional. Name of a dimension that specifies the upper bound of a histogram bucket.
`<infinity_value>`	string	`Inf`	Optional. Value of the `<upper_bound>` dimension for the bucket that contains all data above the highest finite bound.

Return value

Returns a stream that contains a metric time series (MTS) for each histogram in the input stream. Each MTS contains a single data point with the metadata from the input histogram, except for the value of the <upper_bound> dimension. The value of the data point is the specified percentile, computed from the histogram.

The histogram input stream

The input to histogram_percentile() is one or more histograms generated by a host or application. Each histogram is represented by a group of MTS. Each bucket is represented by a single MTS in the group, and each MTS contains a single data point. All buckets for a histogram have the same metric name, metric type, and dimensions, except for the <upper_bound> dimension that has the same name but different values. The values for this dimension represent the upper bound of each bucket in the histogram.

For example, a histogram might have three buckets. Each bucket represents the number of requests that have a latency less than or equal to a specified value. In the input stream, this histogram is a group of three MTS. All the MTS have the same metric name, metric type, and dimension names. Each individual MTS has a different value for the upper boundary of the corresponding histogram bucket. By default, the dimension name is le, but you can override this name by using the bucket_name parameter.

The input stream for this histogram would have these metrics and dimensions:

Metric name: request_latency_bucket
Metric value: Number of requests that have a latency less than or equal to the <upper_bound> value
Dimensions:
- job: The name of the job that generated the histogram
- container: The name of the container in which the job is running
- le: The dimension that specifies the upper boundary of a histogram bucket

The following table represents this input stream:

Metric name	Metric type	Metric value	Dimensions
`request_latency_bucket`	counter	10	job: job1, container: container1, le: 100
`request_latency_bucket`	counter	30	job: job1, container: container1, le: 500
`request_latency_bucket`	counter	50	job: job1, container: container1, le: Inf

The first row is the MTS/data point for the 10 requests that have a latency less than or equal to 100.
The second row is the MTS/data point for the 30 requests that have a latency less than or equal to 500 (10 at or below 100, and 20 between 100 and 500).
The last row is the MTS/data point for all 50 requests. The default infinity bucket value that the function looks for is Inf, but you can override this by using the infinity_bucket_name parameter.

Calculate the percentile

The following SignalFlow program calculates the 50th percentile (the median) of requests in the input stream:

data('request_latency_bucket').histogram_percentile(pct=50, bucket_name='le').publish()

The function calculates the percentile using linear interpolation.
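
To make the interpolation concrete, here is a minimal sketch in plain Python (not SignalFlow). It is an illustration only, not the function's actual implementation. It assumes that the lowest bucket starts at 0 and that the target rank is pct / 100 times the total count; neither detail is stated on this page. The helper name estimate_percentile is hypothetical. The bucket counts are the cumulative counts of 10, 30, and 50 requests described for the example histogram.

```python
import math


def estimate_percentile(buckets, pct):
    """Estimate a percentile from cumulative (upper_bound, count) buckets.

    Assumptions (not confirmed by the SignalFlow documentation): the lowest
    bucket starts at 0, and the target rank is pct / 100 * total count.
    """
    if not 0 < pct <= 100:
        raise ValueError("pct must satisfy 0 < pct <= 100")
    buckets = sorted(buckets)   # math.inf sorts last
    total = buckets[-1][1]      # the Inf bucket holds the total count
    if total == 0:
        return None             # empty histogram, nothing to estimate
    rank = pct / 100 * total
    lower_bound, lower_count = 0.0, 0
    for upper_bound, count in buckets:
        if count >= rank:
            if math.isinf(upper_bound):
                # Documented behavior: return the lower bound of the Inf bucket.
                return lower_bound
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + fraction * (upper_bound - lower_bound)
        lower_bound, lower_count = upper_bound, count
    return lower_bound


# Cumulative counts from the example histogram: le=100, le=500, le=Inf.
buckets = [(100, 10), (500, 30), (math.inf, 50)]
print(estimate_percentile(buckets, 50))  # 400.0
print(estimate_percentile(buckets, 90))  # 500
```

Under these assumptions, the median is rank 25 of 50, which lies three quarters of the way through the bucket between 100 and 500, so the estimate is 400. The value that SignalFlow actually publishes might differ if its interpolation rules differ from the ones assumed here.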

`histogram_percentile()` considerations

If the percentile falls within the last bucket in the histogram, the one with a bucket value of Inf, the function returns the lower bound of that bucket. In the example, a call to `histogram_percentile(pct=90, bucket_name='le')` returns 500, because 90% of the 50 requests is more than the 30 requests that have a latency less than or equal to 500.
The result of histogram_percentile() might be inaccurate if more than 10K MTS meet the requirement of the data() block that generates the input to histogram_percentile(). Above 10K MTS, the function samples its input, so some buckets might be missing from the input stream.

Apply `histogram_percentile()` to an aggregated stream

To apply histogram_percentile() to an aggregated stream, the bucket_name dimension must be present in the "group by" key of the aggregation so that the bucket boundaries are preserved. sum() is the only aggregation function that makes sense before a call to histogram_percentile(), because sum() adds up the counts in each bucket across all the entities you want to aggregate.

Consider the histogram from the previous example. To calculate the 90th percentile of request latency for each job, aggregated across containers, use the following SignalFlow program:

data('request_latency_bucket').sum(by=['job', 'le']).histogram_percentile(pct=90, bucket_name='le').publish()
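
The reason the bucket dimension has to stay in the "group by" key can be illustrated with a short sketch in plain Python (not SignalFlow). The helper sum_by is hypothetical and only mimics a grouped sum; container2 and its counts are made-up illustration data, not part of the example above.

```python
from collections import defaultdict


def sum_by(datapoints, by):
    """Sum data point values over every dimension not listed in `by`."""
    totals = defaultdict(int)
    for dims, value in datapoints:
        key = tuple(dims[name] for name in by)
        totals[key] += value
    return dict(totals)


# Cumulative bucket counts for one job that runs in two containers.
# container2 and its counts are hypothetical illustration data.
datapoints = [
    ({"job": "job1", "container": "container1", "le": "100"}, 10),
    ({"job": "job1", "container": "container1", "le": "500"}, 30),
    ({"job": "job1", "container": "container1", "le": "Inf"}, 50),
    ({"job": "job1", "container": "container2", "le": "100"}, 5),
    ({"job": "job1", "container": "container2", "le": "500"}, 15),
    ({"job": "job1", "container": "container2", "le": "Inf"}, 20),
]

# Grouping by job and le keeps one count per bucket boundary.
per_job = sum_by(datapoints, by=["job", "le"])
print(per_job[("job1", "100")])  # 15
print(per_job[("job1", "500")])  # 45
print(per_job[("job1", "Inf")])  # 70

# Grouping by job alone collapses all buckets into a single number.
print(sum_by(datapoints, by=["job"]))  # {('job1',): 130}
```

With le in the key, the result is still a three-bucket histogram for job1, so a percentile can be computed from it. Without le, the bucket boundaries are gone and only a single total remains.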

Apply `histogram_percentile()` over time

Using the histogram from the previous example, you can calculate the median request latency over the last 10 minutes with the following SignalFlow program:

data('request_latency_bucket').sum(over='10m').histogram_percentile(pct=50, bucket_name='le').publish()

Apply `histogram_percentile()` to an aggregated stream over time

You can use both types of aggregations together. For example:

data('request_latency_bucket').sum(by=['job', 'le']).sum(over='10m').histogram_percentile(pct=90).publish()
