 Analyze data using SignalFlow

Use SignalFlow, the Splunk Observability Cloud statistical computation engine, to analyze incoming data and write custom chart and detector analytics.

Note: When you send an API request, you need to use the endpoint specific to your organization's realm. To learn more, see Realms in endpoints.

SignalFlow has the following features:

  • SignalFlow background computation engine: Runs SignalFlow programs in the background and streams results to charts and detectors
  • SignalFlow programming language: Python-like language that you use to write SignalFlow programs
  • SignalFlow library: Functions and methods that you call from a SignalFlow program

 SignalFlow programs

SignalFlow programs produce the output data streams used by charts and detectors. When you create a chart or detector using the API, you specify a SignalFlow program as part of the request. You can also run SignalFlow programs using the SignalFlow API. Programs run in the background, and you receive the results asynchronously in your client.

 Examples of SignalFlow programs

The following SignalFlow program displays two output data streams in a chart.

#
# Specifies a filter for the stream of incoming metrics
# Selects metrics with the dimension serverType="app" and
# the dimension env ≠ "qa" 
app_servers = filter("serverType", "app") and not filter("env", "qa");
# 
# For the filtered input stream retrieved by the data() function, 
...

This program demonstrates the following SignalFlow programming features:

  • Built-in functions like data() that provide real-time incoming data in streams.
  • Positional and keyword arguments for functions. For example, this program uses filter() with positional arguments to define a filter for server types. It also uses data() with the filter keyword argument to apply a filter to the input stream.
  • Support for comments delimited from program code by the # character. All characters from # to the end of a line are ignored.

The following curl command demonstrates how to start a computation that calculates average CPU utilization over all servers.

$ curl \
    --request POST \
    --header "Content-Type: text/plain" \
    --header "X-SF-Token: <SESSION_TOKEN>" \
    --data "data('cpu.utilization').mean().publish()" \
    https://stream.<REALM>.signalfx.com/v2/signalflow/execute

The result is a stream of Server-Sent Events (SSE) messages.

 SignalFlow syntax

SignalFlow's syntax is similar to the syntax of Python. Although some Python constructs are supported, others don't work or behave differently. For example, SignalFlow lets you assign a value to a variable only once.
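
For example, the following sketch illustrates the single-assignment rule; the commented-out reassignment at the end is not valid SignalFlow:

#Each variable is assigned exactly once
cpu = data("cpu.utilization");
cpu_avg = cpu.mean();
cpu_avg.publish();
#Not allowed: cpu is already assigned above
#cpu = data("memory.utilization");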

For more details on SignalFlow syntax and SignalFlow programming best practices, see Intermediate to advanced SignalFlow on GitHub.

 Computation behavior

When you run a SignalFlow program using the SignalFlow API, the computations run continually on streams of data. The program has a start and stop time and a specific resolution:

  • If you don’t provide a start time when you start the computation, SignalFlow defaults to the current time.
  • If you don’t provide a stop time, the computation continues indefinitely.
  • A computation's resolution is the time interval at which SignalFlow processes data and generates output from the computation. At the completion of each interval, each published stream in the computation emits results.

If you choose a start time that tells SignalFlow to include historical data, you receive output as quickly as the SignalFlow analytics engine can calculate it. Computations that use a start time of "now" send out data in real time at intervals controlled by the resolution of the computation.
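
For example, the following curl command is a sketch of starting a computation over a fixed historical window. It assumes the start, stop, and resolution query parameters of the execute operation, where start and stop are milliseconds since the Unix epoch and resolution is in milliseconds; the timestamp values shown are placeholders.

$ curl \
    --request POST \
    --header "Content-Type: text/plain" \
    --header "X-SF-Token: <SESSION_TOKEN>" \
    --data "data('cpu.utilization').mean().publish()" \
    "https://stream.<REALM>.signalfx.com/v2/signalflow/execute?start=1557446400000&stop=1557450000000&resolution=60000"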

 Resolution of computations

The resolution of a computation is normally set by SignalFlow to match the incoming frequency of the data. You can ask for a minimum resolution, which forces the computation to operate at a coarser resolution than the one that SignalFlow assigns. You can’t force SignalFlow to make computations at a resolution that’s finer than the incoming frequency of the data, which protects against invalid results.

For example, if you force SignalFlow to calculate a sum at twice the incoming data resolution, the operation adds each data point to the result twice.

Regardless of the resolution you choose, SignalFlow rolls up data to the resolution of the computation using an algorithm determined by the input data metric type. You can override this with the rollup parameter of the data() built-in function.

 Input lag management

Because Splunk Observability Cloud is a real-time system, it can't wait for all of a computation's input data to arrive: some data might be delayed indefinitely. For example, a host might go offline, delaying its data indefinitely. The latency of the connection between a host and Splunk Observability Cloud can also cause significant delays in receiving data.

To account for these delays, Splunk Observability Cloud continually measures, tracks, and evaluates data input lag. From the results of this evaluation, Splunk Observability Cloud establishes a deadline after which the computation continues, even if data is missing. This deadline lets the computation continue for all the available data, while minimizing perceived lags.

You can specify a fixed delay override instead of relying on the system to calculate the delay. Splunk Observability Cloud provides the following delay overrides:

 MaxDelay input lag override

MaxDelay delays a computation for up to a specified number of milliseconds. If all the data for the computation arrives before the specified duration, the computation continues immediately after it has received all data. If some data is still missing after the specified duration, the computation continues without the missing data.

If you expect your data to arrive with an unpredictable variance in lag, and you want to ensure the computation isn't delayed more than a certain duration, set MaxDelay to that duration.

The maxDelay property is available in the Charts, Detectors, and SignalFlow APIs:

  • Charts API - The request and response bodies for these methods include maxDelay:

    • Create Chart: POST /v2/chart
    • Update Chart: PUT /v2/chart/<CHART_ID>

    The GET methods for the API return the value of maxDelay for an existing chart.

  • Detectors API - The request and response bodies for these methods include maxDelay:

    • Create Detector: POST /v2/detector
    • Update Detector: PUT /v2/detector/<DETECTOR_ID>
    • Validate Detector: POST /v2/detector/validate

    The GET methods for the API return the value of maxDelay for an existing detector.

  • SignalFlow API - Use language-specific SDKs

    • Start a SignalFlow program (execute)
    • Preview the number of alerts generated by a detector SignalFlow program (preflight)
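
For example, the following sketch sets maxDelay to 30 seconds when starting a computation directly; it assumes the execute operation accepts a maxDelay query parameter in milliseconds.

$ curl \
    --request POST \
    --header "Content-Type: text/plain" \
    --header "X-SF-Token: <SESSION_TOKEN>" \
    --data "data('cpu.utilization').mean().publish()" \
    "https://stream.<REALM>.signalfx.com/v2/signalflow/execute?maxDelay=30000"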

 MinDelay input lag override

MinDelay delays a computation for at least a specified number of milliseconds. Even if all the data for this computation arrives before the specified duration, the computation doesn't continue until after the duration.

If the computation receives data sporadically, Splunk Observability Cloud might not have enough data to calculate a correct delay value. Set MinDelay to ensure that Splunk Observability Cloud waits a specified duration before continuing. Setting MinDelay increases the likelihood that sporadically arriving data is included in the computation.

The minDelay property is only available for the following methods of the Detectors API:

  • Create Detector: POST /v2/detector
  • Update Detector: PUT /v2/detector/<DETECTOR_ID>
  • Validate Detector: POST /v2/detector/validate

The GET methods for the API return the value of minDelay for an existing detector.
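
For example, the following sketch creates a detector with minDelay set to 30 seconds (30000 milliseconds). The detector name and program text are placeholders, and the empty rules array keeps the sketch minimal; a real detector needs at least one rule whose detectLabel matches the published label.

$ curl \
    --request POST \
    --header "Content-Type: application/json" \
    --header "X-SF-Token: <SESSION_TOKEN>" \
    --data '{
        "name": "CPU utilization detector",
        "programText": "detect(when(data(\"cpu.utilization\").mean() > 90)).publish(\"cpu_high\")",
        "rules": [],
        "minDelay": 30000
    }' \
    https://api.<REALM>.signalfx.com/v2/detector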

 Input streams

SignalFlow computations operate on a stream of data coming from the systems you’re monitoring. In SignalFlow, you specify streams as queries that return data, usually from more than one metric time series.

Splunk Observability Cloud creates a metric time series for each combination of metric and dimension-value pair. For example, the metric cpu.utilization and dimension hostname=server1 defines a unique metric time series.

Because the query for a stream usually matches multiple metric time series, the stream itself consists of one or more arrays of data points: one data point for each metric time series, and one array for each unit of time.

The following examples show you how to specify and publish streams:

#Specify a stream
data("cpu.utilization");
data("cpu.*");

#Make the output visible
data("cpu.utilization").publish();
data("cpu.*").publish();

By default, a stream doesn’t output data points from a computation. To make its output visible, you have to publish the output using the publish() method.

The data points in a stream contain both the actual metric value and contextual metadata for the value. SignalFlow functions let you filter an input stream based on the metadata or publish the metadata itself.

 Filters

Filters select the data points in an input stream, based on dimensions or properties of the metric time series. To create a filter, use the filter() function. The first argument to this function is the dimension or property name. The rest of the arguments are one or more values that form the search criteria for the filter. If you specify multiple values, SignalFlow joins them with an OR condition.

Note: Query strings allow the multiple-character * wildcard everywhere except in the first position.

You can specify as many query arguments as you want. For example:

#Filter on metric timeseries that have a serverType of API
filter("serverType", "API");

#Filter on metric timeseries that have an aws_tag_Name that starts with api
filter("aws_tag_Name", "api*");

#Filter on metric timeseries with an api of either "login" or "logout"
#Notice that SignalFlow joins "login" and "logout" with an OR
filter("api","login","logout");

You can explicitly combine filter function calls into expressions using the and, or, and not keywords. Operator precedence is the same as in Boolean expressions. You can also use parentheses to force precedence.

For example:

#Simple AND expression that selects data with serverType of API and environment of QA
filter("serverType", "API") and filter("env", "QA");

#A more complex example. This filter expression selects data that matches all of the following criteria:
#1. serverType is db.core or any serverType that starts with db.staging AND serverType is *not* db.staging-qa
#2. api is login
(filter("serverType", "db.core", "db.staging*") and not filter("serverType", "db.staging-qa")) and filter("api", "login");

 Stream labels

You can label the output of a stream using the label argument of the publish() stream method. Although you don’t have to supply a stream label, using one lets you specify multiple output streams and display them differently when you publish them.

For example:

#Publish stream output without a label
data("cache.hits").sum(by="datacenter").mean(over="1h").percentile(90).publish();

#Publish stream output with the label "misses"
data("cache.misses").sum(by="datacenter").mean(over="1h").percentile(90).publish("misses");

To learn more about a specific function or method, see the SignalFlow functions and SignalFlow methods topics.

 SignalFlow analytical functions and methods

SignalFlow has a large library of built-in analytical functions and methods that take a stream as an input, perform computations on its data points, and output the result as a stream.

SignalFlow functions and methods provide the following types of analysis:

  • Aggregations: Apply a calculation across all the data in the array of data points for the incoming stream at one point in time. For example, you can calculate the average (mean) cpu utilization for a set of servers at a point in time by using the mean() stream method on an input stream.

  • Transformations: Apply a calculation on a metric time series within a window of time. For example, you can calculate the moving average of cpu utilization over a specific time window by using the mean() stream method and providing it with the over argument to specify a time range.

  • Calendar window transformations: Specify calculations based on date and time windows. These transformations help you compare the behavior of a system from one calendar time window to the next. For example, you can compare latency in the previous hour to the aggregated latency over the previous day.

    To learn more about this type of transformation, see Calendar window transformations.

 Difference between functions and methods

The primary difference between SignalFlow functions and methods is syntax. Functions take input arguments and return values, while methods implicitly act on the stream object that they’re called on.

For an example highlighting the differences between functions and methods, see the following SignalFlow code:

# Calls the max() function on the input stream for cpu.utilization.
# This returns the maximum value in the input stream MTS.
maxVal = max(data("cpu.utilization"))

# Calls the ceil() method on the value returned by max().
# This returns the maximum value in the input stream MTS rounded up to the nearest integer.
maxValCeiling = max(data("cpu.utilization")).ceil()

Note: Some operations work as both functions and methods. For example, ceil() can take an input stream or expression as an argument using ceil(<input stream>), or you can call ceil() on another value using <value>.ceil().

 Aggregations

Aggregation methods include:

 Aggregation examples

The following code demonstrates how to use aggregation functions:

#Overall CPU utilization average
data("cpu.utilization").mean().publish();

#95th percentile CPU utilization by AWS availability zone
data("cpu.utilization").percentile(95, by="aws_availability_zone").publish();

#CPU utilization average by service and instance type
data("cpu.utilization").mean(by=["service", "aws_instance_type"]).publish();

 Aggregation results

If you don’t specify arguments for an aggregation function, the output for the 1-by-n array of data points at time t is a single value. For example, if you apply sum() to an input stream that has 5 time series with the values 5, 10, 15, 20, and 10 at time t, the output is a single value of 60.
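
The following sketch shows this behavior; the metric name is hypothetical:

#If the 5 time series have the values 5, 10, 15, 20, and 10 at time t,
#this publishes the single value 60 at time t
data("requests.count").sum().publish();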

 Grouping

Grouping the output from an aggregation function organizes the results into separate sets based on metadata values for the data points in the stream. To specify the metadata on which to group, use the by keyword argument. To specify a single grouping field, use a string value, for example by='datacenter'. To specify multiple grouping fields, use a string array, for example by=['env', 'datacenter'].

For example, if the data points in the input stream have the dimension datacenter, and each data point has one of three different values datacenter1, datacenter2, or datacenter3, then calling data('cpu.utilization').mean(by='datacenter').publish() produces 3 values, each in a separate group. Each value represents the mean cpu utilization for data series that have the corresponding dimension value; that is, one value is mean cpu utilization for datacenter1, another value is mean cpu utilization for datacenter2, and the third value is mean cpu utilization for datacenter3.

Note: If a time series in the input stream doesn’t have the dimension you specify in the by keyword of the aggregation function, it isn’t included in the aggregation calculation. As a result, by also functions as a filter.

 Handling missing metadata

In aggregation methods, you can use the optional boolean parameter allow_missing to require the presence or allow the absence of properties in the calculation:

  • If allow_missing is False or None (default value), all properties in the calculation must be present, and any missing property results in the corresponding time series being dropped.
  • If allow_missing is True, missing properties are ignored.

 Grouping and missing metadata

You can handle scenarios where some of the input time series might not have a particular property or dimension, but you still want to group by that property. Combine grouping and allow_missing to prevent time series that don't have a specific property from falling into a common bucket.

For example:

# Any grouped by property may be missing
.sum(by=..., allow_missing=True) 
# The given group by property may be missing
.sum(by=..., allow_missing='foo') 
# The given group by properties may be missing
.sum(by=..., allow_missing=['foo', 'bar']) 
...

 Transformations

 Rolling window transformations

Some methods have an option to perform a computation on an MTS over a period of time. For example, mean(over='10s') is the moving average of an MTS over 10 seconds.

The over parameter supports units of seconds ('s'), minutes ('m'), and hours ('h').
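
For example, the following sketch shows moving averages over windows expressed with each supported unit, and with units combined:

#10-second moving average
data("cpu.utilization").mean(over="10s").publish();
#5-minute moving average
data("cpu.utilization").mean(over="5m").publish();
#Moving average over 1 hour and 30 minutes
data("cpu.utilization").mean(over="1h30m").publish();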

SignalFlow provides rolling window transformations for the following methods:

 Calendar window transformations

Some methods also have an option to perform a computation over calendar intervals or windows, such as days, weeks, and months. For example, to calculate a daily average, use mean(cycle='day'). After every midnight, this calculation returns the mean of all data points from the previous day.

Note: Unlike mean(over='24h'), this is not a moving average.

SignalFlow provides calendar window transformations for the following stream methods:

 Specify calendar window transformations

You control calendar window transformations with these arguments:

  • cycle: Duration of the window

    For example, sum(cycle='day') tells SignalFlow to calculate the sum over each calendar day, starting at 00:00:00 of the current day and returning the value at 00:00:00 of the next calendar day.

    To do a calendar window transformation, you must at least specify cycle.

  • cycle_start: The offset from the default start of the cycle

    For example, sum(cycle='day', cycle_start="8h") starts calculating the sum at 08:00:00 of the current day, and returns the value at 08:00:00 the next calendar day.

    The expected values and defaults depend on the value of cycle; for example, the cycle_start default for 'day' is '00h'. The defaults for all values of cycle are listed in Cycle and cycle_start values.

Note: For cycle='hour', the only valid value for cycle_start is '0m'.

  • shift_cycles: A number of cycles to shift backwards when reporting the value at the end of the current cycle.

    For example, with sum(cycle='day', shift_cycles='1'), the value reported at the end of the current day is the sum calculated over the previous day.

    One way to use this option is to compare results between cycles. Use sum(cycle='day') without shift_cycles to get the current cycle’s sum, and sum(cycle='day', shift_cycles='1') to get the previous cycle’s sum. You can then compare the two to get a day-over-day comparison.

Note: If you specify shift_cycles, you must set partial_values=False.

  • partial_values: Flag that controls the return of values during the cycle.

    If partial_values=False, SignalFlow only returns a result at the end of the cycle, but if partial_values=True, SignalFlow also returns results during the cycle. The interval at which SignalFlow returns results depends on the resolution of the background job that SignalFlow uses to run the program. The default is False.

Note: If you specify partial_values=True, you can’t specify shift_cycles.
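
As a sketch of how these arguments combine, the following program compares the sum over the current calendar day with the sum over the previous day; the metric name is hypothetical:

#Sum over the current calendar day, reported only at the end of the cycle
today = data("requests.count").sum(cycle="day", partial_values=False);
#Sum over the previous calendar day, reported at the end of the current day
yesterday = data("requests.count").sum(cycle="day", shift_cycles="1", partial_values=False);
#Day-over-day percentage change
((today - yesterday) / yesterday * 100).publish("day_over_day_pct");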

 Cycle and cycle_start values

The value of cycle_start is related to the value of cycle. If you specify a value for cycle_start that isn’t permitted for the value of cycle you’re using, your request returns an error.

The permitted cycle and cycle_start values are summarized in the following list:

  • cycle='quarter': cycle_start is the starting month of the first quarter of the year, a month name between 'january' and 'december' inclusive. The default is 'january'. For example, if you use sum(cycle='quarter', cycle_start='march'), then Q1 is 'march' to 'may', Q2 is 'june' to 'august', and so forth.

  • cycle='month': cycle_start is the starting day of the monthly cycle, a day number between '1d' and '28d'. The default is '1d' and the maximum value is '28d'. The actual number of days in the cycle depends on the month in which the calculation is initiated; for example, if cycle_start='3d', you always get a value at the end of the 3rd day, regardless of the number of days in the month.

  • cycle='week': cycle_start is the starting day of a cycle of 7 consecutive days, a day name between 'monday' and 'sunday' inclusive. The default is 'monday'.

  • cycle='day': cycle_start is the starting hour of the daily cycle, a two-digit number with leading zeros followed by the letter h. The default is '00h', which represents the first moment of the day; '23h' is the last hour of the day.

  • cycle='hour': cycle_start is the starting minute of the hourly cycle. The default and only allowed value is '0m'. You can also omit cycle_start, because it defaults to '0m'. If you specify any other value, the request returns an error.

Note: When you use the POST /execute operation to run a program and you specify a job start time, calendar windows don’t transform and report data until the job starts.

 Time zone for calendar window transformations

The calendar time zone controls how SignalFlow interprets calendar intervals and associates data points with them.

For example, consider a data point with a timestamp that’s 11 PM December 31 UTC:

  • You leave the calendar time zone set to UTC: SignalFlow includes the data point in the December 31 cycle.

  • You set your calendar time zone to Los Angeles: 11 PM UTC is 3 PM December 31 in Los Angeles. SignalFlow includes the data point in the December 31 cycle.

  • You set your calendar time zone to Tokyo: 11 PM UTC is 8 AM January 1 in Tokyo. SignalFlow includes the data point in the January 1 cycle.

Setting the time zone has no effect on a SignalFlow program that doesn’t use calendar window transformations.

 Supported SignalFlow time zones

The default SignalFlow time zone is Coordinated Universal Time (UTC). The time zones that SignalFlow accepts are a subset of the IANA Time Zone Database; for a list of accepted zones, see Time Zones.

 Set the SignalFlow time zone

To set the SignalFlow time zone for a program, using the API:

  • For a chart’s SignalFlow program: Set the options.programOptions.timezone property in the request body for the Create Chart or Update Chart operation. To learn more, see the REST API Reference topic Charts API.

  • For a detector’s SignalFlow program: Set the timezone property at the top level of the request body for the Create Detector or Update Detector operation. To learn more, see the REST API Reference topic Detectors API.

  • For a SignalFlow program that you run directly using the operation POST https://stream.<REALM>.signalfx.com/v2/signalflow/execute: Specify timezone=<TIMEZONE_VALUE> as a query parameter for the operation.
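
For example, the following sketch runs a calendar window program with the SignalFlow time zone set to Los Angeles; the program and the time zone value are illustrative.

$ curl \
    --request POST \
    --header "Content-Type: text/plain" \
    --header "X-SF-Token: <SESSION_TOKEN>" \
    --data "data('requests.count').sum(cycle='day').publish()" \
    "https://stream.<REALM>.signalfx.com/v2/signalflow/execute?timezone=America/Los_Angeles"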

 Dashboard window transformations

Some methods have an option to perform a computation on an MTS over a dashboard window, a duration value that represents the time width of the chart. The default dashboard window duration is 15m.

For example, mean(over=Args.get('ui.dashboard_window', '15m')) takes the default time range from the dashboard and represents the time width of the chart.

Note: Unlike mean(over='24h'), this is not a moving average.

SignalFlow provides dashboard window transformations for the following stream methods:

 Transformations and aggregations

Many analytical functions calculate both aggregations and transformations. If you don’t specify the over keyword argument in the function call, SignalFlow assumes that you want an aggregation. For example, mean() is an aggregation and mean(over='1h30m') is a transformation.

 Transformation examples

The following SignalFlow programs show you how to use transformations:

#5-minute moving average of the CPU utilization of all servers
data("cpu.utilization").mean(over="5m").publish();
#15-second moving average of the 95th percentile CPU utilization
#grouped by AWS availability zone
data("cpu.utilization").percentile(95,by="aws_availability_zone").mean(over="15s").publish();
#Sum of CPU utilization over one week, starting on Monday
...

 Other analytical functions

In addition to aggregations and transformations, SignalFlow provides analytical functions that perform other actions on streams. These include timeshift(), which retrieves a data point from a specified time offset in the past, and delta(), which performs actions on the current and previous data point, regardless of the time interval between them.
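
For example, the following sketch uses timeshift() for a week-over-week comparison; the metric name and the '1w' offset are illustrative:

#Current request count, summed across all time series
now = data("requests.count").sum();
#The same stream, where each point is the value from 1 week earlier
week_ago = data("requests.count").sum().timeshift("1w");
#Week-over-week percentage change
((now - week_ago) / week_ago * 100).publish("wow_pct");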

 map() and lambda functions

SignalFlow can perform calculations on the individual values in an incoming stream. To access this functionality, call the map() method or write a lambda function. In most cases, you use these to perform arithmetic or algebraic calculations on all the values in a stream. SignalFlow only supports single-argument lambdas and only allows references to that argument.

#Calculate the maximum mean cpu.user by datacenter over an hour. Use the floor
#function to "round" the mean.
#With two variables
A = data("cpu.user").mean(by="datacenter");
floor(A).max(over="1h").publish();
#Use the map function
...

In addition to calculations, lambda functions allow if-else expressions, using the following syntax:

<TRUE_VALUE> if <TRUE_CONDITION> else <FALSE_VALUE>

By returning the value None, this type of lambda function can filter values out of the output. For example:

#Keep values of x between 50 and 60; all other values are filtered out
data("cpu.utilization").map(lambda x: x if x > 50 and x < 60 else None).publish();

This mechanism differs from the user interface, which provides the scale and exclude functions to perform actions that lambdas handle in a more generic fashion.

 Streams as variables

You can use streams as inputs to mathematical expressions in the same way that you use a variable in a formula. To do this, assign a stream name to a variable name at the beginning of the program. Because the result of a calculation on one or more stream objects is also a stream object, you can perform aggregations or transforms on it as well. To perform further analytics on a stream calculation (or to stream the results), wrap the calculation in parentheses.

#Inline stream calculation
(data("cache.hits") / data("cache.total") * 100).mean(over="1h").publish();
#Using a stream as a variable
cache_hits = data("cache.hits").sum(by="datacenter");
cache_misses = data("cache.misses").sum(by="datacenter");
hit_pct = (cache_hits / (cache_hits + cache_misses) * 100).mean(over="1h");
hit_pct.publish("hit_pct");
...

 SignalFlow API

SignalFlow programs run asynchronously in background jobs. By default, Splunk Observability Cloud immediately starts jobs for SignalFlow programs in charts and detectors.

Splunk Observability Cloud also lets you run SignalFlow programs as background jobs using a REST HTTP endpoint or a WebSocket connection.

Whenever possible, use WebSocket as it offers these advantages:

  • WebSocket uses a single HTTP connection, regardless of the number of programs you run.
  • The WebSocket protocol has lower overhead than REST, because messages are entirely in JSON.
  • WebSocket can achieve substantially lower latency.

HTTP endpoints have the same functionality, and they’re more straightforward to add to your programs, but they require one HTTP connection per job.

Regardless of the request format you use, the SignalFlow API returns results in a message stream that contains the following:

  • Execution information
  • Metadata for the time series being generated
  • Generated data and events

Note: Due to the use of scientific notation in the API, the output of SignalFlow API calls doesn't exactly match the output of SignalFlow programs in charts in the UI.

The format of this message stream depends on the request API you use:

  • REST API: SignalFlow returns Server-Sent Event (SSE) messages.
  • WebSocket API: SignalFlow returns WebSocket JSON messages.

The following topics provide reference information for the APIs:

 Using the SignalFlow API

 Connecting (WebSocket API only)

If you’re using the WebSocket API, establish a WebSocket connection with Splunk Observability Cloud before you authenticate. To do this, use the REST API operation GET https://stream.<REALM>.signalfx.com/v2/signalflow/connect. After you have the connection, you send an authentication token to Splunk Observability Cloud in an authentication WebSocket message.

 Authenticate SignalFlow API requests

To use the SignalFlow API, you need a session token.

If you’re using the REST API, send the token in the header for the operation POST https://stream.<REALM>.signalfx.com/v2/signalflow/execute that starts your computation job.

If you’re using the WebSocket API, send an authenticate message. You must do this within 5 seconds of connecting to Splunk Observability Cloud; otherwise your connection is dropped.
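
For example, the following sketch shows the general shape of an authenticate message, assuming the JSON message format of the SignalFlow WebSocket protocol:

{
    "type": "authenticate",
    "token": "<SESSION_TOKEN>"
}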

To learn more about authentication and tokens, see the topic Authentication Tokens.

 Execute background computations

To run a SignalFlow program in the background:

  • REST API: Use the operation Start SignalFlow Computation. The API returns results asynchronously using Server-Sent Event (SSE) messages.

  • WebSocket API: Send an execute message. The API returns results asynchronously using WebSocket messages.
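
For example, the following sketch shows the general shape of an execute message. The channel name is an arbitrary identifier for the computation's output, and start and stop are optional millisecond timestamps:

{
    "type": "execute",
    "channel": "channel-1",
    "program": "data('cpu.utilization').mean().publish()",
    "start": 1557446400000,
    "stop": 1557450000000
}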

 Detach from background jobs

To stop receiving output from a computation, detach from the background job:

  • REST API: Close the connection.
  • WebSocket API: Send a detach message.
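
For example, a sketch of a detach message for the channel named in the earlier execute sketch:

{
    "type": "detach",
    "channel": "channel-1"
}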

Note: Detaching from the WebSocket connection automatically closes all the channels that use that connection.

While the WebSocket connection remains open, Splunk Observability Cloud keeps your computation alive. If you disconnect using an end_of_channel message, Splunk Observability Cloud stops the computation. In addition, if Splunk Observability Cloud detects an error associated with the computation or connection, it sends you an abort_channel message and closes the computation.

 Client library support

To simplify the use of the APIs, the following GitHub client libraries are available:

The following older SignalFlow client libraries were migrated to the new ones and are now deprecated.

Note: The SignalFlow Ruby client library is no longer supported.