How to work with data models and pivots in the Splunk SDK for JavaScript

Starting in Splunk® Enterprise 6, you can use data models to create specialized searches of your datasets. Data models allow you to produce pivot tables, charts, and visualizations on the fly, based on column and row configurations that you select, and without necessarily having to know the Splunk Enterprise search language.

The Splunk SDK for JavaScript includes support for data models and pivots. Using the SDK, you can enable your JavaScript application to read and use data models that have been created on your Splunk Enterprise instance. You can also use the SDK to create data models, but the easiest way to create new data models continues to be logging directly into Splunk Enterprise with a browser.

This topic contains the following sections:

  • About data models
  • Create a data model
  • Retrieve a data model
  • See the contents of a data model
  • Retrieve a data model object
  • Work with data model objects
  • Work with pivots
  • Accelerate data models and pivots
  • Data model example
  • Reference

About data models

Data models map semantic knowledge about one or more datasets. The data model encodes the domain knowledge that is necessary to generate specialized searches of those datasets. Data models are what enable you to use pivots to produce useful reports and dashboards without having to write the searches that generate them. Data models contain data model objects, which are essentially specifications for a dataset. Each data model object represents a different dataset within the larger set of data that Splunk Enterprise indexes.

Data model objects inherit from other data model objects. Each object inherits either from one of three base objects built into Splunk Enterprise (BaseEvent, BaseTransaction, and BaseSearch) or from another object in the same data model. Allowing inheritance only from this fixed set of base objects or from other objects within the same data model prevents complications involving access control lists (ACLs).

To learn more about data models, see the data model documentation in the Splunk Enterprise Knowledge Manager Manual.

Create a data model

To create an individual data model, you first get the collection of all data models that are accessible to a user with your credentials. After you've connected to Splunk Enterprise, use the dataModels() method of your Service object to retrieve a DataModels collection (which extends the Collection class) that contains the set of data models. Then, use the create() method of the DataModels collection to create a data model (a DataModel object), passing the following three parameters:

  • name - The name of the data model to create
  • params - A dictionary of parameters that define the data model
  • callback - A callback function to call on error or once the data model has been created on the Splunk Enterprise server

Following is an example that creates a data model called "MyDataModel":

var dataModelsCollection = service.dataModels(); 
dataModelsCollection.create("MyDataModel", {some: "parameters"}, function (err, dataModel) {  
    ...
});

Retrieve a data model

To retrieve an individual data model, you first get the collection of all data models that are accessible to a user with your credentials. After you've connected to Splunk Enterprise, use the dataModels() method of your Service object to retrieve a DataModels collection containing the set of data models. Then, use the fetch() method to load the collection from the server. Next, use the item() method of the DataModels collection to retrieve the data model with the specified name.

Following is an example that retrieves the data model called "internal_audit_logs," which is part of every standard Splunk Enterprise install.

service.dataModels().fetch(function(err, dataModels) {
    var dataModel = dataModels.item("internal_audit_logs");
    ...
});

See the contents of a data model

A data model consists of several metadata fields (its internal name, its human-readable name, and a description), a value that indicates whether acceleration has been enabled, and a collection of data model objects. (For more information about accelerating data models, see "Accelerate data models and pivots.")

Retrieve the data model's internal name using the DataModel class' name property. Retrieve its display name using the displayName property. Retrieve its description using the description property. For example:

console.log("Data model named " + dataModel.displayName + " (internal name: " + dataModel.name + ")");
console.log("Description: ");
console.log(dataModel.description);

Retrieve a data model object

The objects in the data model are structured views on a Splunk Enterprise index, and are represented by the DataModelObject class. You can check for and retrieve individual data model objects by name from a data model by using the hasObject() and objectByName() methods of the DataModel object.

Following is an example of the objectByName() method in use:

var searches = dataModel.objectByName("searches");
console.log("Object: " + searches.displayName + " (internal name: " + searches.name + ")");

Work with data model objects

Data model objects are hierarchical; they are arranged in parent-child relationships. The top-level object in any object hierarchy is referred to as a root object. Any object that descends from a root object is a child object.

Child objects inherit calculations and fields from their parent objects. Calculations narrow down the set of data represented by the object, while fields are name/value pairs associated with the object dataset. Each child object can add calculations to the ones it inherits. Fields are used by Pivot designers to define pivot tables and charts. Child objects can optionally have new fields in addition to the fields they inherit from their parent object.

A data model object is uniquely identified by the full list of ancestors from which it inherits. You can get the lineage of an object with the DataModelObject class' lineage property, as demonstrated here, using the join() method to join together ancestors with a "->" symbol:

console.log("\t Lineage:", searches.lineage.join(" -> "));

Fields are represented in the Splunk SDK for JavaScript by instances of the DataModelField class. You can retrieve a field by name with the fieldByName() method of the DataModelObject class, or retrieve an array of the names of all fields defined on the object with the fieldNames() method. Each field contains a type (DataModelField.type) with one of the following values:

  • string
  • number
  • timestamp
  • IPv4 address
  • Boolean value
  • one of two internal types: object count and child count

You can check whether a field is of any of the preceding types by examining its type property.

You can also retrieve certain attributes of a data model field using the following properties of your DataModelField object:

  • fieldName (string): The internal name of this field.
  • displayName (string): A human-readable name for this field.
  • type (string): The type of this field.
  • multivalue (Boolean): Whether this field is multivalued.
  • required (Boolean): Whether this field is required on events in the object.
  • hidden (Boolean): Whether this field is hidden in the data model UI.
  • editable (Boolean): Whether this field can be edited.
  • comment (string): A comment for this field, or null if there is none.
  • fieldSearch (string): A search query fragment for this field.
  • lineage (array of strings): The lineage of the data model object on which this field is defined. The server delivers this as a dot-delimited string, which is converted into an array of strings upon construction.
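
For example, here is a minimal sketch that inspects some of these properties on a field of the searches object retrieved earlier, assuming (as in the pivot examples later in this topic) that the object defines a field named "user":

var userField = searches.fieldByName("user");
console.log("Field:", userField.displayName, "(internal name: " + userField.fieldName + ")");
console.log("\t Type:", userField.type);
console.log("\t Required:", userField.required, "Multivalue:", userField.multivalue);
console.log("\t Defined on:", userField.lineage.join(" -> "));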

A group of people who are working with sets of similar Splunk Enterprise queries can define a data model object that encapsulates the shared prefix of their various queries, and then use the data model's search command to execute it followed by the rest of their particular query. You can think of a data model object as a kind of stored procedure.

This is a common enough scenario that the data model object provides the startSearch() method (on the DataModelObject class) for it:

searches.startSearch({}, "| stats count by user | sort -count | head 5", done);

Note: Any time you see done as a parameter in these examples, it represents a callback function.

Work with pivots

The Splunk SDK for JavaScript provides a pivot table interface to the events in data model objects. You create pivots from data model objects. The SDK gives you the same control over pivots that the pivot tool does within Splunk Enterprise. A pivot is created in several stages.

Note: For a full working example of pivots using the Splunk SDK for JavaScript, see the pivot_async.js example in the examples/node/helloworld/ directory of the SDK.

First, call the DataModelObject object's createPivotSpecification() method to create a new PivotSpecification instance. Next, configure it by setting the fields to use to split rows, split columns, or calculate aggregates for each cell in the table.

Then, call the pivot() method on the new PivotSpecification object. That sends a request to the Splunk Enterprise server to get a set of SPL queries that represent the pivot, which you can then use however you want. For example, using the searches data model object we retrieved in "Retrieve a data model object":

// Create a specification for a pivot on the searches
// data model object we retrieved previously
var pivotSpecification = searches.createPivotSpecification();

// Configure the pivot specification, generate the pivot,
// and run the pivot's underlying search job
pivotSpecification
    .addRowSplit("user", "Executing user")
    .addRangeColumnSplit("exec_time", {limit: 4})
    .addCellValue("search", "Search Query", "values")
    .pivot(function(pivotErr, pivot) {
        console.log("Pivot search is:", pivot.search);
        pivot.run({}, done);
    });

Configuring the pivot consists of adding four kinds of entities to it:

  • Filters restrict the events to be calculated on in the pivot.
  • Cell values describe an aggregate calculation to be done.
  • Row splits describe how to split the data along one axis before aggregating it in the cell values.
  • Column splits describe how to split the data along the other axis.

The entities are added to a PivotSpecification object by calling the following methods:

  • addFilter() and addLimitFilter() for filters
  • addCellValue() for cell values
  • addRowSplit(), addBooleanRowSplit(), addRangeRowSplit(), and addTimestampRowSplit() for row splits
  • addColumnSplit(), addBooleanColumnSplit(), addRangeColumnSplit(), and addTimestampColumnSplit() for column splits

The arguments to each method depend on the type of field to be added. Each of these methods is examined in detail in the following sections.

Filters

Filters restrict the events that will be processed by the pivot. They are added by invoking either the addFilter() or addLimitFilter() method.

The addFilter() method adds a Boolean filter on a data model field. The filter is in the form of a constraint. The addFilter() method takes four parameters, in the following order:

  • fieldName: The name of the data model field on which to filter.
  • comparisonType: The type of the field to compare ("string", "number", "timestamp", "objectCount", "childCount", "ipv4", or "boolean").
  • comparisonOp: The comparison operator, such as "=". The comparison operators you can use depend on the comparisonType. See "Comparison operators" in the Reference section for lists of available operators.
  • compareTo: The value to which to compare the field.

For instance, pivotSpecification.addFilter("field1", "ipv4", "is", false) would add a filter consisting of the constraint ipv4 is false for the field field1.
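
As another sketch, a string filter on the search field (using the field names that appear elsewhere in this topic; the comparison value is arbitrary) might look like this:

pivotSpecification.addFilter("search", "string", "contains", "index=_audit");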

The addLimitFilter() method adds a limit on the events shown in a pivot by sorting them according to a specified field, and then taking the specified number of events from the beginning or end of the list. The addLimitFilter() method takes five parameters, in the following order:

  • fieldName: The name of the data model field on which to filter. (Only a field with a type of "string", "number", or "objectCount" can have a limit filter added to it.)
  • sortAttribute: The name of the field to use for sorting.
  • sortDirection: The direction to sort events ("ASCENDING", "DESCENDING", or "DEFAULT").
  • limit: The number of values from the sorted list to allow through this filter.
  • statsFunction: The stats function to use for aggregation before sorting ("count", "dc" (for DISTINCT_COUNT), "average" or "sum"; not all stats functions work with all field types).

For example, the following filter limits the pivot to events from the top 50 values of host, ranked by the number of distinct users. The filter counts the distinct users that have produced searches from each host, sorts the hosts from the largest number of distinct users to the smallest, and admits only events from the top 50 hosts into the pivot.

pivotSpecification.addLimitFilter("host", "user", "DESCENDING", 50, "dc");

Cell values

The cells of a pivot table consist of aggregate calculations done on the events that pass through the filters, and which are assigned to that cell by row and column splits. There is only one method for adding cell values (addCellValue()), but not all the aggregating functions defined in the SDK work with all cell values. The functions that can be used with each type are:

  • string: "list", "values" (DISTINCT_VALUES), "first", "last", "count", "dc" (DISTINCT_COUNT)
  • IPv4: (same as string)
  • number: "sum", "count", "average", "max", "min", "stdev", "list", "values" (DISTINCT_VALUES)
  • timestamp: "duration", "earliest", "latest", "list", "values" (DISTINCT_VALUES)
  • childcount or objectcount: "count"
  • Boolean: none. You cannot use Boolean valued fields as cell values.

The addCellValue() method takes the following three parameters:

  • fieldName: The name of the field to aggregate.
  • label: A human-readable name for this aggregate.
  • statsFunction: The function to use for aggregation. See Stats functions for a list of valid functions.

Here are two examples of adding cell values to the PivotSpecification object pivotSpecification, which we created in "Work with pivots":

pivotSpecification
    .addCellValue("host", "Relevant hosts", "values")
    .addCellValue("exec_time", "Longest running job", "max")
    ...;

Row splits

Row splits divide the data in a pivot table into rows before aggregates are calculated for each cell. A PivotSpecification object with more than one row split will result in separate rows for each combination of distinct splits for each row split. So if we have one row split that produces two rows, abcd=0 and abcd=1, and another that produces two, wxyz=a and wxyz=b, then there would be four rows in the split, (abcd=0, wxyz=a), (abcd=1, wxyz=a), (abcd=0, wxyz=b), and (abcd=1, wxyz=b).

Row splits are added to PivotSpecification objects with the following methods:

  • addRowSplit(): Adds a row split on a numeric or string valued field, splitting on each distinct value of the field.
  • addBooleanRowSplit(): Adds a row split on a Boolean valued field.
  • addRangeRowSplit(): Adds a row split on a numeric field, splitting into numeric ranges.
  • addTimestampRowSplit(): Adds a row split on a timestamp valued field, binned by the specified bucket size.

To split the row for each distinct value of a field, for fields of type string or number, use the addRowSplit(fieldName, label) method, where fieldName is the name of the field to split and label is a human-readable label to display in a visual representation.

To add row splits on Boolean-valued fields, use the addBooleanRowSplit(fieldName, label, trueDisplayValue, falseDisplayValue) method, which takes the same fieldName and label arguments, but which also requires two arguments (trueDisplayValue and falseDisplayValue) for the labels to display in each row if the field value is true or false.
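
For illustration, here is a sketch that adds both kinds of row splits to the pivotSpecification object from earlier; the user field matches the earlier examples, while the Boolean field name is hypothetical and would need to exist on your own data model object:

pivotSpecification
    .addRowSplit("user", "Executing user")
    // "is_realtime" is a hypothetical Boolean field, shown only for illustration
    .addBooleanRowSplit("is_realtime", "Search mode", "Real-time", "Historical");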

To add row splits on number-valued fields, use the addRangeRowSplit(fieldName, label, options) method, which splits number-valued fields into ranges similar to the way in which timestamp-valued fields are split. In addition to the fieldName and label arguments, this method also takes an optional dictionary of range options (all integers):

  • start: The value of the start of the first range, or null to take the lowest value in the events.
  • end: The value for the end of the last range, or null to take the highest value in the events.
  • step: The width of each range, or null to indicate that Splunk Enterprise will calculate it.
  • limit: The maximum number of ranges to split into, or null for no limit.

For example, if you want to bin only values between 0 and 12, in ranges that are 5 wide, with no more than four ranges, you would call:

.addRangeRowSplit("exec_time", "myFieldLabel", {start:0, end:12, step:5, limit:4});

If you want bins that are 15 wide, starting at 12, you would call:

.addRangeRowSplit("exec_time", "myFieldLabel", {start:12, end:null, step:15, limit:null});

Finally, to add row splits on timestamp-valued fields, use the addTimestampRowSplit(fieldName, label, binning) method, which splits the values of the field named fieldName (displayed with the label label) into ranges with a precision specified by binning. Valid values for the binning parameter are "auto", "year", "month", "day", "hour", "minute", and "second".
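
For example, a sketch that bins a timestamp-valued field by day (assuming here that the object exposes the standard _time field):

pivotSpecification
    .addTimestampRowSplit("_time", "Event time", "day");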

Column splits

Column splits are the complement to row splits. They divide events that pass through the filters into sets before aggregates are calculated for each cell. The methods for column splits are identical to those for row splits, but they lack a label argument:

  • addColumnSplit(): Adds a column split on a numeric or string valued field, splitting on each distinct value of the field.
  • addBooleanColumnSplit(): Adds a column split on a Boolean valued field.
  • addRangeColumnSplit(): Adds a column split on a numeric field, splitting into numeric ranges.
  • addTimestampColumnSplit(): Adds a column split on a timestamp valued field, binned by the specified bucket size.
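
For example, a sketch of column splits on the pivotSpecification object from earlier, using the field names from the earlier examples:

pivotSpecification
    .addColumnSplit("user")
    .addRangeColumnSplit("exec_time", {start: 0, end: 12, step: 5, limit: 4});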

Accelerate data models and pivots

Data models take advantage of the built-in support for accelerated searches and aggregations in Splunk Enterprise 6. Within Splunk Enterprise, acceleration entails running a search job on a regular schedule and caching its results for use in the data model and any pivots on the data model. Acceleration is enabled with the Splunk SDK for JavaScript by setting the acceleration property to true when creating a DataModel object. Be aware that only public data models can be accelerated.

To check whether a data model has acceleration enabled, use the DataModel object's isAccelerated() method.

To update a data model's acceleration settings, use the DataModel object's update() method. The update() method takes the props parameter, which is a dictionary that specifies acceleration properties. The props object can contain any of the following keys (any keys not set will remain unchanged on the data model):

  • enabled: Whether acceleration should be enabled after update.
  • earliestTime: A time modifier (such as "-2mon") indicating the earliest time, relative to now, for which the acceleration cache should be maintained (for example, the last week, the last two months, or the last three hours).
  • cronSchedule: The cron schedule on which the acceleration job should be run. For example, "0 0 * * *".
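
As a rough sketch based on the property names listed above (the exact props shape and callback signature shown here are assumptions and may vary by SDK version), an update call might look like this:

dataModel.update({
    enabled: true,               // turn acceleration on
    earliestTime: "-2mon",       // keep the last two months in the acceleration cache
    cronSchedule: "0 */6 * * *"  // rebuild the cache every six hours
}, function(err, updatedDataModel) {
    if (err) {
        console.log("Error updating acceleration settings:", err);
        return;
    }
    console.log("Acceleration enabled:", updatedDataModel.isAccelerated());
});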

Enabling acceleration on a data model will accelerate all objects that inherit from BaseEvent in the data model. Objects that inherit from BaseTransaction or BaseSearch cannot be accelerated, and will be unaffected by enabling acceleration.

You can also do ad hoc acceleration, running and managing the caching job yourself. Call the createLocalAccelerationJob() method on a particular data model object to return an acceleration job, as demonstrated here. Be aware that you must cancel the ad hoc job when you are finished querying the results.

var service = ...; // Get a logged in Splunk service with the Splunk SDK for JavaScript

service.dataModels().fetch(function(err, dataModels) {
    var searches = dataModels.item("internal_audit_logs").objectByName("searches");
    var pivotSpec = searches.createPivotSpecification();

    searches.createLocalAccelerationJob("-1d", function(err, accelerationJob) {
        pivotSpec.addRowSplit("user", "Executing user")
            .setAccelerationJob(accelerationJob) // Or pass in the job's SID with accelerationJob.sid
            .addRangeColumnSplit("exec_time", {start: 0, end: 12, step:5, limit:4})
            .addCellValue("search", "Search Query", "values")
            .pivot(function(pivotErr, pivot) {
                console.log("Pivot search is:", pivot.search);
                pivot.run({}, function(err, job) {
                    console.log("We have a job with SID:", job.sid);
                    // Cancel the local acceleration job when done with it
                });
            });
    });
});

Data model example

The following code demonstrates how to perform a few basic actions with one of a data model's objects. The example:

  1. Connects to Splunk Enterprise. (The full details of this step are not shown, but they are provided in "How to connect to Splunk Enterprise.")
  2. Retrieves a data model (in this case, "internal_audit_logs," which is included with each standard Splunk Enterprise install).
  3. Retrieves the "searches" data model object from the query.
  4. Runs a query on the data model object, appending "| head 5" to the query to return just the first five events.
  5. Tracks the job for completion.
  6. Reads the results of the query, and then prints the results to the console.
  7. Creates a pivot specification by using the data model object as input.
  8. Splits the pivot's events into groups with a distinct user and no more than four execution time ranges, specifying a list of distinct search queries for each cell.
  9. Prints the human-readable Splunk Processing Language (SPL) query that implements the pivot.
  10. Runs the pivot query, and retrieves the pivot's search queries.
  11. Reads the results of the pivot query, and then prints the results to the console.

Note: For a full working example of using data models and pivots with the Splunk SDK for JavaScript, see the pivot_async.js example in the examples/node/helloworld/ directory of the SDK.

var service = new splunkjs.Service({
    username: username,
    password: password,
    scheme: scheme,
    host: host,
    port: port,
    version: version
});
 
var searches; // We'll use this later
 
Async.chain([
        // First, we log in.
        function(done) {
            service.login(done);
        },
        
        function(success, done) {
            if (!success) {
                return done("Error logging in");
            }
 
            // Now that we're logged in, let's get the data models collection
            service.dataModels().fetch(done);
        },
        function(dataModels, done) {
            // ...and the specific data model we're concerned with
            var dm = dataModels.item("internal_audit_logs");
            // Get the "searches" object out of the "internal_audit_logs" data model
            searches = dm.objectByName("searches");
 
            console.log("Working with object", searches.displayName,
                "in model", dm.displayName);
 
            console.log("\t Lineage:", searches.lineage.join(" -> "));
            console.log("\t Internal name: " + searches.name);
 
            // Run a data model search query, getting the first 5 results
            searches.startSearch({}, "| head 5", done);
        },
        function(job, done) {
            job.track({}, function(job) {
                job.results({}, done);
            });
        },
        function(results, job, done) {
            // Print out the results
            console.log("Results:");
            for (var i = 0; i < results.rows.length; i++) {
                var rowString = " result " + i + ":  ";
                var row = results.rows[i];
                for (var j = 0; j < results.fields.length; j++) {
                    if (row[j] !== null && row[j] !== undefined) {
                        rowString += results.fields[j] + "=" + row[j] + ", ";
                    }
                }
                console.log(rowString);
                console.log("------------------------------");
            }
            
            var pivotSpecification = searches.createPivotSpecification();
            // Each function call here returns a pivotSpecification so we can chain them
            pivotSpecification
                .addRowSplit("user", "Executing user")
                .addRangeColumnSplit("exec_time", {limit: 4})
                .addCellValue("search", "Search Query", "values")
                .run(done);
        },
        function(job, pivot, done) {
            console.log("Query for binning search queries by execution time and executing user:");
            console.log("\t", pivot.prettyQuery);
            job.track({}, function(job) {
                job.results({}, done);
            });
        },
        function(results, job, done) {
            // Print out the results
            console.log("Results:");
            for (var i = 0; i < results.rows.length; i++) {
                var rowString = " result " + i + ":  ";
                var row = results.rows[i];
                for (var j = 0; j < results.fields.length; j++) {
                    if (row[j] !== null && row[j] !== undefined) {
                        rowString += results.fields[j] + "=" + row[j] + ", ";
                    }
                }
                console.log(rowString);
                console.log("------------------------------");
            }
            job.cancel(done);
        }
    ],
    function(err) {
        if (err) {
            console.log("ERROR", err);
        }
        callback(err);
    }
);

Reference

Comparison operators

This section details the comparison operators that are available when you are adding filters to pivots. For more information, see "Filters" in "Work with pivots".

Following are valid comparison operators for Boolean types:

  • "="
  • "is"
  • "isNull"
  • "isNotNull"

Following are valid comparison operators for string types:

  • "="
  • "is"
  • "isNull"
  • "isNotNull"
  • "contains"
  • "doesNotContain"
  • "startsWith"
  • "endsWith"
  • "regex"

Following are valid comparison operators for number types:

  • "="
  • "!="
  • "<"
  • ">"
  • "<="
  • ">="
  • "is"
  • "isNull"
  • "isNotNull"

Following are valid comparison operators for IPv4 types:

  • "is"
  • "isNull"
  • "isNotNull"
  • "contains"
  • "doesNotContain"
  • "startsWith"

Stats functions

This section details the stats functions that are available when you are configuring pivots. For more information, see "Cell values" in "Work with pivots".

  • "list"
  • "values"
  • "first"
  • "last"
  • "count"
  • "dc"
  • "sum"
  • "average"
  • "max"
  • "min"
  • "stdev"
  • "duration"
  • "earliest"
  • "latest"