How to get data into Splunk Enterprise using the Splunk SDK for JavaScript

Getting data into Splunk Enterprise involves taking data from inputs, and then indexing that data by transforming it into individual events that contain searchable fields. Here's a brief overview of how it all works.

Data inputs

A data input is a source of incoming event data. Splunk can index data from the following types of inputs:

  • Files and directories—the contents of files and directories of files. You can upload a file for one-time indexing (a oneshot input), monitor for new data, or monitor for file system changes (events are generated when the directory undergoes a change). Files and directories can be included using whitelists, and excluded using blacklists.
  • Network events—data that is received over network Transmission Control Protocol (TCP) and User Datagram Protocol (UDP) ports, such as data that is sent from a Splunk forwarder from a remote computer. TCP inputs are separated into raw (unprocessed) and cooked (processed) inputs, with SSL as an option for either type.
  • Windows data—data from Windows computers, which includes:
    • Windows event log data
    • Windows Registry data
    • Windows Management Instrumentation (WMI) data
    • Active Directory data
    • Performance monitoring (perfmon) data
  • Other data sources—data from custom apps, FIFO queues, scripts that get data from APIs, and other remote data interfaces and message queues.

Your data inputs and their configurations are saved in the inputs.conf configuration file.

Note: At this time, you can't use the Splunk SDK for JavaScript to create or modify data inputs. However, you can send events directly to an index.

Indexes

The index stores compressed, raw event data. When receiving data from your inputs, Splunk parses the data into events and then indexes them, as follows:

  • During parsing, Splunk extracts default fields, configures character-set encoding, identifies line termination, identifies timestamps (creating them if they aren't there), masks sensitive or private data, and can apply custom metadata. Parsing can be done by heavy forwarders. Universal forwarders do minimal parsing.
  • During indexing, Splunk breaks events into segments, builds the index data structures, and writes the raw data and index files to disk.

Splunk can usually determine the data type and handle the data accordingly. But when setting up new inputs, you might consider sending data to a test index first to make sure everything is configured the way you want. You can delete the indexed data (clean the index) and start over as needed. Event processing rules are set in the props.conf configuration file, which you'll need to modify directly if you want to reconfigure how events are processed.

Each index is stored as a collection of database directories (also known as buckets) in the file system, located in $SPLUNK_HOME/var/lib/splunk. Buckets are organized by age:

  • Hot buckets are searchable and are actively being written to. An index can have several hot buckets open at once (up to "maxHotBuckets"). A hot bucket rolls to warm when it reaches a certain size or when splunkd is restarted, and a new hot bucket is created.
  • Warm buckets are searchable, but are no longer written to. When the number of warm buckets exceeds a limit ("maxWarmDBCount"), the oldest warm buckets roll to cold.
  • Cold buckets are searchable. After a set period of time, cold buckets roll to frozen.
  • Frozen buckets are not searchable; they are either archived or deleted.

You can configure aspects such as the path configuration for your buckets. For example, keep the hot and warm buckets on a local computer for quick access, and put the cold and frozen buckets on separate disks for long-term storage. You can also set the storage size.
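
For example, bucket paths and a storage cap might be set in an indexes.conf stanza like the following sketch (the index name, paths, and size are illustrative, not defaults):

```ini
# Hypothetical indexes.conf stanza: hot/warm buckets on local disk,
# cold buckets on a separate volume, with a total size cap
[my_index]
homePath   = $SPLUNK_HOME/var/lib/splunk/my_index/db
coldPath   = /mnt/cold_storage/my_index/colddb
thawedPath = $SPLUNK_HOME/var/lib/splunk/my_index/thaweddb
maxTotalDataSizeMB = 500000
```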

By default, data is stored in the main index, but you can add more indexes for different data inputs. You might want multiple indexes to:

  • Control user access. Users can search only the indexes that their assigned roles allow.
  • Accommodate varying retention policies. Set a different archive or retention policy for each index.
  • Speed up searches in certain situations. Create dedicated indexes for each data source, and search only the index you want.
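
As an illustration of the last point, a search can be scoped to a single index in the search string. Here is a sketch using the SDK's one-shot search, assuming a connected service object and an index named test_index:

```javascript
// Run a one-shot search scoped to a single index
service.oneshotSearch(
    "search index=test_index | head 5",
    {},
    function(err, results) {
        if (err) {
            throw err;
        }
        console.log(results);
    });
```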

The index APIs

The classes for working with indexes are:

  • splunkjs.Service.Indexes—the collection of indexes.
  • splunkjs.Service.Index—an individual index.

Access these classes through an instance of the splunkjs.Service class. Retrieve the collection, and from there you can access individual indexes in the collection and create new ones.

Code examples

This section provides examples of how to use the index APIs. All of the examples assume that you have first connected to a Splunk instance.
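
For example, a connection might be established as follows (a minimal sketch; the host and credentials are placeholders for your own values):

```javascript
// Load the SDK and create a Service instance (credentials are placeholders)
var splunkjs = require("splunk-sdk");

var service = new splunkjs.Service({
    username: "admin",
    password: "changeme",
    scheme: "https",
    host: "localhost",
    port: "8089"
});

// Log in before making any other API calls
service.login(function(err, success) {
    if (err) {
        throw err;
    }
    console.log("Login was successful: " + success);
    // ...work with indexes here...
});
```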

The parameters available for working with collections and indexes are listed in Collection parameters and Index parameters at the end of this section.

To list indexes

This example shows how to retrieve and list the indexes that have been configured for Splunk, along with the number of events contained in each. For a list of available parameters to use when retrieving a collection, see Collection parameters.

// Get the collection of indexes
var myindexes = service.indexes();

// Iterate through the indexes and list them
myindexes.fetch(function(err, myindexes) {
  var indexcoll = myindexes.list();

  console.log("There are " + indexcoll.length + " indexes");

  for(var i = 0; i < indexcoll.length; i++) {
    console.log(i + ": " + indexcoll[i].name);
  }
});

To create a new index

When you create an index, all you need to specify is a name. You can also specify additional properties for the index at the same time by providing a dictionary of key-value pairs (the possible properties are summarized in Index parameters). Or, modify properties after you have created the index.

Note: To be able to create and modify an index, the user's role must include those capabilities. For a list of available capabilities, see Capabilities.

This example shows how to create a new index.

Note: If you are using a version of Splunk earlier than 5.0, you can't delete indexes using the SDK or the REST API—something to be aware of before creating lots of test indexes.

// Get the collection of indexes
var myindexes = service.indexes();

// Create the index
myindexes.create("test_index", {}, function(err, newIndex) {
  console.log("The index was created");
});
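
Properties can also be supplied at creation time. For example (a sketch; the property values are illustrative):

```javascript
// Create an index, setting properties at the same time
myindexes.create("test_index", {
    maxTotalDataSizeMB: 1000,          // cap the index at 1 GB
    frozenTimePeriodInSecs: 7776000    // roll data to frozen after 90 days
}, function(err, newIndex) {
    if (err) {
        throw err;
    }
    console.log("Created index: " + newIndex.name);
});
```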

To add data directly to an index

You can send events directly to an index without configuring a data input. First, retrieve an index using the splunkjs.Service.Index class, and then use either of the following methods (they accomplish the same thing) to send an event over HTTP:

  • splunkjs.Service.log
  • splunkjs.Service.Index.submitEvent

You'll need to provide the event as a string, and you can also specify values to apply to the event (host, source, and sourcetype).

Here is an example of submitting an event over HTTP using Index.submitEvent:

// Get the collection of indexes
var myindexes = service.indexes();

// Get an index to send events to
myindexes.fetch(function(err, myindexes) {
  var myindex = myindexes.item("test_index");

  // Submit an event to the index
  myindex.submitEvent("A new event", {
    sourcetype: "mysourcetype"
  }, function(err, result, myindex) {
    console.log("Submitted event: ", result);
  });
});
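
The same event could alternatively be submitted with splunkjs.Service.log, which takes the target index as a property of the options dictionary (a sketch, assuming a connected service object):

```javascript
// Submit an event over HTTP with Service.log, naming the index in the options
service.log("A new event", {
    index: "test_index",
    sourcetype: "mysourcetype"
}, function(err, result) {
    if (err) {
        throw err;
    }
    console.log("Submitted event: ", result);
});
```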

To view and modify the properties of an index

This example shows how to view and modify the properties of the index created in the previous example.

To access properties of an index, use the properties method of the index object along with the property's name (see Index parameters for a list of all the possible properties for an index).

To set properties, pass property key-value pairs to the entity's update method to make the changes on the server. Next, call the entity's fetch method to update your local, cached copy of the object with these changes.

Note: To be able to create and modify an index, the user's role must include those capabilities. For a list of available capabilities, see Capabilities.

// Get the collection of indexes
var myindexes = service.indexes();

// Get the index that was just created
myindexes.fetch(function(err, myindexes) {
  var myindex = myindexes.item("test_index");

  // Display some properties
  console.log("Name:                " + myindex.name);
  console.log("Current DB size:     " + myindex.properties().currentDBSizeMB + "MB");
  console.log("Max hot buckets:     " + myindex.properties().maxHotBuckets);
  console.log("# of hot buckets:    " + myindex.properties().numHotBuckets);
  console.log("# of warm buckets:   " + myindex.properties().numWarmBuckets);
  console.log("Max data size:       " + myindex.properties().maxDataSize);
  console.log("Max total data size: " + myindex.properties().maxTotalDataSizeMB + "MB");

  // Modify some properties, then refresh the local copy
  myindex.update({
    maxTotalDataSizeMB: 1000
  }, function(err, myindex) {
    console.log("\n...properties were modified...");

    // Update the local, cached copy of the object with the changes
    myindex.fetch(function(err, myindex) {
      console.log("\nUpdated properties:");
      console.log("Max total data size: " + myindex.properties().maxTotalDataSizeMB + "MB");
    });
  });
});

Collection parameters

By default, all entities are returned when you retrieve a collection. But by using the parameters below, you can also specify the number of entities to return and how to sort them. These parameters are available whenever you retrieve a collection:

  • count—A number that indicates the maximum number of entities to return.
  • offset—A number that specifies the index of the first entity to return.
  • search—A string that specifies a search expression to filter the response with, matching field values against the search expression. For example, "search=foo" matches any object that has "foo" as a substring in a field, and "search=field_name%3Dfield_value" restricts the match to a single field.
  • sort_dir—An enum value that specifies how to sort entities. Valid values are "asc" (ascending order) and "desc" (descending order).
  • sort_key—A string that specifies the field to sort by.
  • sort_mode—An enum value that specifies how to sort entities. Valid values are "auto", "alpha" (alphabetically), "alpha_case" (alphabetically, case sensitive), or "num" (numerically).
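
These parameters can be passed in a dictionary to a collection's fetch method. For example (a sketch, assuming a connected service object):

```javascript
// Retrieve at most five indexes, sorted by name in descending order
var myindexes = service.indexes();

myindexes.fetch({
    count: 5,
    sort_key: "name",
    sort_dir: "desc"
}, function(err, myindexes) {
    var indexcoll = myindexes.list();
    for (var i = 0; i < indexcoll.length; i++) {
        console.log(i + ": " + indexcoll[i].name);
    }
});
```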

Index parameters

The parameters you can use for working with indexes correspond to the parameters for the data/indexes endpoint in the REST API.

The following parameters are available for indexes:

  • assureUTF8—A Boolean that indicates whether all data retrieved from the index is in proper UTF8 encoding. When true, indexing performance is reduced. This setting is global, not per index.
  • blockSignatureDatabase—A string that specifies the name of the index that stores block signatures of events. This setting is global, not per index.
  • blockSignSize—A number that indicates how many events make up a block for block signatures. A value of 0 means block signing is disabled for this index.
  • bloomfilterTotalSizeKB—A number that indicates the total size of all bloom filter files, in KB.
  • bucketRebuildMemoryHint—A string that contains a suggestion for the Splunk bucket rebuild process for the size of the time-series (tsidx) file to make.
  • coldPath—A string that contains the file path to the cold databases for the index.
  • coldPath_expanded—A string that contains the absolute file path to the cold databases for the index.
  • coldToFrozenDir—A string that contains the destination path for the frozen archive. Use as an alternative to the "coldToFrozenScript" parameter. The "coldToFrozenDir" parameter takes precedence over "coldToFrozenScript" if both are specified.
  • coldToFrozenScript—A string that contains the path to the archiving script. If your script requires a program to run it (for example, python), specify the program followed by the path. The script must be in $SPLUNK_HOME/bin or one of its subdirectories.
  • compressRawdata—This parameter is ignored.
  • currentDBSizeMB—A number that indicates the total size of data stored in the index, in MB. This total includes data in the home, cold, and thawed paths.
  • defaultDatabase—A string that contains the index destination, which is used when index destination information is not available in the input data.
  • disabled—A Boolean that indicates whether the index has been disabled.
  • eai:acl—A string that contains the access control list for this input.
  • eai:attributes—A string that contains the metadata for this input.
  • enableOnlineBucketRepair—A Boolean that indicates whether to enable asynchronous online fsck bucket repair, which runs in a concurrent process with Splunk. When enabled, you do not have to wait until buckets are repaired to start Splunk. However, you might observe a slight performance degradation.
  • enableRealtimeSearch—A Boolean that indicates whether real-time search is enabled. This setting is global, not per index.
  • frozenTimePeriodInSecs—A number that indicates the number of seconds after which indexed data rolls to frozen.
  • homePath—A string that contains the file path to the hot and warm buckets for the index.
  • homePath_expanded—A string that contains the absolute file path to the hot and warm buckets for the index.
  • indexThreads—A number that indicates how many threads are used for indexing. This setting is global, not per index.
  • isInternal—A Boolean that indicates whether the index is internal.
  • lastInitTime—A string that contains the last time the index processor was successfully initialized. This setting is global, not per index.
  • maxBloomBackfillBucketAge—A string that indicates the age of the bucket. If a warm or cold bucket is older than this time, Splunk does not create (or re-create) its bloom filter. The valid format is a number followed by a time unit ("s", "m", "h", or "d"), for example "5d".
  • maxConcurrentOptimizes—A number that indicates how many concurrent optimize processes can run against a hot bucket.
  • maxDataSize—A string that indicates the maximum size for a hot bucket to reach before a roll to warm is triggered. The valid format is a number in MB, "auto" (Splunk auto-tunes this value, setting the size to 750 MB), or "auto_high_volume" (for high-volume indexes such as the main index, setting the size to 10 GB on 64-bit systems and 1 GB on 32-bit systems).
  • maxHotBuckets—A number that indicates the maximum number of hot buckets that can exist per index. When this value is exceeded, Splunk rolls the least recently used (LRU) hot bucket to warm. Both normal hot buckets and quarantined hot buckets count toward this total. This setting operates independently of "maxHotIdleSecs", which can also cause hot buckets to roll.
  • maxHotIdleSecs—A number that indicates the maximum life, in seconds, of a hot bucket. When this value is exceeded, Splunk rolls the hot bucket to warm. This setting operates independently of "maxHotBuckets", which can also cause hot buckets to roll. A value of 0 turns off the idle check.
  • maxHotSpanSecs—A number that indicates the upper bound, in seconds, of the target maximum timespan of hot and warm buckets. If this value is set too small, you can get an explosion of hot and warm buckets in the file system.
  • maxMemMB—A number that indicates the amount of memory, in MB, that is allocated for indexing.
  • maxMetaEntries—A number that indicates the maximum number of unique lines in .data files in a bucket, which may help to reduce memory consumption. When set to 0, this parameter is ignored. When this value is exceeded, a hot bucket is rolled to prevent a further increase.
  • maxRunningProcessGroups—A number that indicates the maximum number of processes that the indexer creates at a time. This setting is global, not per index.
  • maxTime—A string that contains the UNIX timestamp of the newest event time in the index.
  • maxTimeUnreplicatedNoAcks—A number that specifies the upper limit, in seconds, on how long an event can remain in a raw slice. This value applies only when replication is enabled for this index.
  • maxTimeUnreplicatedWithAcks—A number that specifies the upper limit, in seconds, on how long events can remain unacknowledged in a raw slice. This value applies only when acknowledgments are enabled on forwarders and replication is enabled (with clustering).
  • maxTotalDataSizeMB—A number that indicates the maximum size of an index, in MB. If an index grows larger than the maximum size, the oldest data is frozen.
  • maxWarmDBCount—A number that indicates the maximum number of warm buckets. If this number is exceeded, the warm buckets with the lowest values for their latest times are moved to cold.
  • memPoolMB—A number that indicates how much memory is given to the indexer memory pool. This setting is global, not per index.
  • minRawFileSyncSecs—A string that indicates how frequently splunkd forces a file system sync while compressing journal slices. This value can be either an integer or "disable". If set to 0, splunkd forces a file system sync after every slice finishes compressing. If set to "disable", syncing is disabled and uncompressed slices are removed as soon as compression is complete. Otherwise, during this interval, uncompressed slices are left on disk even after they are compressed; splunkd then forces a file system sync of the compressed journal and removes the accumulated uncompressed files. Some file systems are very inefficient at performing sync operations, so only enable this setting if you are sure it is needed.
  • minTime—A string that contains the UNIX timestamp of the oldest event time in the index.
  • name—A string that contains the name of the index.
  • numBloomfilters—A number that indicates how many bloom filters are created for this index.
  • numHotBuckets—A number that indicates how many hot buckets are created for this index.
  • numWarmBuckets—A number that indicates how many warm buckets are created for this index.
  • partialServiceMetaPeriod—A number that indicates how often to sync metadata, in seconds, but only for records where the sync can be done efficiently in place, without requiring a full re-write of the metadata file. Records that require a full re-write are synced at the frequency specified by "serviceMetaPeriod". When set to 0 or to a value greater than "serviceMetaPeriod", metadata is not partially synced, but is synced at the frequency specified by "serviceMetaPeriod".
  • quarantineFutureSecs—A number that indicates a time, in seconds. Events with a timestamp that is more than this many seconds newer than "now" are dropped into a quarantine bucket. This is a mechanism to prevent the main hot buckets from being polluted with fringe events.
  • quarantinePastSecs—A number that indicates a time, in seconds. Events with a timestamp that is more than this many seconds older than "now" are dropped into a quarantine bucket. This is a mechanism to prevent the main hot buckets from being polluted with fringe events.
  • rawChunkSizeBytes—A number that indicates the target uncompressed size, in bytes, for an individual raw slice in the raw data journal of the index. If set to 0, "rawChunkSizeBytes" is set to the default value. Note that this value specifies a target chunk size. The actual chunk size may be slightly larger by an amount proportional to an individual event size.
  • repFactor—A string that contains the replication factor, which is a non-negative number or "auto". This value applies only to Splunk clustering slaves.
  • rotatePeriodInSecs—A number that indicates how frequently, in seconds, to check whether a new hot bucket needs to be created, and how frequently to check whether any warm or cold buckets should be rolled or frozen.
  • serviceMetaPeriod—A number that indicates how frequently metadata is synced to disk, in seconds.
  • summarize—A Boolean that indicates whether to omit certain index details to provide a faster response. This parameter is used only when retrieving the index collection.
  • suppressBannerList—A string that contains a list of indexes for which to suppress "index missing" warning banner messages. This setting is global, not per index.
  • sync—A number that indicates how many events can trigger the indexer to sync events. This setting is global, not per index.
  • syncMeta—A Boolean that indicates whether to call a sync operation before the file descriptor is closed on metadata file updates.
  • thawedPath—A string that contains the file path to the thawed (resurrected) databases for the index.
  • thawedPath_expanded—A string that contains the absolute file path to the thawed (resurrected) databases for the index.
  • throttleCheckPeriod—A number that indicates how frequently, in seconds, Splunk checks for the index throttling condition.
  • totalEventCount—A number that indicates the total number of events in the index.