What's new in HTTP Event Collector

Splunk has made enhancements to HTTP Event Collector (HEC) in recent Splunk software releases.

In the current releases of Splunk Cloud and Splunk Light Cloud, the following feature is new:

  • Specify a token as a query string

In Splunk Enterprise 6.5.0 and later, Splunk Light 6.5.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud, the following feature is new:

  • Indexed field extractions

In Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud, the following features are new:

  • Basic authentication
  • "Raw" event parsing
  • Indexer acknowledgement

Specify a token as a query string

Available in the latest release of Splunk Cloud and Splunk Light Cloud. Support is planned for future versions of Splunk Enterprise and Splunk Light.

You can specify a HEC token as a query string in the URL of your requests to HEC. This provides another way to authenticate to your Splunk server, and is particularly useful if you can't or don't want to use either HTTP Authentication or Basic authentication.

There are two steps involved in specifying a token as a query string. First, set the allowQueryStringAuth setting to true. Then, add the token to the request URL as a query string.

Set allowQueryStringAuth to true

You must enable query string authentication on a per-token basis. On your Splunk server, edit the file at $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf. Your tokens are listed by name in this file, in the form [http://<token_name>].

Within the stanza for each token for which you want to enable query string authentication, add the following setting (or change the existing setting, if applicable):

allowQueryStringAuth = true

Save and close the inputs.conf file. 
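If you manage many tokens, you might script this change rather than edit the file by hand. The following Python sketch uses the standard configparser module and assumes the stanzas in inputs.conf parse as INI, which holds for simple HEC stanzas but is not guaranteed for every .conf file. The helper name enable_query_string_auth and the token name my_token are hypothetical. configparser drops comments when it rewrites a file, so keep a backup.

```python
import configparser
import io


def enable_query_string_auth(conf_text: str, token_name: str) -> str:
    """Return inputs.conf text with allowQueryStringAuth = true in [http://<token_name>].

    Hypothetical helper; assumes the file content is INI-compatible.
    """
    # Only '=' separates keys from values here, and '%' is not interpolated.
    parser = configparser.ConfigParser(interpolation=None, delimiters=("=",))
    parser.optionxform = str  # keep the camelCase setting name exactly as written
    parser.read_string(conf_text)

    stanza = f"http://{token_name}"
    if not parser.has_section(stanza):
        raise KeyError(f"no [{stanza}] stanza found")
    parser.set(stanza, "allowQueryStringAuth", "true")  # adds or overwrites

    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    sample = (
        "[http://my_token]\n"
        "disabled = 0\n"
        "token = 12345678-1234-1234-1234-1234567890AB\n"
    )
    print(enable_query_string_auth(sample, "my_token"))
```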

Note: For Splunk Cloud, you must open a Splunk Support ticket to set allowQueryStringAuth to true. Support for a UI toggle for this setting is planned for a future release.

Add the token as a query string

To include the HEC token in a query string, append a query string of the following form to the end of the HEC endpoint URL, where <hec_token> represents a HEC token:

?token=<hec_token>
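Appending ?token=<hec_token> verbatim only works when the URL has no query string yet; if it already carries parameters, the token must be joined with & instead. A minimal Python sketch that lets the standard urllib.parse module handle both cases and any percent-encoding (the helper name with_token is hypothetical):

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_token(url: str, hec_token: str) -> str:
    """Return url with token=<hec_token> added to its query string."""
    parts = urlsplit(url)
    # Keep any parameters that are already present, then add the token.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("token", hec_token))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_token(
    "https://mysplunkserver.example.com:8088/services/collector",
    "12345678-1234-1234-1234-1234567890AB",
))
# https://mysplunkserver.example.com:8088/services/collector?token=12345678-1234-1234-1234-1234567890AB
```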

Consider the following basic example of a query to the HEC endpoint on a Splunk server. This example uses an HTTP Authentication authorization header that includes the HEC token to authenticate with the Splunk server:

curl -k https://mysplunkserver.example.com:8088/services/collector -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" -d '{"event": "Hello, world!", "sourcetype": "manual"}' 

The following example is exactly equivalent to the previous one, except that it includes the HEC token in a query string:

curl -k "https://mysplunkserver.example.com:8088/services/collector?token=12345678-1234-1234-1234-1234567890AB" -d '{"event": "Hello, world!", "sourcetype": "manual"}'
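If you call HEC from Python rather than cURL, the two requests above map onto the standard urllib.request module roughly as follows. This sketch only builds the Request objects and does not send them; sending would take urllib.request.urlopen, plus an SSL context that skips certificate checks if you want to mirror cURL's -k option. The server URL and token are the placeholders from the examples.

```python
import json
import urllib.request

HEC_URL = "https://mysplunkserver.example.com:8088/services/collector"
HEC_TOKEN = "12345678-1234-1234-1234-1234567890AB"
body = json.dumps({"event": "Hello, world!", "sourcetype": "manual"}).encode("utf-8")

# Token sent in the Authorization header (HTTP Authentication).
header_auth = urllib.request.Request(
    HEC_URL,
    data=body,
    headers={"Authorization": f"Splunk {HEC_TOKEN}"},
    method="POST",
)

# Same request with the token in the query string and no Authorization header.
# This works only for tokens that have allowQueryStringAuth = true.
query_auth = urllib.request.Request(
    f"{HEC_URL}?token={HEC_TOKEN}",
    data=body,
    method="POST",
)
```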

Indexed field extractions

Available in Splunk Enterprise 6.5.0 and later, Splunk Light 6.5.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud.

HTTP Event Collector now supports indexed extraction of JavaScript Object Notation (JSON) fields. When Splunk software indexes data, it parses the data stream into a series of events. As part of this process, it adds a number of fields to the event data. These fields include default fields that it adds automatically and any custom fields that you specify. The process of adding fields to events is known as field extraction. There are two types of field extraction: search-time field extraction and indexed field extraction. Indexed fields are incorporated into the index at index time and become part of the event data.

Previously, creating custom fields at index time required significant configuration, as described in Create custom fields at index time: you had to edit the props.conf, transforms.conf, and fields.conf files to add regex extractions. Now, you can use HTTP Event Collector to automate this process.

Note: Indexed field extraction doesn't work with data sent to the raw endpoint.

You can trigger indexed extractions of JSON fields in two ways—as part of the main "event" data or separate from the "event" data but still associated with the event:

  • Use nested JSON inside the "event" property
  • Add a "fields" property at the top JSON level

Use nested JSON inside the "event" property

Assign the "event" property (at the top level of the JSON being sent to HEC) to a JSON object that contains the custom fields to be indexed, as key-value pairs. For example, the following "event" property, from within an HTTP request sent to the Splunk server, specifies two custom fields, "club" and "wins":

"event": {"club":"glee", "wins":["regionals","nationals"]}

Notice that the "wins" property has been set to a multi-value JSON array. The "wins" field will be assigned both values in the array.

At the same level as the "event" property, you must also include a "sourcetype" property and set it to a sourcetype that has indexed extraction enabled. You can use any sourcetype that has INDEXED_EXTRACTIONS set to JSON in the props.conf file, including built-in sourcetypes such as _json. For example:

"sourcetype":"_json"

Following is an example cURL command that sends an event to HEC on a Splunk server. In this case, the event data contains two custom fields that will be extracted at index time:

# Extracting JSON fields
curl -k https://mysplunkserver.example.com:8088/services/collector -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" -d '{"sourcetype": "_json", "event": {"club":"glee", "wins":["regionals","nationals"]}}'
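Hand-written JSON inside a shell command is easy to get wrong. A minimal Python sketch that builds the same request with the json module, so the body is always well-formed; the helper name nested_event_request is hypothetical, and the request is only constructed, not sent:

```python
import json
import urllib.request

HEC_URL = "https://mysplunkserver.example.com:8088/services/collector"


def nested_event_request(event: dict, token: str, sourcetype: str = "_json") -> urllib.request.Request:
    """Build a HEC request whose "event" is a JSON object of fields to extract at index time."""
    # The sourcetype must have INDEXED_EXTRACTIONS = json in props.conf (for example, _json).
    body = {"sourcetype": sourcetype, "event": event}
    return urllib.request.Request(
        HEC_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


req = nested_event_request(
    {"club": "glee", "wins": ["regionals", "nationals"]},  # "wins" is multi-value
    "12345678-1234-1234-1234-1234567890AB",
)
```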

Add a "fields" property at the top JSON level

Include the "fields" property at the top level of the JSON being sent to HEC—that is, at the same level as the "event" property. This specifies explicit custom fields that are separate from the main "event" data. This method is useful if you don't want to include the custom fields with the event data, but you want to be able to annotate the data with some extra information, such as where it came from. Using this method is also typically faster than the nested JSON method.

Be aware that you must send HEC requests containing the "fields" property to the /collector/event endpoint. Otherwise, they will not be indexed.

Assign the "fields" property to a JSON object that contains the custom fields to be indexed, as key-value pairs. For example, the following "fields" property, from within an HTTP request sent to the Splunk server, specifies two custom fields, "club" and "wins":

"fields": {"club":"glee", "wins":["regionals","nationals"]}

Notice that the "wins" property has been set to a multi-value JSON array. The "wins" field will be assigned both values in the array.

At the same level as the "event" and "fields" properties, you must also include a "sourcetype" property and set it to a sourcetype that has indexed extraction enabled. You can use any sourcetype that has INDEXED_EXTRACTIONS set to JSON in the props.conf file, including built-in sourcetypes such as _json. For example:

"sourcetype":"_json"

Following is an example cURL command that sends an event to HEC on a Splunk server. In this case, the event data contains two custom fields that will be extracted at index time:

# Explicit JSON fields
curl -k https://mysplunkserver.example.com:8088/services/collector/event -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" -d '{"event": "Hello, McKinley High!", "sourcetype": "_json", "fields": {"club":"glee", "wins":["regionals","nationals"]}}'

Note: As of this writing, only strings can be used as field values. Support for numerics is planned for a future Splunk software release.
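Because only strings (and, as in the "wins" example, arrays of strings) are accepted as field values, a client may want to check the "fields" object before sending it. A minimal Python sketch under that assumption; the helper name fields_request is hypothetical, and it only builds the request for the /services/collector/event endpoint without sending it:

```python
import json
import urllib.request

HEC_EVENT_URL = "https://mysplunkserver.example.com:8088/services/collector/event"


def fields_request(event, fields: dict, token: str, sourcetype: str = "_json") -> urllib.request.Request:
    """Build a HEC request that carries custom indexed fields in a top-level "fields" object."""
    for name, value in fields.items():
        # Accept a string or a list of strings (multi-value); reject numbers and anything else.
        values = value if isinstance(value, list) else [value]
        if not all(isinstance(v, str) for v in values):
            raise TypeError(f"field {name!r} must be a string or a list of strings")
    body = {"event": event, "sourcetype": sourcetype, "fields": fields}
    return urllib.request.Request(
        HEC_EVENT_URL,  # requests with "fields" must go to /collector/event
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


req = fields_request(
    "Hello, McKinley High!",
    {"club": "glee", "wins": ["regionals", "nationals"]},
    "12345678-1234-1234-1234-1234567890AB",
)
```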

Basic authentication

Available in Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud.

Prior to Splunk Enterprise 6.4.0, there was only one way to authenticate with HTTP Event Collector: HTTP Authentication. Once you'd created a token, you would include the following in every request sent to HEC, where <token> indicates the token created on the main HEC instance:

-H 'Authorization: Splunk <token>'

In Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud, we've added support for Basic authentication. Clients that can't or won't send an HTTP Authentication header with their requests can now instead include a colon-separated user/password pair after the -u option, using the token as the <password>: "<user>:<password>".

For example, the following two cURL requests are functionally equivalent:

# Using HTTP auth
curl -k -H "Authorization: Splunk 4D5F3C17-376E-4782-84D0-5776F4202B98" https://<hostname>:8088/services/collector/event -d '{"sourcetype": "mysourcetype", "event": "basic auth ftw!"}'
# Using Basic auth (6.4.0 and later)
curl -k -u "x:4D5F3C17-376E-4782-84D0-5776F4202B98" https://<hostname>:8088/services/collector/event -d '{"sourcetype": "mysourcetype", "event": "basic auth ftw!"}'
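cURL's -u option turns the user:password pair into a standard HTTP Basic Authorization header, which is the Base64 encoding of the pair. A minimal Python sketch of the same header, using the placeholder user name x from the example above (the helper name basic_auth_header is hypothetical):

```python
import base64


def basic_auth_header(token: str, user: str = "x") -> str:
    """Return the Authorization header value that curl -u "<user>:<token>" would send."""
    pair = f"{user}:{token}".encode("utf-8")
    return "Basic " + base64.b64encode(pair).decode("ascii")


headers = {"Authorization": basic_auth_header("4D5F3C17-376E-4782-84D0-5776F4202B98")}
print(headers["Authorization"])
```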

"Raw" event parsing

Available in Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud.

New for HTTP Event Collector is support for consuming raw events.

Prior to Splunk Enterprise 6.4.0, events had to be formatted in a proprietary JSON event protocol before they were sent to HEC. While this meant that minimal configuration was required on the Splunk Enterprise side, it also assumed that the developer was in complete control of the format of the contents of each event. Therefore, the proprietary protocol was chiefly designed to be used by third-party applications that had the flexibility to alter their output on a granular level.

Now, in current versions of Splunk software, HTTP Event Collector can additionally parse raw text and extract one or more events from it. HEC expects each HTTP request to contain one or more events that it can separate by applying line-breaking rules. Once HEC accepts the request, it passes the events into the pipeline, which extracts fields such as timestamps. By default, HEC uses a line-breaking strategy based on the timestamp, but you can override it by configuring line breaking for a sourcetype in the props.conf file.

Events must be contained within a single HTTP request. They cannot span multiple requests.

To accommodate raw events, the following new endpoint is available:

/services/collector/raw

For versions of Splunk Enterprise 6.6.0 and earlier, this endpoint requires an additional X-Splunk-Request-Channel header field, which must be set to a unique channel identifier (a GUID). A channel identifier must be included with each HTTP request that contains raw events. Following is an example of a cURL statement that constitutes a valid request:

curl https://http-inputs-<customer>.splunkcloud.com/services/collector/raw  -H "X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131" -H "Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C" -d '<raw data string>' -v

Alternatively, you can send the channel identifier as the channel URL query parameter instead of the X-Splunk-Request-Channel header field, as shown here:

curl "https://http-inputs-<customer>.splunkcloud.com/services/collector/raw?channel=FE0ECFAD-13D5-401B-847D-77833BD77131" -H "Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C" -d '<raw data string>' -v

For versions of Splunk Enterprise 6.7.0 and later, omit the X-Splunk-Request-Channel header field.
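A minimal Python sketch of a raw request that mirrors the cURL examples above. It generates the channel identifier with uuid.uuid4 (any unique GUID should do) and sends it either as the X-Splunk-Request-Channel header or as the channel query parameter; passing channel=None omits it, as for the later versions described above. The helper name raw_request and the server URL are hypothetical, and the request is built but not sent.

```python
import urllib.request
import uuid
from typing import Optional
from urllib.parse import urlencode

RAW_URL = "https://mysplunkserver.example.com:8088/services/collector/raw"


def raw_request(data: str, token: str, channel: Optional[str] = None,
                channel_in_query: bool = False) -> urllib.request.Request:
    """Build a POST to the raw endpoint, optionally tagged with a channel GUID."""
    url = RAW_URL
    headers = {"Authorization": f"Splunk {token}"}
    if channel is not None:
        if channel_in_query:
            url = f"{RAW_URL}?{urlencode({'channel': channel})}"
        else:
            # urllib stores this as "X-splunk-request-channel"; header names are case-insensitive.
            headers["X-Splunk-Request-Channel"] = channel
    return urllib.request.Request(url, data=data.encode("utf-8"), headers=headers, method="POST")


channel_id = str(uuid.uuid4()).upper()  # GUID identifying this client's channel
req = raw_request("<raw data string>", "BD274822-96AA-4DA6-90EC-18940FB2414C", channel=channel_id)
```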

Note: If the token with which you are authenticating to HTTP Event Collector has indexer acknowledgement enabled, you must also include the channel identifier with your indexer status query. For more information, see the following section, "Indexer acknowledgement," or Enable indexer acknowledgement.

With raw events, you can configure metadata at the global level (all tokens), at the token level, and at the request level using the query string. Metadata specified within a request will apply to all events that are extracted from the request.
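At the request level, metadata travels in the query string of the raw endpoint URL. A minimal Python sketch that builds such a URL; the parameter names host, index, source, and sourcetype are assumptions here (check Input endpoint examples in the REST API Reference Manual for the exact set your version accepts), and the helper name raw_url_with_metadata and the sourcetype access_combined are illustrative only:

```python
from urllib.parse import urlencode

RAW_URL = "https://mysplunkserver.example.com:8088/services/collector/raw"

# Assumed request-level metadata parameters; verify against your Splunk version.
ALLOWED_KEYS = {"channel", "host", "index", "source", "sourcetype"}


def raw_url_with_metadata(**metadata: str) -> str:
    """Return the raw endpoint URL with request-level metadata as query parameters."""
    unknown = set(metadata) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unexpected metadata keys: {sorted(unknown)}")
    # Whatever is set here applies to every event HEC extracts from this request.
    return f"{RAW_URL}?{urlencode(metadata)}" if metadata else RAW_URL


print(raw_url_with_metadata(sourcetype="access_combined", index="main"))
```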

Timestamp extraction rules are enabled at the sourcetype level. Most common timestamp formats are recognized (for example, the "current-time" key), but if no timestamp can be extracted, one is assigned based on the current time. For other metadata, you can configure extraction rules in the props.conf file.

For more examples of cURL requests to services/collector/raw, see Input endpoint examples in the Splunk Enterprise REST API Reference Manual.

For more information about channels, see "About channels and sending data" in the Enable indexer acknowledgement topic.

Indexer acknowledgement

Available in Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of Splunk Cloud and Splunk Light Cloud.

Also new for HTTP Event Collector (HEC) in recent Splunk software releases is support for indexer acknowledgement—that is, acknowledgement from the indexer that the event has been indexed.

By default, when HEC receives an event successfully, it immediately sends an HTTP Status 200 to the sender. However, this only means that the event data appears valid, and is returned before the event data enters the processing pipeline. During processing, there are several places where, due to an outage or a system failure, events could be lost before they are indexed. While HEC has precautions in place to prevent data loss, it's impossible to completely prevent such an occurrence, especially in the event of a hardware crash.

To ensure that your events have been successfully indexed, you can now enable indexer acknowledgement on a per-token basis. Each time a request is sent to the HEC endpoint using a token with indexer acknowledgement enabled, the server returns an acknowledgement identifier to the sender. The sender can then query HEC using the identifier to verify whether the events sent in the request that corresponds to the identifier have been indexed.
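The exchange can be sketched in Python as below. This sketch rests on assumptions rather than on this page: it assumes the reply to a send request carries the identifier in an "ackId" field, that status queries are POSTed to /services/collector/ack with the channel identifier and a body of the form {"acks": [...]}, and that the reply maps each identifier to true or false. See Enable indexer acknowledgement for the authoritative details. The helper names are hypothetical, and nothing is sent over the network.

```python
import json
import urllib.request
from typing import Iterable, List
from urllib.parse import urlencode

# Assumed endpoint for indexer acknowledgement status queries.
ACK_URL = "https://mysplunkserver.example.com:8088/services/collector/ack"


def ack_id_from_response(send_response: str) -> int:
    """Extract the acknowledgement identifier from HEC's reply to a send request."""
    return json.loads(send_response)["ackId"]


def ack_query(ack_ids: Iterable[int], channel: str, token: str) -> urllib.request.Request:
    """Build a status query for previously returned acknowledgement identifiers."""
    body = json.dumps({"acks": list(ack_ids)}).encode("utf-8")
    return urllib.request.Request(
        f"{ACK_URL}?{urlencode({'channel': channel})}",
        data=body,
        headers={"Authorization": f"Splunk {token}"},
        method="POST",
    )


def not_yet_indexed(status_response: str) -> List[int]:
    """Return the identifiers whose events the indexer has not confirmed yet."""
    status = json.loads(status_response)["acks"]
    return sorted(int(ack_id) for ack_id, indexed in status.items() if not indexed)
```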

For information about how to enable indexer acknowledgement and how to query HEC to verify indexed status, see Enable indexer acknowledgement.