Indexed field extractions

Note: The indexed field extractions feature in HTTP Event Collector is available in Splunk Enterprise 6.5.0 and later, Splunk Light 6.5.0 and later, and the current releases of both Splunk Cloud and Splunk Light Cloud.

When Splunk software indexes data, it parses the data stream into a series of events. As part of this process, it adds a number of fields to the event data. These fields include default fields that it adds automatically and any custom fields that you specify. The process of adding fields to events is known as field extraction. There are two types of field extraction: search-time field extraction and indexed field extraction. Indexed fields are incorporated into the index at index time and become part of the event data.

Previously, creating custom fields at index time required significant configuration, as described in Create custom fields at index time: you had to edit the props.conf, transforms.conf, and fields.conf files to add regex extractions. Now, you can use HTTP Event Collector to automate this process through its built-in support for indexed field extractions.

Note: Indexed field extraction doesn't work with data sent to the raw endpoint.

Form HEC requests to trigger indexed field extractions

You can trigger indexed extractions of JSON fields in two ways: as part of the main "event" data, or separately from the "event" data but still associated with the event.

Use nested JSON inside the "event" property

Assign the "event" property (at the top level of the JSON being sent to HEC) to a JSON object that contains the custom fields to be indexed, as key-value pairs. For example, the following "event" property, from within an HTTP request sent to the Splunk server, specifies two custom fields, "club" and "wins":

"event": {"club":"glee", "wins":["regionals","nationals"]}

Notice that the "wins" property is set to a multi-value JSON array. The "wins" field is assigned both values in the array.

At the same level as the "event" property, you must also include a "sourcetype" property, and set it to a sourcetype that has indexed extraction enabled. You can use any sourcetype that has INDEXED_EXTRACTIONS set to JSON in the props.conf file, including built-in sourcetypes such as _json. For example:

"sourcetype":"_json"

The following example cURL command sends an event to HEC on a Splunk server. In this case, the event data contains two custom fields that are extracted at index time:

# Extracting JSON fields
curl -k https://mysplunkserver.example.com:8088/services/collector -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" -d '{"sourcetype": "_json", "event": {"club":"glee", "wins":["regionals","nationals"]}}'
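
The built-in _json sourcetype is not the only option: as noted above, any sourcetype with INDEXED_EXTRACTIONS set to JSON in props.conf works. The following shell sketch appends such a stanza to a props.conf file. It is an illustration only: the function name add_json_sourcetype, the sourcetype name my_hec_json, and the file path in the example call are hypothetical, so adjust them to your deployment. Splunk software may also need a restart before a props.conf change takes effect.

```shell
# Hypothetical helper (not a Splunk command): append a sourcetype stanza
# that enables JSON indexed extractions to a props.conf file.
# $1 = path to props.conf, $2 = sourcetype name
add_json_sourcetype() {
  printf '[%s]\nINDEXED_EXTRACTIONS = JSON\n' "$2" >> "$1"
}

# Example call; the path below is an assumption about your installation:
# add_json_sourcetype "$SPLUNK_HOME/etc/system/local/props.conf" my_hec_json
```

With a stanza like this in place, the HEC request would carry "sourcetype":"my_hec_json" instead of "sourcetype":"_json".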

Add a "fields" property at the top JSON level

Include the "fields" property at the top level of the JSON being sent to HEC—that is, at the same level as the "event" property. This specifies explicit custom fields that are separate from the main "event" data. This method is useful if you don't want to include the custom fields with the event data, but you want to be able to annotate the data with some extra information, such as where it came from. Using this method is also typically faster than the nested JSON method.

Be aware that you must send HEC requests containing the "fields" property to the /collector/event endpoint. Otherwise, they will not be indexed.

Assign the "fields" property to a JSON object that contains the custom fields to be indexed, as key-value pairs. For example, the following "fields" property, from within an HTTP request sent to the Splunk server, specifies two custom fields, "club" and "wins":

"fields": {"club":"glee", "wins":["regionals","nationals"]}

Notice that the "wins" property is set to a multi-value JSON array. The "wins" field is assigned both values in the array.

At the same level as the "event" and "fields" properties, you must also include a "sourcetype" property, and set it to a sourcetype that has indexed extraction enabled. You can use any sourcetype that has INDEXED_EXTRACTIONS set to JSON in the props.conf file, including built-in sourcetypes such as _json. For example:

"sourcetype":"_json"

The following example cURL command sends an event to HEC on a Splunk server. In this case, the "fields" property contains two custom fields that are indexed along with the event:

# Explicit JSON fields
curl -k https://mysplunkserver.example.com:8088/services/collector/event -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" -d '{"event": "Hello, McKinley High!", "sourcetype": "_json", "fields": {"club":"glee", "wins":["regionals","nationals"]}}'

Note: As of this writing, only strings can be used as field values. Support for numerics is planned for a future Splunk software release.
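
Because field values must currently be strings, a numeric value has to be quoted before it goes into the "fields" object. The following is a minimal sketch of that idea, not an official Splunk utility: the function names fields_payload and send_to_hec and the field name score are hypothetical, and the host and token are the placeholder values used in the examples above.

```shell
# Hypothetical helper (not part of Splunk): build a HEC payload whose
# "fields" values are all JSON strings, including the numeric-looking one.
# $1 = event text, $2 = club name, $3 = score (always emitted as a string)
# Arguments are inserted as-is, so they must not contain double quotes
# or backslashes.
fields_payload() {
  printf '{"event": "%s", "sourcetype": "_json", "fields": {"club": "%s", "score": "%s"}}' "$1" "$2" "$3"
}

# Hypothetical wrapper: post a payload to the /collector/event endpoint.
send_to_hec() {
  curl -k https://mysplunkserver.example.com:8088/services/collector/event \
    -H "Authorization: Splunk 12345678-1234-1234-1234-1234567890AB" \
    -d "$1"
}

# Print the payload so it can be inspected before sending.
fields_payload 'Hello, McKinley High!' glee 95
```

To send the event, pass the generated payload to the wrapper, for example send_to_hec "$(fields_payload 'Hello, McKinley High!' glee 95)". The score value then arrives as the string "95" rather than as a JSON number.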

Search for index-extracted fields

Once the data is indexed, you can search for this event using indexed extraction ("double-colon") notation, as shown here:

sourcetype=_json club::glee
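
The same notation should also match a single value of a multi-value field such as "wins", because the field is assigned every value in the array. As a hedged sketch, the search string can be built in the shell and, if you want to run it outside Splunk Web, posted to the Splunk REST search export endpoint. The function names, the SPLUNK_USER and SPLUNK_PASSWORD variables, and the use of management port 8089 are assumptions for illustration, not values taken from this topic.

```shell
# Hypothetical helper: build a search string that uses double-colon notation.
# $1 = sourcetype, $2 = indexed field name, $3 = field value
indexed_field_query() {
  printf 'search sourcetype=%s %s::%s' "$1" "$2" "$3"
}

# Hypothetical wrapper around the REST search export endpoint.
# Assumes the management port is 8089 and that SPLUNK_USER and
# SPLUNK_PASSWORD hold valid credentials.
run_search() {
  curl -k -u "$SPLUNK_USER:$SPLUNK_PASSWORD" \
    https://mysplunkserver.example.com:8089/services/search/jobs/export \
    --data-urlencode "search=$1" \
    -d output_mode=json
}

# Prints: search sourcetype=_json wins::regionals
indexed_field_query _json wins regionals
```

Running run_search "$(indexed_field_query _json wins regionals)" would then submit the query; in Splunk Web, the equivalent search is the same string without the leading "search" keyword.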

For more information about using extracted fields to retrieve events, see Use fields to retrieve events in the Splunk Enterprise Search Manual.