Indexer acknowledgement

Note: The indexer acknowledgement feature in HTTP Event Collector is available in Splunk Enterprise 6.4.0 and later, Splunk Light 6.4.0 and later, and the current releases of both Splunk Cloud (self-service and trial) and Splunk Light Cloud. Indexer acknowledgement is not yet supported in managed Splunk Cloud.

HTTP Event Collector (HEC) supports indexer acknowledgement—that is, acknowledgement from the indexer that events have been indexed.

Note: While similar in purpose and identical in name, indexer acknowledgement in HEC is not the same as the indexer acknowledgement capability described in Protect against loss of in-flight data in the Splunk Enterprise Forwarding Data manual.

Why indexer acknowledgement

By default, when HEC receives an event successfully, it immediately sends an HTTP Status 200 to the sender. However, this simply means that the event data appears valid, and the status message is sent before the event data enters the processing pipeline. During processing, there are several places where, due to an outage or a system failure, events could be lost before they are indexed. While HEC has precautions in place to prevent data loss, it's impossible to completely prevent such an occurrence, especially in the event of a hardware crash. This is where indexer acknowledgement comes in.

How indexer acknowledgement works

In current versions of Splunk software, you can enable indexer acknowledgement on a per-token basis. The indexer acknowledgement process is similar to a package tracking scenario:

A tracking number is issued upon shipment of a package, the package's status is updated for the tracking number once it's delivered, and then at your convenience you check whether the package arrived successfully by using the tracking number to retrieve the status.

The following diagram illustrates the indexer acknowledgement process in order from top to bottom. Each step is referred to by number in the paragraphs that follow:

[Diagram: the indexer acknowledgement process, steps 1 through 4]

Each time a request is sent from a client to the HEC endpoint using a token with indexer acknowledgement enabled (1), the server returns an acknowledgement identifier to the client (2). The response body is simply a JSON object with the acknowledgement identifier, such as the following:

{"ackID":"2"}

The client can then query the Splunk server with the identifier to verify whether all the events sent in the request that corresponds to that identifier have been indexed (3). The query is sent to a special endpoint (/services/collector/ack), and contains JSON-formatted data like the following, where the only key, "acks", is set to an array of the ackIDs whose status you are querying:

{"acks":[0,1,2]}

Next, the server responds with the status information to the client (4). The body of the reply contains the status of each of the requests that were queried. A true status indicates that the event that corresponds to that ackID was replicated at the desired replication factor. A true status does not guarantee that the event was indexed, because the parsing pipeline might drop unparsable events. A false status indicates that there is no status information for that ackID, or that the corresponding event has not been indexed. The corresponding event might not have been indexed yet, the ackID might not have been found, or some other problem may have occurred. For example:

{"acks": {"0": true, "1": false, "2": true}}

Because a false status could indicate any number of issues, only query an ackID during the timeframe in which the request could reasonably be expected to be in transit.

Once a true status for an ackID has been retrieved, the server deletes that ackID's status information. If you query the same ackID again, the Splunk server will always return false for that ackID because its status information can no longer be found. For that reason, once you query an ackID and its status returns as true, avoid querying it again.
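
The bookkeeping that follows from these rules can be sketched in Python. This is a minimal illustration rather than part of any Splunk library: the function name apply_ack_statuses and the pending set are hypothetical, and the response is assumed to have the shape shown in the example above.

```python
import json


def apply_ack_statuses(pending, response_body):
    """Remove acknowledged ackIDs from `pending` and return them.

    `pending` is a set of integer ackIDs still awaiting a true status.
    `response_body` is the JSON text returned by /services/collector/ack,
    assumed to look like {"acks": {"0": true, "1": false}}.
    """
    statuses = json.loads(response_body)["acks"]
    # JSON object keys are strings, so convert them back to integers.
    confirmed = {int(ack_id) for ack_id, done in statuses.items() if done is True}
    # Stop tracking confirmed ackIDs: the server deletes their status once a
    # true value has been retrieved, so a second query would return false.
    pending -= confirmed
    return confirmed


pending = {0, 1, 2}
confirmed = apply_ack_statuses(pending, '{"acks": {"0": true, "1": false, "2": true}}')
print("confirmed:", sorted(confirmed))
print("still pending:", sorted(pending))
```

Dropping an ackID from the pending set as soon as it returns true keeps the client from querying it again and misreading the later false as a failure.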

Enable indexer acknowledgement

There are a few ways to enable indexer acknowledgement:

Splunk Web

When you create a new HEC token in Splunk Web, select the checkbox on the first screen labeled Enable indexer acknowledgement. Then continue with the token creation process.

[Screenshot: HTTP Event Collector new token, Select Source page]

inputs.conf

You can enable indexer acknowledgement for existing tokens by editing the HEC inputs.conf file.

  1. Open the inputs.conf file, which is at the following path:

    • In *nix: $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf
    • In Windows: %SPLUNK_HOME%\etc\apps\splunk_httpinput\local\inputs.conf
  2. Within the stanza that corresponds to the token for which you want to enable indexer acknowledgement, add the following line:

    useACK = true

  3. Save and close the file.

About channels and sending data

Sending events with indexer acknowledgement enabled is similar to sending them without the setting enabled. However, there is one crucial difference: specifying a channel.

The concept of a channel was introduced in HEC primarily to prevent a fast client from impeding the performance of a slow client. When you assign one channel per client, because channels are treated equally on the Splunk server, one client can't affect another.

You must include a matching channel identifier both when sending data to HEC in an HTTP request and when requesting acknowledgement that events contained in the request have been indexed. If you don't, you will receive the error message, "Data channel is missing." Each request that includes a token for which indexer acknowledgement has been enabled must include a channel identifier, as shown in the following example cURL statement for Splunk Cloud, where <customer> indicates the account-specific portion of a Splunk Cloud URL, and <data> represents the event data portion of the request:

curl https://http-inputs-<customer>.splunkcloud.com/services/collector -H "X-Splunk-Request-Channel: FE0ECFAD-13D5-401B-847D-77833BD77131" -H "Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C" -d '<data>' -v

Alternatively, you can send the channel identifier as a URL query parameter named channel instead of in the X-Splunk-Request-Channel header, as shown here:

curl "https://http-inputs-<customer>.splunkcloud.com/services/collector?channel=FE0ECFAD-13D5-401B-847D-77833BD77131" -H "Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C" -d '<data>' -v

Note: Indexer acknowledgement also works with raw JSON data. In that case, the endpoint to use in requests is /services/collector/raw. For more information, see Format event data.

Channels are designed so that you assign a unique channel to each client that sends data to HEC. Each channel has a channel identifier (ID), which must be a GUID but can be randomly generated. You assign channel IDs simply by including them in requests as shown in the examples above. When the Splunk server sees a new channel identifier, it creates a new channel.
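
As a rough Python counterpart to the cURL examples above, the following sketch generates a random GUID for the channel and attaches it to a request. The helper name build_event_request and the host and port are hypothetical, and the {"event": ...} body is an assumed event payload. The request is only built, not sent.

```python
import json
import urllib.request
import uuid


def build_event_request(base_url, token, channel_id, event):
    """Build (but do not send) a HEC request that carries a channel ID."""
    # Assumed payload shape; see "Format event data" for the accepted formats.
    body = json.dumps({"event": event}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/services/collector", data=body, method="POST"
    )
    request.add_header("Authorization", "Splunk " + token)
    request.add_header("X-Splunk-Request-Channel", channel_id)
    return request


# Generate one random GUID per client and reuse it for every request.
channel_id = str(uuid.uuid4()).upper()
request = build_event_request(
    "https://splunk.example.com:8088",       # hypothetical host and port
    "BD274822-96AA-4DA6-90EC-18940FB2414C",  # sample token from the examples above
    channel_id,
    "hello world",
)
print(request.full_url)
print(request.header_items())
# Sending is left to the caller, for example urllib.request.urlopen(request).
```

Generating the GUID once at client startup, rather than per request, matches the one-channel-per-client design described above.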

Query for indexing status

Once you enable indexer acknowledgement for a token, every request sent to HEC using that token will return the following acknowledgement identifier (ackID) contained in a simple JSON object to the sender, where <int> represents a unique integer identifier that corresponds to the request:

{"ackID":"<int>"}

To verify that the indexer has indexed the event(s) contained in the request, query the following endpoint, where <host> and <port> represent the hostname and port number of your Splunk server, respectively:

https://<host>:<port>/services/collector/ack

The query must contain JSON-formatted data like the following, where the only key, "acks", is set to an array of the ackIDs whose status you are querying:

{"acks":[0,1,2]}

Following is an example cURL statement that queries the Splunk server for the indexing status of the events contained in the requests with the identifiers "0", "1", "2", and "3":

curl -k "https://<host>:<port>/services/collector/ack?channel=FE0ECFAD-13D5-401B-847D-77833BD77131" -H "Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C" -d '{"acks":[0,1,2,3]}'

Note: Both the data channel ID (?channel=FE0ECFAD-13D5-401B-847D-77833BD77131) and the auth header ("Authorization: Splunk BD274822-96AA-4DA6-90EC-18940FB2414C") are required in this query. For more information, see the previous section, "About channels and sending data."

The body of the reply contains the status of each request that you queried. The following example response indicates that the requests with the ackIDs "0" and "2" were successfully indexed, but the requests with the ackIDs "1" and "3" were not successfully indexed:

{"acks": {"0": true, "1": false, "2": true, "3": false}}
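
The same status query can be assembled in Python. The sketch below is illustrative only: build_ack_query is a hypothetical helper, the host and port are placeholders, and the request is built without being sent.

```python
import json
import urllib.parse
import urllib.request


def build_ack_query(base_url, token, channel_id, ack_ids):
    """Build (but do not send) a request to the /services/collector/ack endpoint."""
    # The channel must match the one used when the events were sent.
    query = urllib.parse.urlencode({"channel": channel_id})
    # The only key, "acks", holds the list of ackIDs being queried.
    body = json.dumps({"acks": sorted(ack_ids)}).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/services/collector/ack?" + query, data=body, method="POST"
    )
    request.add_header("Authorization", "Splunk " + token)
    return request


request = build_ack_query(
    "https://splunk.example.com:8088",       # hypothetical host and port
    "BD274822-96AA-4DA6-90EC-18940FB2414C",  # sample token from the examples above
    "FE0ECFAD-13D5-401B-847D-77833BD77131",  # sample channel ID from the examples above
    [0, 1, 2, 3],
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Serializing the body with json.dumps avoids the shell quoting problems that can arise when JSON is typed directly on a command line.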

Channel limits and indexing status expiration

Acknowledgement IDs and their corresponding status information are cached in memory. To keep the server from running out of memory, and to guard against malicious or misbehaving clients, HEC enforces several limits.

To prevent channels from being overloaded, and to prevent an excessive number of channels from being created, several new settings have been introduced to the [http_input] stanza in the limits.conf file:

max_number_of_acked_requests_pending_query_per_ack_channel
  • Value type: int
  • Default value: 1000000
  • Description: Specifies the maximum number of ackIDs and their corresponding status information that are waiting to be queried in each channel. If a client makes many requests with indexer acknowledgement enabled, this setting prevents the client's channel from becoming full of ackIDs and status information and the client from receiving a server busy error.

max_number_of_ack_channel
  • Value type: int
  • Default value: 1000000
  • Description: Specifies the maximum number of channels that clients can acquire for this Splunk server instance. If a single client tries to acquire more than this number of channels, the request will fail with a server busy error. This setting is used to prevent a client from acquiring too many channels.

max_number_of_acked_requests_pending_query
  • Value type: int
  • Default value: 10000000
  • Description: Specifies the maximum number of ackIDs and their corresponding status information in all channels.

To reduce the likelihood of reaching these limits, the Splunk server can remove channels that have been idle for a period of time and release the memory they use. Control this behavior with the following settings in the global [http] stanza of the inputs.conf file:

  • In *nix: $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf
  • In Windows: %SPLUNK_HOME%\etc\apps\splunk_httpinput\local\inputs.conf

ackIdleCleanup
  • Value type: bool
  • Default value: false
  • Description: When set to true, causes the server to remove channels that are idle for the number of seconds set in the maxIdleTime setting.

maxIdleTime
  • Value type: int
  • Default value: 60
  • Description: Specifies the maximum number of seconds that channels can be idle before they are removed.

Indexer acknowledgement client behavior

This section provides best practice information about how to set up a client for indexer acknowledgement. Follow these guidelines to ensure that the client doesn't exhibit any malicious behavior or end up hitting the limits described previously.

An indexer acknowledgement client should:

  • Create its own GUID to use as its channel identifier.
  • Send requests using only that channel.
  • Save each acknowledgement identifier (ackID) that is returned from requests to HEC.
  • Continually poll the /services/collector/ack endpoint at an interval (for instance, every 10 seconds) to ensure that acknowledgement status is retrieved in a timely manner. Because status information is deleted from the Splunk server after it is retrieved by clients, this releases memory on the server.
  • Resend any event data for which an acknowledgement hasn't been received within a certain amount of time (for instance, 5 minutes). By that time, it is reasonable to treat the event data as lost. When you resend the event data, a good practice is to add some additional data to the resent event that marks it as a possible duplicate: the event might have been indexed already, with its status lost because the channel was cleaned up or because the Splunk server restarted and cleared its cache of statuses.
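
The guidelines above can be sketched as a small Python helper that tracks pending ackIDs and decides when to resend. This is one possible client-side structure under stated assumptions, not a Splunk-provided API: the AckTracker class, its method names, and the "resent" marker field are all hypothetical, and the network calls are left out.

```python
import time


class AckTracker:
    """Track unacknowledged requests for one channel (hypothetical helper)."""

    def __init__(self, resend_after=300.0, clock=time.monotonic):
        self.resend_after = resend_after  # seconds; 5 minutes, as suggested above
        self.clock = clock
        self.pending = {}  # ackID -> (send time, event data)

    def record(self, ack_id, event):
        """Save the ackID returned by HEC together with the data it covers."""
        self.pending[int(ack_id)] = (self.clock(), event)

    def ids_to_poll(self):
        """ackIDs to include in the next /services/collector/ack query."""
        return sorted(self.pending)

    def apply(self, statuses):
        """Drop ackIDs that the server reported as true."""
        for ack_id, done in statuses.items():
            if done:
                self.pending.pop(int(ack_id), None)

    def take_expired(self):
        """Remove and return event data that has waited at least resend_after."""
        now = self.clock()
        expired = [
            ack_id
            for ack_id, (sent, _) in self.pending.items()
            if now - sent >= self.resend_after
        ]
        return [self.pending.pop(ack_id)[1] for ack_id in expired]


# Demonstration with a fake clock so that no real waiting is needed.
now = [0.0]
tracker = AckTracker(resend_after=300.0, clock=lambda: now[0])
tracker.record("0", {"message": "a"})
tracker.record("1", {"message": "b"})
tracker.apply({"0": True, "1": False})  # result of one poll
now[0] = 301.0                          # more than 5 minutes later
for event in tracker.take_expired():
    # Mark the resent event as a possible duplicate (hypothetical field name).
    print(dict(event, resent=True))
```

In a real client, a timer would call ids_to_poll and apply on each polling interval (for instance, every 10 seconds) and take_expired less often. Injecting the clock keeps the resend logic testable without sleeping.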