High volume HTTP Event Collector data collection using distributed deployment

HTTP Event Collector (HEC) can scale to consume, distribute, and index very large quantities of data by taking advantage of your distributed deployment. Unlike in a typical distributed deployment, local forwarders are not necessary for collecting HEC event data. Splunk Enterprise can accept data from your data sources through HTTP Event Collector, and then distribute that data to indexers.

You can use a deployment server to distribute HTTP Event Collector configuration information to the rest of your deployment. This configuration information can include a custom HTTP Event Collector port number, a preferred protocol (HTTP or HTTPS), SSL settings, and HTTP Event Collector tokens.

Note: You should be familiar with distributed Splunk Enterprise deployment before proceeding. For more information about distributed deployment, see Distributed Splunk Enterprise overview in the Distributed Deployment Manual, and Components of a Splunk Enterprise deployment in the Capacity Planning Manual. For more information about deployment server, see About deployment server and forwarder management in the Updating Splunk Enterprise Instances manual.

Distributed deployment scenarios

This section describes three common distributed deployment scenarios for accepting and indexing large quantities of event data. There are many more possible scenarios, of course, but these provide a starting point for planning a Splunk Enterprise deployment that will ingest large quantities of data using HTTP Event Collector.

The scenarios are presented in order of capacity, from lowest to highest.

In all distributed deployment configurations, HTTP Event Collector receives events. Then, depending on which configuration is chosen, Splunk Enterprise will either index the events locally or forward them to a pool of indexers.

Scenario 1: One HEC server, pool of indexers

A typical HTTP Event Collector distributed deployment might look like this:

In this scenario, clients send event data to HTTP Event Collector running on a single Splunk Enterprise instance that acts as a forwarder. This instance distributes the event data evenly to indexers. You can specify the groups of indexers to send data to by configuring an output group. Once the data is indexed, you can search it using a single search head or using distributed search.

Note: Deploying HTTP Event Collector on a forwarder requires the use of a full Splunk Enterprise install configured as a forwarder—that is, a heavy forwarder. HTTP Event Collector is not supported on universal forwarders.
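For example, the forwarder's outputs.conf might define a single output group across which events are automatically load-balanced. This is a minimal sketch; the group name, host names, and receiving port are placeholders (9997 is the conventional Splunk receiving port):

[tcpout]
defaultGroup = hec_indexers

[tcpout:hec_indexers]
server = indexer1.example.com:9997, indexer2.example.com:9997

With defaultGroup set, all data that the instance forwards, including HEC events, is distributed across the indexers listed in the group.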

Because this scenario only involves one instance of Splunk Enterprise using HTTP Event Collector, a deployment server is not necessary to distribute configuration and settings.

This scenario is sufficient for providing reliability when the HTTP Event Collector data ingestion volume is not high—for instance, when adding HEC data collection to an existing distributed Splunk Enterprise deployment. For deployments that will be accepting larger quantities of HEC data, see the next section.

Scenario 2: Traffic load balancer, no forwarder, pool of indexers, using deployment server

If one HEC data input endpoint is inadequate for fielding the number of HTTP requests that are being sent to it, you might instead implement something like this:

This is called traffic load balancing. In this scenario, so many clients are making HTTP requests that a single HTTP Event Collector endpoint would be overwhelmed. To compensate, set up a network traffic load balancer, such as NGINX, in front of several Splunk Enterprise indexers. The load balancer distributes client traffic among several HTTP Event Collector endpoints, and the Splunk Enterprise instances behind it index the HEC data. Once the data is indexed, you can search it using a single search head or using distributed search.

This scenario relies on deployment server to distribute configuration to the indexers. Each indexer is a deployment client. Tokens are managed centrally on the Splunk Enterprise instance running deployment server, using the UI, CLI, or REST API. Any configuration changes are then made available to the deployment clients.
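For example, you can create a token on the deployment server through the management REST API. This is a hedged sketch; the host name, administrator credentials, and token name are placeholders, and the request goes to the management port (8089 by default):

    curl -k -u admin:changeme https://deployment-server.example.com:8089/services/data/inputs/http -d name=hec_token_for_app_logs

The resulting configuration change is then made available to the deployment clients, as described above.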

The advantage of this scenario is increased data volume capacity and high availability. By indexing on the same Splunk Enterprise instances that are collecting HTTP Event Collector data, you don't need a separate tier of forwarders, and you thereby lessen complexity. However, if the volume of incoming data gets high enough, it could potentially overwhelm the data I/O, which could impact data ingestion performance. If you find that you need more performance or control in how you scale out, consider the next scenario.

Note: For more information about how to set up an NGINX load balancer for use with HTTP Event Collector, see Configure an NGINX load balancer for HTTP Event Collector.

Scenario 3: Traffic load balancer, multiple HEC instances running on forwarders, each forwarding to one or more indexers, using deployment server

The third scenario is for the highest throughput and volume of data:

In this scenario, you also use traffic load balancing, but instead of routing the data to HTTP Event Collector instances running on indexers, HTTP Event Collector runs on heavy forwarders. The forwarders distribute the data to dedicated indexers or groups of indexers. As with the previous scenario, you use deployment server to distribute configuration to the forwarders, which are the deployment clients.

Note: Deploying HTTP Event Collector on a forwarder requires the use of a full Splunk Enterprise install configured as a forwarder—that is, a heavy forwarder. HTTP Event Collector is not supported on universal forwarders.

The advantages of this scenario are maximum throughput, scale, and availability. You have a dedicated pool of HTTP Event Collector instances whose only job is to receive and forward data. You can add more HEC instances without necessarily having to add more indexers, and if the indexers become a bottleneck, you can add more indexers.

Though you've increased reliability in several places in this architecture, the tradeoff is more moving parts and increased complexity.

Specifying groups of indexers

To index large amounts of data, you will likely need multiple indexers. You can specify groups of indexers to handle indexing your HTTP Event Collector data. These are called output groups. You can use output groups to, for example, index only certain kinds of data or data from certain sources. Though using output groups to route data to specific indexers is similar to the routing and filtering capabilities built into Splunk Enterprise, output groups allow you to specify groups of indexers on a token-by-token basis.

When you configure output groups with multiple indexers, Splunk Enterprise evenly distributes data among the servers in your output group.

You configure output groups in the outputs.conf file. Specifically, for HTTP Event Collector, edit the outputs.conf file at $SPLUNK_HOME/etc/apps/splunk_httpinput/local/ (%SPLUNK_HOME%\etc\apps\splunk_httpinput\local\ on Microsoft Windows hosts). If the local directory or the outputs.conf file doesn't exist at this location, create whichever is missing.

Note: HTTP Event Collector is not an app, but it stores its configuration in the $SPLUNK_HOME/etc/apps/splunk_httpinput/ directory (%SPLUNK_HOME%\etc\apps\splunk_httpinput\ on Windows) so that its configuration can be easily deployed using built-in app deployment capabilities.
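For example, you might define an output group in outputs.conf and then route a specific token to it with the token-level outputgroup setting in the inputs.conf file in the same directory. This is a sketch that assumes the token-level outputgroup setting; the group name, host names, and token stanza are placeholders:

# outputs.conf
[tcpout:security_indexers]
server = idx1.example.com:9997, idx2.example.com:9997

# inputs.conf
[http://security_events]
token = <token value>
outputgroup = security_indexers

Events received with the security_events token are forwarded only to the indexers in the security_indexers group.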

Setting up distributed deployment of HTTP Event Collector data

If you need to use multiple HTTP Event Collector endpoints, such as in the second and third scenarios above, you'll need to set up a distributed deployment that uses deployment server.

Setting up a distributed deployment is covered elsewhere in Splunk documentation, but here you'll find information specific to HTTP Event Collector and HEC token management. It's important to remember that using HTTP Event Collector and distributing its configuration in a distributed deployment uses the standard, built-in deployment server mechanism. If you're familiar with distributed Splunk Enterprise, you already have the tools you need to set up distributed HTTP Event Collector.

To set up a distributed deployment of HTTP Event Collector, do the following:

  1. Plan the deployment. Decide which Splunk Enterprise instances will be used as deployment clients, and which instance will be the deployment server. If your scenario uses a network traffic load balancer, also decide which host will run it. For help with doing this, see Plan a deployment in the Updating Splunk Enterprise Instances manual.

  2. Define a server class. A server class is a group of deployment clients that you can manage as a single unit. Assign the deployment clients you want to use in your HTTP Event Collector deployment to a common server class. Later, when you distribute HTTP Event Collector settings to the deployment clients, only members of that server class will receive the configuration settings. Edit the serverclass.conf file on the deployment server, at $SPLUNK_HOME/etc/system/local/serverclass.conf (%SPLUNK_HOME%\etc\system\local\serverclass.conf on Windows hosts). If serverclass.conf doesn't exist in local, copy it from $SPLUNK_HOME/etc/system/default/ (%SPLUNK_HOME%\etc\system\default\ on Windows) and then edit the copied file. Do not directly edit the serverclass.conf file in the default directory. For information about defining server classes, see Use serverclass.conf to define server classes in the Updating Splunk Enterprise Instances manual. For an example serverclass.conf file set up for HEC, see the "Example serverclass.conf file" section.

  3. Copy the entire current $SPLUNK_HOME/etc/apps/splunk_httpinput/ directory into $SPLUNK_HOME/etc/deployment-apps/. (On Windows, copy the entire current %SPLUNK_HOME%\etc\apps\splunk_httpinput\ directory into %SPLUNK_HOME%\etc\deployment-apps\.) This is a one-time step that is necessary on the deployment server.

  4. Set options. On the deployment server, set options for your deployment clients. At a minimum, you must set the useDeploymentServer option globally (in the [http] stanza) in $SPLUNK_HOME/etc/apps/splunk_httpinput/local/inputs.conf (%SPLUNK_HOME%\etc\apps\splunk_httpinput\local\inputs.conf on Windows hosts). Setting this option causes Splunk Enterprise to use the $SPLUNK_HOME/etc/apps/splunk_httpinput/ directory (%SPLUNK_HOME%\etc\apps\splunk_httpinput\ on Windows) for storing and retrieving configuration; a sketch of this stanza follows these steps. For more information on the available settings, see Configure HTTP Event Collector using .conf files.

  5. Enable deployment server. At the command line of the deployment server, execute the following to enable deployment server and restart Splunk Enterprise:

    splunk enable deploy-server
    splunk restart
  6. Prepare deployment clients. On each client, you must specify the deployment server it will connect to. Run the following at the command line on each client, where <deployment_server> indicates the hostname of the deployment server, to specify the deployment server and restart Splunk Enterprise:

    splunk set deploy-poll <deployment_server>:8089
    splunk restart
    For more information about configuring deployment clients, see Configure deployment clients in the Updating Splunk Enterprise Instances Manual.
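The following sketch illustrates the global setting described in step 4. It assumes the boolean form of the useDeploymentServer setting; adjust the rest of the [http] stanza to match your environment:

# inputs.conf under $SPLUNK_HOME/etc/apps/splunk_httpinput/local/ on the deployment server
[http]
useDeploymentServer = 1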

Once the deployment server is enabled and HTTP Event Collector is properly configured, all changes to HEC settings that are made on the deployment server using the UI, the CLI, or REST API are sent to the deployment clients. This configuration information includes:

  • HTTP Event Collector default values (port, SSL, source type, index)
  • SSL settings
  • HTTP Event Collector tokens

For more information about distributed deployment, including advanced configuration options and general examples, see the Updating Splunk Enterprise Instances manual.

Example serverclass.conf file

A server class is a group of deployment clients that you can manage as a single unit. You assign the deployment clients you want to use in your HTTP Event Collector deployment to one common server class. Later, when you distribute HTTP Event Collector settings to the deployment clients, only members of that server class will receive the configuration settings.

You define server classes in the serverclass.conf file. Edit the serverclass.conf file on the deployment server, at $SPLUNK_HOME/etc/system/local/serverclass.conf (%SPLUNK_HOME%\etc\system\local\serverclass.conf on Windows). If serverclass.conf doesn't exist in local, copy it from $SPLUNK_HOME/etc/system/default/ (%SPLUNK_HOME%\etc\system\default\ on Windows) and then edit the copied file. Do not directly edit the serverclass.conf file in the default directory.

For information about defining server classes, see Use serverclass.conf to define server classes in the Updating Splunk Enterprise Instances manual.

The following example serverclass.conf file defines a server class "FWD2Local" for HTTP Event Collector.

[global]
whitelist.0=*
restartSplunkd=true
stateOnClient = enabled
 
[serverClass:FWD2Local]
whitelist.0=*

[serverClass:FWD2Local:app:splunk_httpinput]

The [global] stanza level defines settings that apply to all server classes. The [serverClass:<serverClassName>] stanza level defines settings that apply to an individual server class. You can have multiple server class stanzas. The [serverClass:<serverClassName>:app:<appName>] stanza level defines settings that apply to a specific app (<appName>) within an individual server class (<serverClassName>). For the purposes of deploying HTTP Event Collector settings, you can think of HEC as an app called "splunk_httpinput."

Within the stanzas, you can set client filtering attributes and several non-filtering attributes. In the above example, we've set the following attributes:

  • whitelist.0=* This is the whitelist client filter. Setting whitelist.0 to * indicates that all deployment clients match the server class.
  • restartSplunkd=true This non-filtering attribute specifies whether the client's splunkd process will restart after receiving an update.
  • stateOnClient = enabled This non-filtering attribute specifies whether the deployment client receiving an app should enable or disable the app once it is installed. You can set stateOnClient to enabled, disabled, or noop.

For more information about available client filtering attributes, see the section "Define filters through serverclass.conf" in the topic Set up client filters in the Updating Splunk Enterprise Instances Manual.

To learn more about available non-filtering attributes, see the section "What you can configure for a server class" in the Use serverclass.conf to define server classes topic in the Updating Splunk Enterprise Instances Manual.

Factors that impact performance

If you're experiencing performance slowdowns, or are just interested in speeding up your HTTP Event Collector deployment, consider the following factors that can affect performance:

  • HTTP vs. HTTPS: There is a significant performance improvement when sending data over HTTP versus sending data over HTTPS.
  • Batching: Batching multiple events into a single request can speed up data transmission. Because a request's metadata applies to all of the events in the request, less data is sent overall (see the example after this list). For more information about how event data is packaged, see Format events for HTTP Event Collector.
  • HTTP keep-alive: Enabling keep-alive on your connection can increase performance. As long as the client sending the data supports HTTP/1.1 and is set up to use a persistent connection, you're taking advantage of keep-alive.
  • Persistent queues: Persistent queuing slows down performance because it writes incoming data to an input queue on disk. For more information, see Use persistent queues to help prevent data loss.
  • Index-time field extraction: Extracting custom fields at index time adds processing work during ingestion, which can affect indexing performance. For more information about index-time field extraction, see Create custom fields at index time.
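Here is the batching example referenced above: a single request that carries three events. This is a sketch; the host name and token are placeholders, and the -k flag (which skips certificate verification) is appropriate only for testing against a self-signed certificate:

    curl -k https://hec.example.com:8088/services/collector \
        -H "Authorization: Splunk <your HEC token>" \
        -d '{"event": "event one"}{"event": "event two"}{"event": "event three"}'

All three events are delivered with a single set of HTTP request metadata, which is what makes batching more efficient than sending each event in its own request.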