How to run searches and jobs using the Splunk SDK for Ruby

The Splunk SDK for Ruby is deprecated. For more information, see Deprecation notice.

Searches run in different modes, determining when and how you can retrieve results:

  • Normal: A normal search runs asynchronously. It returns a search job immediately. Poll the job to determine its status. You can retrieve the results when the search has finished. You can also preview the results if "preview" is enabled. Normal mode works with real-time searches.
  • Blocking: A blocking search runs synchronously. It does not return a search job until the search has finished, so there is no need to poll for status. Blocking mode doesn't work with real-time searches.
  • Oneshot: A oneshot search is a blocking search that is scheduled to run immediately. Instead of returning a search job, this mode returns the results of the search once completed. Because this is a blocking search, the results are not available until the search has finished.
  • Export: An export search is another type of search operation that runs immediately, does not create a job for the search, and starts streaming results immediately. Export mode works with real-time searches.

For those searches that produce search jobs (normal and blocking), the search results are saved for a period of time on the server and can be retrieved on request. For those searches that stream the results (oneshot and export), the search results are not retained on the server. If the stream is interrupted for any reason, the results are not recoverable without running the search again.

 

Search job APIs

The classes for working with jobs are Splunk::Jobs, the collection of search jobs, and Splunk::Job, an individual search job.

Retrieve the Jobs collection with the Splunk::Service#jobs method. From the collection, you can access individual jobs and create new ones.

 

Code examples

This section provides examples of how to use the search APIs, assuming you first connect to a Splunk® instance.

The parameters available for search jobs are described in Collection parameters and Search job parameters.

 

Blocking searches

A blocking search runs synchronously. It does not return a search job until the search has finished, so you don't need to poll it for status. There are two basic types of blocking searches: oneshot searches and export searches.

 
Performing oneshot searches

The simplest way to get data out of Splunk is with a oneshot search, which creates a synchronous search. Calling the Splunk::Service#create_oneshot method blocks until the search finishes and then returns a stream containing the events.

stream = service.create_oneshot("search index=_internal | head 1")

By default, the stream contains XML, which you can parse into proper events with the ResultsReader class. You can call the fields method on a ResultsReader object to get an array of strings that contains the names of all the fields that may appear in any of the events. To iterate over the results, call the each method on the ResultsReader.

results = Splunk::ResultsReader.new(stream)

puts "Fields: #{results.fields}"
results.each do |result|
  puts result["_raw"]
end
puts

You can also have the create_oneshot method return JSON or CSV by setting the :output_mode argument to "json" or "csv", respectively. The Splunk SDK for Ruby provides no parsing support for either format beyond what is already available in Ruby.

stream = service.create_oneshot("search index=_internal | head 1",
                                :output_mode => "json")
puts stream
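
Because the SDK leaves JSON parsing to you, Ruby's standard json library is usually enough. The following is a minimal sketch rather than SDK code: the sample payload is hypothetical, and it assumes the JSON output wraps the events in a top-level "results" array, which you should confirm against the output of your own server.

```ruby
require "json"

# Hypothetical sample of what a oneshot search might return with
# :output_mode => "json". The top-level "results" key is an assumption.
sample_stream = <<~JSON
  {
    "preview": false,
    "results": [
      {"_raw": "first event", "host": "web01"},
      {"_raw": "second event", "host": "web02"}
    ]
  }
JSON

# Parse the JSON text and return the list of event hashes.
# Returns an empty array when the payload has no "results" key.
def parse_json_results(text)
  payload = JSON.parse(text)
  payload.fetch("results", [])
end

events = parse_json_results(sample_stream)
raw_lines = events.map { |event| event["_raw"] }
puts raw_lines
```

With a real search, you would pass the value returned by create_oneshot in place of sample_stream. If that value turns out to be an IO-like object rather than a String, read it into a String first.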

Use hash arguments to set search parameters, such as :output_mode (in the previous example) and :earliest_time and :latest_time (both in the following example). For the full list of possible parameters, see Search job parameters. Be aware, however, that most of these parameters don't apply to a oneshot search.

stream = service.create_oneshot("search index=_internal | head 1",
                                :earliest_time => "-1h",
                                :latest_time => "now")
results = Splunk::ResultsReader.new(stream)
results.each do |result|
  puts result["_raw"]
end
 
Performing export searches

If you only need the events Splunk has returned, without any of the transforming search commands, perform an export search. Export searches run immediately, do not create jobs for the search, and start streaming results immediately.

To perform an export search, call the Splunk::Service#create_export method. It takes the same arguments as the Splunk::Service#create_oneshot method, but it returns the events produced before any transforming search commands, and therefore runs somewhat faster. Be aware that the following example skips any previews that the export returns.

stream = service.create_export("search index=_internal | head 1",
                               :earliest_time => "-1h",
                               :latest_time => "now")
readers = Splunk::MultiResultsReader.new(stream)
readers.each do |reader|
   reader.each do |result|
     puts result["_raw"]
   end
end
 

Normal searches

A normal search runs asynchronously and returns a search job immediately. You can poll the job to determine its status and retrieve the results when the search has finished. You can also preview the results if "preview" is enabled. Normal mode works with real-time searches.

Note: The examples included in this section are duplicated from the file "4_asynchronous_searches.rb" in the /examples folder of the Splunk SDK for Ruby.
 
Performing normal searches

For longer-running searches, you probably don't want to wait until the search finishes, as the create_oneshot method (discussed in Performing oneshot searches) does. In this case, use the Splunk::Service#create_search method. Instead of returning a stream, it creates an asynchronous job on the server and returns a Job object that references it.

job = service.create_search("search index=_internal | head 1",
                            :earliest_time => "-1d",
                            :latest_time => "now")
 
Polling for completion

Before you can do anything with a job, including reading its state, you must wait for it to be ready. You can check whether the job is ready using the Splunk::Job#is_ready? method.

while !job.is_ready?()
  sleep(0.1)
end

You will most likely just want to wait until the job is done and its events are ready to retrieve. For that, use the Splunk::Job#is_done? method instead. Be aware that a job is always ready before it's done.

while !job.is_done?()
  sleep(0.1)
end
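
A bare polling loop never gives up, so in practice you may want a deadline and a growing sleep interval. The following is a minimal sketch of such a helper, not part of the SDK. The name wait_until_done, its keyword arguments, and the FakeJob stand-in class are all hypothetical; the helper relies only on the is_done? method described above.

```ruby
# Poll a job-like object until it reports done, or give up after a deadline.
# `job` only needs to respond to is_done?, as Splunk::Job does.
def wait_until_done(job, timeout: 60, interval: 0.1, max_interval: 2.0)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  until job.is_done?
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "Search job did not finish within #{timeout} seconds"
    end
    sleep(interval)
    # Back off gradually so long searches are not polled too often.
    interval = [interval * 2, max_interval].min
  end
  job
end

# Hypothetical stand-in for a Splunk::Job, used only to make the sketch runnable.
# It reports done once it has been polled more than `done_after` times.
class FakeJob
  attr_reader :polls

  def initialize(done_after)
    @done_after = done_after
    @polls = 0
  end

  def is_done?
    @polls += 1
    @polls > @done_after
  end
end

fake = FakeJob.new(3)
wait_until_done(fake, timeout: 5, interval: 0.01)
puts "Finished after #{fake.polls} polls"
```

With a real job, the call would look like wait_until_done(job, timeout: 300). Doubling the interval keeps short searches responsive while reducing the number of requests made for long ones.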
 
Specifying results formatting

To get the transformed results (XML that you can parse with a ResultsReader object, equivalent to what the create_oneshot method returns), call the Splunk::Job#results method on the job. To get the untransformed events, call Splunk::Job#events. Both methods accept optional :offset and :count parameters, which are useful for retrieving large result sets in manageable sections.

stream = job.results(:count => 1, :offset => 0)
# Or: stream = job.events(:count => 3, :offset => 0)
results = Splunk::ResultsReader.new(stream)
results.each do |result|
  puts result["_raw"]
end
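
To walk through a large result set in sections, you need one :count and :offset pair per page. The following is a minimal sketch that only computes those argument hashes; the helper name page_params is hypothetical, and the total is assumed to be known in advance (for example, from the job's resultCount property).

```ruby
# Build the :count/:offset argument hashes needed to page through
# `total` results in chunks of at most `page_size`.
def page_params(total, page_size)
  raise ArgumentError, "page_size must be positive" unless page_size.positive?

  (0...total).step(page_size).map do |offset|
    # The last page may be shorter than page_size.
    { :count => [page_size, total - offset].min, :offset => offset }
  end
end

pages = page_params(250, 100)
pages.each { |args| puts args.inspect }
```

Each hash could then be passed to the results method, as in job.results(args), and the returned stream parsed with a ResultsReader as shown above.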
 
Performing real-time searches

Real-time searches are asynchronous and never finish, so neither the Splunk::Job#results nor the Splunk::Job#events method will work. Instead, you must call Splunk::Job#preview (which takes the same arguments as results and events).

rt_job = service.create_search("search index=_internal | head 1",
                               :earliest_time => "rt-1h",
                               :latest_time => "rt")

while !rt_job.is_ready?()
  sleep(0.1)
end

stream = rt_job.preview()
results = Splunk::ResultsReader.new(stream)
results.each do |result|
  puts result["_raw"]
end
 

Collection parameters

By default, all entities are returned when you retrieve a collection. Using the parameters below, you can specify the number of entities to return and how to sort them. These parameters are available whenever you retrieve a collection.

Parameter  Description

count A number that indicates the maximum number of entities to return.
offset A number that specifies the index of the first entity to return.
search A string that specifies a search expression to filter the response with, matching field values against the search expression. For example, "search=foo" matches any object that has "foo" as a substring in a field, and "search=field_name%3Dfield_value" restricts the match to a single field.
sort_dir An enum value that specifies how to sort entities. Valid values are "asc" (ascending order) and "desc" (descending order).
sort_key A string that specifies the field to sort by.
sort_mode An enum value that specifies how to sort entities. Valid values are "auto", "alpha" (alphabetically), "alpha_case" (alphabetically, case sensitive), or "num" (numerically).
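
The "search=field_name%3Dfield_value" form in the table is simply the URL-encoded version of field_name=field_value. The following minimal sketch uses Ruby's standard uri library to show how a hash of collection parameters maps to a query string. The parameter values are illustrative only, and when you go through the SDK you would normally not need to encode them yourself.

```ruby
require "uri"

# Collection parameters expressed as a Ruby hash (values are illustrative).
collection_params = {
  :count    => 10,
  :offset   => 0,
  :search   => "field_name=field_value",
  :sort_key => "name",
  :sort_dir => "desc"
}

# encode_www_form percent-encodes reserved characters in values,
# so the "=" inside the search expression becomes "%3D".
query = URI.encode_www_form(collection_params)
puts query
```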
 

Search job parameters

Properties to set

The parameters you can use for search jobs correspond to the parameters for the search/jobs endpoint in the REST API.

This list summarizes the properties you can set for a search job (for properties you can retrieve, see Properties to retrieve). For examples of setting these properties, see Running a blocking search and displaying properties of the job and Running a normal search and polling for completion.

Parameter  Description

search Required. A string that contains the search query.
auto_cancel The number of seconds of inactivity after which to automatically cancel a job. 0 means never auto-cancel.
auto_finalize_ec The number of events to process after which to auto-finalize the search. 0 means no limit.
auto_pause The number of seconds of inactivity after which to automatically pause a job. 0 means never auto-pause.
earliest_time A time string that specifies the earliest time in the time range to search. The time string can be a UTC time (with fractional seconds), a relative time specifier (to now), or a formatted time string. For a real-time search, specify "rt".
enable_lookups A Boolean that indicates whether to apply lookups to events.
exec_mode An enum value that indicates the search mode ("blocking", "oneshot", or "normal").
force_bundle_replication A Boolean that indicates whether this search should cause (and wait depending on the value of "sync_bundle_replication") bundle synchronization with all search peers.
id A string that contains a search ID. If unspecified, a random ID is generated.
index_earliest A string that specifies the time for the earliest (inclusive) time bounds for the search, based on the index time bounds. The time string can be a UTC time (with fractional seconds), a relative time specifier (to now), or a formatted time string.
index_latest A string that specifies the time for the latest (inclusive) time bounds for the search, based on the index time bounds. The time string can be a UTC time (with fractional seconds), a relative time specifier (to now), or a formatted time string.
latest_time A time string that specifies the latest time in the time range to search. The time string can be a UTC time (with fractional seconds), a relative time specifier (to now), or a formatted time string. For a real-time search, specify "rt".
max_count The number of events that can be accessible in any given status bucket.
max_time The number of seconds to run this search before finalizing. Specify 0 to never finalize.
namespace A string that contains the application namespace in which to restrict searches.
now A time string that sets the absolute time used for any relative time specifier in the search.
reduce_freq The number of seconds (frequency) to run the MapReduce reduce phase on accumulated map values.
reload_macros A Boolean that indicates whether to reload macro definitions from the macros.conf configuration file.
remote_server_list A string that contains a comma-separated list of (possibly wildcarded) servers from which to pull raw events. This same server list is used in subsearches.
rf A string that adds one or more required fields to the search.
rt_blocking A Boolean that indicates whether the indexer blocks if the queue for this search is full. For real-time searches.
rt_indexfilter A Boolean that indicates whether the indexer pre-filters events. For real-time searches.
rt_maxblocksecs The number of seconds indicating the maximum time to block. 0 means no limit. For real-time searches with "rt_blocking" set to "true".
rt_queue_size The number indicating the queue size (in events) that the indexer should use for this search. For real-time searches.
search_listener A string that registers a search state listener with the search. Use the format: search_state;results_condition;http_method;uri;
search_mode An enum value that indicates the search mode ("normal" or "realtime"). If set to "realtime", searches live data. A real-time search is also specified by setting "earliest_time" and "latest_time" parameters to "rt", even if the search_mode is normal or is not set.
spawn_process A Boolean that indicates whether to run the search in a separate spawned process. Searches against indexes must run in a separate process.
status_buckets The maximum number of status buckets to generate, which corresponds to the size of the data structure used to store timeline information. A value of 0 means to not generate timeline information.
sync_bundle_replication A Boolean that indicates whether this search should wait for bundle replication to complete.
time_format A string that specifies the format to use to convert a formatted time string from {start,end}_time into UTC seconds.
timeout The number of seconds to keep this search after processing has stopped.
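
Some of these parameters constrain each other. For example, exec_mode accepts only three values, and blocking mode doesn't work with real-time searches. The following is a minimal sketch of a pre-flight check you might run on an argument hash before creating a search. The helper names are hypothetical and not part of the SDK. Treating any time bound that starts with "rt" (such as "rt-1h") as real-time, and defaulting exec_mode to "normal", are assumptions.

```ruby
# Allowed exec_mode values, taken from the parameter table above.
EXEC_MODES = %w[blocking oneshot normal].freeze

# Returns true when the arguments describe a real-time search: either
# search_mode is "realtime" or both time bounds start with "rt".
# Matching on the "rt" prefix is an assumption made for this sketch.
def realtime_search?(args)
  return true if args[:search_mode] == "realtime"

  [args[:earliest_time], args[:latest_time]].all? do |time|
    time.to_s.start_with?("rt")
  end
end

# Hypothetical pre-flight check; raises ArgumentError on a bad combination
# and returns the arguments unchanged otherwise.
def check_search_args(args)
  mode = args.fetch(:exec_mode, "normal") # default is an assumption
  unless EXEC_MODES.include?(mode)
    raise ArgumentError, "unknown exec_mode: #{mode}"
  end
  if mode == "blocking" && realtime_search?(args)
    raise ArgumentError, "blocking mode doesn't work with real-time searches"
  end
  args
end

rt_args = { :earliest_time => "rt-1h", :latest_time => "rt" }
puts realtime_search?(rt_args)
puts realtime_search?(:earliest_time => "-1h", :latest_time => "now")
```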
 

Properties to retrieve

This list summarizes the properties that are available for an existing search job:

Property  Description

cursorTime The earliest time from which no events are later scanned.
delegate For saved searches, specifies jobs that were started by the user.
diskUsage The total amount of disk space used, in bytes.
dispatchState The state of the search. Can be any of QUEUED, PARSING, RUNNING, PAUSED, FINALIZING, FAILED, DONE.
doneProgress A number between 0 and 1.0 that indicates the approximate progress of the search.
dropCount For real-time searches, the number of possible events that were dropped due to the "rt_queue_size".
eai:acl The access control list for this job.
eventAvailableCount The number of events that are available for export.
eventCount The number of events returned by the search.
eventFieldCount The number of fields found in the search results.
eventIsStreaming A Boolean that indicates whether the events of this search are being streamed.
eventIsTruncated A Boolean that indicates whether events of the search have not been stored.
eventSearch Subset of the entire search before any transforming commands.
eventSorting A Boolean that indicates whether the events of this search are sorted, and in which order ("asc" for ascending, "desc" for descending, and "none" for not sorted).
isDone A Boolean that indicates whether the search has finished.
isFailed A Boolean that indicates whether there was a fatal error executing the search (for example, if the search string syntax was invalid).
isFinalized A Boolean that indicates whether the search was finalized (stopped before completion).
isPaused A Boolean that indicates whether the search has been paused.
isPreviewEnabled A Boolean that indicates whether previews are enabled.
isRealTimeSearch A Boolean that indicates whether the search is a real time search.
isRemoteTimeline A Boolean that indicates whether the remote timeline feature is enabled.
isSaved A Boolean that indicates whether the search is saved indefinitely.
isSavedSearch A Boolean that indicates whether this is a saved search run using the scheduler.
isZombie A Boolean that indicates whether the process running the search is dead, but with the search not finished.
keywords All positive keywords used by this search. A positive keyword is a keyword that is not in a NOT clause.
label A custom name created for this search.
messages Errors and debug messages.
numPreviews Number of previews that have been generated so far for this search job.
performance A representation of the execution costs.
priority An integer between 0 and 10 that indicates the search's priority.
remoteSearch The search string that is sent to every search peer.
reportSearch If reporting commands are used, the reporting search.
request GET arguments that the search sends to splunkd.
resultCount The total number of results returned by the search, after any transforming commands have been applied (such as stats or top).
resultIsStreaming A Boolean that indicates whether the final results of the search are available using streaming (for example, no transforming operations).
resultPreviewCount The number of result rows in the latest preview results.
runDuration A number specifying the time, in seconds, that the search took to complete.
scanCount The number of events that are scanned or read off disk.
searchEarliestTime The earliest time for a search, as specified in the search command rather than the "earliestTime" parameter. It does not snap to the indexed data time bounds for all-time searches (as "earliestTime" and "latestTime" do).
searchLatestTime The latest time for a search, as specified in the search command rather than the "latestTime" parameter. It does not snap to the indexed data time bounds for all-time searches (as "earliestTime" and "latestTime" do).
searchProviders A list of all the search peers that were contacted.
sid The search ID number.
ttl The time to live, or time before the search job expires after it has finished.
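
These properties can be combined into a short status line while a job runs. The following is a minimal sketch. The helper name describe_job is hypothetical, the sample values are made up, and a plain Hash stands in for the job. It assumes the properties can be looked up by name with [] and that values may arrive as strings, so they are converted explicitly.

```ruby
# Summarize a job's state from its properties.
# `props` is any object that supports [] with property names, such as a Hash.
# Float() and Integer() accept both numbers and numeric strings.
def describe_job(props)
  state = props["dispatchState"]
  percent = (Float(props["doneProgress"]) * 100).round
  events = Integer(props["eventCount"])
  "#{state} (#{percent}% done, #{events} events)"
end

# Hypothetical property values, used only to make the sketch runnable.
sample = {
  "dispatchState" => "RUNNING",
  "doneProgress"  => "0.5",
  "eventCount"    => "1280"
}
puts describe_job(sample)
```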