
Splunk Dev: Archive for the 'dev' Tab

April 29th, 2008

WMI comes to Splunk

The Windows release of Splunk Preview debuts with WMI (Windows Management Instrumentation) support. So, what is WMI, for all you splunkheads out there? It’s an OS interface that allows “instrumented components to provide information and notification”. WMI lets you query system instrumentation data such as system performance, event logs, and countless other events that occur on the system. It can also do this agentlessly from remote machines. The most exciting feature is the ability to collect Windows event logs from other machines on your network simultaneously. A Splunk install is not required on every node that generates this data, and you don’t need to do anything special to facilitate this, assuming you’ve set up proper authentication between the machines, of course. Setting up proper WMI security is a hot topic on its own.

From a configuration standpoint, WMI can be used in Splunk in two ways: to pull event logs and to query instrumentation data. Assuming that you have sufficient credentials to poll event logs agentlessly, you simply specify the host name and the event log you are interested in. This example retrieves “Application” event logs from a remote machine named “remotehost”:

[WMI:RemoteApplication]
namespace = \\remotehost\root\cimv2
interval = 10
event_log_file = Application
disabled = 0

The other aspect of WMI warrants more explanation. To get data from WMI providers, you query them using WQL (WMI Query Language), which is a subset of SQL. Simply specify a query, and all fields returned by the provider are automatically collated into an event. (Some queries return multiple results, and hence generate multiple events.) An example query is select FreeMegabytes from Win32_PerfFormattedData_PerfDisk_LogicalDisk, which polls free disk space from all logical disk partitions on the

Read More...

April 28th, 2008

Splunk Windows Registry Monitor

Hey everyone, just wanted to let you know that a preview release of Splunk just left the docks.

http://www.splunk.com/index.php/preview

I want to introduce one of the latest features for Splunk on Windows: monitoring the Windows registry in real time for activity/events, and indexing and searching those events with Splunk.

While working on this we had a few challenges:

First, there aren’t any published Win32 APIs that do this in user mode. The best you can do with the Win32 API is to poll the registry for certain keys/hives, and you’ll be notified when a key or subkey of the hive has changed. Even when you get a change notification, you will not be told exactly which key changed; you’ll have to figure that out yourself.

Second, scalability. You can’t possibly poll all of the registry in user mode for changes. There are simply too many keys to query.

The solution is to write a device driver that hooks into the kernel and intercepts all registry events. The driver bubbles the events up to user mode for filtering and tagging, and they are finally piped to Splunk for indexing. Obviously, this driver needs to be very stable and reliable, and it needs to scale to the point where, if you want to monitor every event in the registry, it can handle the load.
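The last hop, piping events from a user-mode tool into an indexer, can be pictured as a parent process reading a child's standard output line by line. The following is a minimal Python 3 sketch under that assumption only; read_events is a hypothetical helper and is not Splunk's actual implementation.

```python
import subprocess
import sys


def read_events(command):
    """Yield one event per line written to stdout by a child process.

    Hypothetical helper for illustration; not part of Splunk.
    """
    process = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        for line in process.stdout:
            yield line.rstrip("\n")
    finally:
        # Always release the pipe and reap the child, even on early exit.
        process.stdout.close()
        process.wait()


if __name__ == "__main__":
    # Stand-in child process that prints two fake events.
    child = [sys.executable, "-c", "print('event one'); print('event two')"]
    for event in read_events(child):
        print("received:", event)
```

Reading line by line keeps memory use flat no matter how many events the child emits, which matters when the source is as busy as the registry.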

With this preview release we launched the first version of the splunk-regmon tool. The tool writes events to standard output and, using Splunk’s ExecProcessor (popen), Splunk is able to get these events and send them through the indexing pipeline. Basic filtering is in place, hard-coded for now to monitor only registry events related to changes, i.e. Create, Delete, Set, etc. Create type events

Read More...

March 27th, 2008

Splunk for Virtualization

I’m looking for some help.
I’ve built a VMware app for Splunk and am in the process of doing the same for Xen. These apps use the VMware and XenSource APIs to index everything about the VM environment. When combined with Splunk instances running within the guest OS, you get a very comprehensive historical picture. I’m curious: are there any Splunk customers out there using VMware or Xen? I’m looking for use cases so that I can better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMware have so much data available that configuration could be complicated. I’m trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

Thanks
e.

Read More...

March 26th, 2008

The Splunk Python client library (part 1)

Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.

The easiest way to get started with the client library is to get into Splunk’s Python environment. Locate your Splunk install directory (/opt/splunk by default), and start the Python interactive shell that comes with Splunk:

# bin/splunk cmd python

This will launch the interactive Python prompt, which starts off looking like this:

Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Starting a search

Import the Splunk modules:

import splunk.auth
import splunk.search as se

If you have installed Splunk with the default settings, then your hostpath is https://localhost:8089. The client library knows this default, so you can authenticate directly by providing a username and password:

key = splunk.auth.getSessionKey('admin','changeme')

The getSessionKey method automatically caches the session key in the current interactive session, so you don’t have to pass it along to subsequent methods. In a production implementation, or if you are connecting to multiple servers, you’ll need to keep track of separate session keys.

If your server is on a different hostname or port, then you need to first update the session defaults:

splunk.mergeHostPath('splunk_hostname:12000', True)
key = splunk.auth.getSessionKey('admin','changeme')

The mergeHostPath method takes host information in many different forms:

  • hostname
  • hostname:port
  • https://hostname
  • http://hostname:port
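To make the four accepted forms concrete, here is a minimal sketch of how such input could be normalized into a full scheme://host:port string. This is not the library's implementation: normalize_host_path is a hypothetical helper, it is written for current Python 3 rather than the Python 2.5 shell shown above, and it assumes the https scheme and port 8089 defaults mentioned earlier.

```python
from urllib.parse import urlsplit

# Assumed defaults, taken from the default hostpath https://localhost:8089.
DEFAULT_SCHEME = "https"
DEFAULT_PORT = 8089


def normalize_host_path(value, default_scheme=DEFAULT_SCHEME,
                        default_port=DEFAULT_PORT):
    """Return 'scheme://host:port' for any of the four input forms.

    Hypothetical helper for illustration; not part of the Splunk library.
    """
    # urlsplit only recognizes the host when a scheme is present.
    if "://" not in value:
        value = "%s://%s" % (default_scheme, value)
    parts = urlsplit(value)
    port = parts.port if parts.port is not None else default_port
    return "%s://%s:%d" % (parts.scheme, parts.hostname, port)


if __name__ == "__main__":
    for form in ("hostname", "hostname:12000",
                 "https://hostname", "http://hostname:8000"):
        print(form, "->", normalize_host_path(form))
```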

Next, start a search:

job = se.dispatch('search error')

This creates a search job handle object, job, and starts a search on the server for events that contain the term “error”. If you are connecting to multiple servers, then you’ll also need to provide the hostPath and sessionKey parameters. This handle is keyed off of the search job ID that is generated by the server, and is available via:

job.id

With this ID, you can always use

Read More...

March 6th, 2008

Using the Atom Feed Format in Enterprise Software

XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments. However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema, resulting in lots of custom code to interface with various systems. Lots of custom code leads to brittleness, and brittleness leads to frustration. The key to salvation lies in standardization.

Enter the Atom standard: a standards-track schema that defines a generic collection/item container format in XML. Most people equate Atom to an RSS competitor, which is true, but that only covers half of what it does. The Atom Publishing Protocol is a well-defined protocol for performing CRUD (Create, Read, Update, Delete) operations on items over HTTP. The Atom Syndication Format, which is the most commonly used portion, defines the XML schema used to deliver data during a Read operation. Atom was spearheaded by Sam Ruby, is now backed by people like Brad Fitzpatrick, Tim Bray, Jeremy Zawodny, and Mark Pilgrim, and is heavily implemented by Google.

Like most software systems, the majority of Splunk’s internal entities can be loosely viewed as collections of similar items. The requested searches, configuration information, saved searches, users, and roles are all just collections. So instead of creating five separate XML schemas, one for each of these collections, that perfectly describe their contents, I chose Atom to serve as a single generic container to describe all of the entities. This kind of reuse is echoed by Pat Helland of Amazon, who gives a great talk relating the rise of the industrial age to standardization, and Tim Bray (Mr. XML himself), who advocates against creating your own XML unless absolutely necessary.
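As an illustration of the "one generic container" idea, the sketch below wraps any collection of (id, title) pairs in an Atom feed using Python 3's standard library. The function name, the urn:example identifiers, and the sample collection are assumptions made for this example; they do not reflect Splunk's actual output, and the sketch does not aim for full RFC 4287 validity (for instance, it omits author).

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def _child(parent, tag, text):
    """Append an Atom-namespaced child element holding the given text."""
    element = ET.SubElement(parent, "{%s}%s" % (ATOM_NS, tag))
    element.text = text
    return element


def build_feed(feed_id, title, updated, items):
    """Serialize a collection of (id, title) pairs as one Atom feed."""
    ET.register_namespace("", ATOM_NS)  # emit Atom as the default namespace
    feed = ET.Element("{%s}feed" % ATOM_NS)
    _child(feed, "id", feed_id)
    _child(feed, "title", title)
    _child(feed, "updated", updated)
    for item_id, item_title in items:
        entry = ET.SubElement(feed, "{%s}entry" % ATOM_NS)
        _child(entry, "id", item_id)
        _child(entry, "title", item_title)
        _child(entry, "updated", updated)
    return ET.tostring(feed, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical collection: two saved searches.
    print(build_feed("urn:example:saved-searches", "Saved searches",
                     "2008-03-06T00:00:00Z",
                     [("urn:example:1", "errors"),
                      ("urn:example:2", "warnings")]))
```

The same function serves users, roles, or search jobs unchanged; only the items differ, which is the point of choosing a generic container.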

The benefit of sticking to a standard is that there is a much greater chance that external developers already know exactly

Read More...

March 6th, 2008

Splunk Replay: Search results in motion

Inspired by glTail.rb and Digg Labs’ Stack, Splunk Replay is an animated data visualization that “replays” search results as a simulated event stream. The simulation displays events at a rate proportional to the times at which the events originally occurred.

Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart. Upon landing in the column chart, one of the event’s fields is output in a readable format below the chart. Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones. Rolling your mouse over any column displays the field values for that column.
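One way to read "a rate proportional to the times at which the events originally occurred" is that the gap between two displayed events is the original gap divided by a constant speedup. The Python 3 sketch below works under that assumption; replay_delays and the speedup parameter are hypothetical and are not taken from Splunk Replay itself.

```python
def replay_delays(timestamps, speedup=60.0):
    """Return the wait, in seconds, before displaying each event.

    timestamps: original event times as epoch seconds, in any order.
    speedup: how many times faster than real time to replay (assumed knob).
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    delays = [0.0]  # the first event is shown immediately
    for previous, current in zip(ordered, ordered[1:]):
        delays.append((current - previous) / speedup)
    return delays


if __name__ == "__main__":
    # Events at 0 s, 60 s and 180 s, replayed 60 times faster than real time.
    print(replay_delays([0, 60, 180], speedup=60.0))
```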

Read More...

March 3rd, 2008

Exploring Splunk’s REST API

Splunk 3.2 is available for download! This release is one of our biggest so far, representing a tremendous amount of effort by our engineering team, and is a product that I’m proud to stand behind. As I mentioned in my last post about our push for the Splunk Platform, a central tenet is to make a compelling product that developers will not only understand, but also enjoy using. While Dr. LogLogic rambles on about how catering to developers sucks, we know that developers are a huge part of our user base (drop by the #splunk channel on EFNet sometime) and we will continue to make Splunk as flexible and extensible as possible.

With 3.2, we have begun moving some of Splunk’s core services over to a proper REST API. Now, for those of you who have already been using the REST API in 3.1, the new API in 3.2 and beyond is distinctly different, and is intended to replace any older versions. Therefore, the REST API of version 3.1 and before will now be referred to as the UI API, and the term “REST API” will refer to the new API that I’m covering in this post.

Before I dive into the details though, I’d like to clarify the usage of “REST” and what I mean when I speak of it. First of all, REST is not a protocol or standard. There is no RFC or ISO specification for what constitutes REST; it is a philosophy about the relationship between entities in a software system and the interface used to interact with those entities. Roy Fielding’s original thesis named it Representational State Transfer, which when put into practice means that URIs should convey meaning in a durable manner. In essence, REST emphasizes the “what” of a system rather than the “how”. In comparison,

Read More...

March 1st, 2008

Splunk Cross-Platform JavaScript Profiler Complete!

I’ve completed a cross-platform Splunk JavaScript profiler that’s proving to be useful in identifying and eliminating performance issues, particularly in IE. It’s now merged into current, so here’s how to use it:

0. Load the Splunk UI. Be sure that compressStaticFiles is OFF.

1. Type javascript: Profiler.start() into your location bar

2. Wait for the message bar to say “Profiler is running…”

3. Do some stuff that you’d like to test for speed.

4. Type javascript: Profiler.report() to see a list of function times in a format similar to the Firebug profiler.

5. To compare two profiles, which is useful for testing code changes, optimizations, or browser vs. browser, look at the bottom of any profile for a long line of “Profile JSON”. Copy and paste this into a buffer somewhere (one that doesn’t corrupt the data by inserting newlines).

6. Then start over with a new profile. This time, when done, call javascript: Profiler.compare() instead of Profiler.report().

7. Paste the first report’s JSON into the prompt() box that appears.

8. The pasted report’s times are shown in parentheses, with significant variances (>20%) shown in bold.

9. It’s really handy to make bookmarklets for javascript: Profiler.start() etc. Saves typing.

Profiler.compare() output

Results are commensurate with the Firebug profiler. Notably, function times include the time consumed by other functions called within their scope, and if one of these happens to be a synchronous XMLHttpRequest, the total time will reflect the server portion as well.
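The comparison in steps 6 to 8 boils down to matching function names across two profiles and flagging large differences. Here is a minimal sketch of that logic in Python 3, assuming each profile is a mapping of function name to total milliseconds and that variance is measured relative to the pasted (earlier) profile. The data shape and compare_profiles are assumptions for illustration, not the profiler's actual JSON format or code.

```python
def compare_profiles(current, baseline, threshold=0.20):
    """Pair each function's time with its baseline and flag big changes.

    Returns (name, current_ms, baseline_ms, significant) tuples sorted by
    name; significant is True when the relative change exceeds threshold.
    """
    rows = []
    for name in sorted(current):
        now = current[name]
        before = baseline.get(name)  # None if the function is new
        significant = (
            before is not None
            and before > 0
            and abs(now - before) / before > threshold
        )
        rows.append((name, now, before, significant))
    return rows


if __name__ == "__main__":
    # Hypothetical timings in milliseconds.
    baseline = {"render": 100.0, "parse": 100.0}
    current = {"render": 130.0, "parse": 105.0}
    for name, now, before, significant in compare_profiles(current, baseline):
        marker = " *" if significant else ""
        print("%s: %.1f (%.1f)%s" % (name, now, before, marker))
```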

Enjoy! Let me know if you have questions or comments.

Read More...

February 22nd, 2008

You want a platform? We got your platform right here, buddy.

There has been a lot of talk about the Splunk Platform of late, but what exactly does it mean when we say we have a platform? I figured this would be an interesting question to spring upon unsuspecting members of the development team, and here’s what they (and I) had for our answers:

Browsing over on Wikipedia, one excerpt states that “a platform describes some sort of hardware architecture or software framework”, and the description for a software framework says it “may include support programs, code libraries, a scripting language, or other software to help develop and glue together the different components of a software project”.

Read More...

February 22nd, 2008

Delimiter based KV extraction - advanced

If you’ve read my previous post on delimiter based KV extraction, you might be wondering whether you could do more with it (Anonymous Coward did). Well, yes you can. I am going to cover the “advanced” cases here. Before covering the capabilities, as in other posts, I will first go over some observations and examples.

Observations
1. Header-body. Some applications, for different reasons, choose to format their log files using a header and a body section. The header usually describes the way the fields are organized in each logged event, while the body consists of logged events, usually one per line, with field values delimited as described in the header. W3C, CSV, etc. come to mind; see the examples.
2. Single-delimiter. Other applications choose to use a single delimiter to separate both keys from values and one key-value pair from the next. While this is not very common, it has been observed in the field.

Data Examples
The following header-body sample, as you can probably guess, is from an Exchange server. There is a header section which, among other things, lists the field names, separated by the same delimiter that separates values in the body section. In this case the delimiter is a tab character (even though our blogging platform chooses to mangle tabs to spaces - gotta love it !!!).

# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
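To show what header-body extraction has to do with a sample like this, here is a minimal Python 3 sketch: it takes the field names from the “# Fields:” line and zips them with the values on each body line. It illustrates the idea only and is not Splunk's extraction code; parse_header_body is a hypothetical name, and it splits on any whitespace because the tabs in the sample were mangled to spaces (a real tab-delimited log would split on "\t").

```python
def parse_header_body(lines, field_prefix="# Fields:"):
    """Turn header-body log lines into a list of field/value dicts."""
    fields = []
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(field_prefix):
            # The header names the fields, in body-column order.
            fields = line[len(field_prefix):].split()
        elif line.startswith("#") or not line.strip():
            continue  # other header comments and blank lines
        elif fields:
            events.append(dict(zip(fields, line.split())))
    return events


if __name__ == "__main__":
    sample = [
        "# Message Tracking Log File",
        "# Exchange System Attendant Version 6.5.7638.1",
        "# Fields: time client-ip cs-method sc-status",
        "14:13:11 10.1.1.9 HELO 250",
        "14:13:13 10.1.1.9 MAIL 250",
    ]
    for event in parse_header_body(sample):
        print(event)
```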

The following example shows how a single delimiter can be used to list fields. It is pretty easy for us, as humans, to recognize the key-value pairs:

"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"

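For the single-delimiter case, the pairing that humans do by eye can be sketched in a few lines of Python 3: split on the delimiter, then treat tokens at even positions as keys and the tokens after them as values. This is an illustration under the assumption that neither keys nor values contain the delimiter; parse_single_delimiter is a hypothetical name, not Splunk's implementation.

```python
def parse_single_delimiter(text, delimiter=" "):
    """Pair up alternating tokens as key, value, key, value, ..."""
    tokens = text.split(delimiter)
    # zip stops at the shorter slice, so a trailing key with no value
    # is silently dropped.
    return dict(zip(tokens[0::2], tokens[1::2]))


if __name__ == "__main__":
    event = "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
    print(parse_single_delimiter(event))
```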
Enabling header-body kv/extract
The delimiter based KV extraction solves the header-body problem by

Read More...

