
April 30th, 2008

Help Me Help You

Peoples of the Interweb,

As one of the Splunk Support Monkeys I am going to try to start a semi-regular series of posts on a topic that is near and dear to me — getting the Splunk community to be able to troubleshoot their issues without the need to reach out to the Support Team.

The most important piece of any troubleshooting exercise is getting a solid understanding of the problem. The common statement “Shit is broke” may ‘summarize’ the problem, but it doesn’t do much to isolate the specific issue. Taking a minute or two to think about the problem at hand and documenting the sequence of events leading up to it goes a long way toward getting outsiders up to speed on the issue.
Here are a few things to keep in mind when working with support:

I don’t work in the next cube over.

This means I don’t have insight into all of the other moving parts of your network. Try to avoid acronyms that are specific to your organization. I don’t know the naming convention that you use for machine names, so if one box is in LA and the other is in New York, tell me; don’t expect me to know that foo.company.com is sitting in the LA data center.

Less is not more.

You can never give a support engineer too much data. Oftentimes folks think that they have identified the offending error message in the logs and provide that one line in their support ticket. The problem with this is that the support engineer does not get the benefit of context. Most errors are the result of a series of events leading up to the final failure. Being able to see what was going on leading up to the problem oftentimes is what allows

Read More...

April 29th, 2008

WMI comes to Splunk

The Windows release of Splunk Preview debuts with WMI. So, what is WMI for all you splunkheads out there? It’s an OS interface which allows “instrumented components to provide information and notification”. WMI gives you the ability to query system instrumentation data such as system performance, event logs, and countless other events that occur on the system. It can also do this agentlessly from remote machines. The most exciting feature is the ability to collect Windows event logs from other machines on your network simultaneously. A Splunk install is not required on every single node that generates this data, and you don’t need to do anything special to facilitate this, assuming you’ve set up proper authentication between the machines, of course. Setting up proper WMI security is a hot topic on its own.

From the standpoint of configuration and what WMI is capable of doing, in the context of Splunk, WMI can be used in two ways: to pull event logs and to query instrumentation data. Assuming that you have enough credentials to poll event logs agentlessly, you can simply specify host name and the log file you are interested in. This is an example of retrieving “Application” event logs from a remote machine named “remotehost”:

[WMI:RemoteApplication]
namespace = \\remotehost\root\cimv2
interval = 10
event_log_file = Application
disabled = 0

The other aspect of WMI warrants more explanation. To get data from WMI providers, you query them using WQL (WMI Query Language), which is a subset of SQL. Simply specify a query, and all fields returned by the provider will be automatically collated as an event. (Some queries return multiple results, and hence generate multiple events.) An example query is select FreeMegabytes from Win32_PerfFormattedData_PerfDisk_LogicalDisk, which will poll free disk space from all logical disk partitions on the

Read More...

April 28th, 2008

Splunk Windows Registry Monitor

Hey everyone, just wanted to let you know that a preview release of Splunk just left the docks.

http://www.splunk.com/index.php/preview

I want to introduce you to one of the latest features for Windows Splunk: monitoring the Windows registry in real time for activity/events, and indexing and searching these events with Splunk.

While working on this we had a few challenges:

First, there aren’t any published win32 APIs that do this in user mode. The best that you can do with the win32 API is to poll the registry for certain registry keys/hives, and you’ll be notified if a key or subkey of the hive has been changed. Even when you get a notification for a change, you will not be told exactly which key has changed; you’ll have to figure that out yourself.

Second, scalability. You can’t possibly poll all of the registry in user mode for changes. There are simply too many keys to query.

The solution is to write a device driver that hooks into the kernel and intercepts all registry events. The driver bubbles the events up to user mode for filtering and tagging, and finally pipes them to Splunk for indexing. Obviously, this driver needs to be very stable and reliable, and it needs to scale: if you want to monitor all of the events in the registry, it should be able to handle the load.

With this preview release we launched the first version of the splunk-regmon tool. The tool writes events to standard output, and Splunk’s ExecProcessor (popen) picks up these events and sends them through the indexing pipeline. Basic filtering is in place, hard-coded for now to monitor only registry events related to changes, i.e. Create, Delete, Set, etc. Create type events

Read More...

April 24th, 2008

On the off chance you need help with Windows

Hello Internets,

As one of the splunkers responsible for answering the phone, I’m going to use this space to talk about something near and dear to my heart: empowering my customers so they are able to figure out their own problems, thereby allowing me to read FARK all day long.

Since we recently released our Windows version, a bunch of the folks in the office have been trying to figure out how to do the things they do in a UNIX environment (like wget a file) in Windows. I’ve been sharing some of my favorite Windows resources here at the office and figured the rest of you would probably like to know about them as well.

Google
Everyone seems to start here when they are looking for something. Most, however, don’t know that http://www.google.com/microsoft will restrict your search to Windows sites. They also have similar search sites for Linux, BSD, and the Mac.

SysInternals
Mark and Bryce have created the ultimate collection of free Windows utilities: simple executables that give you so many of the diagnostic/monitoring things that a UNIX admin takes for granted. Some of my favorites (and especially useful in working with Splunk), in no particular order:

  • AccessEnum
    Lets you see who has access to what. This is really helpful when trying to figure out why Splunk isn’t indexing one of your files.
  • Process Monitor
    Watch the registry, running process/thread/DLL, and file system usage in real-time
  • PS Tools
    A bunch of command-line utilities for listing the processes running, working with the event log, rebooting the machine, etc.
  • Active Directory Explorer
    Advanced viewer/editor for Active Directory. This will be a godsend if you are trying to configure Splunk to authenticate against your domain controller
  • WhoIS
    Doesn’t do much in the way of troubleshooting Splunk, but who doesn’t want to be able to see if ultramegaextrmeme.com is available and if not

Read More...

April 16th, 2008

overriding default syslog host extraction

I had a customer recently ask how to change the host that was applied to a particular set of incoming events. Normally this wouldn’t be a big deal, just specify the new name in inputs.conf. But this is from syslog. When you set one of the syslog sourcetypes there is some extra processing to extract the correct hostname which overrides other settings. And the hostname in the event is wrong.

So to get the right one, I set up this transform to force the host to a specified value while still giving the events my correct syslog sourcetype.
My inputs.conf is tailing an entire directory, which for sake of demonstration I’m going to pretend is all syslog.

$ more inputs.conf
host = support09.splunk.com
[tail:///var/log]
disabled = false
host = support09.splunk.com
sourcetype = syslog

props.conf is specifying a transform only for the source of interest:

$ more props.conf
[source::/var/log/system.log]
# note: overriding default syslog transform!
TRANSFORMS = feorlenhost

and transforms.conf is defining what to do to it. I have to specify a REGEX, but I’m not actually using it so I’ll just say ‘.’ to match everything. The FORMAT line is what is going to set my host:

$ more transforms.conf
[feorlenhost]
DEST_KEY = MetaData:Host
REGEX = .
FORMAT = host::feorlenhost.splunk.com

So whatever syslog put in there for host, ignore and use my static value instead.
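
To make the effect of that stanza concrete, here is a toy Python model of the idea: if REGEX matches the event, the key::value pair in FORMAT overwrites the host metadata. This is only a sketch of the behavior described above, not Splunk's actual pipeline; the function name apply_host_transform and the sample event are made up for illustration, and it is written for a current Python 3 interpreter.

```python
import re

def apply_host_transform(event, metadata, regex, fmt):
    """Return a copy of metadata with the FORMAT key overridden when regex matches.

    Hypothetical helper: it only mimics the REGEX/FORMAT idea from
    transforms.conf and is not how Splunk implements it.
    """
    updated = dict(metadata)
    if re.search(regex, event) is None:
        return updated  # no match: metadata is left as it was
    key, _, value = fmt.partition("::")
    updated[key] = value
    return updated

# Made-up syslog event whose embedded hostname is the "wrong" one.
event = "Apr 16 10:02:11 wronghost sshd[123]: session opened"
meta = {"host": "wronghost", "sourcetype": "syslog"}

result = apply_host_transform(event, meta, r".", "host::feorlenhost.splunk.com")
print(result["host"])        # feorlenhost.splunk.com
print(result["sourcetype"])  # syslog
```

Because ‘.’ matches any non-empty event, the override is effectively unconditional for the one source named in props.conf, and the sourcetype is left alone.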

Read More...

March 27th, 2008

Splunk for Virtualization

I’m looking for some help.
I’ve built a VMWare app for Splunk and am in the process of doing the same for Xen. These apps use the VMWare and Xensource APIs to index everything about the VM environment. When combined with Splunk instances running within the guest OS, you get a very comprehensive historical picture. I’m curious: are there any Splunk customers out there using VMWare or Xen? I’m looking for use cases so that I better understand how to configure the apps. I’d like to know what types of information would be useful to capture and what types of searches one would want to perform. Both Xen and VMWare have so much data available that configuration could be complicated. I’m trying to narrow it down to several useful out-of-the-box configurations. If you have any thoughts, comment here or email me at erik at splunk dot com.

Thanks
e.

Read More...

March 26th, 2008

The Splunk Python client library (part 1)

Splunk 3.2 introduces a publicly available Python client library that allows external developers to programmatically interact with Splunk by importing a few key modules.

The easiest way to get started with the client library is to get into Splunk’s Python environment. Locate your Splunk install directory (/opt/splunk by default), and start the python interactive shell that comes with Splunk:

# bin/splunk cmd python

This will launch the interactive Python prompt, which starts off looking like this:

Python 2.5.1 (r251:54863, Nov 18 2007, 16:13:41)
[GCC 4.0.1 (Apple Computer, Inc. build 5363)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>

Starting a search

Import the Splunk modules:

import splunk.auth
import splunk.search as se

If you have installed Splunk with the default settings, then your hostpath is https://localhost:8089. The client library knows this default, so you can authenticate directly by providing a username and password:

key = splunk.auth.getSessionKey('admin','changeme')

The getSessionKey method automatically caches the session key in the current interactive session, so you don’t have to pass it along to subsequent methods. In a production implementation, or if you are connecting to multiple servers, you’ll need to keep track of separate session keys.

If your server is on a different hostname or port, then you need to first update the session defaults:

splunk.mergeHostPath('splunk_hostname:12000', True)
key = splunk.auth.getSessionKey('admin','changeme')

The mergeHostPath method takes host information in many different forms:

  • hostname
  • hostname:port
  • https://hostname
  • http://hostname:port
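
To illustrate what accepting those four forms involves, here is a small standalone sketch that normalizes each of them to a full scheme://host:port string. It is not the client library's code: normalize_host_path is a hypothetical helper, it targets a current Python 3 interpreter rather than the Python 2.5 bundled with Splunk, and filling in https and port 8089 for missing pieces is an assumption based on the default hostpath mentioned above.

```python
from urllib.parse import urlsplit

# Assumed defaults, taken from the default hostpath https://localhost:8089.
DEFAULT_SCHEME = "https"
DEFAULT_PORT = 8089

def normalize_host_path(value):
    """Expand a partial host specification into scheme://host:port.

    Hypothetical helper for illustration only.
    """
    if "://" not in value:
        value = DEFAULT_SCHEME + "://" + value
    parts = urlsplit(value)
    port = parts.port if parts.port is not None else DEFAULT_PORT
    return "%s://%s:%d" % (parts.scheme, parts.hostname, port)

for form in ("hostname", "hostname:12000", "https://hostname", "http://hostname:8000"):
    print(form, "->", normalize_host_path(form))
```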

Next, start a search:

job = se.dispatch('search error')

This creates a search job handle object, job, and starts a running search on the server for events that contain the term “error”. If you are connecting to multiple servers, then you’ll also need to provide the hostPath and sessionKey parameters. This handle is keyed off of the search job ID that is generated by the server, and is available via:

job.id

With this ID, you can always use

Read More...

March 13th, 2008

Digging into metrics.log

Occasionally people ask for help in identifying a rogue data input that is suddenly spewing events. If it’s hidden in a ton of similar data it can be difficult to sort out which one is actually the problem. One place to look is the Splunk internal metrics.log. You can find it by searching the internal index (add “index=_internal” to your search) or just look in the file itself (located in $SPLUNK_HOME/var/log/splunk.)

Before I get into what can be found there, I need to explain what metrics.log is not. It is a sampling over 30-second intervals, so it will not give you an exact accounting of all your inputs. For each type of item reported, you get the top ten hot sources over the interval, based on the size of the event (_raw). It is different from the numbers reported by LicenseManager, which include the indexed fields. Also, the default configuration only keeps the metrics data in the internal index for a few days, but by going to the files you can see trends over a period of months if your rolled files go that far back.

A typical metrics.log has stuff like this:

03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=tail, processor=tail, cpu_seconds=0.000000, executes=31, cumulative_hits=73399
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=annotator, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=clusterer, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=readerin, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=pipeline, name=typing, processor=sendout, cpu_seconds=0.000000, executes=63, cumulative_hits=134912
03-13-2008 10:48:55.620 INFO Metrics - group=thruput, name=index_thruput, instantaneous_kbps=0.302766, instantaneous_eps=2.129032, average_kbps=0.000000, total_k_processed=19757, load_average=0.124023
03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="fthost", kbps=0.019563, eps=0.096774, kb=0.606445
03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.283203, eps=2.032258, kb=8.779297
03-13-2008 10:48:55.620 INFO Metrics - group=per_index_thruput, series="_internal", kbps=0.275328, eps=1.903226, kb=8.535156
03-13-2008 10:48:55.620 INFO Metrics - group=per_index_thruput, series="_thefishbucket", kbps=0.019563, eps=0.096774, kb=0.606445
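
If you are working from the file rather than the internal index, a few lines of Python can rank the series in a thruput group by total kb, which is one way to spot the noisy input. This is a minimal sketch and not a Splunk tool: parse_metrics_line and total_kb_by_series are hypothetical names, the key=value layout is assumed from the sample lines above, and because metrics.log is a sampling, the totals are approximate.

```python
import re
from collections import defaultdict

# key=value pairs; values are either double-quoted or run up to the next comma/space.
PAIR = re.compile(r'(\w+)=("[^"]*"|[^,\s]+)')

def parse_metrics_line(line):
    """Turn the key=value part of a metrics.log line into a dict of strings."""
    # Normalize curly quotes that copy/paste sometimes introduces.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    _, _, payload = line.partition(" - ")
    return {key: value.strip('"') for key, value in PAIR.findall(payload)}

def total_kb_by_series(lines, group):
    """Sum the kb field per series for one group, largest total first."""
    totals = defaultdict(float)
    for line in lines:
        fields = parse_metrics_line(line)
        if fields.get("group") == group and "kb" in fields and "series" in fields:
            totals[fields["series"]] += float(fields["kb"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

sample = [
    '03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="fthost", kbps=0.019563, eps=0.096774, kb=0.606445',
    '03-13-2008 10:48:55.620 INFO Metrics - group=per_host_thruput, series="grumpy", kbps=0.283203, eps=2.032258, kb=8.779297',
]
ranking = total_kb_by_series(sample, "per_host_thruput")
print(ranking[0][0])  # grumpy
```

In practice you would pass an open file handle for metrics.log (and any rolled copies) instead of the two sample lines.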

Read More...

March 6th, 2008

Using the Atom Feed Format in Enterprise Software

XML is a great format for exchanging information because it balances readability, extensibility, and compatibility across heterogeneous environments. However, its flexibility is also a disadvantage because it is far too easy to create a proprietary XML schema, resulting in lots of custom code to interface with various systems. Lots of custom code leads to brittleness, and brittleness leads to frustration. The key to salvation lies in standardization.

Enter the Atom standard: a standards-track schema that defines a generic collection/item container format in XML. Most people equate Atom to an RSS competitor, which is true, but that only covers half of what it does. The Atom Publishing Protocol is a well-defined protocol for performing CRUD (Create, Read, Update, Delete) operations on items over HTTP. The Atom Syndication Format, which is the most commonly used portion, defines the XML schema used to deliver data during a Read operation. Atom was spearheaded by Sam Ruby, is now backed by people like Brad Fitzpatrick, Tim Bray, Jeremy Zawodny, and Mark Pilgrim, and is heavily implemented by Google.

Like most software systems, the majority of Splunk’s internal entities can be loosely viewed as a collection of similar items. The requested searches, configuration information, saved searches, users, roles: all just collections. So instead of creating a separate XML schema for each of these five collections that perfectly describes its contents, I chose Atom to serve as a single generic container to describe all of the entities. This kind of reuse is echoed by Pat Helland of Amazon, who gives a great talk relating the rise of the industrial age to standardization, and Tim Bray (Mr. XML himself), who advocates against creating your own XML unless absolutely necessary.
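
As a rough illustration of the "single generic container" point, the sketch below reads any Atom feed into a collection name plus a list of items using only the Python standard library. The feed is a made-up example in the standard Atom namespace, not actual Splunk output, and read_collection is a hypothetical helper; the same function would work unchanged for a feed of users, roles, or searches.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom Syndication Format namespace

# Made-up feed for illustration; real feeds will carry more elements.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>saved searches</title>
  <id>urn:example:saved-searches</id>
  <updated>2008-03-06T00:00:00Z</updated>
  <entry>
    <title>errors last hour</title>
    <id>urn:example:saved-searches:1</id>
    <updated>2008-03-06T00:00:00Z</updated>
  </entry>
  <entry>
    <title>failed logins</title>
    <id>urn:example:saved-searches:2</id>
    <updated>2008-03-06T00:00:00Z</updated>
  </entry>
</feed>"""

def read_collection(xml_text):
    """Return (collection title, list of {'id', 'title'} dicts) for any Atom feed."""
    root = ET.fromstring(xml_text)
    name = root.findtext(ATOM + "title")
    items = [
        {"id": entry.findtext(ATOM + "id"), "title": entry.findtext(ATOM + "title")}
        for entry in root.findall(ATOM + "entry")
    ]
    return name, items

name, items = read_collection(SAMPLE_FEED)
print(name, [item["title"] for item in items])
```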

The benefit of sticking to a standard is that there is a much greater chance that external developers already know exactly

Read More...

March 6th, 2008

Splunk Replay: Search results in motion

Inspired by glTail.rb and Digg Labs’ Stack, Splunk Replay is an animated data visualization that “replays” search results as a simulated event stream. The simulation displays events at a rate proportional to the times at which the events originally occurred.

Each event is represented by a single square particle that flows from its place in a legend of values to its corresponding position in a stacked column chart. Upon landing in the column chart, one of the event’s fields is output in a readable format below the chart. Both the legend of values and the stacked column chart retain the order of their values according to a configurable comparator and truncate older values to make space for new ones. Rolling your mouse over any column displays the field values for that column.
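
The timing idea (events shown at a rate proportional to when they originally occurred) can be sketched as a simple schedule computation. This is not the Replay implementation: replay_delays and the speedup parameter are hypothetical, and the sketch assumes timestamps are plain numbers of seconds.

```python
def replay_delays(timestamps, speedup=1.0):
    """Seconds after replay start at which each event should appear.

    Original gaps between events are preserved and divided by `speedup`.
    Hypothetical helper for illustration only.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    ordered = sorted(timestamps)
    if not ordered:
        return []
    start = ordered[0]
    return [(t - start) / speedup for t in ordered]

# Three events spread over 30 seconds, replayed ten times faster.
delays = replay_delays([100.0, 130.0, 105.0], speedup=10.0)
print(delays)  # [0.0, 0.5, 3.0]
```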

Read More...

