February 22nd, 2008
Delimiter based KV extraction - advanced
If you’ve read my previous post on delimiter based KV extraction, you might be wondering whether you can do more with it (Anonymous Coward did). Well, yes you can, and I am going to cover the “advanced” cases here. As in other posts, before covering the capabilities I will first go over some observations and examples.
Observations
1. Header-body. Some applications, for various reasons, format their log files with a header section and a body section. The header usually describes how the fields are organized in each logged event. The body consists of the logged events, usually one per line, with field values delimited as described in the header. W3C and CSV formats come to mind; see the examples below.
2. Single-delimiter. Other applications use a single delimiter both to separate keys from values and to separate each value from the next key. While this is not very common, it has been observed in the field.
Data Examples
The following header-body sample, as you can probably guess, is from an Exchange server. Among other things, the header section contains the list of field names. These names are separated from each other by the same delimiter that separates values in the body section. In this case that delimiter is a tab character (even though our blogging platform chooses to mangle tabs into spaces - gotta love it!).
# Message Tracking Log File
# Exchange System Attendant Version 6.5.7638.1
# Fields: time client-ip cs-method sc-status
14:13:11 10.1.1.9 HELO 250
14:13:13 10.1.1.9 MAIL 250
14:13:19 10.1.1.9 RCPT 250
14:13:29 10.1.1.9 DATA 250
14:13:31 10.1.1.9 QUIT 240
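To make the header-body idea concrete, here is a minimal Python sketch of how such a file could be turned into key-value pairs. It is not Splunk's implementation. It assumes that the field names sit on a line starting with "# Fields:" and that header and body use the same tab delimiter. The function name parse_header_body is hypothetical.

```python
# Illustrative sketch only, not Splunk's implementation.
# Assumes a W3C-style layout: field names on a "# Fields:" header line,
# body values separated by the same delimiter (tab by default).
def parse_header_body(lines, delimiter="\t"):
    fields = None
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#"):
            # Header line; only "# Fields:" tells us how body values are named.
            if line.startswith("# Fields:"):
                fields = line[len("# Fields:"):].strip().split(delimiter)
            continue
        if not line or fields is None:
            continue  # skip blank lines and body lines seen before a Fields header
        values = line.split(delimiter)
        # Pair each value with the field name at the same position.
        events.append(dict(zip(fields, values)))
    return events


log = [
    "# Message Tracking Log File",
    "# Exchange System Attendant Version 6.5.7638.1",
    "# Fields: time\tclient-ip\tcs-method\tsc-status",
    "14:13:11\t10.1.1.9\tHELO\t250",
    "14:13:31\t10.1.1.9\tQUIT\t240",
]
for event in parse_header_body(log):
    print(event)
```

Because the header is re-read whenever a new "# Fields:" line appears, a file that switches layouts partway through would still be parsed with the most recent field list.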
The following example shows how a single delimiter can be used to list fields. It is pretty easy for us, as humans, to recognize the key-value pairs:
"url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"
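One way to recover those pairs mechanically is to split on the delimiter and treat tokens at even positions as keys and tokens at odd positions as their values. The Python sketch below shows this approach. It is illustrative only and assumes that no key or value contains the delimiter itself. The function name parse_single_delimiter is hypothetical.

```python
# Illustrative sketch only; assumes keys and values never contain the delimiter.
def parse_single_delimiter(text, delimiter=" "):
    tokens = text.strip().split(delimiter)
    if len(tokens) % 2:
        # An odd token count means some key has no value (or vice versa).
        raise ValueError("odd number of tokens; cannot pair keys with values")
    # Even positions are keys, odd positions are the values that follow them.
    return dict(zip(tokens[0::2], tokens[1::2]))


print(parse_single_delimiter(
    "url http://splunk.com referer http://dev.splunk.com ip 10.10.10.10"))
```

The assumption matters. A value containing a space, such as a user-agent string, would shift every pair after it. That is why this layout is harder to parse reliably than a format with separate pair and key-value delimiters.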
Enabling header-body kv/extract
The delimiter based KV extraction solves the header-body problem by
