Splunk SDK for Python Analytics application

The Analytics application is like a mini implementation of a web service in the style of Google Analytics or Mixpanel, using the Splunk SDK for Python. Without needing to define a schema, you can log different types of events with key-value properties for each. The number of properties can vary too, even for the same type of event.

 

What you need to run the application

To use the Analytics application, you'll need the Splunk SDK for Python (for more, see the requirements and installation pages).

Make sure Splunk is running. If you haven't used Splunk before, consider adding some events first, although this application also creates its own events as you navigate the site (it logs itself).

To run the application, open a command prompt in the /splunk-sdk-python/examples/analytics directory and enter the following command:

python server.py

Then open a browser and navigate to http://localhost:8080/applications to see the web app, which lists applications.

 

A closer look at the code

The Analytics application files are located in the /splunk-sdk-python/examples/analytics directory. This application has two pieces of reusable code to manage getting data in and out of Splunk.

 

AnalyticsTracker

The AnalyticsTracker class logs analytics data, and is defined in the input.py file as the input component of the Analytics service.

The AnalyticsTracker class encapsulates all the information that is required to log events to Splunk. This includes the application name (which you can think of as a namespace, if you log events from multiple applications into the same Splunk instance) and the Splunk connection parameters. The class also takes an optional index parameter, which is used mostly for testing purposes.

So, for example, you could use AnalyticsTracker like this:

from analytics.input import AnalyticsTracker
splunk_opts = ...
tracker = AnalyticsTracker("myapp", splunk_opts)

Once you have an instance of AnalyticsTracker, you can use it to track your events. For example, to log a login event along with additional values (user ID, user name, user agent), you could do something like this:

user_id = ...
username = ...
useragent = ...
tracker.track("login", distinct_id=user_id, username=username, useragent=useragent)

Here are some of the parameters:

  • The event parameter is the name of the event to log.
  • The distinct_id parameter specifies a unique ID that you can use to group events, for example, if you only want to count unique logins by user_id.

The rest of the parameters are arbitrary key-value pairs that are logged with the event and that you can extract later.
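To illustrate why distinct_id is useful for grouping, here is a minimal sketch in plain Python that compares the total number of login events with the number of unique users behind them. The sample data and the count_events and count_unique helpers are hypothetical and are not part of the SDK; in practice the counting would be done by a Splunk search.

```python
# Hypothetical sample: (event name, distinct_id) pairs, as they might be tracked.
tracked = [
    ("login", "user-1"),
    ("login", "user-2"),
    ("login", "user-1"),
    ("logout", "user-1"),
]

def count_events(events, name):
    """Count every occurrence of the named event."""
    return sum(1 for event, _ in events if event == name)

def count_unique(events, name):
    """Count the distinct IDs seen for the named event."""
    return len(set(distinct_id for event, distinct_id in events if event == name))

total_logins = count_events(tracked, "login")
unique_logins = count_unique(tracked, "login")
print("%d logins from %d distinct users" % (total_logins, unique_logins))
```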

When you use AnalyticsTracker to log an event, the class internally constructs a textual representation of that event and encodes the content so that it works properly with Splunk. For example, the login event above would look something like this:

2011-08-08T11:45:17.735045 application="myapp" event="login" distinct_id="..." analytics_prop__username="..." analytics_prop__useragent="..."

We use the analytics_prop__ prefix to make sure there is no ambiguity between known fields (such as application and event), and user-defined key-value properties.

 

AnalyticsRetriever

The AnalyticsRetriever class extracts events that are logged using the AnalyticsTracker class, and is defined in the output.py file as the output component of the Analytics service.

You can create an AnalyticsRetriever instance like this:

from analytics.output import AnalyticsRetriever
splunk_opts = ...
retriever = AnalyticsRetriever("myapp", splunk_opts)

You can use the AnalyticsRetriever methods to query information about events. These methods run a Splunk search, retrieve the results, and transform them into a well-defined Python dictionary format. Here are some examples of how to use the methods in AnalyticsRetriever.

  • To list all applications:
    print(retriever.applications())
  • To list all the types of events in the system:
    print(retriever.events())
  • To list the union of all properties used for a particular event:
    event_name = "login"
    print(retriever.properties(event_name))
  • To get all the values of a given property for some event:
    event_name = "login"
    prop_name = "useragent"
    print(retriever.property_values(event_name, prop_name))
  • To get a graph of event information over time for all events of a specific application (this uses the default TimeRange.MONTH):
    print(retriever.events_over_time())
  • To get a graph of event information over time for a specific event:
    print(retriever.events_over_time(event_name="login"))
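When events are read back, the analytics_prop__ prefix can be used in the other direction to separate user-defined properties from known fields. The following is a minimal sketch of that idea, assuming a single search result is available as a plain dictionary of field names to values. The split_fields helper and the sample row are hypothetical; the actual dictionary format returned by AnalyticsRetriever is defined in output.py.

```python
PROPERTY_PREFIX = "analytics_prop__"

def split_fields(row):
    """Separate known fields from prefixed user-defined properties."""
    known, props = {}, {}
    for field, value in row.items():
        if field.startswith(PROPERTY_PREFIX):
            # Strip the prefix to recover the name that was originally tracked.
            props[field[len(PROPERTY_PREFIX):]] = value
        else:
            known[field] = value
    return known, props

# Hypothetical result row for a tracked login event.
row = {
    "application": "myapp",
    "event": "login",
    "distinct_id": "42",
    "analytics_prop__username": "alice",
    "analytics_prop__useragent": "Mozilla",
}

known, props = split_fields(row)
print(sorted(props))
```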
 

A simple web app

In the server.py file, we created a simple web app, built on the Analytics service, for displaying analytics data. This web app lists applications and, for each application, displays a graph of events over time along with its properties. But we had some help:

  • We make use of the open source Flot graphing library to render our JavaScript graphs.
  • We use Bottle (bottle.py) as a micro web framework.