How to create modular inputs in the Splunk SDK for Java

Support for modular inputs in Splunk Enterprise 5.0 and later enables you to add new types of inputs to Splunk Enterprise that are treated as native Splunk Enterprise inputs. Your users interactively create and update your custom inputs using Splunk Web, just as they do for native inputs. If you're not already familiar with modular inputs, see the following topics to get started:

Some examples of modular inputs include:

The Splunk SDK for Java enables you to use Java to create new modular inputs for Splunk Enterprise. This section of the Splunk SDK for Java documentation will show you how to create modular inputs using Java, and then how to integrate them with your Splunk Enterprise app.

Why use modular inputs?

Modular inputs are ideal for packaging and sharing technology-specific data sources. One of the primary reasons to use modular inputs is that they let users interact with key information through the familiar Splunk Web interface, without needing to edit config files. Modular inputs also provide runtime controls and allow the input to specify per-event index-time settings.

More information about the differences between modular inputs and traditional scripted inputs is available here: Modular inputs vs. scripted inputs.

To create modular inputs programmatically

With the Splunk SDKs, you can create modular inputs programmatically using your preferred programming language, such as Java, C#, or Python. Here we describe how to create a modular input programmatically with Java.

To create a modular input, you first set up a modular input script. A modular input script does the following:

  1. Return the introspection scheme to Splunk Enterprise. The introspection scheme defines the behavior and endpoints of the script. When Splunk Enterprise starts, it runs the script to determine the modular input's behavior and configuration.
  2. Validate the script's configuration (optional). Whenever a user creates or edits an input, Splunk Enterprise can call the script to validate the configuration.
  3. Stream data. The script streams event data that can be indexed by Splunk Enterprise. Splunk Enterprise invokes the script and waits for it to stream events.

The easiest way to create a modular input programmatically using Java is to subclass the SDK's abstract base class, com.splunk.modularinput.Script. The preceding three steps are accomplished as follows using the Splunk SDK for Java:

  1. Return the introspection scheme: Override the getScheme method.
  2. Validate the script's configuration (optional): Override the validateInput method. This is required if the scheme returned by getScheme was set to use external validation (that is, it had Scheme.setUseExternalValidation(true) called on it).
  3. Stream data: Override the streamEvents method.

In addition, you must provide a simple main method. It need only be one line. It can't be inherited because it must create an instance of your class, and the abstract base class (com.splunk.modularinput.Script) has no way to do so.

Here's the skeleton of a new modular input script created in Java:

import com.splunk.modularinput.*;
import javax.xml.stream.XMLStreamException;
import java.io.IOException;

public class MyInput extends Script {

    public static void main(String[] args) {
        new MyInput().run(args);
    }

    @Override
    public Scheme getScheme() {
        // Returns scheme.
        return null; // Placeholder so the skeleton compiles; build and return a Scheme here.
    }

    @Override
    public void validateInput(ValidationDefinition definition) throws Exception {
        // Validates input.
    }

    @Override
    public void streamEvents(InputDefinition inputs, EventWriter ew) throws
            MalformedDataException, XMLStreamException, IOException {
        // Splunk Enterprise calls the modular input, 
        //   streams XML describing the inputs to stdin, 
        //   and waits for XML on stdout describing events.
    }
}

In the SDK's examples directory is a working demonstration of a modular input script written in Java. It generates random numbers every half second to demonstrate event generation and streaming. It's located at .../examples/com/splunk/examples/random_numbers in the Splunk SDK for Java repository. The following section goes step-by-step through its creation, using the skeleton provided previously as a starting point. You can also go right to the sample code, which has been extensively commented.

To write a modular input script in Java

Using the starting point from the previous section, we'll now guide you through the creation of the components of a modular input script in Java. This is the same script that is located at .../splunk-sdk-java/examples/com/splunk/examples/random_numbers in the Splunk SDK for Java repository. It generates random numbers every half second to demonstrate event generation and streaming.

The getScheme method

When Splunk Enterprise starts, it looks for all the modular inputs defined by its configuration, and tries to run them with the argument --scheme. Splunkd expects each modular input to print a description of itself in XML to stdout. The SDK's modular input framework takes care of all the details of formatting the XML and printing it. You only need to override the getScheme method and return a new Scheme object.

First, create a new Scheme object, and then name and provide a description for it:

    Scheme scheme = new Scheme("random_numbers");
    scheme.setDescription("Generates events containing a random number.");

In this case, Splunk Enterprise will display "random_numbers" to users for this input, with the given description.

Next, specify whether you want to use external validation using the Scheme.setUseExternalValidation(true) setter method. External validation is taken care of by overriding the validateInput method. If you set external validation without overriding the validateInput method, the script will accept anything as valid.

    scheme.setUseExternalValidation(true);

If you call setUseSingleInstance(true) on the scheme, Splunk Enterprise passes all the instances of the modular input to a single instance of the script, and you're then responsible for handling all of them. Otherwise, Splunk Enterprise starts a separate Java Virtual Machine (JVM) for each instance of the input.

    scheme.setUseSingleInstance(true);

Generally you only need external validation if there are relationships you must maintain among the parameters, such as requiring one variable to be less than another, or checking whether some resource is reachable or valid. If you don't choose external validation, Splunk Enterprise lets you specify a validation string for each argument and runs validation internally using that string.

In the example modular input, there are two variables, min and max, that represent the minimum and maximum values, respectively, for the generated random numbers. We'll add them to the scheme using the Argument class, its setters, and the addArgument method:

    Argument minArgument = new Argument("min");
    minArgument.setDataType(Argument.DataType.NUMBER);
    minArgument.setDescription("Minimum value to be produced by this input.");
    minArgument.setRequiredOnCreate(true);
    // If not using external validation, add something like:
    // minArgument.setValidation("min > 0");
    scheme.addArgument(minArgument);

    Argument maxArgument = new Argument("max");
    maxArgument.setDataType(Argument.DataType.NUMBER);
    maxArgument.setDescription("Maximum value to be produced by this input.");
    maxArgument.setRequiredOnCreate(true);
    scheme.addArgument(maxArgument);

After adding the arguments to the scheme, return the scheme:

    return scheme;

The validateInput method

The validateInput method is where the configuration of an input is validated. It is needed only if you've set your modular input to use external validation. If validateInput does not throw an exception, the input is assumed to be valid. Otherwise, the SDK reports the exception's message as the error when it tells splunkd that the configuration is not valid.

When you use external validation, after splunkd calls the modular input with the --scheme argument to get a scheme, it calls it again with the --validate-arguments option for each instance of the modular input in its configuration files, feeding XML on stdin to the modular input to get it to do validation. It calls it the same way again whenever a modular input's configuration is changed.

In our example, we're using external validation, since we want the max variable to always be greater than the min value. Our validateInput method contains basic logic that retrieves the two variables and then compares them to each other:

    @Override
    public void validateInput(ValidationDefinition definition) throws Exception {
        // Get the values of the two parameters. 
        double min = ((SingleValueParameter)definition.getParameters().get("min")).getDouble();
        double max = ((SingleValueParameter)definition.getParameters().get("max")).getDouble();

        if (min >= max) {
            throw new Exception("min must be less than max; found min=" + Double.toString(min) +
                    ", max=" + Double.toString(max));
        }
    }

The streamEvents method

The streamEvents method is where the event streaming happens. Splunk Enterprise calls the modular input with no arguments, streams XML that describes the inputs to stdin, and then waits for XML on stdout describing events.
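The three invocation modes described so far (--scheme, --validate-arguments, and no arguments) can be pictured as a dispatch on the first command-line argument. The following is a minimal standalone sketch of that idea only. The class name ModeDispatchSketch, the Mode enum, and the modeFor helper are hypothetical and are not part of the SDK, and this is not how com.splunk.modularinput.Script is actually implemented.

```java
public class ModeDispatchSketch {

    // The three ways splunkd invokes a modular input, plus a catch-all.
    enum Mode { SCHEME, VALIDATE, STREAM, UNKNOWN }

    // Maps the command-line arguments to an invocation mode.
    static Mode modeFor(String[] args) {
        if (args.length == 0) {
            return Mode.STREAM;      // no arguments: read input XML, stream events
        }
        if ("--scheme".equals(args[0])) {
            return Mode.SCHEME;      // print the introspection scheme
        }
        if ("--validate-arguments".equals(args[0])) {
            return Mode.VALIDATE;    // validate a proposed configuration
        }
        return Mode.UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println("Invocation mode: " + modeFor(args));
    }
}
```

When you subclass Script, you don't write this dispatch yourself. The sketch is only meant to make clear which of your three overridden methods corresponds to which invocation.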

In our example, since we're using a single instance of this script (we set setUseSingleInstance(true) in the getScheme method), we start a thread for each instance of the modular input. For scripts that are not single instance, it's simpler to do the work directly in the streamEvents method.

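    @Override
    public void streamEvents(InputDefinition inputs, EventWriter ew) throws MalformedDataException,
            XMLStreamException, IOException {
        // Fully qualified names are used here so the snippet needs no extra imports.
        java.util.List<Thread> threads = new java.util.ArrayList<Thread>();

        for (String inputName : inputs.getInputs().keySet()) {
            // We get the parameters for each input and start a new thread for each one.
            // All the real work happens in the class for event generation.
            double min = ((SingleValueParameter)inputs.getInputs().get(inputName).get("min")).getDouble();
            double max = ((SingleValueParameter)inputs.getInputs().get(inputName).get("max")).getDouble();

            Thread t = new Thread(new Generator(ew, inputName, min, max));
            // Use start(), not run(): run() would execute the generator's endless loop
            // in the calling thread, so only the first input would ever produce events.
            t.start();
            threads.add(t);
        }

        // Don't return from streamEvents while the generator threads are still writing events.
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
        }
    }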
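The one-thread-per-input pattern relies on Thread.start(), which returns immediately and runs the work on a new thread. Calling run() directly executes the work on the calling thread instead. The following standalone sketch shows the start-then-join shape without any SDK classes. The class name ThreadPerInputSketch and the runWorkers helper are hypothetical, and the workers here finish immediately rather than looping the way an event generator would.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class ThreadPerInputSketch {

    // Starts one worker thread per input name, waits for all of them,
    // and returns the names of the threads that actually ran the work.
    static Set<String> runWorkers(List<String> inputNames) {
        final Set<String> ranOn = ConcurrentHashMap.newKeySet();
        List<Thread> threads = new ArrayList<>();

        for (String inputName : inputNames) {
            Runnable work = () -> ranOn.add(Thread.currentThread().getName());
            Thread t = new Thread(work, "worker-" + inputName);
            t.start();          // returns immediately; the work runs on the new thread
            threads.add(t);
        }

        for (Thread t : threads) {
            try {
                t.join();       // block until this worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return ranOn;
    }

    public static void main(String[] args) {
        System.out.println(runWorkers(Arrays.asList("a", "b")));
    }
}
```

Joining the workers keeps the calling method from returning while they are still running, which matters when the caller owns a resource, such as an event writer, that the workers share.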

In our example, we've created a class called Generator, which implements Runnable. Within it are a constructor and the run method. Within the run method, we first log an INFO message that the thread has started, and then loop: get a random number, write a new event, and sleep for half a second. At minimum, you must set the stanza that the event is supposed to go to (setStanza; though you can skip it if your modular input is not single instance) and the data of the event (setData).

import java.util.Random;

...

    class Generator implements Runnable {
        private double min, max;
        EventWriter ew;
        String inputName;

        public Generator(EventWriter ew, String inputName, double min, double max) {
            super();
            this.min = min;
            this.max = max;
            this.ew = ew;
            this.inputName = inputName;
        }

        public void run() {
            ew.synchronizedLog(EventWriter.INFO, "Random number generator " + inputName +
                    " started, generating numbers between " +
                    Double.toString(min) + " and " + Double.toString(max));

            final Random randomGenerator = new Random();

            while (true) {
                Event event = new Event();
                event.setStanza(inputName);
                // Parenthesize the arithmetic; otherwise min is appended to the string as text.
                event.setData("number=" + (randomGenerator.nextDouble() * (max - min) + min));

                try {
                    ew.writeEvent(event);
                } catch (MalformedDataException e) {
                    ew.synchronizedLog(EventWriter.ERROR, "MalformedDataException in writing event to input " +
                            inputName + ": " + e.toString());
                }

                try {
                    Thread.sleep(500);
                } catch (InterruptedException e) {
                    return;
                }
            }
        }
    }
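The value written to each event is a uniform draw from [0, 1) scaled into the range [min, max). A small standalone sketch of that arithmetic follows. The class name ScaleSketch and the helpers scale and eventData are hypothetical names used only for illustration. Note the role of parentheses when building the string: in Java, "number=" + x * (max - min) + min multiplies first, but then appends min as text rather than adding it numerically.

```java
import java.util.Random;

public class ScaleSketch {

    // Scales a value in [0, 1) into the range [min, max).
    static double scale(double unit, double min, double max) {
        return unit * (max - min) + min;
    }

    // Builds the event text; the arithmetic is finished before concatenation.
    static String eventData(double unit, double min, double max) {
        return "number=" + scale(unit, min, max);
    }

    public static void main(String[] args) {
        Random randomGenerator = new Random();
        for (int i = 0; i < 3; i++) {
            System.out.println(eventData(randomGenerator.nextDouble(), 5.0, 10.0));
        }
    }
}
```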

The INFO message will show up in splunkd.log and in Splunk Enterprise's _internal index. Keep in mind that EventWriter provides both log and synchronizedLog (a synchronized version of log). In the case of our example, synchronizing at the level of each log message and event is exactly what we want. In more complicated cases, you may want to use the unsynchronized version and do your own synchronization.
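To illustrate what "doing your own synchronization" can look like, the following standalone sketch has several threads write multi-part lines to one shared sink. Making the write method synchronized keeps each line intact even though it is assembled from several append calls. The class SynchronizedWriterSketch and its LineSink type are hypothetical stand-ins, not SDK classes, and the sketch says nothing about how EventWriter is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;

public class SynchronizedWriterSketch {

    // Hypothetical shared sink standing in for a writer used by several threads.
    static class LineSink {
        private final StringBuilder out = new StringBuilder();

        // The whole line is appended while holding the lock, so lines never interleave.
        synchronized void writeLine(String prefix, int n) {
            out.append(prefix);
            out.append(n);
            out.append('\n');
        }

        synchronized String contents() {
            return out.toString();
        }
    }

    // Runs threadCount writers, each writing linesPerThread lines, and returns the output.
    static String runWriters(int threadCount, int linesPerThread) {
        final LineSink sink = new LineSink();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < threadCount; i++) {
            final String prefix = "input" + i + " number=";
            Thread t = new Thread(() -> {
                for (int n = 0; n < linesPerThread; n++) {
                    sink.writeLine(prefix, n);
                }
            });
            t.start();
            threads.add(t);
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return sink.contents();
    }

    public static void main(String[] args) {
        System.out.print(runWriters(2, 3));
    }
}
```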

Next steps

You're now ready to integrate your modular input script into your app. Before that, however, compile it, and fix any problems that crop up.

To integrate the modular input into your program and add it to Splunk Enterprise

With your modular input compiled, you're ready to integrate the script into your app.

  1. Package your modular input script as an executable JAR file.

    1. First, include the Splunk SDK for Java along with the code you've written for the program.

    2. Create a manifest file. Something like the following is sufficient:

      Manifest-Version: 1.0
      Main-Class: com.yourpackage.YourProgram
    3. Copy the /splunk/com/splunk/modularinput directory from the Splunk SDK for Java to com/splunk/modularinput in your source directory, and compile its contents to class files. At this point you should have a directory structure that looks like this, where <yourpackage> and <YourProgram>.class correspond to the names you've given the package and the program, respectively:

      .../
        com/<yourpackage>/
          <YourProgram>.class
        com/splunk/modularinput/
          Argument.class
          Argument$DataType.class
          Event.class
          ...
        MANIFEST.MF
    4. At this directory level, run the following, substituting <myinput> with the name you want to give the JAR file:

      jar cmf MANIFEST.MF <myinput>.jar com

      You can check whether this was successful by running the following, again substituting <myinput> with the name of the JAR:

      java -jar <myinput>.jar --scheme

      If the JAR packaging was successful, the XML of your modular input's scheme prints to the console.

  2. Create an app.conf file. In your app's default directory, create a file called app.conf. This file is used to maintain the state of an app or customize certain aspects of it in Splunk Enterprise. The contents of the app.conf file can be very simple:

    [install]
    is_configured = 0
    
    [ui]
    is_visible = 1
    label = random-numbers
    
    [launcher]
    author = Sample Author
    description = 
    version = 1.0
  3. Copy the executable JAR into your app. Copy the JAR file into your app's jars directory. You can optionally write a file <myinput>.vmopts (where <myinput> is the base name of the modular input) that contains arguments to pass to the JVM on startup (for instance, -Xms512M -javaagent:someagent.jar), and save it in the same directory. Your app directory now looks like this:

    .../
      ...
      default/
        app.conf
      jars/
        myinput.jar
        myinput.vmopts
      ...
    
  4. In your app's inputs.conf.spec file (in the README directory), create a stanza that describes the input. For example, a modular input called myinput with two fields, abc and xyz, would look like the following:

    [myinput://default]
    *This is a comment describing this modular input
     
    abc = <value>
    xyz = <value>
    

    Your app directory now looks like this:

    .../
      ...
      README/
        inputs.conf.spec
      jars/
        myinput.jar
        myinput.vmopts
      ...
    
  5. Set up the shims. Splunk Enterprise can't directly run Java JAR files, so the Splunk SDK for Java provides a set of shims to proxy between Splunk Enterprise and the JVM. You will find these shims in the launchers directory of the SDK. The file called shim-darwin.sh is a shell script that works as a shim on OS X/macOS. The shim-linux.sh file is a shim for Linux. (It might also work on other Unix systems, but they have not been tested and are not supported.) The shim-windows_x86.exe and shim-windows_x86_64.exe files are shims for 32-bit and 64-bit Windows, respectively. In your app, create the following directories, including the bin directory in each one, as shown:

    • darwin_x86_64/bin/
    • linux_x86/bin/
    • linux_x86_64/bin/
    • windows_x86/bin/
    • windows_x86_64/bin/

    Copy the shim files into the appropriate bin directories. (Copy shim-linux.sh into both Linux bin directories.) Your directory structure now looks like this:


    .../
      darwin_x86_64/bin/
        shim-darwin.sh
      linux_x86/bin/
        shim-linux.sh
      linux_x86_64/bin/
        shim-linux.sh
      windows_x86/bin/
        shim-windows_x86.exe
      windows_x86_64/bin/
        shim-windows_x86_64.exe
    

    Rename the shim files to correspond to the base name of your modular input. For instance, for our SDK example, we've renamed the shims to myinput.sh and myinput.exe. Now your directory structure looks like this:

    .../
      darwin_x86_64/bin/
        myinput.sh
      linux_x86/bin/
        myinput.sh 
      linux_x86_64/bin/
        myinput.sh 
      windows_x86/bin/
        myinput.exe
      windows_x86_64/bin/
        myinput.exe 
    
  6. Install and restart. Install your app, and then restart Splunk Enterprise. Go to Splunk Manager (or, in Splunk 6.0 and later, click the Settings menu) and click Data inputs. Splunk Enterprise picks up the modular input and lists it here. You can now add new instances of your modular input to Splunk Enterprise.
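Because the steps above spread files across several directories, it can help to check the layout before installing the app. The following is a minimal sketch of such a check. It assumes the file names used in this topic (default/app.conf, README/inputs.conf.spec, jars/<myinput>.jar, and the renamed shims in each platform's bin directory). The class name AppLayoutCheck and its helper methods are hypothetical and are not part of the SDK. The optional .vmopts file is deliberately not checked.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class AppLayoutCheck {

    // Relative paths this topic's steps place in the app, for a given input base name.
    static List<String> expectedFiles(String inputName) {
        List<String> files = new ArrayList<>();
        files.add("default/app.conf");
        files.add("README/inputs.conf.spec");
        files.add("jars/" + inputName + ".jar");
        files.add("darwin_x86_64/bin/" + inputName + ".sh");
        files.add("linux_x86/bin/" + inputName + ".sh");
        files.add("linux_x86_64/bin/" + inputName + ".sh");
        files.add("windows_x86/bin/" + inputName + ".exe");
        files.add("windows_x86_64/bin/" + inputName + ".exe");
        return files;
    }

    // Returns the expected files that are not present under appRoot.
    static List<String> missingFiles(Path appRoot, String inputName) {
        List<String> missing = new ArrayList<>();
        for (String relative : expectedFiles(inputName)) {
            if (!Files.isRegularFile(appRoot.resolve(relative))) {
                missing.add(relative);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("usage: java AppLayoutCheck <app-directory> <input-base-name>");
            return;
        }
        List<String> missing = missingFiles(Paths.get(args[0]), args[1]);
        if (missing.isEmpty()) {
            System.out.println("All expected files are present.");
        } else {
            for (String relative : missing) {
                System.out.println("Missing: " + relative);
            }
        }
    }
}
```

For example, running it with your app directory and myinput as arguments lists any of the files from the steps above that are not yet in place.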