Splunk SDK for Java Export application

The Export application is a simple example that exports events from a Splunk index to a file. By default, the application exports all events from one index and saves them to a file in CSV format. You can use additional command-line arguments to change the output format and to limit the exported events with a search query. The application can also recover an interrupted export by determining which events were already written to the file and appending only the remaining events.

 

Run the Export application

All you need to run the Export application is the Splunk SDK for Java and its requirements. After you've built the SDK, open a command prompt in the /splunk-sdk-java directory, then enter the following command to export the "main" index:

java -jar dist/examples/export.jar main --username="admin" --password="changeme"

You can omit the username and password arguments if you save your login credentials in the .splunkrc file, which is used for convenience while you play with the Splunk SDKs. From here on out, we'll assume you're using the .splunkrc file and omit those arguments from command-line examples.

The Export application exports the "main" index to export.out, which is saved to the current working directory. If you want to run a normal export again, delete the export.out file first; otherwise, you'll get an error.

Here's another command-line example showing how to include a search query and change the output format to JSON:

java -jar dist/examples/export.jar main --search="search sourcetype=access_*" json
 

Run the Export application in recover mode

To use the recover option for exporting, you'll first need to run the Export application normally but stop the process before it's done. For example, enter:

java -jar dist/examples/export.jar main

After waiting a second or two, press CTRL+C to interrupt the process. Then start exporting again using the recover argument:

java -jar dist/examples/export.jar main recover

The "main" index should then be completely exported to export.out.

 

A closer look at the code

The code for the Export example application is located in the /splunk-sdk-java/examples/export directory. The program follows one of two paths: a normal export or a recovery export.

 

A normal export

During the normal export process, the application parses the command-line arguments, and then builds an args object that includes the default and user-specified settings:

    static void run(String[] argv) throws Exception {
        Command command = Command.splunk("export");
        command.addRule("search", String.class, "Search string to export");
        command.parse(argv);
        Service service = Service.connect(command.opts);
        Args args = new Args();

The name of the index you want to export is extracted from the command-line arguments you specified, and is included later in the final search query:

        String indexName = null;
        if (command.args.length == 0) {
            System.out.println("Exporting main index");
            indexName = "main";
        } else
            indexName = command.args[0];

The remaining command-line arguments are then checked. They select the output format (csv, xml, or json), which is set later as the output_mode, and turn on recover mode. Any other value is rejected:

        if (command.args.length > 1) {
            for (int index=1; index < command.args.length; index++) {
                if (command.args[index].equals("recover"))
                    recover = true;
                else if (command.args[index].equals("csv"))
                    format = "csv";
                else if (command.args[index].equals("xml"))
                    format = "xml";
                else if (command.args[index].equals("json"))
                    format = "json";
                else
                    throw new Error("Unknown option: " + command.args[index]);
            }
        }

Default settings are applied to the search arguments (the timeout, the output format, the earliest time, and the timestamp format). If you don't specify a search query as an argument at the command line, the default query searches everything (*) in the specified index.

        args.put("timeout", "60"); 
        args.put("output_mode", format); 
        args.put("earliest_time", "0.000"); 
        args.put("time_format", "%s.%Q");
        if (command.opts.containsKey("search")) {
            search = (String)command.opts.get("search");
        }
        else {
            search = String.format("search index=%s *", indexName);
        }

Then, the application calls the Service.export method with the search query and arguments (which then calls the GET search/jobs/export endpoint in the Splunk REST API). The return value is an InputStream object that contains the search results, which are then written to the export.out file.

        InputStream is = service.export(search, args);
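
How the results get from the stream into export.out is not shown in the excerpt above. The following is a minimal sketch, assuming a plain byte-for-byte copy. The class StreamCopySketch and its copy method are hypothetical names for illustration and are not part of the SDK, and a ByteArrayInputStream stands in for the stream that service.export returns.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not SDK code: drains a result stream into an output stream.
class StreamCopySketch {
    // Copies all bytes from in to out in fixed-size chunks.
    // Returns the number of bytes copied.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] chunk = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the InputStream returned by service.export(search, args).
        InputStream results = new ByteArrayInputStream(
                "_time,_raw\n1.000,hello\n".getBytes(StandardCharsets.UTF_8));
        // Stand-in for a FileOutputStream opened on export.out.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long copied = copy(results, sink);
        System.out.println(copied + " bytes copied");
    }
}
```

In a real run, the sink would be a FileOutputStream on export.out. Copying in chunks keeps memory use constant no matter how large the index is.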
 

A recovery export

When the recover option is used, the application reviews the most recent events in the export.out file to find a good place to restart the process. The application does this by:

  • Gathering a buffer's worth of data from the end of the file.
  • Finding the starting point of a set of events by looking for a change in the timestamp, as defined in getEventStart. This step branches by format (CSV, JSON, or XML) because the events need to be parsed accordingly, but the basic process is the same for each format.
  • Truncating the file at the point where the timestamp changes between events, which discards the remaining data. The timestamp of the first discarded event is then used as the time boundary for the recovery search query (the code later in this section passes it as latest_time).
  • Reading another buffer's worth of data and repeating the process if a change in time is not found.
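
For the CSV case, the timestamp comparison in the second step might look like the following sketch. It is not the SDK's getEventStart. It assumes that every event occupies exactly one line and that the timestamp is the first comma-separated field, and the names CsvEventStartSketch and findTimestampChange are hypothetical.

```java
// Hypothetical, simplified stand-in for the CSV branch of the timestamp search.
class CsvEventStartSketch {
    // Returns the character offset of the first complete line whose leading
    // timestamp field differs from the previous complete line's, or -1 if
    // the buffer contains no such change.
    static int findTimestampChange(String buffer) {
        // The buffer usually begins in the middle of an event, so skip
        // ahead to the start of the first complete line.
        int lineStart = buffer.indexOf('\n');
        if (lineStart == -1) {
            return -1;
        }
        lineStart++;
        String previous = null;
        while (lineStart < buffer.length()) {
            int lineEnd = buffer.indexOf('\n', lineStart);
            if (lineEnd == -1) {
                break; // trailing partial line: ignore it
            }
            String line = buffer.substring(lineStart, lineEnd);
            int comma = line.indexOf(',');
            String timestamp = (comma == -1) ? line : line.substring(0, comma);
            if (previous != null && !timestamp.equals(previous)) {
                return lineStart;
            }
            previous = timestamp;
            lineStart = lineEnd + 1;
        }
        return -1;
    }

    public static void main(String[] args) {
        String buffer = "tial,x\n5.000,a\n5.000,b\n4.000,c\n3.0";
        int offset = findTimestampChange(buffer);
        System.out.println("Timestamp changes at offset " + offset);
    }
}
```

Real exported events can span several lines and contain quoted commas, which is one reason the actual parsing has to branch by format.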

Here is a simplified example to illustrate this process. Say we have an incomplete export file that contains 100 events (numbered 0 to 99), and the application buffer reads 10 events at a time. To recover this export, the application reads the last 10 events from the end of the export file (events 90 to 99). If events 90 to 93 have the same timestamp but event 94 has a different one, the file is truncated between events 93 and 94. The export then resumes with a search bounded by the timestamp of event 94, which the code passes as latest_time.
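
The numbers in this example can be reproduced with a small simulation. This is only a sketch: it works on a list of timestamps instead of file bytes, the names RecoverySketch, findCutPoint, findResumeIndex, and exampleTimestamps are hypothetical, and the one-event overlap between windows is an addition of the sketch so that a change on a window boundary is not missed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the recovery scan over event timestamps.
class RecoverySketch {
    // Returns the index of the first event whose timestamp differs from
    // the event before it, or -1 if all timestamps in the window match.
    static int findCutPoint(List<String> window) {
        for (int i = 1; i < window.size(); i++) {
            if (!window.get(i).equals(window.get(i - 1))) {
                return i;
            }
        }
        return -1;
    }

    // Walks backward from the end of the list, bufferEvents at a time.
    // Each window includes one event past its end (an addition of this
    // sketch) so a change that falls on a window boundary is still seen.
    // Returns the index of the first event to discard, or -1 if none.
    static int findResumeIndex(List<String> timestamps, int bufferEvents) {
        if (bufferEvents < 1) {
            throw new IllegalArgumentException("bufferEvents must be at least 1");
        }
        int start = Math.max(timestamps.size() - bufferEvents, 0);
        while (true) {
            int end = Math.min(start + bufferEvents + 1, timestamps.size());
            int cut = findCutPoint(timestamps.subList(start, end));
            if (cut != -1) {
                return start + cut;
            }
            if (start == 0) {
                return -1;
            }
            start = Math.max(start - bufferEvents, 0);
        }
    }

    // Builds the 100 events from the example: events 90 to 93 share
    // timestamp "A" and events 94 to 99 share timestamp "B".
    static List<String> exampleTimestamps() {
        List<String> ts = new ArrayList<>();
        for (int i = 0; i < 90; i++) ts.add("t" + i);
        for (int i = 90; i < 94; i++) ts.add("A");
        for (int i = 94; i < 100; i++) ts.add("B");
        return ts;
    }

    public static void main(String[] args) {
        List<String> ts = exampleTimestamps();
        int cut = findResumeIndex(ts, 10);
        System.out.println("Truncate before event " + cut
                + "; resume from timestamp " + ts.get(cut));
    }
}
```

Running main prints "Truncate before event 94; resume from timestamp B", which matches the walkthrough: events 0 to 93 are kept and the search resumes at the timestamp of event 94.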

Here's the code that performs the basic recover process, but be sure to review the entire Export code file for all the details.

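        if (recover && file.exists() && file.isFile()) {
            // Scan backward through the file, one buffer at a time, until
            // a point is found where the timestamp changes between events.
            final int bufferSize = (64*1024);
            RandomAccessFile raf = new RandomAccessFile(file, "rw");
            long fptr = Math.max(file.length() - bufferSize, 0);
            long fptrEof = 0;
            while (fptr > 0) {
                byte [] buffer = new byte[bufferSize];
                raf.seek(fptr);
                raf.read(buffer, 0, bufferSize);
                // -1 means no usable event start was found in this buffer.
                // nextEventOffset and lastTime are not assigned in this
                // excerpt; presumably getEventStart sets them.
                int eventStart = getEventStart(buffer, format);
                if (eventStart != -1) {
                    // Convert the buffer-relative offset to a file offset.
                    fptrEof = nextEventOffset + fptr;
                    break;
                }
                fptr = fptr - bufferSize;
            }
            if (fptr < 0)
                fptrEof = 0; // no change found, so start the export over
            else {
                args.put("latest_time", lastTime);
                addEndOfLine = true;
            }
            // Discard everything in the file after the chosen offset.
            FileChannel fc = raf.getChannel();
            fc.truncate(fptrEof);
        }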
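
The last two lines of the excerpt rely on FileChannel.truncate, which shrinks the file to the given size and leaves it unchanged if the size is not smaller than the current length. The sketch below shows that step in isolation, followed by an append. The class TruncateSketch, its truncateAndAppend method, and the temporary file contents are hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demonstration of truncating a partial export and appending to it.
class TruncateSketch {
    // Keeps the first keepBytes bytes of the file, discards the rest,
    // then appends text at the new end of the file.
    static void truncateAndAppend(Path file, long keepBytes, String text) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "rw")) {
            raf.getChannel().truncate(keepBytes);
            raf.seek(raf.length());
            raf.write(text.getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("export-sketch", ".out");
        try {
            // Two complete events (7 bytes each) followed by a partial one.
            Files.write(file, "event1\nevent2\npartial-ev".getBytes(StandardCharsets.UTF_8));
            truncateAndAppend(file, 14, "event3\n");
            System.out.print(new String(Files.readAllBytes(file), StandardCharsets.UTF_8));
        } finally {
            Files.delete(file);
        }
    }
}
```

Truncating before appending removes the incomplete event, so the file never contains a half-written event followed by new data.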