February 4th, 2008
The SSL Performance Odyssey
When you come to dev.splunk.com, you see pictures of beer pong, full bars, stuffed ponies with fart machines taped to their ass, etc. - basically engineers gone wild. Somewhere amid all of this insanity, we actually find the time to write code and solve problems like this one. This post is all about a crazy-weird performance issue that we were experiencing, how it manifested itself, and ultimately how it was fixed.
I suspect others may be having this problem, as the problem lives in some very popular open source code as far as I can tell. With that, I’ll begin telling you about my journey into hell.
Splunk has a homegrown embedded HTTP(S) server that serves up all external interfaces to the ‘splunkd’ daemon. We use it as the core engine for our REST and XML/RPC-like APIs. The GUI and the CLI both end up talking to the daemon via this server.
When I wrote the core of it a few months ago, I ran some rudimentary performance tests on several platforms and it seemed decent enough for our use, but a week ago, the manager of the Search and Indexing team (Stephen) said that he was seeing abysmal performance using SSL. He said that the GUI performance was being impacted. I didn’t believe him and insisted that it was something else and that he was high.
So to prove to him, like all engineers do, that it wasn’t my server or my problem, I gave him a small Python script that hits the server in a tight loop, and we checked the performance. It sucked. Continuing with the theme of “this isn’t my problem” - I told him it was probably the request handler doing something that made the server seem slow. This is when he laughed

