Why Benchmarking Tools Suck
October 22, 2012
A brief aside into why I think no benchmarking tool is exactly correct and why I wrote my own.
Benchmarking is (or should be) a fairly important part of most developers job or duty. To determine the load that the systems that they build can withstand. We are currently at a point in our development lifecycle at work where load testing is a fairly high priority. We need to be able to answer questions like, what kind of load can our servers currently handle as a whole?, what kind of load can a single server handle?, how much throughput can we gain by adding X more servers?, what happens when we overload our servers?, what happens when our concurrency doubles? These are all questions that most have probably been asked at some point in their career. Luckily enough there is a plethora of HTTP benchmarking tools to help try to answer these questions. Tools like, ab, siege, beeswithmachineguns, curl-loader and one I wrote recently (today), tommygun.
Every single one of those tools suck, including the one I wrote (and will probably keep using/maintaining). Why? Don’t a lot of people use them? Yes, almost everyone I know has used ab (most of you probably have) and I know a decent handful of people who use siege, but that does not mean that they are the most useful for all use cases. In fact they tend to only be useful for a limited set of testing. Ab is great if you want to test a single web page, but what if you need to test multiple pages at once? or in a sequence? I’ve also personally experienced huge performance issues with running ab from a mac. These scope issues of ab make way for other tools such as siege and curl-loader which can test multiple pages at a time or in a sequence, but at what cost? Currently at work we are having issues getting siege to properly parse and test a few hundred thousand urls, some of which contain binary post data.
On top of only really having a limited set of use cases, each benchmarking tool also introduces overhead to the machine that you are benchmarking from. Ab might be able to test your servers faster and with more concurrency than curl-loader can, but if curl-loader can test your specific use case, which do you use? Curl-loader can probably benchmark exactly what your trying to test but if it cannot supply the source load of what you are looking for, then how useful of a tool is it? What if you need to scale your benchmarking tool? How do you scale your benchmarking tool? What if you are running the test from the same machine as your development environment? What kind of effect will running the benchmarking tool itself have on your application?
So, what is the solution then? I think instead of trying to develop these command line tools to fit each scenario we should try to develop a benchmarking framework with all of the right pieces that we need. For example, develop a platform that has the functionality to run a given task concurrently but where you supply the task for it to run. This way the benchmarking tool does not become obsolete and useless as your application evolves. This will also pave the way for the tool to be protocol agnostic. Allowing people to write tests easily for HTTP web applications or even services that do not interpret HTTP, such as message queues or in memory stores. This framework should also provide a way to scale the tool to allow more throughput and overload on your system. Lastly, but not least, this platform should be lightweight and try to introduce as little overhead as possible, for those who do not have EC2 available to them for testing, or who do not have spare servers lying around for them to test from.
I am not saying that up until now load testing has been nothing but a pain and the tools that we have available to us (for free) are the worst things out there and should not be trusted. I just feel that they do not and cannot meet every use case and that I have been plighted by this issue in the past. How can you properly load test your application if you do not have the right load testing tool for the job?
So, I know what some might be thinking, “sounds neat, when will your framework be ready for me to use?” That is a nice idea, but if the past few months are any indication of how much free time I have, I might not be able to get anything done right away (seeing how I was able to write my load testing tool while on vacation). I am however, more than willing to contribute to anyone else’s attempt at this framework and I am especially more than willing to help test anyone else’s framework.
Side Note: If anyone knows of any tool or framework currently that tries to achieve my “goal” please let me know. I was unable to find any tools out there that worked as I described or that even got close, but I might not of searched for the right thing or maybe skipped over the right link, etc.