A co-worker and I set out to each build our own http proxy cache.
One of them was written in Go and the other as a C++ plugin for
So, I know what most people are thinking: “Not another cache benchmark post,
with skewed or biased results.” But luckily that is not what this post is about;
there are no opinionated graphs showing that my favorite caching system happens
to be better than all the other ones. Instead, this post is about why at work we
decided to write our own API caching system rather than use Varnish
(a tested, tried and true HTTP caching system).
Let us discuss the problem we have to solve. The system we have is a simple
request/response HTTP server that needs to have very low latency (a few
milliseconds, usually 2-3 on average) and we are adding a third-party HTTP API
call to almost every request that we see. I am sure some people see the issue
right away, any network call is going to add at least a half to a whole millisecond
to your processing time and that is if the two servers are in the same datacenter,
more if they are not. That is just network traffic, now we must rely on the
performance of the third-party API, hoping that they are able to maintain a
consistent response time under heavy load. If, in total, this third-party API call
is adding more than 2 milliseconds response time to each request that our system
is processing then that greatly reduces the capacity of our system.
THE SOLUTION! Lets use Varnish. This is the logical solution, lets put a caching
system in front of the API. The content we are requesting isn’t changing very often
(every few days, if that) and it can help speed up the added latency from the API
call. So, we tried this but had very little luck; no matter what we tried we could
not get Varnish to respond in under 2 milliseconds per request (which is a main
requirement of solution we were looking for). That means Varnish is out, the next
solution is to write our own caching system.
Now, before people start flooding the comments calling me a troll or yelling at me
for not trying this or that or some other thing, let me try to explain really why
we decided to write our own cache rather than spend extra days investing time into
Varnish or some other known HTTP cache. We have a fairly specific requirement from
our cache, very low and consistent latency. “Consistent” is the key word that really
matters to us. We decided fairly early on that getting no response on a cache miss
is better for our application than blocking and waiting for a response from the
proxy call. This is a very odd requirement and most HTTP caching systems do not
support it since it almost defeats their purpose (be “slow” 1-2 times so you can be
fast all the other times). As well, HTTP is not a requirement for us, that is,
from the cache to the API server HTTP must be used, but it is not a requirement
that our application calls to the cache using HTTP. Headers add extra bandwidth
and processing that are not required for our application.
So we decided that our ideal cache would have 3 main requirements:
1. Must have a consistent response time, returning nothing early over waiting for a proper response
2. Support the Memcached Protocol
3. Support TTLs on the cached data
This behavior works basically like so: Call to cache, if it is a cache miss,
return an empty response and queue the request to a background process to make the
call to the API server, every identical request coming in (until the proxy call
returns a result) will receive an empty response but not add the request to the
queue. As soon as the proxy call returns, update the cache and every identical call
coming in will yield the proper response. After a given TTL consider the data in
the cache to be old and re-fetch.
This was then seen as a challenge between a co-worker,
Dan Crosta, and myself to see who
can write the better/faster caching system with these requirements. His solution,
entitled “CacheOrBust”, was a
Kyoto Tycoon plugin
written in C++ which simply used a subset of the memcached protocol as well as some
background workers and a request queue to perform the fetching. My solution,
Ferrite, is a
custom server written in Go
(originally written in C) that has the same functionality (except using
rather than background workers and a queue). Both servers used
as the underlying caching data structure.
So… results already! As with most fairly competitive competitions it is always a
sad day when there is a tie. Thats right, two similar solutions, written in two
different programming languages yielded similar results (we probably have
Kyoto Cabinet to thank). Both of our caching systems were able to yield us the
results we wanted, consistent sub-millisecond response times, averaging about
.5-.6 millisecond responses (different physical servers, but same datacenter),
regardless of whether the response was a cache hit or a cache miss. Usually the
morale of the story is: “do not re-invent the wheel, use something that already
exists that does what you want,” but realistically sometimes this isn’t an option.
Sometimes you have to bend the rules a little to get exactly what your application
needs, especially when dealing with low latency systems, every millisecond counts.
Just be smart about the decisions you make and make sure you have sound
justification for them, especially if you decide to build it yourself.