Let's Make a Metrics Beacon
June 22, 2014
Recently I wrote a simple JavaScript metrics beacon library. Let me show you what I came up with and how it works.
So, what do I mean by “JavaScript metrics beacon library”? Think RUM (Real User Monitoring) or Google Analytics: it is a JavaScript library used to capture and aggregate metrics from the client side and send that data to a server, either in one big batch or in small increments.
For those who do not like reading articles and just want the code, you can find the current state of my library on GitHub: https://github.com/brettlangdon/sleuth
Before we get into anything technical, let's just take a quick look at an example usage:
<script type="text/javascript" src="//raw.githubusercontent.com/brettlangdon/sleuth/master/sleuth.min.js"></script>
<script type="text/javascript">
Sleuth.init({
    url: "/track"
});
// static tags to identify the browser/user
// these are sent with each call to `url`
Sleuth.tag('uid', userId);
Sleuth.tag('productId', productId);
Sleuth.tag('lang', navigator.language);
// set some metrics to be sent with the next sync
Sleuth.track('clicks', buttonClicks);
Sleuth.track('images', imagesLoaded);
// manually sync all data
Sleuth.sendAllData();
</script>
Alright, so let's cover a few concepts from above: tags, metrics and syncing.
Tags
Tags are meant to be a way to uniquely identify the metrics that are being sent
to the server and are generally used to break apart metrics. For example, you might
have a metric to track whether or not someone clicks an “add to cart” button; using tags
you can then break out that metric to see how many times the button has been pressed
for each productId, browser, language or any other piece of data you find
applicable to segment your metrics. Tags can also be used when tracking data for
A/B tests, where you want to segment your
data based on which part of the test the user was included in.
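To make the "break out a metric by tag" idea concrete, here is a rough sketch of what the segmenting could look like once the data has been collected. The record shape (`{tags: ..., data: ...}`) and the `sumByTag` helper are assumptions made up for this illustration; they are not part of Sleuth.

```javascript
// Hypothetical helper: add up one metric, grouped by the value of one tag.
// `records` is assumed to be an array of {tags: {...}, data: {...}} objects,
// one per beacon call received.
function sumByTag(records, tagName, metricName) {
    var totals = {};
    records.forEach(function(record) {
        var group = record.tags[tagName];
        // metrics may arrive as strings, so coerce them to numbers
        var value = Number(record.data[metricName]) || 0;
        totals[group] = (totals[group] || 0) + value;
    });
    return totals;
}

// usage: "add to cart" clicks per productId
var clicksPerProduct = sumByTag([
    {tags: {productId: 'A', lang: 'en-US'}, data: {clicks: '2'}},
    {tags: {productId: 'B', lang: 'de-DE'}, data: {clicks: 1}},
    {tags: {productId: 'A', lang: 'fr-FR'}, data: {clicks: 3}}
], 'productId', 'clicks');
```

Swapping `'productId'` for `'lang'` would give the same metric segmented by language instead.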
Metrics
Metrics are simply data points to track for a given request. Good metrics to record are things like load times, elements loaded on the page, time spent on the page, number of times buttons are clicked or other user interactions with the page.
Syncing
Syncing refers to sending the data from the client to the server. I refer to it as “syncing” since we want to try to aggregate as much data as possible on the client side and send fewer, but larger, requests rather than having to make a request to the server for each metric we mean to track. We do not want to overload the client if we mean to track a lot of user interactions on the site.
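As a minimal sketch of "fewer, but larger, requests", the buffer below holds on to metrics and only hands them to a `send` callback once enough distinct keys have piled up. `MetricBuffer`, `maxSize` and `send` are hypothetical names for this example, and this is not how Sleuth decides when to sync.

```javascript
// Hypothetical buffer: collects metrics and flushes them in one batch.
function MetricBuffer(maxSize, send) {
    this.maxSize = maxSize; // flush once this many distinct keys are pending
    this.send = send;       // called with one object holding the whole batch
    this.pending = {};
    this.count = 0;
}

MetricBuffer.prototype.track = function(key, value) {
    // overwriting an existing key does not grow the batch
    if (!Object.prototype.hasOwnProperty.call(this.pending, key)) {
        this.count++;
    }
    this.pending[key] = value;
    if (this.count >= this.maxSize) {
        this.flush();
    }
};

MetricBuffer.prototype.flush = function() {
    if (this.count === 0) {
        return; // nothing to send
    }
    var batch = this.pending;
    this.pending = {};
    this.count = 0;
    this.send(batch);
};
```

With `maxSize` set to 3, three `track` calls with different keys result in a single call to `send` rather than three separate requests.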
How To Do It
Alright, enough of the simple examples/explanations, let's dig into the source a bit to find out how to aggregate the data on the client side and how to sync that data to the server.
Aggregating Data
Collecting the data we want to send to the server isn’t too bad. We are just going
to take any calls to Sleuth.track(key, value) and store the values either in
LocalStorage or in an object until we need
to sync. For example, this is the track method of Sleuth:
Sleuth.prototype.track = function(key, value){
    if(this.config.useLocalStorage && window.localStorage !== undefined){
        window.localStorage.setItem('Sleuth:' + key, value);
    } else {
        this.data[key] = value;
    }
};
The only things of note above are that it will fall back to storing in this.data
if LocalStorage is not available, and that we are namespacing all data stored in
LocalStorage with the prefix “Sleuth:” to ensure there is no name collision with
anyone else using LocalStorage.
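The namespacing pays off when it is time to read the data back out. Below is a sketch of how the “Sleuth:” entries could be gathered up again; `collectStoredMetrics` is a hypothetical helper for this article, not a function from Sleuth's source. It takes the storage object as an argument so the same code works with window.localStorage or with a stand-in.

```javascript
var PREFIX = 'Sleuth:';

// Hypothetical helper: gather every namespaced entry from a Web Storage
// style object (anything with `length`, `key(i)` and `getItem(name)`).
function collectStoredMetrics(storage) {
    var metrics = {};
    for (var i = 0; i < storage.length; i++) {
        var name = storage.key(i);
        // skip entries written by other code sharing the same storage
        if (name !== null && name.indexOf(PREFIX) === 0) {
            metrics[name.slice(PREFIX.length)] = storage.getItem(name);
        }
    }
    return metrics;
}
```

In a browser this would be called as `collectStoredMetrics(window.localStorage)`. Keep in mind that LocalStorage only stores strings, so a number passed to `setItem` comes back as a string.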
Also, Sleuth will be kind enough to capture data from window.performance if it
is available and enabled (it is by default). It simply grabs everything it can
to sync up to the server:
Sleuth.prototype.captureWindowPerformance = function(){
    if(this.config.performance && window.performance !== undefined){
        if(window.performance.timing !== undefined){
            this.data.timing = window.performance.timing;
        }
        if(window.performance.navigation !== undefined){
            this.data.navigation = {
                redirectCount: window.performance.navigation.redirectCount,
                type: window.performance.navigation.type
            };
        }
    }
};
For an idea of what is stored in window.performance.timing, check out
Navigation Timing.
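The raw timing object is just a set of millisecond timestamps, so the interesting numbers come from subtracting one from another. Here is a sketch of that arithmetic; `summarizeTiming` and the names of the returned fields are my own choices for this example, and Sleuth itself ships the raw object rather than doing this.

```javascript
// Hypothetical helper: turn Navigation Timing timestamps into durations (ms).
// `timing` is expected to look like window.performance.timing.
function summarizeTiming(timing) {
    var start = timing.navigationStart;
    return {
        dns: timing.domainLookupEnd - timing.domainLookupStart,
        connect: timing.connectEnd - timing.connectStart,
        timeToFirstByte: timing.responseStart - timing.requestStart,
        download: timing.responseEnd - timing.responseStart,
        domReady: timing.domContentLoadedEventEnd - start,
        // loadEventEnd is 0 until the load event has finished
        load: timing.loadEventEnd > 0 ? timing.loadEventEnd - start : null
    };
}
```

In a browser, `summarizeTiming(window.performance.timing)` called after the page has finished loading gives a compact summary that is much smaller to send than the full object.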
Syncing Data
Ok, so this is really the important part of this library. Collecting the data isn’t
hard. In fact, probably no one really needs a library to do that for them when you
can just as easily keep a global object to aggregate the data. But why am I making a
“big deal” about syncing the data either? It really isn’t too hard when you can just
make a simple AJAX call using jQuery's $.ajax(...) to ship up a JSON string to some
server side listener.
The approach I wanted to take was a little different. Yes, by default Sleuth will
try to send the data using AJAX to a server side url “/track”, but what about when
the server which collects the data lives on another hostname?
CORS can be less than
fun to deal with, and rather than worrying about any domain security I just wanted
a method that can send the data from anywhere I want back to whatever server I want
regardless of where it lives. So, how? Simple: JavaScript pixels.
A JavaScript pixel is simply a script tag which is written to the page with
document.write, whose src attribute points to the url that you want to make the
call to. The browser will then call that url without using AJAX, just like it would
with a normal script tag loading JavaScript. For a more in-depth look at tracking
pixels you can read a previous article of mine:
Third Party Tracking Pixels.
The point of going with this method is that we get CORS-free GET requests from any client to any server. But some people are probably thinking, “wait, a GET request doesn’t help us send data from the client to the server?” This is why we will encode our JSON string of data for the url and simply send it in the url as a query string parameter. Enough talk, let's see what this looks like:
var encodeObject = function(data){
    var query = [];
    for(var key in data){
        query.push(encodeURIComponent(key) + '=' + encodeURIComponent(data[key]));
    }
    return query.join('&');
};

var drop = function(url, data, tags){
    // base64 encode( stringify(data) )
    // note: btoa only accepts characters in the Latin1 range
    tags.d = window.btoa(JSON.stringify(data));
    // these parameters are used for cache busting
    tags.n = new Date().getTime();
    tags.r = Math.random() * 99999999;
    // make sure we url encode all parameters
    url += '?' + encodeObject(tags);
    // write the script tag "pixel" to the page
    document.write('<sc' + 'ript type="text/javascript" src="' + url + '"></scri' + 'pt>');
};
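This article does not cover the server, but to show that nothing is lost in the encoding, here is a rough sketch of how a Node.js listener could undo it. `decodeBeacon` is a hypothetical function; the only things taken from the code above are the parameter names d, n and r. It assumes a Node.js environment where `Buffer` and `URLSearchParams` are available as globals.

```javascript
// Hypothetical server-side helper (Node.js): split a beacon URL back into
// its tags and its metrics.
function decodeBeacon(requestUrl) {
    var query = requestUrl.split('?')[1] || '';
    var params = new URLSearchParams(query);
    var tags = {};
    var data = null;
    params.forEach(function(value, key) {
        if (key === 'd') {
            // reverse of window.btoa(JSON.stringify(data));
            // btoa works on Latin1 characters, so decode the bytes as latin1
            data = JSON.parse(Buffer.from(value, 'base64').toString('latin1'));
        } else if (key !== 'n' && key !== 'r') {
            // n and r only exist for cache busting, everything else is a tag
            tags[key] = value;
        }
    });
    return {tags: tags, data: data};
}
```

A real listener would also want to guard `JSON.parse` with a try/catch, since anyone can call the url with garbage in the d parameter.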
That is basically it. We simply base64 encode a JSON string version of the data and send
it as a query string parameter. There might be a few odd things that stand out above, mainly
the url length limitations of a base64 encoded JSON string, the “cache busting” and the weird
breaking up of the tag “script”. A safe url length limit to live under is around
2000 characters to accommodate Internet Explorer, which from some very crude testing means each request
can hold around 50 or so separate metrics each containing a string value. Cache busting
can be read about more in-depth in my article about tracking pixels
(http://brett.is/writing/about/third-party-tracking-pixels/#cache-busting), but the short
version is that we add random numbers and the current timestamp to the query string to ensure that
the browser or CDN or anyone in between doesn’t cache the request being made to the server;
this way you will not get any missed metrics calls. Lastly, breaking up the script tag
into “sc + ript” and “scri + pt” makes it harder for anyone blocking scripts from writing
script tags to detect that a script tag is being written to the DOM (an img or
iframe tag could also be used instead of a script tag).
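One way to live under the url length limit is to split the metrics across several requests. The sketch below is only an illustration of that idea, not something Sleuth does: `splitIntoBatches`, `beaconUrl` and `toBase64` are hypothetical helpers, and for simplicity the length check ignores the tags and cache busting parameters, so a real version would leave some headroom for them.

```javascript
var MAX_URL_LENGTH = 2000;

// base64 helper: btoa in browsers, Buffer as a fallback (e.g. older Node.js)
function toBase64(str) {
    return typeof btoa === 'function' ?
        btoa(str) : Buffer.from(str, 'latin1').toString('base64');
}

// Hypothetical: the url a batch would produce, counting only the `d` parameter
function beaconUrl(url, data) {
    return url + '?d=' + encodeURIComponent(toBase64(JSON.stringify(data)));
}

// Hypothetical: split `data` into several objects whose urls fit the limit.
// A single metric that is too large on its own still gets its own batch.
function splitIntoBatches(url, data) {
    var batches = [];
    var current = {};
    Object.keys(data).forEach(function(key) {
        // what the batch would look like with this metric added
        var candidate = {};
        Object.keys(current).forEach(function(k) { candidate[k] = current[k]; });
        candidate[key] = data[key];
        var tooLong = beaconUrl(url, candidate).length > MAX_URL_LENGTH;
        if (tooLong && Object.keys(current).length > 0) {
            batches.push(current);
            current = {};
        }
        current[key] = data[key];
    });
    if (Object.keys(current).length > 0) {
        batches.push(current);
    }
    return batches;
}
```

Each batch returned by `splitIntoBatches('/track', data)` could then be passed to its own `drop` call.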
Unload
How do we know when to send the data? If you are trying to measure how much time someone spends on each page, or want to make sure you collect as much data as possible on the client side, then you want to wait until the last second before syncing the data to the server. By using LocalStorage to store the data you can ensure that you will be able to access that data the next time you see that user, but who wants to wait? And what if the user never comes back? I want my data now dammit!
Simple, let's bind an event to window.onunload! Woot, done… wait… why isn’t my data
being sent to me? Initially I was trying to use window.onunload to sync data back, but
found that it didn’t always work with pixel dropping, although AJAX requests worked most of the time.
After some digging I found that with window.onunload I was hitting a race condition on
whether or not the DOM was still available, meaning I couldn’t use document.write
or even query the DOM for more metrics to sync on window.onunload.
In comes window.onbeforeunload to the rescue! For those who don’t know about it (I
didn’t before this project), window.onbeforeunload is exactly what it sounds like:
an event that gets called before window.onunload, which also happens before the DOM
gets unloaded. So you can reliably use it to write to the DOM (like the pixels) or
to query the DOM for any extra information you want to sync up.
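Hooking into the event could look something like the sketch below. `syncBeforeUnload` is a hypothetical helper and not necessarily how Sleuth wires itself up; it takes the window object as an argument and makes sure any handler that was already assigned keeps running.

```javascript
// Hypothetical helper: run `sync` just before the page unloads.
// `win` is normally the global `window`.
function syncBeforeUnload(win, sync) {
    var previous = win.onbeforeunload;
    win.onbeforeunload = function(event) {
        sync();
        // keep any handler that was already registered working
        if (typeof previous === 'function') {
            return previous.call(win, event);
        }
        // returning nothing avoids the "are you sure you want to leave?" prompt
    };
}
```

In a page this might be used as `syncBeforeUnload(window, function(){ Sleuth.sendAllData(); });`.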
Conclusion
So what do you think? There really isn’t too much to it, is there? Especially since we only covered the client side of the piece and haven’t touched on how to collect and interpret this data on the server (maybe that’ll be a follow up post). Again, this is mostly a simple implementation of a RUM library, but hopefully it sparks an interest to build one yourself or even just gives you some insight into how Google Analytics or other RUM libraries collect/send data from the client.
I think this project was neat because I do not always do client side
JavaScript and every time I do I tend to learn something pretty cool. In this case,
I learned the differences between window.onunload and window.onbeforeunload as well
as some of the cool things that are tracked by default in window.performance. I
definitely urge people to check out the documentation on window.performance.
TODO
What is next for Sleuth? I am not sure yet. I am thinking of implementing more ways of tracking data, like adding counter support, rate limiting and automatic incremental data syncs. I am open to ideas of how other people would use a library like this, so please leave a comment here or open an issue on the project's GitHub page with any thoughts you have.