Cookieless User Tracking

November 30, 2013

A look into various methods of online user tracking without cookies.


Over the past few months I have been spending my free time researching methods for tracking users without cookies. A previous article of mine explains how to write a tracking server that uses cookies to follow people between requests. However, browsers are beginning to block third-party cookies by default, which means developers have to come up with other ways of tracking users.

Browser Fingerprinting

You can use client-side JavaScript to generate a browser fingerprint: a unique identifier for a specific user’s browser (which is what cookies actually track anyway). Once you have the browser’s fingerprint, you can send that id along with any other requests you make.

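// generateBrowserFingerprint() is a placeholder for whichever fingerprinting routine you use
var user_id = generateBrowserFingerprint();
// The id is part of the src URL, so it has to sit inside the attribute's quotes
document.write(
    '<script type="text/javascript" src="/track/user/' + user_id + '"></sc' + 'ript>'
);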
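The snippet above leaves generateBrowserFingerprint undefined. At its core, a fingerprint is a hash over a set of attributes that tend to stay stable for one browser. Here is a minimal sketch of just that hashing step, written in Python to match the server examples in this article; in practice the attributes would be collected and hashed in the browser. The function name and the attribute names and values are assumptions made for illustration.

```python
import hashlib


def fingerprint(attributes):
    """Hash a dict of browser attributes into a stable hex identifier."""
    # Sort by key so the same attributes always produce the same id,
    # regardless of the order in which they were collected.
    canonical = "\n".join(
        "%s=%s" % (key, attributes[key]) for key in sorted(attributes)
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical attribute values, for illustration only.
example = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "language": "en-US",
    "screen": "1920x1080x24",
    "timezone_offset": "-300",
}
print(fingerprint(example))
```

The more attributes that go into the hash, the less likely two browsers are to collide, but the more likely the id is to change when the user updates or reconfigures their browser.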

Local Storage

Newer browsers come equipped with a feature called local storage, a simple key-value store accessible through JavaScript. Instead of relying on cookies for persistent storage, you can keep the user id in local storage.

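var user_id = localStorage.getItem("user_id");
if (user_id === null) {
    // generateNewId() is a placeholder for your own id generator
    user_id = generateNewId();
    localStorage.setItem("user_id", user_id);
}
// The id is part of the src URL, so it has to sit inside the attribute's quotes
document.write(
    '<script type="text/javascript" src="/track/user/' + user_id + '"></sc' + 'ript>'
);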

This can also be combined with a browser fingerprinting library for generating the new id.
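Both client snippets load /track/user/<user_id>, but the server side of that request is not shown. The following is a minimal sketch of what such an endpoint might look like as a WSGI application. The names tracking_app, TRACK_PATH and hits are hypothetical, the allowed id characters are an assumption, and the in-memory counter stands in for whatever storage you would really use.

```python
import re

# Matches /track/user/<user_id>; the id may contain letters, digits, "-" or "_".
TRACK_PATH = re.compile(r"^/track/user/([A-Za-z0-9_-]+)/?$")

# In-memory hit counter; a real server would use persistent storage.
hits = {}


def tracking_app(environ, start_response):
    match = TRACK_PATH.match(environ.get("PATH_INFO", ""))
    if match is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]

    user_id = match.group(1)
    hits[user_id] = hits.get(user_id, 0) + 1

    # The client loads this URL as a script, so respond with (empty) JavaScript.
    start_response("200 OK", [("Content-Type", "application/javascript")])
    return [b""]
```

It can be served the same way as the example servers later in this article, with make_server("", 8000, tracking_app).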

ETag Header

HTTP has a response header called ETag which can be exploited for user tracking. Here is how an ETag normally works: when a request is made, the server responds with an ETag header carrying some value (usually an id for the requested document, or a hash of it). Whenever the browser requests that document again, it sends an If-None-Match header with the ETag value the server provided last time. The server can then decide whether new content needs to be served based on the id/hash provided by the browser.
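The ordinary, non-tracking use of the header can be sketched as a small WSGI application. This is a minimal sketch rather than a complete implementation: DOCUMENT and document_server are hypothetical names, and it only handles a single exact tag in If-None-Match (no lists and no weak validators).

```python
import hashlib

# Placeholder content standing in for the requested document.
DOCUMENT = b"<html><body>Hello</body></html>"


def document_server(environ, start_response):
    # Entity tags are quoted strings; here the tag is a hash of the body.
    etag = '"%s"' % hashlib.sha256(DOCUMENT).hexdigest()

    if environ.get("HTTP_IF_NONE_MATCH") == etag:
        # The browser's cached copy is still current: send no body.
        start_response("304 Not Modified", [("ETag", etag)])
        return []

    # No tag, or a stale one: send the full document with its current tag.
    start_response("200 OK", [("Content-Type", "text/html"), ("ETag", etag)])
    return [DOCUMENT]
```

The point to notice is that the browser hands back whatever value the server chose, unchanged, on every revisit.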

As you may have figured out, we can instead assign a unique user id as the ETag value for a response; when the browser requests that page again, it sends the user id back to us.

This is useful, except that we can only provide a single id per user per endpoint. For example, if I use the URLs /track/user and /collect/data, there is no way to get the browser to send the same If-None-Match header for both, because the browser stores the ETag per cached URL.

Example Server

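from uuid import uuid4
from wsgiref.simple_server import make_server


def tracking_server(environ, start_response):
    # A returning browser sends back the ETag we assigned as If-None-Match
    user_id = environ.get("HTTP_IF_NONE_MATCH", "").strip('"')
    if not user_id:
        user_id = uuid4().hex

    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        # ETag values must be quoted strings
        ("ETag", '"%s"' % user_id),
    ])
    # WSGI response bodies must be bytes under Python 3
    return [user_id.encode("utf-8")]


if __name__ == "__main__":
    try:
        httpd = make_server("", 8000, tracking_server)
        print("Tracking Server Listening on Port 8000...")
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("Exiting...")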

Redirect Caching

Redirect caching is similar in concept to the ETag method: we rely on the browser cache to store the user id for us. We have a tracking URL, /track; when someone requests it, we respond with a 301 redirect to /<user_id>/track. The user’s browser caches that 301 redirect, and the next time the user goes to /track it goes straight to /<user_id>/track instead.

Just like the ETag method, this really only works for a single endpoint URL. We cannot use it as a be-all and end-all for tracking users across a site or across multiple sites.

Example Server

from uuid import uuid4
from wsgiref.simple_server import make_server


def tracking_server(environ, start_response):
    if environ["PATH_INFO"] == "/track":
        # First visit: the browser caches this redirect along with the new id
        start_response("301 Moved Permanently", [
            ("Location", "/%s/track" % uuid4().hex),
        ])
    else:
        # Any other path, including /<user_id>/track
        start_response("200 OK", [])
    # WSGI response bodies must be bytes under Python 3
    return [b""]


if __name__ == "__main__":
    try:
        httpd = make_server("", 8000, tracking_server)
        print("Tracking Server Listening on Port 8000...")
        httpd.serve_forever()
    except KeyboardInterrupt:
        print("Exiting...")
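The example server answers every path other than /track with an empty 200 and never reads the id back out of the URL. To actually use the id, the server has to parse it from the redirected path. Here is a minimal sketch of that step; user_id_from_path and REDIRECTED_PATH are hypothetical names, and the pattern assumes ids in the 32-character hex format that uuid4().hex produces.

```python
import re

# Matches /<user_id>/track where the id is 32 lowercase hex characters,
# the format produced by uuid4().hex.
REDIRECTED_PATH = re.compile(r"^/([0-9a-f]{32})/track/?$")


def user_id_from_path(path):
    """Return the user id embedded in a redirected path, or None."""
    match = REDIRECTED_PATH.match(path)
    return match.group(1) if match else None
```

Inside tracking_server, this could be called as user_id_from_path(environ["PATH_INFO"]) in the else branch, treating a None result as a request that did not come through the redirect.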

A project worth noting is Samy Kamkar’s Evercookie, which uses standard cookies, Flash objects, Silverlight isolated storage, web history, ETags, web cache, local storage, global storage… and more, all at the same time, to track users. The library exercises every storage method it can, which makes it reliable at ensuring the id is stored, at the cost of being very intrusive and persistent.
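The idea of storing the id redundantly can be sketched in a few lines. This is not Evercookie’s actual code, only an illustration of the general approach: the function resolve_user_id is hypothetical, and plain dictionaries stand in for the real storage mechanisms (cookies, local storage, cache and so on).

```python
def resolve_user_id(stores, new_id):
    """Recover an id from any store that still has it, then rewrite it everywhere."""
    user_id = None
    for store in stores:
        if store.get("user_id"):
            user_id = store["user_id"]
            break
    if user_id is None:
        # No store has an id yet, so this is treated as a new user.
        user_id = new_id()

    # Copy the id back into every store, including ones the user has cleared.
    for store in stores:
        store["user_id"] = user_id
    return user_id
```

As long as a single store survives, the id is restored to all the others, which is why clearing cookies alone is not enough against this kind of tracking.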

Other Methods

I am sure there are other methods out there; these are just the few I decided to focus on. If you know of other methods or have ideas, please leave a comment.
