Third Party Tracking Pixels

May 3, 2013

An overview of what a third party tracking pixel is and how to create/use them.

So, what exactly do we mean by “third party tracking pixel” anyways? Lets try to break it down piece by piece:

Tracking Pixel:

A pixel referes to a tag that is placed on a site that offers no merit other than calling out to a web page or script that is not the current page you are visiting. These pixels are usually an html script tag that point to a javascript file with no content or an img tag with a empty or transparent 1 pixel by 1 pixel gif image (hence the term “pixel”). A tracking pixel is the term used to describe a pixel that calls to another page or script in order to provide it information about the users visit to the page.

Third Party:

Third party just means the pixel points to a website that is not the current website. For example, Google Analytics is a third party tracking tool because you place scripts on your website that calls and sends data to Google.

What is the point?

Why do people do this? In the case of Google Analytics people do not wish to track and follow their own analytics for their website, instead they want a third party host to do it for them, but they need a way of sending their user’s data to Google. Using pixels and javascript to send the data to Google offers the company a few benefits. For starters, they do not require any more overhead on their servers for a service to send data directly to Google, instead by using pixels and scripts they get to off load this overhead onto their users (thats right, we are using our personal computers resources to send analytical data about ourselves to Google for websites that use Google analytics). Secondly, the benefit of using a tracking pixel that runs client side (in the user’s browser) we are allowed to gather more information about the user. The information that is made available to us through the use of javascript is far greater than what is given to our servers via HTTP Headers.

How do we do it?

Next we will walk through the basics of how to create third party tracking pixels. Code examples for the following discussion can be found here. We will walk through four examples of tracking pixels accompanied by the server code needed to serve and receive the pixels. The server is written in Python and some basic understanding of Python is required to follow along. The server examples are written using only standard Python wsgi modules, so no extra installation is needed. We will start off with a very simple example of using a tracking pixel and then each example afterwards we will begin to add features to the pixel.

Simple Example

For this example all we want to accomplish is to have a web server that returns HTML containing our tracking pixel as well as a handler to receive the call from our tracking pixel. Our end goal is to serve this HTML content:

<html>
  <head></head>
  <body>
    <h2>Welcome</h2>
    <script src="/track.js"></script>
  </body>
</html>

As you can see, this is fairly simple HTML; the important part is the script tag pointing to “/track.js”, this is our tracking pixel. When the user’s browser loads the page this script will make a call to our server, our server can then log information about that user. So we start with a wsgi handler for the HTML code:

def html_content(environ, respond):
    headers = [('Content-Type', 'text/html')]
    respond('200 OK', headers)
    return [
        """
        <html><head></head><body>
        <h2>Welcome</h2><script src="/track.js"></script>
        </body></html>
        """
    ]

Next we want to make sure that we have a handler for the calls to “/track.js” from the script tag:

def track_user(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    prefixes = ['PATH_', 'HTTP', 'REQUEST', 'QUERY']
    for key, value in environ.iteritems():
        if any(key.startswith(prefix) for prefix in prefixes):
            print '%s: %s' % (key, value)
    return ['']

In this handler we are taking various information about the request from the user and simply printing it to the screen. The end point “/track.js” is not meant to point to actual javascript so instead we return back an empty string. When this code runs you should see something like the following:

brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET / HTTP/1.1" 200 89
HTTP_REFERER: http://localhost:8000/
REQUEST_METHOD: GET
QUERY_STRING:
HTTP_ACCEPT_CHARSET: ISO-8859-1,utf-8;q=0.7,*;q=0.3
HTTP_CONNECTION: keep-alive
PATH_INFO: /track.js
HTTP_HOST: localhost:8000
HTTP_ACCEPT: */*
HTTP_USER_AGENT: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.31 (KHTML, like Gecko) Chrome/26.0.1410.65 Safari/537.31
HTTP_ACCEPT_LANGUAGE: en-US,en;q=0.8
HTTP_DNT: 1
HTTP_ACCEPT_ENCODING: gzip,deflate,sdch
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /track.js HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 20:03:21] "GET /favicon.ico HTTP/1.1" 204 0

You can see in the above that first the browser makes the request “GET /” which returns our HTML containing the tracking pixel, then directly afterwards makes a request for “GET /track.js” which prints out various information about the incoming request. This example is not very useful as is, but helps to illustrate the key point of a tracking pixel. We are having the browser make a request on behalf of the user without the user’s knowledge. In this case we are making a call back to our own server, but our script tag could easily point to a third party server.

Add Some Search Data

Our previous, simple, example does not really provide us with any particularly useful information other than allow us to track that a user’s browser made the call to our server. For this next example we want to build upon the previous by sending some data along with the tracking pixel; in this case, some search data. Let us make an assumption that our web page allows users to make searches; searches are given to the page through a url query string parameter “search”. We want to pass that query string parameter on to our tracking pixel, which we will use the query string parameter “s”. So our requests will look as follows:

http://localhost:8000?search=my cool search
http://localhost:8000/track.js?s=my cool search

To do this, we simply append the query string parameter “search” onto our track.js script tag in our HTML:

def html_content(environ, respond):
    query = parse_qs(environ['QUERY_STRING'])
    search = quote(query.get('search', [''])[0])
    headers = [('Content-Type', 'text/html')]
    respond('200 OK', headers)
    return [
        """
        <html><head></head><body>
        <h2>Welcome</h2><script src="/track.js?s=%s"></script>
        </body></html>
        """ % search
    ]

For our tracking pixel handler we will simply print the value of the query string parameter “s” and again return an empty string.

def track_user(environ, respond):
    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User Searched For: %s' % search
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    return ['']

When run the output will look similar to:

brett$ python tracking_server.py
Tracking Server Listening on Port 8000...
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /?search=my%20cool%20search HTTP/1.1" 200 110
User Searched For: my cool search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /track.js?s=my%20cool%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:24] "GET /favicon.ico HTTP/1.1" 204 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /?search=another%20search HTTP/1.1" 200 108
User Searched For: another search
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /track.js?s=another%20search HTTP/1.1" 200 0
1.0.0.127.in-addr.arpa - - [24/Apr/2013 21:35:34] "GET /favicon.ico HTTP/1.1" 204 0

Here we can see the two search requests made to our web page and the similar resulting requests to track.js. Again, this example might not seem like much but it proves a way of being able to pass values from our web page along with to the tracking server. In this case we are passing search terms, but we could also pass any other information along we needed.

Track User’s with Cookies

So now we are getting somewhere, our tracking server is able to receive some search data about the requests made to our web page. The problem now is we have no way of associating this information with a specific user; how can we know when a specific user searches for multiple things. Cookies to the rescue. In this example we are going to add the support of using cookies to assign each visiting user a specific and unique id, this will allow us to associate all the search data we receive with “specific” users. Yes, I say “specific” with quotes because we can only associate the data with a given cookie, if multiple people share a computer then we will probably think they are a single person. As well, if someone clears the cookies for their browser then we lose all association with that user and have to start all over again with a new cookie. Lastly, if a user does not allow cookies for their browser then we will be unable to associate any data with them as every time they visit our tracking server we will see them as a new user. So, how do we do this? When receive a request from a user we want to look and see if we have given them a cookie with a user id, if so then we will associate the incoming data with that user id and if there is no user cookie then we will generate a new user id and give it to the user.

def track_user(environ, respond):
    cookies = SimpleCookie()
    cookies.load(environ.get('HTTP_COOKIE', ''))

    user_id = cookies.get('id')
    if not user_id:
        user_id = uuid4()
        print 'User did not have id, giving: %s' % user_id

    query = parse_qs(environ['QUERY_STRING'])
    search = query.get('s', [''])[0]
    print 'User %s Searched For: %s' % (user_id, search)
    headers = [
        ('Content-Type', 'application/javascript'),
        ('Set-Cookie', 'id=%s' % user_id)
    ]
    respond('200 OK', headers)
    return ['']

This is great! Not only can we now obtain search data from a third party website but we can also do our best to associate that data with a given user. In this instance a single user is anyone who shares the same user id in their browsers cookies.

Cache Busting

So what exactly is cache busting? Our browsers are smart, they know that we do not like to wait a long time for a web page to load, they have also learned that they do not need to refetch content that they have seen before if they cache it. For example, an image on a web site might get cached by your web browser so every time you reload the page the image can be loaded locally as opposed to being fetched from the remote server. Cache busting is a way to ensure that the browser does not cache the content of our tracking pixel. We want the user’s browser to follow the tracking pixel to our server for every page request they make because we want to follow everything that that user does. When the browser caches our tracking pixel’s content (an empty string) then we lose out on data. Cache busting is the term used when we programmatically generate query string parameters to make calls to our tracking pixel look unique and therefore ensure that the browser follows the pixel rather than load from it’s cache. To do this we need to add an extra end point to our server. We need the HTML for the web page, along with a cache busting script and finally our track.js handler. A cache busting script will use javascript to add our track.js script tag to the web page. This means that after the web page is loaded javascript will run to manipulate the DOM to add our cache busted track.js script tag to the HTML. So, what does this look like?

var now = new Date().getTime();
var random = Math.random() * 99999999999;
document.write('<script type="text/javascript" src="/track.js?t=' + now + '&r=' + random + '"></script>

This script adds the extra query string parameters ”r” which is a random number and “t” which is the current timestamp in milliseconds. This will give us a unique enough request that will trick our browsers into ignoring anything that is has in it’s cache for track.js and forces it to make the request anyways. Using a cache buster requires us to modify the html we server slightly to server up the cache busting javascript as opposed to our track.js pixel.

<html>
  <head></head>
  <body>
    <h2>Welcome</h2>
    <script src="/buster.js"></script>
  </body>
</html>

And we need the following to serve up the cache buster script buster.js:

def cache_buster(environ, respond):
    headers = [('Content-Type', 'application/javascript')]
    respond('200 OK', headers)
    cb_js = """
            function getParameterByName(name){
                name = name.replace(/[\[]/, "\\\[").replace(/[\]]/, "\\\]");
                var regexS = "[\\?&]" + name + "=([^&#]*)";
                var regex = new RegExp(regexS);
                var results = regex.exec(window.location.search);
                if(results == null){
                    return "";
                }
                return decodeURIComponent(results[1].replace(/\+/g, " "));
            }

            var now = new Date().getTime();
            var random = Math.random() * 99999999999;
            var search = getParameterByName('search');
            document.write('<script src="/track.js?t=' + now + '&r=' + random + '&s=' + search + '"></script>');
            """
    return [cb_js]

We do not care very much if the browser caches our cache buster script because it will always generate a new unique track.js url every time it is run.

Conclusion

There is a lot of stuff going on here and probably a lot to digest so lets review quick what we have learned. For starters we learned that companies use tracking pixels or tags on web pages whose sole purpose is to make your browser call our to external third party sites in order to track information about your internet usage (usually, they can be used for other things as well). We also looked into some very simplistic ways of implementing a server whose job it is to accept tracking pixels calls in various forms.

We learned that these tracking servers can use cookies stored on your browser to store a unique id for you in order to help associate the data collected to you. That you can remove this association by clearing your cookies or by not allowing them at all. Lastly, we learned that browsers can cause issues for our tracking pixels and data collection and that we can get around them using a cache busting javascript.

As a reminder the full working code examples can be located at “https://github.com/brettlangdon/tracking-server-examples.