How to kill referrer-spam in Google Analytics

OK, who isn’t annoyed by referrer spam? New domains keep showing up in Google Analytics (and in other tracking tools, like etracker) that supposedly send traffic to your website. Large sites with a lot of traffic might not mind those hundreds or thousands of fake “visitors” per month, since they don’t distort the real numbers much. But for smaller websites the effect can be so strong that you can barely get an accurate view of the real data.

But how do you keep this referrer spam out of your analytics reports?

Well, for that you have to understand how this referrer spam is generated. It happens in two ways:

  1. by real “bot” visits and
  2. by simply calling/executing the tracking code

 

Referrer spam by real bot-visits

For this you have to know that the “referrer” is a piece of information that is supplied by the browser and can be read by the tracking software. This means the page visit doesn’t necessarily have to come from that URL, and of course the value can be faked, which is exactly what these bots do. These bots are simply programs written to “visit” websites, execute the embedded tracking code and leave the desired “referrer” behind. Of course this is a simplified description, but this shouldn’t become a how-to for creating referrer spam ;).

But how to stop this?

Well, the simplest method is to deny access to the website to all visitors reporting a referrer from a known spam domain.

If you’re running an Apache server (and for now the vast majority of servers run on Apache) you can block those visitors either by extending your .htaccess file (see the how-to below) or by modifying the central Apache configuration files. For websites hosted on servers where you don’t have access to the Apache configuration files, the .htaccess file is unfortunately the only way.
But if you have the rights to modify the Apache configuration files, I recommend that method. What you have to do and how you can automate adding new spam domains is described below.
If your website is running on nginx the method is similar, but since nginx has nothing similar to .htaccess files you have to modify the central configuration files of the nginx server. The how-to for nginx servers is further below as well.

Referrer spam by pure calling/executing of the tracking code

With this method the programs simply “guess” the tracking ID and, without actually visiting the website at all, send the corresponding tracking information directly to the servers of the tracking software. Obviously the method described above doesn’t work here, since your website doesn’t get visited at all. To keep this type of referrer spam out of your analytics reports you have to take action within your analytics tool and filter out this spam traffic. Google Analytics has by now integrated an option to filter out known bots. But Google is very, very hesitant to add new referrer spam domains and bots to its blacklist, so quite a lot of referrer spam still gets through.

How to filter out this referrer spam?

Well, first we can make use of the fact that these bots don’t know the actual domain and therefore can’t tell the tracking software which URL they claim to be visiting.

So first you create a filter that keeps “page visits” from being recorded when your own domain is not in the requested URL. The how-to is in the Google Analytics filter section below.

Unfortunately this still doesn’t filter out all referrer spam. So you have no choice but to create filters that exclude the remaining “traffic” coming from referrer spam domains.

Since, at least in Google Analytics, the length of a filter is limited to 256 characters and there are by now so many known spam domains, you have to set up (and maintain) about 25 filters. To make this a bit easier, the latest set of filter rules is available for download, plus a list of filter rules that contains only the newly added ones, by day.
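To give an idea of how a long domain list ends up as a couple of dozen filter rules, here is a minimal Python sketch. It is not the script that produces the downloadable filter rules. The function name pack_filter_patterns, the example domains and the default limit of 255 characters (picked to stay just under the limit mentioned above) are assumptions for illustration only.

```python
def pack_filter_patterns(domains, max_len=255):
    """Greedily join escaped domains with '|' so that no pattern exceeds max_len characters."""
    patterns = []
    current = ""
    for domain in domains:
        # Escape dots so that each one matches a literal dot in the regular expression.
        escaped = domain.strip().lower().replace(".", r"\.")
        if not escaped:
            continue
        if len(escaped) > max_len:
            raise ValueError(f"domain does not fit into a single filter: {domain}")
        candidate = escaped if not current else current + "|" + escaped
        if len(candidate) <= max_len:
            current = candidate
        else:
            # The current pattern is full: store it and start a new one.
            patterns.append(current)
            current = escaped
    if current:
        patterns.append(current)
    return patterns


# Hypothetical example domains, not taken from the real block list.
example_domains = [f"spam-domain-{n}.example" for n in range(40)]
for number, pattern in enumerate(pack_filter_patterns(example_domains), start=1):
    print(f"Filter {number} ({len(pattern)} chars): {pattern}")
```

Each printed line would then go into the pattern field of one exclude filter.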

 

How-to .htaccess

To block referrer spam via .htaccess you have to copy the content of the file htaccess.txt into your .htaccess file:

  1. Download your .htaccess file to your local PC with your favorite FTP program. If you don’t see an .htaccess file in your FTP program, you might have to activate “show hidden files”. If you don’t have an .htaccess file at all (very unlikely), upload the file mentioned above into the root directory of your web server and rename it to .htaccess (yes, with a leading . and without an extension).
  2. Create a backup copy of the .htaccess file you just downloaded
  3. Find the following line in your .htaccess file:
    RewriteEngine On

    and copy all lines from htaccess.txt directly underneath it.

    If there is no RewriteEngine On line in your .htaccess file (again very unlikely), add the line

    RewriteEngine On

    at the very top of your .htaccess file and copy the complete content of the htaccess.txt file underneath.

  4. Upload the new, modified .htaccess file back to your server
  5. Thoroughly test whether everything works as expected. If not, you can upload your backup copy to the server again.

If you want to update the rules just download a new copy of htaccess.txt and replace the code between

RewriteEngine On

and

RewriteRule .* - [F].
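
If you’re curious what kind of rules sit between those two lines, the following Python sketch generates a block of that shape from a list of domains. The real htaccess.txt may well use different patterns; the function name build_htaccess_rules, the example domains and the exact regular expression are my assumptions, not the author’s code.

```python
def build_htaccess_rules(domains):
    """Return mod_rewrite lines that send 403 Forbidden when the Referer matches a listed domain."""
    # Escape dots so that each one matches a literal dot, not any character.
    escaped = [d.strip().lower().replace(".", r"\.") for d in domains if d.strip()]
    lines = []
    for index, pattern in enumerate(escaped):
        # All conditions except the last are chained with OR; NC ignores case.
        flags = "[NC,OR]" if index < len(escaped) - 1 else "[NC]"
        lines.append(
            "RewriteCond %{HTTP_REFERER} "
            + r"^https?://([^/]+\.)?" + pattern + r"([/:]|$) " + flags
        )
    if lines:
        # [F] makes Apache answer with 403 Forbidden if any condition matched.
        lines.append("RewriteRule .* - [F]")
    return lines


# Hypothetical example domains, not taken from the real block list.
for line in build_htaccess_rules(["spam-domain.example", "another-spammer.example"]):
    print(line)
```

The optional ([^/]+\.)? part is there so that subdomains of a listed domain are caught as well.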

 

How-to automatic update of the central apache config files

Requirements:

  1. You have root access to your server
  2. You have git installed on your server
  3. You know how to create a cron-job for root

Procedure:

  1. Log on to your server via SSH (e.g. with PuTTY) and, if necessary, get root rights with
    sudo

  2. Change to the /root/ directory, or another directory that is not within the directory tree of any website, and execute the following commands:
    git clone https://github.com/piwik/referrer-spam-blacklist
    git clone https://github.com/mher30/referrer-spam
    
  3. In the referrer-spam directory, execute the command
    ./cron.sh

    once

  4. Now you can copy the file apache-spamblock.conf into the configuration directory of your Apache web server (usually /etc/apache2) and
  5. include apache-spamblock.conf in your virtual host configurations
    (example code):
<VirtualHost *:80>
ServerName www.beispiel-domain.de
ServerAlias beispiel-domain.de
ServerAdmin admin@beispiel-domain.de

Include apache-spamblock.conf
...
</VirtualHost>

 

  • To automate this, create a cron job (e.g. in the /etc/cron.daily directory) which
    1. changes into the referrer-spam directory mentioned above,
    2. calls cron.sh there,
    3. copies the new apache-spamblock.conf into the Apache configuration directory,
    4. checks whether the Apache configuration is still valid (there’s always a chance that something goes wrong 😉)
    5. and, if everything is all right, reloads the new configuration.

    If you don’t know how to program this, ask someone who is proficient in shell scripting to do it for you.
    There are so many different installation and configuration variants that one can’t program this in a way that works everywhere. Well, OK, at least I can’t 😉

  • Finally: test, test, test!!
  • Done
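
As a rough starting point only, the five cron-job steps listed above could be sketched like this in Python. Everything specific here is an assumption that you have to adapt: the clone location /root/referrer-spam, the configuration directory /etc/apache2, the apachectl command (on some systems it is called apache2ctl) and the function name update_spamblock. The sketch additionally keeps a backup so that a broken file can be rolled back.

```python
import shutil
import subprocess
from pathlib import Path

REPO_DIR = Path("/root/referrer-spam")  # assumed location of the cloned repository
APACHE_DIR = Path("/etc/apache2")       # assumed Apache configuration directory
CONF_NAME = "apache-spamblock.conf"


def update_spamblock(repo_dir=REPO_DIR, apache_dir=APACHE_DIR,
                     run=subprocess.run, copy=shutil.copy2):
    """Regenerate the block list, install it, and reload Apache only if the config is valid."""
    target = Path(apache_dir) / CONF_NAME
    backup = target.with_name(target.name + ".bak")

    # Steps 1 and 2: run cron.sh inside the referrer-spam directory.
    run(["./cron.sh"], cwd=repo_dir, check=True)

    # Keep the last working file so it can be restored if the new one is broken.
    if target.exists():
        copy(target, backup)

    # Step 3: copy the freshly generated file into the Apache configuration directory.
    copy(Path(repo_dir) / CONF_NAME, target)

    # Step 4: let Apache check its own configuration.
    result = run(["apachectl", "configtest"])
    if result.returncode != 0:
        if backup.exists():
            copy(backup, target)
        return False

    # Step 5: reload without dropping open connections.
    run(["apachectl", "graceful"], check=True)
    return True
```

A small wrapper script placed in /etc/cron.daily could then call update_spamblock() and exit with a non-zero status when it returns False, so that cron reports the failure.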

 

How-to nginx configuration

Put the nginx configuration file into the global nginx configuration directory (probably /etc/nginx/global) and include it in your server configuration:

...
server {
listen 443;
server_name dein-server-name-hier.de;

include /etc/nginx/global/*;
...
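
The nginx configuration file itself isn’t reproduced in this post. To give a rough idea of what a file in /etc/nginx/global can contain, this Python sketch generates a single if block that answers 403 for matching referrers. The function name build_nginx_block, the example domains and the regular expression are assumptions for illustration; the real file may look different.

```python
def build_nginx_block(domains):
    """Return an nginx 'if' block that answers 403 when the Referer matches a listed domain."""
    # Escape dots so that each one matches a literal dot, not any character.
    escaped = [d.strip().lower().replace(".", r"\.") for d in domains if d.strip()]
    if not escaped:
        return ""
    # One alternation over all domains; ~* makes the match case-insensitive.
    pattern = r"^https?://([^/]+\.)?(" + "|".join(escaped) + r")([/:]|$)"
    return 'if ($http_referer ~* "' + pattern + '") {\n    return 403;\n}\n'


# Hypothetical example domains, not taken from the real block list.
print(build_nginx_block(["spam-domain.example", "another-spammer.example"]))
```

Because the include sits inside the server block, the generated if block ends up in server context, where nginx allows it.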

How-to Google-Analytics filter

(Disclaimer: The screenshots are from a German backend. I’ll try to make extra screenshots from an English backend. Also, all option and field names are my best guess and might differ from the real option and field names.)
Keep in mind: In Google Analytics, filters exclude traffic before it gets recorded, which means that traffic once filtered out is lost forever. Therefore: Always keep the standard data view (“all website data”) without any filters! And use additional data views for filtering.

This is how to create a new data view:

In the Google Analytics main menu, choose the option “Manage”:
Google Analytics menu: Manage
Then, in the right-hand column “DATA VIEW”, choose the option “Create new data view”:
Google Analytics: create new data view
Next you have to name your new data view. Enter “Referrer Spam Filtered”, in line with the planned filtering (keep the preselected “website” option), and select the correct time zone (in this example: “Germany”):
Google Analytics: set up new data view
Next you make one change under “Settings of the data view”:
Google Analytics: data view settings
namely activating the option that Google should filter out hits from known bots and spiders.
Google Analytics: filter out bots
As already mentioned above, Google is very hesitant to add new bots and spiders to its blacklist, so you have no choice but to define some more filters yourself. So choose “Filter”:
Google Analytics: filters
and create your first filter right away:
Google Analytics: add filter
First you filter out all those “ghost visits” where the bot doesn’t “report” visiting a URL on your domain, which it mostly can’t, since the bot just “blindly” calls/executes your tracking code without knowing which domain it’s integrated on. So it’s best to name the filter accordingly: “Ghost Visits Filter”. Select “user defined” and set up the following:
Include: You want to track only those hits where the “hostname” reported by the browser contains your domain:
Google Analytics: Ghost Visits Filter settings
(Don’t leave out the \ in front of the . because Google interprets this entry as a regular expression, and within regular expressions a period . stands for any character. So always write dots/periods as \. which tells Google that a dot and nothing but a dot belongs here!)
If your site has been plagued by ghost visits within the last 7 days you can check immediately whether your filter will work in the future by clicking “check filter”:
Google Analytics: check filter
You’ll then get a result similar to this (just probably with a lot more lines):
Google Analytics: filter check result
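
The note above about escaping the dot in the hostname filter is easy to try out locally. The following small sketch uses Python’s re module as a stand-in for Google’s regular expression matching, which is an assumption, and the hostnames (including beispiel-domain.de from the Apache example) are purely illustrative.

```python
import re


def hostname_passes(pattern, hostname):
    """Mimic an include filter: keep the hit only if the pattern is found in the hostname."""
    return re.search(pattern, hostname, re.IGNORECASE) is not None


escaped = r"beispiel-domain\.de"   # dot written as \. so it matches only a literal dot
unescaped = "beispiel-domain.de"   # here the dot matches any single character

hostnames = [
    "www.beispiel-domain.de",      # a real visit to the site
    "beispiel-domainxde.example",  # look-alike that only an unescaped dot lets through
    "(not set)",                   # typical of a ghost visit that reports no hostname
]
for hostname in hostnames:
    print(hostname, hostname_passes(escaped, hostname), hostname_passes(unescaped, hostname))
```

Only the first hostname passes the escaped pattern, while the unescaped one also lets the look-alike through.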

If you blocked the bots on your server with one of the methods described above, you should have eliminated about 95% of all referrer spam hits by now.
The only ones still getting through are bots that combine the above methods by producing ghost visits but also sending the right hostname.
You can only get these under control by filtering them explicitly within Google Analytics. As of today you need 25 more filter rules.
A file with the current filter rules is available for download as well.
For every line in this file you have to create one filter:
This time you want to exclude hits, specifically all those where the “campaign source” is one of the domains mentioned in the filter rule:
Google Analytics: referrer filter
You now have to repeat this for every line in the file filters.txt. Unfortunately there is no shortcut for that, and if you have multiple domains you have to repeat it for each of them, as there is no export/import function for filters. 🙁

Now you just have to check https://www.mher.de/referrer-spam regularly for new spambot domains and, if there are any, update your configuration files accordingly…

and…

Good Luck!

P.S.: All these instructions and files come with absolutely no guarantee! They just describe what worked for me! Your setup might differ from mine, so this could actually break things on your side (not likely, but not impossible either), so apply everything at your own risk!