How to: Remove Google Analytics Spam

Do these look familiar to you?

  • get-free-social-traffic.com
  • floating-share-buttons.com
  • www.event-tracking.com
  • site8.free-floating-buttons.com
  • video--production.com
  • sexyali.com

I’m certain you’ve noticed such referrals in your Google Analytics profile:

[Image: analytics referral spam]

Today we will look at what this “spam” is, why it happens and how it works, and then clean up your Google Analytics profile by removing referral, event and search term spam.

What are we talking about here?

This is a spam tactic that plays on webmasters’ and marketers’ curiosity about which websites send them traffic.

If we were to simplify basic referrer spam, it would go something like this:

  1. I make a link from my website to your website
  2. I click through that link multiple times
  3. Your Google Analytics profile shows my website as a referral. You get curious about how people click through from it, so you visit it.
  4. ???
  5. Profit.

Why are they doing it?

To get traffic.

Most spammers just sell low-quality products aimed at webmasters. Some are affiliates who redirect you to popular websites like AliExpress.com (the Asian Amazon), hoping that you will buy something, immediately or later, so that they earn a commission.

How are they doing it?

Notice that no special authorization is needed to push data to your Analytics profile. All it takes is the profile’s “UA” Tracking ID, which looks like this: UA-124214-1

And it can be easily extracted from your website’s source.

Now, with the rise of apps, there was a need to move Google Analytics beyond websites. Universal Analytics was born.

One of its features is the Measurement Protocol, which:

“… allows developers to make HTTP requests to send raw user interaction data directly to Google Analytics servers”.

This is basically a robust Google Analytics API. With it you can track anything, including offline events, and tie it to a user’s website or app behavior. All tracking is done server-side. Great stuff.

But this had a side effect: you can automate the heck out of spamming and perform it in bulk.

This is how the process goes for the most advanced spammers:

  1. Make a Measurement Protocol request which mimics a website hit from a chosen referral
  2. Send it as an HTTP request from a server
  3. Repeat with the next victim’s “UA” Tracking ID
  4. ???
  5. Profit.

But the same manipulation works for any visitor data. So step 1 could just as well fake an event, a search term or even the user’s browser.

And keep in mind that you don’t even need anything special to come up with “UA” Tracking IDs. Google Analytics just uses sequential numbers.

So in theory, you could just hit all the existing profiles, even though most of them are abandoned. Because why not, we’re spammers anyway.

This is how fast you can send 100 event hits, all to different profiles:

[Image: analytics measurement protocol]

It’s 2 lines of code in a loop, using the Universal Analytics for Python library.

And imagine running this on a server 24/7, or on multiple servers. Programmed by someone who knows what they’re doing.

Since most spam is sent server-side, it has a weakness: most spammers don’t define which “hostname” sends the data.

The hostname of a visit shows which web property registered the hit. Most of your hits will arrive from your own domain’s hostname. You may also see some hits from Google Translate hosts and from your local development hosts, if you use any (127.0.0.1 or localhost).

Sample referral spam hits from undefined hostnames:

[Image: analytics spam]

So, by including Google Analytics data only from known hosts, we can easily eliminate most spam.

We can then add a few more exclusion filters to remove more advanced spammers who define clever hostnames, or who run actual robots that really visit your website.

Setting up Google Analytics views to filter out spam

First, make sure you or someone else didn’t try to combat spam using the “Referral Exclusion List” under “Property → Tracking Info → Referral Exclusion List”:

[Image: analytics referral exclusion]

This is wrong. You should only use this list when you want to drop referral information from a domain, for example when your payment processor redirects people back to your site after a purchase, or for your own “support.” sub-domain.

Next thing on your to-do list will be to create a new view and call it “Unfiltered”.

We will keep this one to fall back on if we suspect that the filters on our main view are wonky.

[Image: google analytics]

Give it the same settings as your main view (timezone, currency, Ecommerce enabled, etc.) and make sure “Exclude all hits from known bots and spiders” is unchecked in its settings.

Including only known hostnames

Now that we have a backup view, let’s add a filter which will only allow Analytics hits from known hosts.

Go to your main view, then “Audience → Technology → Network”, change the primary dimension to “Hostname”, and you’ll see something like this:

[Image: analytics hostnames]

Breaking it down:

  1. Your own hostname.
  2. (not set) for server side tracks that don’t define a hostname. Most spam in my case.
  3. Hits from people using Google Translate for translations.
  4. Hits from my “development” environment – when I run my website on my computer.
  5. Spammers that define hostnames. HULFINGTONPOST is my favorite.

Keep in mind that your report may contain more legitimate hostnames similar to Google Translate. If you see a substantial number of hits from a hostname that doesn’t look like spam, check it out. Perhaps it’s worth including in the filter.

Important: if you use server-side tracking for some events, some of the (not set) hits will be yours. I suggest you either add the hostname to your server-side tracking calls or add (not set) to the filter along with the other allowed hostnames.
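
To make the first option concrete, here is a minimal sketch of a server-side event hit that carries its hostname in the Measurement Protocol “dh” (document hostname) parameter. The Tracking ID, event names and hostname are placeholders I made up, and the payload is only built here, not sent. In real use you would POST it to the Measurement Protocol collection endpoint.

```python
import uuid
from urllib.parse import urlencode


def build_event_hit(tracking_id, client_id, category, action, hostname):
    """Return a URL-encoded Measurement Protocol event payload."""
    params = {
        "v": "1",            # protocol version
        "tid": tracking_id,  # your own property's Tracking ID
        "cid": client_id,    # anonymous client ID
        "t": "event",        # hit type
        "ec": category,      # event category
        "ea": action,        # event action
        "dh": hostname,      # document hostname; leave it out and the hit lands under (not set)
    }
    return urlencode(params)


payload = build_event_hit(
    tracking_id="UA-XXXXXXX-1",      # placeholder, not a real ID
    client_id=str(uuid.uuid4()),
    category="order",                # hypothetical event category
    action="refund_processed",       # hypothetical event action
    hostname="www.example.com",      # hypothetical hostname, use your real one
)
print(payload)
```

With the hostname set this way, your own server-side hits pass the same hostname filter as regular page hits.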

OK, now let’s set up the hostname filter. Go to “Admin”, select your main view and go to “Filters”. Click “+ NEW FILTER”, then configure it:

[Image: analytics hostname filter]

  1. Type: Custom
  2. “Include”, as we will only be including needed hostnames.
  3. Field: “Hostname”
  4. Filter pattern: regular expressions work here. Add all your domains, sub-domains and other valid hostnames.
  5. Verify & save your filter.
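
If you have more than a couple of hostnames, it can help to generate and test the pattern before pasting it in. The sketch below is one way to do it, assuming the filter treats these simple constructs (anchors, alternation, escaped dots) the same way Python’s re module does. The hostnames are hypothetical. I anchor the pattern with ^ and $ so that a made-up hostname like example.com.spammer.net does not slip through on a partial match.

```python
import re


def build_hostname_pattern(hostnames):
    """Join hostnames into one anchored regex, escaping literal dots."""
    escaped = [host.replace(".", r"\.") for host in hostnames]
    return "^(" + "|".join(escaped) + ")$"


# Hypothetical list of valid hostnames for one site.
valid_hostnames = ["example.com", "www.example.com", "shop.example.com"]
pattern = build_hostname_pattern(valid_hostnames)
print(pattern)

# Sanity-check the pattern locally before using it in the filter.
for host in ["www.example.com", "example.com.spammer.net", "(not set)"]:
    print(host, "->", "kept" if re.search(pattern, host) else "dropped")
```

The printed pattern is what goes into the “Filter pattern” field. Add your Google Translate or development hosts to the list if you want to keep them.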

By including only known hostnames in our reporting we will eliminate most spam, but not all. So let’s proceed.

Referral spam

Some spammers cleverly set the hostname to that of their target website.

[Image: analytics referral spam]

For these we will just exclude them as referrals, and add more as we discover new ones.

Keep in mind that filters don’t work retroactively, so look for spam referrals in periods after you’ve applied the hostname inclusion filter. Or go to Hostnames, pick your hostname, check its referrals and find the fishy ones.

And here’s the filter:

[Image: analytics referral filter]

  1. Type: Custom
  2. “Exclude”.
  3. Field: “Campaign Source”
  4. Filter pattern: add all spam referral domains that are left. Use “|” between domains, don’t use spaces.
  5. Verify & save your filter.
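
As the list of spam domains grows, building the pattern by hand gets error-prone. Here is a small sketch that joins domains with “|”, escapes the dots and checks the result against a few sample sources. The spam domains come from the list at the top of this post; the sample sources are made up, and I am again assuming the filter’s regular expressions behave like Python’s for these simple patterns.

```python
import re


def build_exclude_pattern(domains):
    """Join spam domains with "|" (no spaces), escaping literal dots."""
    return "|".join(domain.strip().replace(".", r"\.") for domain in domains)


spam_domains = [
    "get-free-social-traffic.com",
    "free-floating-buttons.com",
    "sexyali.com",
]
exclude_pattern = build_exclude_pattern(spam_domains)
print(exclude_pattern)

# Hypothetical sources, as they might appear in a referral report.
sources = ["google", "site8.free-floating-buttons.com", "sexyali.com", "newsletter"]
kept = [source for source in sources if not re.search(exclude_pattern, source)]
print(kept)
```

Because the pattern is not anchored, an entry such as free-floating-buttons.com also catches sub-domains like site8.free-floating-buttons.com.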

Search term (Keyword) spam

Some spam started appearing in organic search terms recently.

[Image: analytics keyword spam]

And here is a filter to remove those.

[Image: analytics search filter]

  1. Type: Custom
  2. “Exclude”.
  3. Field: “Campaign Term”
  4. Filter pattern: add spam keywords, separate using “|”.
  5. Verify & save your filter.

Event spam

Most spam events will be filtered out by including only known hosts, like this one:

[Image: analytics spam event]

But still, if you see event spam, here is a sample filter to remove it.

[Image: analytics events filter]

  1. Type: Custom
  2. “Exclude”.
  3. Field: “Event Action” (or Category)
  4. Filter pattern: add the text of the spam event.
  5. Verify & save your filter.

That’s it, enjoy your Google Analytics view without spam data.

Don’t forget to update your filters if you notice new spam referrals, events or search terms coming through.