Get Bunk Referrer Semalt OUT of Your GA Stats

I help several small & local businesses get their web tools up and running, and few things are more disappointing than when garbage like “semalt” shows up in the results.

Semalt claims to be an SEO and Analytics tool, but word around the web suggests they are, at best, a cause of referral spam.

Referral spam, in general, is when a robot (think the Googlebot, a site crawler robot that helps index your site in Google’s search results) is sent to check out your website, but pretends to be a person by using falsified referrer header data.


This is typically done by falsifying the Referrer (spelled “Referer” in the actual HTTP header) and User Agent parts of the header data.

The User Agent header tells the website and tools like Google Analytics what the visitor used to access the site.

Here’s a User Agent example from Chrome running on Windows 8.1 (that’s what “Windows NT 6.3” means):

Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36
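Because these headers are just text supplied by whoever is making the request, nothing stops a script from sending whatever it likes. Here’s a minimal Python sketch of that idea. The target URL and the referrer are made-up placeholders, and the request is only built, never sent:

```python
from urllib.request import Request

# The Chrome User Agent string quoted above.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36"
)

# Placeholder values, for illustration only.
TARGET_URL = "http://www.example.com/"
FAKE_REFERER = "http://spammy-site.example/"

# Build (but do not send) a request with hand-picked headers.
request = Request(
    TARGET_URL,
    headers={"User-Agent": CHROME_UA, "Referer": FAKE_REFERER},
)

# urllib normalizes header names, so they come back as
# "User-agent" and "Referer".
for name, value in request.header_items():
    print(f"{name}: {value}")
```

To the receiving site, a request with these headers claims to be Chrome arriving from the placeholder referrer, even though it is a script.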

Googlebot, your friendly neighborhood web crawler and search indexer, identifies itself with User Agents that look like this:

  • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
  • Googlebot/2.1 (+http://www.googlebot.com/bot.html)
  • Googlebot/2.1 (+http://www.google.com/bot.html)

(Source : http://www.useragentstring.com/pages/Googlebot/)
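A well-behaved bot is easy to spot precisely because it announces itself. As a rough illustration (the function name is my own, and this is a naive check, not how Google Analytics does it), here’s a Python sketch that flags User Agents claiming to be Googlebot:

```python
def looks_like_googlebot(user_agent: str) -> bool:
    """Return True if the User Agent string claims to be Googlebot.

    This only checks the claim; a User Agent is self-reported,
    so it says nothing about who actually sent the request.
    """
    return "googlebot" in user_agent.lower()


# User Agent strings quoted in this post.
samples = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.google.com/bot.html)",
    "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/40.0.2214.93 Safari/537.36",
]

for ua in samples:
    label = "bot" if looks_like_googlebot(ua) else "looks human"
    print(f"{label}: {ua}")
```

The catch is that a check like this only catches bots honest enough to say so. A bot that sends the Chrome string sails right through.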

You probably get a fair amount of referral spam without even realizing it! Google Analytics will only show/capture traffic data from visitors that can execute JavaScript and use cookies (those tiny files that hold bits of data a server can use to recognize you later). If odd bots are probing your site, even ones with fake human credentials, but can’t execute JavaScript or use cookies, they never show up in your reports.

The reason that Semalt appears in your Google Analytics? It pretends to be a human better than most bots.

Apologies to anyone with arachnophobia.
Totally legit. Source: http://imgur.com/gallery/UceXs6C

 

It’s up in your business, pretending to be a real visitor and possibly perpetuating further spam using your site. At minimum, it’s mucking up your data.

How to handle this?

You can deal with Semalt and bot traffic in one of three ways (or all three, honestly).

Option 1: Use GA’s autobot blocker

No, it’s not going to prevent the Transformers from finally showing up to give you the birthday you’ve wanted since you were five.

Check your settings!

Go to the View settings for your website’s GA account. Check the Bot Filtering box. Save settings.

This option uses the IAB/ABC International Spiders & Bots list to help clean and press your traffic data against known robots.

If you decide to use this feature, you should make sure to create a “Raw Data” View without this – or any other – filter in place. This will give you a point of comparison in the event your data looks really unusual after implementing a major filter change.

Because Semalt poses as a real browser and can execute JavaScript, this probably won’t block it. But it is a good way to prevent known bots from blowing up your stats.

Option 2: Create a filter

Creating Filters in Google Analytics sounds a little scary, but it’s not.

Step 1: Go to the Filters section in your Admin area under View and select +New Filter.
Step 2: Name your Filter. Something like “Semalt” will do just fine.
Step 3: Click on Custom
Step 4: Select Referral from the Filter Field list
Step 5: Type semalt\.com into the Filter Pattern box. (The \ is important because this field uses Regular Expressions!)
Step 6: SAVE!
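If you’re wondering why that backslash matters, here’s a small Python sketch. It assumes the Filter Pattern behaves like an ordinary partial-match regular expression, with Python’s re.search standing in for it, and the sample referrer URLs are made up:

```python
import re

# In a regular expression, "." means "any single character",
# while "\." means a literal dot.
escaped = re.compile(r"semalt\.com")
unescaped = re.compile(r"semalt.com")

# Made-up referrer values for illustration.
referrers = [
    "http://semalt.com/",
    "http://semaltXcom.example/",
    "http://www.example.com/",
]

for referrer in referrers:
    print(
        referrer,
        "| escaped:", bool(escaped.search(referrer)),
        "| unescaped:", bool(unescaped.search(referrer)),
    )
```

Both patterns catch the first URL, but only the unescaped one also catches the second, because the bare dot happily matches the “X”. For this particular spammer the difference is small, but escaping the dot keeps the filter from matching more than you meant.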

Note: If you’re using Filters you absolutely should be setting up a Raw Data profile! Making mistakes with Filters can throw your data off pretty badly, and having a Raw Data (aka no Filter) profile gives you a way to sanity check your data, and provides a backup in case of data lost to Filter mishaps.

This filter will not change your historical data, sorry. You’re stuck with your prior Semalt traffic in your reports. But from the time you make the Filter live onwards, Semalt as a Source/Referrer will no longer appear in your reports. This only cleans up your reports; it doesn’t keep Semalt from crawling your site.

Option 3: Knock knock! Who’s there? NOT YOU! (AKA the htaccess option)

So, you’ll need to actually be able to edit your site’s .htaccess file, and it helps to have a basic idea of how to use it (or at least not break it).

But the good news is, blocking them is as simple as adding the following to the bottom of your .htaccess:

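# Flag any request whose Referer header contains semalt.com (case-insensitive regex)
SetEnvIfNoCase Referer semalt\.com spammer=yes

# Let everyone in except flagged requests (Apache 2.2-style access directives)
Order allow,deny
Allow from all
Deny from env=spammer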

(Source: WordPress Codex example)
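If it helps to see the logic spelled out, here’s a rough Python equivalent of what those .htaccess lines do, written as a tiny WSGI middleware. It’s a sketch for understanding, not a replacement for the Apache rules, and the names block_spam_referers and hello_app are my own:

```python
import re

# Same idea as SetEnvIfNoCase: a case-insensitive match on the Referer header.
SPAM_REFERER = re.compile(r"semalt\.com", re.IGNORECASE)


def block_spam_referers(app):
    """Wrap a WSGI app so requests with a spammy Referer get a 403."""

    def middleware(environ, start_response):
        referer = environ.get("HTTP_REFERER", "")
        if SPAM_REFERER.search(referer):
            # Equivalent of "Deny from env=spammer".
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Equivalent of "Allow from all".
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    """A stand-in for your real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello, human!\n"]


application = block_spam_referers(hello_app)
```

Keep in mind this only turns away requests that actually send a Semalt referrer. A bot that changes its Referer header walks straight past it.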

If that’s not working – and you can check that Raw Data profile you set up to figure out if it is or not 😉 – or you’re getting a bunch of new garbage, then it’s time to hit more .htaccess resources. Your web host may be able to help you out, too.

So that’s it!

For another Google Analytics Filter set to get rid of other bots using the “Mozilla Compatible Agent” browser User Agent, check out this Filter from LunaMetrics.