Confusion about rel="nofollow" links, robots.txt files, and robots meta tags

Posted by Gordon Lanoie on April 19, 2007

It seems that some people are getting mixed signals about the difference between three things: the rel="nofollow" attribute/value pair on anchor links, Disallow rules in robots.txt files, and robots meta tags.

I’ll try to explain each one, with some examples, to help clear up the difference.

Meta Tags

Those webmasters who have been using a robots meta tag know that if you tell a compliant (considerate?) spider or robot to ‘nofollow’ it means they should not follow any links that you have on your page. The meta tag goes in the head of your web page and might look something like this:

<meta name="robots" content="nofollow" />

You can take it a step further and ask the spider to not even index your page at all:

<meta name="robots" content="noindex, nofollow" />

You can indicate that you would like to be indexed or have your links followed, or not, or any combination. For example, these are all valid:

<meta name="robots" content="index, follow" />

<meta name="robots" content="noindex, follow" />

<meta name="robots" content="index, nofollow" />

<meta name="robots" content="noindex, nofollow" />

This is done on a page-by-page basis. In other words, each Web page would have a meta tag in the head of the document that might look something like this:

<head>
	<title>Some page on the Web</title>
	<meta name="robots" content="noindex, nofollow" />
</head>

Note that you are indicating your wishes here, and that robot spiders may or may not listen to your request.
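
To make the idea concrete, here is a rough sketch in Python of how a compliant robot might read the robots meta tag before deciding what to do with a page. The names RobotsMetaParser and read_robots_meta are made up for this illustration, and the sketch assumes that a page with no robots meta tag may be both indexed and followed. Real spiders certainly do more than this.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also arrive here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name == "robots":
            content = attributes.get("content") or ""
            for token in content.split(","):
                token = token.strip().lower()
                if token:
                    self.directives.add(token)


def read_robots_meta(html):
    """Return (may_index, may_follow); both default to True (an assumption)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = """<head>
    <title>Some page on the Web</title>
    <meta name="robots" content="noindex, nofollow" />
</head>"""

print(read_robots_meta(page))  # (False, False)
```

The function only reports what the page asks for. Whether a robot acts on the answer is up to the robot.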

There are other attribute values you can use. See the links for more reading.

Robots.txt

You can control how search spiders and robots index your site (or parts of it) by using a plain text (not HTML) file called robots.txt in the root directory of your Web server. The file name is case sensitive and should be all lowercase.

This plain text file can define some simple guidelines for robots to use. For example, if you ask all robots (identified by a wildcard character of *) to not index your site at all (everything from the root of your server: /), your text file would look like this:

User-agent: *
Disallow: /

If you wanted all robots to index everything, you might try this:

User-agent: *
Allow: /

You could single out one robot and ask it to do something like this:

User-agent: Googlebot
Disallow: /admin/

You can have several different rules for different robots. Again, not all robots will follow your requests.
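
If you want to check how a set of rules would be read, Python's standard library includes a robots.txt parser. The sketch below combines two of the rule sets above into one file and asks which URLs each robot may fetch. The host www.example.com and the robot name SomeOtherBot are placeholders invented for the example, and build_parser is just a helper name chosen here.

```python
from urllib.robotparser import RobotFileParser

# Googlebot is kept out of /admin/ only; every other robot is kept out entirely.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /
"""


def build_parser(robots_txt):
    """Parse robots.txt text directly, without fetching it from a server."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The Googlebot rules apply to Googlebot; the * rules apply to everyone else.
print(parser.can_fetch("Googlebot", "http://www.example.com/index.html"))        # True
print(parser.can_fetch("Googlebot", "http://www.example.com/admin/login.html"))  # False
print(parser.can_fetch("SomeOtherBot", "http://www.example.com/index.html"))     # False
```

This only tells you what a well-behaved robot would conclude from the file. It does nothing to stop a robot that never reads robots.txt in the first place.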

Rel="nofollow"

Here is where some of the confusion starts. Some people think that when you have a link on a page to another page, and you use the rel="nofollow" attribute/value pair, that search engine spiders will not follow this link.

Considering the name of the value (nofollow), plus the behaviour of the robots meta tag with nofollow, this seems like a logical assumption. However, it is false. Here’s why…

Back in 2005, several large search engines agreed that comment spam (comments in blogs, forums, etc. with links to Web sites that existed only to drive traffic and were not really there as legitimate comments or links) was a serious problem. They came up with a plan to add an attribute to the (X)HTML anchor tag to help describe links that the site owner could not verify as being approved.

So, a normal link might look like this:

<a href="http://www.lanoie.com/index.html">Lanoie.com</a>

but if it was put there by a user in a comment block, the software could alter it to look like this:

<a href="http://www.lanoie.com/index.html" rel="nofollow">Lanoie.com</a>
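
How the software makes that change depends on the blog or forum package. As one possible approach, here is a minimal Python sketch that rewrites every anchor tag in a user-submitted comment so that its rel attribute includes nofollow. The names NofollowRewriter and add_nofollow are hypothetical, and the sketch assumes the comment is a small HTML fragment. It is not how any particular blog engine does it.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Rebuilds an HTML fragment, adding rel="nofollow" to every <a> tag.

    HTML comments and declarations are dropped, which is acceptable for
    this sketch because it is aimed at short user-submitted comments.
    """

    def __init__(self):
        # Keep entities such as &amp; as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            self.out.append(self.get_starttag_text())
            return
        attributes = dict(attrs)
        # rel holds a space-separated list of values, so keep existing ones.
        rel_tokens = (attributes.get("rel") or "").split()
        if "nofollow" not in [token.lower() for token in rel_tokens]:
            rel_tokens.append("nofollow")
        attributes["rel"] = " ".join(rel_tokens)
        rendered = "".join(
            ' {}="{}"'.format(name, escape(value or "", quote=True))
            for name, value in attributes.items()
        )
        self.out.append("<a{}>".format(rendered))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br /> pass through unchanged.
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def add_nofollow(comment_html):
    """Return comment_html with rel="nofollow" on all of its links."""
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.out)


print(add_nofollow('<a href="http://www.lanoie.com/index.html">Lanoie.com</a>'))
# <a href="http://www.lanoie.com/index.html" rel="nofollow">Lanoie.com</a>
```

A link that already carries other rel values keeps them, so rel="external" becomes rel="external nofollow" rather than being overwritten.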

As links are often counted as part of the ranking of Web sites by search engines, the more links that link spammers can have their scripts automatically put in comment blocks, the more popular their sites would become in the search engine result pages (SERPs). The idea is that if a search engine spider sees a nofollow link, it will not use it for ranking algorithms. This does not mean that the spider will not follow the link and index the destination page; it just means that the link won’t help with that page’s rank.
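
Seen from the spider's side, the theory might look something like the following Python sketch. It collects every link on a page and notes whether the link would count towards ranking under the rule just described. The names LinkCollector and classify_links and the address spam.example.com are invented for the example. No search engine publishes its actual logic, so treat this purely as an illustration of the idea.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records each link as (href, counts_for_ranking)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel is a space-separated list, so look for nofollow among its values.
        rel_tokens = (attributes.get("rel") or "").lower().split()
        self.links.append((href, "nofollow" not in rel_tokens))


def classify_links(html):
    """Return every link found, each flagged as counting for ranking or not."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


page = (
    '<a href="http://www.lanoie.com/index.html">Lanoie.com</a> '
    '<a href="http://spam.example.com/" rel="nofollow">Cheap stuff</a>'
)

for href, counts in classify_links(page):
    print(href, "counts for ranking" if counts else "does not count for ranking")
```

Both links are still discovered, so the spider remains free to visit either destination. Only the second one is flagged as not contributing to rank.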

So that’s the theory. What happens in real life? That depends on the players in the game.

Yahoo, Microsoft, and Google all initially agreed in 2005 to respect this attribute with their spiders. Ask and several other search sites seem to be aware of it, too. The trick is that they are not all doing the same thing with it.

Some spiders do not follow the link or index the destination page at all. Others seem to follow the link and index the page but not count it towards the rankings, while still others seem blissfully unaware that the attribute even exists and ignore it entirely.

The end result is that, with all three of these tools, you are only giving your wishes and you have no guarantee that they will be followed.

Personally, the comment spam was so bad on this blog that I had to disable comments entirely.

Gordon Lanoie (18 Posts)

With a background in Computer Engineering Technology from Red River Community College, Gordon has been active on the digital scene since the mid-1990s. He currently works with all Lanoie clients to determine their needs and to develop affordable, effective online strategies for their businesses and organizations. With a background in teaching internet technology in the late 1990s at Winnipeg Technical College (formerly South Winnipeg Technical Center) and soon after at UWinnipeg PACE (formerly the Division of Continuing Education), Gordon has a broad background with the needs of students as well as administration in the education industry.


