Monthly Archives: April 2007

Confusion about rel="nofollow" links, robots.txt files, and robots meta tags

It seems that some people are getting mixed signals about the difference between the rel="nofollow" attribute/value pair on anchor links, Disallow rules in robots.txt files, and the robots meta tag.

I’ll try to give an explanation with some examples to help clear up the differences.

Meta Tags

Those webmasters who have been using a robots meta tag know that if you tell a compliant (considerate?) spider or robot to ‘nofollow’ it means they should not follow any links that you have on your page. The meta tag goes in the head of your web page and might look something like this:

<meta name="robots" content="nofollow" />

You can take it a step further and ask the spider to not even index your page at all:

<meta name="robots" content="noindex, nofollow" />

You can indicate that you would like to be indexed or have your links followed, or not, or any combination. For example, these are all valid:

<meta name="robots" content="index, follow" />

<meta name="robots" content="noindex, follow" />

<meta name="robots" content="index, nofollow" />

<meta name="robots" content="noindex, nofollow" />

This is done on a page-by-page basis. In other words, each Web page would have a meta tag in the head of the document that might look something like this:

<head>
	<title>Some page on the Web</title>
	<meta name="robots" content="noindex, nofollow" />
</head>

Note that you are indicating your wishes here, and that robot spiders may or may not listen to your request.

There are other attribute values you can use. See the links for more reading.
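
To see the tag from the robot's side, here is a minimal sketch in Python (my choice of language for illustration) of how a compliant spider might read the robots meta tag before deciding what to do with a page. The class and function names are made up for this example, and it only looks at the noindex and nofollow values covered above.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the values found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <meta ... /> also end up here.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            content = attributes.get("content") or ""
            for value in content.split(","):
                value = value.strip().lower()
                if value:
                    self.directives.add(value)


def read_robots_meta(html):
    """Return (may_index, may_follow); both default to True."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    may_index = "noindex" not in parser.directives
    may_follow = "nofollow" not in parser.directives
    return may_index, may_follow


page = """<head>
    <title>Some page on the Web</title>
    <meta name="robots" content="noindex, follow" />
</head>"""

print(read_robots_meta(page))
```

A page with no robots meta tag at all comes back as (True, True) in this sketch, which matches the usual default of indexing the page and following its links.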

Robots.txt

You can control how search spiders and robots crawl your site (or parts of it) by using a plain ASCII text (not HTML) file called robots.txt (the name is case sensitive and must be lowercase) in the root directory of your Web server.

This plain text file can define some simple guidelines for robots to use. For example, if you ask all robots (identified by a wildcard character of *) to stay out of your site entirely (everything from the root of your server: /), your text file would look like this:

User-agent: *
Disallow: /

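If you wanted all robots to crawl everything, you might try this (note that Allow is an extension that was not part of the original robots.txt standard, so some older robots may ignore it; a Disallow: line with an empty value has the same effect and is more widely understood):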

User-agent: *
Allow: /

You could single out one robot and ask it to do something like this:

User-agent: Googlebot
Disallow: /admin/

You can have several different rules for different robots. Again, not all robots will follow your requests.
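
If you want to check how a well-behaved robot would read your rules, Python's standard library includes a robots.txt parser. The sketch below is only an illustration: the rules, the /private/ directory, the robot name SomeOtherBot, and the www.example.com host are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: one group for Googlebot, one for everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /admin/

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


rules = build_parser(ROBOTS_TXT)

# Ask whether a given robot may fetch a given URL.
print(rules.can_fetch("Googlebot", "http://www.example.com/admin/login.html"))
print(rules.can_fetch("Googlebot", "http://www.example.com/private/notes.html"))
print(rules.can_fetch("SomeOtherBot", "http://www.example.com/private/notes.html"))
print(rules.can_fetch("SomeOtherBot", "http://www.example.com/admin/login.html"))
```

One thing this shows is that a robot with its own group uses that group instead of the wildcard group, so in this made-up file Googlebot is still allowed into /private/. If you want a rule to apply to every robot, repeat it in each group.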

Rel="nofollow"

Here is where some of the confusion starts. Some people think that when you have a link on a page to another page, and you use the rel="nofollow" attribute/value pair, that search engine spiders will not follow this link.

Considering the name of the value (nofollow), plus the behaviour of the robots meta tag with nofollow, this seems like a logical assumption. However, it is false. Here’s why…

Back in 2005, several large search engines agreed that comment spam (comments in blogs, forums, etc. with links to Web sites that existed only to drive traffic and were not really there as legitimate comments or links) was a serious problem. They came up with a plan to add an attribute to the (X)HTML anchor tag to help describe links that the site owner could not verify as being approved.

So, a normal link might look like this:

<a href="http://www.lanoie.com/index.html">Lanoie.com</a>

but if it was put there by a user in a comment block, the software could alter it to look like this:

<a href="http://www.lanoie.com/index.html" rel="nofollow">Lanoie.com</a>

As links are often counted as part of the ranking of Web sites by search engines, the more links that link spammers can have their scripts automatically put in comment blocks, the more popular their sites would become in the search engine result pages (SERPs). The idea is that if a search engine spider sees a nofollow link, it will not count that link in its ranking algorithms. This does not mean that the spider will not follow the link and index the destination page; it just means that the link won’t help with that page’s rank.
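
As a rough idea of what blog or forum software might do with user-submitted comments, here is a small Python sketch that adds rel="nofollow" to every link in a snippet of HTML. It is not how any particular blogging package does it; the class and function names are invented for the example, and it simply drops HTML comments and doctype declarations, which should not appear in a comment block anyway.

```python
from html import escape
from html.parser import HTMLParser


class NofollowRewriter(HTMLParser):
    """Re-emits HTML, adding rel="nofollow" to every <a> tag."""

    def __init__(self):
        # Keep character references as written so they can be echoed back.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, ">"))

    def handle_startendtag(self, tag, attrs):
        self.parts.append(self._render(tag, attrs, " />"))

    def handle_endtag(self, tag):
        self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_entityref(self, name):
        self.parts.append(f"&{name};")

    def handle_charref(self, name):
        self.parts.append(f"&#{name};")

    def _render(self, tag, attrs, closing):
        attrs = list(attrs)
        if tag == "a":
            # Keep any existing rel values and make sure nofollow is present.
            rel_tokens = []
            for name, value in attrs:
                if name == "rel" and value:
                    rel_tokens.extend(value.split())
            if "nofollow" not in rel_tokens:
                rel_tokens.append("nofollow")
            attrs = [(n, v) for n, v in attrs if n != "rel"]
            attrs.append(("rel", " ".join(rel_tokens)))
        rendered = "".join(
            f" {name}" if value is None
            else f' {name}="{escape(value, quote=True)}"'
            for name, value in attrs
        )
        return f"<{tag}{rendered}{closing}"


def add_nofollow(comment_html):
    """Return the comment with rel="nofollow" on all of its links."""
    rewriter = NofollowRewriter()
    rewriter.feed(comment_html)
    rewriter.close()
    return "".join(rewriter.parts)


comment = 'Nice post! <a href="http://www.lanoie.com/index.html">Lanoie.com</a>'
print(add_nofollow(comment))
```

Links that the site owner writes would skip this step, so only the unverified, user-submitted links carry the attribute.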

So that’s the theory. What happens in real life? That depends on the players in the game.

Yahoo, Microsoft, and Google all initially agreed in 2005 to respect this attribute with their spiders. Ask and several other search sites seem to be aware of it, too. The trick is that they are not all doing the same thing with it.

Some spiders do not follow the link or index the destination page at all. Other spiders seem to follow the link and index the page, but not count it towards the rankings, while others seem blissfully unaware that it even exists and ignore the attribute entirely.

The end result is that, with all three of these tools, you are only giving your wishes and you have no guarantee that they will be followed.

Personally, the comment spam was so bad on this blog that I had to disable comments entirely.

Planning Your Site for Users and Search Engines

Choose a subject for your site

This might seem obvious to some but an effective web site is focused on one subject. If it’s your business web site, then it’s about your business, your products, your services, and what you can do for your customers. That’s the theme of your site and you should stick to it.

A focused site is easier to develop and maintain. It offers a clear signal to the user about what you are trying to do and offers a similar signal to search engines. Search engines consider a tightly focused site on one topic to be more valuable than a random smattering of topics, all other things being equal.

Focus on one topic per site

Sites that send mixed signals as to their purpose lack focus, efficiency, and clarity. Everything you do should be focused on one topic area. If you are running a pet store, your web site should be all about pet information, products and services. Real estate, online gambling, and your favourite music would be better served on separate web sites, both from a user’s point of view and for search engine ranking.

If a search engine sees that each and every page on a web site is about pets in one form or another, then the site is focused on pets and, therefore, might have some authority (combined with other factors) on the topic. If different pages follow wildly varied topics, then the site isn’t all about one thing and, therefore, is not an authority on any specific topic.

Which brings up a good point: if you keep your users in mind and design the site to give them the best overall experience you can, you will already be well ahead of others in making your site search engine friendly, as they have similar needs and goals.

One subject per page

Just like choosing an overall theme for your site to keep it focused on one purpose, any given page within the site should also be focused on one specific subject where everything is in sync.

Because people type in keywords to search for topics, you need to carefully place those same words and phrases in the pages of your site.

Let us say that you have a web site about selling pet products. You might have several web pages, each with a specific topic. Let us try to imagine what the titles of some sections and pages might look like:

  • About Us
  • Canned Cat Food
  • Canned Dog Food
  • Cat Food
  • Cat Toys
  • Contact Us
  • Dog Food
  • Dog Toys
  • Dry Cat Food
  • Dry Dog Food
  • History
  • Home
  • Location
  • Pet Food
  • Pet Toys
  • etc.

Each separate web page should be about one specific thing, and its title, headings, and keywords should match, to give the user and the search engines a strong impression of exactly what the purpose of each page is.
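
One rough way to check that a page's title and main heading agree is to compare the words they share. The Python sketch below is only an illustration under my own assumptions: the sample page and the "Example Pet Store" name are made up, the function names are invented, and it compares only the title and the first-level heading.

```python
from html.parser import HTMLParser


class TitleHeadingParser(HTMLParser):
    """Collects the text inside the <title> and <h1> elements."""

    def __init__(self):
        super().__init__()
        self._current = None
        self.text = {"title": "", "h1": ""}

    def handle_starttag(self, tag, attrs):
        if tag in self.text:
            self._current = tag

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.text[self._current] += data


def shared_words(html):
    """Return the lowercase words that appear in both the title and the h1."""
    parser = TitleHeadingParser()
    parser.feed(html)
    parser.close()
    title_words = set(parser.text["title"].lower().split())
    heading_words = set(parser.text["h1"].lower().split())
    return title_words & heading_words


# A made-up page from the pet products example.
page = (
    "<html><head><title>Dry Cat Food | Example Pet Store</title></head>"
    "<body><h1>Dry Cat Food</h1></body></html>"
)

print(sorted(shared_words(page)))
```

An empty result would suggest that the title and the heading are describing different things, which is worth a second look.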

Storyboard your site

You need to brainstorm exactly what pages your site is going to have and how you want to organize them. You can organize them with fancy software tools, common office suites, or even on a paper napkin over coffee.

Some people visually sketch out the logic like an organizational chart. That is sometimes called storyboarding or flow charting the site.

Other people might list pages in groups related by topic. Do whatever works best for you, but do not skip this stage, as a well-organized site helps your visitors focus and find what they need quickly and easily.
This will also help you identify any orphan pages that do not fit the overall theme, as well as pages that might need to be merged or broken apart for better usability.

When you are organizing your pages, avoid forcing your users to click 20 levels deep to get to a page. Some search engines do shallow crawls (only a few levels deep) when your site is young and only do deep crawls after the site is more mature. Try to keep all content three or four clicks from your home page. This is best for users and certainly can help with spiders.
In the end, this step is a prerequisite for creating your web site navigation.
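
Once you have a list of pages and the links between them, you can check the click depth mechanically. The Python sketch below uses a breadth-first search from the home page. The link structure is made up from the pet store example (including a hypothetical "Old Promotion" page), the function names are mine, and it reports pages that no link reaches, which is a narrower, link-based idea of an orphan page than the thematic one above.

```python
from collections import deque

# Hypothetical site structure: page title -> pages it links to.
SITE_LINKS = {
    "Home": ["Pet Food", "Pet Toys", "About Us", "Contact Us"],
    "Pet Food": ["Cat Food", "Dog Food"],
    "Cat Food": ["Canned Cat Food", "Dry Cat Food"],
    "Dog Food": ["Canned Dog Food", "Dry Dog Food"],
    "Pet Toys": ["Cat Toys", "Dog Toys"],
    "About Us": ["History", "Location"],
    "Old Promotion": [],  # nothing links to this page
}


def click_depths(links, home="Home"):
    """Return the fewest clicks needed to reach each page from the home page."""
    depths = {home: 0}
    queue = deque([home])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


def audit(links, home="Home", max_clicks=4):
    """Return (pages deeper than max_clicks, pages no link path reaches)."""
    depths = click_depths(links, home)
    all_pages = set(links) | {t for targets in links.values() for t in targets}
    too_deep = sorted(p for p, d in depths.items() if d > max_clicks)
    unreachable = sorted(all_pages - set(depths))
    return too_deep, unreachable


too_deep, unreachable = audit(SITE_LINKS)
print("Too deep:", too_deep)
print("Unreachable:", unreachable)
```

In this made-up structure nothing is more than three clicks from the home page, and the only page flagged is the one that nothing links to.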

Creating a text site map of your site benefits both users and spiders. This is a page that lists all the major (and, if the site isn’t too large, minor) pages in one spot. It should show the organization of your site (information architecture) so that your users can easily focus on what they want by scanning the page with their eyes. This helps users to quickly visualize the content of your site without having to use a search form to find pages within your site.

Text site maps and search forms

Spiders cannot use forms to search your site, so your text site map can give them easy access to all areas for indexing. Sites that use content that is dynamically generated from a database (Content Management Systems, for example) may not be fully accessible to the search engines, as the spiders cannot themselves enter a term in a search box and hit the ‘Search’ button to search your site. Much of your content would be ‘dark’ (hidden or unreachable).

By having a site map and creating static text links deeper into the content, you can get much more of your site indexed to attract traffic. Site maps on small to medium sites help spiders index more thoroughly by pointing to all major areas of the site. Medium to larger sites might try using Google Sitemaps to ensure that the spiders crawl as much of your site as possible.
Make your site maps for humans first but do not forget about the search spiders.
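
If your pages are already organized in a hierarchy, a text site map can be generated rather than written by hand. Here is a minimal Python sketch that turns a nested structure into an HTML list of plain text links. The page titles come from the pet store example, while the URLs and function names are made up for illustration.

```python
from html import escape

# Hypothetical hierarchy: (title, URL, list of child pages).
SITE_TREE = ("Home", "/", [
    ("Pet Food", "/pet-food/", [
        ("Cat Food", "/pet-food/cat/", []),
        ("Dog Food", "/pet-food/dog/", []),
    ]),
    ("Pet Toys", "/pet-toys/", []),
    ("Contact Us", "/contact/", []),
])


def render_site_map(node, indent=0):
    """Return the lines of an <li> for this page and, recursively, its children."""
    title, url, children = node
    pad = "  " * indent
    link = f'<a href="{escape(url, quote=True)}">{escape(title)}</a>'
    if not children:
        return [f"{pad}<li>{link}</li>"]
    lines = [f"{pad}<li>{link}", f"{pad}  <ul>"]
    for child in children:
        lines.extend(render_site_map(child, indent + 2))
    lines.append(f"{pad}  </ul>")
    lines.append(f"{pad}</li>")
    return lines


def site_map_html(root):
    """Wrap the whole tree in a single outer list."""
    return "\n".join(["<ul>"] + render_site_map(root, 1) + ["</ul>"])


print(site_map_html(SITE_TREE))
```

Because every entry is an ordinary static link, the same page serves a visitor scanning for a section and a spider looking for a way into the deeper content.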