meta name="robots" and robots.txt

In conversations with bloggers and other webmasters, I often find that the concepts behind the robots meta tag (meta name="robots" …) and the robots.txt file are misunderstood, and not infrequently they are used in a way that hurts the website’s rankings.

Index, noindex, follow, nofollow and combinations

The robots meta tag lets you give search engines two “recommendations”, which the major search engines (Google, Bing, Yandex) usually follow:

  • index or noindex
  • follow or nofollow
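Since the two recommendations are independent of each other, there are four possible combinations. The following Python sketch prints the tag for each of them; the helper robots_meta_tag is hypothetical and only meant as an illustration. Since index and follow are the defaults, the “index, follow” variant can simply be left out of the page.

```python
def robots_meta_tag(index: bool = True, follow: bool = True) -> str:
    """Build a robots meta tag from the two independent recommendations."""
    content = ", ".join([
        "index" if index else "noindex",
        "follow" if follow else "nofollow",
    ])
    return f'<meta name="robots" content="{content}">'


# Print all four combinations of index/noindex and follow/nofollow
for index in (True, False):
    for follow in (True, False):
        print(robots_meta_tag(index, follow))
```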

index/noindex

With index, or rather noindex, you “recommend” that search engines index or not index the page. The default is “index”: without an explicit noindex, search engines will index the page.
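Here is a minimal sketch of how a crawler might apply this default, using Python’s standard html.parser module. The names RobotsMetaParser and is_indexable are hypothetical, and the sketch only looks at the generic name="robots" tag, not at crawler-specific variants:

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of all <meta name="robots"> tags on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            # content is a comma-separated list, e.g. "noindex, follow"
            for directive in (attributes.get("content") or "").split(","):
                self.directives.add(directive.strip().lower())


def is_indexable(html: str) -> bool:
    """Index is the default: only an explicit noindex keeps the page out."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return "noindex" not in parser.directives
```

A page without any robots meta tag yields an empty directive set and is therefore treated as indexable, just as the default implies.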

On which pages does it make sense to use noindex?

Pages that offer users no value as the first page they see, or pages you simply don’t want to appear in search results, either because they could outrank other, more important pages and thereby “steal” their ranking, or because you don’t want your website to be found through searches for the content of these pages.

Examples:

  • Search results pages (especially pages 2 to n if you use pagination)
  • Imprint, privacy policy, …
  • Login pages

follow/nofollow

With follow or nofollow in the robots meta tag, you recommend that search engines follow or not follow all links on the page, that is, whether to add the linked URLs to their crawl queue and whether to count these links when computing the “link power” of the linked pages. Again, follow is the default and doesn’t need to be set for search engines to “follow” the links.
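As a rough illustration of the page-level directive, this Python sketch collects the links of a page and hands none of them to the crawl queue when the robots meta tag contains nofollow. Real crawlers are far more involved; PageLinkParser and links_to_queue are hypothetical names made up for this example.

```python
from html.parser import HTMLParser


class PageLinkParser(HTMLParser):
    """Collects <a href> targets and the page's robots meta directives."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.robots = set()

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("href"):
            self.links.append(attributes["href"])
        elif tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            for directive in (attributes.get("content") or "").split(","):
                self.robots.add(directive.strip().lower())


def links_to_queue(html: str) -> list[str]:
    """Follow is the default; a page-level nofollow drops every link on the page."""
    parser = PageLinkParser()
    parser.feed(html)
    parser.close()
    if "nofollow" in parser.robots:
        return []
    return parser.links
```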

On which pages does it make sense to set the meta tag robots to nofollow?

None!

But there is another way to use nofollow: at the link level.
Here you mark a single link as “nofollow”, and the effects described above then apply to that one link only.
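At link level, the directive goes into the link’s rel attribute, e.g. <a href="https://example.com/" rel="nofollow">. A minimal Python sketch (hypothetical names, standard html.parser) that separates followed from nofollowed links on a page; since rel holds space-separated tokens, it also recognizes values such as rel="nofollow noopener":

```python
from html.parser import HTMLParser


class FollowableLinkParser(HTMLParser):
    """Sorts <a href> targets into followed and nofollowed links."""

    def __init__(self):
        super().__init__()
        self.followed = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        if not href:
            return
        # rel holds space-separated tokens, e.g. rel="nofollow noopener"
        rel_tokens = (attributes.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.followed.append(href)


def split_links(html: str) -> tuple[list[str], list[str]]:
    """Return (followed, nofollowed) link targets of a page."""
    parser = FollowableLinkParser()
    parser.feed(html)
    parser.close()
    return parser.followed, parser.skipped
```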

For which links does it make sense to use nofollow?

First of all, Google states that all paid links must be marked as nofollow. Paid links include not only links for which you explicitly received money, but also all kinds of advertising banners, advertorials with links (to the customer), and links for which you received any kind of “quid pro quo”, whether tangible or intangible.

Furthermore, you can mark a link as nofollow if you don’t want it to count as an “SEO recommendation” for the receiving website or page, e.g. a link to content from which you explicitly want to dissociate yourself.

But you should never, ever put nofollow on links pointing to your own internal pages, even if you have marked those pages with noindex so they won’t show up in search results. These pages are still crawled (otherwise the search engine robots couldn’t see the noindex meta tag), so they can pass on the link power they receive through their own links to other “index” pages.

robots.txt

What can you do with the robots.txt?

The robots.txt file is a special text file that lets you forbid search engines from crawling certain pages.
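Python’s standard library ships a robots.txt parser, urllib.robotparser, which is handy for checking what a given set of rules blocks. The robots.txt content and URLs below are hypothetical; parse() takes the file’s lines directly, so no network access is needed. Note that its matching is simpler than Googlebot’s (plain prefix matching, no * wildcards inside paths).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all crawlers out of /search/ and /login
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
Disallow: /login
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for url in ["https://example.com/blog/post",
            "https://example.com/search/?q=robots",
            "https://example.com/login"]:
    # can_fetch() answers: may this user agent crawl this URL?
    print(url, parser.can_fetch("Googlebot", url))
```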

It is also possible to explicitly allow the crawling of pages that would otherwise be blocked by another, more general rule. At the time of writing, these “Allow” directives were honored only by Ask.com, Googlebot, Yahoo! Slurp and msnbot (Bing).
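A sketch of such an override with urllib.robotparser, again with hypothetical paths. Python’s parser applies the first rule that matches, whereas Googlebot uses the most specific (longest) matching rule, so the Allow line is placed first; with that ordering both interpretations give the same result:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block /private/ but re-allow one page inside it.
# The more specific Allow line comes first because Python's parser
# stops at the first matching rule.
ROBOTS_TXT = """\
User-agent: *
Allow: /private/press-kit.html
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("Googlebot", "https://example.com/private/press-kit.html"))
print(parser.can_fetch("Googlebot", "https://example.com/private/notes.html"))
```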

Besides that, you can also point crawlers to a URL where they can find your site’s XML sitemap. Again, this is an extension of the standard that, at the time of writing, only Googlebot, Yahoo! Slurp, msnbot and Ask.com understood.
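The Sitemap line is independent of the User-agent groups. urllib.robotparser exposes it through site_maps() (available since Python 3.8), which returns the listed URLs or None if there are none; the URLs here are hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with one crawl rule and a sitemap reference
ROBOTS_TXT = """\
User-agent: *
Disallow: /search/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# List of sitemap URLs declared in the file
print(parser.site_maps())
```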

Sources:

http://www.robotstxt.org/robotstxt.html
https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt
https://en.wikipedia.org/wiki/Robots_Exclusion_Standard

Update 1-1-2019:

Google Search Console itself reports pages that were blocked by robots.txt but still ended up in the index, flagging them as a “website coverage problem”:

indexed though blocked by robots.txt
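The reason is the order of operations: robots.txt is consulted before a page is fetched, so a page blocked there is never downloaded and its noindex tag is never read, while the URL itself can still be picked up from links pointing to it. A minimal simulation, where the robots.txt, the page contents and the sees_noindex helper are all hypothetical:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /members/ is blocked for all crawlers
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

# Hypothetical HTML the server would return; both pages carry a noindex tag
PAGES = {
    "https://example.com/members/": '<meta name="robots" content="noindex">',
    "https://example.com/login": '<meta name="robots" content="noindex">',
}

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def sees_noindex(url: str, user_agent: str = "Googlebot") -> bool:
    """True only if the crawler may fetch the page and finds noindex in it."""
    if not parser.can_fetch(user_agent, url):
        # Blocked by robots.txt: the HTML is never downloaded, so the
        # noindex tag is never read. The URL can still be indexed
        # if other pages link to it.
        return False
    # Simplified substring check; a real crawler would parse the HTML
    return "noindex" in PAGES.get(url, "")


for url in PAGES:
    print(url, sees_noindex(url))
```

Only the crawlable /login page gets its noindex read; the blocked /members/ page never reveals it.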

 

This post is also available in German.
