robots.txt use with vBulletin


esllou
Thu 3rd Mar '05, 7:18pm
I want to use robots.txt to keep search engine spiders out of my whole forum...except for the archive.

Is this doable...and has anyone done it? I don't even need to allow the spiders onto the main index.php forum homepage, just have links into the archive from my main site.

I want to do this to prevent some of the duplicate content issues that vB currently has, with multiple URLs leading to the same pages (well-documented problems).

So, the question is: is this a quick and easy fix, are there any potential problems (I would allow only the Mediapartners bot in, for AdSense reasons), and what would I need to put in my robots.txt?

Many thanks in advance. Sorry if I should have put this in a better forum, but it is quite a general issue.

Jake Bunce
Thu 3rd Mar '05, 9:54pm
Here is a guide I found in a Google search:

http://www.searchengineworld.com/robots/robots_tutorial.htm

esllou
Thu 3rd Mar '05, 10:24pm
thanks for that Jake.

I also wanted to know if it was a viable policy....keeping spiders out of the main forums and restricting them only to the archives.

esllou
Sat 5th Mar '05, 7:15am
Don't you hate people who bump their own threads!

Sorry! Can no one tell me anything about this issue?

splooge
Sat 5th Mar '05, 1:43pm
Yes, it's a viable policy, just note that not all spiders will honor the robots.txt file.

esllou
Sat 5th Mar '05, 1:54pm
yes, I understand that. I already have an .htaccess banning about 200 bad bots and scrapers.

I was talking mainly about the Big Three: Google, MSN and Yahoo.

They will obey the robots.txt more or less, so I could control them. Will it actually achieve what I want? Only my archive pages will be in their indexes, after all. No duplicate content, no ? parameters at all.

Would I just list out the "banned" pages in my robots.txt?

User-agent: *
Disallow: /forum/showtopic.php
Disallow: /forum/search.php
Disallow: /forum/showthread.php

and so on? Would it be that simple?

splooge
Sat 5th Mar '05, 2:08pm
Yes, it's that simple. As far as I know, the big search engines do honor the robots.txt file, though someone said they cache it, so don't expect immediate results.
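
For anyone who wants to check a file like this before uploading it, here is a minimal sketch using Python's standard urllib.robotparser. Everything specific in it is an assumption for illustration: the /forum/ prefix, the list of blocked scripts, the archive URL shape, and the helper name build_parser. It also only shows how Python's parser reads the rules (prefix matching, first matching rule wins); real crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. The /forum/ prefix and the script list are
# assumptions; adjust them to the real install. An empty "Disallow:" line
# means "nothing is disallowed" for that user agent.
ROBOTS_TXT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /forum/index.php
Disallow: /forum/search.php
Disallow: /forum/showthread.php
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    # Sample (user agent, path) pairs; the archive path shape is assumed.
    checks = [
        ("Googlebot", "/forum/showthread.php?t=1"),
        ("Googlebot", "/forum/archive/index.php/t-1.html"),
        ("Mediapartners-Google", "/forum/showthread.php?t=1"),
    ]
    for agent, path in checks:
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:22} {path:40} {verdict}")
```

Note that every path starts with a slash and the rules sit under a User-agent line. Because matching is by prefix, a rule for /forum/showthread.php also covers the same script with any ? parameters after it.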

esllou
Sun 6th Mar '05, 7:59am
Is there any danger in following this policy? Is there something I am overlooking?

Could it hurt my results in the long term, or does it seem a sensible approach to the "not totally SEFriendly" problems vB has?

Any feedback appreciated.