Prevent Robots from using bandwidth
Scott MacVicar
Fri 10th May '02, 12:30pm
Here is a nice, simple way to stop the majority of robots from spidering files that you don't want them to, or that they shouldn't need to.
Place the following code in robots.txt and upload it to your domain root, so that when you go to http://forums.site.com/robots.txt you get this file:
User-agent: *
Disallow: /attachment.php
Disallow: /avatar.php
Disallow: /editpost.php
Disallow: /member.php
Disallow: /member2.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /private2.php
Disallow: /report.php
Disallow: /search.php
Disallow: /sendtofriend.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /admin/
Disallow: /images/
Disallow: /mod/
This will stop them from trying to access files that won't have anything interesting to spider. It also stops them from fetching images, which should save bandwidth when you have lots of spiders on your forums.
Note: The majority of spiders check for these rules, but not all do.
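One way to sanity-check a rule list before uploading it is to run it through a parser. The sketch below uses Python's standard urllib.robotparser; the host name is the placeholder from the post above and the bot name ExampleBot is made up. In this parser a Disallow value is compared as a prefix of the URL path, and that path always begins with "/", so a value needs its leading slash to match anything. Other crawlers may be more lenient; this only shows how one parser reads the file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, url, agent="ExampleBot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Abbreviated rule list; every value starts with "/" so it can match a URL path.
RULES = """\
User-agent: *
Disallow: /search.php
Disallow: /member.php
Disallow: /images/
"""

# The same kind of rule written without the leading slash.
RULES_NO_SLASH = """\
User-agent: *
Disallow: search.php
"""

BASE = "http://forums.site.com"

print(is_allowed(RULES, BASE + "/search.php"))           # listed, so blocked
print(is_allowed(RULES, BASE + "/images/logo.gif"))      # blocked by the /images/ prefix
print(is_allowed(RULES, BASE + "/showthread.php"))       # not listed, so allowed
print(is_allowed(RULES_NO_SLASH, BASE + "/search.php"))  # rule never matches in this parser
```

Anything that is not listed stays open to spiders, which is what keeps the threads themselves in the search engines.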
hypedave
Fri 10th May '02, 2:30pm
wow this is cool, im gonna try this out
Scott MacVicar
Fri 10th May '02, 3:01pm
If you want to check, look at your error logs; you'll probably see a few 404 errors for /robots.txt. Those are spiders like Google looking for the file to see whether you have any rules they have to follow, like the ones posted.
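If you would rather not read the logs by eye, a few lines of Python can pull out the robots.txt requests. This is only a rough sketch: it assumes an access log in the Apache "combined" format rather than the error log mentioned above (error log formats vary a lot), and the sample lines are made up for illustration.

```python
import re

# Matches a request for /robots.txt in an Apache "combined" style log line and
# captures the status code and, when present, the user agent.
ROBOTS_REQUEST = re.compile(
    r'"(?:GET|HEAD) /robots\.txt(?:\?[^ "]*)? [^"]*" (\d{3}) \S+'
    r'(?: "[^"]*" "([^"]*)")?'
)


def robots_txt_requests(lines):
    """Return (status, user_agent) for every /robots.txt request in `lines`."""
    hits = []
    for line in lines:
        match = ROBOTS_REQUEST.search(line)
        if match:
            hits.append((int(match.group(1)), match.group(2) or "unknown"))
    return hits


# Made-up sample lines, not taken from a real log.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/May/2002:15:01:00 +0000] "GET /robots.txt HTTP/1.0" '
    '404 209 "-" "Googlebot/2.1 (+http://www.googlebot.com/bot.html)"',
    '192.0.2.44 - - [10/May/2002:15:02:10 +0000] "GET /index.php HTTP/1.1" '
    '200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]

for status, agent in robots_txt_requests(SAMPLE_LOG):
    print(status, agent)
```

A 404 next to a spider's user agent means it asked for the file and found none; once robots.txt is uploaded, the same requests should start returning 200.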
hypedave
Fri 10th May '02, 3:40pm
cool, I just modified this to fit with my vbp directory
TObject
Fri 10th May '02, 6:22pm
For folks whose board is in a subfolder: make sure you do not forget to put that folder name before the php files in the list. The robots.txt has to be in the "/" folder of the web site.
For example, the http://www.vbulletin.com/robots.txt file would look something like this:
User-agent: *
Disallow: /forum/attachment.php
Disallow: /forum/avatar.php
Disallow: /forum/editpost.php
Disallow: /forum/member.php
Disallow: /forum/member2.php
Disallow: /forum/misc.php
Disallow: /forum/moderator.php
Disallow: /forum/newreply.php
Disallow: /forum/newthread.php
Disallow: /forum/online.php
Disallow: /forum/poll.php
Disallow: /forum/postings.php
Disallow: /forum/printthread.php
Disallow: /forum/private.php
Disallow: /forum/private2.php
Disallow: /forum/report.php
Disallow: /forum/search.php
Disallow: /forum/sendtofriend.php
Disallow: /forum/threadrate.php
Disallow: /forum/usercp.php
Disallow: /forum/admin/
Disallow: /forum/images/
Disallow: /forum/mod/
The only thing that worries me is that a hacker could read the robots.txt file and know exactly what php files you have where. But since the structure of vBulletin is not exactly secret anyway, it is probably not that big of a deal…
For more info on robots.txt see this:
http://www.robotstxt.org/wc/exclusion-admin.html
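Since the only thing that changes between boards is the folder name, the list can be generated instead of edited by hand. A minimal sketch in Python; the function name build_robots_txt and its parameters are invented for this example, and the file and directory names are a short sample from the list above.

```python
def build_robots_txt(folder, scripts, directories):
    """Build robots.txt text with every path prefixed by the board's folder.

    `folder` is the board's subfolder ("forum", "/forums/", or "" for a
    board that lives in the site root).
    """
    folder = folder.strip("/")
    prefix = "/" + folder + "/" if folder else "/"
    lines = ["User-agent: *"]
    lines += [f"Disallow: {prefix}{name}" for name in scripts]
    # Directories keep a trailing slash so only their contents are matched.
    lines += [f"Disallow: {prefix}{name.strip('/')}/" for name in directories]
    return "\n".join(lines) + "\n"


SCRIPTS = ["attachment.php", "member.php", "search.php"]
DIRECTORIES = ["admin", "images", "mod"]

print(build_robots_txt("forum", SCRIPTS, DIRECTORIES))
```

The output still has to be saved as robots.txt in the "/" folder of the site, not inside the board's own folder.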
Millward
Sat 11th May '02, 3:22pm
ok, i've done all that, just one question. Ummm, what's a robot?
eva2000
Sat 11th May '02, 3:46pm
my current robots.txt file for my forums has only
User-Agent: Googlebot-Image
Disallow: /
:)
TObject
Sat 11th May '02, 4:48pm
Originally posted by Millward
ok, i've done all that, just one question. Ummm, what's a robot?
Have you ever seen Futurama? Robots like to steal things, like bandwidth, and images...
Scott MacVicar
Sat 11th May '02, 6:32pm
but george, that stops them spidering your forums, and most people like to have their forums spidered. Well, I think it's useful; at least you appear in search engines.
eva2000
Sat 11th May '02, 6:39pm
Originally posted by PPN
but george, that stops them spidering your forums, and most people like to have their forums spidered. Well, I think it's useful; at least you appear in search engines.
look again, it only prevents Google from grabbing my images for indexing: http://www.google.com/remove.html#images
Scott MacVicar
Sat 11th May '02, 10:42pm
oh hehe
I never noticed what the useragent was.
Millward
Sun 12th May '02, 8:16am
lol, ah I see why it's called robots now, sorry. I still don't get where they come from... I'm thick, aren't I?
TObject
Sun 12th May '02, 1:54pm
They usually come from all kinds of search engines: Altavista, Google, etc...
Millward
Sun 12th May '02, 6:13pm
is it sort of like hotlinking where some one puts an <img> tag in their page that calls for an image on a different server?
Scott MacVicar
Sun 12th May '02, 6:39pm
Robots spider your web pages and gather the documents. They index these, so when you go to a search engine and type in a word, it matches your site if the spider found that word there.
Spiders follow links on web pages to other pages, but they also follow images, which is sometimes a bad thing because it uses your bandwidth, especially if you have a big board that is getting spidered once a day by many search engines.
JamieFry
Thu 24th Oct '02, 7:36pm
So is the long list of things in the first post preferable to this:
User-agent: *
Disallow: /
I got this from Fast Crawler. Thanks.
TObject
Fri 25th Oct '02, 2:27pm
Absolutely not! There is an even easier way: just close down the web site altogether. It helps with the bandwidth usage tremendously. :)
JamieFry
Fri 25th Oct '02, 2:38pm
Is there something in my post that elicited a snotty response?
MUG
Fri 25th Oct '02, 4:37pm
That would prevent any robot from visiting the site, including search engines. If you don't want to be listed in anything that would be the way to go.
JamieFry
Fri 25th Oct '02, 4:41pm
Thank you, MUG. My forums are only open to registered members so the robots were only getting "access denied" pages anyway.
TObject
Fri 25th Oct '02, 7:20pm
Originally posted by JamieFry
Is there something in my post that elicited a snotty response?
What snotty response?
Merjawy
Sat 26th Oct '02, 11:57am
What about people searching the forum itself? Sometimes I get 30 or more online guests whose IPs are the same except for the last digit, so I know they are not 30 people but one person searching... would this stop the local search?
ccd1
Sat 26th Oct '02, 5:31pm
Originally posted by merjawy
What about people searching the forum itself? Sometimes I get 30 or more online guests whose IPs are the same except for the last digit, so I know they are not 30 people but one person searching... would this stop the local search?
Only robots search for that file.
SloppyGoat
Wed 4th Dec '02, 9:50pm
This is cool! I'm definitely trying it. :cool:
irc
Fri 6th Dec '02, 1:04pm
What if you wanted to restrict all spiders but one specific one? Does anyone know the syntax for excluding all robots but one, i.e. google?
Kohhal
Wed 21st May '03, 10:16am
What if you wanted to restrict all spiders but one specific one? Does anyone know the syntax for excluding all robots but one, i.e. google?
That info is here : http://www.robotstxt.org/wc/norobots.html
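For quick reference, the pattern is an empty Disallow for the robot you want to let in, followed by a blanket Disallow: / for everyone else. The sketch below checks that reading with Python's standard urllib.robotparser. The site URL and the name OtherBot are placeholders, and real crawlers may match user-agent names a little differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# An empty Disallow value means "nothing is disallowed" for that robot.
ONLY_GOOGLE = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ONLY_GOOGLE.splitlines())

URL = "http://www.example.com/forum/showthread.php"
print(parser.can_fetch("Googlebot", URL))  # the named robot is let in
print(parser.can_fetch("OtherBot", URL))   # everyone else is shut out
```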
Beorn
Sat 24th May '03, 6:17pm
You may want to check out this for the bad bots:
Catch Bad Bots (http://216.239.57.100/search?q=cache:InZvlLBmUCQJ:www.kloth.net/internet/bottrap+PHP+robot+trap&hl=en&ie=UTF-8)
(note: I needed to use the Google cache because the site seems to be slow. Search for 'PHP robot trap' in Google some other time....)
d3nnis
Thu 12th Jun '03, 8:13am
hi can i just do this?
User-agent: *
Disallow: /
by the way should i ftp in ASCII mode or binary mode?
DJ5A
Sat 19th Jul '03, 5:09pm
Hello Everybody:
I noticed that this post was started back in 2002. Can Scott or someone else tell me if the robots.txt file is correct for vB version 2.3.0?
If it does work, could I just add eva2000's code to Scott's code in the same file, like this...
User-Agent: Googlebot-Image
Disallow: /
User-agent: *
Disallow: /forum/attachment.php
Disallow: /forum/avatar.php
Disallow: /forum/editpost.php
Disallow: /forum/member.php
and have it work properly?
Faruk
Fri 8th Aug '03, 7:10am
That should work fine, yes.
I have that, essentially. With many more files to disallow, though.
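A combined file like that can be checked with Python's standard urllib.robotparser before it goes live. This is a sketch under a few assumptions: www.example.com stands in for the real site, the blank line between the two groups is added for readability, and the parser's matching is only an approximation of what each search engine actually does.

```python
from urllib.robotparser import RobotFileParser

# Two groups: one for Google's image spider, one for every other robot.
COMBINED = """\
User-Agent: Googlebot-Image
Disallow: /

User-agent: *
Disallow: /forum/attachment.php
Disallow: /forum/avatar.php
Disallow: /forum/editpost.php
Disallow: /forum/member.php
"""

parser = RobotFileParser()
parser.parse(COMBINED.splitlines())

SITE = "http://www.example.com"
# The image spider falls under its own group and is blocked everywhere.
print(parser.can_fetch("Googlebot-Image", SITE + "/forum/showthread.php"))
# Other robots fall under the * group: threads stay open, member.php does not.
print(parser.can_fetch("Googlebot", SITE + "/forum/showthread.php"))
print(parser.can_fetch("Googlebot", SITE + "/forum/member.php"))
```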
Chroder
Thu 16th Oct '03, 11:02pm
I've got my forums on a subdomain (http://forums.devbox.net) but when I FTP to my site, there's just a directory "forums". Do I put the robots.txt in my forums directory because it's like its own domain, or do I put it in my root directory and edit the paths?
/////edit
Doh! Nevermind, it says right in the first post! Stupid me...
marcjd
Mon 8th Dec '03, 2:29pm
Is the text on the original post the same for vb3 Gamma or would I need to make some modifications? Thank you.
marcjd
Thu 11th Dec '03, 6:15pm
I used the file on the first page and it kept Google from getting the "permission denied" pages most of the time, but it was still getting them occasionally. I looked at the vb3 files and added a couple more, and haven't seen Google get the "permission denied" page again yet.
User-agent: *
Disallow: /forums/attachment.php
Disallow: /forums/avatar.php
Disallow: /forums/editpost.php
Disallow: /forums/member.php
Disallow: /forums/member2.php
Disallow: /forums/misc.php
Disallow: /forums/moderator.php
Disallow: /forums/newreply.php
Disallow: /forums/newthread.php
Disallow: /forums/online.php
Disallow: /forums/poll.php
Disallow: /forums/postings.php
Disallow: /forums/printthread.php
Disallow: /forums/private.php
Disallow: /forums/private2.php
Disallow: /forums/report.php
Disallow: /forums/search.php
Disallow: /forums/sendtofriend.php
Disallow: /forums/threadrate.php
Disallow: /forums/usercp.php
Disallow: /forums/admin/
Disallow: /forums/images/
Disallow: /forums/mod/
Disallow: /forums/sendmessage.php
Disallow: /forums/register.php
Disallow: /forums/subscription.php
I, Brian
Thu 1st Jan '04, 6:13am
Excellent list - will try this out.
EDIT: admin and mod folders need renaming for vb3 - this should be good:
User-agent: *
Disallow: /attachment.php
Disallow: /avatar.php
Disallow: /editpost.php
Disallow: /member.php
Disallow: /member2.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /private2.php
Disallow: /report.php
Disallow: /search.php
Disallow: /sendtofriend.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /admincp/
Disallow: /modcp/
Disallow: /images/
Disallow: /sendmessage.php
Disallow: /register.php
Disallow: /subscription.php
Exero
Tue 6th Jan '04, 10:00am
my forum is at www.exero.net/forum/
so i use this
User-agent: *
Disallow: /
Disallow: /forum/
Disallow: /forum/admincp/
Disallow: /forum/modcp/
Disallow: /forum/images/
Disallow: /forum/includes/
Disallow: /forum/archive/
Disallow: /forum/clientscript/
Disallow: /forum/cpstyles/
Disallow: /forum/customavatars/
Disallow: /forum/install/
Disallow: /forum/subscriptions/
achtungbaby
Thu 8th Jan '04, 8:59pm
Robots spider your web pages and gather the documents. They index these, so when you go to a search engine and type in a word, it matches your site if the spider found that word there.
Spiders follow links on web pages to other pages, but they also follow images, which is sometimes a bad thing because it uses your bandwidth, especially if you have a big board that is getting spidered once a day by many search engines.
Why are so many of them visiting my forums at once? Right now I have about 44 members logged in and about 97 Inktomi spiders...!
yolise
Fri 9th Jan '04, 9:09am
Note: The majority of spiders check for these not all do.
Sadly, it appears that Google is one that does not. I added those lines to my robots.txt and Google is still spidering things like memberlists, reply forms, etc. It has been for days now, usually 10 to 15 sessions at a time for a goodly portion of the day. :(
Scott MacVicar
Fri 9th Jan '04, 9:38am
All search engines fetch it just before they start a spidering session.
Google should behave next time it spiders at the end of the month.
In order to save bandwidth Googlebot only downloads the robots.txt file once a day or whenever we have fetched many pages from the server. So, it may take a while for Googlebot to learn of any changes that might have been made to your robots.txt file. Also, Googlebot is distributed on several machines. Each of these keeps its own record of your robots.txt file. Finally, you may want to check that your syntax is correct against the standard at: http://www.robotstxt.org/wc/norobots.html. If there still seems to be a problem, please let us know, and we will correct it.
http://www.google.com/intl/en/webmasters/3.html
DJ5A
Mon 23rd Feb '04, 9:36am
Hello:
Could someone tell me which of the robots.txt lists above to use, or show me a list to use for vB3.0RC4 to reduce bandwidth? I do want my forum spidered. The path to my forum's home page is (.com/index.php?)
I do have the archive set up and I do want it searched.
I'm only asking because some of these lists are old and I do not know which one to use with vB3.0RC4 for the best bandwidth savings. Anybody?
DJ5A
Wed 3rd Mar '04, 10:11am
Hello Everybody:
If I want my forum (vB3.0RC4) searched by the Spiders & Save bandwidth, I should...
Disallow all files & folders except this folder...
/archive/
Is this correct?
Floris
Wed 3rd Mar '04, 12:08pm
Could Scott be so nice to update the first post with a .txt file for version 2 and version 3 of vb ? - there are obviously some directory and file changes.
DJ5A
Thu 4th Mar '04, 10:45pm
Hello Anybody Out Thar?
I'm very patient... 13 days & no Reply?
If I want my forum (vB3.0 RC4) searched by the Spiders & Save bandwidth, everything is disallowed except the Archive folder, Is this Correct?
User-agent: *
Disallow: /admincp/
Disallow: /cgi-bin/
Disallow: /clientscript/
Disallow: /cpstyles/
Disallow: /customavatars/
Disallow: /images/
Disallow: /includes/
Disallow: /install/
Disallow: /modcp/
Disallow: /subscriptions/
Disallow: /announcement.php
Disallow: /attachment.php
Disallow: /calendar.php
Disallow: /clear.gif
Disallow: /cron.php
Disallow: /editpost.php
Disallow: /external.php
Disallow: /faq.php
Disallow: /forumdisplay.php
Disallow: /global.php
Disallow: /image.php
Disallow: /joinrequest.php
Disallow: /login.php
Disallow: /member.php
Disallow: /memberlist.php
Disallow: /misc.php
Disallow: /moderator.php
Disallow: /newattachment.php
Disallow: /newreply.php
Disallow: /newthread.php
Disallow: /online.php
Disallow: /poll.php
Disallow: /postings.php
Disallow: /printthread.php
Disallow: /private.php
Disallow: /profile.php
Disallow: /register.php
Disallow: /report.php
Disallow: /reputation.php
Disallow: /search.php
Disallow: /sendmessage.php
Disallow: /showgroups.php
Disallow: /showpost.php
Disallow: /showthread.php
Disallow: /subscription.php
Disallow: /subscriptions.php
Disallow: /threadrate.php
Disallow: /usercp.php
Disallow: /usernote.php
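A list like this can be tested against a few sample URLs with Python's standard urllib.robotparser. The sketch below uses a shortened copy of the list; the host name, the bot name ExampleBot and the archive URL /archive/index.php are assumptions made for the example. Since robots.txt only lists what is disallowed, /archive/ stays open simply because it is never mentioned, and so does anything else left off the list.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the list; /archive/ and /index.php are never mentioned.
ARCHIVE_ONLY = """\
User-agent: *
Disallow: /images/
Disallow: /forumdisplay.php
Disallow: /showthread.php
Disallow: /search.php
"""

parser = RobotFileParser()
parser.parse(ARCHIVE_ONLY.splitlines())

SITE = "http://www.example.com"
print(parser.can_fetch("ExampleBot", SITE + "/archive/index.php"))  # archive stays open
print(parser.can_fetch("ExampleBot", SITE + "/showthread.php"))     # regular threads blocked
print(parser.can_fetch("ExampleBot", SITE + "/index.php"))          # unlisted, so still open
```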
marcjd
Thu 25th Mar '04, 1:47pm
People may not have responded because the correct code is already on this page. Look at I, Brian's...which corrected the one I did for the modcp and admincp directories.
On a side note, it was working fine for me for quite a while then Google started going to no permission pages again. Maybe they are mad at me...lol.
Grover
Tue 7th Sep '04, 2:42pm
Could Scott be so nice to update the first post with a .txt file for version 2 and version 3 of vb ? - there are obviously some directory and file changes.
Yes, I would like to see an updated version as well. Maybe this can be put somewhere as a sticky or an official 'How do I'?
I am looking for an updated VB3 robots.txt that really blocks as much as possible.
Joseph777
Sun 19th Sep '04, 8:54pm
Yes, I would like to see an updated version as well. Maybe this can be put somewhere as a sticky or an official 'How do I'?
I am looking for an updated VB3 robots.txt that really blocks as much as possible.
I'll proudly join this bandwagon. :)
I, too, am ever-so-patiently waiting for a list. (Why? ... You may ask. Because I am a moron and couldn't compile one on my own if my life depended on it. :D)
Thank You.
Mechanical Mind.
cinq
Sun 26th Dec '04, 6:26am
Is there an updated list for vb3.0.3 ? :)
andreas.kemalis
Tue 15th Feb '05, 12:29pm
ok, thanks for everything... you helped me