Still problem With Yahoo! Slurp Spider


olimpus0
Sun 1st Jul '07, 4:56am
There are still too many Yahoo spiders on my site. How can I disallow the Yahoo! Slurp spider? I want to block them forever. Only Yahoo Slurp.

slappy
Sun 1st Jul '07, 5:10am
You could try making an .htaccess file that blocks crawlers by hostname, containing something like:

order allow,deny
deny from .googlebot.com
deny from unknown
deny from .anonymizer.com
deny from .crawl.yahoo.net
allow from all

The "deny from .crawl.yahoo.net" line is the one that blocks the Yahoo Slurp spider; you can keep or remove any of the others as you wish.

Regards,

olimpus0
Sun 1st Jul '07, 5:19am
Can you make the file for me, please?

slappy
Sun 1st Jul '07, 6:02am
Here's what you need to do. Use any text editor, notepad will work. Create a file and name it .htaccess (note the dot "." at the start of htaccess).

You might also be able to create one in your host control panel: click on File Manager, create an .htaccess file, then edit it.

However you create the file, it should contain the following:

# Return 403 Forbidden to any client whose User-Agent contains "yahoo.com"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} yahoo\.com [NC]
RewriteRule ^.* - [F,L]

Save it as plain text. If your editor adds a .txt extension, rename the file so that it is called just .htaccess.

Whether you create it in notepad or from your control panel, if you put that information in the file and save it, you will have NO slurp spiders at all, but your Forum will also not be indexed by Yahoo!!!

If you created it in your control panel, save it in the same directory as your forums folder. If you create it on your own machine, upload it to your forums folder.

Regards,

olimpus0
Mon 2nd Jul '07, 12:02am
I do not understand. Please explain it to me again.

hescominsoon
Mon 2nd Jul '07, 9:25pm
There are still too many Yahoo spiders on my site. How can I disallow the Yahoo! Slurp spider? I want to block them forever. Only Yahoo Slurp.

easy:
http://www.cgalliance.org/robots.txt

we slam yahoo's slurp down to something reasonable with that robots.txt and the others also play nice with that file.
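
The contents of the linked file are not reproduced in this thread, so the following is only a minimal sketch of the general idea. It assumes the throttling is done with the Crawl-delay directive, which Yahoo documented for Slurp. The 10-second value is an arbitrary illustration, not the value cgalliance.org uses.

```
# robots.txt sketch (assumed contents, not a copy of the linked file)
# Ask Yahoo! Slurp to wait between requests; the value 10 is illustrative.
User-agent: Slurp
Crawl-delay: 10

# All other crawlers: no restrictions.
User-agent: *
Disallow:
```

Crawl-delay is a non-standard extension to robots.txt, so only crawlers that choose to honour it will slow down.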

SeanE
Mon 2nd Jul '07, 10:43pm
I'm happy to hear about this... So, to use your robot file above, just rename it as an .htaccess - correct? I want to just slow down the army of spiders on my site - not get rid of them...

Also - hescominsoon - what mod are you using on your front end? Is that a hack (LDM?) - or is that something you did? I run www.EricsonYachts.org (http://www.EricsonYachts.org), and am considering something like LDM...

Thanks,
//sse

slappy
Mon 2nd Jul '07, 10:56pm
You LEAVE a "robots.txt" file named "robots.txt." An ".htaccess" file is something different. A "robots.txt" is a file the bots are supposed to "read" and follow, but some don't, and some only "read" it every so often. An .htaccess file is something which controls "access." In other words, a bot cannot "access" the site AT ALL if it is "excluded." A "robots.txt" simply puts some "conditions" on their behaviour AFTER they access your site, IF they actually follow the "robots.txt" rules.

Hope that's clear.

Regards,

SeanE
Tue 3rd Jul '07, 12:37am
Great, thanks. Where should robots.txt be placed - in the root directory, or in the forums root directory?

Thanks,
//sse

z0diac
Thu 12th Jul '07, 3:22am
easy:
http://www.cgalliance.org/robots.txt

we slam yahoo's slurp down to something reasonable with that robots.txt and the others also play nice with that file.

THANK YOU for this example. I cut and pasted it right into my robots file. I don't want to slow down Google at all, since it only uses one instance, and 95% of my traffic comes from Google.

I just don't want Yahoo locking up my server (or wrecking my max. users online stat)