First of all, our traffic took a sudden, huge jump in the past week. We now have 650-700 users online at peak hours, versus just 500-550 before that. After noticing many similar IP addresses, I checked my access logs and found that the culprit is "Yahoo! Slurp". I added it to the spider identification settings in the AdminCP, and now I can see just how many of these are visiting the forum.
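If you want to put a number on it rather than eyeballing the log, a short script can count requests by user agent. This is a minimal sketch that assumes Apache's "combined" log format (user agent as the last quoted field); the sample lines, IPs (documentation ranges) and paths are made up for illustration, and the function name `count_agent_hits` is hypothetical.

```python
import re
from collections import Counter

# Assumes Apache's "combined" log format, where the user agent is the
# last double-quoted field on each line. Adjust the regex for other formats.
COMBINED = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)

def count_agent_hits(lines, marker="Slurp"):
    """Count requests whose user agent contains `marker`, grouped by client IP."""
    per_ip = Counter()
    for line in lines:
        m = COMBINED.match(line)
        if m and marker in m.group("ua"):
            per_ip[m.group("ip")] += 1
    return per_ip

# Hypothetical sample lines (documentation-range IPs, made-up paths).
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2006:13:55:36 -0700] "GET /forums/member.php?u=5 HTTP/1.0" 200 2326 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.11 - - [10/Oct/2006:13:55:40 -0700] "GET /forums/showthread.php?t=123 HTTP/1.0" 200 5120 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '198.51.100.7 - - [10/Oct/2006:13:56:02 -0700] "GET /forums/index.php HTTP/1.1" 200 8192 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

hits = count_agent_hits(SAMPLE_LOG)
print(sum(hits.values()), dict(hits))  # 2 {'192.0.2.10': 1, '192.0.2.11': 1}
```

Pointing it at a real log (e.g. `open("access_log")`) instead of the sample list gives a per-IP breakdown of how hard the spider is hitting the board.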
On the "guest" pages, Yahoo! Slurp now accounts for 7 to 10 of the 20 guest listings on each page. WAY excessive IMHO!
I do not want to block these addresses, but I think the spider is overly greedy. A few minutes ago I added the following to robots.txt (it's a Slurp-specific extension), but I don't know whether it will limit the number of accesses:
User-agent: Slurp
Crawl-delay: 10
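One way to sanity-check that the two lines parse the way you expect is Python's standard `urllib.robotparser`, which understands `Crawl-delay`. This only shows how a compliant parser reads the file, not how Slurp itself behaves; the requests-per-day figure assumes the value means seconds between requests over a single connection, which robots.txt does not guarantee.

```python
from urllib.robotparser import RobotFileParser

# The two lines added to robots.txt in the post above.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

delay = parser.crawl_delay("Slurp")      # the group matches "Slurp"
other = parser.crawl_delay("Googlebot")  # no group (and no "*" group) applies

# Rough ceiling, ASSUMING the value is seconds between requests and the
# spider uses one connection at a time (neither is guaranteed).
SECONDS_PER_DAY = 24 * 60 * 60
max_requests_per_day = SECONDS_PER_DAY // delay
print(delay, other, max_requests_per_day)  # 10 None 8640
```

If the number of Slurp sessions stays high after the file is re-fetched, that would suggest the spider is running several connections in parallel or reads the value differently.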
Is there any other way to trim these accesses back?
My second question is about the forum's /archive directory, which is the stripped-down view. I noticed that Slurp was trying to access user profiles, which means it was crawling our main forum, not our archive pages. Does anyone know of a way to force these spiders to use the archive directories rather than our active forum? I can't disallow our /forums directory, since the /archive directory is a subdirectory of /forums. I also don't know how reliable indexing would be if I used .htaccess to rewrite the URL so that I could disallow /forums in robots.txt and still allow access to /archive. I thought one purpose of the archive was to keep spider traffic from slowing down the forum's "full view".
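For the subdirectory problem, some crawlers understand an `Allow` line, and under Python's `urllib.robotparser` the first matching rule wins, so an `Allow` for the archive placed before a broader `Disallow` for /forums/ keeps the archive open. This is a sketch under assumptions: the /forums/archive/ path, example.com host and vBulletin-style file names are illustrative, and whether Slurp honours `Allow` (and in which order it evaluates rules) should be checked against Yahoo's own documentation before relying on it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: open the archive, close the rest of /forums/.
# Python's parser applies the FIRST matching rule in file order, so the
# narrower Allow line goes before the broader Disallow line.
ROBOTS_TXT = """\
User-agent: Slurp
Crawl-delay: 10
Allow: /forums/archive/
Disallow: /forums/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

def slurp_may_fetch(path):
    """True if the parsed rules let a 'Slurp' agent fetch `path` (hypothetical host)."""
    return parser.can_fetch("Slurp", "http://www.example.com" + path)

for path in ("/forums/archive/index.php/t-123.html",
             "/forums/member.php?u=5",
             "/forums/showthread.php?t=123"):
    print(path, slurp_may_fetch(path))
```

Note that this blocks the full-view pages (profiles, threads) for Slurp entirely, so only the archive copies would be indexed by that spider.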
Any ideas here?