Search Engines and the vB archive

  • RedWingFan
    Senior Member
    • Sep 2004
    • 371
    • 4.0.0

    #1
    Search Engines and the vB archive

    First of all, our traffic took a sudden, huge jump in the past week: we now have 650-700 users online during peak hours, versus just 500-550 in the recent past. After noticing many similar IP addresses, I looked at my access logs and found that the new culprit is "Yahoo! Slurp". I added it to the spider identification settings in AdminCP, and now I can see just how many of these are visiting the forum.

    In the "guest" pages listing, Yahoo Slurp now accounts for 7 to 10 of every 20 guest entries per page. WAY excessive IMHO!

    I do not want to block these addresses, but I think the spider is overly greedy. A few minutes ago I added the following to robots.txt (Crawl-delay is a non-standard, Slurp-specific extension that asks the crawler to wait that many seconds between fetches), but I don't know if it will actually limit the number of simultaneous accesses:

    User-agent: Slurp
    Crawl-delay: 10

    Is there any other way to trim these accesses back?


    My second question is about the forum /archive directory, which is the stripped-down view. I noticed that Slurp was trying to access user profiles, which means it was crawling our main forum, not our archive pages. Does anyone know of a way to steer these spiders toward the archive directories rather than the active forum? I can't simply disallow our /forums directory, since the /archive directory is a subdirectory of /forums. I also don't know how reliable indexing would be if I used .htaccess to rewrite the URLs so that I could disallow /forums in robots.txt and still allow access to /archive. I thought one of the ideas behind the archive was to keep spider traffic from slowing down the forum in the "full view".
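
    One possibility I haven't tested (and I'm assuming here that the archive really lives at /forums/archive/): Slurp and Googlebot both understand a non-standard Allow directive, so a more specific Allow might carve the archive out of a broader Disallow, something like:

    User-agent: Slurp
    Allow: /forums/archive/
    Disallow: /forums/
    Crawl-delay: 10

    I don't know how consistently other spiders honor Allow, though, since it isn't part of the original robots.txt standard.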

    Any ideas here?
  • RedWingFan
    Senior Member
    • Sep 2004
    • 371
    • 4.0.0

    #2
    Sheesh...checking right now with a rough random count of pages: about half of the "guest" users on our site are that @&$#*& Yahoo Slurp spider! Way excessive, don't you think?? We normally run about 50/50 members to guests. We're almost at 800 users online right now, 533 of them guests, and about half of those are Slurp.


    • Zachery
      Former vBulletin Support
      • Jul 2002
      • 59097

      #3
      So is it really a problem that your site is being indexed by one of the biggest search engines, and that your content might show up in their results in the very near future?

      I'd personally be thrilled.


      • RedWingFan
        Senior Member
        • Sep 2004
        • 371
        • 4.0.0

        #4
        It's only a problem in that I think Slurp is sending way too many bots into our site at one time. Google never sends anywhere near that many. Using the Currently Online Users link, I set it to display only the search bots, and there were over 400 on our site, the majority of them Yahoo Slurp. In fact, I just counted: 411 search bots. 37 of them are Google. ALL the rest (374 of them) are Slurp--no other identified bots are on our forum right now.

        I've read that with Google you can report back to them if you feel their 'bots are too aggressive.

        The good thing is that we were near our record number of users online again, and with the new my.cnf file from George and three tables changed to InnoDB, the forum is running just fine...on a single dedicated server! Even so, I'd rather save the bandwidth for our members...and I don't think Slurp needs to be so aggressive when indexing sites. Just my two cents, anyway.
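
        In case anyone wants to try the same conversion, it's a single statement per table (back up your database first, of course). Something like this, repeated for each table you want to move off MyISAM:

        -- convert one of vBulletin's heavily-written tables to InnoDB;
        -- repeat for the other tables you picked
        ALTER TABLE post ENGINE=InnoDB;

        Keep in mind the conversion rebuilds the table, so it can take a while on a big post table.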

