250 individual Yahoo Slurp Spiders!!!!

  • Mike Warner
    Senior Member
    • Jun 2001
    • 521
    • 4.0.x

    250 individual Yahoo Slurp Spiders!!!!

    My site has been running a little slow over the last few days, so I looked at WOL and spotted over 250 individual Yahoo spiders!!! This isn't normal, right?

    How can Yahoo do this? Surely they should limit the number of spiders on a single site at any one time?

    Hopefully this will help my Yahoo listing, but in the meantime my users are getting SQL errors and slow page loads.

    Any advice?
    Mike Warner
    MIGWeb - a Vauxhall Site for Enthusiasts of all Vauxhalls
  • Cole2026
    Senior Member
    • Apr 2004
    • 478
    • 3.6.x

    #2
    Try putting this in a robots.txt file in your public_html directory.

    Code:
    User-agent: Slurp
    Crawl-delay: 20

    Comment

    • Mike Warner
      Senior Member
      • Jun 2001
      • 521
      • 4.0.x

      #3
      Isn't that a delay for each individual spider, though, or for all of them?
      Originally posted by bfoot045
      Try putting this in a robots.txt file in your public_html directory.

      Code:
       User-agent: Slurp
       Crawl-delay: 20
      Mike Warner
      MIGWeb - a Vauxhall Site for Enthusiasts of all Vauxhalls

      Comment

      • Ted S
        Senior Member
        • Dec 2003
        • 332
        • 3.5.x

        #4
        I've had the same thing occurring every week or two for the past month. The Crawl-delay command does help if set to 20 or 30, but I still get a ton of their bots. Contacting Yahoo has proven unsuccessful so far, so my advice is to set the crawl delay high and deal with a little extra load (if you have to, you can use .htaccess to ban the bot, but that won't help your search rankings).
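
        For the ban, something along these lines in .htaccess should do it. This is only a rough, untested sketch on my part, assuming Apache with mod_rewrite enabled and .htaccess overrides allowed; test it before relying on it:

        Code:
        # Untested sketch: return 403 Forbidden to any request whose
        # User-Agent contains "Slurp" (case-insensitive match).
        RewriteEngine On
        RewriteCond %{HTTP_USER_AGENT} Slurp [NC]
        RewriteRule .* - [F,L]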
        Ted & the Giftery.me Team

        New features released for Product Review Forums 1.3 (4.0)
        Helpful Answers (3.8) (4.0) | Limited Guest Viewing (3.8) (4.0)

        Comment

        • Mike Warner
          Senior Member
          • Jun 2001
          • 521
          • 4.0.x

          #5
          Is the delay in the second post for each Spider IP, or all from a particular company?
          Mike Warner
          MIGWeb - a Vauxhall Site for Enthusiasts of all Vauxhalls

          Comment

          • z0diac
            Senior Member
            • Oct 2006
            • 444

            #6
            Originally posted by bfoot045
            Try putting this in a robots.txt file in your public_html directory.

            Code:
            User-agent: Slurp
            Crawl-delay: 20
            I have a question about that - will that affect ALL search engines or just Slurp? Because I don't want Google to be delayed, since it crawls the *correct* way and uses only one instance on the forum at a time.

            I do, however, want to make Slurp wait forever between requests. (I wish I could just ban Slurp altogether.)
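
            From what I've read, Crawl-delay only applies to the User-agent section it sits under, so a robots.txt like this should throttle only Slurp and leave Google alone (just a sketch on my part; Crawl-delay is a non-standard directive and Googlebot ignores it anyway):

            Code:
            # Only requests identifying as Slurp match this section;
            # Googlebot matches no section here and crawls normally.
            User-agent: Slurp
            Crawl-delay: 120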

            Comment

            • z0diac
              Senior Member
              • Oct 2006
              • 444

              #7
              Originally posted by Ted S
              I've had the same thing occurring every week or two for the past month. The Crawl-delay command does help if set to 20 or 30, but I still get a ton of their bots. Contacting Yahoo has proven unsuccessful so far, so my advice is to set the crawl delay high and deal with a little extra load (if you have to, you can use .htaccess to ban the bot, but that won't help your search rankings).
              Oooh ooh!!! How!? I don't care about being listed in Yahoo anyway, since 95% of my search traffic is Google (and the rest is probably MSN).

              Do you know of an .htaccess code that I could copy and paste?

              I will be your loyal subject forever!!!

              Comment

              • MRGTB
                Senior Member
                • May 2005
                • 5454

                #8
                Must admit I'm getting a bit miffed with Yahoo myself; they slow the site down to a halt when they come in droves. And I just don't see any hits coming from Yahoo's search engine worth mentioning. It's about 96% Google, with a few from MSN and other smaller search engines here and there, which match Yahoo for hits.

                If things continue this way, I'm seriously thinking about banning Yahoo altogether and being done with the site freezes they cause when they go rampant.

                Comment

                • z0diac
                  Senior Member
                  • Oct 2006
                  • 444

                  #9
                  And it's not just that a small fraction of hits comes from Yahoo. Even with all their "slurping", I've run search strings there that should make my site show up, and it's nowhere to be found! Originally I thought, OK, they're screwing up my stats and bogging down the server, but it'll be worth it because my ranking on Yahoo should be pretty high after all this. NOPE. My site is very hard to find on Yahoo (but comes up on the first result page on Google).

                  They must have had numerous complaints about the way their Slurper goes about things, because they specifically state how to limit it using robots.txt. So it seems they've pretty much conceded to Google, admitted defeat, and have nothing to lose reputation-wise when it comes to flooding sites with their crawler.

                  I'm seriously thinking about adding an IP mask to block Yahoo altogether (if I remember correctly, its Slurp crawler uses 74.x.x.x).
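
                  If I go that route, a rough .htaccess sketch would be something like the one below. I'm assuming Apache 2.2-style access control here, and that the whole 74.6.x.x block really is Yahoo's; check the range with whois before using it:

                  Code:
                  # Untested sketch: deny the 74.6.0.0/16 range attributed to
                  # Yahoo/Inktomi, allow everyone else.
                  Order Allow,Deny
                  Allow from all
                  Deny from 74.6.0.0/16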

                  Comment

                  • z0diac
                    Senior Member
                    • Oct 2006
                    • 444

                    #10
                    It's on right now (actually, I can't remember the last time I viewed who was online, and it WASN'T on)...

                    Yahoo! Slurp Spider Viewing Calendar
                    Default Calendar
                    74.6.23.202


                    I did, however, finally manage to get my max users online count reset.

                    Comment

                    • MRGTB
                      Senior Member
                      • May 2005
                      • 5454

                      #11
                      I thought this was a good read about them. And I really agree with the part about people being asked to upgrade their hosting accounts and pay more money just because Yahoo spiders come in such numbers from different IPs, which gets picked up as multiple sessions.

                      IPSBeyond > IPB Assistance > Increased Guests Online?
                      Posted by: bfarber Jun 29 2007, 09:06 AM
                      This topic highlights a large reason why you may be seeing an increased number of users (or more specifically, Slurp/Yahoo spiders) online recently.

                      http://www.ipsbeyond.com/forums/index.php?...st&p=145039 (IPB customer only)

                      To highlight the important parts

                      1) Yahoo is now spidering sites from multiple IPs at the same time, which causes most guest tracking (including IPB's) to treat the requests as different sessions. As such, whereas you may have seen 100 users online previously, you may now see 1000. This is actually IPB working as intended, because you do have 1000 different computers accessing your site (and your server resources ARE being used to serve 1000 requests); however, it is "new" because no other spiders do this, and Yahoo did not do this before.

                      2) Yahoo IS actually sending out hundreds (and sometimes thousands) of spiders to sites at once, which is crippling many people's servers. Our company forums had 1500 guests online last night when I looked, and I estimated about 1200 were Yahoo spiders. It's not fair for sites that get 50 real visitors, and maybe 50 guest visitors, to have to pay to accommodate 1000 users because of Yahoo spiders.

                      Please see this Digg article if you wish to voice your support: http://www.digg.com/software/Yahoo_Spider_..._guest_tracking

                      and PLEASE vote for this suggestion for Yahoo to fix their problem if you too are unhappy with these recent changes: http://suggestions.yahoo.com/detail/?prop=...r&fid=31431

                      I mean, just thinking of OUR shared hosting customers: once they hit a certain number of users online, we have to recommend they upgrade their hosting plan, and I think it's entirely unfair that they should have to do this (and pay more, of course) just because of Yahoo.

                      There are suggestions that can help with this issue. One involves throttling Yahoo's requests, and others involve banning their IP/spider so they do not index you at all. While it is never recommended to ban a search engine spider, at this rate that may be the only thing users can do to protect their sites.

                      Posted by: bfarber Jun 29 2007, 09:07 AM
                      I am pinning this because I think it deserves a higher level of attention. I urge everyone to visit the suggestion thread at yahoo's board and click the "Rate this" icon. Show Yahoo that you don't want your server crippled so that they can add a few more pages to their index.

                      Posted by: Axel Wers Jun 29 2007, 09:50 AM
                      If somebody thinks that Yahoo is too active at your boards you can try http://help.yahoo.com/help/us/ysearch/slurp/slurp-03.html

                      Posted by: bfarber Jun 29 2007, 01:57 PM
                      I noticed the link to that suggestion on Yahoo was posted on vbulletin's forums today too. Their users are reporting the same problems with Yahoo.

                      Yahoo's latest response was something to the effect of "We're looking at some documentation and we'll be replying soon." Heh. In the meantime, I say everyone just ban Yahoo's spiders. Perhaps they will wake up when their spiders are unable to index anything because they're being resource hogs.

                      We just banned Yahoo on one of our server setups. I cleared out 817 out of 917 guest sessions from Yahoo and load dropped from 22 to 9 in about 60 seconds.

                      I mean, a load of 22 (higher yesterday) down to 2.54????

                      Yahoo == evil

                      Posted by: bfarber Jun 29 2007, 03:39 PM
                      We did it in iptables today

                      Code:
                      iptables -I INPUT -m iprange --src-range 74.6.0.0-74.6.255.255 -j DROP

                      This ENTIRE range is owned by Inktomi (Yahoo) so you won't be banning any legit users. The load on the box went from 22 (it was closer to 40 yesterday) to 2 in about 5-10 minutes. Guests online went from over 1000 to 169.
                      Last edited by MRGTB; Sat 14 Jul '07, 2:38am.

                      Comment

                      • Bober42
                        Member
                        • Feb 2006
                        • 72
                        • 3.6.x

                        #12
                        Total guests that have visited the forum today: 771

                        Visitors (74), Yahoo! Slurp Spiders (687), Google Spiders (1), MSNBot Spiders (7), YahooFeedSeeker Spiders (1), Alexa Spiders (1)

                        This is ours today, pesky things.

                        Comment

                        • Joe Gronlund
                          Senior Member
                          • Nov 2001
                          • 5789
                          • 3.8.x

                          #13
                          Visitors (40), Yahoo! Slurp Spiders (617), MSNBot Spiders (9), Google Spiders (1)
                          MCSE, MVP, CCIE
                          Microsoft Beta Team

                          Comment

                          • FreshFroot_
                            Senior Member
                            • Jul 2005
                            • 1420
                            • 3.8.x

                            #14
                            Yeah, I get TONS of these Slurp spiders too. I think I had 800 one day, which caused a massive slowdown for the forum; threads took a long time to load.

                            Comment

                            • MRGTB
                              Senior Member
                              • May 2005
                              • 5454

                              #15
                              Well, for the moment I've decided to set the spider delay for them to 20 and see how things go from there. There's no way I'm allowing them to just run rampant anymore; as you say, they bring your site to a crawl at times.

                              Comment
