Prevent Robots from using bandwidth

  • Scott MacVicar
    Former vBulletin Developer
    • Dec 2000
    • 13286

    Prevent Robots from using bandwidth

    Here is a nice, simple way to stop the majority of robots from spidering files that you don't want them to fetch or that they have no need to fetch.

    Place the following code in robots.txt and upload it to your domain root, so that when you go to http://forums.site.com/robots.txt you get this file:

    Code:
    User-agent: *
    Disallow: /attachment.php
    Disallow: /avatar.php
    Disallow: /editpost.php
    Disallow: /member.php
    Disallow: /member2.php
    Disallow: /misc.php
    Disallow: /moderator.php
    Disallow: /newreply.php
    Disallow: /newthread.php
    Disallow: /online.php
    Disallow: /poll.php
    Disallow: /postings.php
    Disallow: /printthread.php
    Disallow: /private.php
    Disallow: /private2.php
    Disallow: /report.php
    Disallow: /search.php
    Disallow: /sendtofriend.php
    Disallow: /threadrate.php
    Disallow: /usercp.php
    Disallow: /admin/
    Disallow: /images/
    Disallow: /mod/
    This will stop them from trying to access files that won't have anything interesting to spider. It also stops them from fetching images, which should save bandwidth when you have lots of spiders on your forums.

    Note: the majority of spiders check for this file, but not all do.
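If you want to test rules like these before uploading them, here is a minimal sketch using Python's standard urllib.robotparser module. It uses a shortened copy of the list, written with a leading slash on each path since that is the form this parser matches. The showthread.php and logo.gif paths are only illustrative assumptions, and a real crawler may match paths a little differently from Python's parser.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the rules from the post, each path starting with "/".
RULES = """\
User-agent: *
Disallow: /attachment.php
Disallow: /search.php
Disallow: /admin/
Disallow: /images/
"""


def is_allowed(url, agent="*"):
    """Return True if `agent` may fetch `url` under RULES."""
    parser = RobotFileParser()
    parser.parse(RULES.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical paths, used only to exercise the rules.
    for path in ("/showthread.php", "/attachment.php", "/images/logo.gif"):
        url = "http://forums.site.com" + path
        print(path, "allowed" if is_allowed(url) else "blocked")
```

Anything not listed under Disallow stays open to spiders, so thread pages can still be indexed.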
    Scott MacVicar

    My Blog | Twitter
  • hypedave
    Senior Member
    • Nov 2001
    • 692
    • 2.3.2

    #2
    wow, this is cool, I'm gonna try this out


    • Scott MacVicar
      Former vBulletin Developer
      • Dec 2000
      • 13286

      #3
      If you want to check, look at your error logs; you'll probably see a few 404 errors for /robots.txt. Those are spiders like Google looking for the file to see whether you have any rules they have to follow, like the ones posted.
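A rough way to do that check with a script, assuming your server writes an access log in Common Log Format (your host's log location and format may differ, and the sample lines below are made up for illustration):

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line,
# e.g. "GET /robots.txt HTTP/1.0" 404
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+)[^"]*" (\d{3})')


def robots_txt_statuses(lines):
    """Count HTTP status codes for requests to /robots.txt."""
    counts = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group(1) == "/robots.txt":
            counts[match.group(2)] += 1
    return counts


# Made-up sample lines; in practice you would pass an open log file instead.
SAMPLE = [
    '192.0.2.1 - - [11/May/2002:13:41:00 +0000] "GET /robots.txt HTTP/1.0" 404 209',
    '192.0.2.1 - - [11/May/2002:13:41:02 +0000] "GET /index.php HTTP/1.0" 200 5120',
    '192.0.2.7 - - [11/May/2002:14:02:10 +0000] "GET /robots.txt HTTP/1.0" 404 209',
]

if __name__ == "__main__":
    print(robots_txt_statuses(SAMPLE))
```

A pile of 404s means spiders are asking for the file and not finding one; once robots.txt is uploaded, those should turn into 200s.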

      • hypedave
        Senior Member
        • Nov 2001
        • 692
        • 2.3.2

        #4
        cool, I just modified this to fit with my vbp directory


        • TObject
          Senior Member
          • Apr 2002
          • 104

          #5
          Sounds like a very good idea. Thank you!

          For folks whose board is in a subfolder: make sure you do not forget to put that folder name before the PHP files in the list. The robots.txt file has to be in the "/" folder of the web site.

          For example, the http://www.vbulletin.com/robots.txt file would look something like this:

          Code:
          User-agent: *
          Disallow: /forum/attachment.php
          Disallow: /forum/avatar.php
          Disallow: /forum/editpost.php
          Disallow: /forum/member.php
          Disallow: /forum/member2.php
          Disallow: /forum/misc.php
          Disallow: /forum/moderator.php
          Disallow: /forum/newreply.php
          Disallow: /forum/newthread.php
          Disallow: /forum/online.php
          Disallow: /forum/poll.php
          Disallow: /forum/postings.php
          Disallow: /forum/printthread.php
          Disallow: /forum/private.php
          Disallow: /forum/private2.php
          Disallow: /forum/report.php
          Disallow: /forum/search.php
          Disallow: /forum/sendtofriend.php
          Disallow: /forum/threadrate.php
          Disallow: /forum/usercp.php
          Disallow: /forum/admin/
          Disallow: /forum/images/
          Disallow: /forum/mod/
          The only thing that worries me is that a hacker could read the robots.txt file and know exactly which PHP files you have where. But since the structure of vBulletin is not exactly secret anyway, it is probably not that big of a deal…
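If you would rather not retype the folder name on every line, here is a small sketch that builds the file from the names in this thread. The function name build_robots_txt and its folder parameter are my own invention, not part of vBulletin.

```python
# File and directory names taken from the lists posted in this thread.
FILES = [
    "attachment.php", "avatar.php", "editpost.php", "member.php",
    "member2.php", "misc.php", "moderator.php", "newreply.php",
    "newthread.php", "online.php", "poll.php", "postings.php",
    "printthread.php", "private.php", "private2.php", "report.php",
    "search.php", "sendtofriend.php", "threadrate.php", "usercp.php",
]
DIRS = ["admin", "images", "mod"]


def build_robots_txt(folder=""):
    """Return robots.txt text for a board installed in `folder`.

    An empty folder means the board sits in the site root.
    """
    folder = folder.strip("/")
    prefix = "/" + folder + "/" if folder else "/"
    lines = ["User-agent: *"]
    lines += ["Disallow: " + prefix + name for name in FILES]
    lines += ["Disallow: " + prefix + name + "/" for name in DIRS]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Prints the /forum/ version of the list.
    print(build_robots_txt("forum"), end="")
```

Save the output as robots.txt in the "/" folder of the site, not inside the forum folder.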

          For more info on robots.txt see this:


          • Millward
            Senior Member
            • Apr 2002
            • 127

            #6
            ok, i've done all that, just one question. Ummm, what's a robot?


            • George L
              Former vBulletin Support
              • May 2000
              • 32996
              • 3.8.x

              #7
              My current robots.txt file for my forums has only:

              Code:
              User-Agent: Googlebot-Image
              Disallow: /
              :: Always Back Up Forum Database + Attachments BEFORE upgrading !
              :: Nginx SPDY SSL - World Flags Demo [video results]
              :: vBulletin hacked forums: Clean Up Guide for VPS/Dedicated hosting users [ vbulletin.com blog summary ]


              • TObject
                Senior Member
                • Apr 2002
                • 104

                #8
                Originally posted by Millward
                ok, i've done all that, just one question. Ummm, what's a robot?
                Have you ever seen Futurama? Robots like to steal things, like bandwidth, and images...


                • Scott MacVicar
                  Former vBulletin Developer
                  • Dec 2000
                  • 13286

                  #9
                  But George, that stops them spidering your forums, and most people like to have their forums spidered. Well, I think it's useful; at least you appear in search engines.

                  • George L
                    Former vBulletin Support
                    • May 2000
                    • 32996
                    • 3.8.x

                    #10
                    Originally posted by PPN
                    But George, that stops them spidering your forums, and most people like to have their forums spidered. Well, I think it's useful; at least you appear in search engines.
                    Look again: it only prevents Google from grabbing my images for indexing. http://www.google.com/remove.html#images
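To see that the rule only touches the image spider, here is a quick sketch with Python's standard urllib.robotparser. The URLs are just placeholders, and this shows how Python's parser reads the two lines; Google's own matching may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# The two-line file from the post above.
IMAGE_RULES = ["User-Agent: Googlebot-Image", "Disallow: /"]


def agent_may_fetch(agent, url):
    """Return True if `agent` may fetch `url` under IMAGE_RULES."""
    parser = RobotFileParser()
    parser.parse(IMAGE_RULES)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Placeholder URLs, used only for illustration.
    page = "http://forums.site.com/showthread.php"
    image = "http://forums.site.com/images/logo.gif"
    print("Googlebot-Image, image:", agent_may_fetch("Googlebot-Image", image))
    print("Googlebot, page:", agent_may_fetch("Googlebot", page))
```

Because there is no "User-agent: *" section, every other spider is left free to index the whole site.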
                    Last edited by George L; Sat 11 May '02, 1:41pm.

                    • Scott MacVicar
                      Former vBulletin Developer
                      • Dec 2000
                      • 13286

                      #11
                      oh hehe

                      I never noticed what the useragent was.

                      • Millward
                        Senior Member
                        • Apr 2002
                        • 127

                        #12
                        lol, ah, I see why it's called robots now, sorry. I still don't get where they come from... I'm thick, aren't I?


                        • TObject
                          Senior Member
                          • Apr 2002
                          • 104

                          #13
                          They usually come from all kinds of search engines: Altavista, Google, etc...


                          • Millward
                            Senior Member
                            • Apr 2002
                            • 127

                            #14
                            is it sort of like hotlinking where some one puts an <img> tag in their page that calls for an image on a different server?


                            • Scott MacVicar
                              Former vBulletin Developer
                              • Dec 2000
                              • 13286

                              #15
                              Robots spider your web pages and gather the documents. They index these so that when you go to a search engine and type in a word, it matches your site if the word was found there.

                              Spiders follow links on web pages to other pages, but they also follow images, which is sometimes a bad thing as it uses your bandwidth, especially if you have a big board and it's getting spidered once a day by many search engines.