Robots.txt help :(

  • darsalles
    Member
    • Oct 2009
    • 49
    • 3.8.x

    Robots.txt help :(

    I created a file named robots.txt in the same directory as my forum index, with this code:

    Code:
    User-agent: Mediapartners-Google*
    Disallow:
    User-agent: *
    Disallow: /referrers.php
    Disallow: /ajax_cron.php
    Disallow: /ajax.php
    Disallow: /attachment.php
    Disallow: /calendar.php
    Disallow: /cron.php
    Disallow: /editpost.php
    Disallow: /global.php
    Disallow: /image.php
    Disallow: /inlinemod.php
    Disallow: /joinrequests.php
    Disallow: /login.php
    Disallow: /member.php
    Disallow: /memberlist.php
    Disallow: /misc.php
    Disallow: /moderator.php
    Disallow: /newattachment.php
    Disallow: /newreply.php
    Disallow: /newthread.php
    Disallow: /online.php
    Disallow: /poll.php
    Disallow: /postings.php
    Disallow: /printthread.php
    Disallow: /private.php
    Disallow: /profile.php
    Disallow: /register.php
    Disallow: /report.php
    Disallow: /reputation.php
    Disallow: /search.php
    Disallow: /sendmessage.php
    Disallow: /showgroups.php
    Disallow: /subscription.php
    Disallow: /threadrate.php
    Disallow: /usercp.php
    Disallow: /usernote.php
    Disallow: /member/
    Disallow: /members/
    Disallow: /tags/
    I want to stop Google from indexing those pages of my forum, but it doesn't work. Does anyone have an idea of what I'm doing wrong?

  • Clickfinity
    Member
    • Jul 2007
    • 90

    #2
    Is "/" the correct root path for these files?

    They're not located in "/forum/" or "/forums/" by any chance are they?

    If so, you'll need to add the full path for them to be excluded.
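A minimal sketch of the point above, using Python's standard urllib.robotparser. The rules and the www.example.com host are made up for illustration; they are not the poster's real setup. It shows that a rule written for "/login.php" does not cover the same script served under "/forum/".

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules, for illustration only.
RULES = """\
User-agent: *
Disallow: /login.php
Disallow: /forum/register.php
"""


def is_blocked(rules: str, url: str, agent: str = "Googlebot") -> bool:
    """Return True if `rules` disallow `url` for the crawler named `agent`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return not parser.can_fetch(agent, url)


# Rule and URL path agree: blocked.
print(is_blocked(RULES, "http://www.example.com/login.php"))
# Same script under /forum/, but the rule has no /forum/ prefix: not blocked.
print(is_blocked(RULES, "http://www.example.com/forum/login.php"))
# Rule written with the /forum/ prefix: blocked.
print(is_blocked(RULES, "http://www.example.com/forum/register.php"))
```

Each Disallow value is compared against the start of the URL path as a visitor would see it, so the prefix has to match what appears after the host name.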

    Cheers,
    Shaun
    --
    CycleChat | GeeksChat | Clickfinity

    Comment

    • nforums
      Banned
      • Jun 2009
      • 619
      • 3.8.x

      #3
      Where did you place it? It has to go in your document root (public_html) or, if you have a subdomain, in the root of the subdomain.

      Also, unless you have vBSEO installed, the last three are incorrect.

      Also, it takes a while for Google and other engines to pick up and start to follow the robots.txt. Furthermore, if pages you're trying to block are already indexed, it can take up to a month for them to drop from the search index.
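As a rough illustration of where crawlers look for the file, the sketch below derives the robots.txt location from any page URL: it is always "/robots.txt" at the root of the same scheme and host. The function name robots_url and the example.net hosts are assumptions made for this example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs `page_url` (same scheme and host)."""
    parts = urlsplit(page_url)
    # Keep scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A forum on its own subdomain has its own robots.txt.
print(robots_url("http://foro.example.net/showthread.php?t=1"))
# A forum in a subdirectory shares the robots.txt of the main host.
print(robots_url("http://www.example.net/forum/index.php"))
```

Opening the resulting URL in a browser is a quick way to confirm the file is actually being served from the place crawlers will request.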

      Comment

      • bigwater
        Senior Member
        • Jan 2007
        • 592

        #4
        If you want to disallow all crawling of your entire site, all you need to do is create a robots.txt in your home directory containing the following:

        Code:
        User-agent: *
        Disallow: /
        However, not all robots will pay attention to the robots.txt file. This will stop well-behaved robots (Google, Yahoo, etc.), but many of the "rogue" bots will ignore it completely. There's not much you can do about those bots except block their IPs at the server level.
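A small sketch, again with Python's urllib.robotparser, of what the two-line file above means to a crawler that chooses to honour it. The bot name "AnyBot" and the example.com URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /"])

# Hypothetical URLs on the same host.
urls = [
    "http://www.example.com/",
    "http://www.example.com/index.php",
    "http://www.example.com/forum/showthread.php?t=1",
]

# Map each URL to whether a compliant crawler may fetch it.
allowed = {url: parser.can_fetch("AnyBot", url) for url in urls}
for url, ok in allowed.items():
    print(url, "allowed" if ok else "blocked")
```

Every path starts with "/", so every URL is blocked. The check is voluntary on the crawler's side, which is why bots that ignore robots.txt have to be handled at the server instead.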
        Anybody who says "it can't be done" will usually be interrupted by somebody who is already doing it.

        Comment

        • waynem
          Senior Member
          • Dec 2009
          • 165

          #5
          Take a read here and see if this helps you: http://www.robotstxt.org/robotstxt.html

          Comment

          • darsalles
            Member
            • Oct 2009
            • 49
            • 3.8.x

            #6
            OK, I'll provide more information.

            -I don't have vBSEO
            -I'm running my forum on a subdomain: foro.xxxxxx.net. I have the robots.txt at this path: foro.xxxxxxxxx.net/robots.txt
            -I don't want to disallow robots for all my pages

            Comment

            • doctor death
              Member
              • Oct 2005
              • 97
              • 3.5.x

              #7
              I use:

              User-agent: *
              Disallow: /


              Works great.

              Comment

              • bigwater
                Senior Member
                • Jan 2007
                • 592

                #8
                Then as CycleChat indicated, you'll need to specify a full path. The way it's formed now you are disallowing those specific files from being searched in the home directory only. A leading slash before a filename will always indicate that the file in question is in the home directory. Point it down the line to the actual file you're wanting blocked. /public_html/forums/referrers.php or whatever your path might be.
                Anybody who says "it can't be done" will usually be interrupted by somebody who is already doing it.

                Comment

                • darsalles
                  Member
                  • Oct 2009
                  • 49
                  • 3.8.x

                  #9
                  Originally posted by bigwater
                  Then as CycleChat indicated, you'll need to specify a full path. The way it's formed now you are disallowing those specific files from being searched in the home directory only. A leading slash before a filename will always indicate that the file in question is in the home directory. Point it down the line to the actual file you're wanting blocked. /public_html/forums/referrers.php or whatever your path might be.
                  And if I move the robots.txt to the home path, do I have to include public_html too? Or should I just leave it like:

                  /Forum/referrers.php

                  ...

                  Comment

                  • bigwater
                    Senior Member
                    • Jan 2007
                    • 592

                    #10
                    Robots.txt needs to be in the top-level publicly accessible directory of your domain, and needs to drill down to the specific file you want ignored. The specific pathing to your files, as well as your structuring (symlinks, etc.), is unknown to me, so I can't really tell you exactly what syntax to use. The bottom line is that if somebody goes to your domain, the directory that they hit first (probably the directory that contains your main web page) will need to have the robots.txt file in it.

                    Forward slashes at the beginning of the directive will redirect the file list from the home directory forward, so in the case you listed above, "/Forum/referrers.php" is most likely incorrect. In that case you are telling the server to go back to the home directory and look for Forum from there. Forum doesn't exist in home, so the bots just ignore the directive. You need to drop the leading forward slash and write it as "Forum/referrers.php" so that the server knows to look for the subdirectory Forum forward from the directory it's reading the robots.txt file from, and not backwards to home. If you want to use the leading slash, you'll have to write it as a directive from home, such as /home/public_html/Forum/referrers.php (assuming that's the actual path to the file; again, I don't know your file structure).

                    By using / as a directory, you're telling it to ignore your entire site, but if you put anything after that slash you are specifying what directories and files below / you want to be skipped.
                    Last edited by bigwater; Thu 7 Jan '10, 11:19am.
                    Anybody who says "it can't be done" will usually be interrupted by somebody who is already doing it.

                    Comment

                    • Dunhamzzz
                      Senior Member
                      • Jul 2008
                      • 140
                      • 3.8.x

                      #11
                      Originally posted by darsalles
                      OK, I'll provide more information.

                      -I don't have vBSEO
                      -I'm running my forum on a subdomain: foro.xxxxxx.net. I have the robots.txt at this path: foro.xxxxxxxxx.net/robots.txt
                      -I don't want to disallow robots for all my pages
                      This is all correct. Ignore what bigwater said. Obviously though, this robots.txt will not affect whatever you have in your top-level-non-subdomain site.

                      The bottom three, however, are invalid: /tags/, /member/, and /members/ do not exist in a default vBulletin setup.

                      You may want to replace the user agent "mediapartners-google" with a * so it affects all bots.
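For reference, here is a minimal sketch of how a file with two user-agent groups is read, using Python's urllib.robotparser. The rules are a shortened, hypothetical version of the posted file: the trailing "*" after Mediapartners-Google is dropped because this parser matches agent names literally, a blank line is added between the groups, and the foro.example.net host is a placeholder. Other crawlers may parse the original differently.

```python
from urllib.robotparser import RobotFileParser

# Shortened, hypothetical version of the posted rules.
RULES = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /login.php
Disallow: /member.php
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

url = "http://foro.example.net/login.php"

# The named group has an empty Disallow, so that crawler may fetch anything.
adsense_ok = parser.can_fetch("Mediapartners-Google", url)
# Any other crawler falls through to the "*" group and is blocked here.
googlebot_ok = parser.can_fetch("Googlebot", url)

print(adsense_ok, googlebot_ok)
```

Under these assumptions the "*" group already applies to every crawler that is not named in its own group, and the empty Disallow only exempts the one named crawler.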

                      Comment

                      • bigwater
                        Senior Member
                        • Jan 2007
                        • 592

                        #12
                        Great. I give a thorough and detailed explanation on how robots.txt works, and your advice is "ignore it". Makes me want to quit offering help on this site. I get paid for doing this stuff in real life, yet I do it for free here. Screw it. Handle it yourself.
                        Anybody who says "it can't be done" will usually be interrupted by somebody who is already doing it.

                        Comment

                        • Dunhamzzz
                          Senior Member
                          • Jul 2008
                          • 140
                          • 3.8.x

                          #13
                          Originally posted by bigwater
                          Great. I give a thorough and detailed explanation on how robots.txt works, and your advice is "ignore it". Makes me want to quit offering help on this site. I get paid for doing this stuff in real life, yet I do it for free here. Screw it. Handle it yourself.
                          Well what you said was wrong...

                          Comment

                          • nforums
                            Banned
                            • Jun 2009
                            • 619
                            • 3.8.x

                            #14
                            Originally posted by bigwater
                            Then as CycleChat indicated, you'll need to specify a full path. The way it's formed now you are disallowing those specific files from being searched in the home directory only. A leading slash before a filename will always indicate that the file in question is in the home directory. Point it down the line to the actual file you're wanting blocked. /public_html/forums/referrers.php or whatever your path might be.
                            But if it's in the subdomain, the / will act as the subdomain root and not the public_html root, so his robots.txt should be fine in the forum subdomain.
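A simplified sketch of that idea: a Disallow value is compared against the path part of the URL on that host, not against a location on the server's disk. The helper rule_matches is hypothetical and only does a plain prefix comparison (real crawlers also handle wildcards and other details), and foro.example.net stands in for the real subdomain.

```python
from urllib.parse import urlsplit


def rule_matches(rule_path: str, url: str) -> bool:
    """Plain prefix test of a Disallow value against the URL's path."""
    return urlsplit(url).path.startswith(rule_path)


url = "http://foro.example.net/referrers.php"

# URL path relative to the subdomain root: matches.
print(rule_matches("/referrers.php", url))
# Filesystem-style path: never appears in the URL, so it does not match.
print(rule_matches("/public_html/forums/referrers.php", url))
# No leading slash: URL paths always begin with "/", so it does not match.
print(rule_matches("referrers.php", url))
```

In this model the robots.txt served at the subdomain root, with rules that start with "/", lines up with the URLs a crawler actually requests on that subdomain.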

                            Comment

                            • bigwater
                              Senior Member
                              • Jan 2007
                              • 592

                              #15
                              For the last time, a leading / is going to instruct the server to start at the home directory with its search of the path of the file in question. Without a leading slash it will start its search from the directory the robots.txt file resides in. Argue with me all you want, tell me I'm wrong, but when your robots.txt doesn't work, don't cry. It's a simple fact.
                              Anybody who says "it can't be done" will usually be interrupted by somebody who is already doing it.

                              Comment
