Large increase in the number of indexed pages

  • lelandmcfarland
    New Member
    • Feb 2012
    • 14
    • 3.0.9

    [Forum] Large increase in the number of indexed pages

    I have run into a bit of a problem with my forum. I recently checked my Google Webmaster Tools and found that the number of indexed pages increased from 28,000 to 567,000 within 3 months. I traced the problem to the forums that I installed on the site, but that is as far as I got. My forum URL is: http://smallbiztrends.com/members/.
  • Andy
    Senior Member
    • Jan 2002
    • 5886
    • 4.1.x

    #2
    I suggest removing the Archive, it's not needed.


    • lelandmcfarland
      New Member
      • Feb 2012
      • 14
      • 3.0.9

      #3
      That is a good suggestion. I plan on getting rid of it. The Archive could cause some really bad duplicate content issues and direct search traffic to a place where the user cannot respond to the post without clicking another link.

      I did a little digging and found that the members section is my problem. If you do a site search and drill down to a specific user, they have a very large number of search results. Example: https://www.google.com/search?q=site...350-Garnetjo67. The bad part is that some of these pages are blank. In the example I provided, this user has no friends, no photos, and no activity, but he has 143+ blank activity pages and 161+ blank friends activity pages indexed.

      My solution for all of this is going to have to be blocking the search engines from these two sections until vBulletin handles them a little better.


      • Andy
        Senior Member
        • Jan 2002
        • 5886
        • 4.1.x

        #4
        Nice detective work, lelandmcfarland.

        On my forum, I don't have features like friends, photos, and activity. I suggest turning off these functions. Another thing I do is use the Standard URLs; I think Google has an easier time with them. Also, with Standard URLs you never have an issue with URLs showing the old thread title after a thread title changes.


        • djbaxter
          Senior Member
          • Aug 2006
          • 1418
          • 4.2.5

          #5
          This isn't a vBulletin issue. Those are features that can be disabled or, if you want to keep the features, simply create an appropriate robots.txt file. (You can also make the members list invisible to guests.)

          Code:
          User-agent: *
          Disallow: /archive/
          Disallow: /calendar.php
          Disallow: /cron.php
          Disallow: /editpost.php
          Disallow: /joinrequests.php
          Disallow: /login.php
          Disallow: /member.php
          Disallow: /misc.php
          Disallow: /moderator.php
          Disallow: /newreply.php
          Disallow: /newthread.php
          Disallow: /online.php
          Disallow: /printthread.php
          Disallow: /private.php
          Disallow: /profile.php
          Disallow: /register.php
          Disallow: /search.php
          Disallow: /sendmessage.php
          Disallow: /showgroups.php
          Disallow: /subscription.php
          Disallow: /subscriptions.php
          Disallow: /threadrate.php
          Disallow: /usercp.php
          Last edited by djbaxter; Wed 29 Aug '12, 12:27pm.
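
A rule list like the one above can be sanity-checked before it is deployed. The following is a minimal sketch, not part of the original post, that uses Python's standard-library urllib.robotparser to test sample URLs against a shortened copy of the rules. The host forum.example, the query string ?u=350, and the helper is_blocked are hypothetical and only for illustration. It also shows one thing worth verifying: Disallow paths are matched from the site root, so a forum installed in a subdirectory such as /members/ would likely need that prefix on each rule.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the rules from the post above (illustration only).
ROBOTS_TXT = """\
User-agent: *
Disallow: /archive/
Disallow: /member.php
Disallow: /search.php
"""


def is_blocked(robots_txt, url, agent="*"):
    """Return True if the given rules forbid `agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# Hypothetical URLs: one forum installed at the site root, one in /members/.
root_install = "http://forum.example/member.php?u=350"
subdir_install = "http://forum.example/members/member.php?u=350"
thread = "http://forum.example/showthread.php?t=1"

for url in (root_install, subdir_install, thread):
    state = "blocked" if is_blocked(ROBOTS_TXT, url) else "allowed"
    print(f"{state}: {url}")
```

In this sketch the root-level member.php URL is blocked, while the copy under /members/ is still allowed, because "/members/member.php" does not start with "/member.php". Under that assumption, the rule would need to read Disallow: /members/member.php to have any effect on a forum installed in that folder.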
          Psychlinks Web Services Affordable Web Design & Site Management
          Specializing in Small Businesses and vBulletin/Xenforo Forums


          • Zachery
            Former vBulletin Support
            • Jul 2002
            • 59097

            #6
            Don't list folders in your robots.txt that won't normally be crawled.

            There are no public-facing links to the /admincp/, /modcp/, or /backup/ folders in your software as it is. Don't give hackers direct knowledge of these locations.


            • Zachery
              Former vBulletin Support
              • Jul 2002
              • 59097

              #7
              Don't list folders in your robots.txt that won't normally be crawled.

              There are no public-facing links to the /admincp/, /modcp/, or /backup/ folders in your software as it is. Don't give hackers direct knowledge of these locations.


              • djbaxter
                Senior Member
                • Aug 2006
                • 1418
                • 4.2.5

                #8
                Good point. Twice.

                • lelandmcfarland
                  New Member
                  • Feb 2012
                  • 14
                  • 3.0.9

                  #9
                  @djbaxter thank you for the robots.txt list. It is a great help.


                  • djbaxter
                    Senior Member
                    • Aug 2006
                    • 1418
                    • 4.2.5

                    #10
                    See modified version per comments by Zachery above.

                    • Andy
                      Senior Member
                      • Jan 2002
                      • 5886
                      • 4.1.x

                      #11
                      Originally posted by djbaxter
                      This isn't a vBulletin issue. Those are features that can be disabled or, if you want to keep the features, simply create an appropriate robots.txt file. (You can also make the members list invisible to guests.)

                      Code:
                      User-agent: *
                      Disallow: /archive/
                      Disallow: /calendar.php
                      Disallow: /cron.php
                      Disallow: /editpost.php
                      Disallow: /joinrequests.php
                      Disallow: /login.php
                      Disallow: /member.php
                      Disallow: /misc.php
                      Disallow: /moderator.php
                      Disallow: /newreply.php
                      Disallow: /newthread.php
                      Disallow: /online.php
                      Disallow: /printthread.php
                      Disallow: /private.php
                      Disallow: /profile.php
                      Disallow: /register.php
                      Disallow: /search.php
                      Disallow: /sendmessage.php
                      Disallow: /showgroups.php
                      Disallow: /subscription.php
                      Disallow: /subscriptions.php
                      Disallow: /threadrate.php
                      Disallow: /usercp.php
                      Almost all of these would not be seen by a spider, so I don't understand why you would have them in your robots.txt file.


                      • djbaxter
                        Senior Member
                        • Aug 2006
                        • 1418
                        • 4.2.5

                        #12
                        @Andy

                        From Wayne Luke:



                        See also:




                        - - - Updated - - -

                        Wayne's version is probably an improvement:

                        Code:
                        User-agent: *
                        Disallow: *.js
                        Disallow: /clientscript/
                        Disallow: /cpstyles/
                        Disallow: /customavatars/
                        Disallow: /customprofilepics/
                        Disallow: /images/
                        Disallow: /ajax.php
                        Disallow: /attachment.php
                        Disallow: /calendar.php
                        Disallow: /cron.php
                        Disallow: /editpost.php
                        Disallow: /global.php
                        Disallow: /image.php
                        Disallow: /inlinemod.php
                        Disallow: /joinrequests.php
                        Disallow: /login.php
                        Disallow: /member.php
                        Disallow: /memberlist.php
                        Disallow: /misc.php
                        Disallow: /moderator.php
                        Disallow: /newattachment.php
                        Disallow: /newreply.php
                        Disallow: /newthread.php
                        Disallow: /online.php
                        Disallow: /poll.php
                        Disallow: /post.php
                        Disallow: /postings.php
                        Disallow: /printthread.php
                        Disallow: /private.php
                        Disallow: /profile.php
                        Disallow: /register.php
                        Disallow: /report.php
                        Disallow: /reputation.php
                        Disallow: /search.php
                        Disallow: /sendmessage.php
                        Disallow: /showgroups.php
                        Disallow: /showpost.php
                        Disallow: /subscription.php
                        Disallow: /threadrate.php
                        Disallow: /usercp.php
                        Disallow: /usernote.php

                        • Andy
                          Senior Member
                          • Jan 2002
                          • 5886
                          • 4.1.x

                          #13
                          Hi djbaxter,

                          I suppose one should see just how many of those disallows are being indexed.

                          One can check by entering a site: query into Google. For example:

                          site:vbulletin.com/forum/subscription.php

                          or

                          site:vbulletin.com/forum/member.php

                          Some things you would want Google to index, for example the member.php pages. On my forum I have about 7,000 members and Google indexed about 10,000 member.php pages. Not sure where the extra 3,000 results are.
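
The site: checks described in this post can be generated for every Disallow line instead of being typed one by one. This is a rough sketch under stated assumptions: the helpers disallowed_paths and site_queries are hypothetical names, the two-rule RULES sample is a shortened stand-in for a real robots.txt, and the forum base URL is taken from the examples above. It only builds the query strings; running them in a search engine is still a manual step.

```python
def disallowed_paths(robots_txt):
    """Collect the non-empty paths named on Disallow lines."""
    paths = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        if field.strip().lower() == "disallow" and value.strip():
            paths.append(value.strip())
    return paths


def site_queries(forum_base, paths):
    """Build one `site:` query per path, relative to the forum's base URL."""
    host_and_dir = forum_base.split("://", 1)[-1].rstrip("/")
    return [f"site:{host_and_dir}/{p.lstrip('/')}" for p in paths]


# Shortened sample rules (illustration only).
RULES = """\
User-agent: *
Disallow: /subscription.php
Disallow: /member.php
"""

queries = site_queries("https://vbulletin.com/forum/", disallowed_paths(RULES))
for query in queries:
    print(query)
```

Each printed line can be pasted into the search box to see roughly how many URLs under that script are currently indexed.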


                          • djbaxter
                            Senior Member
                            • Aug 2006
                            • 1418
                            • 4.2.5

                            #14
                            Why would anyone want the member.php pages indexed? What value is that adding to search or to your forum?

                            As for your search suggestion, that won't work - it isn't member.php itself that is indexed. A spider would follow member.php to index pages returned by that executable. Using your own example, there are not 7000 or 10000 copies of member.php on your vBulletin site or any other vBulletin site.

                            • Andy
                              Senior Member
                              • Jan 2002
                              • 5886
                              • 4.1.x

                              #15
                              Originally posted by djbaxter
                              Why would anyone want the member.php pages indexed? What value is that adding to search or to your forum?
                              The member.php pages are those like this one:



                              Why wouldn't you want Google indexing your members? It's important information. For example, if I wanted to see whether djbaxter is on any other forums or is a spammer, I could get that information if forums allowed member.php to be indexed by search engines.

                              Originally posted by djbaxter
                              As for your search suggestion, that won't work - it isn't member.php itself that is indexed. A spider would follow member.php to index pages returned by that executable. Using your own example, there are not 7000 or 10000 copies of member.php on your vBulletin site or any other vBulletin site.
                              Each member is indexed. So if you have 7,000 members, Google should theoretically index 7,000 member.php pages.
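
The point both posters are circling is that one script can produce many distinct URLs, and a search engine counts URLs, not scripts. The sketch below is an illustration only: the URLs, the host forum.example, and the parameter names u and tab are assumptions rather than data from any real forum, and pages_per_member is a hypothetical helper. It counts how many distinct member.php URLs exist per member in a list of crawled URLs, which is one possible reason an index could hold more member.php results than there are members.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def pages_per_member(urls, script="member.php", id_param="u"):
    """Count distinct URLs served by `script` for each member id."""
    counts = Counter()
    for url in set(urls):  # de-duplicate identical URLs first
        parts = urlsplit(url)
        if not parts.path.endswith("/" + script):
            continue
        ids = parse_qs(parts.query).get(id_param)
        if ids:
            counts[ids[0]] += 1
    return counts


# Hypothetical crawl sample: member 1 has two URL variants, member 2 has one.
sample = [
    "http://forum.example/member.php?u=1",
    "http://forum.example/member.php?u=1&tab=friends",
    "http://forum.example/member.php?u=2",
    "http://forum.example/showthread.php?t=9",
]

counts = pages_per_member(sample)
print("members:", len(counts))
print("member.php URLs:", sum(counts.values()))
```

With this sample, two members account for three member.php URLs, because the extra tab variant is a separate URL as far as a crawler is concerned.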
