BoardReader.com

This topic is closed.
  • Paul J
    Senior Member
    • Sep 2001
    • 1053

    #16
    I really don't know, but Google's cache leaves the page intact. It doesn't put its own label and formatting around another site's content.

    I think the idea of boardreader.com is great, but the cache really strikes me as open to abuse.

    • JimF
      Senior Member
      • May 2000
      • 1988

      #17
      Originally posted by Paul J
      I really don't know, but Google's cache leaves the page intact. It doesn't put its own label and formatting around another site's content.

      I think the idea of boardreader.com is great, but the cache really strikes me as open to abuse.
      'zactly.

      • spurdon
        Member
        • Jul 2001
        • 41

        #18
        Hello All,

        One of the members from this message board emailed me and asked me to answer some of your questions.

        First, thank you to those who enjoy boardreader.com, and to those who wonder about our intentions: we have no intentions other than serving and helping the message board community interact with one another.

        The real burning question I see here is the cache: what is right and what is wrong? I'm not sure what to do about this issue, or about Boardreader itself. During the first stages of Boardreader we asked ourselves: do we have a cache? Or, to put it another way: do we show users the cache?

        The problem was that so many forums drop threads over time that we were linking back to nothing. The threads were gone. We can only index our database so fast, so by the time we published something new it was, in many cases, already deleted. Another reason for the cache was that the locations of message boards change a lot, so we were linking back to redirected message board locations, which didn't help individuals who wanted to read something. These are dynamic pages, changing both in text and in location. So again, what to do? We decided from this to show our cache.

        The final reason we decided to do it was that Google does it. We crawl sites just like Google does: we index a web page and then search it. Google searches message boards sometimes, but Boardreader does it exclusively because we concentrate on these types of dynamic pages. We pride our search on being able to search dynamic pages with as few problems as possible (and we are always working on getting better at this).

        The main difference between what we do with the cache and what Google does with its cache is that they have the money to pay for the bandwidth to crawl sites and pull back the full HTML. We do not pull back the HTML, so we cannot show a cache the way Google shows its cache. The other problem with showing the cache the way Google does is that dynamic pages are already very bandwidth-intensive; if we pulled the full HTML in our crawl as well, it would mean so much more bandwidth for both the webmaster and the crawler that it would not be worth searching message boards at all!

        Maybe Boardreader has seen its day and we should just let Google continue to rule the search engine world? Maybe that would help everyone. We cannot pull back the HTML (pure economics), but Google can, and that is why their cache is so nice. I wish we could, so that our cache would look like Google's. I hope this helps a little!!!

        I look forward to more questions if you want to address Boardreader.

        Best,
        Scott

        • JimF
          Senior Member
          • May 2000
          • 1988

          #19
          Scott,

          Thanks for taking the time to answer some questions. I think Boardreader is a great service as far as searching message boards goes. Economics, as much as I understand the reasoning, doesn't justify violating copyrights and whatnot.

          As far as comparisons to Google go, I don't think it's their cache that makes them great. I think it's their relevance. Therefore, if a link is removed or moved, it doesn't show up. What good does showing a user a page that might be outdated do?

          If I want to search message boards on a topic, I want the most recent and relevant results I can get. I don't want results from a site that has moved, or that decided the thread was unimportant and deleted it.

          -jim

          • spurdon
            Member
            • Jul 2001
            • 41

            #20
            JimF,

            Thank you for the comment. Your comments are well taken, especially the one about searching the most recent threads. Currently we are looking at getting rid of old data and just showing new data. I guess the reason we did not do this earlier was that we wanted to be a Deja of message boards (keeping old info alive for people to read and learn from).

            As far as copyright goes, we try to link back to the site from the cache (at the top of the cache we link to the http://www..... address) and show that this is not our data, but I do see why you might take this as a copyright violation. What I can do is remove the cache. I will look into this. Would that be better?

            Best,

            Scott

            • irc
              Senior Member
              • Jan 2001
              • 404

              #21
              I personally like the Deja-like quality of seeing old threads, even if they have been purged elsewhere.

              As for the copyright implications: first off, putting something on the internet means it's going to show up on other sites in various forms. As long as there are clear links to the original and no attempt is made to take credit for the work, I think this is an example of fair use. The disclaimer at the top of cached pages is quite clear that it is not original content.

              I would hate to simply see all that info disappear...

              One criticism of the cached version: there should be a very prominent link to the original at the top of the cache display, next to the disclaimer.

              • spurdon
                Member
                • Jul 2001
                • 41

                #22
                irc,

                Good point on the cached link back to the site. I will make our link clearer and see what people think before I try removing all the cached info.

                Best,
                Scott

                • SkuZZy
                  Senior Member
                  • Aug 2002
                  • 447

                  #23
                  I don't think any copyright laws are really being violated just by not keeping the HTML intact. It's stated very clearly at the top of the cached pages that the content isn't their own, and the original URL is shown. I think it's a great service. I did submit my site earlier (www.battleforums.com), so I'll wait and see what kind of results I get from it. Personally, though, I have no problem with their cache.

                  I do have a suggestion, though. On the application form where sites apply to have their boards listed, perhaps you could add fields that let submitted sites choose the color/layout/logo at the top of their cached pages. This would let us pick the background color and font colors of our cached pages, as well as our logo. It would definitely help distinguish sites from your own, without causing you any excess bandwidth costs.

                  I realize, however, that the majority of complaints will probably come from sites that didn't submit to your engine and were spidered without their permission, so maybe you should take further measures to ensure that sites spidered without permission keep their main logo or background color intact. Otherwise you'll get some guy who has never heard of your site thinking you're stealing his posts (like we've seen already), and you might end up getting into trouble.

                  By the way, I submitted my site yesterday (www.battleforums.com). Any idea when it will be spidered, spurdon?

                  scsi
                  Last edited by SkuZZy; Wed 2 Oct '02, 3:51pm.
                  -
                  Visit the Web Scripts Directory @ http://www.scriptz.com
                  -
                  PHP, CGI, Perl, ASP, JavaScript, CFML, Python and more!

                  -

                  • spurdon
                    Member
                    • Jul 2001
                    • 41

                    #24
                    SkuZZy,

                    Great ideas on the site submissions and the cache layouts. I'm not sure how much work that would be, though. As far as our crawling goes, we try to search only open, public sites that submit themselves, but sometimes it is hard to verify who is submitting those sites. We do, however, respect robots.txt, so hopefully that helps some. I still think that closing the cache might be the best way to stay away from harming people's copyrights. We will see what happens with that. I'm going to try a few different things with the cache to see if that solves some of the problems.
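
                    (For anyone curious what respecting robots.txt means in practice, the check itself is simple. Here is a rough sketch in Python, purely for illustration and not Boardreader's actual code; the "Boardreader" user-agent string and the example.com URLs are just placeholders.)

                        from urllib.robotparser import RobotFileParser

                        # Fetch and parse the forum's robots.txt before crawling any thread.
                        robots = RobotFileParser()
                        robots.set_url("http://www.example.com/robots.txt")
                        robots.read()

                        # Only fetch a thread if the rules allow this user-agent to.
                        thread_url = "http://www.example.com/showthread.php?t=123"
                        if robots.can_fetch("Boardreader", thread_url):
                            print("allowed to crawl:", thread_url)
                        else:
                            print("skipping, disallowed by robots.txt:", thread_url)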

                    I just submitted your site. It will take about a week before you can see it on Boardreader. I am backlogged with site submissions right now, so when I get to yours you may get another email telling you it has been done.

                    Thanks for the comments,

                    Scott

                    • chrispadfield
                      Senior Member
                      • Aug 2000
                      • 5366

                      #25
                      Personally I think what you should do is have your own robots.txt command that you will obey; essentially a no-cache directive, just like Google has. If a site doesn't want to be cached, then it won't be.
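
                      (For reference, Google's opt-out is a meta tag rather than a robots.txt directive; it is covered on the google.com/remove.html page mentioned later in this thread. A page that doesn't want to be archived includes a line like the one below, and Boardreader could honor the same tag or define an equivalent of its own.)

                          <meta name="robots" content="noarchive">
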
                      Christopher Padfield
                      Web Based Helpdesk
                      DeskPRO v3.0.3 Released - Download Demo Now!

                      • spurdon
                        Member
                        • Jul 2001
                        • 41

                        #26
                        GOOD POINT!

                        • chrispadfield
                          Senior Member
                          • Aug 2000
                          • 5366

                          #27
                          And you also need a method for a site to have its cached pages deleted. Once you have done that, I think you have pretty much every base covered. Generally I think someone is going to visit the site rather than look at the cache, and only use the cache if the site is down. That is certainly how I use Google's cache, anyway.
                          Christopher Padfield
                          Web Based Helpdesk
                          DeskPRO v3.0.3 Released - Download Demo Now!

                          • spurdon
                            Member
                            • Jul 2001
                            • 41

                            #28
                            chrispadfield,

                            Thanks again for the ideas. It is getting to be a lot of work to keep the cache, but I will see what I can do. I am thinking now that it might just be better to delete the cache: if the info is still there at a site, great; if not, then maybe the person can email the original site's webmaster to try to get it. Don't take this the wrong way, it is just hard to please everyone out there on this cache topic. Let's see what happens, but thanks again for the input!!!

                            Scott

                            • SkuZZy
                              Senior Member
                              • Aug 2002
                              • 447

                              #29
                              Originally posted by spurdon
                              chrispadfield,

                              Thanks again for the ideas. It is getting to be a lot of work to keep the cache, but I will see what I can do. I am thinking now that it might just be better to delete the cache: if the info is still there at a site, great; if not, then maybe the person can email the original site's webmaster to try to get it. Don't take this the wrong way, it is just hard to please everyone out there on this cache topic. Let's see what happens, but thanks again for the input!!!

                              Scott
                              I think as long as you provide a way for sites to stop you spidering them (robots.txt), then you shouldn't have a problem. However, if you want to be able to collect info for your cache without sites being able to do anything about it, then from what I've heard you can't really modify the source and just take out the content; you have to leave the page intact (which is how Google gets away with it without getting tons of copyright infringement threats). Like I said above, though, you could probably continue doing what you're doing now, but you should provide public information about how webmasters can stop your script from spidering their site using robots.txt (in your F.A.Q. or something).
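
                              (For any webmasters reading along, that robots.txt block would only be a couple of lines at the site root. The "Boardreader" user-agent name below is just a guess at what the crawler calls itself; check their documentation for the real string.)

                                  # Block the Boardreader crawler from the whole site.
                                  User-agent: Boardreader
                                  Disallow: /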

                              You may also run into problems with sites that use robots.txt but have already been spidered (before they put the robots.txt file up) and that want their content removed from your site. This could cause some problems, since there really isn't a way to verify whether the person requesting removal is actually the owner of the site. You could go about it the same way Google does (read http://www.google.com/remove.html) and have the same sort of process they do. Bottom line is, if you can't cache the entire page (images and HTML), then you'll have to implement other steps that will help avoid confusion.

                              SkuZZy

                              (P.S.) - Thanks for accepting my site. I'll look forward to it spidering my forums, and when it's done I'll post a review here about how good a job it did.
                              -
                              Visit the Web Scripts Directory @ http://www.scriptz.com
                              -
                              PHP, CGI, Perl, ASP, JavaScript, CFML, Python and more!

                              -

                              • spurdon
                                Member
                                • Jul 2001
                                • 41

                                #30
                                "You could go about it the same that google does (read http://www.google.com/remove.html) and have the same sort of process they do. Bottom line is, if you can't cache the entire pages (images and html) then you'll have to implement other steps that will help avoid confusion."

                                SkuZZy,

                                I will read Google's rules on robots.txt ASAP. Thanks.

                                Also, just so you know, the first crawl of your battleforums is the most intensive, but after the first index it should be just fine. Let me know how it works, and if it goes well please submit your other boards.

                                Scott
