Russian Yandex Crawler

  • oldengine
    Senior Member
    • Oct 2004
    • 342
    • 3.7.x

    Yandex Crawler

    The Yandex Russian crawler is trying to crawl me like this...



    vBulletin.org shows the same effect as my board does.

    The full URL looks like this on vbulletin.com ...

    https://vbulletin.com/forum/index.ph...ve/archive/arc

    Code:
    https://vbulletin.com/forum/index.php/read/72116&expand/archive/chat/chat/chat/chat/chat/archive/archive/chat/chat/archive/chat/chat/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/archive/arc
    Any clue about what they are attempting to accomplish?

    Obviously vBulletin 5 has a solution to this.

    IPs
    141.8.43.172
    52.90.254.43
    100.43.90.161
    5.255.250.121

    Last edited by oldengine; Mon 23 Apr '18, 10:38pm.
  • William Thomas Jr
    Senior Member
    • Nov 2014
    • 526
    • 5.1.x

    #2
    Originally posted by oldengine
    Yandex Crawler

    The Yandex Russian crawler is trying to crawl me like this...



    Vbulletin.com is showing the same effect as does my board.

    The full URL looks like this...


    Code:


    Any clue about what they are attempting to accomplish?

    Any way to halt this behavior?

    IPs
    141.8.43.172
    52.90.254.43
    100.43.90.161
    5.255.250.121

    Isn't Yandex a Russian search engine? If so, I think they are simply crawling your site to include it in their search listings. I'm not sure whether Yandex crawlers respect robots.txt. You may want to search online to see whether anyone has successfully written a robots.txt that tells Yandex not to crawl the site. If not, you'll have to ban by IP.
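
If banning by IP turns out to be necessary, the check could look roughly like the sketch below. It is written in Python purely for illustration and is not a vBulletin feature; `REPORTED_IPS`, `build_blocklist`, and `is_banned` are hypothetical names, and the addresses are simply the ones listed in the first post, with no verification that they all belong to Yandex.

```python
import ipaddress

# Addresses reported in the first post of this thread.
# Assumption: they are treated as unwanted here; their ownership is not verified.
REPORTED_IPS = ["141.8.43.172", "52.90.254.43", "100.43.90.161", "5.255.250.121"]


def build_blocklist(addresses):
    """Parse address strings into a set for fast membership checks."""
    return {ipaddress.ip_address(a) for a in addresses}


def is_banned(client_ip, blocklist):
    """Return True if client_ip (a string) is in the blocklist."""
    return ipaddress.ip_address(client_ip) in blocklist


BLOCKLIST = build_blocklist(REPORTED_IPS)
```

A request handler or log filter could call `is_banned(remote_addr, BLOCKLIST)` and reject matching requests. Crawlers often use many addresses, so a fixed list like this may need regular updating.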

    • oldengine
      Senior Member
      • Oct 2004
      • 342
      • 3.7.x

      #3
      The links in what you quoted are all messed up. Try the links in my post again as vbulletin.org displays the problem the same way as on my site.

      • William Thomas Jr
        Senior Member
        • Nov 2014
        • 526
        • 5.1.x

        #4
        Originally posted by oldengine
        The links in what you quoted are all messed up. Try the links in my post again as vbulletin.org displays the problem the same way as on my site.

        If Yandex crawlers honor robots.txt, you can tell them to skip certain URL paths, for example:

        Disallow: /forum/index.php/read/72116&expand/archive/chat/*

        You'll need to put the appropriate URL path in your robots.txt.

        I really wouldn't worry about it if I were you. However, robots.txt can be used to disallow certain URL paths or directories for crawlers that actually read it.
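
As a rough way to check what such a rule would block, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The `User-agent: Yandex` line, the `ROBOTS_TXT` string, and the `is_allowed` helper are assumptions for illustration, not something taken from this thread. The standard-library parser matches rules by plain prefix and does not interpret `*` in paths, so the sketch drops the trailing `*` from the Disallow line.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. The path prefix comes from the Disallow
# line suggested above; the User-agent line is an assumption.
ROBOTS_TXT = """\
User-agent: Yandex
Disallow: /forum/index.php/read/72116&expand/archive/chat/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative URL built from the repeated path segments shown in the first post.
trap_url = "https://vbulletin.com/forum/index.php/read/72116&expand/archive/chat/chat/chat"

print(is_allowed(ROBOTS_TXT, "YandexBot", trap_url))
print(is_allowed(ROBOTS_TXT, "YandexBot", "https://vbulletin.com/forum/index.php"))
```

The first call should report the repeated-segment URL as blocked for a Yandex user agent, while the second URL stays crawlable. The file itself has to be served as /robots.txt at the site root, and it only affects crawlers that choose to obey it.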
