My host reports high CPU load caused by bots


  • [Forum] My host reports high CPU load caused by bots

    Hello.
    It's strange: I have a VPS, and today my host wrote to me with the following:
    Code:
    Unfortunately your VPS is consuming more than its fair share of CPU resources. Our Acceptable Use Policy defines CPU abuse as occurring when a VPS uses "25% or more of system CPU resources for longer then 90 seconds."
    
    Your VPS has been sporadically using significant CPU resources on the node:
    ====
    00:00:02 runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
    01:20:01 8 238 3.06 2.75 2.40 1
    02:20:01 8 228 3.44 3.02 2.55 0
    02:30:01 7 223 3.20 3.08 2.76 0
    03:30:01 7 234 3.58 2.72 2.40 0
    04:00:01 11 269 3.01 2.59 2.33 0
    05:00:01 8 233 3.45 2.90 2.51 0
    05:10:01 9 228 3.33 3.49 3.00 0
    12:20:01 10 223 3.15 2.66 2.33 0
    12:40:01 6 222 3.14 2.94 2.66 0
    13:50:01 9 266 3.35 2.29 2.10 0
    ====
    
    It looks like you have some over-aggressive bot crawling, based on user agent counts from the logs of mydomain.com:
    ====
    # grep '28\/Dec\/2018:1[45678]:..:..' /home/mydomain/access-logs/mydomain.com-ssl_log | rev | cut -d'"' -f2 | rev | sort | uniq -c | sort -n | awk '$1 > 100 {print}' | grep -i 'bot\|crawl\|scrape\|proxy'
    110 magpie-crawler/1.1 (U; Linux amd64; en-GB; +http://www.brandwatch.net)
    119 Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:49.0) Gecko/20100101 Firefox/49.0 (FlipboardProxy/1.2; +http://flipboard.com/browserproxy)
    129 Mozilla/5.0 (compatible; GrapeshotCrawler/2.0; +http://www.grapeshot.co.uk/crawler.php)
    202 Mozilla/5.0 (compatible; SEOkicks; +https://www.seokicks.de/robot.html)
    553 Mozilla/5.0 (Windows; U; Windows NT 6.0; en-GB; rv:1.0; trendictionbot0.5.0; trendiction search; http://www.trendiction.de/bot; please let us know of any problems; web at trendiction.com) Gecko/20071127 Firefox/50.0
    767 Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)
    771 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
    1056 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    2310 Mozilla/5.0 (compatible; musobot/1.0; info@muso.com; +http://www.muso.com)
    4442 Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)
    5042 Mozilla/5.0 (compatible; SemrushBot/3~bl; +http://www.semrush.com/bot.html)
    7785 Mozilla/5.0 (compatible; BLEXBot/1.0; +http://webmeup-crawler.com/)
    167660 Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    ====
    
    We recommend reviewing your configurations for bot crawling and adjust for much less aggressive trawling.
    
    If you're on a legacy VPS plan, I recommend considering one of our new SSD-backed MVPS plans. The new plans have stricter core limits which should help normalize CPU usage, making unintentional CPU abuse less likely. You can find pricing details here, and please don't hesitate to contact xxx if you have any questions about upgrading: xxx
    
    Please understand that we are attempting to bring this to your attention so that it can be corrected. We appreciate your continued patronage with KnownHost. We will need to hear back from you in regards to the issue(s) described in this Abuse ticket within 24 hours.
    
    *** IMPORTANT: Our Abuse representatives are available during daytime hours in the EST/EDT timezone. For immediate technical assistance, or for general inquiries outside business hours, please open a ticket with our Support department at xxxxxxxx
    Yes, I think I am on a legacy VPS plan, so this may be meant to get me to move from my current legacy VPS to the new SSD VPS. But is it really possible that bots are causing this CPU load?
    Is there a way to solve this?

    I disabled the site for 2 or 3 hours, and the CPU load was still about 1.60 to 1.80.
    I think that is a lot of load for a disabled site.

    I have a robots.txt:

    Code:
    User-agent: Mediapartners-Google
    Disallow:
    
    User-agent: https://webproxy.vpnbook.com/
    Disallow:
    
    User-agent: Baiduspider
    Disallow: /
    
    User-agent: *
    Disallow: *.js
    Disallow: /cable/
    Disallow: /test/
    Disallow: /images/
    Disallow: /foro/mybackup/
    Disallow: /foro/madp/
    Disallow: /foro/admincp/
    Disallow: /foro/attachments/
    Disallow: /foro/clientscript/
    Disallow: /foro/cpstyles/
    Disallow: /foro/customavatars/
    Disallow: /foro/customgroupicons/
    Disallow: /foro/customprofilepics/
    Disallow: /foro/downloads/
    Disallow: /foro/images/
    Disallow: /foro/install/
    Disallow: /foro/modcp/
    Disallow: /foro/ajax.php
    Disallow: /foro/attachment.php
    Disallow: /foro/calendar.php
    Disallow: /foro/cron.php
    Disallow: /foro/editpost.php
    Disallow: /foro/global.php
    Disallow: /foro/image.php
    Disallow: /foro/inlinemod.php
    Disallow: /foro/joinrequests.php
    Disallow: /foro/login.php
    Disallow: /foro/member.php
    Disallow: /foro/memberlist.php
    Disallow: /foro/misc.php
    Disallow: /foro/moderator.php
    Disallow: /foro/newattachment.php
    Disallow: /foro/newreply.php
    Disallow: /foro/newthread.php
    Disallow: /foro/online.php
    Disallow: /foro/postings.php
    Disallow: /foro/report.php
    Disallow: /foro/reputation.php
    Disallow: /foro/search.php
    Disallow: /foro/sendmessage.php
    Disallow: /foro/showgroups.php
    Thanks in advance.

  • #2
    The only bot that will use your robots.txt file is a bot that won't flood your site to begin with. Bots designed to misbehave won't use it. Heck, Googlebot won't always abide by the robots.txt file and it is one of the better bots out there. However, for your robots.txt file to be effective, it has to be in the root directory of your site (public_html). If you have vBulletin in a sub-directory and the file is there, it won't have any effect.
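
To see how a crawler that does honor robots.txt would read rules like the ones posted above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are a shortened subset of the robots.txt from the first post, and the URLs (including `showthread.php`) are hypothetical examples, not taken from the real site. Note that this parser does plain prefix matching, so it is not a good way to test wildcard lines such as `Disallow: *.js`.

```python
from urllib.robotparser import RobotFileParser

# Shortened subset of the robots.txt shown in the first post (assumption:
# the full file behaves the same way for these paths).
SAMPLE_RULES = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Disallow: /foro/admincp/
Disallow: /foro/search.php
"""


def is_allowed(rules_text, user_agent, url):
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser.can_fetch(user_agent, url)
```

For example, `is_allowed(SAMPLE_RULES, "Googlebot", "https://mydomain.com/foro/search.php")` reports whether a compliant Googlebot would skip the search page. This only models bots that follow the file; it says nothing about bots that ignore it.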

    The only way to control bad operators like bots that use too many resources on your server is to ban them at the server level. You can attempt to do this in an .htaccess file. Otherwise the server and/or network router of the datacenter needs to be modified. Since it seems you have a VPS, you should be able to edit the server configuration files and/or add server modules to control bots before they get to the vBulletin application. The vBulletin application has no ability to control bots without using resources. Even if you ban their IP address, there will be a number of SQL queries run and resulting PHP processing to show them a "No Permission" page.
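
As a rough sketch of the .htaccess approach described above, the helper below builds a mod_rewrite block that answers 403 Forbidden to any request whose User-Agent contains one of the given names. The function name and the choice of bots are illustrative assumptions; the sketch also assumes Apache with mod_rewrite enabled and .htaccess overrides allowed. The request still reaches Apache, but it is rejected before any PHP or SQL work is done.

```python
import re


def build_htaccess_block(bot_names):
    """Return mod_rewrite directives that send 403 to matching user agents.

    bot_names: substrings of the User-Agent header to block (hypothetical
    input; pick them from your own access logs).
    """
    if not bot_names:
        raise ValueError("at least one bot name is required")
    # Escape each name so characters such as '.' are matched literally.
    pattern = "|".join(re.escape(name) for name in bot_names)
    return "\n".join([
        "<IfModule mod_rewrite.c>",
        "RewriteEngine On",
        # [NC] makes the user-agent match case-insensitive.
        f"RewriteCond %{{HTTP_USER_AGENT}} ({pattern}) [NC]",
        # '-' means no substitution; [F] returns 403 Forbidden.
        "RewriteRule .* - [F]",
        "</IfModule>",
    ])
```

Calling `print(build_htaccess_block(["BLEXBot", "SemrushBot"]))` prints a block that can be pasted near the top of the .htaccess file in public_html. Which bots to block is a judgment call; blocking Googlebot or bingbot this way would also remove the site from their indexes.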

    Wayne Luke



    • #3
      OK, thanks for the explanation; I understand what you are telling me.
      My forum is in fact in a subfolder, but the robots.txt is in public_html.
      I understand that I can block bots in .htaccess, but for that my host has to tell me which bots are consuming the resources. They told me that Googlebot had made 17k requests in one day, which surprised me a lot...
      I want Google to index my site, but can bot traffic really generate that much load on the server?
      The server load has gone up by about 2 points on average, which seems like a lot to come from bots.
      I ask because my VPS is on a legacy plan, and I have already been offered a move to a new SSD VPS plan where, according to them, CPU consumption is better controlled and restricted because the CPU is not shared.
      That surprised me, and it is why I suspected they were pointing this out to get me to upgrade my VPS plan.
      For years the average CPU load has not exceeded 1.00, and now it averages 2.5 and sometimes exceeds 3.0. My site does not have more traffic than it did one, two, or three months ago...
      That's why I'm surprised that bots could generate that much CPU load.
      But if you are telling me this can happen, I will see whether I can block them with .htaccess.
      Thank you, and happy new year.
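
For finding out which bots are responsible without waiting on the host, the user-agent count in the host's message can be reproduced locally. The sketch below is a Python approximation of that grep pipeline, assuming the access log uses the Apache combined format with the user agent as the last quoted field. The function name and threshold parameter are illustrative, and the time-window filter from the host's command is left out.

```python
import re
from collections import Counter

# Same keyword filter the host's pipeline applied with grep -i.
BOT_HINT = re.compile(r"bot|crawl|scrape|proxy", re.IGNORECASE)


def count_bot_agents(lines, min_hits=100):
    """Count requests per user agent and keep likely bots above min_hits.

    lines: iterable of access-log lines (for example, an open file).
    Returns (user_agent, hits) pairs sorted from most to fewest hits.
    """
    counts = Counter()
    for line in lines:
        # The user agent sits between the last two double quotes.
        parts = line.rsplit('"', 2)
        if len(parts) == 3:
            counts[parts[1]] += 1
    return [
        (agent, hits)
        for agent, hits in counts.most_common()
        if hits > min_hits and BOT_HINT.search(agent)
    ]
```

It can be run against the log named in the host's message by opening `/home/mydomain/access-logs/mydomain.com-ssl_log` and passing the file object as `lines`. The output only shows request counts, not how much CPU each request costs.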
