• Resolved Melissa Davis

    (@melissa-davis)


    I have three questions on blocking bots that abuse web hosting resources.

    1. For a website that does not use a CDN like Cloudflare, can Wordfence be set up to block bots like Bytespider, PetalBot, AhrefsBot, and MJ12bot?
    2. How does the process work?
    3. Is this feature available in the free or paid version of Wordfence?

    Thanks!

Viewing 2 replies - 1 through 2 (of 2 total)
  • Plugin Support wfpeter

    (@wfpeter)

    Hi @melissa-davis, thanks for getting in touch! The forums are only for support regarding the free version of Wordfence, so the following information is appropriate for all versions of the plugin.

    You can often check whether genuine bots from these services observe robots.txt Disallow rules, as the operators usually publish that information along with their user-agent strings and/or IP ranges. Flat-out blocking certain crawlers can cause issues with SEO rankings, amongst other things, but of course this is at your discretion.
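
    As a rough illustration of what "observing robots.txt" means, the sketch below uses Python's standard `urllib.robotparser` to evaluate a Disallow rule the way a well-behaved crawler would. The robots.txt content and the `is_allowed` helper are hypothetical and only for illustration; they are not part of Wordfence.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the bot names are taken from this thread.
ROBOTS_TXT = """\
User-agent: AhrefsBot
Disallow: /

User-agent: MJ12bot
Disallow: /
"""


def is_allowed(user_agent: str, path: str = "/") -> bool:
    """Return True if robots.txt permits `user_agent` to fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("AhrefsBot", "MJ12bot", "Googlebot"):
        print(agent, "allowed" if is_allowed(agent) else "disallowed")
```

    A crawler that ignores these rules is unaffected by robots.txt, which is where blocking or rate limiting comes in.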

    For example, Ahrefs uses a user-agent that contains the string “AhrefsBot” amongst other text, so you can use wildcards in the Wordfence > Blocking > Custom Pattern section, entering *AhrefsBot* in the “Browser User Agent” field. However, as Ahrefs is considered to observe robots.txt, my example may be more appropriate for bots that don’t, or that are pretending to be AhrefsBot.
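
    To show how a wildcard pattern such as *AhrefsBot* behaves, here is a minimal sketch using Python's standard `fnmatch` module. It is not Wordfence's own matching code; the `matches_pattern` helper, the case-insensitive comparison, and the sample Ahrefs user-agent string are assumptions made for the example.

```python
from fnmatch import fnmatchcase


def matches_pattern(user_agent: str, pattern: str) -> bool:
    """Return True if `user_agent` matches the shell-style wildcard `pattern`.

    Case-insensitive matching is an assumption of this sketch.
    """
    return fnmatchcase(user_agent.lower(), pattern.lower())


# Illustrative user-agent strings; the BLEXBot one appears later in this thread.
AHREFS_UA = "Mozilla/5.0 (compatible; AhrefsBot/7.0; +http://ahrefs.com/robot/)"
BLEXBOT_UA = "Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/)"

if __name__ == "__main__":
    print(matches_pattern(AHREFS_UA, "*AhrefsBot*"))
    print(matches_pattern(BLEXBOT_UA, "*AhrefsBot*"))
```

    The leading and trailing asterisks matter: without them the pattern would have to equal the entire user-agent string.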

    Our process works by serving a Wordfence-branded block page (before any site content is loaded if the Wordfence firewall is optimized) to any site visitors that trigger your custom blocking rule.

    For your information, general treatment of crawlers can also be set in the Rate Limiting section of Wordfence > All Options. I set my Rate Limiting Rules to these values to start with:

    • If anyone’s requests exceed – 240 per minute
    • If a crawler’s page views exceed – 120 per minute
    • If a crawler’s pages not found (404s) exceed – 60 per minute
    • If a human’s page views exceed – 120 per minute
    • If a human’s pages not found (404s) exceed – 60 per minute
    • How long is an IP address blocked when it breaks a rule – 30 minutes

    I also always set the rule to Throttle instead of Block. Throttling is generally better than blocking because any good service understands what happened if it is mistakenly blocked, and your site isn’t penalized because of it. Make sure to set your Rate Limiting Rules realistically, and set the value for how long an IP is blocked to 30 minutes or so.
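
    For anyone curious how a rule like "120 page views per minute, then 30 minutes of throttling" behaves, the following is a minimal sliding-window sketch. It is not how Wordfence implements rate limiting internally; the `ThrottleSketch` class and its parameters are hypothetical, and only the default numbers mirror the settings above.

```python
from collections import defaultdict, deque


class ThrottleSketch:
    """Per-IP sliding-window limiter (illustrative only)."""

    def __init__(self, max_per_minute: int = 120, penalty_seconds: int = 30 * 60):
        self.max_per_minute = max_per_minute
        self.penalty_seconds = penalty_seconds
        self.hits = defaultdict(deque)   # ip -> timestamps within the last 60 s
        self.penalized_until = {}        # ip -> time at which the penalty ends

    def allow(self, ip: str, now: float) -> bool:
        """Record a request at time `now` (seconds); return False if refused."""
        if now < self.penalized_until.get(ip, 0.0):
            return False
        window = self.hits[ip]
        # Drop timestamps that are 60 seconds old or older.
        while window and now - window[0] >= 60:
            window.popleft()
        window.append(now)
        if len(window) > self.max_per_minute:
            # Threshold exceeded: start the penalty period for this IP.
            self.penalized_until[ip] = now + self.penalty_seconds
            window.clear()
            return False
        return True
```

    With the defaults, the 121st request from one IP inside any 60-second span is refused, and so is everything else from that IP for the next 30 minutes.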

    Thanks,
    Peter.

    Thread Starter Melissa Davis

    (@melissa-davis)

    Thanks for your detailed reply and helpful suggestion on using the Block and Throttle features, @wfpeter. I have a couple of questions:

    On my cPanel Raw Access Log, I see lines of the following type:

    Item 1

    107.180.58.67 – – [18/May/2023:19:18:24 -0700] POST /?wordfence_syncAttackData=1684462704.8768 HTTP/1.1 200 – https://www.domainname.com/?wordfence_syncAttackData=1684462704.8768 WordPress/5.2.17; https://www.domainname.com 1037 **1/1037037**

    Can you please explain what this means and why this has a 200 code?

    Item 2

    I also see the following lines:

    [18/May/2023:19:18:12 -0700] 200 23195 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 33 **0/33644**
    [18/May/2023:19:18:14 -0700] 200 21466 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 149 **0/149069**
    [18/May/2023:19:18:30 -0700] 200 22019 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 139 **0/139465**
    [18/May/2023:19:18:32 -0700] 200 21941 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 32 **0/32340**
    [18/May/2023:19:18:43 -0700] 200 23563 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 154 **0/154326**
    [18/May/2023:19:18:46 -0700] 200 23597 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 22 **0/22308**
    [18/May/2023:19:18:52 -0700] 200 21822 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 38 **0/38481**
    [18/May/2023:19:19:06 -0700] 200 21803 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 161 **0/161123**
    [18/May/2023:19:19:08 -0700] 200 22629 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 144 **0/144497**
    [18/May/2023:19:19:10 -0700] 200 22495 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 32 **0/32354**
    [18/May/2023:19:19:15 -0700] 200 22610 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 248 **0/248361**
    [18/May/2023:19:19:18 -0700] 200 24848 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 159 **0/159826**
    [18/May/2023:19:19:23 -0700] 200 23175 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 39 **0/39945**
    [18/May/2023:19:19:25 -0700] 200 24960 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 30 **0/30143**
    [18/May/2023:19:19:32 -0700] 200 24960 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 53 **0/53630**
    [18/May/2023:19:19:42 -0700] 200 25029 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 37 **0/37750**
    [18/May/2023:19:19:46 -0700] 200 25270 – Mozilla/5.0 (compatible; BLEXBot/1.0; +https://webmeup-crawler.com/) 34 **0/34441**

    I have currently set the crawler to be throttled if the page views exceed 120 per minute as you suggested.

    Would this throttling rate be enough to deter webmeup-crawler.com? Do I need to lower the crawler throttling threshold from 120 per minute to, say, 5 per minute?
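
    One way to compare the log against a per-minute threshold is to count how many requests fall inside any rolling 60-second span. The sketch below does that from the bracketed timestamps alone. The `peak_requests_per_minute` helper is hypothetical, and it assumes an English locale so that the month abbreviation "May" parses.

```python
import re
from datetime import datetime

# Matches access-log timestamps such as [18/May/2023:19:18:12 -0700].
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def peak_requests_per_minute(log_text: str) -> int:
    """Return the largest number of requests seen in any rolling 60-second span."""
    times = sorted(
        datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
        for stamp in TIMESTAMP.findall(log_text)
    )
    peak = 0
    start = 0
    for end, current in enumerate(times):
        # Advance the window start until it is less than 60 s behind `current`.
        while (current - times[start]).total_seconds() >= 60:
            start += 1
        peak = max(peak, end - start + 1)
    return peak
```

    For reference, the excerpt above contains 17 requests between 19:18:12 and 19:19:46, which works out to roughly 11 per minute, well under a 120 per minute threshold.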

    Thanks!

  • The topic ‘Blocking bots abusing web hosting resources’ is closed to new replies.