• Resolved ericr23

    (@ericr23)


    Blocked for Exceeded the maximum global requests per minute for crawlers or humans:

    IP: 66.249.80.25 Hostname: google-proxy-66-249-80-25.google.com
    Human/Bot: Bot
    Google-Raichu

    IP: 74.125.215.19 Hostname: google-proxy-74-125-215-19.google.com
    Human/Bot: Bot
    Google-Raichu

    I can’t find any information about a Google crawler named “Raichu”, but reverse look-ups check out. (I use the “Verified Google crawlers have unlimited access to this site” option.) Please explain.
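For readers who want to repeat the reverse look-up check mentioned above, here is a minimal sketch of a reverse-then-forward DNS confirmation. It is not Wordfence's implementation. The function name `verify_reverse_dns`, the injectable `reverse`/`forward` resolvers, and the suffix list are assumptions made for illustration, and the default resolvers handle IPv4 only.

```python
import socket


def verify_reverse_dns(ip, allowed_suffixes, reverse=None, forward=None):
    """Return the PTR hostname if it ends in an allowed suffix and resolves
    back to `ip`; otherwise return None.

    `reverse` and `forward` are hypothetical hooks so the check can be
    exercised without network access; by default they use the system resolver.
    """
    reverse = reverse or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward = forward or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        hostname = reverse(ip)  # step 1: reverse (PTR) look-up
    except OSError:
        return None

    # step 2: the hostname must sit under one of the expected domains
    normalized = hostname.lower().rstrip(".")
    if not normalized.endswith(tuple(allowed_suffixes)):
        return None

    try:
        addresses = forward(hostname)  # step 3: forward look-up of that name
    except OSError:
        return None

    # step 4: the forward look-up must include the original address
    return hostname if ip in addresses else None


if __name__ == "__main__":
    # Suffix list is an assumption; adjust it to whatever you trust.
    print(verify_reverse_dns("66.249.80.25", (".googlebot.com", ".google.com")))
```

Note that a check like this only confirms the address belongs to a Google hostname. A `google-proxy-…` name can pass it without being an indexing crawler.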

    • This topic was modified 3 years, 3 months ago by ericr23.
Viewing 7 replies - 1 through 7 (of 7 total)
  • Plugin Support wfpeter

    (@wfpeter)

    Hi @ericr23, thanks for getting in touch.

    Are these results taken directly from filtering your Live Traffic page in Wordfence > Tools? The output looks consistent with that, but it doesn’t have quite the same format (the date/time and first line are missing), so I just wanted to check.

    Server access logs, PHP error logs, and logs from mod_security from around the date and time this occurred would be good areas for you to investigate, and let me know if anything there sticks out that might point to an error or other event that might have misidentified these crawlers as something that should be blocked.

    Could you please also send us a diagnostic report at wftest @ wordfence . com? You can find the link to do so at the top of the Wordfence > Tools > Diagnostics page. Then click on “Send Report by Email”. Please add your forum username where indicated and respond here after you have sent it.

    Note: For the fastest response time, please make sure to add any information or questions directly to this topic and not to the email address above unless asked.

    Thanks,

    Peter.

    Thread Starter ericr23

    (@ericr23)

    Sorry, yes, that’s the Live Traffic report. They just appeared yesterday again:

    United States was blocked: Exceeded the maximum global requests per minute for crawlers or humans. at https://…/2021/08/10/miami-county-proposing-clean-up-of-wind-ordinance/
    8/17/2021 3:09:57 PM (16 hours 36 mins ago)
    IP: 66.249.80.25 Hostname: google-proxy-66-249-80-25.google.com
    Human/Bot: Bot
    Google-Raichu
    Type: Blocked

    Irvine, California, United States was blocked for Exceeded the maximum global requests per minute for crawlers or humans. at https://…/2021/08/12/californias-clean-grid-may-lean-on-oil-gas-to-avoid-summer-bla…
    8/17/2021 8:27:03 AM (23 hours 19 mins ago)
    IP: 74.125.215.19 Hostname: google-proxy-74-125-215-19.google.com
    Human/Bot: Bot
    Google-Raichu

    I have Wordfence set to throttle all requests exceeding 120 per minute (except Googlebots). Oddly, the burst of 350 requests in 3 minutes at 08:25-27 was blocked 60 times, whereas only 1 of the 350 requests in 2 minutes at 15:07-09 was blocked, although there were 74 requests for that same page.

    Wordfence was updated yesterday to v7.5.5.

    Thread Starter ericr23

    (@ericr23)

    (I just realized that, of course, 350 requests in 3 minutes is less than 120/min if they’re evenly spaced. But this Google-Raichu bot sends its 350 requests in bursts: 50 requests within 1 second, every 15-20 seconds. There were in fact several such bursts between the 2 above that Wordfence blocked for too many requests.) But that’s a side issue; the question remains: why doesn’t Wordfence recognize this apparently legitimate Googlebot?
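The difference between evenly spaced and bursty traffic can be sketched with a simple trailing 60-second window. This is only an illustration of the arithmetic, not how Wordfence counts requests internally; `count_over_limit` and `burst_timestamps` are hypothetical helpers, and the 20-second gap is one value picked from the 15-20 second range described above.

```python
from collections import deque


def count_over_limit(timestamps, limit=120, window=60.0):
    """Count requests that arrive while more than `limit` requests
    (including the current one) fall inside the trailing `window` seconds.

    Assumes `timestamps` is sorted and that over-limit requests still
    count toward the window.
    """
    recent = deque()
    over = 0
    for t in timestamps:
        recent.append(t)
        # drop requests that have aged out of the trailing window
        while recent and recent[0] <= t - window:
            recent.popleft()
        if len(recent) > limit:
            over += 1
    return over


def burst_timestamps(n_bursts, per_burst, gap):
    """Timestamps for `n_bursts` bursts, each packing `per_burst`
    requests into one second, with bursts starting `gap` seconds apart."""
    return [i * gap + j / per_burst
            for i in range(n_bursts)
            for j in range(per_burst)]


if __name__ == "__main__":
    even = [i * 180 / 350 for i in range(350)]  # 350 requests spread over 3 minutes
    bursty = burst_timestamps(7, 50, 20)        # 7 bursts of 50 requests
    print("evenly spaced, over limit:", count_over_limit(even))
    print("bursty, over limit:", count_over_limit(bursty))
```

With these assumed numbers the evenly spaced traffic stays under 120 per minute, while the bursty traffic crosses the limit once three bursts land inside the same minute.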

    Thread Starter ericr23

    (@ericr23)

    Or maybe the update to 7.5.5 took care of it.

    Plugin Support wfpeter

    (@wfpeter)

    Hi @ericr23,

    We’ve done a little digging on this. Google hostnames with google-proxy in the reverse lookup are not regular Google crawlers used for indexing the site, even though rate-limited-proxy-66-249-90-77.google.com, an example in the Verifying Googlebot documentation, looks fairly similar. They can, however, be used to proxy other users’ requests when using Google Translate or possibly other Google features.
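To make the distinction above concrete, here is a small sketch that sorts a reverse-lookup hostname by its first label. The function name `classify_google_hostname` and the category strings are made up for illustration, and treating `crawl-….googlebot.com` as the indexing-crawler pattern is an assumption based on Google's published verification guidance, not on anything Wordfence exposes.

```python
def classify_google_hostname(hostname):
    """Roughly classify a PTR hostname as 'crawler', 'proxy', or 'unknown'.

    This only inspects the name; it does not verify that the name
    actually resolves back to the requesting IP.
    """
    host = hostname.lower().rstrip(".")
    first_label = host.split(".", 1)[0]

    # Assumed pattern for regular indexing crawlers.
    if host.endswith(".googlebot.com") and first_label.startswith("crawl-"):
        return "crawler"

    # Pattern from this thread: requests relayed through a Google proxy.
    if host.endswith(".google.com") and first_label.startswith("google-proxy-"):
        return "proxy"

    return "unknown"


if __name__ == "__main__":
    for name in ("google-proxy-66-249-80-25.google.com",
                 "google-proxy-74-125-215-19.google.com"):
        print(name, "->", classify_google_hostname(name))
```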

    Servers hosted on Google Cloud Platform don’t usually have hostnames quite like this. But if their outbound requests go through a proxy with a different IP, as happens at some other hosts, this could still be an attacker whose traffic looks slightly more legitimate because it originates from a Google proxy.

    We’ve not seen one with the user-agent Google-Raichu before either, and there’s not much useful information out there about it. It may be possible that an attacker can craft requests to send through a Google tool, which may be dangerous, or could be used for scraping a site’s content (and bypassing rate limiting), so we wouldn’t recommend allowing it.

    Thanks again,

    Peter.

    Thread Starter ericr23

    (@ericr23)

    Many thanks for the informative explanation. I trust your judgement!

    Plugin Support wfpeter

    (@wfpeter)

    No worries @ericr23, always happy to help.

    If you have any Wordfence questions in future, simply start a new topic and we’ll assist you anytime.

    Peter.

  • The topic ‘Googlebots not being recognized?’ is closed to new replies.