• I have the following problem. Here is an excerpt from the server's Nginx access log:

    66.249.66.6 – – [19/Sep/2022:03:57:27 +0200] “GET /wp-json/wp-statistics/v2/hit?_=1660875809&_wpnonce=c70dfa0d9f&wp_statistics_hit_rest=yes&browser=Googlebot&platform=Unbekannte&version=2.1&device=bot&model=Unknown&referred=https%3A%2F%2Ftimreeves.de&ip=66.249.66.148&exclusion_match=yes&exclusion_reason=CrawlerDetect&ua=Mozilla%2F5.0+%28compatible%3B+Googlebot%2F2.1%3B+%2Bhttp%3A%2F%2Fwww.google.com%2Fbot.html%29&track_all=1&timestamp=1660883009&current_page_type=post_tag&current_page_id=22&search_query&page_uri=/internet-technologie/tag/nginx/&user_id=0 HTTP/1.1” 403 105 “-” “Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.102 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)”

    The HTTP code is 403, and the reason is clear: exclusion_reason=CrawlerDetect

    It is quite understandable that you do this. BUT the big problem arises when the access log is being monitored by fail2ban: it sees all the 403s and blocks the “offender” in the Linux firewall. In this case, Googlebot gets blocked. Oops.

    For such cases, an option to return “401 Unauthorized” instead of a 403 would be very useful, as a 401 will normally not be monitored by fail2ban, since it does not signify a real error.
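
As a rough illustration of the fail2ban side of this problem: fail2ban filters are built from `failregex` and `ignoreregex` patterns, and the same decision logic can be sketched in plain Python. This is only a sketch under assumptions, not a fail2ban configuration. The function name `counts_as_offense` and the constant `IGNORED_PREFIX` are made up for this example, and the log pattern assumes a combined-format line with plain ASCII quotes.

```python
import re

# Assumed combined log format: IP, two placeholder fields, timestamp,
# quoted request line, then the status code.
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)

# Hypothetical exemption: the WP Statistics REST namespace from the log excerpt.
IGNORED_PREFIX = "/wp-json/wp-statistics/"


def counts_as_offense(line: str) -> bool:
    """Return True if a log line is a 403 that should count against the client."""
    match = LOG_RE.match(line)
    if match is None:
        return False  # unparseable lines are not counted
    if match.group("status") != "403":
        return False  # only 403 responses are of interest here
    # A 403 on the statistics endpoint is treated as harmless.
    return not match.group("path").startswith(IGNORED_PREFIX)


stats_hit = ('66.249.66.6 - - [19/Sep/2022:03:57:27 +0200] '
             '"GET /wp-json/wp-statistics/v2/hit?device=bot HTTP/1.1" 403 105 "-" "Googlebot/2.1"')
other_403 = ('203.0.113.7 - - [19/Sep/2022:03:58:00 +0200] '
             '"GET /wp-login.php HTTP/1.1" 403 105 "-" "curl/7.68.0"')

print(counts_as_offense(stats_hit))  # statistics endpoint is exempt
print(counts_as_offense(other_403))  # any other 403 still counts
```

With an exemption like this, 403 replies on the statistics endpoint would no longer lead to a ban, whichever status code the plugin returns.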

    Interested to hear your thoughts, and thanks for a great plugin!

    Tim

Viewing 8 replies - 1 through 8 (of 8 total)
  • Reza

    (@reventon94)

    Hi,

    Thank you for reporting this issue.

    Our developers will look into this matter and work on a solution.

    Best Regards

    Plugin Author Mostafa Soufi

    (@mostafas1990)

    Hi Tim,

    Your detailed report is much appreciated, thank you!

    When I simulated your request with the same payload, the HTTP code in the response was 200, since the plugin simply returns 200 in the response.

    https://capture.dropbox.com/EJ0O21fCBdeMIpkV
    https://capture.dropbox.com/ptnVsMMgAUon7knl
    https://capture.dropbox.com/CrstouqyhE9hcsLb

    Btw, I guess you’re using an old version of WP Statistics, since the browser, platform, version, IP, and timestamp parameters were removed from the payload in the latest version:

    https://github.com/wp-statistics/wp-statistics/commit/965545af177ccc68d6b133e03d4ad8ddcc5db1f9

    Looking forward to hearing from you!

    Thread Starter Tim Reeves

    (@tim-reeves)

    Hi Mostafa,

    just got round to looking at this again – prompted by my access logs still being too long. I’m using the current version 13.2.15 (and nginx btw).

    66.249.76.224 – – [21/Jan/2023:10:32:59 +0100] “GET /wp-json/wp-statistics/v2/hit?_=1657532437&_wpnonce=a497ad569d&wp_statistics_hit_rest=yes&browser=Googlebot&platform=Unbekannte&version=2.1&device=bot&model=Unknown&referred=https%3A%2F%2Ftimreeves.de&ip=66.249.64.192&exclusion_match=yes&exclusion_reason=CrawlerDetect&ua=Mozilla%2F5.0+%28compatible%3B+Googlebot%2F2.1%3B+%2Bhttp%3A%2F%2Fwww.google.com%2Fbot.html%29&track_all=1&timestamp=1657539637&current_page_type=post_tag&current_page_id=22&search_query&page_uri=/internet-technologie/tag/nginx/&user_id=0 HTTP/1.1” 403 107 “-” “Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.74 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)”

    When I call up the website myself, as a normal visitor, Firefox Console shows me a 200 reply, as expected.

    Do you have any ideas? I’ve just added /wp-json/ to robots.txt, which should save the plugin some work, but even so…
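
To show what such a robots.txt rule means for a well-behaved crawler, here is a small sketch using Python's standard `urllib.robotparser`. The exact rule added to robots.txt is not shown in the post, so the `ROBOTS_TXT` content below is an assumption, and the helper `build_parser` is just a name chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt content; the real file is not quoted in the thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-json/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
hit_url = "https://timreeves.de/wp-json/wp-statistics/v2/hit?wp_statistics_hit_rest=yes"
page_url = "https://timreeves.de/internet-technologie/tag/nginx/"

print(parser.can_fetch("Googlebot", hit_url))   # REST endpoint is disallowed
print(parser.can_fetch("Googlebot", page_url))  # normal pages stay crawlable
```

This only models what a crawler that honours robots.txt is permitted to fetch; it does not by itself explain the 403 replies in the log.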

    Many thanks for your work for the community, really appreciated!

    Plugin Author Mostafa Soufi

    (@mostafas1990)

    Thank you for the useful information.

    Weird indeed, because most of those parameters have been removed in the newer version to make the URLs shorter.

    Also, the page_uri is encoded as base64:

    https://github.com/wp-statistics/wp-statistics/blob/master/includes/class-wp-statistics-helper.php#L1117
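
For illustration, this is roughly what a base64-encoded page_uri would look like, using one of the paths from the log above. It is a minimal sketch with Python's standard `base64` module. Whether the plugin uses the standard or a URL-safe alphabet is not stated in this thread, so the standard alphabet is an assumption, and the function names are invented for this example.

```python
import base64


def encode_page_uri(page_uri: str) -> str:
    """Encode a page URI with standard base64 (assumed alphabet)."""
    return base64.b64encode(page_uri.encode("utf-8")).decode("ascii")


def decode_page_uri(encoded: str) -> str:
    """Reverse encode_page_uri."""
    return base64.b64decode(encoded).decode("utf-8")


uri = "/internet-technologie/tag/nginx/"
encoded = encode_page_uri(uri)
print(encoded)                          # opaque token instead of a readable path
print(decode_page_uri(encoded) == uri)  # round trip restores the original URI
```

A request that still carries a readable `page_uri=/internet-technologie/...` value therefore does not look like one generated by the newer encoding.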

    Can you please confirm that your version is v13.2.15? Also, please clear the cache.

    Best

    • This reply was modified 1 year, 10 months ago by Mostafa Soufi.
    Thread Starter Tim Reeves

    (@tim-reeves)

    Dear Mostafa, as you can see I’m using W3 Total Cache (which is pretty good at clearing its caches) and WP Cerber – and the current version of WP Statistics.

    I’ve manually cleared the cache anyway and emptied the log file. Now I must wait a bit for a few bots to come by, will give you an update soon.

    Again, many thanks!

    Plugin Author Mostafa Soufi

    (@mostafas1990)

    Sounds good. Please let me know once you have received something useful in the log.

    Best

    Thread Starter Tim Reeves

    (@tim-reeves)

    66.249.69.186 – – [21/Jan/2023:16:11:59 +0100] “GET /wp-json/wp-statistics/v2/hit?_=1654885129&_wpnonce=b9de9124f8&wp_statistics_hit_rest=yes&browser=Googlebot&platform=Unbekannte&version=2.1&referred=https%3A%2F%2Ftimreeves.de&ip=66.249.64.195&exclusion_match=yes&exclusion_reason=CrawlerDetect&ua=Mozilla%2F5.0+%28compatible%3B+Googlebot%2F2.1%3B+%2Bhttp%3A%2F%2Fwww.google.com%2Fbot.html%29&track_all=1&timestamp=1654892329&current_page_type=category&current_page_id=13&search_query&page_uri=/internet-technologie/kategorie/linux-server/&user_id=0 HTTP/1.1” 403 107 “-” “Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.5414.74 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)”

    I’ve also checked the version of WP Statistics via FTP by downloading it to my PC. The files are from 13.01.2023 15:06 and the readme.txt says Stable tag: 13.2.15

    I do have a lot of security stuff in my nginx config, but also this line to let WP Statistics off the hook:
    if ( $request_uri ~ "^/wp-json/wp-statistics" ) { set $susquery 0; }

    In WP Cerber the REST API namespace “wp-statistics” is explicitly allowed. This must work because normal visitors are getting a 200 reply.

    I’m out of ideas…

    Plugin Author Mostafa Soufi

    (@mostafas1990)

    Again, I’m not sure why you receive the old request payload in the Nginx log, since the request on your website is correct.

    Best

  • The topic ‘Problem with 403 in webserver acces log from robot visit’ is closed to new replies.