Problem with 403 in webserver access log from robot visits
I have a problem; here is an excerpt from the server's Nginx access log:
66.249.66.6 - - [19/Sep/2022:03:57:27 +0200] "GET /wp-json/wp-statistics/v2/hit?_=1660875809&_wpnonce=c70dfa0d9f&wp_statistics_hit_rest=yes&browser=Googlebot&platform=Unbekannte&version=2.1&device=bot&model=Unknown&referred=https%3A%2F%2Ftimreeves.de&ip=66.249.66.148&exclusion_match=yes&exclusion_reason=CrawlerDetect&ua=Mozilla%2F5.0+%28compatible%3B+Googlebot%2F2.1%3B+%2Bhttp%3A%2F%2Fwww.google.com%2Fbot.html%29&track_all=1&timestamp=1660883009&current_page_type=post_tag&current_page_id=22&search_query&page_uri=/internet-technologie/tag/nginx/&user_id=0 HTTP/1.1" 403 105 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.5195.102 Mobile Safari/537.36 (compatible; Googlebot/2.1; +https://www.google.com/bot.html)"
The HTTP status code is 403, and the reason is clear: exclusion_reason=CrawlerDetect
It is quite understandable that you do this. BUT there is a big problem when the access log is monitored by fail2ban: it sees all the 403s and blocks the "offender" in the Linux firewall, so in this case Googlebot gets blocked. Oops.
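As a server-side stopgap, the 403s from this one endpoint can be whitelisted in the fail2ban filter itself. This is only a minimal sketch; the filter file name and both regexes are my assumptions about a typical custom Nginx 403 jail, not anything shipped by WP Statistics or fail2ban:

```
# /etc/fail2ban/filter.d/nginx-403.local  (hypothetical custom filter)
[Definition]
# Count hosts that pile up 403s in the Nginx access log ...
failregex = ^<HOST> .+ "(GET|POST|HEAD) .+" 403
# ... but never count the WP Statistics hit endpoint, which answers
# detected crawlers with a 403 by design.
ignoreregex = ^<HOST> .+ "GET /wp-json/wp-statistics/v2/hit
```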
For such cases, an option to return "401 Unauthorized" instead of a 403 would be very useful, since a 401 will normally not be monitored by fail2ban, as it does not signify a real error.
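Until such an option exists, a site owner could downgrade the status themselves. The sketch below only illustrates the idea using WordPress's standard rest_request_after_callbacks filter; the route prefix check and the whole approach are my assumptions, not WP Statistics internals:

```php
<?php
/**
 * Hypothetical sketch: turn the hit endpoint's by-design 403 into a 401
 * so fail2ban does not count it as an attack.
 */
add_filter( 'rest_request_after_callbacks', function ( $response, $handler, $request ) {
	if ( $response instanceof WP_REST_Response
		&& 403 === $response->get_status()
		&& 0 === strpos( $request->get_route(), '/wp-statistics/v2/hit' ) ) {
		$response->set_status( 401 ); // crawler exclusion, not a real error
	}
	return $response;
}, 10, 3 );
```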
Interested to hear your thoughts, and thanks for a great plugin!
Tim