robots.txt and sitemap.xml blocked
-
Hello,
this plugin blocks robots.txt and sitemap.xml after clicking the “secure my site” button. This is CRITICAL, as Google can’t find these files…
-
Can you present any evidence to back up your observation?
Even while unable to access the robots.txt file, Googlebot etc. will still be able to crawl and index your website…
dwinden
Sure…
– both were inaccessible.
– disabled iThemes, still inaccessible
– went back to the WP minimal .htaccess file: works
– enabled iThemes: still works
– after securing the site, both are blocked. I did have the blocklists enabled. Do you want the .htaccess file?
both were inaccessible.
So the URL https://www.yourdomain.com/robots.txt returns a 403 (Forbidden)?
Are you using an actual robots.txt file which exists in the root of your WordPress install?
(The reason I ask is that WordPress by default uses a virtual robots.txt.)
dwinden
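A quick way to tell the two apart from the command line (a sketch only; the path and domain are placeholders): if nothing exists on disk but the URL still returns content, WordPress is serving the virtual file.

# Check for a physical file in the WordPress root (path is a placeholder):
ls -l /var/www/html/robots.txt
# Fetch what the site actually serves (domain is a placeholder):
curl -s https://www.yourdomain.com/robots.txt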
Both files returned a 403. I’m using the virtual WP robots.txt file, and Yoast SEO is generating the sitemap.xml.
If there is anything I can do to help debug, please let me know.
Thank you for your feedback.
Did you actually try to manually access the URLs from a browser (Google Chrome or Mozilla Firefox)?
Or are those 403s reported in Google Search Console?
Please upload your .htaccess file after obscuring any sensitive info.
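You could also check the raw status codes from the command line with curl (a sketch; the domain is a placeholder):

# -I sends a HEAD request and prints only the response headers,
# including the HTTP status line:
curl -I https://www.yourdomain.com/robots.txt
curl -I https://www.yourdomain.com/sitemap.xml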
dwinden
Yes, I manually tried them both from Firefox. Google Search Console also showed they were inaccessible. I didn’t try curl or wget, but I would have expected the same result. I’ll post the .htaccess.
Same error with wget:
$ wget https://mysite.com/robots.txt https://mysite.com/sitemap.xml
--2016-08-02 12:07:05--  https://mysite.com/robots.txt
Resolving mysite.com (mysite.com)...
Connecting to mysite.com (mysite.com)... connected.
HTTP request sent, awaiting response... 403 Forbidden
2016-08-02 12:07:05 ERROR 403: Forbidden.

--2016-08-02 12:07:05--  https://mysite.com/sitemap.xml
Reusing existing connection to mysite.com:80.
HTTP request sent, awaiting response... 403 Forbidden
2016-08-02 12:07:05 ERROR 403: Forbidden.
# BEGIN iThemes Security - Do not modify or remove this line
# iThemes Security Config Details: 2

# Enable HackRepair.com's blacklist feature - Security > Settings > Banned Users > Default Blacklist
# Start HackRepair.com Blacklist
RewriteEngine on
# Start Abuse Agent Blocking
RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*Indy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mozilla.*NEWT" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Maxthon$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SeaMonkey$" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Acunetix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^binlar" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^BlackWidow" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Bolt 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^BOT for JCE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Bot mailto\:craftbot@yahoo\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^casper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^checkprivacy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ChinaClaw" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^clshttp" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^cmsworldmap" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^comodo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Custo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Default Browser 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^diavol" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^DIIbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^DISCo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^dotbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Download Demon" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^eCatch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EirGrabber" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailCollector" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailSiphon" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EmailWolf" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Express WebPictures" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^extract" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ExtractorPro" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^EyeNetIE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^feedfinder" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^FHscan" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^FlashGet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^flicky" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^g00g1e" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GetRight" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GetWeb\!" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Go\!Zilla" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Go\-Ahead\-Got\-It" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^grab" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^GrabNet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Grafula" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^harvest" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^HMView" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Image Stripper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Image Sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^InterGET" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Internet Ninja" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^InternetSeer\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^jakarta" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Java" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^JetCar" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^JOC Web Spider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^kanagawa" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^kmccrew" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^larbin" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^LeechFTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^libwww" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mass Downloader" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^microsoft\.url" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MIDown tool" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^miner" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Mister PiX" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^MSFrontPage" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Navroad" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NearSite" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Net Vampire" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetAnts" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^NetZIP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^nutch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Octopus" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Offline Explorer" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Offline Navigator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^PageGrabber" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Papa Foto" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pavuk" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pcBrowser" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^PeoplePal" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^planetwork" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^psbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^purebot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^pycurl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^RealDownload" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^ReGet" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Rippers 0" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^sitecheck\.internetseer\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SiteSnagger" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^skygrid" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SmartDownload" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SuperBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^SuperHTTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Surfbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^tAkeOut" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Teleport Pro" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Toata dragostea mea pentru diavola" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^turnit" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^vikspider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^VoidEYE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Web Image Collector" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Web Sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebAuto" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebBandit" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebCopier" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebFetch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebGo IS" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebLeacher" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebReaper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebSauger" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Website eXtractor" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Website Quester" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebStripper" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebWhacker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WebZIP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Wget" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Widow" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WPScan" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WWW\-Mechanize" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^WWWOFFLE" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Xaldon WebSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^Zeus" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "^zmeu" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "360Spider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "AhrefsBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "CazoodleBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "discobot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "EasouSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ecxi" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "GT\:\:WWW" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "heritrix" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "HTTP\:\:Lite" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "HTTrack" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ia_archiver" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "id\-search" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "IDBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Indy Library" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "IRLbot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ISC Systems iRc Search 2\.1" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "LinksCrawler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "LinksManager\.com_bot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "linkwalker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "lwp\-trivial" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MFC_Tear_Sample" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Microsoft URL Control" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Missigua Locator" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "MJ12bot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "panscient\.com" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PECL\:\:HTTP" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PHPCrawl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "PleaseCrawl" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SBIder" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SearchmetricsBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "SeznamBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Snoopy" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Steeler" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "URI\:\:Fetch" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "urllib" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Web Sucker" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "webalta" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WebCollage" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "Wells Search II" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "WEP Search" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "XoviBot" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "YisouSpider" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "zermelo" [NC,OR]
RewriteCond %{HTTP_USER_AGENT} "ZyBorg" [NC,OR]
# End Abuse Agent Blocking
# Start Abuse HTTP Referrer Blocking
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?semalt\.com" [NC,OR]
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?kambasoft\.com" [NC,OR]
RewriteCond %{HTTP_REFERER} "^https?://(?:[^/]+\.)?savetubevideo\.com" [NC]
# End Abuse HTTP Referrer Blocking
RewriteRule ^.* - [F,L]
# End HackRepair.com Blacklist, https://pastebin.com/u/hackrepair

# Protect System Files - Security > Settings > System Tweaks > System Files
<files .htaccess>
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</files>
<files readme.html>
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</files>
<files readme.txt>
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</files>
<files install.php>
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</files>
<files wp-config.php>
    <IfModule mod_authz_core.c>
        Require all denied
    </IfModule>
    <IfModule !mod_authz_core.c>
        Order allow,deny
        Deny from all
    </IfModule>
</files>

# Disable Directory Browsing - Security > Settings > System Tweaks > Directory Browsing
Options -Indexes

<IfModule mod_rewrite.c>
    RewriteEngine On
    # Protect System Files - Security > Settings > System Tweaks > System Files
    RewriteRule ^wp-admin/includes/ - [F]
    RewriteRule !^wp-includes/ - [S=3]
    RewriteCond %{SCRIPT_FILENAME} !^(.*)wp-includes/ms-files.php
    RewriteRule ^wp-includes/[^/]+\.php$ - [F]
    RewriteRule ^wp-includes/js/tinymce/langs/.+\.php - [F]
    RewriteRule ^wp-includes/theme-compat/ - [F]
    # Disable PHP in Uploads - Security > Settings > System Tweaks > Uploads
    RewriteRule ^wp\-content/uploads/.*\.(?:php[1-6]?|pht|phtml?)$ - [NC,F]
    # Filter Request Methods - Security > Settings > System Tweaks > Request Methods
    RewriteCond %{REQUEST_METHOD} ^(TRACE|DELETE|TRACK) [NC]
    RewriteRule ^.* - [F]
    # Filter Suspicious Query Strings in the URL - Security > Settings > System Tweaks > Suspicious Query Strings
    RewriteCond %{QUERY_STRING} \.\.\/ [NC,OR]
    RewriteCond %{QUERY_STRING} ^.*\.(bash|git|hg|log|svn|swp|cvs) [NC,OR]
    RewriteCond %{QUERY_STRING} etc/passwd [NC,OR]
    RewriteCond %{QUERY_STRING} boot\.ini [NC,OR]
    RewriteCond %{QUERY_STRING} ftp\: [NC,OR]
    RewriteCond %{QUERY_STRING} http\: [NC,OR]
    RewriteCond %{QUERY_STRING} https\: [NC,OR]
    RewriteCond %{QUERY_STRING} (\<|%3C).*script.*(\>|%3E) [NC,OR]
    RewriteCond %{QUERY_STRING} mosConfig_[a-zA-Z_]{1,21}(=|%3D) [NC,OR]
    RewriteCond %{QUERY_STRING} base64_encode.*\(.*\) [NC,OR]
    RewriteCond %{QUERY_STRING} ^.*(%24&x).* [NC,OR]
    RewriteCond %{QUERY_STRING} ^.*(127\.0).* [NC,OR]
    RewriteCond %{QUERY_STRING} ^.*(globals|encode|localhost|loopback).* [NC,OR]
    RewriteCond %{QUERY_STRING} ^.*(request|concat|insert|union|declare).* [NC]
    RewriteCond %{QUERY_STRING} !^loggedout=true
    RewriteCond %{QUERY_STRING} !^action=jetpack-sso
    RewriteCond %{QUERY_STRING} !^action=rp
    RewriteCond %{HTTP_COOKIE} !^.*wordpress_logged_in_.*$
    RewriteCond %{HTTP_REFERER} !^https://maps\.googleapis\.com(.*)$
    RewriteRule ^.* - [F]
    # Filter Non-English Characters - Security > Settings > System Tweaks > Non-English Characters
    RewriteCond %{QUERY_STRING} ^.*(%0|%A|%B|%C|%D|%E|%F).* [NC]
    RewriteRule ^.* - [F]
</IfModule>

# Enable the hide backend feature - Security > Settings > Hide Login Area > Hide Backend
RewriteRule ^(/)?wplogin/?$ /wp-login.php [QSA,L]
RewriteRule ^(/)?wp-register-php/?$ /wplogin?action=register [QSA,L]
# END iThemes Security - Do not modify or remove this line

# BEGIN WordPress
<IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
</IfModule>
# END WordPress

# BEGIN MainWP
# END MainWP
Not enabling the Banned Users settings (Default Blacklist – “Enable HackRepair.com’s blacklist feature” – and Ban Lists) keeps it working, so it’s something there.
@internationalis
OK, so then it must be the HackRepair.com blacklist.
Try disabling it and see what happens.
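If you would rather keep the blacklist active for everything else, one possible workaround, only a sketch and untested here (iThemes Security may regenerate this section whenever settings are saved), is to exempt the two URLs just before the blacklist’s closing RewriteRule:

# Hypothetical exemption: this extra AND-condition makes the ban skip
# requests for robots.txt and sitemap*.xml (widen the pattern if your
# sitemaps use other names). It replaces the single
# "RewriteRule ^.* - [F,L]" line at the end of the blacklist.
RewriteCond %{REQUEST_URI} !^/(robots\.txt|sitemap[^/]*\.xml)$ [NC]
RewriteRule ^.* - [F,L]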
Note that wget and empty user agents are blocked by the HackRepair.com blacklist. Those are known to cause trouble in some cases.
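This also explains the earlier wget test in this thread: wget identifies itself as “Wget/<version>”, which matches the blacklist’s "^Wget" rule, so a 403 from wget is expected whenever this blacklist is active, independent of whatever is blocking the browsers. Overriding the user agent (the domain is a placeholder) sidesteps that one rule:

# wget's --user-agent (-U) flag replaces the default "Wget/x.y" header:
wget --user-agent="Mozilla/5.0" https://mysite.com/robots.txt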
Your Ban Lists are probably empty because I don’t see anything related in your .htaccess file.
dwinden
I would like to know which rule is causing it so I can report it to the blacklist maintainers. And preferably, iThemes should detect that these two important files are being blocked and at least warn the user. Dropping a sitemap has a big impact on Google rankings, according to quite a few posts I have read.
Is there any logging I can use to see which rule was triggered, or why robots.txt was blocked? That would help track down the problem.
Thanks
I think you should know that this is not a general iTSec plugin issue.
I’m unable to reproduce the issue, so it must be a user-agent-specific issue in your hosting environment. You could try temporarily commenting out the line that blocks empty user agents in the .htaccess file, like this:
# RewriteCond %{HTTP_USER_AGENT} "^$" [NC,OR]
But this is a bit of a wild guess.
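One way to test that guess before editing anything: send the same request with and without a user agent and compare status codes (a sketch; the domain is a placeholder). curl omits the User-Agent header entirely when -A is given an empty string, which is exactly the case the "^$" condition matches:

# Default curl user agent (curl/x.y.z):
curl -s -o /dev/null -w "%{http_code}\n" https://mysite.com/robots.txt
# No User-Agent header at all:
curl -s -o /dev/null -w "%{http_code}\n" -A "" https://mysite.com/robots.txt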
Otherwise, try to determine which user agent is being detected.
To do so, follow these steps:
- First make sure the HackRepair.com blacklist is disabled.
- Use Firefox with the Firebug extension installed.
- In the Firebug console, click on the little arrow right next to the Net menu option and enable it.
- Then access the robots.txt file from the Firefox browser.
- Click on the + sign right in front of the request.
Since the HackRepair.com blacklist is disabled, this should result in the content of the virtual robots.txt file being rendered.
In the Firebug console (Net -> All) a single GET request for robots.txt will be displayed. This will show you the response and request headers of the request. Look for the value of the User-Agent header in the request headers.
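On the logging question: mod_rewrite can trace its decisions, but only from the main server configuration, not from .htaccess, so this requires server access. A sketch for Apache 2.4:

# In the main server config (httpd.conf or a vhost), not .htaccess.
# Rewrite trace messages go to the error log; trace3 is already verbose.
LogLevel alert rewrite:trace3

On Apache 2.2 the equivalent would be the old RewriteLog/RewriteLogLevel directives.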
Good luck!
dwinden
Just another thought …
If the website is behind a proxy, the proxy server might be changing the user agent…
dwinden