WordPress is blocking Bing Bot from my site
-
For some reason Bing is unable to crawl my WordPress site. I’ve edited robots.txt to allow Bing, and just about everything else, but Bing Webmaster Tools returns nothing but 403 errors (although it does seem to have successfully indexed the sitemap). Any idea what I’m doing wrong or how I can fix this? The site doesn’t show up in search results at all.
Thank you in advance!
-
WordPress by itself does not block any bots unless you set it to discourage search engines via Settings -> Reading.
What is the URL of your site? Are you running any plugins that specifically block bots or are generally for security?
Are you blocking any bots, IPs, or user-agents via .htaccess?
Is your hosting provider using mod_security?
Are you using CloudFlare?
The URL of the site is https://www.ourjewishcommunity.org
I don’t believe we are running any security plugins. We use Yoast’s WordPress SEO plugin.
Our robots.txt is (Bingbot is last):
User-agent: *
Allow: /
Crawl-Delay: 10
Disallow: /wp-admin
Disallow: /wp-includes
Disallow: /wp-content/plugins
Disallow: /wp-content/cache
Disallow: /wp-content/themes
Disallow: /index.php
Disallow: /wp-activate.php
Disallow: /wp-app.php
Disallow: /wp-blog-header.php
Disallow: /wp-comments-post.php
Disallow: /wp-config.php
Disallow: /wp-cron.php
Disallow: /wp-links-opml.php
Disallow: /wp-load.php
Disallow: /wp-login.php
Disallow: /wp-mail.php
Disallow: /wp-pass.php
Disallow: /wp-register.php
Disallow: /wp-settings.php
Disallow: /wp-signup.php
Disallow: /wp-trackback.php
Disallow: /xmlrpc.php
Disallow: /discuss/applications
Disallow: /discuss/cache
Disallow: /discuss/conf
Disallow: /discuss/js
Disallow: /discuss/library
Disallow: /discuss/locales
Disallow: /discuss/plugins
Disallow: /discuss/themes
Disallow: /discuss/uploads
Disallow: /discuss/profile
Disallow: /discuss/dashboard
Disallow: /*.php$
Disallow: /*.js$
Disallow: /*.inc$
Disallow: /*.css$
Disallow: /*.gz$
Disallow: /*.tar$
Disallow: /*.tgz$
Disallow: /*.cgi$
Disallow: /*.xhtml$
Disallow: /feed # Blocks the Blog Feed
Disallow: */feed # Blocks the individual Post Feed
Disallow: /comments # Blocks the Comments URL
Disallow: */comment-* # Blocks the Comments Permalinks and Comment Pages
Disallow: */trackback # Blocks the Trackback URL for posts
Disallow: /*? # Blocks the dynamically generated contents
Allow: /wp-content/uploads/
# Google Image (Allows Google to index uploaded Images)
User-agent: Googlebot-Image
Disallow:
Allow: /*
# Google AdSense (Allows Google Adsense to determine your content)
User-agent: Mediapartners-Google*
Disallow:
Allow: /*
# Digg Mirror (Blocks Digg to Crawl your blog)
User-agent: duggmirror
Disallow: /
# Archive.ORG (Blocks Archive.org)
User-agent: ia_archiver
Disallow: /
User-agent: msnbot
Allow: /
User-agent: BingBot
Allow: /
Our .htaccess is:
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
Hope this gives some clues. Thank you for your help.
Did you get this from Yoast’s plugin? There are several things that you should remove.
Disallow: /*? # Blocks the dynamically generated contents
This is the big one right here, the snake in the grass. WordPress is a dynamic blogging platform, and this rule matches every URL that contains a query string (?). With default permalinks that is practically every page, so you’re blocking every bot from almost everything. I’m surprised more bots aren’t reporting problems. They’re probably just giving up and not even logging anything.
Disallow: /index.php
index.php is essentially what generates your site, don’t block that either.
Disallow: /*.php$
Likewise with all the file-specific disallows like this, you’re preventing the bots from actually loading pages.
Disallow: /feed # Blocks the Blog Feed
Disallow: */feed # Blocks the individual Post Feed
Your feed is actually a sitemap for all search engines, don’t block them from it.
Allow: /
Anything with this is redundant. If you aren’t blocking them, you are already allowing them.
Regarding the specific wp-[blah].php files and blocks for specific file extensions, search engines are very smart and so is WordPress. They won’t find or index these things (with the exception of wp-login.php, which is sometimes revealed by some themes).
Also, remember that bots are not obligated to respect robots.txt. The good ones will respect robots.txt, but the bad ones will use it as a list of things you don’t want them to find (and that they therefore should find).
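To see which URLs the wildcard rules discussed above would actually match, here is a minimal Python sketch. It assumes the common wildcard extension to robots.txt (`*` matches any run of characters, a trailing `$` anchors the end of the URL, anything else is a prefix match). Individual crawlers may implement this differently, and the helper names `pattern_to_regex` and `is_blocked` are hypothetical, not part of any library.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    Assumed semantics: '*' matches any run of characters, a trailing '$'
    anchors the end of the URL, otherwise the pattern is a prefix match.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' appeared.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(pattern, path):
    """Return True if a Disallow rule with this pattern would match the path."""
    return pattern_to_regex(pattern).match(path) is not None


# Rules taken from the robots.txt quoted earlier in the thread.
print(is_blocked("/*?", "/?p=123"))                # query-string URL
print(is_blocked("/*?", "/about-us/"))             # "pretty" permalink, no '?'
print(is_blocked("/*.php$", "/index.php"))         # ends in .php
print(is_blocked("/index.php", "/index.php?p=1"))  # plain prefix match
print(is_blocked("*/feed", "/my-post/feed/"))      # per-post feed
```

Under these assumptions, `/*?` only catches URLs that contain a query string, so how much of a site it hides depends on the permalink structure in use.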
Try this instead for the entire content of your robots.txt file (I’m not sure what’s required by Discuss, so I’m leaving that as-is, but do check for similar things as above):
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php
Disallow: /discuss/applications
Disallow: /discuss/cache
Disallow: /discuss/conf
Disallow: /discuss/js
Disallow: /discuss/library
Disallow: /discuss/locales
Disallow: /discuss/plugins
Disallow: /discuss/themes
Disallow: /discuss/uploads
Disallow: /discuss/profile
Disallow: /discuss/dashboard

# Digg Mirror (Blocks Digg to Crawl your blog)
User-agent: duggmirror
Disallow: /

# Archive.ORG (Blocks Archive.org)
User-agent: ia_archiver
Disallow: /
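One way to sanity-check a simplified file like the one above before uploading it is Python's standard-library `urllib.robotparser`. This is only a sketch: the rules below are a shortened copy of the suggested file, the parser does plain prefix matching (no wildcards), and real crawlers may interpret edge cases differently. The helper name `build_parser` is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# Shortened copy of the suggested robots.txt (assumption: the remaining
# Disallow lines behave the same way as the ones kept here).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-login.php

User-agent: ia_archiver
Disallow: /
"""

SITE = "https://www.ourjewishcommunity.org"


def build_parser(text):
    """Parse robots.txt text held in memory instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Bingbot falls under the '*' group, so the home page should be allowed
# and the admin area should not.
print(parser.can_fetch("bingbot", SITE + "/"))
print(parser.can_fetch("bingbot", SITE + "/wp-admin/"))
# ia_archiver has its own group that disallows everything.
print(parser.can_fetch("ia_archiver", SITE + "/"))
```

If the home page comes back as allowed here but the crawler still reports 403, the robots.txt file is probably not what is blocking it.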
In the above, I left the blocks for Digg and Archive.org in, but I highly recommend removing those blocks. There’s no reason to make your site harder to find on popular services.
I will try this! Thank you!
I updated robots.txt.
I am now getting this from Bing Webmaster Tools when I try to fetch:
URL: https://ourjewishcommunity.org/
Status: The HTTP Status of 4xx was received.
HTTP/1.1 403 Forbidden
Connection: close
Date: Thu, 03 Jul 2014 17:23:01 GMT
Content-Length: 202
Content-Type: text/html; charset=iso-8859-1
Server: Apache

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>403 Forbidden</title>
</head><body>
<h1>Forbidden</h1>
<p>You don't have permission to access /
on this server.</p>
</body></html>
Thank you again for your help
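A 403 like the one above is returned by the web server (or something in front of it) rather than by robots.txt, so one quick diagnostic is to request the page with different User-Agent headers and compare the status codes. The sketch below uses only Python's standard library. The Bingbot string is the one Bing publishes for its crawler, the browser string is just a generic placeholder, and the helper names `build_request` and `status_for` are made up for this example.

```python
import urllib.error
import urllib.request

# User-Agent string published by Bing for its crawler.
BINGBOT_UA = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
# Generic browser-like string (placeholder, any ordinary browser UA will do).
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0"


def build_request(url, user_agent):
    """Create a GET request that identifies itself with the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def status_for(url, user_agent, timeout=10.0):
    """Return the HTTP status code the server sends for this User-Agent."""
    try:
        with urllib.request.urlopen(build_request(url, user_agent), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return err.code


# Example (performs real network requests, so it is left commented out):
# for name, ua in [("bingbot", BINGBOT_UA), ("browser", BROWSER_UA)]:
#     print(name, status_for("https://ourjewishcommunity.org/", ua))
```

If the bot User-Agent gets a 403 while the browser one gets a 200, a User-Agent rule (in a plugin, .htaccess, or the host's firewall) is a likely culprit. If both succeed, the block may be based on IP address, which this test cannot reproduce from your own machine.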
Ok, which plugins are you running?
Also, are you running mod_security, or using Cloudflare, Sucuri cloud proxy, or any other WAF?
Plugins:
Akismet
All In One SEO Pack (Inactive)
Breadcrumb NavXT (Inactive)
Captcha
Dagon Design Sitemap Generator
eCards
Enable Media Replace
Feed Template Customize
Flash Video Player
Meta Slider
Navayan CSV Export (Inactive)
Our Jewish Community Slider (Inactive)
Preserved HTML Editor Markup
Redirect All Types
SEO Redirection
ShareThis
Wordfence Security
WordPress SEO
WP-DBManager
WP-Paginate
We’re not using any WAF, to the best of my knowledge.
Thank you again for all of your help
For now, try switching off the following to see if it makes a difference:
Redirect All Types
SEO Redirection
WordPress SEO
All three could be redirecting or blocking the bot.
If that doesn’t make a difference (and it may take a day or two to find out), it’s time to try disabling Wordfence for a day or two.
Basically, you have 4 plugins which could be blocking or redirecting the Bing bot, so let’s start by ruling them out, and save the security plugin for last.
Thanks Mac,
I’ll try this.
Excellent, let us know how it goes!
As of 1:45pm on Wednesday, the site still appears to be blocking Bing with
Redirect All Types
SEO Redirection
WordPress SEO
turned off. Should I wait another day or go for the security plugin?
Thank you again
Normally, I would proceed to the next plugin, but since it’s a security plugin, give it one more day.
Should I turn the other stuff back on?
I turned off the security plugin. I notice that now my robots.txt can’t be edited, and I see the message:
Your backup folder is NOT writable
To correct this issue, make the folder /home/ourjewis/public_html/wp-content/backup-db writable.
Is that normal? Will it come back when I turn the plugins back on?
You should be able to edit your robots.txt file traditionally via FTP and a plain text editor.
I’m not sure why that would block backups, was the security plugin also providing backups?