• For some reason Bing is unable to crawl my WordPress site. I’ve edited the robots.txt to allow Bing, and just about everything else, but Bing Webmaster Tools returns nothing but 403 errors (although it does seem to have successfully indexed the sitemap). Any idea what I’m doing wrong or how I can fix this issue? The site doesn’t show up in search results at all.
    Thank you in advance!

Viewing 15 replies - 1 through 15 (of 19 total)
  • Moderator James Huff

    (@macmanx)

    WordPress by itself does not block any bots unless you set it to discourage search engines via Settings -> Reading.

    What is the URL of your site? Are you running any plugins that specifically block bots or are generally for security?

    Are you blocking any bots, IPs, or user-agents via .htaccess?

    Is your hosting provider using mod_security?

    Are you using CloudFlare?

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    The URL of the site is https://www.ourjewishcommunity.org

    I don’t believe we are running any security plugins. We use Yoast’s WordPress SEO plugin.

    Our robots.txt is (Bing Bot is last):

    User-agent: *
    Allow: /
    Crawl-Delay: 10
    Disallow: /wp-admin
    Disallow: /wp-includes
    Disallow: /wp-content/plugins
    Disallow: /wp-content/cache
    Disallow: /wp-content/themes
    Disallow: /index.php
    Disallow: /wp-activate.php
    Disallow: /wp-app.php
    Disallow: /wp-blog-header.php
    Disallow: /wp-comments-post.php
    Disallow: /wp-config.php
    Disallow: /wp-cron.php
    Disallow: /wp-links-opml.php
    Disallow: /wp-load.php
    Disallow: /wp-login.php
    Disallow: /wp-mail.php
    Disallow: /wp-pass.php
    Disallow: /wp-register.php
    Disallow: /wp-settings.php
    Disallow: /wp-signup.php
    Disallow: /wp-trackback.php
    Disallow: /xmlrpc.php
    Disallow: /discuss/applications
    Disallow: /discuss/cache
    Disallow: /discuss/conf
    Disallow: /discuss/js
    Disallow: /discuss/library
    Disallow: /discuss/locales
    Disallow: /discuss/plugins
    Disallow: /discuss/themes
    Disallow: /discuss/uploads
    Disallow: /discuss/profile
    Disallow: /discuss/dashboard
    Disallow: /*.php$
    Disallow: /*.js$
    Disallow: /*.inc$
    Disallow: /*.css$
    Disallow: /*.gz$
    Disallow: /*.tar$
    Disallow: /*.tgz$
    Disallow: /*.cgi$
    Disallow: /*.xhtml$
    Disallow: /feed # Blocks the Blog Feed
    Disallow: */feed # Blocks the individual Post Feed
    Disallow: /comments # Blocks the Comments URL
    Disallow: */comment-* # Blocks the Comments Permalinks and Comment Pages
    Disallow: */trackback # Blocks the Trackback URL for posts
    Disallow: /*? # Blocks the dynamically generated contents
    Allow: /wp-content/uploads/

    # Google Image (Allows Google to index uploaded Images)
    User-agent: Googlebot-Image
    Disallow:
    Allow: /*

    # Google AdSense (Allows Google Adsense to determine your content)
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    # Digg Mirror (Blocks Digg to Crawl your blog)
    User-agent: duggmirror
    Disallow: /

    # Archive.ORG (Blocks Archive.org)
    User-agent: ia_archiver
    Disallow: /

    User-agent: msnbot
    Allow: /

    User-agent: BingBot
    Allow: /

    Our .htaccess is

    # BEGIN WordPress
    <IfModule mod_rewrite.c>
    RewriteEngine On
    RewriteBase /
    RewriteRule ^index\.php$ - [L]
    RewriteCond %{REQUEST_FILENAME} !-f
    RewriteCond %{REQUEST_FILENAME} !-d
    RewriteRule . /index.php [L]
    </IfModule>

    # END WordPress

    Hope this gives some clues. Thank you for your help.

    Moderator James Huff

    (@macmanx)

    Did you get this from Yoast’s plugin? There are several things that you should remove.

    Disallow: /*? # Blocks the dynamically generated contents

    This is the big one right here, the snake in the grass. WordPress is a dynamic blogging platform. Everything is at /*?. So, you’re blocking every bot from everything. I’m surprised more bots aren’t reporting 403 errors. They’re probably just giving up and not even logging anything.

    Disallow: /index.php

    index.php is essentially what generates your site, don’t block that either.

    Disallow: /*.php$

    Likewise with all the file-specific disallows like this, you’re preventing the bots from actually loading pages.

    Disallow: /feed # Blocks the Blog Feed
    Disallow: */feed # Blocks the individual Post Feed

    Your feed is actually a sitemap for all search engines, don’t block them from it.

    Allow: /

    Anything with this is redundant. If you aren’t blocking them, you are already allowing them.

    Regarding the specific wp-[blah].php files and blocks for specific file extensions, search engines are very smart and so is WordPress. They won’t find or index these things (with the exception of wp-login.php, which is sometimes revealed by some themes).

    Also, remember that bots are not obligated to respect robots.txt. The good ones will respect robots.txt, but the bad ones will use it as a list of things you don’t want them to find (and that they therefore should find).
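To see which paths the wildcard rules quoted above would catch, here is a minimal Python sketch. It assumes the wildcard dialect that the major search engines document (`*` matches any run of characters, a trailing `$` anchors the end of the URL) and looks only at Disallow rules, ignoring Allow precedence. The names `rule_to_regex`, `is_blocked`, and `OLD_RULES` are made up for this illustration.

```python
import re

# A few of the Disallow rules from the original robots.txt.
OLD_RULES = ["/*?", "/index.php", "/*.php$", "/feed", "*/feed"]


def rule_to_regex(pattern: str) -> re.Pattern:
    """Translate one robots.txt path pattern into a compiled regex.

    Assumption: '*' matches any run of characters and a trailing '$'
    anchors the end of the path; everything else is a literal prefix.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces and join them with '.*' where '*' stood.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))


def is_blocked(path: str, disallow_rules: list[str]) -> bool:
    """Return True if any Disallow rule matches the path (Allow is ignored)."""
    return any(rule_to_regex(rule).match(path) for rule in disallow_rules)
```

Under these assumptions, `is_blocked("/?p=123", OLD_RULES)` is true because `/*?` catches any URL with a query string, and a post feed such as `/2014/07/hello/feed/` is caught by `*/feed`, while a plain permalink such as `/about/` is not matched by any of these five rules.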

    Try this instead for the entire content of your robots.txt file (I’m not sure what’s required by Discuss, so I’m leaving that as-is, but do check for similar things as above):

    User-agent: *
    Disallow: /wp-admin/
    Disallow: /wp-includes/
    Disallow: /wp-content/plugins/
    Disallow: /wp-content/cache/
    Disallow: /wp-content/themes/
    Disallow: /wp-login.php
    Disallow: /discuss/applications
    Disallow: /discuss/cache
    Disallow: /discuss/conf
    Disallow: /discuss/js
    Disallow: /discuss/library
    Disallow: /discuss/locales
    Disallow: /discuss/plugins
    Disallow: /discuss/themes
    Disallow: /discuss/uploads
    Disallow: /discuss/profile
    Disallow: /discuss/dashboard
    
    # Digg Mirror (Blocks Digg to Crawl your blog)
    User-agent: duggmirror
    Disallow: /
    
    # Archive.ORG (Blocks Archive.org)
    User-agent: ia_archiver
    Disallow: /

    In the above, I left the blocks for Digg and Archive.org in, but I highly recommend removing those blocks. There’s no reason to make your site harder to find on popular services.
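If you want to sanity-check the suggested file before uploading it, Python's standard `urllib.robotparser` can evaluate it offline. This is only a sketch: the rules are a trimmed copy of the file above (the /discuss/ lines are left out for brevity), the helper name `allowed` is made up, and the standard-library parser does plain prefix matching, which is enough here because the suggested file has no wildcards.

```python
from urllib.robotparser import RobotFileParser

# Trimmed copy of the suggested robots.txt (the /discuss/ rules are omitted).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Disallow: /wp-login.php

# Archive.ORG (Blocks Archive.org)
User-agent: ia_archiver
Disallow: /
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if the rules above let this user agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

For example, `allowed("bingbot", "/")` is true and `allowed("bingbot", "/wp-admin/")` is false, while `allowed("ia_archiver", "/")` is false for as long as the Archive.org block stays in the file.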

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    I will try this! Thank you!

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    I updated the robots.txt.

    I am now getting this from Bing Webmaster Tools when I try to fetch:

    URL: https://ourjewishcommunity.org/
    Status: The HTTP Status of 4xx was received.
    HTTP/1.1 403 Forbidden
    Connection: close
    Date: Thu, 03 Jul 2014 17:23:01 GMT
    Content-Length: 202
    Content-Type: text/html; charset=iso-8859-1
    Server: Apache

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>403 Forbidden</title>
    </head><body>
    <h1>Forbidden</h1>
    <p>You don't have permission to access /
    on this server.</p>
    </body></html>

    Thank you again for your help

    Moderator James Huff

    (@macmanx)

    Ok, which plugins are you running?

    Also, are you running mod_security, or using Cloudflare, Sucuri cloud proxy, or any other WAF?
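One way to narrow this down is to request the page twice, once with a Bingbot-style User-Agent and once with a browser-style one, and compare the status codes. The sketch below uses only Python's standard library. The helper names and the browser string are placeholders made up for this example. A block keyed on Bing's IP ranges rather than the User-Agent would not show up this way, so treat a clean result as inconclusive.

```python
import urllib.error
import urllib.request

# Bingbot's published User-Agent string, plus a placeholder browser-like string.
AGENTS = {
    "bingbot": "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)",
    "browser": "Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Request the URL with the given User-Agent and return the HTTP status."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are raised as exceptions; the code is what we want.
        return error.code


def compare_agents(url: str, agents: dict, fetch=fetch_status) -> dict:
    """Return {agent name: status code} for each User-Agent in `agents`."""
    return {name: fetch(url, user_agent) for name, user_agent in agents.items()}


def blocked_by_user_agent(statuses: dict) -> bool:
    """True when the bot-like request got a 403 but the browser-like one did not."""
    return statuses.get("bingbot") == 403 and statuses.get("browser") != 403
```

Calling `compare_agents("https://www.ourjewishcommunity.org/", AGENTS)` makes two live requests. If only the Bingbot entry comes back as 403, something on the server (a plugin, an .htaccess rule, or mod_security) is probably filtering on the User-Agent.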

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    Plugins:

    Akismet
    All In One SEO Pack (Inactive)
    Breadcrumb NavXT (Inactive)
    Captcha
    Dagon Design Sitemap Generator
    eCards
    Enable Media Replace
    Feed Template Customize
    Flash Video Player
    Meta Slider
    Navayan CSV Export (Inactive)
    Our Jewish Community Slider (Inactive)
    Preserved HTML Editor Markup
    Redirect All Types
    SEO Redirection
    ShareThis
    Wordfence Security
    WordPress SEO
    WP-DBManager
    WP-Paginate

    We’re not using any WAF, to the best of my knowledge.

    Thank you again for all of your help

    Moderator James Huff

    (@macmanx)

    For now, try switching off the following to see if it makes a difference:

    Redirect All Types
    SEO Redirection
    WordPress SEO

    All three could be redirecting or blocking the bot.

    If that doesn’t make a difference (and it may take a day or two to find out), it’s time to try disabling Wordfence for a day or two.

    Basically, you have 4 plugins which could be blocking or redirecting the Bing bot, so let’s start by ruling them out, and save the security plugin for last.

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    Thanks Mac,

    I’ll try this.

    Moderator James Huff

    (@macmanx)

    Excellent, let us know how it goes!

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    As of 1:45pm on Wednesday, the site still appears to be blocking Bing with

    Redirect All Types
    SEO Redirection
    WordPress SEO

    turned off. Should I wait another day or go for the security plugin?

    Thank you again

    Moderator James Huff

    (@macmanx)

    Normally, I would proceed to the next plugin, but since it’s a security plugin, give it one more day.

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    Should I turn the other stuff back on?

    Thread Starter KennethBraveOne

    (@kennethbraveone)

    I turned off the security plugin. I notice that now my robots.txt can’t be edited, and I see the message:

    Your backup folder is NOT writable
    To correct this issue, make the folder /home/ourjewis/public_html/wp-content/backup-db writable.

    Is that normal? Will it come back when I turn the plugins back on?

    Moderator James Huff

    (@macmanx)

    You should be able to edit your robots.txt file traditionally via FTP and a plain text editor.

    I’m not sure why that would block backups. Was the security plugin also providing backups?

  • The topic ‘WordPress is blocking Bing Bot from my site’ is closed to new replies.