• Resolved ps4pro

    (@ps4pro)


    Hi all,

    I’m having trouble getting my wp-content URLs noindexed. Since my host messed something up, a lot of junk URLs have been indexed. Consider the following example:

    https://gameconsole-aanbieding.nl/wp-content/plugins/js_composer/assets/lib/bower/zoom/

    I basically want either a meta robots noindex on the page (HTML) or an X-Robots-Tag noindex HTTP response header on every URL that contains wp-content. I’ve struggled for days to get this done and searched the web for solutions, without any luck. Can anyone help me out with this one?

    Thanks and happy new year,
    PS4Pro

Viewing 6 replies - 1 through 6 (of 6 total)
  • Hi,

    We see in your robots.txt file you have many rules, some of which are in conflict with each other: https://gameconsole-aanbieding.nl/robots.txt

    We suggest using the default rules below. We say this because Google needs access to all other folders and files on your site in order to understand your pages. This guide explains how to edit the robots.txt: https://kb.yoast.com/kb/how-to-edit-robots-txt-through-yoast-seo/. This guide explains more about how and what to have in your robots.txt file: https://yoast.com/ultimate-guide-robots-txt/.

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    You are welcome to add other rules to your robots.txt, such as

    Noindex: /wp-content/
    Noindex: /wp-includes/

    But we are not sure what the expected outcome will be. You may wish to monitor your Google Search Console to see what happens.

    Thread Starter ps4pro

    (@ps4pro)

    Hi Pcosta88,

    Thanks for your comment. I’ve been experimenting with the noindex rule in robots.txt. It’s not officially supported by Google, but I’ve seen it work in the past. However, it prevents both crawling and indexing. The wp-content files need to be crawled, though, because they are necessary to render the pages.

    That’s why I want to look into other ways to noindex them: a meta robots noindex or an X-Robots-Tag header. Any tips on how I can add either?

    Thanks in advance!

    Saša

    (@stodorovic)

    This isn’t related to the Yoast SEO plugin, but I’ll try to help. There are a few solutions:

    • Custom code in an index.php file for each directory (they often don’t contain an index.php). My advice is to send a 403 or 404 for these directories. You can use the PHP function header() to send custom headers. It isn’t a good solution because plugin updates will overwrite your changes.
    • Adding Options -Indexes to .htaccess. It prevents the web server from generating file listings for directories; instead of a file list, the server sends a 403 error.
    • Custom rewrite rules in .htaccess which allow serving only a few file types from wp-content (JS, CSS, images, …). This also improves security because it prevents direct access to plugins’ PHP files (if some plugin requires it, you can add extra rules). Example:
      # Serve only existing static files from wp-content and wp-includes
      RewriteCond %{REQUEST_FILENAME} -f
      RewriteRule ^wp-(content|includes)/([^/]+/)*([^/.]+\.)+(jp(e?g|2)?|png|gif|bmp|ico|css|js|swf|xml|xsl|html?|mp(e?g|[34])|avi|wav|og[gv]|xlsx?|docx?|pptx?|gz|zip|rar|pdf|xps|7z|[ot]tf|eot|woff2?|svg|od[tsp]|flv|mov)$ - [L]
      # Everything else under these directories (listings, PHP files) gets a 404
      RewriteRule ^wp-(content|includes|admin/includes)/ - [R=404,L]
      

    The last solution also allows adding a custom PHP script which sends whatever headers (and HTML) you want, but I think it’s better to use a 403 or 404 status.

    After applying one of the previous solutions, you should manually remove all undesirable files from the Google index with the Remove outdated content tool. It isn’t easy, but you can do it in a few iterations.

    Thread Starter ps4pro

    (@ps4pro)

    Hi Saša,

    Thanks for your response; I like your last option. However, a 403 or 404 is really not an option because the files are needed for Google to render the page. My only option really is noindex. Do you maybe have any code I can use in .htaccess to adjust the HTTP response headers so that it serves a noindex via the X-Robots-Tag?

    Thanks.

    Saša

    (@stodorovic)

    URLs such as wp-content/plugins/js_composer/assets/lib/bower/zoom/ and similar directory listings aren’t needed for any rendering. Many web servers set Options -Indexes by default.

    I use similar rules and I haven’t noticed negative effects on SEO. Particular JS/CSS files (e.g. jquery.zoom.min.js) are needed for rendering, and they remain accessible after applying the rules. If you want to send X-Robots-Tag for JS/CSS files, you could use the following rules in .htaccess:
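
    For reference, turning listings off is a single directive. This is a minimal sketch, assuming Apache and a host that allows the Options override in .htaccess (AllowOverride Options or All); on a host that forbids it, the directive causes a 500 error instead.

    ```apache
    # Root .htaccess: stop Apache from generating directory listings.
    # A directory URL without an index file then returns 403 instead of a file list.
    Options -Indexes
    ```

    Individual files inside those directories (JS, CSS, images) stay reachable; only the generated listing pages go away.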

    <FilesMatch "\.(js|css)$">
      Header set X-Robots-Tag "noindex"
    </FilesMatch>
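
    If the header should cover every URL under wp-content rather than only JS/CSS, one possible variant is a separate .htaccess placed inside wp-content/ itself. This is only a sketch: the file placement is an assumption, and it relies on mod_headers being enabled.

    ```apache
    # wp-content/.htaccess (hypothetical placement)
    # Applies to this directory and everything below it, including listings.
    <IfModule mod_headers.c>
      Header set X-Robots-Tag "noindex"
    </IfModule>
    ```

    Note that this also covers wp-content/uploads, so images served from there would be marked noindex as well.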
    

    Unfortunately, I don’t have a script which sends custom content in this context. Maybe I could write something, but I’m busy with some projects at the moment. I’ll update this topic if I write something.

    I highly recommend that you manually remove the directories, because it’s the fastest way to get this content out of the Google index. If you keep status 200 for these directories but without “good” HTML content, a soft 404 error is possible. Adding “noindex” could help a little, but I’m not sure that’s the correct solution.

    If you still want to send custom headers, you could try setting an environment variable via RewriteRule (E=noindex:1 instead of R=404) and using mod_headers to send a custom header based on the value of that environment variable. I haven’t tested it, but it should work.
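
    A rough sketch of that idea, untested like the description above. It assumes mod_rewrite and mod_headers are both enabled and that the rules sit in the root .htaccess before the WordPress block; the variable name noindex is just the one used in this thread.

    ```apache
    <IfModule mod_rewrite.c>
      RewriteEngine On
      # Flag every request under wp-content or wp-includes.
      # "-" leaves the URL unchanged; E= sets the environment variable.
      RewriteRule ^wp-(content|includes)/ - [E=noindex:1]
    </IfModule>

    <IfModule mod_headers.c>
      # Send the header only when the flag above was set.
      Header set X-Robots-Tag "noindex" env=noindex
      # After an internal redirect Apache prefixes the variable with REDIRECT_.
      Header set X-Robots-Tag "noindex" env=REDIRECT_noindex
    </IfModule>
    ```

    The files keep returning status 200, so they stay crawlable for rendering. Checking the response headers of one affected URL (for example with curl -I) would show whether the header is actually being sent.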

    Thread Starter ps4pro

    (@ps4pro)

    Never thought about just removing the directories. I thought Google needed them for rendering, but they do not. Thanks a lot, Options -Indexes is a great solution!
