• Hi

    Ever since I changed web hosts and moved my two sites (both running a particular WordPress web directory plugin) from http to https URLs, a number of odd things have happened, especially on one of them. Months after the move, it was only yesterday that one of the problems, apparently caused by faulty 301 redirect settings in an SSL plugin, was resolved.

    Anyway, the other problem is a set of URLs related to tabbed content which, as far as I remember, were never a problem before the http-to-https change; they now largely produce 500 server errors in search engine results, and I don’t think I even need them indexed.
    The affected links belong to a regional web directory, with links in categories. What matters is that the individual listings, like site.com/directory/listing-x, get indexed. Each such listing optionally has additional info in tabs (“related”, “nearby”, “reviews”, etc.) with URLs structured like:

    1. directory/listing/ethiopia-mission-geneva-ethiopian-embassy-switzerland?tab=nearby&view=list&category=0&center=46.222492%2C6.130753&zoom=16&is_mile=0&directory_radius=100&p=1#sabai-inline-content-nearby
    2. directory/listing/sologistics-gmbh?tab=reviews&sort=random#sabai-inline-content-reviews
    3. directory/listing/united-states-embassy-algiers?tab=related&p=2&category=0&zoom=16&is_mile=0&directory_radius=0&view=list#sabai-inline-content-related
      … and so on.

    It seems each listing like site.com/directory/listing/ethiopia-mission-geneva-ethiopian-embassy-switzerland has a huge number of generated URLs associated with it, one for each of the different tabs (“related”, “nearby”, “reviews”, etc.).
    I do not remember seeing these before the http => https change, and they now seem to feature far too prominently in search results instead of the simple listing URLs. I did NOT change any settings during the move.

    Anyway, I want to avoid the 500 errors associated with them. I would like some advice on how to do this (I’m not a coder). I’m thinking of preventing these tabbed-content links (for lack of a better word) from being indexed, probably via robots.txt. Is this a good idea? I don’t know where else it could be done.

    • I must emphasize that I’m not a coder. Can this be done in robots.txt by blocking anything like “?tab=nearby”, “?tab=reviews”, “?tab=related”, or just “?tab”, appended to a link like “site.com/directory/listing/ethiopia-mission-geneva-ethiopian-embassy-switzerland”, so that links like those above, which lead to countless crawl errors (500 internal server error), do not reach search engines?
    • If so, exactly what do I add to robots.txt so that similar tab-related links are not indexed and do not appear in search results?
    • What can be done to remove the hundreds, if not thousands, of similar but useless links already in search engine results?

    I would appreciate your precise guidance.

    Kind regards

Viewing 5 replies - 1 through 5 (of 5 total)
  • Without a link to one of your pages, it’s hard to say exactly. This could be a lot of work. I doubt very much that this is a change from http to https; it’s more likely something else changed when you changed hosts. Servers are unique, like snowflakes.

    I’m not a coder

    robots.txt is not that difficult. Take a look at

    https://www.robotstxt.org/robotstxt.html

    But again, without seeing your site, it’s hard to give a more precise solution.

    Use robots.txt to block the URLs, or use the Yoast plugin.

    Thread Starter xprt007

    (@xprt007)

    Hi
    @kjodle =>

    • “Without a link to one of your pages, it’s hard to say exactly.”
    • The site concerned is this one. The links are generated by a web directory plugin.

      If you prepend that site’s URL to any of the three example links in the first post, you will see that the page is not found, or you get a 500 internal server error or something similar.

      Server errors in Google Search Console have ballooned of late to over 1,200, and quite often site guests are led to them. I have no idea what’s making search engines index them; before the https change I did not have this.

      The robots.txt syntax may be easy, but there’s something I’m just not getting right.

      I saw in some German tutorial this:

      Disallow: /priv*/ # affects all sub-categories beginning with “priv”
      Disallow: /*priv*/ # affects all sub-categories which include “priv”

      So, to prevent indexing of ALL links (there are potentially hundreds or more) looking like ===> site.com/directory/listing/international-university-of-east-africa-iuea/related?view=map&category=0&zoom=15&is_mile=0&directory_radius=0&sort=newest&p=1 …

      … I thought either
      Disallow: /related?*/ # affects all sub-categories beginning with “related?”
      or
      Disallow: /*related?*/ # affects all sub-categories which include “related?”

      … would do, but a check with the Google robots.txt Tester shows the link “site.com/directory/listing/international-university-of-east-africa-iuea/related?view=map&category=0&zoom=15&is_mile=0&directory_radius=0&sort=newest&p=1” is “allowed”.
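      For what it’s worth, the likely reason the tester says “allowed” is that robots.txt patterns are matched from the beginning of the URL path: Disallow: /related?*/ can only block URLs whose path literally starts with /related?, and the trailing slash additionally requires a / after the wildcard. Reaching paths nested under /directory/listing/ needs a leading wildcard. A minimal sketch, assuming the goal is to block only the tab/map/sort variants and a crawler that honours Google-style * wildcards (note that robots.txt stops crawling, not indexing as such):

```
User-agent: *
Disallow: /*/related?
Disallow: /*/nearby?
Disallow: /*?tab=
```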

      So, what is the correct way to use robots.txt to prevent indexing of ALL links which look like:

    1. site.com/directory/listing/international-university-of-east-africa-iuea/related?view=map&category=0&zoom=15&is_mile=0&directory_radius=0&sort=newest&p=1
    2. site.com/directory/listing/beausejour-hotel-kigali/nearby?category=0&is_mile=0&directory_radius=100&view=map&sort=distance&is_drag=1&center=
    3. site.com/directory?view=map&category=0&zoom=15&is_mile=0&directory_radius=100&sort=newest&p=1
    4. site.com/directory/listing/fifa-com-zambia?tab=related&category=0&is_mile=0&directory_radius=0&view=map&sort=newest&is_drag=1&center=#sabai-inline-content-related
    5. site.com/directory/listing/maun-self-drive-4×4?tab=related&category=0&is_mile=0&directory_radius=100&view=map&sort=newest&is_drag=1&center=#sabai-inline-content-related

    … and have something appended beginning with components like “?tab”, “?view”, “nearby”, or “related?”?

    As I said, most of these end up with a 500 internal server error or something similar.

    I would like to assume one can, with some wildcard syntax or something else, prevent all similar links from getting indexed and appearing in search results. When I wrote the first post there were just a couple; now there are well over 1,200 and growing.
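    One way to see why the earlier patterns fail is to simulate the matching. This sketch is NOT Google’s implementation, just a rough approximation of its documented wildcard rules (“*” matches any run of characters, “$” anchors the end, and patterns are matched from the start of the URL path); the paths below are shortened versions of the example links:

```python
import re

def robots_pattern_matches(pattern: str, path: str) -> bool:
    """Rough simulation of Google-style robots.txt path matching:
    '*' matches any characters, '$' anchors the end, and the pattern
    is matched from the START of the URL path."""
    regex = "^"
    for ch in pattern:
        if ch == "*":
            regex += ".*"
        elif ch == "$":
            regex += "$"
        else:
            regex += re.escape(ch)
    return re.match(regex, path) is not None

# Without a leading wildcard the rule never reaches nested paths,
# which is why the tester reported the URL as "allowed":
print(robots_pattern_matches(
    "/related?",
    "/directory/listing/international-university-of-east-africa-iuea/related?view=map"))  # False

# With a leading wildcard the same URL is caught:
print(robots_pattern_matches(
    "/*/related?",
    "/directory/listing/international-university-of-east-africa-iuea/related?view=map"))  # True

# The ?tab= variants can be caught the same way:
print(robots_pattern_matches(
    "/*?tab=",
    "/directory/listing/fifa-com-zambia?tab=related&view=map"))  # True
```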

    I would appreciate your help.

    Thank you in advance

    Thread Starter xprt007

    (@xprt007)

    UPDATE to above post

    Someone in Google Webmasters Forum suggested this to a similar post there

    Try adding Disallow: /*?* in your robots file; it will block all such dynamic URLs containing ? from getting crawled. Use this only if you do not wish any URL with a ? parameter to get crawled.
    Also add Allow conditions for all .CSS and .JS files, as /*?* will also block some automatically generated CSS and JS asset URLs.

    Would this help, without blocking any useful files from getting indexed:


    Disallow: /*?*
    Allow: /*.js$
    Allow: /*.css$

    … ??

    If not, what exactly would help? I’m not an expert, so I would appreciate the exact code to deal with the issue in the post above.
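    One caveat worth checking before relying on those Allow lines: WordPress typically appends ?ver= query strings to theme and plugin assets, and a $-anchored rule like Allow: /*.css$ will not match /style.css?ver=4.9. A variant without the $ anchor may be safer (a sketch, not tested against every crawler; under Google’s longest-match precedence the longer Allow pattern should win over Disallow: /*?* for asset URLs):

```
User-agent: *
Disallow: /*?*
Allow: /*.css
Allow: /*.js
```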

    Thank you in advance …


  • The topic ‘Blocking some types of links from search engines’ is closed to new replies.