Exclude URL
-
I would like to exclude a specific shop category, e.g. by entering the category slug or a URL regex. (How) can I do this?
-
Tagging myself in because I’m looking for the same solution. Alternatively, I’d like to know how to exclude specific static pages.
Why? Because I create landing pages for ads that run on Facebook, Google, etc. I don’t want these pages indexed.
My initial thought was to use robots.txt to keep bots away from those pages, but I see two problems with that. First, since this plugin now controls robots.txt, will my directives be overwritten? Second, even if I exclude the pages via robots.txt, the plugin will still tell bots to index them by including them in the sitemap.
I’m in the same boat, so I’ll be watching this thread for an answer.
So there is a solution built in, though it’s a bit of a pain. You can submit a list of post and page numbers, separated by commas.
To find the page or post number, go to your list of pages or posts in the admin. Hover over the edit link and you will see the associated number in the URL. It will look something like “https://your-website.com/wp-admin/post.php?post=3390&action=edit”.
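If you have a lot of pages to exclude, copying the numbers by hand gets tedious. Here is a minimal Python sketch that pulls the ID out of each edit URL and builds the comma-separated list. The function names and the second URL are made up for illustration; only the URL shape comes from the example above.

```python
from urllib.parse import urlparse, parse_qs


def post_id_from_edit_url(url):
    """Return the numeric post/page ID from a wp-admin edit URL, or None."""
    query = parse_qs(urlparse(url).query)
    values = query.get("post", [])
    if values and values[0].isdigit():
        return int(values[0])
    return None


def exclusion_list(urls):
    """Build a comma-separated ID string from several edit URLs."""
    ids = [post_id_from_edit_url(u) for u in urls]
    return ",".join(str(i) for i in ids if i is not None)


edit_urls = [
    "https://your-website.com/wp-admin/post.php?post=3390&action=edit",
    # Hypothetical second page, only here to show the comma separation.
    "https://your-website.com/wp-admin/post.php?post=3412&action=edit",
]
print(exclusion_list(edit_urls))
```

The resulting string is what you would paste into the plugin's exclude field.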
That being said, I don’t think simply removing pages from the sitemap will prevent them from showing up on Google, because bots will follow any links they find. A better solution would be to put a noindex in the header tags of pages you don’t want to show up (<meta name="robots" content="noindex">).
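To confirm that a page really carries the noindex tag, something like the following Python sketch could be run against the page source. It only looks at the robots meta tag (not the X-Robots-Tag HTTP header), and the class and function names are my own, not part of any plugin.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # Directives are comma-separated, e.g. "noindex, nofollow".
            self.directives += [
                d.strip().lower() for d in content.split(",") if d.strip()
            ]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(has_noindex(page))
```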
Or you could use robots.txt to list the pages you don’t want indexed (it supports the * and $ wildcards there, though not full regex).
My problem is that my robots.txt has gone missing! It is absolutely there: I verified it via Google Search Console and I can browse to it (my-site.com/robots.txt). However, I cannot find the actual file in my public_html folder via my FTP client, nor directly through cPanel File Manager!
I suspect it has been moved and .htaccess is being used to redirect, but I can’t figure it out for the life of me!
@feznizzle @jb01
No need to post almost empty replies just to follow a topic. There’s a link in the sidebar “Subscribe” for that exact purpose. (On a narrow mobile screen the “sidebar” lives below the content.)
Ah, thanks for the tip!
Ok, I think I have found the solution!!!
I am putting this up to help anybody else who comes down this path, and for my own future reference (as I don’t currently have time to implement).
The solution is quite simple: Override the virtual robots.txt by creating a physical one!
Before you do that, check what is currently in your virtual file by browsing to it:
https://www.yoursite.com/robots.txt
In my case, the only thing there is:
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Sitemap: (a url to my site)
Sitemap: (a url to my site)
After copying that down, I will go disable the Google Sitemap XML option to “Add sitemap URL to the virtual robots.txt file.”
Then I will recreate the robots.txt, removing “Allow: /wp-admin/admin-ajax.php”.
Now to prevent indexing, all I have to do is find a commonality and add a disallow. In my case, all of my advertisement landing page URLs look like this:
mysite.com/fb-ad/001
mysite.com/fb-ad/002
mysite.com/fb-ad/003
So I will add this disallow:
Disallow: /fb-ad/
Yaaaaaaaaahooooooooo!
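Before uploading the new file, the rule can be sanity-checked locally. This is a small Python sketch using the standard library's robots.txt parser. The file contents mirror what is described above (minus the Sitemap lines), the /about/ URL is a made-up example of a normal page, and real crawlers may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The physical robots.txt I plan to upload (Sitemap lines left out here).
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /fb-ad/
"""


def is_crawlable(robots_txt, url, agent="*"):
    """Return True if the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


for url in (
    "https://mysite.com/fb-ad/001",
    "https://mysite.com/about/",  # hypothetical regular page
):
    print(url, "->", "allowed" if is_crawlable(ROBOTS_TXT, url) else "blocked")
```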
A WORD OF CAUTION: It is possible that the plugin does not obey the robots.txt, I haven’t tested this yet. If it doesn’t, then Google Sitemap XML will still be adding those pages to the sitemap it generates.
And that will cause conflict (console errors). I believe this to be highly unlikely (again, untested as yet), because it seems super basic to me that this plugin would first consult the robots.txt before creating a sitemap.
However, if the plugin does continue adding unwanted pages to the map then you will have to remember to *manually* add pages you don’t want indexed to the list via the builtin solution I described above.
I will try to remember to update this thread, after I have implemented this on the site I am currently working on. If you test this out before I get to it, please share your results.
Good luck, y’all!
FYI, the Google Sitemap XML plugin does NOT obey the physical robots.txt. You can still put a physical robots.txt to prevent indexing, but you will have to manually exclude posts/pages that you do not want indexed. Otherwise you will kick Google Console errors.
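Since the plugin ignores the physical robots.txt, it helps to find the conflicts yourself before Google reports them. A rough Python sketch, assuming you have saved the generated sitemap and your robots.txt as text; the sample data and the function name are invented for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.robotparser import RobotFileParser

# Standard sitemap namespace, needed to look up <url>/<loc> elements.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /fb-ad/
"""

# Made-up sitemap content standing in for what the plugin generates.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.com/about/</loc></url>
  <url><loc>https://mysite.com/fb-ad/001</loc></url>
</urlset>"""


def blocked_sitemap_urls(sitemap_xml, robots_txt, agent="*"):
    """List sitemap URLs that robots.txt forbids `agent` to fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    root = ET.fromstring(sitemap_xml)
    locs = [el.text.strip() for el in root.findall("sm:url/sm:loc", NS) if el.text]
    return [u for u in locs if not parser.can_fetch(agent, u)]


for url in blocked_sitemap_urls(SITEMAP_XML, ROBOTS_TXT):
    print("in sitemap but blocked:", url)
```

Every URL it prints is one you would still need to exclude manually in the plugin settings.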
Thank you very much @feznizzle for your detailed answers and update. The fact that the plugin doesn’t obey the physical robots.txt is absolutely sobering, since I am struggling with the fact that Google Sitemap XML contains URLs I don’t want there. Unfortunately, those URLs are not posts but product URLs. There are hundreds of them, and they change frequently, so adding post/page IDs doesn’t solve my problem.
I would hope that the plugin team has a solution, which could be rather simple: upon any sitemap update, disregard URLs which fulfill a certain pattern, e.g. contain a certain phrase.
If there’s no solution soon, I will uninstall the plugin as it causes more pain than gain for me!!
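To make the pattern idea concrete: this is not something the plugin offers, just a Python sketch of the filtering I have in mind, applied to a saved copy of the sitemap. The URLs, the pattern, and the function name are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Keep the default namespace on output instead of an "ns0:" prefix.
ET.register_namespace("", SITEMAP_NS)

# Made-up sitemap with one product URL that should be dropped.
SITEMAP_XML = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://mysite.com/product/blue-shirt/</loc></url>
  <url><loc>https://mysite.com/product-category/clearance/old-hat/</loc></url>
</urlset>"""


def filter_sitemap(sitemap_xml, pattern):
    """Return the sitemap XML without entries whose <loc> matches `pattern`."""
    regex = re.compile(pattern)
    root = ET.fromstring(sitemap_xml)
    # Iterate over a copy of the list because entries are removed in the loop.
    for url_el in list(root.findall(f"{{{SITEMAP_NS}}}url")):
        loc = url_el.find(f"{{{SITEMAP_NS}}}loc")
        if loc is not None and loc.text and regex.search(loc.text):
            root.remove(url_el)
    return ET.tostring(root, encoding="unicode")


print(filter_sitemap(SITEMAP_XML, r"/product-category/clearance/"))
```

Something equivalent inside the plugin, run on every sitemap update, would cover my case without maintaining a list of IDs.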
You’re welcome, @frankmarks.
Personally, I think XML sitemaps are overrated. Bots will crawl your site, hitting every page they find a link to except those that are forbidden. One great thing about XML sitemaps is they help bots find pages that might not be linked to directly from some other mappable page on your site.
There is a very simple workaround. Uninstall the XML Sitemap plugin and just add a link to a physical sitemap page in your footer (so that it is easily discoverable and regarded as important, since it appears in the footer). You can build that page yourself very easily, and control every link on there. Then put up your own robots.txt and call it a day.
Good luck!
Thank you for submitting your question.
Today, you can exclude a category by selecting the category from the “Exclude Categories” list.
Settings -> XML Sitemap. Scroll down to the section titled “Excluded Items.”
@auctollo thank you, but I’ve of course seen that section, and this setting only relates to the general WP categories. What I am asking for is a way to exclude a specific product category, or a URL / slug referencing such a category. If there’s a way to do that, I am happy to learn.
@auctollo I would also like to see an option to specify a URL to be excluded. These URLs are generated by the WP Category. I don’t want the parent category to be included in the sitemap. I do want all the posts associated with these categories to be included.
Thank you for your feedback.
We will take these suggestions back to the team and explore how they could fit into future releases of the plugin.
- The topic ‘Exclude URL’ is closed to new replies.