Appending any path to any post URL results in a 200 response instead of 404.
-
Here is a link to a Google Doc where we have listed all the details of our issue (it was easier to put everything in a separate document): https://docs.google.com/document/d/1WcIPGwCoJD42tw_7wD8p_alhYu53ITxK6o2cS64v11A/edit?usp=sharing
Please let us know if you would like us to elaborate on anything further.
-
My test site is set up in exactly the same way and I am unable to reproduce the effects you describe – any request to a URL that doesn’t exist returns a 404.
I note that your site is using both Varnish and Cloudflare. Could either of these be involved? Do you have some simpler, reproducible examples that show a redirect working correctly alongside a URL that you think is behaving incorrectly?
Hey John,
Thanks for the quick response. In the first line of your reply you say that “any request to a URL that doesn’t exist returns a 404”, but that is not our issue. The issue is that appending any path to any post URL results in a 200 response instead of a 404. Something in the plugin is causing every URL with a redirect to return a 200 once any text is added to the end or the middle of the URL. This leads visitors to a page that doesn’t really exist: an overview of all our recent articles, rather than the 404 page.
(Note: Cloudflare and Varnish are not in play, because we have reproduced the issue on a dev environment that they cannot affect.)
The redirects that are implemented work. However, when you take any URL and append anything to the end or the middle of it, it doesn’t lead to a 404; it results in a 200. This seems to happen on any URL we have set up through Redirection, which is almost all of our posts, apart from old posts whose URL we changed without adding a redirect. The main reason we use the Redirection plugin is its permalink feature, which lets us include the article’s categories in each URL.
Here is one of the redirects:
https://attorneyatlawmagazine.com/unveiling-the-mystery-a-deep-dive-into-expert-tire-failure-analysis → https://attorneyatlawmagazine.com/from-the-expert/unveiling-the-mystery-a-deep-dive-into-expert-tire-failure-analysis
The redirect works, but if you append anything to the end of the URL, such as /ahsjdhs:
https://attorneyatlawmagazine.com/from-the-expert/unveiling-the-mystery-a-deep-dive-into-expert-tire-failure-analysis/ahsjdhs
this doesn’t 404; it leads to a page with an overview of recent articles that doesn’t actually exist. If you instead insert the random path in the middle of the slug, it does return a 404: https://attorneyatlawmagazine.com/from-the-expert/unveiling-the/ahsjdhs-mystery-a-deep-dive-into-expert-tire-failure-analysis
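For anyone trying to reproduce the comparison above, here is a minimal sketch (Python 3 standard library only) that builds the "appended" and "mid-slug" variants of a post URL and prints the raw status code of each without following redirects, so a 301/302 can be told apart from a 200. The helper names, the default `ahsjdhs` junk segment, and splitting the slug after two words are illustrative assumptions, not part of Redirection or WordPress.

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import urlsplit, urlunsplit


def probe_variants(post_url: str, junk: str = "ahsjdhs", split_after: int = 2) -> dict:
    """Hypothetical helper: original URL, junk appended, and junk inserted mid-slug."""
    parts = urlsplit(post_url)
    path = parts.path.rstrip("/")
    head, _, slug = path.rpartition("/")
    words = slug.split("-")
    # Keep at least one slug word before the junk segment.
    k = max(1, min(split_after, len(words) - 1))
    mid_path = f"{head}/{'-'.join(words[:k])}/{'-'.join([junk] + words[k:])}"
    paths = {"original": path, "appended": f"{path}/{junk}", "mid-slug": mid_path}
    return {name: urlunsplit(parts._replace(path=p)) for name, p in paths.items()}


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Surface 3xx responses as HTTPError instead of following them."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def status_of(url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code for url, without following redirects."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx (not followed), 4xx and 5xx all arrive here


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, url in probe_variants(sys.argv[1]).items():
        print(f"{status_of(url)}  {name:<9} {url}")
```

Run it as `python probe.py <post-url>`. On a site behaving normally, both junk variants should print 404. Results seen through Cloudflare or Varnish may differ from the origin server, so a plain dev copy gives the clearest answer.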
Hundreds of thousands of URLs that don’t exist are being indexed and picked up in Google Search Console, and we think the appended paths not returning a 404 is somehow causing Google to pick them up:
- https://attorneyatlawmagazine.com/public-articles/personal-injury/medical-malpractice/jserrors/metrics/session_trace/jserrors/spa/metrics/aggregate/page/498 (Some URLs contain segments that make no sense, like this one)
- https://attorneyatlawmagazine.com/legal/opinion/u-s-bop-the-least-accountable-agency’A=0%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1263%2Fpage%2F1271%2Fpage%2F2%2Fpage%2F1270%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1252%2Fpage%2F1252%2Fpage%2F1252%2Fpage%2F1252%2Fpage%2F1252%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1252%2Fpage%2F2%2Fpage%2F1252%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1252%2Fpage%2F1252%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F1253%2Fpage%2F2%2Fpage%2F1254%2Fpage%2F2%2Fpage%2F505%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2%2Fpage%2F498%2Fpage%2F498%2Fpage%2F498%2Fpage%2F2%2Fpage%2F498%2Fpage%2F498%2Fpage%2F2%2Fpage%2F498%2Fpage%2F2%2Fpage%2F498%2Fpage%2F2%2Fpage%2F498%2Fpage%2F499%2Fpage%2F499%2Fpage%2F499%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2%2Fpage%2F499%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2%2Fpage%2F499%2Fpage%2F498%2Fpage%2F2%2Fpage%2F498%2Fpage%2F498%2Fpage%2F2%2Fpage%2F2%2Fpage%2F2/page/2 (Some get insanely long)
- https://attorneyatlawmagazine.com/talk-of-the-town/minnesota-news/winthrop-weinstines-david-tanabe-selected-for-the-2022-blackshear-presidential-fellowship/page/497 (Some show pagination that never existed, like page 497 of an article that is only one page long. We also removed this article a while ago, yet URLs still show up for older articles like this one that have been drafted or removed completely)
Sure, but at the moment I’m unable to reproduce this problem, and it doesn’t appear to be affecting anyone else, so I have no other instances to compare against.
“the error of the URLs with appended paths not going to a 404 is somehow causing Google to pick them up”
No, this wouldn’t happen. Google will only index URLs that are linked from your site or from external sites. Whether or not Redirection is affecting the status of the pages, it would not (and in fact cannot) alter the content of pages to insert links to nonexistent ones.
Maybe you can export your redirects and send them to me, along with the permalink migration settings you use? That might trigger the situation so I can reproduce it.
Hey John,
I have responded to your latest feedback, with screenshots and exports, at the bottom of the original document: https://docs.google.com/document/d/1WcIPGwCoJD42tw_7wD8p_alhYu53ITxK6o2cS64v11A/edit?usp=sharing
Let me know if this helps you recreate the issue or diagnose it further.
I’m not a fan of putting things in a Google Doc. It’s confusing, takes much longer to read, and will eventually disappear and leave this thread broken.
“The goal is to figure out how the URLs are even getting made in the first place. We feel that if we can get the appended paths to return a 404, this would solve the issue.”
I don’t think it will solve the problem. These seem like two separate issues to me. Something is creating bad links and something is making invalid requests return content.
Even with your exact settings and permalinks I am unable to reproduce this problem. You say that external factors are not part of this, yet all the examples you show use Cloudflare, Varnish, and many other plugins (including Rank Math). I still suspect that something specific to your site is happening in your environment, as this should otherwise be easy to reproduce.
Alright, I will try to keep the rest of this conversation out of the Google Doc; I was only using it for the pictures.
We understand that you think it might be another plugin or system, but we have done some diagnosing that all but confirms Redirection is causing the 404 issue.
Here is a copy of the site with only Redirection enabled: https://redirectplu-attorney-at-law-magazine.pantheonsite.io/. When we go to one of the example URLs that should 404, it does not: https://redirectplu-attorney-at-law-magazine.pantheonsite.io/public-articles/personal-injury/medical-malpractice/ajax/spa/spa/page_action/page_view_timing/page_view_event/aggregate/page/2
Here is a copy of the site with no plugins enabled: https://stripedwpon-attorney-at-law-magazine.pantheonsite.io/. When we go to the same example URL, it correctly returns a 404: https://stripedwpon-attorney-at-law-magazine.pantheonsite.io/public-articles/personal-injury/medical-malpractice/ajax/spa/spa/page_action/page_view_timing/page_view_event/aggregate/page/2
Live site with all plugins enabled: https://attorneyatlawmagazine.com/
Specific URL example: https://attorneyatlawmagazine.com/public-articles/personal-injury/medical-malpractice/ajax/spa/spa/page_action/page_view_timing/page_view_event/aggregate/page/2
In a dev environment like the first two examples, the issue still exists. This dev environment does not involve Pantheon/Varnish or Cloudflare, and we disabled all plugins, including Rank Math. As soon as the Redirection plugin is enabled, the links stop returning 404s.
One more thing we found: the only links affected by the 404 issue seem to be URLs that include a subcategory. If the URL does not have a subcategory in it, it correctly returns a 404.
What can cause this if everything besides Redirection is disabled?
I just went in circles for the last two days because we had a site with the same problem. What fixed it for us was the permalink migration setting in the Site tab of the Redirection plugin settings. It contained the same tag as the normal WordPress permalinks settings page. When I removed it from the Redirection plugin settings, the posts correctly showed the 404 again.
I have had this same issue for a long time, with some VERY screwy URLs showing up in Google Search Console. I only recently figured out that it was happening only for posts, and as @dzynit suggested, removing %postname% from my redirection migration tab finally resolved it. Try that @jacraver and see…
***Update: well, @johnny5, that’s apparently not a good solution, because now none of the migrated permalinks work any more, and the site has a lot of 404s.
@cyber49 yeah, I imagine removing %postname% from the Redirection migration tab would essentially break all the redirects themselves. It returns a 404 because the redirect has been removed entirely.
@dzynit if yours matched, I can see how that would cause some issues. However, did you remove the tag entirely from the Redirection migration tab, or did you make them match correctly?
For instance, we are using the redirect migration function to redirect all of our old post structure, https://attorneyatlawmagazine.com/%postname%, to our new post structure, https://attorneyatlawmagazine.com/%category%/%postname%.
I do believe the break is within the permalink migration functionality, but I don’t think removing %postname% is the answer for me. Mine don’t match the way you describe either, @dzynit, so I am not sure that is the solution to our problem.
Let’s keep putting our heads together to see if we can figure this out. We have had this issue for well over a year.
Yes, removing the permalink migration will stop it from functioning.
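To make the migration above concrete: the idea is that a request which doesn't resolve under the current structure (/%category%/%postname%) is tested against the old structure (/%postname%), and if the slug belongs to a post, the visitor is redirected to that post's current URL. The Python below is a toy model of that idea under stated assumptions, not Redirection's actual implementation; `POSTS`, `TAGS`, `structure_to_regex`, and `migrate` are hypothetical, and only the two tags used in this thread are handled.

```python
import re
from typing import Optional

# Hypothetical data standing in for the WordPress database: post slug -> category path.
POSTS = {
    "unveiling-the-mystery-a-deep-dive-into-expert-tire-failure-analysis": "from-the-expert",
}

# Regex fragments for the two permalink tags used in this thread.
TAGS = {
    "%postname%": r"(?P<postname>[^/]+)",
    "%category%": r"(?P<category>.+?)",
}


def structure_to_regex(structure: str) -> "re.Pattern[str]":
    """Compile a permalink structure such as '/%postname%' into an anchored regex."""
    pattern = re.escape(structure.strip("/"))  # '%' and '/' are not escaped (Python 3.7+)
    for tag, fragment in TAGS.items():
        pattern = pattern.replace(tag, fragment)
    return re.compile(rf"^/{pattern}/?$")


def migrate(path: str, old_structure: str = "/%postname%") -> Optional[str]:
    """Return the /%category%/%postname% URL for an old-style path, or None."""
    match = structure_to_regex(old_structure).match(path)
    if match is None:
        return None  # not an old-style URL: leave it to WordPress
    slug = match.group("postname")
    category = POSTS.get(slug)
    if category is None:
        return None  # no post with that slug: WordPress should 404 as normal
    return f"/{category}/{slug}"
```

In this model, a path with extra segments (such as one ending in /ahsjdhs) simply fails to match the old structure and is left alone.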
The last post from @jacraver had a URL that I was able to use to reproduce the problem locally. Somehow the URL triggers WP to match it as a category archive. The plugin only handles posts and so ignores this, but WP then carries on as though it was a category archive.
I’ve put together a potential fix for this in https://github.com/johngodley/redirection/releases/tag/5.5.1-prerelease with the only change being to reset WP after the category is ignored. This seems to fix it on my test site, but I don’t know if it’s the same situation as yours. If you can give it a try and let me know then I’ll include it in the next release.
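The explanation above describes a general pattern: if a step temporarily re-parses the request and then decides to ignore the result, it has to restore the previous state, otherwise the leftover result is what gets rendered (here, a category archive served with a 200 instead of a 404). A toy Python model of that pattern follows, assuming a hypothetical `QueryState` object in place of WordPress's global query and a deliberately crude rule (a path ending in /page/N is read as a category archive). None of this is Redirection or WordPress code.

```python
from dataclasses import dataclass, field


@dataclass
class QueryState:
    """Hypothetical stand-in for WordPress's global query state."""
    kind: str = "404"  # what would be rendered: "404", "single" or "category"
    query_vars: dict = field(default_factory=dict)


def parse_as_old_structure(path: str, state: QueryState) -> None:
    """Hypothetical re-parse that overwrites the shared state in place."""
    parts = path.strip("/").split("/")
    if len(parts) >= 3 and parts[-2] == "page" and parts[-1].isdigit():
        # Crude illustrative rule: ".../page/N" is read as a paged category archive.
        state.kind = "category"
        state.query_vars = {"category_name": "/".join(parts[:-2]), "paged": int(parts[-1])}
    else:
        state.kind = "single"
        state.query_vars = {"name": parts[-1]}


def migration_check(path: str, state: QueryState, reset: bool) -> bool:
    """Return True if the old structure matched a post; only posts are handled."""
    saved = (state.kind, dict(state.query_vars))
    parse_as_old_structure(path, state)
    if state.kind == "single":
        return True  # a real implementation would redirect to the post here
    if reset:
        # The fix: after ignoring a non-post match, put the original state back.
        state.kind, state.query_vars = saved
    return False


path = "/public-articles/personal-injury/medical-malpractice/ajax/spa/spa/page_action/page_view_timing/page_view_event/aggregate/page/2"
for reset in (False, True):
    state = QueryState()
    migration_check(path, state, reset)
    print(f"reset={reset}: renders {state.kind!r}")
```

Without the reset the model ends up rendering 'category' (the archive page with a 200); with it, the original '404' state survives.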
Hey John,
So we implemented that on this test site: https://rebeta1-attorney-at-law-magazine.pantheonsite.io/public-articles/personal-injury/medical-malpractice/ajax/spa/spa/page_action/page_view_timing/page_view_event/aggregate/page/2
and that URL now returns a 404, as do all the other links that Google was indexing. We think that fixes the issue. We have not implemented it on the live site yet.
Does it look like that to you from the link above?
Thank you very much for taking the time to dive into this.
Hey John (@johnny5),
Alright, I think that solved the issue of random URLs not returning a 404, but now we are noticing some other interesting issues, this time with the category and subcategory sections of the URL.
Take this URL for instance: https://rebeta1-attorney-at-law-magazine.pantheonsite.io/legal-marketing/networking/best-legal-conferences-to-attend
You can replace either of the two category sections (“legal-marketing” or “networking”) with random characters, or add random characters to them, and it will redirect you right back to the article, almost as if the categories don’t matter at all.
Ex: https://rebeta1-attorney-at-law-magazine.pantheonsite.io/legal-marketing/ashdhdfsjfd/best-legal-conferences-to-attend, https://rebeta1-attorney-at-law-magazine.pantheonsite.io/ahsdsdfhdfd/networking/best-legal-conferences-to-attend or https://rebeta1-attorney-at-law-magazine.pantheonsite.io/legal-marketing/networkingashdjdskfdg/best-legal-conferences-to-attend
However, if you add anything to the article slug section rather than a category section, it returns a 404: https://rebeta1-attorney-at-law-magazine.pantheonsite.io/legal-marketing/networking/best-legal-conferences-to-ahsdsfdjfhjdsattend
Do you know why replacing the category and subcategory sections does not return a 404?
Redirection is not performing the redirect of https://rebeta1-attorney-at-law-magazine.pantheonsite.io/legal-marketing/ashdhdfsjfd/best-legal-conferences-to-attend; WordPress is.
@johnny5
If the change we implemented resets WP after the category is ignored, then it seems like you can enter anything in the category sections: when you press Enter, it essentially ignores the category, resets WP, and then finds the normal path through the Redirection plugin.
How would WordPress be implementing this redirect by itself?
“How would WordPress be implementing this redirect by itself?”
WordPress performs its own checking and will redirect to what it thinks is the nearest page. If the category is incorrect but the page slug is correct, it redirects to the full URL for that page.
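WordPress handles this in its own canonical-redirect logic (`redirect_canonical()` in core). The behaviour described above can be approximated with the toy model below: ignore the category segments, look up the final slug, and redirect to the post's canonical URL if the request differs from it. It is a simplification under stated assumptions, with a hypothetical in-memory `POSTS` table instead of the database, and it does not reproduce WordPress's actual matching rules.

```python
from typing import Optional

# Hypothetical table standing in for the database: post slug -> canonical category path.
POSTS = {"best-legal-conferences-to-attend": "legal-marketing/networking"}


def canonical_redirect(path: str) -> Optional[str]:
    """Return the URL to redirect to, or None to serve the request (or 404) as-is."""
    segments = [s for s in path.strip("/").split("/") if s]
    if not segments:
        return None
    slug = segments[-1]  # only the final (slug) segment is looked up
    category = POSTS.get(slug)
    if category is None:
        return None  # unknown slug: nothing to guess, so a normal 404
    canonical = f"/{category}/{slug}"
    # Category segments are ignored: any mismatch is "corrected" by redirecting.
    return None if path.rstrip("/") == canonical else canonical
```

The same model is consistent with the last example in the thread: junk inside the slug means the lookup finds nothing, so there is no redirect and the request 404s.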