FEATURE REQUEST: 410 Deleted message instead of 404 Not Found
-
I recently deleted a lot of posts, and it would be really cool if, when these posts are requested, WordPress could return a “Post Deleted” message instead of “Not Found” (with the right HTTP status code sent, naturally). That way people know what’s happening and search engines would be better able to update their indexes.
-
According to RFC2616, an HTTP return status of “410 Gone” means:
“This condition is expected to be considered permanent. Clients with link editing capabilities SHOULD delete references to the Request-URI after user approval. If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead.”
Going by that definition, isn’t the “404 Not Found” actually correct? In order to determine whether a post can be displayed (whether it exists or not) WP uses the function
get_posts()
to determine the various “criteria”, of which one rule is: Only those posts with a status of “publish” are contenders, and all others should rightfully be discarded.
Q: So then what’s the difference between (1) a post that doesn’t actually exist in the database, (2) one that does but is not yet published and (3) one that has been published in the past but then had its published status revoked?
A: Nothing.
I think that’s why “404 Not found” is actually correct as opposed to “410 Gone”. The former is temporary while the latter is permanent.
Deleting a post is a permanent removal of a resource.
You wrote:
Q: So then what’s the difference between (1) a post that doesn’t actually exist in the database, (2) one that does but is not yet published and (3) one that has been published in the past but then had its published status revoked?
A: Nothing.
Your answer is wrong on two counts. Firstly, from the point of view of a user requesting a post, there is clearly a difference between not being able to find a post because it doesn’t exist, and not being able to find a post because it once existed but has now been deleted.
Secondly, the standard you quoted does not intend 404 to be used for situations where a resource is temporarily missing. It’s to be used when a resource can’t be found, whether that’s temporary or not. 410 is intended for use when the server knows that the resource has been deleted.
Also, why would WP’s current mechanism for retrieving posts have anything to do with which status code is returned? Just because
get_posts()
currently only operates on posts which are in the database, doesn’t mean that it couldn’t be extended to check whether, for instance, posts have a label of “deleted”. It seems it already checks whether they exist, so why couldn’t it check whether they had existed in the past?
“Your answer is wrong on two counts. Firstly, from the point of view of a user requesting a post, there is clearly a difference between not being able to find a post because it doesn’t exist, and not being able to find a post because it once existed but has now been deleted.”
Not at all. That’s just your interpretation. My answer is no more wrong than your suggestion.
Can you explain just how there is a difference between the two? The way I see it is the two are identical because either way the post doesn’t exist at this point in time. That doesn’t mean that it didn’t exist in the past and it doesn’t mean that it won’t exist in the future.
“Secondly, the standard you quoted does not intend 404 to be used for situations where a resource is temporarily missing. It’s to be used when a resource can’t be found, whether that’s temporary or not. 410 is intended for use when the server knows that the resource has been deleted.”
Not quite. Read this bit again:
“If the server does not know, or has no facility to determine, whether or not the condition is permanent, the status code 404 (Not Found) SHOULD be used instead.”
There are three scenarios from the point of WP:
1. the post never existed
2. it still exists but its status is “published” and it’s marked for “public” viewing
3. it once existed but has been deleted
No matter which scenario you pick, the conclusion is exactly the same: the condition is not permanent because it can be rectified by an authorised human with a few minutes to spare.
Good luck getting WP changed to suit your way of thinking though. Something tells me you’ll have to hack it if you want it to be the way you like.
And that’s your choice.
I still think you’re wrong, and I don’t think it’s a question of interpretation.
Can you explain just how there is a difference between the two? The way I see it is the two are identical because either way the post doesn’t exist at this point in time.
It’s different because of why the post doesn’t exist. If you accessed a page that you had been on before, and you got a 404, you might try looking for it, thinking the address had changed or something. If you got a 410, you would know that the page had been deleted. While a 404 would technically be valid, a 410 would be more useful, and more appropriate.
… the condition is not permanent because it can be rectified by an authorised human with a few minutes to spare.
While that’s a conceivable scenario, it seems unlikely. If I deleted a post, why would I bring it back? If I wanted to make edits to it, I would set its status to draft, and then a 404 would be correct, but if I purposefully delete it, a 410 would be much more appropriate.
410 is usually used where someone has had stuff hosted, such as a student on a University account, and they have moved and taken it with them and the host has no information on where they’ve gone.
It’s basically a “return to sender, address unknown” code.
If *you* delete something from your system, *your* system mind you, not somebody else’s, that is, a domain you control as opposed to just a /~ account, then it’s a 404 because it isn’t technically “gone” because you *know* where it went.
I think that’s why “404 Not found” is actually correct as opposed to “410 Gone”. The former is temporary while the latter is permanent.
They are both permanent conditions; in fact, you could argue 410 is the more temporary, as it’s expected the host will remove the code sooner rather than later once search engines and friends of the “gone missing” get the idea the resource is gone. It’d then become a 404 if anyone or anything tried accessing the resource.
So, 410 makes little to no sense on a website/domain controlled by one person/entity who knows where stuff has gone.
Check out the W3C HTTP 1.1 status code definitions, quoted below:
The 410 response is primarily intended to assist the task of web maintenance by notifying the recipient that the resource is intentionally unavailable and that the server owners desire that remote links to that resource be removed. Such an event is common for limited-time, promotional services and for resources belonging to individuals no longer working at the server’s site. It is not necessary to mark all permanently unavailable resources as “gone” or to keep the mark for any length of time — that is left to the discretion of the server owner.
To summarise: the 410 code marks the resource as intentionally deleted. It’s an optional code, but it makes perfect sense to use it on your own domain since it gives important information to search engines and the like.
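As a rough illustration of that distinction (in Python rather than WordPress's PHP, and not how WordPress actually routes requests), the sketch below picks a status code from two hypothetical sets of paths that a server would have to maintain itself: one for live resources and one for resources the owner has intentionally removed. The function name `status_for` and both sets are assumptions made up for this example.

```python
from http import HTTPStatus

def status_for(path, live_paths, deleted_paths):
    """Choose a response status for a requested path.

    live_paths and deleted_paths are hypothetical sets kept by the server;
    nothing here is taken from WordPress itself.
    """
    if path in live_paths:
        return HTTPStatus.OK          # resource exists and can be served
    if path in deleted_paths:
        return HTTPStatus.GONE        # server knows it was intentionally removed
    return HTTPStatus.NOT_FOUND       # server cannot say what happened

# Example data, invented for illustration only.
live = {"/blog/hello-world"}
deleted = {"/blog/old-post"}

for p in ("/blog/hello-world", "/blog/old-post", "/blog/never-existed"):
    s = status_for(p, live, deleted)
    print(p, s.value, s.phrase)
```

The fallback to 404 mirrors the quoted wording: when the server has no record either way, it should not claim the resource is gone.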
OK then, you’re still stuck with the logistical problem of how to send a 410 header for something the server no longer has a record of. 410s need to be explicitly set and how are you going to do that with a deleted dynamically generated URL?
If you do Redirect gone /blog/03/26/whatever it won’t work unless you manually mirror each and every post you intend to trash into real and canonical filenames. Because technically, they do not exist until the page is called and PHP and MySQL do their thing.
Definitely in the too hard basket, if you ask me.
“I still think you’re wrong, and I don’t think it’s a question of interpretation.”
I think you’re wrong so that makes us about even, at least from someone else’s point of view.
Seriously though, it’s not about what’s likely or not, it’s about how stuff happens. People delete and reinstate content all the time. Why would you want to obey any rule that says you can’t bring something back, once it was deleted? That just seems a ridiculous imposition to make.
Read RFC2616 again, especially the opening paragraph:
https://rfc.net/rfc2616.html#p6
“The requested resource is no longer available at the server and no forwarding address is known.”
See what it says? No longer available. Which means it had to have been there in the first place. So how do you propose WP is to know that exactly? How can it tell the difference between “once there” and “never there”?
It can’t.
WP doesn’t currently keep a log of deleted posts, as far as I know, but that doesn’t mean it couldn’t do it. In fact, that would be really easy.
Before akismet, when you marked a comment as spam, it appeared to have been deleted forever, but in actuality it was moved into another table that wasn’t accessible without a plugin of some kind.
It wouldn’t even be that hard to do something similar for deleted posts. You’d only need to store the page slug and ID in a separate table.
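A minimal sketch of that separate-table idea, written in Python with an in-memory SQLite database rather than against WordPress's real schema. The table names `posts` and `deleted_posts`, their columns, and the helper functions are all hypothetical and chosen only for this example.

```python
import sqlite3

def make_db():
    """Create a toy schema: live posts plus a log of deleted ones (both hypothetical)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, slug TEXT UNIQUE)")
    db.execute("CREATE TABLE deleted_posts (id INTEGER PRIMARY KEY, slug TEXT)")
    return db

def delete_post(db, post_id):
    """Move a post's ID and slug into deleted_posts instead of forgetting it."""
    row = db.execute(
        "SELECT id, slug FROM posts WHERE id = ?", (post_id,)
    ).fetchone()
    if row is None:
        return False  # nothing to delete
    db.execute("INSERT INTO deleted_posts (id, slug) VALUES (?, ?)", row)
    db.execute("DELETE FROM posts WHERE id = ?", (post_id,))
    return True

def status_for_slug(db, slug):
    """Return 200 for a live post, 410 for a logged deletion, 404 otherwise."""
    if db.execute("SELECT 1 FROM posts WHERE slug = ?", (slug,)).fetchone():
        return 200
    if db.execute("SELECT 1 FROM deleted_posts WHERE slug = ?", (slug,)).fetchone():
        return 410
    return 404

db = make_db()
db.execute("INSERT INTO posts (id, slug) VALUES (1, 'hello-world')")
db.execute("INSERT INTO posts (id, slug) VALUES (2, 'old-post')")
delete_post(db, 2)

for slug in ("hello-world", "old-post", "never-existed"):
    print(slug, status_for_slug(db, slug))
```

This only records the slug a post had at the moment it was deleted, so a post whose slug was changed earlier would still look like a plain 404 under its old address.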
Why bother though? Seems to me you’re going to the ends of the earth in the name of semantics and minutiae. Of course, you’re free to do that…
Fair point, just seems like a sensible idea. I don’t know if search engines react differently to 404s and 410s, but it seems that they might.
“WP doesn’t currently keep a log of deleted posts, as far as I know, but that doesn’t mean it couldn’t do it. In fact, that would be really easy.”
Easy? No, not really. It’s complicated by the fact that you would need to also allow for any posts where you change the “slug”, assuming you’re using that for your permalinks identifier. So, your extra table where you keep a record of posts which have been deleted is insufficient to determine whether it’s the equivalent of a 404 or a 410 condition.
The counter-argument I see is that one should never change the “slug” but if you’re gonna say that you must also then insist that the slug textbox in admin must always remain “readonly” for existing posts.
Which kinda brings us back to what I was saying in the first place. How can you clearly differentiate between the various possibilities :
1. does it exist?
2. did it once exist?
3. does it exist but the “slug” has changed?
4. is it private?
5. is it draft?
6. is it published at a future date?
Like IIIIIIIV says, why bother though? The real question should be:
Even if you hack parts of WP so it incorporates the concept of trying to differentiate between a 404 and a 410, it will never work reliably for all installations, so where’s the improvement? How do the changes benefit the end user?
They probably don’t.
Just because something seems difficult to do, doesn’t mean you shouldn’t do it.
Another possibility that’s just occurred to me is that you could change its status. Currently, WordPress allows six statuses for posts:
publish
draft
private
static
object
attachment
You could add a seventh, “deleted”, for deleted posts. That would allow you to differentiate.
But that’s irrelevant to the discussion at this stage. All that matters is that you can imagine the idea working — deleted posts, when requested, return the 410 status code. If it really turns out to be impossible or impractical then that’s too bad, but I don’t think it is.
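To make the extra-status idea concrete, here is a small Python sketch (again, not WordPress code). It assumes a hypothetical `Post` record with a `status` field and a made-up “deleted” status alongside the existing ones; only “publish” is served, “deleted” maps to 410, and everything else, including a post that cannot be found at all, falls back to 404.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping: "deleted" is the proposed seventh status, not an existing one.
STATUS_TO_HTTP = {
    "publish": 200,  # visible to visitors
    "deleted": 410,  # intentionally removed by the owner
}

@dataclass
class Post:
    slug: str
    status: str

def http_status(post: Optional[Post]) -> int:
    """Map a looked-up post (or None if no record exists) to an HTTP status code."""
    if post is None:
        return 404  # no record at all
    # draft, private and any other status are treated as not found
    return STATUS_TO_HTTP.get(post.status, 404)

print(http_status(Post("hello-world", "publish")))
print(http_status(Post("old-post", "deleted")))
print(http_status(Post("work-in-progress", "draft")))
print(http_status(None))
```

Keeping the row and flipping its status means the server still has a record to check when the old address is requested.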
To answer your last question, it benefits the end user because they know what’s happened to the post (and here I’m assuming you can include visitors as end users). If you get a 404 you have no idea what’s happened to it, but if you get a 410, you know it’s deleted. It might be a pity, depending on how interesting the post was, but it’s useful information.
It’d be fair to say most end users wouldn’t know what a 410 Gone code was, if you paid them.
“Just because something seems difficult to do, doesn’t mean you shouldn’t do it.”
Fair enough. I didn’t mean to suggest that; instead, I was trying to convey that it’s not worth doing because (1) there is no benefit for the end user of the website and (2) the mechanism to implement it places unreasonable restrictions on the webmaster.
But go ahead and hack it to suit yourself. That’s the whole idea with open source software, after all.