Page content being removed / modified in unexpected ways when changing alt text
Hi, while using this plugin to replace missing alt tags in pages, I noticed some pages becoming empty. It turns out that the DOM was being parsed incorrectly due to those pages not having a wrapping element around the content.
Also, on other pages, the resulting content would contain symbols that were not there before. This was caused by the wrong encoding being used when parsing the DOM.
After looking into the plugin's source, I managed to find a fix for both problems. Please take a look at the recommendations below and see whether there are any potential problems with them or any improvements that could be made. Otherwise, it would be great if they could be integrated into the next version so that others do not run into the same issues.
The changes are to be applied to dvin508-post-api.php, mainly the content_merge function. Added lines come after the comments in the code below.
public function content_merge($post_id, $post_images) {
    $post_content = get_post_field('post_content', $post_id);
    // Convert the content from UTF-8
    $post_content = mb_convert_encoding($post_content, 'HTML-ENTITIES', "UTF-8");
    $doc = new DOMDocument();
    // Modification to prevent rearrangement of elements based on answer here: https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags
    $doc->loadHTML("<div>$post_content</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    $container = $doc->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);
    while ($doc->firstChild) {
        $doc->removeChild($doc->firstChild);
    }
    while ($container->firstChild) {
        $doc->appendChild($container->firstChild);
    }
    ...
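For anyone who wants to try the wrapper idea outside the plugin, here is a minimal standalone sketch. The function name `add_missing_alt` and the fallback text are hypothetical and not part of the plugin; it only illustrates wrapping a fragment in a temporary `<div>` and serializing the children back. It does not reproduce the plugin's actual merge logic.

```php
<?php
// Hypothetical helper for illustration only; not part of the plugin.
// Adds a fallback alt attribute to <img> tags that lack one and returns
// the fragment without the temporary wrapper.
function add_missing_alt(string $fragment, string $fallback_alt): string
{
    $doc = new DOMDocument();

    // Collect libxml warnings (e.g. for HTML5 tags) instead of printing them.
    $previous = libxml_use_internal_errors(true);

    // The temporary <div> gives the fragment a single root element, so
    // LIBXML_HTML_NOIMPLIED does not rearrange sibling elements.
    $doc->loadHTML(
        '<div>' . $fragment . '</div>',
        LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD
    );

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // The wrapper is the first <div> in document order.
    $container = $doc->getElementsByTagName('div')->item(0);

    foreach ($container->getElementsByTagName('img') as $img) {
        if (!$img->hasAttribute('alt')) {
            $img->setAttribute('alt', $fallback_alt);
        }
    }

    // Serialize only the wrapper's children so the <div> itself is dropped.
    $html = '';
    foreach ($container->childNodes as $child) {
        $html .= $doc->saveHTML($child);
    }

    return $html;
}

echo add_missing_alt('<p>First</p><p>Second <img src="a.png"></p>', 'Placeholder');
```

Serializing the children one by one is an alternative to moving them back into the document as in the snippet above; both avoid emitting the temporary wrapper. The exact whitespace between elements in the output can differ from the input depending on the libxml version.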
The same encoding conversion can also be applied in the extract_images_src function:
function extract_images_src($post_content) {
    $doc = new DOMDocument();
    $doc->validateOnParse = false;
    $post_content = mb_convert_encoding($post_content, 'HTML-ENTITIES', "UTF-8");
    @$doc->loadHTML($post_content);
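One caveat on the conversion step: as far as I know, passing 'HTML-ENTITIES' to mb_convert_encoding is deprecated as of PHP 8.2 and triggers a deprecation notice there. A possible substitute is mb_encode_numericentity, sketched below. The helper name `encode_non_ascii` is hypothetical, and this is only one of several ways to hand UTF-8 content to DOMDocument.

```php
<?php
// Hypothetical helper for illustration only; not part of the plugin.
// Replaces every non-ASCII character with a decimal numeric entity so that
// DOMDocument::loadHTML() does not misread UTF-8 bytes.
function encode_non_ascii(string $html): string
{
    // Conversion map: [first code point, last code point, offset, mask].
    $map = [0x80, 0x10FFFF, 0, 0x1FFFFF];

    return mb_encode_numericentity($html, $map, 'UTF-8');
}

$encoded = encode_non_ascii("Caf\u{E9}");
echo $encoded, PHP_EOL; // Caf&#233;

// The entity is decoded back to the original character when parsed.
$doc = new DOMDocument();
$doc->loadHTML('<p>' . $encoded . '</p>');
echo $doc->getElementsByTagName('p')->item(0)->textContent, PHP_EOL;
```

ASCII markup such as tags and existing entities is left untouched, since only code points from 0x80 upward fall inside the conversion map.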
This plugin is really helpful by the way, thanks for creating it!