    Hi, while using this plugin to fill in missing alt attributes on pages, I noticed that some pages became empty. It turns out the DOM was being parsed incorrectly because those pages had no wrapping element around their content.

    On other pages, the resulting content contained symbols that had not been there before. This was caused by the wrong encoding being used when parsing the DOM: DOMDocument::loadHTML() assumes ISO-8859-1 when the markup does not declare a charset, so multibyte UTF-8 characters get garbled.

    After looking through the plugin's source, I found a fix for both problems. Please review the changes below and let me know if you see any potential problems with them or any improvements that could be made. Otherwise, it would be great if they could be included in the next version so that others don't run into the same issues.

    The changes apply to dvin508-post-api.php, mainly the content_merge function. In the code below, the added lines are the ones that follow the comments.

    
    public function content_merge($post_id, $post_images)
    {
        $post_content = get_post_field('post_content', $post_id);
        // Convert non-ASCII UTF-8 characters to HTML entities so loadHTML() cannot misread them
        $post_content = mb_convert_encoding($post_content, 'HTML-ENTITIES', "UTF-8");
        $doc = new DOMDocument();
        // Wrap the content in a <div> to prevent elements from being rearranged, based on the answer here: https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags
        $doc->loadHTML("<div>$post_content</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

        // Detach the wrapper <div>, empty the document, then move the wrapper's children back to the top level
        $container = $doc->getElementsByTagName('div')->item(0);
        $container = $container->parentNode->removeChild($container);
        while ($doc->firstChild) {
            $doc->removeChild($doc->firstChild);
        }
        while ($container->firstChild) {
            $doc->appendChild($container->firstChild);
        }
    ...
    
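    To try the wrapper idea outside WordPress, here is a minimal, self-contained sketch, assuming PHP 8 with the `dom` and `mbstring` extensions. `set_missing_alt()` is a hypothetical helper, not the plugin's actual content_merge: it fills empty `alt` attributes and then serializes only the wrapper's children instead of moving them back onto the document. It also swaps the `'HTML-ENTITIES'` target of `mb_convert_encoding()`, which is deprecated as of PHP 8.2, for `mb_encode_numericentity()`, which likewise turns every non-ASCII character into a numeric entity before libxml sees it.

```php
<?php
// Hypothetical helper (not the plugin's code): fill empty alt attributes in an
// HTML fragment that may have no single root element.
function set_missing_alt(string $html, string $alt): string
{
    // libxml assumes ISO-8859-1 unless told otherwise, so encode every code
    // point from U+0080 upward as a numeric entity (&#NNNN;) before parsing.
    $html = mb_encode_numericentity($html, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    $doc = new DOMDocument();
    // Collect warnings about shortcodes or loose markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // The wrapper gives LIBXML_HTML_NOIMPLIED exactly one root element to keep;
    // without it, libxml may re-nest or drop siblings after the first element.
    $doc->loadHTML("<div>$html</div>", LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $container = $doc->documentElement; // the wrapper <div>

    foreach ($container->getElementsByTagName('img') as $img) {
        if ($img->getAttribute('alt') === '') {
            $img->setAttribute('alt', $alt);
        }
    }

    // Serialize the wrapper's children one by one, leaving the helper <div> out.
    $out = '';
    foreach ($container->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}
```

    Called on content like the samples in this thread, it should keep the text, headings, and shortcodes around the images in place. libxml may still normalize some markup, for example writing `<img ... />` as `<img ...>` or emitting non-ASCII characters as entities.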

    The same encoding conversion can also be applied in the extract_images_src function:

    
    function extract_images_src($post_content)
    {
        $doc = new DOMDocument();
        $doc->validateOnParse = false;
        // Convert non-ASCII UTF-8 characters to HTML entities before parsing
        $post_content = mb_convert_encoding($post_content, 'HTML-ENTITIES', "UTF-8");
        @$doc->loadHTML($post_content);
    ...
    
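    The encoding step can also be checked on its own. This sketch assumes PHP 8 with the `dom` and `mbstring` extensions and uses two hypothetical helpers, `load_post_html()` and `image_srcs()`, which are not part of the plugin; `image_srcs()` only assumes that extract_images_src collects image `src` values, as its name suggests. It uses `mb_encode_numericentity()` for the conversion and `libxml_use_internal_errors()` in place of the `@` operator, so parser warnings are collected and cleared rather than silently discarded.

```php
<?php
// Hypothetical helper: parse a full post body with the encoding fix applied,
// so callers see the original characters instead of mis-decoded bytes.
function load_post_html(string $post_content): DOMDocument
{
    $doc = new DOMDocument();
    if ($post_content === '') {
        return $doc; // loadHTML() throws a ValueError on empty input in PHP 8
    }

    // Encode every code point from U+0080 upward as a numeric entity,
    // leaving plain ASCII markup untouched.
    $post_content = mb_encode_numericentity($post_content, [0x80, 0x10FFFF, 0, 0x1FFFFF], 'UTF-8');

    // Collect parser warnings instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($post_content);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

// Hypothetical example use: gather image sources in document order.
function image_srcs(DOMDocument $doc): array
{
    $srcs = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        $srcs[] = $img->getAttribute('src');
    }
    return $srcs;
}
```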

    This plugin is really helpful by the way, thanks for creating it!

    • This topic was modified 5 years, 3 months ago by nopol10.
Viewing 5 replies - 1 through 5 (of 5 total)
    Plugin Author Joseph LoPreste

    (@foucciano)

    Hi,

    Thank you so much for your help!

    We have not encountered this issue yet, but I will look into it immediately.

    I can't tell you how grateful I am to have a problem brought to our attention along with a potential solution.

    I will keep you informed about this bug and when we update it, you will be the first to know.

    Thank you again for your kind words and your support!

    Plugin Contributor Steve Curtis

    (@chilifide)

    Hi @nopol10,

    I would like to recreate this bug if I can. Just so I understand correctly: were you entering HTML tags into the “Alternate Text” text area on the plugin’s Image Optimization page?

    Thread Starter nopol10

    (@nopol10)

    Hi Steve,

    I was entering plain text such as “Car – Small” (without the quotes) into the Alternate Text area.

    Before I set the alt text through the plugin, the page’s content looked like this:

    
    <img src="some_image.jpg" width="1024" height="600" class="alignnone size-full wp-image-1111" />
    <br class="">
    Some text in the post content
    
    <h2>Some heading title</h2>
    More text
    
    [some-shortcode]
    Shortcode content
    More shortcode content
    [/some-shortcode]
    
    <strong>Strong text</strong>
    <ul>
    <li>Item 1</li>
    <li>Item 2</li>
    </ul>
    
    <a href="https://some-link.com"><img class="alignnone wp-image-15863 size-large" src="some_image.jpg" alt="" width="806" height="564"></a>
    
    Some more text
    

    After setting the alt text, it became:

    
    <img src="some_image.jpg" alt="Some Alt Text" width="1024" height="600" class="alignnone size-full wp-image-1111" >
    
    Plugin Contributor rajeshsingh520

    (@rajeshsingh520)

    Are you using the Classic Editor or the Gutenberg editor for the above test? I tried your sample HTML in the Classic Editor and there was no problem.

    Thread Starter nopol10

    (@nopol10)

    I am using the classic editor (the official Classic Editor plugin), in Text mode rather than Visual mode.

    Try this sample instead. I can reproduce the error with the first sample above, but this one more accurately reflects the layout of my original page:

    
    <img class="alignnone size-full wp-image-15751" src="https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png" alt="" width="3500" height="2333">
    
    Text here
    
    Text 2
    
    Text 3
    
    [some_shortcode bg="#ffffff" text="#1c1c1c"]
    <h3>Some text</h3>
    <strong>Strong text</strong>
    
    <a href="https://www.google.com"><strong>Linky link </strong></a>
    
    <small>Small (text), here</small>
    
    <strong>Strong text</strong>
    
    More text
    
    <a href="https://www.google.com"><strong>Click me</strong></a>
    
    <small>Some (small) text, (here)</small>
    
    <strong>10101010</strong>
    
    A&nbsp; B, C
    
    <em>Emphasised text</em>&nbsp;<em><a href="https://www.google.com">Important link</a>.</em>
    
    [/some_shortcode]
    
    <strong>List:</strong>
    <ul>
        <li><em>Item 1</em></li>
        <li><em>Item 2</em></li>
        <li><em>Item 3</em></li>
        <li><em>Item 4</em></li>
        <li><em>Item 5</em></li>
        <li><em>Item 6</em></li>
    </ul>
    <a href="https://www.google.com"><img class="alignnone wp-image-15863 size-large" src="https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png" alt="" width="806" height="564"></a>
    
    Some other text
    
    <img class="alignnone wp-image-9838 size-full" src="https://www.google.com/images/branding/googlelogo/1x/googlelogo_color_272x92dp.png" width="850" height="566" alt="">
    
    Some text
    

    Wrapping the entire page content in a div in the editor prevents the bug from happening.

    • This reply was modified 5 years, 3 months ago by nopol10. Reason: Add newer sample
  • The topic ‘Page content being removed / modified in unexpected ways when changing alt text’ is closed to new replies.