• Hi everyone,

    I’m working on a piece of software that reads WordPress XML export files (containing posts and pages) and parses them.

    I’m having trouble with a number of XML files that don’t seem to have any P or BR tags to mark new lines in the content field. However, the content does include other HTML tags such as UL and LI.

    Example XML looks something like this…

    <content:encoded><![CDATA[This is a paragraph.
    
    Another paragraph.
    
    <ul>
    <li>Bullet list</li>
    <li>Bullet list</li>
    </ul>
    &nbsp;]]></content:encoded>

    Currently my script treats this as HTML content, so I end up with all the text on one line: “This is a paragraph. Another paragraph.”

    However, if I use the PHP nl2br() function to add the missing line breaks, I end up with BR tags inside the list markup as well…
    <ul><br /><li>Bullet list</li><br /><li>Bullet list</li><br /></ul>

    Does anyone have a method of parsing this pseudo-HTML in the XML files that retains the line breaks? I notice that the original site has the P tags in the correct places, so something about the export must be stripping them. Unfortunately, I’m not the person generating the export file, so I have no control over this.

    Has anyone come across this before or have any ideas?

    Thanks in advance.
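
    For context, WordPress generally stores post content without P tags and adds them at display time with its wpautop() function, so the export reflects the stored form rather than the rendered page. A minimal sketch of a similar idea follows, written in Python for brevity. The name autop_lite and the BLOCK_TAGS list are assumptions made for illustration, not WordPress's implementation: the text is split on blank lines, each chunk is wrapped in a P tag unless it already starts with a block-level tag, and only newlines inside those paragraphs become BR tags.

```python
import re

# Assumed subset of block-level tags; extend it to match your content.
BLOCK_TAGS = ("ul", "ol", "li", "table", "div", "blockquote", "pre",
              "h1", "h2", "h3", "h4", "h5", "h6", "p")

# Matches a chunk that begins with an opening or closing block-level tag.
_BLOCK_START = re.compile(r"</?(?:%s)\b" % "|".join(BLOCK_TAGS), re.IGNORECASE)


def autop_lite(text: str) -> str:
    """Wrap blank-line-separated chunks in <p>, leaving block markup untouched."""
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph break is a blank (or whitespace-only) line.
    chunks = re.split(r"\n\s*\n", text.strip())
    out = []
    for chunk in chunks:
        chunk = chunk.strip()
        if not chunk:
            continue
        if _BLOCK_START.match(chunk):
            # Already block-level HTML (for example a list): pass it through as-is.
            out.append(chunk)
        else:
            # Plain text: single newlines become <br />, then wrap in <p>.
            out.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n".join(out)


sample = (
    "This is a paragraph.\n"
    "\n"
    "Another paragraph.\n"
    "\n"
    "<ul>\n"
    "<li>Bullet list</li>\n"
    "<li>Bullet list</li>\n"
    "</ul>\n"
    "&nbsp;"
)

result = autop_lite(sample)
print(result)
```

    With the sample above, the two text chunks are wrapped in P tags and the list is left without any BR tags. This sketch does not handle plain text that runs straight into a block tag without a blank line between them; if the parser is written in PHP and WordPress itself is available, calling wpautop() on the content is likely the more robust route.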

  • The topic ‘P and BR tags missing in XML export’ is closed to new replies.