P and BR tags missing in XML export
Hi everyone,
I’m working on a piece of software that reads WordPress XML export files (containing posts and pages) and parses them.
I’m having trouble with a number of XML files that don’t have any P or BR tags to mark new lines in the content field. However, the content does include other HTML tags such as UL and LI.
Example XML looks something like this…
<content:encoded><![CDATA[This is a paragraph.
Another paragraph.
<ul>
<li>Bullet list</li>
<li>Bullet list</li>
</ul>
]]></content:encoded>
Currently my script treats this as HTML content, so the line breaks are ignored and all the text ends up on one line: “This is a paragraph. Another paragraph.”
However, if I use the PHP nl2br() function to add the missing line breaks, I also get BR tags inside the list markup, like this…
<ul><br /><li>Bullet list</li><br /><li>Bullet list</li><br /></ul>
Does anyone have a method of parsing this pseudo-HTML in the XML files so that the line breaks are retained? I notice that the original site has the P tags in the correct places, so something about the export must be stripping them. Unfortunately I’m not the person generating the export file, so I have no control over this.
Has anyone come across this before or have any ideas?
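One possible direction, sketched here under stated assumptions: WordPress itself adds paragraph tags when a post is displayed, using its wpautop() function, so a parser can apply a similar step to the raw content instead of nl2br(). The Python below is a much-simplified approximation of that idea, not a port of wpautop(). The function name add_paragraphs, the BLOCK_TAGS list, and the sample string (including where its newlines fall) are all assumptions made for illustration.

```python
import re

# Assumed list of block-level tags that should never be wrapped in <p>.
BLOCK_TAGS = "ul|ol|li|table|thead|tbody|tr|td|th|div|blockquote|pre|h[1-6]|p|hr"

_BLOCK_OPEN = re.compile(rf"(<(?:{BLOCK_TAGS})\b[^>]*>)", re.IGNORECASE)
_BLOCK_CLOSE = re.compile(rf"(</(?:{BLOCK_TAGS})>)", re.IGNORECASE)
_STARTS_WITH_BLOCK = re.compile(rf"</?(?:{BLOCK_TAGS})\b", re.IGNORECASE)


def add_paragraphs(content):
    """Wrap loose text in <p> tags while leaving block-level markup alone."""
    # Normalise Windows and old Mac line endings to "\n".
    text = content.replace("\r\n", "\n").replace("\r", "\n")

    # Force a blank line before block openers and after block closers,
    # so each block tag starts its own chunk.
    text = _BLOCK_OPEN.sub(r"\n\n\1", text)
    text = _BLOCK_CLOSE.sub(r"\1\n\n", text)

    output = []
    for chunk in re.split(r"\n\s*\n", text):
        chunk = chunk.strip()
        if not chunk:
            continue
        if _STARTS_WITH_BLOCK.match(chunk):
            # Block-level markup: keep as-is, no <p> and no <br />.
            output.append(chunk)
        else:
            # Loose text: single newlines become <br />, the chunk becomes a paragraph.
            output.append("<p>" + chunk.replace("\n", "<br />\n") + "</p>")
    return "\n".join(output)


# Hypothetical sample modelled on the export snippet in this post.
sample = (
    "This is a paragraph.\n\n"
    "Another paragraph.\n"
    "<ul>\n<li>Bullet list</li>\n<li>Bullet list</li>\n</ul>\n"
)
print(add_paragraphs(sample))
```

With this sample the two sentences each get their own P tags and the UL/LI markup passes through without any BR tags. The sketch does not handle everything: for example, blank lines inside a PRE block or text that spans several lines inside an LI would need extra care.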
Thanks in advance!