• After importing all posts from my old WordPress installation, all double line breaks in the posts have been replaced with single line breaks.

    This has severely disrupted the formatting of the text. After importing, the text in the posts looks squashed together: double line breaks would normally be converted to paragraph tags, but now that they are single line breaks they are interpreted as br tags instead.

    Does anyone know of a way to fix this issue?

    Both installations of WordPress are v4.9.6.

Viewing 15 replies - 1 through 15 (of 16 total)
  • Moderator bcworkz

    (@bcworkz)

    It’s impossible to determine after the fact which single breaks were double and which single breaks really are supposed to be single. I’m assuming there are singles which really are single in various places, otherwise you could do a global search and replace of all breaks with doubles.

    With such ambiguity in single breaks, you would need to go back to the source XML, change the double breaks to p tags, then re-import the file. This can be done by script, but creating the script is tricky because p tags are supposed to be paired with closing tags and breaks are not. The script would have to assume that each opening p tag always occurs right after the previous double break. If that is not always the case, this too can make a mess of things. Even so, it should give a much better result than what you currently have.
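As a rough illustration of the scripted approach described above, here is a minimal Python sketch that wraps every block separated by a blank line in p tags. The function name is hypothetical, and it assumes it is given the body of a single post rather than the whole export file. It also assumes the content contains no block-level HTML of its own (lists, blockquotes and so on), which a naive wrapper like this would wrap incorrectly.

```python
import re


def double_breaks_to_paragraphs(text):
    """Wrap each block separated by a blank line in <p>...</p>.

    Hypothetical helper: a naive sketch, not what WordPress itself does.
    """
    # Normalize Windows (\r\n) and old Mac (\r) line endings to \n first.
    normalized = text.replace("\r\n", "\n").replace("\r", "\n")
    # A paragraph boundary is a newline, optional spaces/tabs, then one or
    # more further newlines.
    blocks = re.split(r"\n[ \t]*\n+", normalized.strip())
    # Single newlines inside a block are left alone, so genuine single
    # breaks survive as single breaks.
    return "\n\n".join(
        "<p>{}</p>".format(block.strip()) for block in blocks if block.strip()
    )


if __name__ == "__main__":
    sample = "First paragraph.\r\n\r\nSecond paragraph,\nwith a single break."
    print(double_breaks_to_paragraphs(sample))
```

Because the opening and closing tags are emitted together per block, the pairing problem mentioned above does not arise, but only as long as the "no existing block-level HTML" assumption holds.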

    Another possibility may be to manually import the data as SQL files through phpMyAdmin. You’d have to be sure the IDs imported do not conflict with existing content in the destination DB.

    Thread Starter steve442

    (@steve442)

    Yes, thanks bcworkz. I did try going down the search and replace route but didn't have much luck, and decided the best way to do it would be the same as your suggestion of doing a database SQL dump. The only problem is that my access to the old database has been cut off. I now have only the WordPress export XML file available, which, when opened in a text editor, does indeed still have the double line breaks in the posts.

    I might give the search and replace method another go. I was trying to do it in Notepad++ but couldn't figure out the right way to do it.

    Moderator bcworkz

    (@bcworkz)

    When you refer to double line breaks, do you mean HTML type as in <br><br> or text file style as in \r\n\r\n? HTML search and replace is straightforward, so I think you are having trouble with text file return and newline characters. In some apps, typing in \r\n will be matched to the actual return and newline characters. In others, typing Shift-Enter will insert the same characters into the field. Though you cannot see anything in the field, the search and replace involving these characters will work.

    I just ran into this same issue. It occurs when the importer encounters a blank line, rtrims it, then skips it if the importline was empty (line 493 of plugins/wordpress-importer/parsers.php).
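To make the mechanism concrete, here is a small Python sketch of the behaviour described above. It is not the importer's code (that is PHP); it only imitates a line-by-line reader that right-trims each line and skips the ones that end up empty, and the function name is made up for illustration.

```python
def read_skipping_blank_lines(raw):
    """Imitate a reader that rtrims each line and drops empty ones."""
    kept = []
    for line in raw.split("\n"):
        line = line.rstrip()  # also removes a trailing \r from \r\n endings
        if not line:
            continue  # the blank line separating two paragraphs is lost here
        kept.append(line)
    return "\n".join(kept)


if __name__ == "__main__":
    body = "First paragraph.\n\nSecond paragraph."
    print(repr(read_skipping_blank_lines(body)))
```

Once the blank line is gone, the two paragraphs are separated by a single newline, which is exactly the squashed result reported at the top of the thread.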

    The easiest fix is getting the importer to use an actual XML parser by installing the php-xml module.

    scott8035

    (@scott8035)

    I’m having the same problem. This would appear to be a pretty significant shortcoming in the WP Import plugin.

    @jinschoi, could you elaborate on your fix? Will the WP Importer automatically use a different parser after installing php-xml? I’m using SiteGround hosting, how can I even tell whether it’s installed or not?

    scott8035

    (@scott8035)

    Update: I used the following PHP code to determine if the modules referenced by WP Import were installed:

    echo extension_loaded( 'simplexml' ) ? "SimpleXML loaded\n" : "SimpleXML NOT loaded\n";
    echo extension_loaded( 'xml' )       ? "XML loaded\n"       : "XML NOT loaded\n";

    My results were:

    SimpleXML loaded
    XML loaded

    So if I understand correctly, the WP Import plugin should be using the SimpleXML parser (I looked at the code and it checks for that first). Should I try getting it to load the XML module instead?

    scott8035

    (@scott8035)

    Update: there are three parsers, SimpleXML, XML, and a Regex-based parser. I was unable to validate the XML parser because my import file was too big and I reached the max memory size I could allocate. The other two, SimpleXML & Regex, both cause the original problem from this post, namely, two consecutive \n (or \r\n) sequences get collapsed to one, later causing wpautop to convert that to a <br/> rather than wrapping the text block in <p>…</p>.

    I’m going to have to write a quick utility that I can use to transfer the correct content over.

    jinschoi

    (@jinschoi)

    You can look at the output of phpinfo(); to see what modules are loaded. I have SimpleXML installed as well, which is tried first, so that is probably what ended up being used.

    One potential issue that I ran into: if the export file does not validate properly, the parser will fail and move on to the next one, ultimately using the Regex parser that has the newline issue. On a Mac, “xmllint --noout file.xml” will give you a quick check to see if your import file is valid. I’m sure you can get something similar on Linux.

    My file had a single random control character inserted in a post body somewhere, which needed to be removed before it would import properly.
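If xmllint is not available, a similar check can be sketched with Python's standard library. The helper names below are hypothetical. The sketch reports whether the text parses as XML and lists any C0 control characters other than tab, line feed and carriage return, which XML 1.0 does not allow. It does not cover every character XML forbids, so treat it as a first pass rather than a full validator.

```python
import re
import xml.etree.ElementTree as ET

# C0 control characters other than tab (\x09), LF (\x0a) and CR (\x0d).
INVALID_XML_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f]")


def is_well_formed(text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(text.encode("utf-8"))
    except ET.ParseError:
        return False
    return True


def find_invalid_chars(text):
    """Return (line, column, repr) for each disallowed control character."""
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for match in INVALID_XML_CHARS.finditer(line):
            hits.append((line_no, match.start() + 1, repr(match.group())))
    return hits


def strip_invalid_chars(text):
    """Remove the disallowed control characters found above."""
    return INVALID_XML_CHARS.sub("", text)


if __name__ == "__main__":
    sample = "<post>Hello\x07 world</post>"
    print(is_well_formed(sample))
    print(find_invalid_chars(sample))
    print(is_well_formed(strip_invalid_chars(sample)))
```

For a real export you would read the file into a string first, and keep a backup of the original before writing out a stripped copy.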

    Well, I had to resurrect this topic to finish the project. I finally resolved it by writing a converter for the XML export file that handles the double-newline the same way WordPress does: by using the wpautop function. This I think is the closest you will get to a “perfect” import. See https://scotthom.com/wordpress-import-with-wpautop/ for more detail.

    @scott8035 — pretty nifty; and I’ll say (even before I try it) that I appreciate the effort, as I’ve got a client project going that presents this very problem.

    Just have one critical question before I try to use it on my client’s five hundred posts:

    Is the php file you created run independently, outside of WordPress? Where should the file be installed on my server?

    Thanks again for the hard work, and thanks in advance for your response.

    — Matt

    @mwsmedia, the file is run independently from WordPress, in the directory that WP is installed into (the same directory that has your wp-config.php).

    Thanks, @scott8035 — much obliged for the swift response.

    Maybe your file could become a plugin? At least until Automattic gets their act together on the import / export tool..?

    Either way, again, thanks for the work.

    Hmmm… forgive me, @scott8035, but I must be a bit dense. I’ve installed fix-wordpress-export-wpautop.php to the root of my WP install (same directory as wp-config.php)… and now what? When I put it into my browser (https://mysite.com/fix-wordpress-export-wpautop.php), I just get a white screen.

    @mwsmedia, it’s a command-line program.

    @mwsmedia, …which means you would run it at the command line like this:

    php fix-wordpress-export-wpautop.php < input.xml > output.xml

  • The topic ‘double line breaks changed to single line breaks after importing xml file’ is closed to new replies.